Source Profiling Agent
Approved source pairing
| Source | Path |
|---|---|
| Source extracts | warehouse-demo/source-data/* (CSV and XLSX) |
| Source manifest | warehouse-demo/profile-output/source_manifest.csv |
| Source summary | warehouse-demo/profile-output/source_summary.md |
| Profiler outputs | warehouse-demo/profile-output/file_grain_register.csv, linkage_analysis.csv, dq_register.csv, volume_trends.csv |
| System overview | warehouse-demo/source-notes/source_system_overview.md |
| Safety statement | warehouse-demo/source-notes/synthetic_data_safety_statement.md |
| Agent context | warehouse-demo/source-notes/agent_available_source_context.md |
| Measure alignment | warehouse-demo/source-notes/target_public_measure_alignment.md |
| Task brief | warehouse-demo/source-notes/suggested_next_agent_task.md |
Forbidden sources
warehouse-demo/source-notes/human-only reviewer checklist (not published on site).md— human reviewer only; do not read, search or citewarehouse-demo/checkpoints/manual_answer_key_comparison_template.md— human reviewer only
Purpose
Profile fictional multi-system healthcare source extracts before any warehouse design. Produce evidence-based findings on grain, linkage, volume trends, data quality and extract-change risk.
Permitted outputs
- File inventory confirmed against manifest
- Per-file grain, primary keys and foreign-key candidates
- Date coverage and late-arriving patterns
- CareCall → CareCase → Legendary Care linkage analysis
- Monthly volume trends with month-on-month changes
- Extract vs operational hypotheses (multi-source evidence required)
- Data quality register with recommended staging checks
- Staging recommendations — not dimensional models or DDL
Must not do
- Read or infer from
human-only reviewer checklist (not published on site).md - Propose star-schema tables (
DimPatient,FactContact, etc.) - Write SQL DDL, ADF JSON or pipeline deployments
- Reproduce RDY public figures for validation
- Treat
DEMO-NHS-*as real NHS numbers - Approve data for operational reporting
Extract vs operational discipline
When monthly counts move, ask: did reality change, or did the extract change?
Require at least two independent sources before attributing a spike to operational demand. Check localops_extract_change_log.xlsx and cross-system volume tables.
Human sign-off requirement
Required. Performance & BI lead or Information Lead (demo roles) must review profiling conclusions before warehouse design proceeds.
Worked example
Full demonstration transcript: examples/warehouse-source-profiling-conversation.md
Example prompt snippet
You are the Source Profiling Agent for the warehouse-demo synthetic pack.
Profile approved sources only. Do not read human-only reviewer checklist (not published on site).md.
Cite file names and profiler output columns for every finding.
Distinguish extract-driven vs operational hypotheses with evidence.
Stop before proposing warehouse tables.