Medallion architecture (Bronze, Silver, Gold, Copper)

Important

This article is primarily for informational purposes. As a downstream consumer of data you will mostly work with tables in the Gold (and sometimes Bronze) layers, so you don’t need to understand the architecture in great detail. The goal here is simply to familiarise you with the terminology — Bronze, Silver, Gold, Copper — so that when these terms come up in conversations, documentation, or table names, you have a clear mental model of what they mean and where your data will sit in the pipeline.

Medallion architecture is a data design pattern used to organise data in a lakehouse. It structures your data into three progressive layers — Bronze, Silver, and Gold — each representing a higher level of quality and refinement.

Note

Copper is not mentioned above because the Copper layer is specific to the Department for Education and is not part of the standard medallion pattern. Further details are provided in the section Copper Layer – Migrated Analytical Modelling Areas (DfE-Specific) below.

The core idea is simple: raw data flows in at one end and clean, business-ready data comes out the other.

  Source Systems  ──▶  Bronze (Raw)  ──▶  Silver (Cleaned)  ──▶  Gold (Business)

This layered approach gives you reproducibility (you always keep the raw data), traceability (you can track how data was transformed), and flexibility (different teams can work at different layers).


Bronze Layer – Raw Ingestion

| Aspect | Detail |
| --- | --- |
| Purpose | Land data exactly as it arrives from source systems |
| Format | Raw, unvalidated, append-only |
| Typical content | JSON payloads, CSV dumps, CDC streams, API responses |
| Schema | Matches the source; may include metadata columns (ingestion timestamp, source file name) |

Key principles:

  • No transformations — store the data as-is so you have a complete audit trail.
  • Append-only — never overwrite; this protects you if upstream schemas change or bad data arrives.
  • Metadata enrichment — add columns like _ingested_at, _source_file, or _batch_id to aid debugging.

Example use case: A daily extract from a school attendance records system lands as CSV files. The Bronze table stores every row from every file, including duplicates, with a timestamp of when each batch was loaded.
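The Bronze principles above can be sketched in plain Python. This is a minimal illustration, not a real DfE pipeline: the `ingest_bronze` helper, the column names, and the sample attendance rows are all hypothetical, chosen only to show append-only loading with metadata enrichment.

```python
from datetime import datetime, timezone

def ingest_bronze(rows, source_file, batch_id):
    # Hypothetical helper: store rows exactly as they arrived (no
    # transformations, duplicates kept), adding only metadata columns.
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [
        {**row,
         "_ingested_at": ingested_at,
         "_source_file": source_file,
         "_batch_id": batch_id}
        for row in rows
    ]

# Append-only Bronze table: new batches are only ever added, never overwritten.
bronze_table = []

batch = [
    {"student_id": "S1", "date": "2024-09-02", "present": "Y"},
    {"student_id": "S1", "date": "2024-09-02", "present": "Y"},  # duplicate kept as-is
]
bronze_table.extend(ingest_bronze(batch, "attendance_2024-09-02.csv", batch_id=1))
```

Note that the duplicate row is deliberately retained: resolving duplicates is a Silver-layer concern, not a Bronze one.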


Silver Layer – Cleaned & Conformed

| Aspect | Detail |
| --- | --- |
| Purpose | Clean, deduplicate, validate, and conform the data |
| Format | Structured, typed, quality-checked |
| Typical content | Deduplicated records, joined reference data, standardised column names |
| Schema | Enforced and consistent; data types are correct |

Key principles:

  • Deduplication — remove exact duplicates and apply business logic to resolve conflicts.
  • Data quality checks — filter or flag records that fail validation rules (nulls in required fields, values out of range, etc.).
  • Conformance — standardise naming conventions, date formats, code lookups, and units of measure.
  • Joins — enrich records by joining with reference/dimension tables (e.g. mapping provider codes to provider names).

Example use case: The raw student records from Bronze are deduplicated on student ID + academic year, date columns are cast to proper DATE types, and provider codes are joined to a reference table to add provider names.
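This Bronze-to-Silver step can be sketched in plain Python. Again this is illustrative only: the `to_silver` function, the reference lookup, and the column names are assumptions standing in for real Silver-layer transformations.

```python
from datetime import date

# Hypothetical reference table mapping provider codes to provider names.
provider_ref = {"P001": "Northfield College"}

def to_silver(bronze_rows):
    # Deduplicate on (student_id, academic_year), cast date strings to
    # proper date types, and join provider names from the reference lookup.
    seen, silver = set(), []
    for row in bronze_rows:
        key = (row["student_id"], row["academic_year"])
        if key in seen:
            continue  # deduplication: keep the first occurrence
        seen.add(key)
        silver.append({
            "student_id": row["student_id"],
            "academic_year": row["academic_year"],
            "enrol_date": date.fromisoformat(row["enrol_date"]),      # typed column
            "provider_code": row["provider_code"],
            "provider_name": provider_ref.get(row["provider_code"]),  # conformance join
        })
    return silver

bronze_rows = [
    {"student_id": "S1", "academic_year": "2023/24",
     "enrol_date": "2023-09-04", "provider_code": "P001"},
    {"student_id": "S1", "academic_year": "2023/24",
     "enrol_date": "2023-09-04", "provider_code": "P001"},  # exact duplicate, dropped
]
silver_rows = to_silver(bronze_rows)
```

In a real Databricks pipeline the same logic would typically be expressed with Spark DataFrame operations or a `MERGE` statement, but the responsibilities of the layer are the same.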


Gold Layer – Business-Ready

| Aspect | Detail |
| --- | --- |
| Purpose | Serve curated, aggregated / derived, business-level datasets |
| Format | Aggregated, or with additional derived fields; modelled and optimised for consumption |
| Typical content | KPI tables, summary statistics, dimensional models, feature stores, analytical layer datasets |
| Schema | Tailored to specific business questions or reporting needs |

Key principles:

  • Business logic lives here — aggregations, calculated metrics, and business rules are applied at this layer.
  • Consumption-optimised — tables are structured for statistical analysis, dashboards, reports, ML models, or APIs.
  • Multiple Gold tables from one Silver table — different teams may create different Gold views of the same underlying data.

Example use case: A Gold table aggregates student completions by provider, qualification level, and academic year — ready to power a dashboard or feed into a statistical publication.
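The aggregation in this example can be sketched in plain Python. The `gold_completions` function and the field names are hypothetical; the point is simply that business logic (here, counting completions) lives in the Gold layer.

```python
from collections import Counter

def gold_completions(silver_rows):
    # Aggregate completion counts by (provider, qualification level,
    # academic year) — a consumption-ready summary table.
    counts = Counter(
        (r["provider_name"], r["qualification_level"], r["academic_year"])
        for r in silver_rows
        if r["completed"]
    )
    return [
        {"provider_name": prov, "qualification_level": lvl,
         "academic_year": yr, "completions": n}
        for (prov, lvl, yr), n in counts.items()
    ]

silver_sample = [
    {"provider_name": "Northfield College", "qualification_level": 3,
     "academic_year": "2023/24", "completed": True},
    {"provider_name": "Northfield College", "qualification_level": 3,
     "academic_year": "2023/24", "completed": True},
    {"provider_name": "Northfield College", "qualification_level": 2,
     "academic_year": "2023/24", "completed": False},  # not completed, excluded
]
gold_summary = gold_completions(silver_sample)
```

Different teams could apply different aggregations to the same Silver table, producing multiple Gold tables from one source.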


Copper Layer – Migrated Analytical Modelling Areas (DfE-Specific)

This layer is specific to the Department for Education and is not part of the standard medallion architecture.

| Aspect | Detail |
| --- | --- |
| Purpose | Host pre-existing analytical modelling areas that have been migrated to Databricks during onboarding |
| Format | As-is from the original source environment; structure and conventions vary |
| Typical content | Legacy analytical models, derived datasets, and reporting tables migrated from previous platforms |
| Schema | Inherited from the original modelling area; not yet conformed to medallion standards |

Why does Copper exist?

When the Department onboarded analytical teams to Databricks, there was a practical need to make existing data and processes available quickly — without waiting for everything to be fully remodelled into the Bronze → Silver → Gold pipeline. The Copper layer serves this transitional purpose.

Key principles:

  • Continuity — analysts can continue working with familiar datasets and models while the migration to Databricks is underway.
  • Transitional by design — Copper is not intended to be a permanent layer. The longer-term Departmental goal is to remodel the data within Copper into the standard medallion architecture.
  • Pragmatic onboarding — getting teams onto the platform quickly provides immediate benefits (collaboration, compute, governance) even before full data remodelling is complete.

What does this mean for analysts?

If you are working with tables in a Copper schema, you are using data that has been migrated from a legacy environment. It will work as expected, but over time these tables will be progressively replaced by properly modelled Bronze → Silver → Gold equivalents.

The first example of this is the Longitudinal ILR, which takes 20+ years of further education learner and aims data and creates ‘Learners’ and ‘Aims’ tables. These are specifically designed to enable easy analysis of the data across time, with standardised, output-ready labels for fields that can be dropped straight into files for Explore education statistics.

When the data you’re working with is transitioned to the medallion architecture, you may need to update your queries to point to the new Gold layer tables. This is a long-term, incremental project, and as analysts you will be heavily involved when your datasets are re-modelled, because of your subject matter expertise.


Putting It All Together

|  | Bronze | Silver | Gold | Copper |
| --- | --- | --- | --- | --- |
| Data quality | Raw / as-is | Cleaned / validated | Business-ready | Varies (as migrated) |
| Audience | Data engineers | Data engineers / analysts | Analysts / stakeholders | Analysts (transitional) |
| Update pattern | Append-only | Merge / upsert | Rebuild / incremental | Inherited from source |
| Longevity | Permanent | Permanent | Permanent | Transitional |

Why use it?

  • Recoverability — if a transformation has a bug, you still have the raw Bronze data to reprocess from.
  • Modularity — each layer has a clear responsibility, making pipelines easier to maintain and debug.
  • Governance — you can apply different access controls at each layer (e.g. restrict Bronze to engineers, open Gold to analysts).
  • Incremental processing — each layer can process only new/changed data rather than reprocessing everything.

In Databricks

Medallion architecture maps naturally onto Delta Lake tables organised into Unity Catalog schemas:

  catalog.bronze.raw_student_records
  catalog.silver.cleaned_student_records
  catalog.gold.student_completions_summary
  catalog.copper.legacy_modelling_area        ← migrated, transitional

This pattern is a guideline, not a rigid rule. Some organisations add additional layers (e.g. a “landing” zone before Bronze, or a “platinum” layer for ML features). The important thing is the principle: progressively refine your data from raw to business-ready.
