Analytical Data Access (ADA)

Guidance for analysts on how to interact with and use data stored in ADA using Databricks


Important

Please be aware that the Databricks platform is regularly updated and may look different from the guidance included on this site. If you notice any discrepancies between the content on this site and the Databricks platform, please let us know by contacting statistics.development@education.gov.uk.

What is the ADA project?

The Analytical Data Access (ADA) service has been created to support analysts, engineers and policy team members.

It brings together:

  • a searchable data catalogue - the Data Discovery Platform (DDP)
  • a workbench of tools for analysing data - including Databricks
  • a form for requesting access to data
  • a support section - with walkthrough instructions
  • a collection of data reports and dashboards
  • a news and updates area

It is available to anyone who works for the Department for Education, as long as you’re connected to the network.

Tip

There is an Analytical Data Access jargon buster to help you understand different ADA and Databricks terms.

The ADA project is currently in its onboarding phase. The ADA team will work with analyst teams to plan migrations. We encourage teams to engage early so that you have time to migrate your work and receive support from the project team.

Data migration will replace three legacy systems:

  • Pupil Data Repository (PDR)
  • Analysis and Modelling (A&M)
  • Enterprise Data and Analysis Platform (EDAP1)

Decommissioning of legacy servers will be completed in 2026.

See these Databricks support notebooks for more detailed reference guides on how the service has been set up at DfE, including more information on workspaces, environments, notebooks, clusters and catalogs. Please note that you will need a Databricks account to access these.

From the ADA homepage you will be able to find data and access cloud analytical tools. This includes:

  • Navigation to the Data Discovery Platform, where you can find metadata about datasets, information about data quality, and how to request access. The catalogue will eventually include all Department data, and will tag the datasets that are available through ADA.

  • An analysis workbench with access to cloud computing and Databricks. This will be extended to include Posit Workbench, which will allow RStudio to be used in the cloud.

  • A repository of Reports and Dashboards where outputs of data and analytical work will be saved. This will allow colleagues across DfE to interrogate and visualise data. In time this catalogue will grow, and the ADA team is currently developing a formal strategy for this area.


ADA support


The ADA team has established an analyst Community of Practice that:

  • Increases the breadth and depth of knowledge about new tools

  • Builds confidence in using core functionality

  • Shares learning experiences and best practice

  • Identifies future opportunities to support the Department’s Strategic Data Transformation

You can access support in the following places:

  • The ADA user group on Microsoft Teams

  • The ADA website has a list of support resources

  • ADA champions can provide advice and support if you run into any issues. You can find a list of current champions on the ADA intranet page. If you’re passionate about helping others with the migration and using innovative data tools, why not become an ADA champion? To get involved, please contact the ADA team.


What does the ADA project mean for analysts?

Benefits of the ADA project and Databricks


Migration to Databricks offers many potential benefits to analysts. These include:

  • Having all data together in one place and being able to access it via one interface, rather than split across separate areas requiring separate access permissions and pieces of software

  • If your scripts usually take a long time to run, cloud computing can speed up processing and reduce code running time. This is ideal for big data, scripts that require a lot of data transformation (e.g. complicated or large joins), and machine learning projects

  • If you regularly have to run the same code, workflows and code scheduling let you set code to run on certain days or at certain times to improve efficiency. Scheduled workflows will run even if your laptop is switched off

  • Better transparency of work being undertaken in the department by making use of the ADA shared reports area.


Making the most of Databricks


Note

The earlier your data is migrated, the more time you will have to dual run code from your existing methodology against code run on the Databricks platform, and the more time you have to influence the project and its offer for analysts. We encourage all analysts to engage with training and ask the ADA team about any questions or concerns as soon as possible so that you are prepared for any upcoming changes.
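As a rough illustration of what dual running can involve, the sketch below compares the rows produced by an existing process with the rows produced by the same logic on Databricks, ignoring row order. It is a minimal example using only the Python standard library, not an ADA-provided tool: the function name `compare_outputs`, the column names and the example values are all hypothetical, and in practice the rows would come from your own legacy and Databricks outputs.

```python
from collections import Counter


def compare_outputs(legacy_rows, databricks_rows):
    """Compare two outputs as multisets of rows, ignoring row order.

    Each row is a dict of column name -> value (values must be hashable).
    """
    # Turn each row into a hashable, order-independent key and count duplicates.
    legacy = Counter(tuple(sorted(row.items())) for row in legacy_rows)
    new = Counter(tuple(sorted(row.items())) for row in databricks_rows)

    # Counter subtraction keeps only rows with a positive surplus on the left.
    only_legacy = legacy - new
    only_new = new - legacy

    return {
        "match": not only_legacy and not only_new,
        "only_in_legacy": [dict(r) for r in only_legacy.elements()],
        "only_in_databricks": [dict(r) for r in only_new.elements()],
    }


# Hypothetical example data: the second output differs in one value.
legacy_rows = [
    {"la_code": "201", "pupil_count": 1200},
    {"la_code": "202", "pupil_count": 950},
]
databricks_rows = [
    {"la_code": "202", "pupil_count": 950},
    {"la_code": "201", "pupil_count": 1201},
]

result = compare_outputs(legacy_rows, databricks_rows)
print("Outputs match:", result["match"])
print("Only in legacy output:", result["only_in_legacy"])
print("Only in Databricks output:", result["only_in_databricks"])
```

Listing the rows that appear in only one output, rather than reporting a simple pass or fail, makes it easier to trace a mismatch back to a specific step in the migrated code.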

In order to take advantage of the benefits listed above, we recommend that you and your team:

  • Arrange data migration and access to Databricks. The earlier you move, the more time you will have to get familiar with the system and undertake training. Get in touch with the ADA team to discuss onboarding with them.

  • Take part in Databricks training for R and/or SQL. R training is currently offered by the ADA team on a monthly basis and SQL training is being developed. Upcoming training is advertised on the ADA user group Teams channel.

  • Be proactive in determining whether any changes will be necessary for your existing code and set time aside to make these changes - see the guidance in the what this means for existing code section below.

Important

For teams that have invested in RAP principles and already have existing code pipelines, it is possible to take advantage of cloud computing and use data in the Databricks platform with minimal changes. This is covered in more detail in the "what this means for existing code" section on the Databricks Fundamentals page.

