RAP for general analysis
Guidance for how to implement the principles of Reproducible Analytical Pipelines (RAP) into general analysis processes
This page sets out the RAP for general analysis principles. For Official Statistics, we have the RAP for statistics guidance which covers how RAP can be applied to Official Statistics production. The same 15 core principles apply across both RAP for statistics and RAP for general analysis, with four additional principles specific to statistics production.
If you’re unfamiliar with RAP, or want a refresher on what RAP is and why it matters, take a look at our getting started with RAP page. It covers the background and benefits of RAP, as well as an introduction to the RAP principles.
Applying RAP principles proportionally
RAP should be proportional to the analysis in the same way that QA is proportional.
Applying RAP principles proportionately means tailoring your approach to the scale, complexity and potential impact of the analysis, just as you would with QA. Not every piece of work will require a fully automated pipeline, but all analysis should aim to be reproducible where possible.
When deciding what RAP principles to apply, consider factors like who will use the analysis, how it will be used, the level of public or ministerial interest and the potential risks involved. Use the guidance below as a starting point, but apply judgement based on the specific context of your work, as you would when determining proportionate QA.
For example, whilst you might not create a full RAP process for an ad hoc piece of work, you could still version control your code so that it could be reused if similar requests came in, and you should get your code peer reviewed by someone before sending out any results.
Getting started with RAP for analysis
To get started with RAP for analysis, explore the RAP principles in the diagram below, which outlines what good, great and best practice looks like.
Once you are familiar with the principles, take a look at the different types of analysis commonly carried out across the department. This will help you understand which RAP principles are recommended for your type of analysis.
Implementing RAP may involve combining Databricks, R and clear, consistent version control to increase the efficiency and accuracy of our work. For more information on what these tools are, why we are using them, and resources to help you upskill in those areas, see our learning resources page.
Core principles
RAP has three core principles:
Preparing data: Data sources for a publication are stored in the same database
Writing code: Underlying data files are produced using code, with no manual steps
Version control: Files and scripts should be appropriately version controlled
Within each of these principles are separate elements of RAP. Each of these is discussed in detail below so that you know what is expected of you as an analyst.
RAP for general analysis principles
The diagram below highlights what RAP means for us, and the varying levels at which it can be applied across all types of analysis. You can click on each of the hexagons in the diagram to learn more about each of the RAP principles and how to use them in practice.
Choosing the right RAP principles for your analysis
Whilst all RAP principles can be applied to any type of analysis, we recognise that different types of analysis conducted across the department may benefit from a tailored approach. To support this, we’ve outlined a selection of common analysis types below, along with the RAP principles we recommend for each. This guidance is intended to help teams prioritise their efforts and embed RAP in a way that is both practical and impactful.
Analysis that is performed outside of regular reporting processes, often to answer a specific need or question, but is unlikely to be requested again in any form.
We recommend the following principles for this type of analysis. Analysts should also consider the full set of principles for analysis and apply any others that are appropriate, useful, and proportionate to their project.
Recommended principles
- Source data is acquired and stored sensibly
- Sensible folder and file structure
- Processing is done with code, where possible
- Use appropriate tools e.g., R, SQL, Python
- Documentation
- Peer review of code
- Clean final code
Analysis that is performed outside of regular reporting processes, often to answer a specific need or question and is likely to be requested again in the same or a similar form.
We recommend the following principles for this type of analysis. Analysts should also consider the full set of principles for analysis and apply any others that are appropriate, useful, and proportionate to their project.
You can see an example case study of RAP principles being applied to this type of analysis on the RAP case studies page.
Recommended principles
- Source data is acquired and stored sensibly
- Sensible folder and file structure
- Processing is done with code, where possible
- Use appropriate tools e.g., R, SQL, Python
- Automated high-level checks
- Documentation
- Recyclable code for future use
- Peer review of code
- Project specific automated sense checks
- Version controlled final code scripts
- Clean final code
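As an illustration of the difference between automated high-level checks and project specific automated sense checks, here is a minimal Python sketch. The column names (`region`, `rate`) and the rule that a rate must lie between 0 and 100 are assumptions made for the example, not departmental standards, and the same idea could equally be written in R or SQL.

```python
from typing import Dict, List

Row = Dict[str, object]


def high_level_checks(rows: List[Row], required: List[str]) -> List[str]:
    """Generic checks that apply to almost any dataset."""
    problems = []
    if not rows:
        problems.append("dataset is empty")
    for i, row in enumerate(rows):
        for column in required:
            # Treat both absent values and empty strings as missing.
            if row.get(column) in (None, ""):
                problems.append(f"row {i}: missing value in '{column}'")
    return problems


def sense_checks(rows: List[Row]) -> List[str]:
    """Project specific check: a hypothetical rule that rates are percentages."""
    problems = []
    for i, row in enumerate(rows):
        rate = row.get("rate")
        if isinstance(rate, (int, float)) and not 0 <= rate <= 100:
            problems.append(f"row {i}: rate {rate} is outside 0 to 100")
    return problems


# Made-up example data containing one missing value and one implausible rate.
example = [
    {"region": "North", "rate": 54.2},
    {"region": "", "rate": 130.0},
]
issues = high_level_checks(example, ["region", "rate"]) + sense_checks(example)
for issue in issues:
    print(issue)
```

Returning a list of problems, rather than stopping at the first failure, lets a reviewer see every issue in one run.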
Recurring analysis produced within the department to support ongoing operational needs. It often follows a consistent format and schedule, and is used to monitor key internal activities.
We recommend the following principles for this type of analysis. Analysts should also consider the full set of principles for analysis and apply any others that are appropriate, useful, and proportionate to their project.
You can see an example case study of RAP principles being applied to this type of analysis on the RAP case studies page.
Recommended principles
- Source data is acquired and stored sensibly
- Sensible folder and file structure
- Processing is done with code, where possible
- Use appropriate tools e.g., R, SQL, Python
- Automated high-level checks
- Documentation
- Recyclable code for future use
- Peer review of code
- Project specific automated sense checks
- Version controlled final code scripts
- Whole pipeline can be run from a single script or workflow
- Clean final code
- Automated reproducible reports
- Collaboratively develop code using Git
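One way to picture "whole pipeline can be run from a single script or workflow" is a single entry point that calls each stage in order. The sketch below is illustrative only: the function names, the `region` and `count` columns and the in-memory sample data are all assumptions, and a real pipeline would read from the team's agreed data source rather than a string.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_data(source: TextIO) -> List[Dict[str, str]]:
    """Read the source data; no manual copy and paste steps."""
    return list(csv.DictReader(source))


def process(rows: List[Dict[str, str]]) -> Dict[str, int]:
    """Aggregate counts by region."""
    totals: Dict[str, int] = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + int(row["count"])
    return totals


def check(rows: List[Dict[str, str]], totals: Dict[str, int]) -> None:
    """High-level check: the aggregated total must match the input total."""
    if sum(totals.values()) != sum(int(row["count"]) for row in rows):
        raise ValueError("aggregated total does not match input total")


def write_report(totals: Dict[str, int]) -> str:
    """Produce the output table as CSV text."""
    lines = ["region,total"]
    lines += [f"{region},{total}" for region, total in sorted(totals.items())]
    return "\n".join(lines)


def run_pipeline(source: TextIO) -> str:
    """Single entry point: load, process, check, report."""
    rows = load_data(source)
    totals = process(rows)
    check(rows, totals)
    return write_report(totals)


if __name__ == "__main__":
    sample = io.StringIO("region,count\nNorth,3\nSouth,5\nNorth,2\n")
    print(run_pipeline(sample))
```

Because every stage is called from `run_pipeline`, rerunning the analysis with refreshed data means rerunning one script rather than repeating a sequence of manual steps.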
Analysis which must be quickly conducted and is often not pre-planned, aimed at delivering timely insights to support fast decision making.
We recommend the following principles for this type of analysis. Analysts should also consider the full set of principles for analysis and apply any others that are appropriate, useful, and proportionate to their project.
You can see an example case study of RAP principles being applied to this type of analysis on the RAP case studies page.
Recommended principles
- Source data is acquired and stored sensibly
- Sensible folder and file structure
- Processing is done with code, where possible
- Use appropriate tools e.g., R, SQL, Python
- Documentation
- Peer review of code
This refers to the use of a consistent analytical approach applied to multiple datasets. Although the method remains the same, the data inputs vary e.g., by time period, breakdown or variable.
We recommend the following principles for this type of analysis. Analysts should also consider the full set of principles for analysis and apply any others that are appropriate, useful, and proportionate to their project.
Recommended principles
- Source data is acquired and stored sensibly
- Sensible folder and file structure
- Processing is done with code, where possible
- Use appropriate tools e.g., R, SQL, Python
- Automated high-level checks
- Documentation
- Recyclable code for future use
- Peer review of code
- Project specific automated sense checks
- Version controlled final code scripts
- Whole pipeline can be run from a single script or workflow
- Clean final code
- Collaboratively develop code using Git
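Where the method stays the same and only the inputs change, recyclable code usually means writing the method once as a function and applying it to each dataset. The following is a rough sketch under that assumption; the `summarise` function, the summary statistics chosen and the figures for each time period are all invented for illustration.

```python
from statistics import mean
from typing import Dict, List


def summarise(values: List[float]) -> Dict[str, float]:
    """The single analytical method, written once and reused for every input."""
    return {"n": len(values), "mean": round(mean(values), 2), "max": max(values)}


# Hypothetical inputs: the same measure for two different time periods.
datasets = {
    "2022": [10.0, 12.0, 14.0],
    "2023": [11.0, 13.0, 18.0],
}

# Apply the identical method to every dataset; adding a new period
# only requires adding a new entry to `datasets`.
results = {period: summarise(values) for period, values in datasets.items()}
for period, summary in results.items():
    print(period, summary)
```

Keeping the method in one function means a correction or improvement is made in a single place and automatically applies to every breakdown.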
The collection and routine checking of data as it is coming into the department is an area that RAP can be applied to. However, the levels of control in this area vary drastically from team to team. If you would like advice and help to automate any particular processes, feel free to contact the Statistics Development Team.
We recommend the following principles for this type of analysis. Analysts should also consider the full set of principles for analysis and apply any others that are appropriate, useful, and proportionate to their project.
Recommended principles
- Source data is acquired and stored sensibly
- Sensible folder and file structure
- Processing is done with code, where possible
- Use appropriate tools e.g., R, SQL, Python
- Automated high-level checks
- Documentation
- Recyclable code for future use
- Peer review of code
- Version controlled final code scripts
- Whole pipeline can be run from a single script or workflow
- Clean final code
- Collaboratively develop code using Git
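For routine checking of incoming data, automated checks often start with the structure of the file before looking at its contents. The sketch below is one possible shape for such a check, not a prescribed approach: the expected columns (`provider_id`, `date`, `headcount`) and the rules applied are hypothetical and would need replacing with the agreed specification for the collection.

```python
import csv
import io
from typing import List, TextIO

# Hypothetical schema for an incoming file.
EXPECTED_COLUMNS = ["provider_id", "date", "headcount"]


def validate_incoming(source: TextIO) -> List[str]:
    """Return a list of problems found in an incoming CSV file."""
    reader = csv.DictReader(source)
    if reader.fieldnames != EXPECTED_COLUMNS:
        # Stop early: row-level checks are unreliable if the columns are wrong.
        return [f"unexpected columns: {reader.fieldnames}"]

    problems = []
    seen = set()
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        key = (row["provider_id"], row["date"])
        if key in seen:
            problems.append(f"line {line_number}: duplicate record {key}")
        seen.add(key)

        headcount = row["headcount"] or ""
        if not headcount.isdigit():
            problems.append(
                f"line {line_number}: headcount '{headcount}' is not a whole number"
            )
    return problems


sample = io.StringIO(
    "provider_id,date,headcount\n"
    "A1,2024-01-01,30\n"
    "A1,2024-01-01,30\n"
    "B2,2024-01-01,n/a\n"
)
for problem in validate_incoming(sample):
    print(problem)
```

Running the same checks on every delivery gives data providers consistent feedback and removes the need to eyeball each file by hand.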
Apply the principles
Now that you’ve prioritised the area of your workflow to focus on, the next step is to apply the relevant RAP principles to that part of your process.
The next section is split by the three core principles: preparing data, writing code and version control.
Within each of these principles are separate elements of RAP. Each of these is discussed in detail so that you know what is expected of you as an analyst.
How does RAP fit into ADA / Databricks?
Teams can use data held on the Databricks platform to implement RAP principles like version control, automated QA and automated pipelines. If you have an existing pipeline, please read the ADA guidance on statistics publications to find out how migrating to ADA and Databricks will affect your work.
RAP principles adaptable to spreadsheets
Spreadsheets can incorporate several RAP-aligned practices:
Source data is acquired and stored sensibly: Ensure input data is stored securely and referenced clearly, ideally in a separate tab or file.
Sensible folder and file structure: Keep raw data, calculations, and outputs clearly separated and consistently organised.
Automated high-level checks: Include high-level checks such as totals, flags, or conditional formatting to catch errors early.
Documentation: Use cell comments, notes and separate README files to explain the purpose, logic, and structure of the spreadsheet.
Peer review: Ask a colleague to review the spreadsheet’s logic, formulas, and outputs, especially for repeated or high-impact analysis.
Supporting the transition to RAP
For repeatable, cross-departmental, or high-risk analysis, spreadsheets should ideally be replaced with code-based RAP workflows. Hybrid approaches such as using code to generate inputs or to perform calculations further into the process can be a useful stepping stone.