RAP FAQs
This guidance is here to help answer common questions about RAP, and to act as a tool for anyone managing analysts who are implementing RAP.
Frequently asked questions
If you have any questions you’d like answered or added here, let us know by emailing statistics.development@education.gov.uk.
What is RAP
General questions about Reproducible Analytical Pipelines and why analysts need to care about them.
What is RAP? Is it just using R?
No. Reproducible Analytical Pipelines (RAPs) are automated statistical and analytical processes. They incorporate elements of software engineering best practice to ensure that the pipelines are reproducible, auditable, efficient, and high quality.
Doing RAP means doing analysis using the best methods available to us, which is an expectation of the Code of Practice for Statistics.
The tools we recommend for RAP in statistics production are SQL, Git, and R. Other suitable alternatives that allow you to meet the core principles can also be used, but you should check this with the Statistics Development Team first. Ideally, any tools used would be open source. Python is a good example of a tool that would also be well suited, though it is less widely used in DfE and has a steeper learning curve than R.
Using R for parts of the process does get you close to a lot of the RAP principles with little effort, which is why we recommend it as one of the tools you should use.
More details and learning resources for the recommended tools can be found in our appropriate tools guidance.
Is this specific to DfE?
No – RAP is a cross-government strategy, and all departments are expected to use this way of working. See the Analysis Function page on RAP for examples of other departments using RAP and the benefits they have seen.
Here is a great blog post from NHS Digital on ‘Why we’re getting our data teams to RAP’.
RAP is also a strategic objective of the Analysis Function strategy for 2022 to 2025:
delivering the Reproducible Analytical Pipelines (RAP) Strategy and action plan to embed RAP across government, ensuring our processes are automated, efficient, and high quality which frees up resource to add value and insight in our analysis.
I’m overwhelmed by all the steps. Is RAP really necessary?
The levels within RAP good practice are in line with what a lot of teams are doing already. Take time to understand what the different steps mean, and don’t panic about the number of them; they’ve intentionally been broken down into the smallest steps that were sensible. Ideally your work should meet as many of the RAP criteria as possible, but even using some RAP principles when you’re just starting out is better than none.
Statistics publications are some of the most important pieces of statistics work that the department does. It’s important we get that data right; RAP can help us automate the production and QA of outputs in a way that gives us the most confidence in the statistics and causes the least burden for you.
Will implementing RAP lead to a disconnect with the data and ‘black box’ processes?
Not at all. When fully implemented and accompanied by the relevant skills, RAP has the opposite effect, as each stage of the process is clearly written as code.
We recognise there is a sizeable skill gap, and that until this is addressed there can be a risk of feeling like code is a black box. This isn’t a reason to avoid RAP though, instead it’s a big reason why we need to push ahead with prioritising upskilling ourselves to implement it. An element of RAP is having documentation for your analytical processes. Having good documentation will help to support your team as they are learning new skills, particularly when you have new starters who are taking on pieces of work for the first time.
Teams vary, and what my team does is different to others. If I’m happy with my approach, can I ignore some of the RAP steps?
No, you should not ignore any of the steps. If you think any of the steps aren’t applicable to you, talk to the Statistics Development Team so we can understand more about the processes in your area. It may well be that you’re meeting it without realising, there’s a misunderstanding, or there’s something more we can add to the guidance.
If you have unique or nuanced processes, RAP helps you document these and make your area more resilient to turnover of people. RAP is all about automating the process and documenting it in a standardised way to make it easily reproducible.
Getting started
Questions about how to get started with RAP.
Can I leave the R stuff to others in my team whilst I focus elsewhere?
No. Anyone undertaking analytical work should now be using RAP processes, and coding is a critical analytical skill. RAP is a cross-government strategy and there is an expectation in all departments for all analysts to move to this way of working.
We all must ensure that analysis is reproducible, transparent, and robust using coding and code management best practices (Analysis Function competency framework).
Implementing RAP takes time to set up initially. How can we prioritise it?
There is a clear expectation that this is the direction for analysis in government; ultimately, it’s part of the new digital skills required for the role of analysts. If we’re not prioritising improving our analysis in line with RAP principles, we’re not approaching our work correctly.
We have support from senior leadership, and this is a department priority, so you should be building in time for it. If you are having difficulties prioritising RAP after talking with your line management chain, please get in touch with the Statistics Development Team.
In the long term, implementing RAP will significantly reduce the time it takes to run your pipeline, and so while it requires time upfront, it creates more time in the future to prioritise other things. It also improves the quality and reproducibility of our work, giving numerous business benefits.
Tools for RAP
Questions about what tools and software to use when applying RAP principles.
Do I need to rewrite existing SQL code into R?
No. In fact, we recommend you don’t necessarily do this!
SQL is a useful tool in its own right, and some processes are better handled with SQL code executed on the database.
What we recommend is that all of your code sits within a single R project and that the SQL scripts are executed via R. This helps with documenting the process, while keeping the SQL code and the benefits of doing heavy processing on the database side. See the final question in this ‘Tools for RAP’ section for more information.
I’m sometimes limited by the access and tools I have (ESFA servers, EDAP, Databricks), is there anything that can be done about this?
The first step is to let us (the Statistics Development Team) know, so we can understand the wider landscape and escalate. There isn’t really a quick fix, but raising awareness is where it starts.
What happens if we can’t reproduce our current processes using R?
This is highly unlikely, and usually comes from a lack of knowledge of what is possible in R. If you’re struggling to reproduce processes, please contact the Statistics Development Team so they’re aware of your processes and can help you implement RAP principles.
R isn’t always the quickest tool for processing data. Can we use SQL instead?
Yes. We recommend different tools for different purposes: SQL should be used for querying source data and large-scale data processing, while R should be used for more complicated data manipulation, QA, and analysis. There’s a grey area in the middle where you can do some things in either tool; sometimes you’ll need to test for yourself which way is faster (e.g., linking datasets or transforming tables).
SQL scripts should be run in an automated way using R, instead of manually running them and copying the outputs. The difference we’re talking about here is whether you process data on the database server or on the machine where you’re running R. A simplistic rule of thumb is to do large-scale processing on the database side (e.g., filtering and aggregating), and then only bring the data you need for manipulation or more complicated processing into R.
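As a rough illustration, the sketch below shows one way you could execute SQL from R using the DBI and odbc packages. The server, database, table, and file names are hypothetical placeholders, not part of our guidance; your own connection details and scripts will differ.

```r
# A minimal sketch of running SQL from R via DBI/odbc.
# Server, database, table, and file names below are hypothetical examples.
library(DBI)
library(odbc)

con <- dbConnect(
  odbc::odbc(),
  Driver = "SQL Server",
  Server = "my_server",      # hypothetical server name
  Database = "my_database",  # hypothetical database name
  Trusted_Connection = "Yes"
)

# Let the database do the filtering and aggregating, and only bring
# the summarised data back into R
pupil_counts <- dbGetQuery(con, "
  SELECT academic_year, region, COUNT(*) AS pupil_count
  FROM dbo.pupil_table
  WHERE academic_year >= 2018
  GROUP BY academic_year, region
")

# Or run an existing SQL script saved in your repository
# (this works for a script containing a single SELECT statement;
# scripts with multiple statements may need splitting up)
query <- paste(readLines("sql/create_underlying_data.sql"), collapse = "\n")
underlying_data <- dbGetQuery(con, query)

dbDisconnect(con)
```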
Implementing RAP
Questions on how to implement RAP.
Having a single script for code doesn’t seem like the best way to do it, so why are you suggesting this?
This is a misconception of our guidance, and we will be clarifying and improving the phrasing. The two steps this refers to are:
- Can the publication be reproduced using a single code script?
- Are data files produced via single code script(s) with integrated QA?
We’re not suggesting all code lives in a single script. These steps in the RAP guidance are there to encourage structuring the repository so that one click of a button can run the entire process, from the source data through to the final output, in order.
This means that you should have one ‘centralised’ run script that calls all of the different scripts in your repository. This single run script then provides a single reference point for where all of your code and scripts live, and in what order they’re executed, which is a fantastic piece of documentation to have!
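For example, a run script might look something like the sketch below. The file names and folder structure are hypothetical; the point is simply that sourcing this one script runs everything in order.

```r
# run.R - a minimal sketch of a single 'run' script
# (file names and folder structure are hypothetical examples)
# Running this one script reproduces the publication from source data to final output.

source("R/00_setup.R")           # load packages, set up database connections and helpers
source("R/01_extract_data.R")    # execute the SQL scripts against the database
source("R/02_process_data.R")    # clean and manipulate the data in R
source("R/03_qa_checks.R")       # automated QA checks on the processed data
source("R/04_create_outputs.R")  # write the underlying data files and summary tables
```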
How do I dual run for QA if our process is in code?
You don’t need to be dual running if your process is automated; it’s not the best way to QA. See our guidance on QA for more details on how you can approach it, or talk to us if you’re unsure.
Should I have separate repositories or branches in Git for each release?
More information on recommended ways of working with Git can be found in our guidance on repositories.
In short, a single repository should be used for all releases of your publication; there’s no need to have multiple repositories, as all of the history is saved within previous commits.
Should we have plain text documentation to accompany the process? Code and comments don’t feel like enough.
Yes, you should have some plain text documentation. This is part of the guidance on RAP, and part of the ‘documentation’ step for ‘good’ practice.
The README in your repository is the place for traditional ‘desk notes’, text explanations of parts of the process, and any context required. We have guidance on writing READMEs on the statistics production website.
Learning about RAP
Questions about learning more about RAP and developing skills.
Can I look at what other people are doing?
Absolutely. Currently the best way to do this is to ask other teams to share code directly with you. If we can all make more progress with using Git, then this will make it much easier to share repositories and have code open by default for interested analysts to browse.
We run a regular RAP Knowledge Share series in which we share good practice and examples of work. We record each session so that you can refer back to them at a later date if you want to. Previous recordings can be found in our RAP knowledge share SharePoint folder. Upcoming sessions are advertised on the DfE analysts network mailing list.
We’d also encourage analysts to make more use of the RAP Community channel in the DfE Analyst Network Teams group to post questions and ask about other people doing similar types of analysis, no question is too big or too small!
I don’t have the skills to implement RAP, how do I get them?
See the top section on the support on offer!
Plus, if you’re ever unsure at all, you can always contact statistics.development@education.gov.uk, who will be able to help you find resources that work for you or provide bespoke training for your team’s needs.