
R can be used both to execute SQL scripts that extract data from a database and to query the database directly. There are three primary ways to do this:

  1. executing a separate SQL script from R
  2. writing strings of SQL code in your R script
  3. using dbplyr to query a database using R code

Which method you use will depend on how comfortable you are with SQL and R, and on whether you already have SQL scripts that you want to execute or are writing new database queries. This vignette focuses on the first method, using the get_clean_sql() function to read in a separate SQL script and execute it from R.

For more information on the other two methods, or on troubleshooting the connection between R and SQL in the Department for Education (DfE), please see the Interacting with a SQL database section of our Analysts’ Guide.

Pre-requisites

To connect to SQL you will need SQL drivers installed. These currently come as standard with the Microsoft SQL Server Management Studio download from the Department for Education’s software center. Make sure you have downloaded the latest version before starting.

Connecting to a database

Usually in the DfE we use a combination of odbc and DBI to connect to our SQL databases.

How you connect will vary depending on whether you’re running R code on your laptop or as part of a deployed R Shiny app. When running code on your laptop, you can use your Windows login (a trusted connection) to access the database, as the package can detect your user details automatically.

# Library calls ====

library(odbc)
library(DBI)

# Database connection ====

con <- DBI::dbConnect(odbc::odbc(),
  Driver = "ODBC Driver 17 for SQL Server",
  Server = "server_name",
  Database = "database_name",
  UID = "",
  PWD = "",
  Trusted_Connection = "Yes"
)

For advice on finding your database details, or connecting to a SQL database from an R Shiny app that is deployed on a server, please contact the Statistics Development Team who will be able to advise on the setup and steps required.

Note that your server name may include backslashes. If it does, you’ll need to write each one as two backslashes in your R code, because the backslash is the escape character in R strings.

For example, in SQL Server Management Studio (SSMS) or Azure Data Studio you might call your server T1PRANMSQL\SQLPROD,60125, but in R when adding the server name you’d need to call it T1PRANMSQL\\SQLPROD,60125.
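The escaping rule can be checked in base R before attempting a connection. The following is a minimal sketch using the example server name above; the variable names are illustrative, and the raw string syntax assumes R 4.0.0 or later.

```r
# Two backslashes in the R source produce one backslash in the actual string
server_name <- "T1PRANMSQL\\SQLPROD,60125"

# cat() shows the string as it is passed on to the database driver
cat(server_name, "\n")

# print() shows the escaped form, so the doubled backslash reappears here
print(server_name)

# Assumption: R 4.0.0 or later, where raw strings avoid the need to escape
server_name_raw <- r"(T1PRANMSQL\SQLPROD,60125)"

# Both ways of writing the name give the same string
identical(server_name, server_name_raw)
```

If cat() shows a single backslash, the value passed to the Server argument matches what you would type in SSMS.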

Reading a SQL script into R

A number of standard characters found in SQL scripts can cause issues when a script is read into R, and we have created the get_clean_sql() function to assist with this. Assuming you have connected to the database and assigned that connection to a con variable, you would then use the following line to read a cleaned version of your SQL script into R.

get_clean_sql() will ignore any lines of code that start with USE to specify a database, as your database should already be specified in your connection setup.

sql_query <- dfeR::get_clean_sql("path_to_sql_file.sql")
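To illustrate the kind of clean-up described above, here is a rough base R sketch of dropping lines that start with USE and collapsing the rest into a single string. This is not the dfeR implementation, which may handle more cases and do so differently; the example script lines and variable names are hypothetical.

```r
# Hypothetical illustration only: this is NOT how dfeR::get_clean_sql() is
# implemented, just a sketch of the idea of ignoring USE lines

# Example SQL script content, as it might come back from readLines()
sql_lines <- c(
  "USE [database_name]",
  "SELECT *",
  "FROM [my_database_table]"
)

# Flag lines that begin with USE, allowing leading whitespace and any case
is_use_line <- grepl("^\\s*USE\\b", sql_lines, ignore.case = TRUE, perl = TRUE)

# Keep the remaining lines and join them into one query string
sql_query_sketch <- paste(sql_lines[!is_use_line], collapse = " ")

print(sql_query_sketch)
```

In practice, prefer get_clean_sql() itself, as it is maintained alongside the rest of the package.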

Executing the SQL query

Now that the SQL query is saved as a variable in the R environment, you can pass it into a function to execute it against the database. There are a number of ways to do this, though a common one is to use dbGetQuery() from the DBI package, setting the statement as your cleaned SQL query.

It’s important to note that dbGetQuery() is intended to work with ‘SELECT’ style queries only. If you’re doing something that isn’t a ‘SELECT’ query, such as writing back into SQL, consider using the dbExecute() or dbSendStatement() functions from the DBI package instead, as the DBI documentation recommends these for data manipulation statements.

sql_query_result <- DBI::dbGetQuery(con, statement = sql_query)

As a side note, if your SQL query is short, you could write it directly in the function call, for example:

sql_query_result <- DBI::dbGetQuery(
  con,
  statement = "SELECT * FROM [my_database_table]"
)
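For statements that are not ‘SELECT’ queries, the following is a minimal sketch of dbExecute() alongside dbGetQuery(). It assumes the RSQLite package is installed and uses an in-memory SQLite database purely as a stand-in so that the example runs without access to a SQL Server; the table and variable names are hypothetical, and SQL Server syntax may differ for more complex statements.

```r
library(DBI)

# Assumption: RSQLite is installed; an in-memory SQLite database stands in
# for the SQL Server connection so that the example can run anywhere
demo_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# dbExecute() runs statements that do not return a result set
DBI::dbExecute(demo_con, "CREATE TABLE demo_table (id INTEGER, label TEXT)")

# It returns the number of rows affected by the statement
rows_inserted <- DBI::dbExecute(
  demo_con,
  "INSERT INTO demo_table (id, label) VALUES (1, 'a'), (2, 'b')"
)

# dbGetQuery() is then used for the SELECT that reads the data back
demo_result <- DBI::dbGetQuery(demo_con, "SELECT * FROM demo_table")

# Close the connection once you have finished with it
DBI::dbDisconnect(demo_con)
```

The same pattern should apply to a connection made with odbc: pass your existing con object in place of demo_con.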

Troubleshooting

If you hit an error, our first advice is to check that your SQL script runs in SQL Server Management Studio (SSMS) or Azure Data Studio and is a valid SQL ‘SELECT’ query that returns a single output.

Assuming that it runs fine in SSMS, the next thing to try is enabling additional settings while cleaning the SQL script. You can do this with the additional_settings argument in the get_clean_sql() function.

sql_query <- dfeR::get_clean_sql(
  "path_to_sql_file.sql",
  additional_settings = TRUE
)

This will add additional settings to the start of your SQL query that are sometimes necessary for the odbc and DBI connection to correctly execute your query.

For further troubleshooting tips, please see the Interacting with a SQL database section of our Analysts’ Guide, or contact the Statistics Development Team for support.