Chapter 12 R Markdown
We now know how to do a range of data analysis, as well as how to produce lots of different visualisations. Now we need to know how to compile all of that into a report, and how to produce that report automatically. This has numerous benefits over the traditional approach of writing up analysis in Excel and Word:
- Faster, once the template is written
- More robust - errors are less likely to occur, particularly when copying from Excel to Word
- Can be updated immediately if/when data changes
There are two elements to R Markdown: the text (which can be formatted in a number of ways) and the code (which can be displayed and included in different ways). We’ll look at both of these, but first, we need to open an R Markdown file:
- Go to File
- Go to New File
- Click on R Markdown
- A new tab will open up in the script pane - have a look at it as it contains an example R Markdown template
- In the top right-hand corner of the script pane, click on the arrow next to Knit and click on ‘Knit to HTML’
- Save the R Markdown file in the ‘2_code’ folder
- The code will run and the output will pop up upon completion
- Compare this to the input file
- The output file will have been saved in the ‘2_code’ folder - it shouldn’t be saved here, but we’ll sort that out later
This is how you generate an R Markdown document. When you make changes to it from now on, you won’t have to specify the name and save location every time you knit it.
Tip: HTML is the preferable output - it is more versatile in design and can include interactive elements. However, if you want to, you can also output to Word, and if you have additional software installed (called LaTeX and pronounced ‘lay-tec’) you can output straight to PDF.
12.1 Text
We can see from the template above that there is a range of ways of formatting text so that it looks good in a report.
First of all, headings. Headings are created by starting a line with one or more hashes (#) followed by a space - one hash gives the largest heading, and six hashes give the smallest. These can be used to create a series of sub-headings in documents.
# Heading 1
## Heading 2
### Heading 3
#### Heading 4
Activity A10.3: Add some headings and change the sizes of some of the headings in the template. Run the R Markdown document and see how they output.
Another useful formatting feature is lists. These can either be ‘ordered’ (1, 2, 3 or a, b, c) or ‘unordered’ (bullet points). The examples below show how to produce ordered and unordered lists, but there are three important things to remember with lists:
- There must be a blank line between the end of the prose above the list and the list itself
- There must be a space between the character that denotes the list item and the prose
- There must be a blank line between the end of the list and the prose below
An unordered list, using asterisks:
* Thing 1
* Thing 2
* Thing 3
An unordered list, using hyphens:
- Thing 1
- Thing 2
- Thing 3
An ordered list, using numbers:
1. Thing 1
2. Thing 2
3. Thing 3
An ordered list, using letters:
a. Thing 1
b. Thing 2
c. Thing 3
The final formatting feature to demonstrate in this chapter is hyperlinks. To create a hyperlink, the text to be made into a link is enclosed in square brackets, and the address of the link comes immediately afterwards, enclosed in round brackets:
A [link](http://www.yourlinkhere.com)
A link
Tip: There is loads more functionality in R Markdown for different formatting - just Google it!
12.2 Code
However, the text edits we can make to R Markdown documents are only half of the story - the beauty of it is that you can merge text and the outputs of code. So how do we include code?
Code in R Markdown is written in ‘chunks’. Chunks are written in the form below and appear in an R Markdown file with a light grey background:
```{r}
print("hello world")
```
Let’s break that down:
- 3 grave accents signify the start of the code chunk
- {} signify the metadata that goes with the chunk - more on this in a minute
- Inside the curly brackets, the r signifies the programming language we’re going to be writing in
- We’ve then got some code - as many lines as you want with whatever outputs you want
- 3 grave accents close the code chunk
Here’s what it looks like in an output:
[1] "hello world"
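You can also knit a snippet like this straight from the console, which is a handy way to experiment. Below is a minimal sketch, assuming the knitr package is installed (it is installed alongside rmarkdown): knitr::knit() with the text argument takes the lines of an R Markdown snippet and returns the knitted markdown as text, rather than writing a file.

```r
library(knitr)

# The chunk from above, written as a character vector with one element per line
rmd_lines <- c(
  "```{r}",
  "print(\"hello world\")",
  "```"
)

# knit() runs the chunk and returns the resulting markdown as text
md <- knit(text = rmd_lines, quiet = TRUE)

# Show the markdown: the code is echoed, then the printed result follows
# on a line starting with ## (knitr's default prefix for output)
cat(md, sep = "\n")
```

The knitted markdown is what R Markdown then turns into HTML when you click Knit.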
Activity: Write this out and run it in your R Markdown documents.
Sometimes we’ll want to include the code along with its outputs, sometimes only the outputs, and sometimes only the code. We can control this using effect messages (the official R Markdown name for these is ‘chunk options’). Effect messages appear after the r within the curly brackets:
```{r, effect_message_here}
# your code here
```
With an effect message you specify whether that effect is TRUE or FALSE, for example echo = FALSE, and you can include several, separated by commas. The list below shows the effect messages that are likely to be useful - the default for all of them is TRUE.
- eval = FALSE: Prints the code but doesn’t run it, so there are no results
- warning = FALSE: Hides any warnings that come with the code
- echo = FALSE: Hides the code but prints the results
- include = FALSE: Runs the code but hides both the code and its results (good for setup sections which load in libraries and data)
- message = FALSE: Hides any messages the code produces, such as the startup messages printed when a package is loaded (this is the least common of this set)
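To see the difference between these options without re-knitting a whole document, here is a small sketch that knits the same chunk with different effect messages from the console, assuming knitr is installed. The helper functions chunk_with() and knit_lines() are hypothetical names made up for this example. The last part shows that include = FALSE still runs the code: a later chunk can use the object it created.

```r
library(knitr)

# Hypothetical helper: a chunk that creates x and prints it, with whatever
# effect messages are passed in, e.g. "echo = FALSE"
chunk_with <- function(options) {
  c(paste0("```{r, ", options, "}"),
    "x <- 40 + 2",
    "x",
    "```")
}

# Hypothetical helper: knit some R Markdown lines and return a single string
knit_lines <- function(lines) {
  paste(knit(text = lines, quiet = TRUE), collapse = "\n")
}

code_only   <- knit_lines(chunk_with("eval = FALSE"))     # code shown, never run
result_only <- knit_lines(chunk_with("echo = FALSE"))     # result shown, code hidden
nothing     <- knit_lines(chunk_with("include = FALSE"))  # runs, but nothing shown

# include = FALSE still runs the code, so a later chunk can use x
setup_then_use <- knit_lines(c(chunk_with("include = FALSE"),
                               "",
                               "```{r}",
                               "x * 2",
                               "```"))
cat(setup_then_use)
```

Printing code_only, result_only and nothing in turn is a quick way to check which effect messages you need before using them in a report.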
Tip: After the r in a code chunk, include a name for that chunk, for example {r load-data} - it makes navigating easier, which can be done by clicking in the bottom left-hand corner of the script pane. You can’t have multiple chunks with the same name, though.
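Here is a small sketch of named chunks, again knitting from the console with knitr (the chunk names load-data and summarise-data, and the numbers, are made up for illustration). The second attempt re-uses a name, and knitr stops with an error instead of producing any output.

```r
library(knitr)

# Two chunks with different names knit without any problems
named_ok <- c(
  "```{r load-data}",
  "scores <- c(12, 15, 9)",
  "```",
  "",
  "```{r summarise-data}",
  "mean(scores)",
  "```"
)
ok <- knit(text = named_ok, quiet = TRUE)

# Renaming the second chunk to load-data gives two chunks with the same name,
# which knitr refuses to knit
named_clash <- sub("summarise-data", "load-data", named_ok, fixed = TRUE)
clash <- tryCatch(knit(text = named_clash, quiet = TRUE),
                  error = function(e) e)
conditionMessage(clash)
```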
Activity: Run the code below five times. The first four times, include one of eval, warning, echo and include (each set to FALSE) in the top of the code chunk, and the fifth time include whatever combination of effect messages you would need to output this graph for a customer who was only interested in the outputs.
The final useful code chunk technique is ‘inline code’. Often we’ll want to calculate values from the data and report them in the text. Traditionally, if we were using Word and Excel, we’d calculate this in Excel, remember the number, and then type it up in Word - think of the things that could go wrong with that! We could get the number wrong when trying to remember it or we could type it wrong. R Markdown allows you to embed values in the text, so that it’s directly calculated from the data.
To include an inline piece of code, write this:
`r object_name`
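A minimal sketch of inline code, using made-up numbers rather than the census data: the value is calculated in a hidden setup chunk and then dropped into the sentence when the document is knitted. The object names ta_fte and ave_ta_fte are hypothetical. For values with long decimals, wrapping the object in round(), e.g. `r round(ave_ta_fte, 1)`, keeps the sentence readable.

```r
library(knitr)

# Made-up numbers standing in for a column of the census data
rmd_lines <- c(
  "```{r, include = FALSE}",
  "ta_fte <- c(4, 6, 8)",
  "ave_ta_fte <- mean(ta_fte)",
  "```",
  "",
  "The average number of FTE teaching assistants is `r ave_ta_fte`."
)

# When knitted, the inline code is replaced by the value from the chunk
md <- knit(text = rmd_lines, quiet = TRUE)
cat(md, sep = "\n")
```

If the numbers in ta_fte change, the sentence changes with them the next time the document is knitted, with no retyping.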
Activity: In your R Markdown file, write a code chunk that calculates the average total number of FTE Teaching Assistants (Tot_TAs_FTE) and call it ave_tot_tas_fte. Then write a sentence that states what the average total FTE of Teaching Assistants in each school is. Run the code and look at the output.
12.3 Multiple Reports
Remember we wrote an lapply function to produce a graph for each different region? We can do the same with R Markdown to produce the same report for different entities, and whilst it takes a few more steps than the graphs did, the premise is the same. We’ll need two files:
- The first is the actual R Markdown file, with a variable in it which will allow us to filter data to produce it for specific entities
- The second is an R script, which will iterate over the categories and produce an R Markdown report for each of them.
We’ll produce the R Markdown first.
Activity: Open a new R Markdown and do the following:
- Remove the template in there currently
- Save the R Markdown file in the ‘2_code’ folder, and call it ‘region_factsheet’
- Create a heading which has inline code with the variable region_name (this doesn’t currently exist - it is the object that will change on each iteration)
- Write a code chunk which loads in the tidyverse library and the School Workforce Census data - the code and outputs should be hidden (when you’re loading the data you’ll need data/swfc_2016_machine_readable.csv, because the R Markdown file is saved in the 2_code folder, so the says ‘go back up one step’)
- Write another code chunk that creates a numeric object (use as.numeric()) which is the number of schools in the region (you might have to trial this with an actual region before replacing it with region_name)
- Write a sentence which states the number of schools within the region, using region_name and the object from the previous step as inline code within the sentence
- Write the code to plot the distribution of the school size within that region, using the code below
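A sketch of the counting step from the list above, using a tiny made-up data frame in place of the School Workforce Census. It assumes each row is one school and only includes the region column. It uses base R so that it runs on its own; with the tidyverse loaded, swfc %>% filter(Government_Office_Region_Name == region_name) %>% nrow() does the same job.

```r
# Hypothetical stand-in for the School Workforce Census: one row per school,
# with only the region column included
swfc <- data.frame(
  Government_Office_Region_Name = c("London", "London", "North East", "London"),
  stringsAsFactors = FALSE
)

# In the factsheet, region_name is supplied by the driver script; while
# trialling the chunk you can set it to a real region by hand
region_name <- "London"

# Keep only the rows (schools) in this region, then count them
region_schools <- swfc[swfc$Government_Office_Region_Name == region_name, , drop = FALSE]
n_schools <- as.numeric(nrow(region_schools))
n_schools
```

Once this works for one region, delete the hand-written region_name line so that the value comes from the R script instead.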
Next, we’ll produce the R script which will create each of the factsheets.
Activity: Open a new R script and do the following:
- Load the tidyverse and rmarkdown libraries
- Load in the School Workforce Census
- Create an object called regions which contains all the different values (levels) of Government_Office_Region_Name
- Create an lapply:
  - The object whose elements the function is applied to is regions
  - The argument name in function() is region_name - this is the thing we want to change each time in the R Markdown file (look back at the R Markdown file you’ve just created if need be)
  - The function we’re going to apply is the render() function from the rmarkdown package, using the code below
render('2_code/region_factsheet.Rmd',
       #This is the name of the R Markdown file we want to produce an output from
       output_file = paste0(region_name, ".html"),
       #This is what the name of the output file will be - the region name and .html
       output_dir = 'outputs')
       #They will be stored in the outputs directory
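Putting the script together, here is a self-contained sketch of the whole loop. It makes some assumptions so that it runs anywhere: the data is a tiny made-up stand-in for the census, the template is a cut-down factsheet written to a temporary folder, and it calls knitr::knit() instead of render(). knit() only produces a markdown (.md) file, so it doesn’t need pandoc (the tool render() uses to build the HTML); in your own script keep render() as shown above. Both functions evaluate the code in the environment they were called from, which is why the region_name that lapply passes in reaches the R Markdown file.

```r
library(knitr)

# Made-up stand-in for the census data (the real script loads the CSV)
swfc <- data.frame(
  Government_Office_Region_Name = c("London", "London", "North East", "South West"),
  stringsAsFactors = FALSE
)

# One entry per distinct region
regions <- unique(swfc$Government_Office_Region_Name)

# A cut-down factsheet template, written to a temporary folder so the
# sketch is self-contained (in the course this is 2_code/region_factsheet.Rmd)
template <- file.path(tempdir(), "region_factsheet_demo.Rmd")
writeLines(c(
  "# Factsheet for `r region_name`",
  "",
  "```{r, include = FALSE}",
  "n_schools <- sum(swfc$Government_Office_Region_Name == region_name)",
  "```",
  "",
  "There are `r n_schools` schools in `r region_name`."
), template)

out_dir <- file.path(tempdir(), "outputs")
dir.create(out_dir, showWarnings = FALSE)

# For each region, knit the template with region_name set to that region;
# knit() evaluates the chunks in the environment it was called from (this
# function's environment), just as render() does by default
outputs <- lapply(regions, function(region_name) {
  knit(template,
       output = file.path(out_dir, paste0(region_name, ".md")),
       quiet = TRUE)
})
unlist(outputs)
```

Each region gets its own file named after it, exactly as output_file = paste0(region_name, ".html") does in the render() version.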
RUN IT!
And that’s it - hopefully by completing this course you’ve got a good introduction to the power of R!