Overview

In this practical you’ll practice “data wrangling” with the dplyr and tidyr packages (part of the `tidyverse` collection of packages).

By the end of this practical you will know how to:

  1. Change column names and select specific columns
  2. Create new columns based on existing ones
  3. Select specific rows of data based on multiple criteria
  4. Group data and calculate summary statistics
  5. Combine multiple data sets through key columns
  6. Convert data between wide and long formats
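
As a rough preview, goals 1 to 4 can be sketched on a small made-up data frame. The `patients` table and its column names below are hypothetical and are not part of the practical's datasets; the code only illustrates the general shape of a dplyr pipeline.

```r
library(tidyverse)

# Hypothetical toy data (NOT the trial_act dataset used in this practical)
patients <- tibble(
  id  = 1:6,
  arm = c("drug", "drug", "placebo", "placebo", "drug", "placebo"),
  wt  = c(70, 82, 65, 90, 77, 58),
  age = c(34, 51, 29, 62, 45, 38)
)

result <- patients %>%
  rename(weight_kg = wt) %>%                # 1. change a column name (new = old)
  select(id, arm, weight_kg, age) %>%       # 1. keep specific columns
  mutate(weight_lb = weight_kg * 2.2) %>%   # 2. new column based on an existing one
  filter(age > 30, weight_kg < 85) %>%      # 3. rows that meet several criteria
  group_by(arm) %>%                         # 4. group the data ...
  summarise(
    n = n(),                                # ... count rows per group
    mean_weight_kg = mean(weight_kg)        # ... and average weight per group
  )

result
```

Each step receives the output of the previous one through the pipe (`%>%`), so the pipeline reads top to bottom in the same order as the goals listed above.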

Datasets

You’ll need the following datasets for this practical:

library(tidyverse)
trial_act <- read_csv("../_data/baselrbootcamp_data/trial_act.csv")
trial_act_demo <- read_csv("../_data/baselrbootcamp_data/trial_act_demo_fake.csv")

| File | Rows | Columns |
|------|------|---------|
| trial_act.csv | 2139 | 27 |
| trial_act_demo_fake.csv | 2139 | 3 |

Packages

| Package | Installation |
|---------|--------------|
| tidyverse | `install.packages("tidyverse")` |

Glossary

| Function | Package | Description |
|----------|---------|-------------|
| `rename()` | dplyr | Rename columns |
| `select()` | dplyr | Select columns based on name or index |
| `filter()` | dplyr | Select rows based on some logical criteria |
| `arrange()` | dplyr | Sort rows |
| `mutate()` | dplyr | Add new columns |
| `case_when()` | dplyr | Recode values of a column |
| `group_by()`, `summarise()` | dplyr | Group data and then calculate summary statistics |
| `left_join()` | dplyr | Combine multiple data sets using a key column |
| `spread()` | tidyr | Convert long data to wide format (from rows to columns) |
| `gather()` | tidyr | Convert wide data to long format (from columns to rows) |
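
The joining, recoding, and reshaping functions in the glossary are often the least intuitive, so here is a minimal sketch of how they might fit together. The `scores` and `demo` tables and all of their column names are invented for illustration and do not come from the practical's data.

```r
library(tidyverse)

# Hypothetical toy tables that share the key column `id`
scores <- tibble(id = c(1, 2, 3), t1 = c(10, 20, 30), t2 = c(12, 18, 35))
demo   <- tibble(id = c(1, 2, 3), age = c(25, 40, 61))

# left_join(): add the columns of `demo` to `scores`, matching rows on `id`
combined <- left_join(scores, demo, by = "id")

# case_when(): recode a numeric column into labelled categories
combined <- combined %>%
  mutate(age_group = case_when(
    age < 30 ~ "young",
    age < 60 ~ "middle",
    TRUE     ~ "older"   # fallback for every remaining row
  ))

# gather(): wide to long, giving one row per id and time point
long <- combined %>%
  gather(key = "time", value = "score", t1, t2)

# spread(): long back to wide, giving one column per time point
wide <- long %>%
  spread(key = time, value = score)

# arrange(): sort the long data by score, highest first
arrange(long, desc(score))
```

In tidyr 1.0.0 and later, `pivot_longer()` and `pivot_wider()` are the recommended replacements for `gather()` and `spread()`, although the older functions still work.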

Cheatsheet