class: center, middle, inverse, title-slide # Data I/O and Wrangling ### The R Bootcamp @ Erfurt
www.therbootcamp.com
@therbootcamp
### June 2018 --- layout: true <div class="my-footer"><span> <a href="https://therbootcamp.github.io/"><font color="#7E7E7E">Erfurt, June 2018</font></a>                                           <a href="https://therbootcamp.github.io/"><font color="#646464">www.therbootcamp.com</font></a> </span></div> --- .pull-left45[ ## Data input - Paths <font size = 4>When using R projects, paths are always relative to the location of your project folder</font></br> <font size = 4>If all of your data files are in your project folder (or a subfolder), the path will be easy to find</font></br> <img src="https://raw.githubusercontent.com/therbootcamp/Erfurt_2018June/master/_sessions/_image/example_folder_structure_annotated.png" width="100%" /> <font size = 4>Note: You <i>can</i> load files from other locations on your computer, but you shouldn't (for reproducibility reasons)</font></br> ] .pull-right5[ ### Example) ```r # Create survey dataframe from # survey.csv (Comma-separated csv) survey <- read_csv(file = "data/survey.csv") # Create study1 dataframe from # study1.sav (SPSS file) survey <- read_spss(file = "data/study1.sav") ``` Tip! Press the tab key to view a list of all files in the directory! <img src="https://raw.githubusercontent.com/therbootcamp/Erfurt_2018June/master/_sessions/_image/tab_completion.jpg" width="80%" style="display: block; margin: auto;" /> Look at help menus for additional arguments ```r # Look at help menu for read_csv function ?read_csv ``` ] --- ## Data input - Web <font size = 4>You can easily load files directly from the web by specifying the URL as the path</font></br> ```r # Save file URL as fires_url (it's pretty long) fires_url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/forest-fires/forestfires.csv" # Load as new dataframe called fires fires <- read_csv(file = fires_url) # Look at result! print(fires) ``` ``` ## # A tibble: 517 x 13 ## X Y month day FFMC DMC DC ISI temp RH wind rain area ## <int> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl> ## 1 7 5 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0 0 ## 2 7 4 oct tue 90.6 35.4 669. 6.7 18 33 0.9 0 0 ## 3 7 4 oct sat 90.6 43.7 687. 6.7 14.6 33 1.3 0 0 ## 4 8 6 mar fri 91.7 33.3 77.5 9 8.3 97 4 0.2 0 ## 5 8 6 mar sun 89.3 51.3 102. 9.6 11.4 99 1.8 0 0 ## 6 8 6 aug sun 92.3 85.3 488 14.7 22.2 29 5.4 0 0 ## 7 8 6 aug mon 92.3 88.9 496. 8.5 24.1 27 3.1 0 0 ## 8 8 6 aug mon 91.5 145. 608. 10.7 8 86 2.2 0 0 ## 9 8 6 sep tue 91 130. 693. 7 13.1 63 5.4 0 0 ## 10 7 5 sep sat 92.5 88 699. 7.1 22.8 40 4 0 0 ## # ... with 507 more rows ``` --- # Where you're at... .pull-left4[ 0) Loaded packages (like tidyverse!) with `library()`<br> 1) Created a new dataframe (df) from external files using a data input function such as `df <- read_csv()`<br> 2) You can explore the dataframe with functions like `View()`, `dim()` and `names()` 3) You can explore the dataframe and calculate descriptive statistics on specific columns using `FUN(df$NAME)`<br> <br><br><br> What's next?... **Wrangling!** ] .pull-right55[ ```r # Step 0) Load libraries library(tidyverse) # Step 1) Read file called baslers.txt # in a data folder with read_csv() # and save as new object baslers baslers <- read_csv(file = "data/baslers.txt") # Step 2) Explore data View(baslers) # Open in new window dim(baslers) # Show number of rows and columns names(baslers) # Show names # Step 3) Calculate descriptives on named colums mean(baslers$age) # What is the mean age? table(baslers$sex) # How many of each sex? # Step 4) ... ``` ] --- .pull-left5[ # What is Wrangling? Transform - Adding new columns - Combining columns - Splitting columns Organise - Moving data between columns and rows - Merging several dataframes - Sorting data by columns Aggregation - Aggregate data according to variables - Summarizing data across groups ] .pull-right5[ <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/organise_transform_aggregate.png" width="100%" style="display: block; margin: auto;" /> ] --- .pull-left45[ <br><br> <font size = 6>To wrangle data in R, we will use the dplyr and tidyr packages</font> ```r # Load packages individually # install.packages('dplyr') # install.packages('tidyr') library(dplyr) library(tidyr) # Or just use the tidyverse! # install.packages('tidyverse') library(tidyverse) ``` ] .pull-right5[ <br> <br><br> <img src="https://raw.githubusercontent.com/therbootcamp/Erfurt_2018June/master/_sessions/_image/dplyr_tidyr_hex.png" width="100%" style="display: block; margin: auto;" /> ] --- # %>% aka "The Pipe!" .pull-left35[ <font size = 6>dplyr makes extensive use of a new operator called the "Pipe" <b><font style="color:#52D998">%>%</font></b></font><br><br> <font size = 6>The "Pipe" <b><font style="color:#52D998">%>%</font></b> is used to pass objects from one function to another</font><br><br> <font size = 6>You can read the "Pipe" <b><font style="color:#52D998">%>%</font></b> as "And Then..."</font><br><br> <br> ] .pull-right65[ <br> <font size = 6><center><b>object</b> %>% <i><font color='gray'>"And then..."</font></i></center></font><br> <font size = 6><center><b>FUN1()</b> %>% <i><font color='gray'>"And then..."</font></i></center></font><br> <font size = 6><center><b>FUN2()</b> %>% <i><font color='gray'>"And then..."</font></i></center></font><br> <font size = 6></center></font><br> ] --- .pull-left45[ <img src="https://raw.githubusercontent.com/therbootcamp/Erfurt_2018June/master/_sessions/_image/pipe_explanation_kn.png" width="95%" /> ] .pull-right5[ <br><br><br> ### Example ```r # Create a vector score score <- c(8, 4, 6, 3, 7, 3) ``` ```r # Base-R method mean(score) ``` ``` ## [1] 5.167 ``` ```r # Pipe %>% method score %>% # Take the vector score AND THEN mean() # calculate the mean ``` ``` ## [1] 5.167 ``` ] --- .pull-left45[ <img src="https://raw.githubusercontent.com/therbootcamp/Erfurt_2018June/master/_sessions/_image/pipe_explanation_kn.png" width="95%" /> ] .pull-right5[ <br><br><br> ### Example ```r # Create a vector score score <- c(8, 4, 6, 3, 7, 3) ``` ```r # Base-R method round(sd(score), digits = 1) ``` ``` ## [1] 2.1 ``` ```r # Pipe %>% method score %>% # Take vector score AND THEN sd() %>% # Calculate the sd AND THEN round(digits = 1) # Round to 1 digit ``` ``` ## [1] 2.1 ``` ] --- .pull-left4[ <br><br><br> ## Functions in dplyr <font size = 5>There are dozens (hundreds?) functions in dplyr that allow you to wrangle data.</font><br> <font size = 5>For an overview, look at the <a href='https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf'>cheatsheet!</a></font><br> <font size = 5>We will now show you how to use ~10 of the most common functions</font><br> ] .pull-right55[ <br> <br> <img src="https://raw.githubusercontent.com/therbootcamp/Erfurt_2018June/master/_sessions/_image/dplyr_functions.png" width="600" style="display: block; margin: auto;" /> ] --- .pull-left45[ <br> ## How dplyr looks in action <br> > <font size = 4>From the <b>baselers</b> data...<br><br>1) <b>Change the name</b> of the column 'age' to 'age_years' and 'weight' to 'weight_kg'.<br><br>2), <b>only include</b> people younger than 40.<br><br>3) <b>Add a new column</b> called that shows people's debt divided by their income.<br><br>4) Create <b>groups</b> based on sex.<br><br>5) <b>Calculate summary statistics</b>: the mean income, debt, and debt ratio for each group.</font> ] .pull-right55[ <br> ```r # Assign result to baslers_agg baslers_agg <- baselers %>% # Change column names with rename() rename(age_years = age, weight_kg = weight) %>% # PIPE! # Select specific rows with filter() filter(age_years < 40) %>% # PIPE! # Create new columns witb mutate() mutate(debt_ratio = debt / income) %>% # PIPE! # Group data with group_by() group_by(sex) %>% # PIPE! # Calculate summary statistics with summarise() summarise(income_mean = mean(income), debt_mean = mean(debt), dr_mean = mean(dr)) ``` ] --- .pull-left4[ <br> ## How dplyr looks in action <br> Start by assigning your result to a new object to save it!<br><br> "Keep the pipe <font color = 'purple'>%>%</font> going" to continue working with your dataframe<br><br> The output of dplyr functions will (almost) always be a tibble<br><br> You can almost always include multiple operations within each function<br><br> ] .pull-right55[ <br> ```r # Assign result to baslers_agg baslers_agg <- baselers %>% # Change column names with rename() rename(age_years = age, weight_kg = weight) %>% # PIPE! # Select specific rows with filter() filter(age_years < 40) %>% # PIPE! # Create new columns witb mutate() mutate(debt_ratio = debt / income) %>% # PIPE! # Group data with group_by() group_by(sex) %>% # PIPE! # Calculate summary statistics with summarise() summarise(income_mean = mean(income), debt_mean = mean(debt), dr_mean = mean(dr)) ``` ] --- .pull-left4[ # Transformation Functions | Function| Description| |:-------------|:----| | `rename()` | Change column names | | `mutate()`| Create a new column from existing columns| | `case_when()`| Recode values from a vector to another| | `left_join()` | Combine multiple dataframes| ] .pull-right55[ <br><br><br><br> ### patients_df ```r patients_df # Demographic data ``` ``` ## # A tibble: 5 x 3 ## id b c ## <dbl> <dbl> <dbl> ## 1 1 37 1 ## 2 2 65 2 ## 3 3 57 2 ## 4 4 34 1 ## 5 5 45 2 ``` ] --- # `rename()` .pull-left3[ Change column names with `rename()` New = Old, New = Old, ... ```r patients_df # Original ``` ``` ## # A tibble: 5 x 3 ## id b c ## <dbl> <dbl> <dbl> ## 1 1 37 1 ## 2 2 65 2 ## 3 3 57 2 ## 4 4 34 1 ## 5 5 45 2 ``` ] .pull-right65[ Create better column names with `rename()` ```r # 0) Start with patients_df data patients_df <- patients_df %>% # 1) Change column names with rename() rename(age = b, # New = Old arm = c) # New = Old ``` ] --- # `mutate()` .pull-left3[ Add new columns (or change existing ones) with `mutate()`<br><br> Define new columns in terms of existing columns in the dataframe<br><br> Can add multiple new columns within one `mutate()`<br><br> Change an existing column by assigning it to itself ```r df %>% mutate( NEW1 = DEFINITION1, NEW2 = DEFINITION2, NEW3 = DEFINITION3, ... ) ``` ] .pull-right65[ Add new columns with `mutate()` ```r patients_df <- patients_df %>% # Create new columns with mutate() mutate(age_months = age * 12, age_decades = age / 10) ``` ``` ## Warning: package 'bindrcpp' was built under R version 3.4.4 ``` ] --- .pull-left4[ # `case_when()` Use `case_when()` in combination with `mutate()` to define new columns based on logical conditions ```r # Using mutate(case_when()) df %>% mutate( NEW = case_when( COND1 ~ VAL1, COND2 ~ VAL2 )) ``` Useful for recoding values. Example: |arm|arm_char| |:--|:--| |0|"placebo"| |1|"drug"| |0|"placebo"| ] .pull-right55[ <br><br> Recode values with `mutate(case_when())` ```r patients_df <- patients_df %>% # Create column arm_char based on values of arm mutate( arm_char = case_when(arm == 0 ~ "placebo", arm == 1 ~ "drug") ) ``` Note: By default, all 'other' values are set to NA. To set 'other' values to something else, use the condition `TRUE` as a catch-all for all 'other' values. ```r patients_df <- patients_df %>% mutate( arm_char = case_when(arm == 0 ~ "placebo", arm == 1 ~ "drug", TRUE ~ "other") # Catch-all ) ``` ] --- # Joining data <img src="https://raw.githubusercontent.com/therbootcamp/Erfurt_2018June/master/_sessions/_image/joining_data.png" width="95%" /> --- .pull-left35[ ## left_join() <br> <font size = 4>Use left_join() to combine data from 2 dataframes based on one or more keuy columns<br><br></font> <font size = 4>Each dataset must contain the 'key' column(s)<br></font> ```r # Join df2 to df1 # using KEY as the key column df1 %>% left_join(df2, by = c("KEY")) ``` <img src="https://raw.githubusercontent.com/therbootcamp/Erfurt_2018June/master/_sessions/_image/joining_data.png" width="90%" style="display: block; margin: auto;" /> ] .pull-right6[ <br> <font size = 5>Join patients_df with results_df to create combined_df</font> ```r combined_df <- patients_df %>% # Join with results_df with left_join() left_join(results_df, by = "id") # Look at all column names names(combined_df) ``` ``` ## [1] "id" "age" "arm" "age_months" "age_decades" "arm_char" "t1" ## [8] "t2" ``` ```r # Show a few columns combined_df %>% select(id, arm, age, t1, t2) ``` ``` ## # A tibble: 5 x 5 ## id arm age t1 t2 ## <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 1 1 37 123 135 ## 2 2 2 65 143 140 ## 3 3 2 57 NA NA ## 4 4 1 34 100 105 ## 5 5 2 45 NA NA ``` ] --- .pull-left35[ <br><br> # Notes Don't forget to start by assigning to a new (or existing) object with <-<br> Keep adding new functions connected by the pipe %>%<br> Order matters! You can refer to new columns in later code <br><br> ] .pull-right6[ <br><br> Create combined_df using multiple functions ```r # 0) Start with patients_df data combined_df <- patients_df %>% # 1) Change column names with rename() rename(age = b, arm = c) %>% # 2) Create new columns with mutate() mutate(age_months = age * 12, age_decades = age / 10, arm_char = case_when(arm == 0 ~ "placebo", arm == 1 ~ "drug") ) %>% # 3) Add data from results_df with left_join() left_join(results_df, by = "id") ``` ] --- .pull-left55[ # Organisation Functions Organisation functions help you shuffle your data by sorting rows by columns, filter rows based on criteria, select columns (etc.) | Function| Purpose|Example| |:--------|:----|:------------| | `arrange()` |Sort rows by columns|`df %>%`<br>`arrange(arm, age)`| | `slice()`| Select rows by location|`df %>%`<br>`slice(1:10)`| | `filter()` | Select specific rows by criteria|`df %>%`<br>`filter(age > 50)`| | `select()`| Select specific columns|`df %>%`<br>`select(arm, t1)`| ] .pull-right4[ <br><br> Organise the `combined_df` dataframe ```r combined_df %>% # Sort rows by arm then in # descending order of age arrange(arm, desc(age)) %>% # Only include age > 50 filter(age > 50) %>% # Select these columns select(arm, age, t1, t2) ``` ``` ## # A tibble: 2 x 4 ## arm age t1 t2 ## <dbl> <dbl> <dbl> <dbl> ## 1 2 65 143 140 ## 2 2 57 NA NA ``` ] --- .pull-left3[ # arrange() <font size = 5>The arrange() function is used to sort (aka, arrange) rows of a dataframe</font> <br><br> <font size = 5>You can sort by as many conditions as you want</font> <br><br> <font size = 5>To sort in descending order, use desc()</font> ] .pull-right65[ <br><br> <font size = 5>Sort rows by id then in descending order of age</font> ```r # 0) Start with combined_df data combined_df %>% # 1) Sort by id then age (descending) arrange(id, desc(age)) ``` ``` ## # A tibble: 5 x 8 ## id age arm age_months age_decades arm_char t1 t2 ## <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> ## 1 1 37 1 444 3.7 drug 123 135 ## 2 2 65 2 780 6.5 other 143 140 ## 3 3 57 2 684 5.7 other NA NA ## 4 4 34 1 408 3.4 drug 100 105 ## 5 5 45 2 540 4.5 other NA NA ``` ] --- .pull-left3[ # filter() The filter() function is used to select rows based on certain criteria using logical comparisons <br> |Operator| Effect| |---|---| |==| equal to| |!=| Not equal to| |<, >| Less than, greater than| |<=, >=|Less or =, greater or =| For complex conditions, chain multiple logical comparison operators with & (and), | (or) ] .pull-right65[ <br><br> <font size = 5>Select patients over 30 given drug</font> ```r combined_df %>% # Sort by id then age (descending) arrange(id, desc(age)) %>% # NEW - Only over 30 given drug filter(age > 30 & arm_char == "drug") ``` ``` ## # A tibble: 2 x 8 ## id age arm age_months age_decades arm_char t1 t2 ## <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> ## 1 1 37 1 444 3.7 drug 123 135 ## 2 4 34 1 408 3.4 drug 100 105 ``` ] --- .pull-left4[ # select() Keep columns with select() ```r # Select columns id, age df %>% select(id, age) ``` Drop columns (and keep others) with - ```r # Select everything BUT sex and id df %>% select(-sex, -id) ``` Use functions like contains(), starts_with(), (...) for advanced selection (look at help menu ?select() to see them all) ```r # Advanced selection df %>% select(contains("response")) df %>% select(id, # Start with id starts_with("demo"), contains("response")) ``` ] .pull-right55[ <br><br> <font size = 5>Drop age_months and age_decades colummns</font> ```r # 0) Start with combined_df data combined_df %>% # Sort by id then age (descending) arrange(id, desc(age)) %>% # Only drugtrial over 30 filter(arm_char == "drug" & age > 30) %>% # NEW - Remove age_months and age_decades select(-age_months, -age_decades) ``` ``` ## # A tibble: 2 x 6 ## id age arm arm_char t1 t2 ## <dbl> <dbl> <dbl> <chr> <dbl> <dbl> ## 1 1 37 1 drug 123 135 ## 2 4 34 1 drug 100 105 ``` ] --- ## Grouped Aggregation <img src="https://raw.githubusercontent.com/therbootcamp/Erfurt_2018June/master/_sessions/_image/summarsed_data_diagram.png" width="85%" style="display: block; margin: auto;" /> --- .pull-left4[ # Aggregation functions | Function| action| |:-------------|:----| | `group_by()` |Group data by levels of one or more variables | | `summarise()` | Calculate summary statistics | ### Statistical functions | Function| action| |:-------------|:----| | `min(), max()`| Minimum, maximum | | `mean(), median()` |Mean, Median | | `sd()` |Standard deviation| | `sum()` | Sum| | `n()`| Number of cases| ] .pull-right55[ <br> ```r # First 2 rows of raw data combined_df %>% slice(1:2) ``` ``` ## # A tibble: 2 x 5 ## id age arm t1 t2 ## <dbl> <dbl> <chr> <dbl> <dbl> ## 1 1 37 drug 123 135 ## 2 2 65 other 143 140 ``` ```r # Group and summarise! combined_df %>% group_by(arm) %>% summarise( N = n(), age_mean = mean(age), t1_mean = mean(t1, na.rm = TRUE), t2_mean = mean(t2, na.rm = TRUE) ) ``` ``` ## # A tibble: 2 x 5 ## arm N age_mean t1_mean t2_mean ## <chr> <int> <dbl> <dbl> <dbl> ## 1 drug 2 35.5 112. 120 ## 2 other 3 55.7 143 140 ``` ] --- .pull-left3[ # group_by() <font size = 5>Used to combine data into separate groups based on 'grouping' columns<font> <br><br> <font size = 5>You can group by as many variables as you wish<font> <br><br> <font size = 5>You won't see any changes to the data until you use summarise()<font> <br> ] .pull-right65[ <br><br> <font size = 5>Group data by arm</font> ```r # 0) Start with combined_df data combined_df %>% # 1) Group data by arm with group_by group_by(arm) ``` ``` ## # A tibble: 5 x 5 ## # Groups: arm [2] ## id age arm t1 t2 ## <dbl> <dbl> <chr> <dbl> <dbl> ## 1 1 37 drug 123 135 ## 2 2 65 other 143 140 ## 3 3 57 other NA NA ## 4 4 34 drug 100 105 ## 5 5 45 other NA NA ``` ] --- .pull-left4[ # summarise() <font size = 5>Used to summarise data from groups<br><br> <font size = 5>Works like mutate() to create new variables, but with summary functions<font> <br> ```r df %>% group_by(VAR1) %>% summarise( SUMMARY = FUN(x), # age_mean = mean(age), # Ex 1 height_min = min(height) # Ex 2 total = n() ) ``` ] .pull-right55[ <br><br> <font size = 5>Calculate summary statistics</font> ```r # 0) Start with combined_df data combined_df %>% # 1) Group data by arm with group_by group_by(arm) %>% # 2) Calculate summary columns summarise( t1_mean = mean(t1, na.rm = TRUE), t2_mean = mean(t2, na.rm = TRUE), diff = t2_mean - t1_mean, N = n() # Number of rows in group ) ``` ``` ## # A tibble: 2 x 5 ## arm t1_mean t2_mean diff N ## <chr> <dbl> <dbl> <dbl> <int> ## 1 drug 112. 120 8.5 2 ## 2 other 143 140 -3 3 ``` ] --- .pull-left45[ ## Reshaping data There are two key functions in the tidyr package (part of the tidyverse) that allow you to 'reshape' a dataframe between 'wide'and 'long' formats.<br> 'Wide' dataframes have key data in separate columns, <br><br>'Long'</font> dataframes have key data in separate rows.<br> Why reshape data between wide and long? Some functions require data to be in a certain shape<br> #### Two key tidyr functions | Function | Result | |:----------|:-------| |`gather()`|Move data from 'wide' to 'long' format| |`spread()`|Move data from 'long' to 'wide' format| ] .pull-right5[ <br><br> ### 'Wide' vs. 'Long' data ```r # Wide format stock_w ``` ``` ## id t1 t2 ## 1 a 10 20 ## 2 b 20 26 ## 3 c 15 30 ``` ```r # Long format stock_l ``` ``` ## id time measure ## 1 a t1 10 ## 2 b t1 20 ## 3 c t1 15 ## 4 a t2 20 ## 5 b t2 26 ## 6 c t2 30 ``` ] --- .pull-left45[ ## gather() <font size=5>Use gather() to convert wide to long</font> ```r # Show wide data stock_w ``` ``` ## id t1 t2 ## 1 a 10 20 ## 2 b 20 26 ## 3 c 15 30 ``` ```r # "Gather" wide data to long stock_w %>% gather(time, # New group column measure, # New target column -id) # ID column ``` ``` ## id time measure ## 1 a t1 10 ## 2 b t1 20 ## 3 c t1 15 ## 4 a t2 20 ## 5 b t2 26 ## 6 c t2 30 ``` ] .pull-right5[ ## spread() <font size=5>Use spread() to convert long to wide</font> ```r # Show long data stock_l ``` ``` ## id time measure ## 1 a t1 10 ## 2 b t1 20 ## 3 c t1 15 ## 4 a t2 20 ## 5 b t2 26 ## 6 c t2 30 ``` ```r # "Spread" long data to wide stock_l %>% spread(time, # Old group column measure) # Old target column ``` ``` ## id t1 t2 ## 1 a 10 20 ## 2 b 20 26 ## 3 c 15 30 ``` ] --- .pull-left3[ ## Important! <br> <font size = 5>You can call many dplyr functions directly without the pipe %>%</font> <br><br><br> <font size = 5>However, we recommend always using the pipe %>% so you can string many operations together</font> ] .pull-right65[ <br><br> <font size = 5> Use the pipes %>%!</font> ```r ## Using dplyr functions without pipes %>% ## Avoid this if you can data <- mutate(data, age_years = age_months * 12) data <- filter(data, sex == "m") ## Using dplyr functions with pipes %>% ## Much better!!! data <- data %>% mutate(age_years = age_months * 12) data <- data %>% filter(sex == "m") # Why? Because with pipes you can easily put multiple # functions together if you choose. data <- data %>% mutate(age_years = age_months * 12) %>% filter(sex == "m") %>% # .... ``` ] --- .pull-left45[ # Questions? <img src="https://raw.githubusercontent.com/therbootcamp/Erfurt_2018June/master/_sessions/_image/dplyr_tidyr_hex.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right45[ <br><br> <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/organise_transform_aggregate.png" width="100%" style="display: block; margin: auto;" /> ] --- <br><br><br> # [Wrangling Practical](https://therbootcamp.github.io/Erfurt_2018June/_sessions/D1S2_Wrangling/Wrangling_practical.html)