class: center, middle, inverse, title-slide # Intro to R ### The R Bootcamp @ Erfurt
www.therbootcamp.com
@therbootcamp
### June 2018 --- layout: true <div class="my-footer"><span> <a href="https://therbootcamp.github.io/"><font color="#7E7E7E">Erfurt, June 2018</font></a>                                           <a href="https://therbootcamp.github.io/"><font color="#7E7E7E">www.therbootcamp.com</font></a> </span></div> --- # Why R .pull-left3[ R steadily **grows in popularity**. Today, R is one of the **most popular languages for data science** and overall. In terms of the number of data science jobs, **R beats SAS and Matlab**, and is on par with Python: ] .pull-left65[ <p align="center"><img src="https://i0.wp.com/r4stats.com/wp-content/uploads/2017/02/Fig-1a-IndeedJobs-2017.png" height="370"></p> <p style="font-size:10px" align="center">source: https://i0.wp.com/r4stats.com/<p> ] --- # R is so popular because There are many good reasons to prefer R over superficially more user friendly software such as **Excel** or **SPSS** or more complex programming languages like **C++** or **Python**. .pull-left45[ ### Pro 1. **It's free** 2. Relatively **easy** 3. **Extensibility** ([CRAN](https://cran.r-project.org/), packages) 4. **User base** (e.g., [stackoverflow](https://stackoverflow.com/)) 5. [**Tidyverse**](https://www.tidyverse.org/) (`dplyr`, `ggplot`, etc.) 6. [**RStudio**](https://www.rstudio.com/) 7. **Productivity** options: [Latex](https://www.latex-project.org/), [Markdown](https://daringfireball.net/projects/markdown/), [GitHub](https://github.com/) ] .pull-right45[ ### Con Sometimes slow and awkward, but... [Tidyverse](https://www.tidyverse.org/) [Rcpp](http://www.rcpp.org/), [BH](https://cran.r-project.org/web/packages/BH/index.html): Links R to C++ and high-performance C++ libraries<br> [rPython](http://rpython.r-forge.r-project.org/): Links R to Python<br> [RHadoop](https://github.com/RevolutionAnalytics/RHadoop/wiki): Links R to Hadoop for big data applications.<br> ] --- # The almighty **tidyverse** Among its many packages, R newly contains a collection of high-performance, user-friendly packages (libraries) known as the [tidyverse](https://www.tidyverse.org/). The tidyverse includes: 1. `ggplot2` -- creating graphics. 2. `dplyr` -- data manipulation. 3. `tidyr` -- tidying data. 4. `readr` -- read wild data. 5. `purrr` -- functional programming. 6. `tibble` -- modern data frame. <br><br> <img src="http://d33wubrfki0l68.cloudfront.net/0ab849ed51b0b866ef6895c253d3899f4926d397/dbf0f/images/hex-ggplot2.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/071952491ec4a6a532a3f70ecfa2507af4d341f9/c167c/images/hex-dplyr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/5f8c22ec53a1ac61684f3e8d59c623d09227d6b9/b15de/images/hex-tidyr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/66d3133b4a19949d0b9ddb95fc48da074b69fb07/7dfb6/images/hex-readr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/9221ddead578362bd17bafae5b85935334984429/37a68/images/hex-purrr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/f55c43407ae8944b985e2547fe868e5e2b3f9621/720bb/images/hex-tibble.png" height="200px" /> --- # RStudio: R's favorite environment Next to many useful packages, R users greatly benefit from R's integrated development environment [**RStudio**](https://www.rstudio.com/). Rstudio is a **graphical user interface** that allows you to (a) edit code, (b) run code, (c) access files and history, and (d) create plots. RStudio also helps you with **project management**, **version control** via [Github](https://github.com/), writing **reports** using [markdown](http://rmarkdown.rstudio.com/authoring_basics.html) and [knitr](https://yihui.name/knitr/), and many other aspects of working with R. <p align="center"><img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/rstudio_plus.png" height="360"></p> --- # Essentials: The 2<sup>4</sup> Rules of the R Bootcamp .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Name objects using `_` 4. Objects have classes 5. Everything happens through functions 6. Functions have arguments 7. Arguments can have defaults 8. Functions expect certain object classes 9. View help files using `?` 10. Data is stored in data frames 11. Select variables (vectors) using `$` 12. Use RStudio and projects 13. Code in the editor 14. First load packages and data 15. Comment and format for readability 16. Struggle, ask for help, struggle... ] --- # Essentials: The 2<sup>4</sup> Rules of the R Bootcamp .pull-left4[ 1. <b><font style="color:#52D998">Everything is an object</font></b> 2. <b><font style="color:#52D998">Use `<-` to create/change objects</font></b> 3. <b><font style="color:#52D998">Name objects using `_`</font></b> 4. Objects have classes 5. Everything happens through functions 6. Functions have arguments 7. Arguments can have defaults 8. Functions expect certain object classes 9. View help files using `?` 10. Data is stored in data frames 11. Select variables (vectors) using `$` 12. Use RStudio and projects 13. Code in the editor 14. First load packages and data 15. Comment and format for readability 16. Struggle, ask for help, struggle... ] .pull-right5[ ```r # an object called some_name some_name <- c(1, 2, 3) # add 2 to the object's numbers some_name + 2 ``` ``` ## [1] 3 4 5 ``` ```r # print object some_name ``` ``` ## [1] 1 2 3 ``` ```r # make change permanent some_name <- some_name + 2 # print object some_name ``` ``` ## [1] 3 4 5 ``` ] --- # Essentials: The 2<sup>4</sup> Rules of the R Bootcamp .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Name objects using `_` 4. <b><font style="color:#52D998">Objects have classes</font></b> 5. Everything happens through functions 6. Functions have arguments 7. Arguments can have defaults 8. Functions expect certain object classes 9. View help files using `?` 10. Data is stored in data frames 11. Select variables (vectors) using `$` 12. Use RStudio and projects 13. Code in the editor 14. First load packages and data 15. Comment and format for readability 16. Struggle, ask for help, struggle... ] .pull-right5[ ```r # an object called some_name class(some_name) ``` ``` ## [1] "numeric" ``` ```r typeof(some_name) ``` ``` ## [1] "double" ``` ```r # an object called some_name class(list()) ``` ``` ## [1] "list" ``` ```r # an object called some_name class(tibble()) ``` ``` ## [1] "tbl_df" "tbl" "data.frame" ``` ] --- # Essentials: The 2<sup>4</sup> Rules of the R Bootcamp .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Objects have classes 4. Name objects using `_` 5. <b><font style="color:#52D998">Everything happens through functions</font></b> 6. Functions have arguments 7. Arguments can have defaults 8. Functions expect certain object classes 9. View help files using `?` 10. Data is stored in data frames 11. Select variables (vectors) using `$` 12. Use RStudio and projects 13. Code in the editor 14. First load packages and data 15. Comment and format for readability 16. Struggle, ask for help, struggle... ] .pull-right5[ ```r # function c() some_name <- c(1, 2, 3) # function `+`() some_name + 2 ``` ``` ## [1] 3 4 5 ``` ```r # function print() some_name ``` ``` ## [1] 1 2 3 ``` ```r # function class() class(some_name) ``` ``` ## [1] "numeric" ``` ] --- # Essentials: The 2<sup>4</sup> Rules of the R Bootcamp .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Objects have classes 4. Name objects using `_` 5. Everything happens through functions</font></b> 6. <b><font style="color:#52D998">Functions have arguments</font></b> 7. <b><font style="color:#52D998">Arguments can have defaults</font></b> 8. Functions expect certain object classes 9. View help files using `?` 10. Data is stored in data frames 11. Select variables (vectors) using `$` 12. Use RStudio and projects 13. Code in the editor 14. First load packages and data 15. Comment and format for readability 16. Struggle, ask for help, struggle... ] .pull-right5[ ```r # no argument mean() ``` ``` ## Error in mean.default(): argument "x" is missing, with no default ``` ```r # required argument mean(c(1, 2, 3)) ``` ``` ## [1] 2 ``` ```r # introducing NA mean(c(1, 2, 3, NA)) ``` ``` ## [1] NA ``` ```r # changing default to handle NA mean(c(1, 2, 3, NA), na.rm = TRUE) ``` ``` ## [1] 2 ``` ] --- # Essentials: The 2<sup>4</sup> Rules of the R Bootcamp .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Objects have classes 4. Name objects using `_` 5. Everything happens through functions</font></b> 6. Functions have arguments 7. Arguments can have defaults 8. <b><font style="color:#52D998">Functions expect certain object classes</font></b> 9. View help files using `?` 10. Data is stored in data frames 11. Select variables (vectors) using `$` 12. Use RStudio and projects 13. Code in the editor 14. First load packages and data 15. Comment and format for readability 16. Struggle, ask for help, struggle... ] .pull-right5[ ```r # mean works also for logical mean(c(TRUE, FALSE, TRUE)) ``` ``` ## [1] 0.6667 ``` ```r # but not for character mean(c("a", "b", "c")) ``` ``` ## [1] NA ``` ```r # classes relevant for all arg's mean(c(1, 2, 3), na.rm = "test") ``` ``` ## Error in if (na.rm) x <- x[!is.na(x)]: argument is not interpretable as logical ``` ] --- # Essentials: The 2<sup>4</sup> Rules of the R Bootcamp .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Objects have classes 4. Name objects using `_` 5. Everything happens through functions</font></b> 6. Functions have arguments 7. Arguments can have defaults 8. Functions expect certain object classes 9. <b><font style="color:#52D998">View help files using `?`</font></b> 10. Data is stored in data frames 11. Select variables (vectors) using `$` 12. Use RStudio and projects 13. Code in the editor 14. First load packages and data 15. Comment and format for readability 16. Struggle, ask for help, struggle... ] .pull-right5[ ```r ?mean ``` <p align="center"><img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/help.png" width="500"></p> ] --- # Essentials: The 2<sup>4</sup> Rules of the R Bootcamp .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Objects have classes 4. Name objects using `_` 5. Everything happens through functions</font></b> 6. Functions have arguments 7. Arguments can have defaults 8. Functions expect certain object classes 9. <b><font style="color:#52D998">View help files using `?`</font></b> 10. Data is stored in data frames 11. Select variables (vectors) using `$` 12. Use RStudio and projects 13. Code in the editor 14. First load packages and data 15. Comment and format for readability 16. Struggle, ask for help, struggle... ] .pull-right5[ ```r ?cor ``` <p align="center"><img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/help_cor.png" width="500"></p> ] --- # Essentials: The 2<sup>4</sup> Rules of the R Bootcamp .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Objects have classes 4. Name objects using `_` 5. Everything happens through functions</font></b> 6. Functions have arguments 7. Arguments can have defaults 8. Functions expect certain object classes 9. View help files using `?` 10. <b><font style="color:#52D998">Data is stored in data frames</font></b> 11. Select variables (vectors) using `$` 12. Use RStudio and projects 13. Code in the editor 14. First load packages and data 15. Comment and format for readability 16. Struggle, ask for help, struggle... ] .pull-right5[ ```r print(baselers) ``` ``` ## # A tibble: 10,000 x 20 ## id sex age height weight income ## <int> <chr> <int> <dbl> <dbl> <dbl> ## 1 1 male 44 174. 113. 6300 ## 2 2 male 65 180. 75.2 10900 ## 3 3 female 31 168. 55.5 5100 ## 4 4 male 27 209 93.8 4200 ## 5 5 male 24 177. NA 4000 ## 6 6 male 63 187. 67.4 11400 ## 7 7 male 71 152. 83.3 12000 ## 8 8 female 41 156. 67.8 7600 ## 9 9 male 43 176. 69.3 8500 ## 10 10 female 31 166. 66.3 6100 ## # ... with 9,990 more rows, and 14 more ## # variables ``` ] --- # Essentials: The 2<sup>4</sup> Rules of the R Bootcamp .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Objects have classes 4. Name objects using `_` 5. Everything happens through functions</font></b> 6. Functions have arguments 7. Arguments can have defaults 8. Functions expect certain object classes 9. View help files using `?` 10. Data is stored in data frames 11. <b><font style="color:#52D998">Select variables (vectors) using `$`</font></b> 12. Use RStudio and projects 13. Code in the editor 14. First load packages and data 15. Comment and format for readability 16. Struggle, ask for help, struggle... ] .pull-right5[ ```r # select sex veriable using $ baselers$sex ``` ``` ## [1] "male" "male" "female" "male" "male" ## [6] "male" "male" "female" ## [ reached getOption("max.print") -- omitted 9992 entries ] ``` ```r # Wherever possible, AVOID... baselers[['sex']] ``` ``` ## [1] "male" "male" "female" "male" "male" ## [6] "male" "male" "female" ## [ reached getOption("max.print") -- omitted 9992 entries ] ``` ```r baselers[[2]] ``` ``` ## [1] "male" "male" "female" "male" "male" ## [6] "male" "male" "female" ## [ reached getOption("max.print") -- omitted 9992 entries ] ``` ] --- # Essentials: The 2<sup>4</sup> Rules of the R Bootcamp .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Objects have classes 4. Name objects using `_` 5. Everything happens through functions</font></b> 6. Functions have arguments 7. Arguments can have defaults 8. Functions expect certain object classes 9. View help files using `?` 10. Data is stored in data frames 11. Select variables (vectors) using `$` 12. <b><font style="color:#52D998">Use RStudio and projects</font></b> 13. Code in the editor 14. First load packages and data 15. Comment and format for readability 16. Struggle, ask for help, struggle... ] .pull-right5[ ### Projects help... save workspace and history ● set project specific options ● access files ● version control ● etc. <p align="left"><img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/projects_1.png" width="360"></p> ] --- # Essentials: The 2<sup>4</sup> Rules of the R Bootcamp .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Objects have classes 4. Name objects using `_` 5. Everything happens through functions</font></b> 6. Functions have arguments 7. Arguments can have defaults 8. Functions expect certain object classes 9. View help files using `?` 10. Data is stored in data frames 11. Select variables (vectors) using `$` 12. Use RStudio and projects 13. <b><font style="color:#52D998">Code in the editor</font></b> 14. <b><font style="color:#52D998">First load packages and data</font></b> 15. Comment and format for readability 16. Struggle, ask for help, struggle... ] .pull-right5[ ```r # import packages library(tidyverse) library(yarrr) library(lme4) # import data baselers <- read_delim("baselers.txt", delim = '\t') ``` <br> ### The goal is... ... to create self-contained scripts that run uninterrupted from beginning to start. ] --- # Essentials: The 2<sup>4</sup> Rules of the R Bootcamp .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Objects have classes 4. Name objects using `_` 5. Everything happens through functions</font></b> 6. Functions have arguments 7. Arguments can have defaults 8. Functions expect certain object classes 9. View help files using `?` 10. Data is stored in data frames 11. Select variables (vectors) using `$` 12. Use RStudio and projects 13. Code in the editor 14. First load packages and data 15. <b><font style="color:#52D998">Comment and format for readability</font></b> 16. Struggle, ask for help, struggle... ] .pull-right5[ ## Bad ```r mean(subset((tibble(c('a','b'),runif(1000, 0,1))),c..a....b..=='a')[,'runif.1000..0..1.']) ``` ## Good ```r # create my data.frame my_data <- tibble('group' = c('a','b'), 'value' = runif(1000,0,1)) # subset data to group a # and compute average value my_data %>% filter(group == 'a') %>% summarize(mean(value)) ``` ] --- # Essentials: The 2<sup>4</sup> Rules of the R Bootcamp .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Objects have classes 4. Name objects using `_` 5. Everything happens through functions</font></b> 6. Functions have arguments 7. Arguments can have defaults 8. Functions expect certain object classes 9. View help files using `?` 10. Data is stored in data frames 11. Select variables (vectors) using `$` 12. Use RStudio and projects 13. Code in the editor 14. First load packages and data 15. <b><font style="color:#52D998">Comment and format for readability</font></b> 16. Struggle, ask for help, struggle... ] .pull-right5[ ### Short style guide ```r # Choose appropriate names analyze_baselers.R trial_id # Leave spaces around operators var_rt <- var(rt, na.rm = TRUE) # indent code if (var_rt < 2){ print('small variance') } else { print('large variance') } # Crete sections using # Data wrangling section --------- ``` See also <a href="http://style.tidyverse.org/">style.tidyverse.org/</a> ] --- # Essentials: The 2<sup>4</sup> Rules of the R Bootcamp .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Objects have classes 4. Name objects using `_` 5. Everything happens through functions</font></b> 6. Functions have arguments 7. Arguments can have defaults 8. Functions expect certain object classes 9. View help files using `?` 10. Data is stored in data frames 11. Select variables (vectors) using `$` 12. Use RStudio and projects 13. Code in the editor 14. First load packages and data 15. Comment and format for readability 16. <b><font style="color:#52D998">Struggle, ask for help, struggle...</font></b> ] .pull-right5[ <p align="center"> <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/struggle.png" width="420"> </p> ] --- # Downloads <font size="6" color="#F62D73"><a href="https://github.com/therbootcamp/Erfurt_2018June/blob/master/_sessions/_data/day_1.zip?raw=true">Datasets day 1</a></font><br><br> <font size="6" color="#F62D73"><a href="http://www.tug.org/mactex/">Latex (Mac) link</a></font><br> <font size="6" color="#F62D73"><a href="https://miktex.org/">Latex (Windows) link</a></font> --- # Interactive session <p><font size=6>Open up **Rstudio**...</font>