class: center, middle, inverse, title-slide # Intro to R ### R for Data Science
Basel R Bootcamp
### February 2019 --- layout: true <div class="my-footer"> <span style="text-align:center"> <span> <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/by-sa.png" height=14 style="vertical-align: middle"/> </span> <a href="https://therbootcamp.github.io/"> <span style="padding-left:82px"> <font color="#7E7E7E"> www.therbootcamp.com </font> </span> </a> <a href="https://therbootcamp.github.io/"> <font color="#7E7E7E"> R For Data Science | February 2019 </font> </a> </span> </div> --- <!--- # R From [Wikipedia](https://en.wikipedia.org/wiki/Statistical_model) (emphasis added): > R is an **open source programming language** and software environment for **statistical computing and graphics** that is supported by the **R Foundation for Statistical Computing**. The R language is **widely used among statisticians and data miners** for developing statistical software and data analysis. Polls, surveys of data miners, and studies of scholarly literature databases show that **R's popularity has increased substantially in recent years**. > R is a GNU package. The source code for the R software environment is written primarily in **C, Fortran, and R**. R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems. While R has a command line interface, there are several **graphical front-ends available**. GNU's Not Unix! GPL guarantees end users the freedom to run, study, share and modify the software ---> # R is a programming language From [Wikipedia](https://en.wikipedia.org/wiki/Statistical_model) (emphasis added): > A programming language is a **formal language** that specifies a set of instructions that can be used to produce various kinds of output. Programming languages generally consist of **instructions for a computer**. Programming languages can be used to create programs that **implement specific algorithms**. .pull-left4[ ### Algorithm 1. Load data 2. Extract variables 3. Run analysis 4. Print result ] .pull-right6[ ### Implementation in R ```r data <- read.table(link) variables <- data[,c('group','variable')] analysis <- lm(variable ~ group, data = variables) summary(analysis) ``` ] --- # Why R? .pull-left3[ R steadily **grows in popularity**. Today, R is one of the **most popular languages for data science** and overall. In terms of the number of data science jobs, **R beats SAS and Matlab**, and is on par with Python. In many sciences it becomes the de factor **lingua franca** for statistics. <p style="font-size:10px;float:bottom"> Image source: http://r4stats.com/blog/ <p> ] .pull-right6[ <p align="center"> <img src="https://i0.wp.com/r4stats.com/wp-content/uploads/2017/02/Fig-1a-IndeedJobs-2017.png" height="450"> </p> ] --- # R is so popular because There are many good reasons to prefer R over superficially more user friendly software such as **Excel** or **SPSS** or more complex programming languages like **C++** or **Python**. .pull-left45[ ### Pro 1. **It's free** 2. Relatively **easy** 3. **Extensibility** ([CRAN](https://cran.r-project.org/), packages) 4. **User base** (e.g., [stackoverflow](https://stackoverflow.com/)) 5. [**Tidyverse**](https://www.tidyverse.org/) (`dplyr`, `ggplot2`, etc.) 6. [**RStudio**](https://www.rstudio.com/) 7. **Productivity** options: [Latex](https://www.latex-project.org/), [Markdown](https://daringfireball.net/projects/markdown/), [GitHub](https://github.com/) ] .pull-right45[ ### Con Some consider it slow and convuluted, but... [Tidyverse](https://www.tidyverse.org/) [Rcpp](http://www.rcpp.org/), [BH](https://cran.r-project.org/web/packages/BH/index.html): Links R to C++ and high-performance C++ libraries<br> [rPython](http://rpython.r-forge.r-project.org/): Links R to Python<br> [RHadoop](https://github.com/RevolutionAnalytics/RHadoop/wiki): Links R to Hadoop for big data applications.<br> ] --- # The almighty `tidyverse` Among its many packages, R newly contains a collection of high-performance, user-friendly packages (libraries) known as the [tidyverse](https://www.tidyverse.org/). The tidyverse includes: 1. `ggplot2` -- creating graphics. 2. `dplyr` -- data manipulation. 3. `tidyr` -- tidying data. 4. `readr` -- read wild data. 5. `purrr` -- functional programming. 6. `tibble` -- modern data frame. <br><br> <img src="http://d33wubrfki0l68.cloudfront.net/0ab849ed51b0b866ef6895c253d3899f4926d397/dbf0f/images/hex-ggplot2.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/071952491ec4a6a532a3f70ecfa2507af4d341f9/c167c/images/hex-dplyr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/5f8c22ec53a1ac61684f3e8d59c623d09227d6b9/b15de/images/hex-tidyr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/66d3133b4a19949d0b9ddb95fc48da074b69fb07/7dfb6/images/hex-readr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/9221ddead578362bd17bafae5b85935334984429/37a68/images/hex-purrr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/f55c43407ae8944b985e2547fe868e5e2b3f9621/720bb/images/hex-tibble.png" height="200px" /> --- # Packages .pull-left35[ R features a vast and cutting-edge collection of **packages** provided on [**CRAN**](https://cran.r-project.org/) and [**Git/GitHub**](https://github.com/hadley) by R's large and highly active user base and the work of . ```r # To install a package install.packages('package_name') # load a package library(package_name) require(package_name) #Note: # Don't forget that packages # must also be loaded. ``` ] .pull-right55[ <iframe src="https://www.rdocumentation.org/trends?page1=1&sort1=total&page2=1&sort2=total&page3=1&page4=1" width="600" height="400" frameborder="0" marginheight="0" marginwidth="0" zoom="0.5"></iframe> <font size=2> <a href="https://www.rdocumentation.org/trends"> www.rdocumentation.org/trends </a> </font> ] --- # Packages .pull-left35[ R features a vast and cutting-edge collection of **packages** provided on [**CRAN**](https://cran.r-project.org/) and [**Git/GitHub**](https://github.com/hadley) by R's large and highly active user base and the work of . ```r # To install a package install.packages('package_name') # load a package library(package_name) require(package_name) #Note: # Don't forget that packages # must also be loaded. ``` ] .pull-right45[ <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/packages.png" height="350"></img> ] --- <div class="center_text"> <span> 12 basic R lessons </span> </div> --- # Essentials: 12 basic R lessons .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Name objects using `_` 4. Objects have classes 5. Everything happens through functions 6. Functions have (default) arguments 7. Functions expect certain object classes 8. View help files using `?` 9. Study errors and warnings 10. Use RStudio and projects 11. Use editor and auto-complete 12. Comment and format for readability ] --- # Essentials: 12 basic R lessons .pull-left4[ 1. <high>Everything is an object</high> 2. <high>Use <mono> <- </mono> to create/change objects</high> 3. <high>Name objects using <mono>_</mono></high> 4. Objects have classes 5. Everything happens through functions 6. Functions have (default) arguments 7. Functions expect certain object classes 8. View help files using `?` 9. Study errors and warnings 10. Use RStudio and projects 11. Use editor and auto-complete 12. Comment and format for readability ] .pull-right5[ ```r # an object called some_name some_name <- c(1, 2, 3) # add 2 to the object's numbers some_name + 2 ``` ``` ## [1] 3 4 5 ``` ```r # print object some_name ``` ``` ## [1] 1 2 3 ``` ```r # make change permanent some_name <- some_name + 2 # print object some_name ``` ``` ## [1] 3 4 5 ``` ] --- # Essentials: 12 basic R lessons .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Name objects using `_` 4. <high>Objects have classes</high> 5. Everything happens through functions 6. Functions have (default) arguments 7. Functions expect certain object classes 8. View help files using `?` 9. Study errors and warnings 10. Use RStudio and projects 11. Use editor and auto-complete 12. Comment and format for readability ] .pull-right5[ ```r # What is the class of some_name class(some_name) ``` ``` ## [1] "numeric" ``` ```r # an object called some_name class(list()) ``` ``` ## [1] "list" ``` ```r # an object called some_name class(baselers) ``` ``` ## [1] "tbl_df" "tbl" "data.frame" ``` ] --- # Essentials: 12 basic R lessons .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Name objects using `_` 4. Objects have classes 5. <high>Everything happens through functions</high> 6. Functions have (default) arguments 7. Functions expect certain object classes 8. View help files using `?` 9. Study errors and warnings 10. Use RStudio and projects 11. Use editor and auto-complete 12. Comment and format for readability ] .pull-right5[ ```r # function c() some_name <- c(1, 2, 3) # function `+`() some_name + 2 ``` ``` ## [1] 3 4 5 ``` ```r # function print() some_name ``` ``` ## [1] 1 2 3 ``` ```r # function class() class(x = some_name) ``` ``` ## [1] "numeric" ``` ] --- # Essentials: 12 basic R lessons .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Name objects using `_` 4. Objects have classes 5. Everything happens through functions 6. <high>Functions have (default) arguments</high> 7. Functions expect certain object classes 8. View help files using `?` 9. Study errors and warnings 10. Use RStudio and projects 11. Use editor and auto-complete 12. Comment and format for readability ] .pull-right5[ ```r # no argument mean() ``` ``` ## Error in mean.default(): argument "x" is missing, with no default ``` ```r # required argument mean(c(1, 2, 3)) ``` ``` ## [1] 2 ``` ```r # introducing NA mean(c(1, 2, 3, NA)) ``` ``` ## [1] NA ``` ```r # changing default to handle NA mean(c(1, 2, 3, NA), na.rm = TRUE) ``` ``` ## [1] 2 ``` ] --- # Essentials: 12 basic R lessons .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Name objects using `_` 4. Objects have classes 5. Everything happens through functions 6. Functions have (default) arguments 7. <high>Functions expect certain object classes</high> 8. View help files using `?` 9. Study errors and warnings 10. Use RStudio and projects 11. Use editor and auto-complete 12. Comment and format for readability ] .pull-right5[ ```r # mean works also for logical mean(c(TRUE, FALSE, TRUE)) ``` ``` ## [1] 0.6667 ``` ```r # but not for character mean(c("a", "b", "c")) ``` ``` ## [1] NA ``` ```r # classes relevant for all arg's mean(c(1, 2, 3), na.rm = "test") ``` ``` ## Error in if (na.rm) x <- x[!is.na(x)]: argument is not interpretable as logical ``` ] --- # Essentials: 12 basic R lessons .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Name objects using `_` 4. Objects have classes 5. Everything happens through functions 6. Functions have (default) arguments 7. Functions expect certain object classes 8. <high>View help files using `?`</high> 9. Study errors and warnings 10. Use RStudio and projects 11. Use editor and auto-complete 12. Comment and format for readability ] .pull-right5[ ```r ?mean ``` <p align="center"><img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/help.png" width="500"></p> ] --- # Essentials: 12 basic R lessons .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Name objects using `_` 4. Objects have classes 5. Everything happens through functions 6. Functions have (default) arguments 7. Functions expect certain object classes 8. <high>View help files using `?`</high> 9. Study errors and warnings 10. Use RStudio and projects 11. Use editor and auto-complete 12. Comment and format for readability ] .pull-right5[ ```r ?cor ``` <p align="center"><img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/help_cor.png" width="500"></p> ] --- # Essentials: 12 basic R lessons .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Name objects using `_` 4. Objects have classes 5. Everything happens through functions 6. Functions have (default) arguments 7. Functions expect certain object classes 8. View help files using `?` 9. <high>Study errors and warnings</high> 10. Use RStudio and projects 11. Use editor and auto-complete 12. Comment and format for readability ] .pull-right5[ ```r # message - attend basel <- type_convert(baselers) ``` ``` ## Parsed with column specification: ## cols( ## sex = col_character() ## ) ``` ```r # warning - attend closely result <- mean('NA') ``` ``` ## Warning in mean.default("NA"): argument is not numeric or logical: returning NA ``` ```r # error - fix lenth(x = 1) ``` ``` ## Error in lenth(x = 1): could not find function "lenth" ``` ] --- # Essentials: 12 basic R lessons .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Name objects using `_` 4. Objects have classes 5. Everything happens through functions 6. Functions have (default) arguments 7. Functions expect certain object classes 8. View help files using `?` 9. <high>Study errors and warnings</high> 10. Use RStudio and projects 11. Use editor and auto-complete 12. Comment and format for readability ] .pull-right5[ ```r lenth(x = 1) ``` ``` ## Error in lenth(x = 1): could not find function "lenth" ``` | Error | Description| |:------|:--------------------------------------------| | `'could not find function'`| Typo or package not loaded| | `'error in eval'`| An object is used in function that does not exist.| | `'cannot open()'`| Typo or missing path.| | `'no applicable method'`| Function inapplicable for type | | package errors | Unable to install, compile, or load package. | ] --- # Essentials: 12 basic R lessons .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Name objects using `_` 4. Objects have classes 5. Everything happens through functions 6. Functions have (default) arguments 7. Functions expect certain object classes 8. View help files using `?` 9. Study errors and warnings 10. <high>Use RStudio and projects</high> 11. Use editor and auto-complete 12. Comment and format for readability ] .pull-right5[ ### Projects help ... USE projects! save workspace and history ● set project specific options ● access files ● version control ● etc. <p align="left"><img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/projects_1.png" width="360"></p> ] --- # Essentials: 12 basic R lessons .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Name objects using `_` 4. Objects have classes 5. Everything happens through functions 6. Functions have (default) arguments 7. Functions expect certain object classes 8. View help files using `?` 9. Study errors and warnings 10. <high>Use RStudio and projects</high> 11. Use editor and auto-complete 12. Comment and format for readability ] .pull-right5[ ### Folder structure Complement projects by a <high>folder structure</high> appropriate for your project. <br><br> <p align="left"> <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/folder_structure.png"> </p> ] --- # Essentials: 12 basic R lessons .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Name objects using `_` 4. Objects have classes 5. Everything happens through functions 6. Functions have (default) arguments 7. Functions expect certain object classes 8. View help files using `?` 9. Study errors and warnings 10. Use RStudio and projects 11. <high>Use editor and auto-complete</high> 12. Comment and format for readability ] .pull-right5[ <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/editor_edit.png"></img> <br><br><br> Shortcut to <high>send to console</high>:<br><br2><font size = 6>⌘/ctrl + ⏎</font><br><br2><br2> Shortcut to <high>rerun chunk</high>:<br><br2><font size = 6> ⌘/ctrl + ⇧ + p</font> ] --- # Essentials: 12 basic R lessons .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Name objects using `_` 4. Objects have classes 5. Everything happens through functions 6. Functions have (default) arguments 7. Functions expect certain object classes 8. View help files using `?` 9. Study errors and warnings 10. Use RStudio and projects 11. <high>Use editor and auto-complete</high> 12. Comment and format for readability ] .pull-right5[ ```r # import packages library(tidyverse) library(yarrr) library(lme4) # import data baselers <- read_delim(file = "baselers.txt", delim = '\t') ``` <br> ### The goal is... ... to create self-contained scripts that run uninterrupted from beginning to end. ] --- # Essentials: 12 basic R lessons .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Name objects using `_` 4. Objects have classes 5. Everything happens through functions 6. Functions have (default) arguments 7. Functions expect certain object classes 8. View help files using `?` 9. Study errors and warnings 10. Use RStudio and projects 11. <high>Use editor and auto-complete</high> 12. Comment and format for readability ] .pull-right55[ <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/auto_complete_1.png"></img> ] --- # Essentials: 12 basic R lessons .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Name objects using `_` 4. Objects have classes 5. Everything happens through functions 6. Functions have (default) arguments 7. Functions expect certain object classes 8. View help files using `?` 9. Study errors and warnings 10. Use RStudio and projects 11. <high>Use editor and auto-complete</high> 12. Comment and format for readability ] .pull-right55[ <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/auto_complete_2.png"></img> ] --- # Essentials: 12 basic R lessons .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Name objects using `_` 4. Objects have classes 5. Everything happens through functions 6. Functions have (default) arguments 7. Functions expect certain object classes 8. View help files using `?` 9. Study errors and warnings 10. Use RStudio and projects 11. <high>Use editor and auto-complete</high> 12. Comment and format for readability ] .pull-right55[ <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/auto_complete_3.png"></img> ] --- # Essentials: 12 basic R lessons .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Name objects using `_` 4. Objects have classes 5. Everything happens through functions 6. Functions have (default) arguments 7. Functions expect certain object classes 8. View help files using `?` 9. Study errors and warnings 10. Use RStudio and projects 11. Use editor and auto-complete 12. <high>Comment and format for readability</high> ] .pull-right5[ ## Bad ```r mean(subset((tibble(c('a','b'),runif(1000, 0,1))),c..a....b..=='a')[,'runif.1000..0..1.']) ``` ## Good ```r # create my data.frame my_data <- tibble('group' = c('a','b'), 'value' = runif(1000, 0, 1)) # subset data to group a # and compute average value my_data %>% filter(group == 'a') %>% summarize(mean(value)) ``` ] --- # Essentials: 12 basic R lessons .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Name objects using `_` 4. Objects have classes 5. Everything happens through functions 6. Functions have (default) arguments 7. Functions expect certain object classes 8. View help files using `?` 9. Study errors and warnings 10. Use RStudio and projects 11. Use editor and auto-complete 12. <high>Comment and format for readability</high> ] .pull-right5[ ### Short style guide ```r # Choose appropriate names analyze_baselers.R trial_id # Leave spaces around operators var_rt <- var(rt, na.rm = TRUE) # indent code if (var_rt < 2){ print('small variance') } else { print('large variance') } # Data wrangling section --------- ``` See also <a href="http://style.tidyverse.org/">style.tidyverse.org/</a> ] --- class: middle, center <h1><a href="https://therbootcamp.github.io/R4DS_2019Feb/1_Data/BaselRBootcamp.zip">Downloads</a></h1> --- class: middle, center <h1><a>Interactive</a></h1> <h2>Open RStudio</h2>