class: center, middle, inverse, title-slide

# What is R?
### The R Bootcamp
Twitter:
@therbootcamp
### April 2018

---

# R

From [Wikipedia](https://en.wikipedia.org/wiki/R_(programming_language)) (emphasis added):

> R is an **open source programming language** and software environment for **statistical computing and graphics** that is supported by the R Foundation for Statistical Computing. The R language is **widely used among statisticians and data miners** for developing statistical software and data analysis. Polls, surveys of data miners, and studies of scholarly literature databases show that **R's popularity has increased substantially in recent years**.

> R is a GNU package. The source code for the R software environment is written primarily in **C, Fortran, and R**. R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems. While R has a command line interface, there are several **graphical front-ends available**.

---

# Programming language

From [Wikipedia](https://en.wikipedia.org/wiki/Programming_language) (emphasis added):

> A programming language is a **formal language** that specifies a set of instructions that can be used to produce various kinds of output. Programming languages generally consist of **instructions for a computer**. Programming languages can be used to create programs that **implement specific algorithms**.

.pull-left4[

### Algorithm

1. Load data
2. Extract variables
3. Run analysis
4. Print result

]

.pull-right6[

### Implementation in R

```r
# `link` is a placeholder for a file path or URL
data <- read.table(link, header = TRUE)
variables <- data[,c('group','variable')]
analysis <- lm(variable ~ group, data = variables)
summary(analysis)
```

]

<!--
---

# A short history of R

From [Wikipedia](https://en.wikipedia.org/wiki/R_(programming_language)) (emphasis added):

.pull-left6[

> R is an implementation of the **S programming language** combined with **lexical scoping** semantics [environments] inspired by Scheme. [...] There are some important differences, but much of the code written for S runs unaltered.
>R was created by **Ross Ihaka and Robert Gentleman** [...] and is currently developed by the **R Development Core Team** [...]. R is named partly after the first names of the first two R authors and partly as a play on the name of S. The project was conceived in 1992, with an initial version released in 1995 and a **stable beta version in 2000**.

]

.pull-right6[

<img src="http://www.estatisticacomr.uff.br/wp-content/uploads/2014/11/Criadores.jpg" height="350px" style="display: block; margin: auto;" />
<p style="font-size:20px" align="center">
Robert Gentleman and Ross Ihaka<br>
<font size="2" color="#F52D70">source: https://i0.wp.com/r4stats.com/</font>
</p>

]

-->

---

# R is purpose specific

R has been built for **statistical computing and graphics** and that is basically it:

.pull-left5[

### Use R (today) for...

1. Import and export data
2. Handle and process data
3. Run cutting-edge analyses
4. Analyze big data
5. Create publication-ready figures
6. Prepare reproducible reports

]

.pull-right5[

### Don't use R for...

1. OS programs
2. GUIs
3. Complex websites

]

---

# R is widely used

.pull-left3[

R steadily **grows in popularity**. Today, R is one of the **most popular languages for data science** and overall. In terms of the number of data science jobs, **R beats SAS and MATLAB**, and is on par with Python:

]

.pull-left65[

<p align="center"><img src="https://i0.wp.com/r4stats.com/wp-content/uploads/2017/02/Fig-1a-IndeedJobs-2017.png" height="370"></p>
<p style="font-size:10px" align="center">source: https://i0.wp.com/r4stats.com/</p>

]

---

# R is so popular because

There are many good reasons to prefer R over superficially more user-friendly software such as **Excel** or **SPSS** or more complex programming languages like **C++** or **Python**.

.pull-left45[

### Pro

1. **It's free**
2. Relatively **easy**
3. **Extensibility** ([CRAN](https://cran.r-project.org/), packages)
4. **User base** (e.g., [Stack Overflow](https://stackoverflow.com/))
5. [**Tidyverse**](https://www.tidyverse.org/) (`dplyr`, `ggplot2`, etc.)
6. [**RStudio**](https://www.rstudio.com/)
7. **Productivity** options: [LaTeX](https://www.latex-project.org/), [Markdown](https://daringfireball.net/projects/markdown/), [GitHub](https://github.com/)

]

.pull-right45[

### Con

Sometimes slow and awkward, but...

[Tidyverse](https://www.tidyverse.org/)<br>
[Rcpp](http://www.rcpp.org/), [BH](https://cran.r-project.org/web/packages/BH/index.html): Links R to C++ and high-performance C++ libraries<br>
[rPython](http://rpython.r-forge.r-project.org/): Links R to Python<br>
[RHadoop](https://github.com/RevolutionAnalytics/RHadoop/wiki): Links R to Hadoop for big data applications.<br>

]

<!--- Move to Efficient projects

# Project management

RStudio facilitates project management via the use of *projects*. Projects support:

.pull-left5[

1. **File management** by automatically setting the working directory (see `setwd()`) <br><br>
2. **Project transitioning** by saving and re-opening scripts, history, and workspace. <br><br>
3. **Customization** by enabling project-specific settings. <br><br>
4. **Version control** by linking projects to repositories (e.g., using [GitHub](https://github.com/))

]

.pull-right5[

<p align="center"><img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/Projects.png" height="360"></p>

]

--->

---

# Packages

.pull-left35[

R features a vast and cutting-edge collection of **packages** provided on [**CRAN**](https://cran.r-project.org/) and [**Git/GitHub**](https://github.com/hadley) by R's large and highly active user base.

```r
# To install a package (once per machine)
install.packages('package_name')

# To load a package (once per session)
library(package_name)
require(package_name)

# Note:
# Don't forget that packages
# must also be loaded.
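
# Sketch (an addition, not from the original slide):
# call a function without loading its package via `::`.
# 'stats' is only a stand-in; it ships with every R installation.
stats::median(c(1, 5, 2))

# Check whether a package is installed before loading it
has_stats <- requireNamespace("stats", quietly = TRUE)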
```

]

.pull-right55[

<iframe src="https://www.rdocumentation.org/trends?page1=1&sort1=total&page2=1&sort2=total&page3=1&page4=1" width="600" height="400" frameborder="0" marginheight="0" marginwidth="0" zoom="0.5"></iframe>

<font size=2><a href="https://www.rdocumentation.org/trends">https://www.rdocumentation.org/trends</a></font>

]

---

# The almighty **tidyverse**

Among its many packages, R now offers a collection of high-performance, user-friendly packages (libraries) known as the [tidyverse](https://www.tidyverse.org/). The tidyverse includes:

1. `ggplot2` -- creating graphics.
2. `dplyr` -- data manipulation.
3. `tidyr` -- tidying data.
4. `readr` -- reading rectangular data (e.g., CSV files).
5. `purrr` -- functional programming.
6. `tibble` -- modern data frame.

<br><br>

<img src="http://d33wubrfki0l68.cloudfront.net/0ab849ed51b0b866ef6895c253d3899f4926d397/dbf0f/images/hex-ggplot2.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/071952491ec4a6a532a3f70ecfa2507af4d341f9/c167c/images/hex-dplyr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/5f8c22ec53a1ac61684f3e8d59c623d09227d6b9/b15de/images/hex-tidyr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/66d3133b4a19949d0b9ddb95fc48da074b69fb07/7dfb6/images/hex-readr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/9221ddead578362bd17bafae5b85935334984429/37a68/images/hex-purrr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/f55c43407ae8944b985e2547fe868e5e2b3f9621/720bb/images/hex-tibble.png" height="200px" />

---

# RStudio: R's favorite environment

Besides its many useful packages, R users greatly benefit from R's integrated development environment [**RStudio**](https://www.rstudio.com/). RStudio is a **graphical user interface** that allows you to (a) edit code, (b) run code, (c) access files and history, and (d) create plots.
RStudio also helps you with **project management**, **version control** via [GitHub](https://github.com/), writing **reports** using [markdown](http://rmarkdown.rstudio.com/authoring_basics.html) and [knitr](https://yihui.name/knitr/), and many other aspects of working with R.

<p align="center"><img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/rstudio_plus.png" height="360"></p>

---

# Essentials of the R language

.pull-left5[

>"To understand computations in R, two slogans are helpful: <br><br>
>###(1) Everything that exists is an object and
>###(2) everything that happens is a function call."

]

.pull-right5[

<p align="center"><img src="https://statweb.stanford.edu/~jmc4/CopyPhoto.jpg" width="350" align="center"></p>
<p align="center">John Chambers<br>
<font size=2> Author of S and developer of R</font><br>
<font size=2><a href="https://statweb.stanford.edu/~jmc4/">statweb.stanford.edu</a></font>
</p>

]

---

# Calls, assignments, and expressions

R programs evolve through functions: Every action entails **passing on arguments (aka objects) to a function**, **calling (aka executing) the function**, and **printing or storing its output**. And this goes deep: many operations are functions in disguise. For example, `2 + 2` is itself a call to the function `+`.

.pull-left45[

```r
# a prototypical function
some_function <- function(argument_1, argument_2){

  # do something with arguments
  result <- argument_1 + argument_2

  # return result
  result
}
```

]

.pull-right45[

```r
# Expressions
some_function(2, 2)
```

```
## [1] 4
```

```r
2 + 2
```

```
## [1] 4
```

```r
# Assignments
my_object1 <- some_function(2, 2)
my_object2 <- 2 + 2
```

]

<!---

# Object-orientation

R is an object-oriented language.
For us this entails:<br>

>(a) **Everything is an object** (including functions).<br>
>(b) **Generic functions** that respond to the **object's class**.<br>
>(c) R **behaves as if it always copies**, implying that we always need to assign a function's output to keep the result.<br>

.pull-left45[

```r
# creating a vector and testing its class
my_vector <- c(1, 5, 2)
class(my_vector)
```

```
## [1] "numeric"
```

```r
# printing the object
print(my_vector)
```

```
## [1] 1 5 2
```

]

.pull-right45[

```r
# Sorting a vector
sort(my_vector)
```

```
## [1] 1 2 5
```

```r
print(my_vector) # still unsorted
```

```
## [1] 1 5 2
```

]

--->

---

# Syntax style

Programming languages have different expressive styles. R uses...

.pull-left35[

+ **Comment** symbol `#` <br2>
+ **Quotations** with either `" "` or `' '` <br2>
+ Curly brackets `{}` **enclose expressions** explicitly <br2>
+ Parentheses `()` **call functions** <br2>
+ Semicolon `;` **separates expressions** <br2>
+ `<`,`>`,`|`,`&`,`==`, `!=` define **logical statements**

]

.pull-right55[

```r
# This is a comment

# Quotes are used to define strings
"a" == 'a'

# Expressions and calls
my_fun <- function(x, y){ x + y }
my_fun(1, 2)

# two expressions in one line
2 + 2 ; 3 + 3

# are these equal/different
2 == 2 ; 2 != 2
```

]

---

# Help

.pull-left5[

R provides extremely helpful **help files** and **vignettes**.

>**Help files** are required documentation for every R function and package published on [**CRAN**](https://cran.r-project.org/).<br>

>**Vignettes** are long tutorials sometimes provided by the authors of a package.
```r
# To access help files
help("name_of_function")
?name_of_function

# To search all help files
??name_of_function

# To list and access vignettes
vignette(package="name_of_package")
vignette("name_of_vignette")
```

]

.pull-right45[

<p align="center"><img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/mean_help.png" width="500"></p>

<!--- <br><br><iframe src="http://stat.ethz.ch/R-manual/R-devel/library/base/html/mean.html" height="510" width="800" align="center"></iframe> --->

]

<!---

## The workflow of R

<p align="center"><img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/rstudio_workings.png" height="500" align="center"></p>

--->

---

# Datasets Day 1

<font size="6" color="#F62D73"><a href="https://github.com/therbootcamp/BaselRBootcamp_2018April/blob/master/_sessions/_data/data_BaselRBoocamp_Day1.zip?raw=true">download link</a></font>

---

# Interactive session

<p><font size=6>Open up **RStudio**...</font>

<p><font size=6><b><a href="https://therbootcamp.github.io/BaselRBootcamp_2018April/_sessions/D1S1_WhatIsR/What_is_R_practical.html">Link to practical</a>