class: center, middle, inverse, title-slide

# Recap I
### The R Bootcamp
Twitter: @therbootcamp
### April 2018

---

# Essentials of the R language

.pull-left5[

>"To understand computations in R, two slogans are helpful: <br><br>
>###(1) Everything that exists is an object and
>###(2) everything that happens is a function call."

]

.pull-right5[

<p align="center"><img src="https://statweb.stanford.edu/~jmc4/CopyPhoto.jpg" width="350" align="center"></p>
<p align="center">John Chambers<br> <font size=2> Author of S and developer of R</font><br> <font size=2><a href="https://statweb.stanford.edu/~jmc4/">statweb.stanford.edu</a></font> </p>

]

---

.pull-left45[

# Objects

>###"Everything in R is an object"<br>
><p align="right">*John Chambers*</p>

<br><br>

+ R's objects have **content and attributes**. <br2>
+ The **content can be anything** from numbers or strings to functions or complex data structures. <br2>
+ Attributes often encompass **names**, **dimensions**, and the **class or type** of the object, but other attributes are possible. <br2>
+ Practically all data objects are equipped with those **three essential attributes**.

]

.pull-right5[

<br><br>
<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/objects.png" align="center" width="579" height="560">

]

---

# Accessing & changing vectors

To access (aka **subsetting** or **slicing**) and change atomic data objects, use **brackets** `[]` and provide either **integers**, **logicals**, or **names** to indicate the relevant vector content. To change content, assign new content of matching size to the subset using `<-`.
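Access by **name** works the same way as access by integer or logical. A minimal sketch, assuming a made-up named vector `scores` (the object and its values are hypothetical and only serve as illustration):

```r
# Hypothetical named vector, used for illustration only
scores <- c(anna = 12, ben = 7, cleo = 9)

# access a single element by name
scores["ben"]

# change an element by name
scores["ben"] <- 8

# access several elements by name
scores[c("anna", "cleo")]
```

Names are matched exactly, so `scores["Ben"]` would return `NA` rather than the value stored under `ben`.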
.pull-left45[

```r
# retrieve second element from vector
my_vec <- c('A', 'B', 'C')
my_vec[2]
```

```
## [1] "B"
```

```r
# change the second element
my_vec[2] <- 'D'
my_vec
```

```
## [1] "A" "D" "C"
```

]

.pull-right45[

```r
# Use logical comparison to access vector
my_vec[my_vec != 'A']
```

```
## [1] "D" "C"
```

```r
# Change vector using logical comparison
my_vec[my_vec != 'A'] <- c('E', 'F')
my_vec
```

```
## [1] "A" "E" "F"
```

]

---

# Functions

Functions are objects that take objects as input and carry out operations on them. Functions have 3 elements:

+ **Name**: Used to call (execute) the function.
+ **Argument(s)**: Objects that the function needs to do its job. May have *default arguments*.
+ **Expression**: R code that does the job.

.pull-left5[

```r
# Defining a function that computes
# the mean or median
my_stat <- function(x, method = 'mean'){

  # detect and run method
  if(method == "mean") return(mean(x))
  if(method == "median") return(median(x))

}

# Define object
my_vec <- c(1, 4, 6, 3, 7, 5, 12, 9)
```

]

.pull-right45[

```r
# Running our function
mean(x = my_vec)
```

```
## [1] 5.875
```

```r
my_stat(x = my_vec, method = 'mean')
```

```
## [1] 5.875
```

```r
my_stat(x = my_vec, method = 'median')
```

```
## [1] 5.5
```

]

---

# Example

```r
# a vector that is not stored anywhere
c(22, 45, 32, 18, 19, 24)

# store the vector as an object
age <- c(22, 45, 32, 18, 19, 24)

# conversion without assignment leaves age unchanged
as.character(age)

# overwrite age with its character version
age <- as.character(age)

# mean() of a character vector returns NA with a warning
mean(age)
```

---

# Help

.pull-left5[

**Help files** (and **vignettes**) are very useful. Pay attention to...

- `Usage` - shows the function's use, its arguments and their defaults.
- `Arguments` - explains the arguments and their `type`/`class`
- `Value` - explains what the function returns
- `Examples` - shows runnable example code

```r
# To access help files
?name_of_function

# search help files
??name_of_function
```

]

.pull-right45[

<p align="center"><img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/mean_help.png" width="500"></p>

<!--- <br><br><iframe src="http://stat.ethz.ch/R-manual/R-devel/library/base/html/mean.html" height="510" width="800" align="center"></iframe> --->

]

---

# Importing and Exporting Data

In this session you will learn...

.pull-left45[

1. How to import data from **delimiter-separated files** (e.g., .csv). <br2>
2. How to import data from **proprietary file formats** (e.g., .sav). <br2>
3. How to save/export data to various formats, including **R's own file types**. <br2>
4. How to use **file connections** to read data in its rawest possible way. <br2>
5. About a new data format called **tibble**.

]

.pull-right45[

<img src="http://d33wubrfki0l68.cloudfront.net/66d3133b4a19949d0b9ddb95fc48da074b69fb07/7dfb6/images/hex-readr.png" width="150">
<img src="http://d33wubrfki0l68.cloudfront.net/f55c43407ae8944b985e2547fe868e5e2b3f9621/720bb/images/hex-tibble.png" width="150">
<br>
<img src="http://haven.tidyverse.org/logo.png" width="150">
<img src="https://www.rstudio.com/wp-content/uploads/2017/05/readxl-259x300.png" width="150">

]

---

# An example

Assume we have a *flat* data set with variables `id`, `var_1`, and `var_2` and cases as rows. Such data can be conveniently read in using `read_csv()`. Moreover, `read_csv()` will **automatically identify** (a) columns and rows, (b) column names, and (c) the column types, and finally return a `tibble` (more on that later).
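The automatic detection can be tried out without any file on disk. The following is a minimal sketch, assuming `readr` is installed; it writes a small comma-separated file (values borrowed from this slide's example) to a temporary location and reads it back. The object names `tmp` and `my_data` are chosen for illustration only.

```r
library(readr)

# Illustration only: write a small comma-separated file to a temporary path
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,var_1,var_2",
             "DCDL,0.287,0.048",
             "FEFK,0.894,0.383",
             "ZEWE,1.374,0.623",
             "OJEE,0.631,0.826"), tmp)

# read_csv() takes column names from the first row and guesses column types
my_data <- read_csv(tmp)

# inspect the guessed class of each column
sapply(my_data, class)
```

If a guess turns out wrong, the `col_types` argument of `read_csv()` can be used to set the column types explicitly.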
.pull-left45[

```r
# This is how a text file may look
# on your hard-drive
id\tvar_1\tvar_2\n
DCDL\t.287\t.048\n
FEFK\t.894\t.383\n
ZEWE\t1.374\t.623\n
OJEE\t.631\t.826
```

]

.pull-right45[

```r
# read in data (-> tibble)
require(readr)

# the file is tab-separated, so state the delimiter
read_delim("data/my_dataset.csv", delim = "\t")
```

```
## Loading required package: readr
```

```
## # A tibble: 4 x 3
##   id    var_1  var_2
##   <chr> <dbl>  <dbl>
## 1 DCDL  0.287 0.0480
## 2 FEFK  0.894 0.383
## 3 ZEWE  1.37  0.623
## 4 OJEE  0.631 0.826
```

]

---

# `tibble`s

Most `tidyverse` read functions, such as `read_csv()`, return a `tibble`, which is also the preferred data format for many (but not all) analyses. `tibble`s are a **modern, leaner version of data.frames**.

.pull-left45[

### tibbles ...

+ never change the input's type -> no factors. <br2>
+ never add row names. <br2>
+ never change column names. <br2>
+ look better in `print`. <br2>
+ are accessed more consistently.

]

.pull-right45[

### Functions

```r
library(tibble)

# create tibble (assumes vectors id, var_1,
# and var_2 already exist)
my_data <- tibble(id, var_1, var_2)

# convert to and from data.frame
as_tibble(my_data_frame)
as.data.frame(my_tibble)
```

```
## # A tibble: 4 x 3
##   id    var_1  var_2
##   <chr> <dbl>  <dbl>
## 1 DCDL  0.287 0.0480
## 2 FEFK  0.894 0.383
## 3 ZEWE  1.37  0.623
## 4 OJEE  0.631 0.826
```

]

---

# The almighty **tidyverse**

Among its many packages, R offers a collection of high-performance, easy-to-use packages (libraries) designed specifically for handling data, known as the [tidyverse](https://www.tidyverse.org/).

The tidyverse includes:

1. `ggplot2` -- creating graphics.
2. `dplyr` -- data manipulation.
3. `tidyr` -- tidying data.
4. `readr` -- reading data from files.
5. `purrr` -- functional programming.
6. `tibble` -- modern data frame.
<br><br> <img src="http://d33wubrfki0l68.cloudfront.net/0ab849ed51b0b866ef6895c253d3899f4926d397/dbf0f/images/hex-ggplot2.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/071952491ec4a6a532a3f70ecfa2507af4d341f9/c167c/images/hex-dplyr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/5f8c22ec53a1ac61684f3e8d59c623d09227d6b9/b15de/images/hex-tidyr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/66d3133b4a19949d0b9ddb95fc48da074b69fb07/7dfb6/images/hex-readr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/9221ddead578362bd17bafae5b85935334984429/37a68/images/hex-purrr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/f55c43407ae8944b985e2547fe868e5e2b3f9621/720bb/images/hex-tibble.png" height="200px" />