class: center, middle, inverse, title-slide # Recap II ### The R Bootcamp
Twitter:
@therbootcamp
### April 2018 --- # Essentials of the R language .pull-left5[ >"To understand computations in R, two slogans are helpful: <br><br> >###(1) Everything that exists is an object and >###(2) everything that happens is a function call." ] .pull-right5[ <p align="center"><img src="https://statweb.stanford.edu/~jmc4/CopyPhoto.jpg" width="350" align="center"></p> <p align="center">John Chambers<br> <font size=2> Author of S and developer of R</font><br> <font size=2><a href="https://statweb.stanford.edu/~jmc4/">statweb.stanford.edu</a> </p> ] --- .pull-left45[ # Objects >###"Everything in R is an object"<br> ><p align="right">*John Chambers*<p> <br><br> + R's objects are have **content and attributes**. <br2> + The **content can be anything** from numbers or strings to functions or complex data structures. <br2> + Attributes often encompass **names**, **dimensions**, and the **class or type** of the object, but other attributes are possible. <br2> + Practically all data objects are equipped with those **three essential attributes**. ] .pull-right5[ <br><br> <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/objects.png" align="center" width="579" height="560"> ] --- # Vectors R's most **basic (and most simple) data format** - even single values (aka **scalars**) are implemented as vectors. .pull-left45[ ```r # creating a vector (incl. names) my_vec <- c(t_1 = 1.343, t_2 = 5.232) # naming vectors my_vec <- c(t_1 = 1.343, t_2 = 5.232) names(my_vec) <- c("new_1","new_2") # evaluting inherent attributes names(my_vec) length(my_vec) typeof(my_vec) ``` ] .pull-right5[ <p align="left"><br><img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/vector.png" width="628px" height="190px"></p> ] --- # Types Vectors contains elements of only one type. Most often one of the four **basic types**: `integer`, `double`, `numeric`, and `character`. You can **test** the type using `typeof()` or the type-specific `is.*()`, e.g., `is.integer()`. .pull-left45[ ```r # numeric vectors my_vec <- c(1.343, 5.232) typeof(my_vec) ``` ``` ## [1] "double" ``` ```r # integer vectors (L avoids coercion) my_vec <- c(1L, 7L, 2L) typeof(my_vec) ``` ``` ## [1] "integer" ``` ] .pull-right45[ ```r # logical vectors my_vec <- c(TRUE, FALSE) typeof(my_vec) ``` ``` ## [1] "logical" ``` ```r # character vectors my_vec <- c('a', 'hello', 'world') typeof(my_vec) ``` ``` ## [1] "character" ``` ] --- # `data_frame`s .pull-left45[ * Data frames (and its variants, e.g., **tibble**s) are R's **main data format**. * **Data frames are lists** with specific **requirements**: + Every element must be a vector. + The lengths of the vectors must be equal (or multiples of another). * Use `data_frame()` and `as_data_frame` to create or to coerce to data frame, to `tibble` to be exact. ] .pull-right45[ <p align="center"><img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/data.frame.png" width="409px" height="266px"></p> ] --- # Accessing & changing data frames (and lists) Data frames (and lists) are best accessed using **names** and the `$`-operator. This, of course, implies that you followed good practice and named the individual elements in the data object. ```r # define data frame my_df <- data_frame('v_1' = c('A', 'B'), 'v_2' = c(1, 2)) ``` .pull-left45[ ```r # One bad, two correct ways to subset my_df[1] ; my_df[[1]] ; my_df[['v_1']] ``` ``` ## # A tibble: 2 x 1 ## v_1 ## <chr> ## 1 A ## 2 B ``` ``` ## [1] "A" "B" ``` ``` ## [1] "A" "B" ``` ] .pull-right45[ ```r # Best use $-operator to access my_df$v_1 ``` ``` ## [1] "A" "B" ``` ```r # and change my_df$v_1 <- c('Y', 'Z') my_df$v_1 ``` ``` ## [1] "Y" "Z" ``` ] <!--- .pull-left45[ ```r # retrieve element 'a' from vector my_vec <- c(a = 1, b = 4, c = 5) my_vec['a'] ``` ``` ## a ## 1 ``` ```r # change the element my_vec['c'] <- 'D' # change beyond length(my_vec) my_vec['d'] <- 1 ; my_vec ``` ``` ## a b c d ## "1" "4" "D" "1" ``` ] .pull-right45[ ```r # create matrix my_mat <- matrix(c(1:6), nrow=2) colnames(my_mat) <- c('v_1','v_2','v_3') rownames(my_mat) <- c('c_1','c_2') # retrieve second row from matrix my_mat['c_1', ] ; my_mat['c_1', 'v_2'] ``` ``` ## v_1 v_2 v_3 ## 1 3 5 ``` ``` ## [1] 3 ``` ] # Accessing & changing **complex** objects pt. 1 In accessing and changing complex objects the additional `list`-layer needs to be taken into account. Single brackets `[` will select elements within the list, not the object behind those elements. To select the object behind the element use **double brackets** '[['. Additionaly, complex objects can be conveniently accessed using the **dollor operator `$`**. In order to further descend into the `list`'s structur append **multiple select operators**, e.g., `my_list[[1]][[2]]`. .pull-left45[ ```r # retrieve elements from list my_list <- list('A'=c('A','B'), 'B'=list(c(1,2,3), c(TRUE,FALSE,TRUE))) my_list[1] ; my_list[[1]] ; my_list[['A']] ``` ``` ## $A ## [1] "A" "B" ``` ``` ## [1] "A" "B" ``` ``` ## [1] "A" "B" ``` ] .pull-right45[ ```r # retrieve deep elements in list my_list <- list('A'=c('A','B'), 'B'=list(c(1,2,3), c(TRUE,FALSE,TRUE))) my_list[[2]][1] ; my_list[[2]][[1]] # etc ``` ``` ## [[1]] ## [1] 1 2 3 ``` ``` ## [1] 1 2 3 ``` ] # Accessing & changing **complex** objects pt. 2 Data frames can be accessed **exactly like lists**. In addition, data frames allow for a matrix-like access using **single bracket** `[`. Note however that selecting rows using single bracket returns a data frame, whereas for selecting columns returns a vector. .pull-left45[ ```r # retrieve elements from list my_df <- data.frame('v_1'=c('A','B','C'), 'v_2'=c(1,2,3)) my_df[1] ; my_df[[1]] ; my_df[['v_1']] ``` ``` ## v_1 ## 1 A ## 2 B ## 3 C ``` ``` ## [1] A B C ## Levels: A B C ``` ``` ## [1] A B C ## Levels: A B C ``` ] .pull-right45[ ```r # retrieve elements from list my_df <- data.frame('v_1'=c('A','B','C'), 'v_2'=c(1,2,3)) my_df[1,] ; my_df[,1] ; my_df[1,2] ``` ``` ## v_1 v_2 ## 1 A 1 ``` ``` ## [1] A B C ## Levels: A B C ``` ``` ## [1] 1 ``` ] ---> --- # Example ```r c(22, 45, 32, 18, 19, 24) age <- c(22, 45, 32, 18, 19, 24) as.character(age) age <- as.character(age) mean(age) ``` --- # Functions Functions are objects that conduct operations on objects using objects. Functions have 3 elements: + **Name**: Used to call (execute) the function. + **Argument(s)**: Objects that the function needs to do its job. May have *default arguments*. + **Expression**: R code that does the job. .pull-left5[ ```r # Defining a function that computes # the mean or median my_stat <- function(x, method = 'mean'){ # detect and run method if(method == "mean") return(mean(x)) if(method == "median") return(median(x)) } # Define object my_vec <- c(1, 4, 6, 3, 7, 5, 12, 9) ``` ] .pull-right45[ ```r # Runnning our functions mean(x = my_vec) ``` ``` ## [1] 5.875 ``` ```r my_stat(x = my_vec, method = 'mean') ``` ``` ## [1] 5.875 ``` ```r my_stat(x = my_vec, method = 'median') ``` ``` ## [1] 5.5 ``` ] --- # Help .pull-left5[ **help files** (and **vignettes**) are very useful. Pay attention to... - `Usage` - shows function's use, its arguments and their defaults. - `Arguments` - explains arguments, and their `type`/`class` - `Value` - explains what the function returns - `Examples` - ```r # To access help files ?name_of_function # search help files ??name_of_function ``` ] .pull-right45[ <p align="center"><img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/mean_help.png" width="500"></p> <!--- <br><br><iframe src="http://stat.ethz.ch/R-manual/R-devel/library/base/html/mean.html" height="510" width="800" align="center"></iframe> ---> ] --- # The almighty **tidyverse** Among its many packages, R contains a collection of high-performance, easy-to-use packages (libraries) designed specifically for handling data know as the [tidyverse](https://www.tidyverse.org/). The tidyverse includes: 1. `ggplot2` -- creating graphics. 2. `dplyr` -- data manipulation. 3. `tidyr` -- tidying data. 4. `readr` -- read wild data. 5. `purrr` -- functional programming. 6. `tibble` -- modern data frame. <br><br> <img src="http://d33wubrfki0l68.cloudfront.net/0ab849ed51b0b866ef6895c253d3899f4926d397/dbf0f/images/hex-ggplot2.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/071952491ec4a6a532a3f70ecfa2507af4d341f9/c167c/images/hex-dplyr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/5f8c22ec53a1ac61684f3e8d59c623d09227d6b9/b15de/images/hex-tidyr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/66d3133b4a19949d0b9ddb95fc48da074b69fb07/7dfb6/images/hex-readr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/9221ddead578362bd17bafae5b85935334984429/37a68/images/hex-purrr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/f55c43407ae8944b985e2547fe868e5e2b3f9621/720bb/images/hex-tibble.png" height="200px" /> --- .pull-left4[ # How do we wrangle data in R? <font size = 5>Answer: dplyr</font> Anytime you want to transform, organize, or aggregate data, use `dplyr` <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/dplyr_hex.png" width="50%" style="display: block; margin: auto;" /> ] .pull-right5[ <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/organise_transform_aggregate.png" width="100%" style="display: block; margin: auto;" /> ] --- .pull-left35[ # dplyr `dplyr` is a combination of 3 things: 1. **`objects`** like dataframes 2. **`functions`** that **do** things to objects. 3. **`pipes`** `%>%` that string together objects and verbs <br> <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/pipe.jpg" width="70%" style="display: block; margin: auto;" /> ] .pull-right6[ <br><br><br><br> ## The pipe %>% <br2> `dplyr` makes extensive use of the 'pipe' `%>%` which passes objects between functions. <br><br> ```r data %>% # Start with data, AND THEN... FUN1 %>% # Do FUN1, AND THEN... FUN2 %>% # Do FUN2, AND THEN... FUN3 %>% # Do FUN3, AND THEN... group_by(x, y) %>% # Group by variables x, y summarise( VAR_A_New = FUN4(X), VAR_B_New = FUN5(Y), VAR_C_New = FUN6(Z), ) ) ``` ] --- ## There are tons of statistical packages in R! .pull-left35[ <br><br> | Package| Models| |------:|:----| | `afex`| Anovas| | `rpart`| Decision Trees| | `lme4`| Mixed effects regression| | `BayesFactor`| Bayesian statistics| | `igraph`| Cluster analyses| | `neuralnet`| Neural networks| ] .pull-right6[ <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/statistical_procedures.png" width="100%" style="display: block; margin: auto;" /> ] --- # Statistics .pull-left5[ | Test| R Function| |:------|:----| | `formula`|The dependent variable and one or more independent variables| | `data`| The dataframe containing variables| | `subset` | Optional subset of data to test | ```r trial_X ``` ``` ## id sex age arm y_primary y_secondary ## 1 1 m 35 1 50 1 ## 2 2 f 42 2 78 1 ## 3 3 f 24 1 46 0 ## 4 4 m 56 2 97 1 ## 5 5 f 49 1 74 1 ``` ] .pull-right45[ ```r TEST(formula = Y ~ X1 + X2 + ..., data = DATA, subset = SUBSET, ...) ``` ### Examples ```r # T-test # DV = y_primary, IV = arm t.test(formula = y_primary ~ arm, data = trial_X) # Regression # DV = y_secondary, IV = arm, sex, age # Subset = only id > 50 glm(formula = y_secondary ~ arm + sex + age, data = trial_X, subset = id > 50) ``` ] --- .pull-left35[ <br><br> ## Exploring statistical objects Assign the results of a statistical function to a new object, then use *generic* functions such as `print()`, `summary()`, `predict()` and `plot()` to explore it. ```r # Create statistical object my_obj <- FUN(formula = ..., data = ...) names(my_obj) # Elements print(my_obj) # Print summary(my_obj) # Summary plot(my_obj) # Plotting predict(my_obj, ..) # Predict ``` ] .pull-right6[ <br><br> ```r # Create a glm object chick_glm <- glm(formula = weight ~ Time, data = ChickWeight) # Print the object names(chick_glm) ``` ``` ## [1] "coefficients" "residuals" "fitted.values" "effects" "R" ## [6] "rank" "qr" "family" "linear.predictors" "deviance" ## [11] "aic" "null.deviance" "iter" "weights" "prior.weights" ## [16] "df.residual" "df.null" "y" "converged" "boundary" ## [21] "model" "call" "formula" "terms" "data" ## [26] "offset" "control" "method" "contrasts" "xlevels" ``` ```r ## Access elements chick_glm$coefficients ``` ``` ## (Intercept) Time ## 27.467 8.803 ``` ] --- # What is machine learning? .pull-left6[ ### Algorithms autonomously learning from data. Given data, an algorithm tunes its *parameters* to match the data, understand how it works, and make predictions for what will occur in the future. <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/mldiagram_A.png" width="80%" style="display: block; margin: auto;" /> ] .pull-right4[ <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/machinelearningcartoon.png" width="70%" style="display: block; margin: auto;" /> ] --- # How do I do machine learning in R? .pull-left6[ In principle, it's a lot like evaluating statistical functions. Install the necessary package, evaluate the model function, explore the results. ```r # Create training and test data data_train <- ... data_test <- ... # Train models on training data model_A <- A_fun(formula = y ~ ., data = data_train) # Model A predictions pred_A <- predict(model_A, newdata = data_test) # Calculate Model A error pred_err_A <- mean(abs(pred_A - data_test$y)) # Compare to Models B, C, D... ``` ] .pull-right35[ If you're really into machine learning, the `caret` package can automate much of the the machine learning process. <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/mlrcaret.png" width="95%" style="display: block; margin: auto;" /> ] --- # Today <p><font size=6><b><a href="https://therbootcamp.github.io/BaselRBootcamp_2018April/schedule">Schedule</a>