Recap II

class: center, middle, inverse, title-slide

# Recap II
### The R Bootcamp Twitter: <a href='https://twitter.com/therbootcamp'>@therbootcamp</a>
### April 2018

---

# Essentials of the R language

.pull-left5[
>"To understand computations in R, two slogans are helpful:
 
>###(1) Everything that exists is an object and 
>###(2) everything that happens is a function call."
]

.pull-right5[
<img src="https://statweb.stanford.edu/~jmc4/CopyPhoto.jpg" width="350" align="center">
John Chambers Author of S and developer of R 
<a href="https://statweb.stanford.edu/~jmc4/">statweb.stanford.edu</a>

]

---

.pull-left45[
# Objects
>###"Everything in R is an object" 
>*John Chambers*

+ R's objects are have **content and attributes**.
<br2>
+ The **content can be anything** from numbers or strings to functions or complex data structures. 
<br2>
+ Attributes often encompass **names**, **dimensions**, and the **class or type** of the object, but other attributes are possible. 
<br2>
+ Practically all data objects are equipped with those **three essential attributes**.

]

.pull-right5[
 
<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/objects.png" align="center" width="579" height="560">
]

---

# Vectors

R's most **basic (and most simple) data format** - even single values (aka **scalars**) are implemented as vectors.

.pull-left45[

```r
# creating a vector (incl. names)
my_vec <- c(t_1 = 1.343, t_2 = 5.232)

# naming vectors
my_vec <- c(t_1 = 1.343, t_2 = 5.232)
names(my_vec) <- c("new_1","new_2")

# evaluting inherent attributes
names(my_vec)
length(my_vec)
typeof(my_vec)
```
]

.pull-right5[
 <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/vector.png" width="628px" height="190px">
]

---

# Types

Vectors contains elements of only one type. Most often one of the four **basic types**: `integer`, `double`, `numeric`, and `character`. You can **test** the type using `typeof()` or the type-specific `is.*()`, e.g., `is.integer()`.

.pull-left45[

```r
# numeric vectors
my_vec <- c(1.343, 5.232)
typeof(my_vec)
```

```
## [1] "double"
```

```r
# integer vectors (L avoids coercion)
my_vec <- c(1L, 7L, 2L)
typeof(my_vec)
```

```
## [1] "integer"
```
]

.pull-right45[

```r
# logical vectors
my_vec <- c(TRUE, FALSE)
typeof(my_vec)
```

```
## [1] "logical"
```

```r
# character vectors
my_vec <- c('a', 'hello', 'world')
typeof(my_vec)
```

```
## [1] "character"
```
]

---

# `data_frame`s

.pull-left45[

* Data frames (and its variants, e.g., **tibble**s) are R's **main data format**.

* **Data frames are lists** with specific **requirements**:

+ Every element must be a vector.

+ The lengths of the vectors must be equal (or multiples of another).

* Use `data_frame()` and `as_data_frame` to create or to coerce to data frame, to `tibble` to be exact.
]

.pull-right45[
<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/data.frame.png" width="409px" height="266px">
]

---

# Accessing & changing data frames (and lists)

Data frames (and lists) are best accessed using **names** and the `$`-operator. This, of course, implies that you followed good practice and named the individual elements in the data object.

```r
# define data frame
my_df <- data_frame('v_1' = c('A', 'B'), 'v_2' = c(1, 2))
```

.pull-left45[

```r
# One bad, two correct ways to subset
my_df[1] ; my_df[[1]] ; my_df[['v_1']]
```

```
## # A tibble: 2 x 1
## v_1 
## <chr>
## 1 A 
## 2 B
```

```
## [1] "A" "B"
```

```
## [1] "A" "B"
```
]

.pull-right45[

```r
# Best use $-operator to access
my_df$v_1
```

```
## [1] "A" "B"
```

```r
# and change
my_df$v_1 <- c('Y', 'Z')
my_df$v_1
```

```
## [1] "Y" "Z"
```
]

<!---

.pull-left45[

```r
# retrieve element 'a' from vector
my_vec <- c(a = 1, b = 4, c = 5)
my_vec['a'] 
```

```
## a 
## 1
```

```r
# change the element 
my_vec['c'] <- 'D' 
 
# change beyond length(my_vec) 
my_vec['d'] <- 1 ; my_vec
```

```
##   a   b   c   d 
## "1" "4" "D" "1"
```

]

.pull-right45[

```r
# create matrix
my_mat <- matrix(c(1:6), nrow=2)
colnames(my_mat) <- c('v_1','v_2','v_3')
rownames(my_mat) <- c('c_1','c_2')

# retrieve second row from matrix
my_mat['c_1', ] ; my_mat['c_1', 'v_2']
```

```
## v_1 v_2 v_3 
##   1   3   5
```

```
## [1] 3
```
]

# Accessing & changing **complex** objects pt. 1

In accessing and changing complex objects the additional `list`-layer needs to be taken into account. Single brackets `[` will select elements within the list, not the object behind those elements. To select the object behind the element use **double brackets** '[['. Additionaly, complex objects can be conveniently accessed using the **dollor operator `$`**. In order to further descend into the `list`'s structur append **multiple select operators**, e.g., `my_list[[1]][[2]]`.

.pull-left45[

```r
# retrieve elements from list
my_list <- list('A'=c('A','B'), 
 'B'=list(c(1,2,3),
 c(TRUE,FALSE,TRUE)))
my_list[1] ; my_list[[1]] ; my_list[['A']]
```

```
## $A
## [1] "A" "B"
```

```
## [1] "A" "B"
```

]

.pull-right45[

```r
# retrieve deep elements in list
my_list <- list('A'=c('A','B'), 
 'B'=list(c(1,2,3),
 c(TRUE,FALSE,TRUE)))
my_list[[2]][1] ; my_list[[2]][[1]] # etc
```

```
## [[1]]
## [1] 1 2 3
```

```
## [1] 1 2 3
```
]

# Accessing & changing **complex** objects pt. 2

Data frames can be accessed **exactly like lists**. In addition, data frames allow for a matrix-like access using **single bracket** `[`. Note however that selecting rows using single bracket returns a data frame, whereas for selecting columns returns a vector.

.pull-left45[

```r
# retrieve elements from list
my_df <- data.frame('v_1'=c('A','B','C'), 
 'v_2'=c(1,2,3))
my_df[1] ; my_df[[1]] ; my_df[['v_1']]
```

```
##   v_1
## 1   A
## 2   B
## 3   C
```

```
## [1] A B C
## Levels: A B C
```

```
## [1] A B C
## Levels: A B C
```
]

.pull-right45[

```r
# retrieve elements from list
my_df <- data.frame('v_1'=c('A','B','C'), 
 'v_2'=c(1,2,3))
my_df[1,] ; my_df[,1] ; my_df[1,2]
```

```
##   v_1 v_2
## 1   A   1
```

```
## [1] A B C
## Levels: A B C
```

```
## [1] 1
```
]

--->

---

# Example

```r
c(22, 45, 32, 18, 19, 24)
age <- c(22, 45, 32, 18, 19, 24)

as.character(age)
age <- as.character(age)

mean(age)
```

---

# Functions

Functions are objects that conduct operations on objects using objects. Functions have 3 elements:
+ **Name**: Used to call (execute) the function.
+ **Argument(s)**: Objects that the function needs to do its job. May have *default arguments*.
+ **Expression**: R code that does the job.
    
.pull-left5[

```r
# Defining a function that computes 
# the mean or median
my_stat <- function(x, method = 'mean'){
 
 # detect and run method
 if(method == "mean") return(mean(x))
 if(method == "median") return(median(x))
}

# Define object
my_vec <- c(1, 4, 6, 3, 7, 5, 12, 9)
```
]

.pull-right45[

```r
# Runnning our functions
mean(x = my_vec)
```

```
## [1] 5.875
```

```r
my_stat(x = my_vec, method = 'mean')
```

```
## [1] 5.875
```

```r
my_stat(x = my_vec, method = 'median')
```

```
## [1] 5.5
```
]

---

# Help

.pull-left5[

**help files** (and **vignettes**) are very useful.

Pay attention to...

- `Usage` - shows function's use, its arguments and their defaults.
- `Arguments` - explains arguments, and their `type`/`class`
- `Value` - explains what the function returns
- `Examples` -

```r
# To access help files
?name_of_function

# search help files
??name_of_function
```

]

.pull-right45[
<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/mean_help.png" width="500">

]

---

# The almighty **tidyverse**

Among its many packages, R contains a collection of high-performance, easy-to-use packages (libraries) designed specifically for handling data know as the [tidyverse](https://www.tidyverse.org/). The tidyverse includes:
1. `ggplot2` -- creating graphics.
2. `dplyr` -- data manipulation.
3. `tidyr` -- tidying data.
4. `readr` -- read wild data.
5. `purrr` -- functional programming.
6. `tibble` -- modern data frame.

---

.pull-left4[
# How do we wrangle data in R?

Answer: dplyr

Anytime you want to transform, organize, or aggregate data, use `dplyr`

]

.pull-right5[

]

---

.pull-left35[
# dplyr

`dplyr` is a combination of 3 things:

1. **`objects`** like dataframes
2. **`functions`** that **do** things to objects.
3. **`pipes`** `%>%` that string together objects and verbs

]

.pull-right6[
 
## The pipe %>%
<br2>

`dplyr` makes extensive use of the 'pipe' `%>%` which passes objects between functions.

```r
data %>%    # Start with data, AND THEN...
  FUN1 %>% # Do FUN1, AND THEN...
  FUN2 %>% # Do FUN2, AND THEN...
  FUN3 %>% # Do FUN3, AND THEN...
  group_by(x, y) %>%  # Group by variables x, y
    summarise(
      VAR_A_New = FUN4(X),
      VAR_B_New = FUN5(Y),
      VAR_C_New = FUN6(Z),
    )
  )
```

]

---

## There are tons of statistical packages in R!

.pull-left35[

| Package| Models|
|------:|:----|
|     `afex`|   Anovas|
|     `rpart`|    Decision Trees|
|     `lme4`|   Mixed effects regression|
|     `BayesFactor`| Bayesian statistics|
|     `igraph`| Cluster analyses|
|     `neuralnet`| Neural networks|

]

.pull-right6[

]

---

# Statistics

.pull-left5[

| Test| R Function|
|:------|:----|
|     `formula`|The dependent variable and one or more independent variables|
|     `data`|  The dataframe containing variables|
|   `subset`  | Optional subset of data to test |

```r
trial_X
```

```
##   id sex age arm y_primary y_secondary
## 1  1   m  35   1        50           1
## 2  2   f  42   2        78           1
## 3  3   f  24   1        46           0
## 4  4   m  56   2        97           1
## 5  5   f  49   1        74           1
```

]

.pull-right45[

```r
TEST(formula = Y ~ X1 + X2 + ...,
     data = DATA,
     subset = SUBSET,
     ...)
```

### Examples

```r
# T-test
# DV = y_primary, IV = arm
t.test(formula = y_primary ~ arm,
       data = trial_X)

# Regression
# DV = y_secondary, IV = arm, sex, age
# Subset = only id > 50
glm(formula = y_secondary ~ arm + sex + age,
    data = trial_X,
    subset = id > 50)
```

]

---

.pull-left35[

## Exploring statistical objects

Assign the results of a statistical function to a new object, then use *generic* functions such as `print()`, `summary()`, `predict()` and `plot()` to explore it.

```r
# Create statistical object
my_obj <- FUN(formula = ...,
 data = ...)

names(my_obj)       # Elements
print(my_obj)       # Print
summary(my_obj)     # Summary
plot(my_obj)        # Plotting
predict(my_obj, ..) # Predict
```

]

.pull-right6[

```r
# Create a glm object
chick_glm <- glm(formula = weight ~ Time,
 data = ChickWeight)

# Print the object
names(chick_glm)
```

```
##  [1] "coefficients"      "residuals"         "fitted.values"     "effects"           "R"                
##  [6] "rank"              "qr"                "family"            "linear.predictors" "deviance"         
## [11] "aic"               "null.deviance"     "iter"              "weights"           "prior.weights"    
## [16] "df.residual"       "df.null"           "y"                 "converged"         "boundary"         
## [21] "model"             "call"              "formula"           "terms"             "data"             
## [26] "offset"            "control"           "method"            "contrasts"         "xlevels"
```

```r
## Access elements
chick_glm$coefficients
```

```
## (Intercept)        Time 
##      27.467       8.803
```

]

---

# What is machine learning?

.pull-left6[

### Algorithms autonomously learning from data.

Given data, an algorithm tunes its *parameters* to match the data, understand how it works, and make predictions for what will occur in the future.

]

.pull-right4[

]

---
# How do I do machine learning in R?

.pull-left6[

In principle, it's a lot like evaluating statistical functions.

Install the necessary package, evaluate the model function, explore the results.

```r
# Create training and test data
data_train <- ...
data_test <- ...

# Train models on training data
model_A <- A_fun(formula = y ~ ., 
 data = data_train)

# Model A predictions
pred_A <- predict(model_A, 
 newdata = data_test)

# Calculate Model A error
pred_err_A <- mean(abs(pred_A - data_test$y))

# Compare to Models B, C, D...
```

]

.pull-right35[

If you're really into machine learning, the `caret` package can automate much of the the machine learning process.

]

---
# Today

<a href="https://therbootcamp.github.io/BaselRBootcamp_2018April/schedule">Schedule</a>