Recap I

class: center, middle, inverse, title-slide

# Recap I
### The R Bootcamp Twitter: <a href='https://twitter.com/therbootcamp'>@therbootcamp</a>
### April 2018

---

# Essentials of the R language

.pull-left5[
>"To understand computations in R, two slogans are helpful:
 
>###(1) Everything that exists is an object and 
>###(2) everything that happens is a function call."
]

.pull-right5[
<img src="https://statweb.stanford.edu/~jmc4/CopyPhoto.jpg" width="350" align="center">
John Chambers, author of S and developer of R 
<a href="https://statweb.stanford.edu/~jmc4/">statweb.stanford.edu</a>

]

---

.pull-left45[
# Objects
>###"Everything in R is an object" 
>*John Chambers*

+ R's objects have **content and attributes**.
<br2>
+ The **content can be anything** from numbers or strings to functions or complex data structures. 
<br2>
+ Attributes often encompass **names**, **dimensions**, and the **class or type** of the object, but other attributes are possible. 
<br2>
+ Practically all data objects are equipped with those **three essential attributes**.

]

.pull-right5[
 
<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/objects.png" align="center" width="579" height="560">
]
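
A minimal sketch of how content and attributes can be inspected. The object names `scores` and `m` are made up for illustration and are not part of the course material.

```r
# A named numeric vector: the numbers are the content,
# the names are stored as an attribute
scores <- c(anna = 1.5, ben = 2.0, cleo = 3.5)
names(scores)       # the names attribute
class(scores)       # the class of the object
attributes(scores)  # all attributes as a list

# A matrix is a vector that carries a dim attribute
m <- matrix(1:6, nrow = 2)
dim(m)
attributes(m)
```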

---

# Accessing & changing vectors

To access (aka **subset** or **slice**) and change atomic data objects, use **brackets** `[]` and provide **integers**, **logicals**, or **names** to indicate the relevant elements. To change content, assign new content of matching size to the subset using `<-`.

.pull-left45[

```r
# retrieve second element from vector
my_vec <- c('A', 'B', 'C')
my_vec[2] 
```

```
## [1] "B"
```

```r
# change the second element 
my_vec[2] <- 'D'
my_vec
```

```
## [1] "A" "D" "C"
```

]

.pull-right45[

```r
# Use logical comparison to access vector
my_vec[my_vec != 'A']
```

```
## [1] "D" "C"
```

```r
# Change vector using logical comparison
my_vec[my_vec != 'A'] <- c('E', 'F')
my_vec
```

```
## [1] "A" "E" "F"
```
]
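
Subsetting by **names** works the same way as subsetting by integers or logicals. A small sketch, assuming a hypothetical named vector `ages`:

```r
# hypothetical named vector, for illustration only
ages <- c(anna = 22, ben = 45, cleo = 32)

# access one element by name
ages["ben"]

# change two elements by name; the new content has matching size
ages[c("anna", "cleo")] <- c(23, 33)
ages
```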
---

# Functions

Functions are objects that conduct operations on objects using objects. Functions have 3 elements:
+ **Name**: Used to call (execute) the function.
+ **Argument(s)**: Objects that the function needs to do its job. May have *default arguments*.
+ **Expression**: R code that does the job.
    
.pull-left5[

```r
# Defining a function that computes 
# the mean or median
my_stat <- function(x, method = 'mean'){
 
 # detect and run method
 if(method == "mean") return(mean(x))
 if(method == "median") return(median(x))

 # fail loudly for unsupported methods
 stop("method must be 'mean' or 'median'")
}

# Define object
my_vec <- c(1, 4, 6, 3, 7, 5, 12, 9)
```
]

.pull-right45[

```r
# Running our functions
mean(x = my_vec)
```

```
## [1] 5.875
```

```r
my_stat(x = my_vec, method = 'mean')
```

```
## [1] 5.875
```

```r
my_stat(x = my_vec, method = 'median')
```

```
## [1] 5.5
```
]
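
Default arguments can be illustrated with another small function. This is only a sketch: `rescale()` and its arguments `lower` and `upper` are hypothetical names chosen for this example.

```r
# hypothetical function: rescale a vector to a target range,
# by default to the range from 0 to 1
rescale <- function(x, lower = 0, upper = 1) {
  (x - min(x)) / (max(x) - min(x)) * (upper - lower) + lower
}

values <- c(1, 4, 6, 3, 7, 5, 12, 9)

# call with defaults only
rescale(values)

# override one default by name, keep the other
rescale(values, upper = 100)

# list the arguments of the function
names(formals(rescale))
```

Arguments without a default (here `x`) must always be supplied. Arguments with a default only need to be named when you want a different value.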

---

# Example

```r
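# printing a vector vs. storing it in an object
c(22, 45, 32, 18, 19, 24)
age <- c(22, 45, 32, 18, 19, 24)

# converting without assignment leaves age unchanged
as.character(age)
age <- as.character(age)

# age is now character, so mean() returns NA with a warning
mean(age)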
```

---

# Help

.pull-left5[

**Help files** (and **vignettes**) are very useful.

Pay attention to...

- `Usage` - shows how the function is called, with its arguments and their defaults.
- `Arguments` - explains each argument and its expected `type`/`class`.
- `Value` - explains what the function returns.
- `Examples` - provides code you can run to see the function in action.

```r
# To access help files
?name_of_function

# search help files
??name_of_function
```

]

.pull-right45[
<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/mean_help.png" width="500">

]
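
Parts of a help page can also be reached from code. A brief sketch using `median()` as an arbitrary example function:

```r
# show the arguments and defaults listed under Usage
args(median)

# run the code from the Examples section of the help page
example("median", package = "stats")

# find objects whose name contains a search term
apropos("median")
```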

---

# Importing and Exporting Data

In this session you will learn...

.pull-left45[
1. How to import data from **delimiter-separated files** (e.g., .csv).
<br2>
2. How to import data from **proprietary file formats** (e.g., .sav).
<br2>
3. How to save/export data to various formats, including **R's own file types**.
<br2>
4. How to use **file connections** to read data in its rawest possible way.
<br2>
5. About a new data format called **tibble**.
]

.pull-right45[
<img src="http://d33wubrfki0l68.cloudfront.net/66d3133b4a19949d0b9ddb95fc48da074b69fb07/7dfb6/images/hex-readr.png" width="150">
<img src="http://d33wubrfki0l68.cloudfront.net/f55c43407ae8944b985e2547fe868e5e2b3f9621/720bb/images/hex-tibble.png" width="150">
 
<img src="http://haven.tidyverse.org/logo.png" width="150">
<img src="https://www.rstudio.com/wp-content/uploads/2017/05/readxl-259x300.png" width="150">

]
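
As a rough preview of exporting and of file connections, here is a minimal sketch. The data and the temporary file paths are invented for illustration. It assumes the `readr` package is installed.

```r
library(readr)

# small made-up data set
my_data <- data.frame(id = c("DCDL", "FEFK"),
                      var_1 = c(0.287, 0.894))

# temporary files, so nothing is written to your project folder
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

# export as comma-separated text and as one of R's own file types
write_csv(my_data, csv_path)
saveRDS(my_data, rds_path)

# read the R file back in
restored <- readRDS(rds_path)

# use a file connection to read the raw lines of the text file
con <- file(csv_path, open = "r")
raw_lines <- readLines(con)
close(con)
raw_lines
```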

---

# An example

Assume we have a *flat* data set with variables `id`, `var_1`, and `var_2` as columns and cases as rows. Such data can be conveniently read in using `readr` functions such as `read_csv()` for comma-separated files or `read_delim()` for other delimiters. These functions **automatically identify** (a) columns and rows, (b) column names, and (c) the type of each column, and return a `tibble` (more on that later).

.pull-left45[

```r
# This is how a text file may look
# on your hard-drive

id\tvar_1\tvar_2\n
DCDL\t.287\t.048\n
FEFK\t.894\t.383\n
ZEWE\t1.374\t.623\n
OJEE\t.631\t.826
```
]

.pull-right45[

```r
# read in data (-> tibble)
require(readr)
read_delim("data/my_dataset.csv", delim = "\t")
```

```
## Loading required package: readr
```

```
## # A tibble: 4 x 3
##   id    var_1  var_2
##   <chr> <dbl>  <dbl>
## 1 DCDL  0.287 0.0480
## 2 FEFK  0.894 0.383 
## 3 ZEWE  1.37  0.623 
## 4 OJEE  0.631 0.826 
```
]
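
To try this without a file of your own, the sketch below first writes a small tab-separated file and then reads it back. The temporary path and the shortened data are assumptions made for this example.

```r
library(readr)

# write a small tab-separated file to a temporary location
path <- tempfile(fileext = ".txt")
writeLines(c("id\tvar_1\tvar_2",
             "DCDL\t.287\t.048",
             "FEFK\t.894\t.383"),
           path)

# read it back; delim states the separator explicitly
my_data <- read_delim(path, delim = "\t")
my_data

# show the column types that readr identified
spec(my_data)
```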

---

# `tibble`s

The **output** from most `tidyverse` read functions such as `read_csv` and the preferred data format for many (but not all) analyses is a `tibble`. `tibble`s are a **modern, leaner version of data.frames**.

.pull-left45[
### tibbles ...
+ never change the input's type -> no factors.
<br2>
+ never add row names.
<br2>
+ never change column names.
<br2>
+ look better in `print`. 
<br2>
+ are subset more consistently (`[` always returns a tibble).
]

.pull-right45[
### Functions

```r
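library(tibble)

# create tibble from vectors
id    <- c("DCDL", "FEFK", "ZEWE", "OJEE")
var_1 <- c(.287, .894, 1.374, .631)
var_2 <- c(.048, .383, .623, .826)
my_data <- tibble(id, var_1, var_2)

# convert to and from data.frame
my_data_frame <- as.data.frame(my_data)
as_tibble(my_data_frame)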
```

```
## # A tibble: 4 x 3
##   id    var_1  var_2
##   <chr> <dbl>  <dbl>
## 1 DCDL  0.287 0.0480
## 2 FEFK  0.894 0.383 
## 3 ZEWE  1.37  0.623 
## 4 OJEE  0.631 0.826 
```
]
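
What "accessed more consistently" can mean in practice is sketched below with made-up objects `df` and `tb`: single-column subsetting with `[` simplifies a data.frame to a vector, while a tibble stays a tibble.

```r
library(tibble)

# the same made-up data as data.frame and as tibble
df <- data.frame(id = c("a", "b"), value = c(1, 2))
tb <- as_tibble(df)

# data.frame: selecting one column with [ drops to a plain vector
class(df[, "value"])

# tibble: selecting one column with [ still returns a tibble
class(tb[, "value"])

# to pull out the vector from a tibble, use [[ ]] or $
tb[["value"]]
```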

---

# The almighty **tidyverse**

Among its many packages, R offers a collection of high-performance, easy-to-use packages (libraries) designed specifically for handling data, known as the [tidyverse](https://www.tidyverse.org/). The tidyverse includes:
1. `ggplot2` -- creating graphics.
2. `dplyr` -- data manipulation.
3. `tidyr` -- tidying data.
4. `readr` -- read wild data.
5. `purrr` -- functional programming.
6. `tibble` -- modern data frame.