class: center, middle, inverse, title-slide

# Objects & Functions
### The R Bootcamp
www.therbootcamp.com
@therbootcamp
### July 2018

---
layout: true

<div class="my-footer"><span> <a href="https://therbootcamp.github.io/"><font color="#7E7E7E">Basel, July 2018</font></a>                                           <a href="https://therbootcamp.github.io/"><font color="#7E7E7E">www.therbootcamp.com</font></a> </span></div>

---

# 3 Object types for data

.pull-left4[

R has 3 main data objects...

<high>`list`</high> - R's multi-purpose container
- Can carry any data, incl. lists
- Often used for function outputs

<high>`data_frame`</high> - R's spreadsheet
- Specific type of `list`
- Typical data format
- For multi-variable data sets

<high>`vector`</high> - R's data container
- Actually carries the data
- Contains data of one of several types

]

.pull-right55[

<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/main_objects.png"></img>

]

---

# `list`

.pull-left45[

<br><br><br>

1 - Can <high>carry any data</high>, incl. `list`s, `data_frame`s, `vector`s, etc.

<br><br>

2 - Are often used for <high>function outputs</high>.

<br><br>

3 - Have <high>named elements</high>.

<br><br>

4 - Elements can be <high>inspect</high>ed via `names()` or `str()`.

<br><br>

5 - Elements are (typically) <high>select</high>ed by `$`.
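
As a minimal sketch of the points above, the following code builds a small list by hand, inspects it, and selects elements from it. The object `my_list` and its contents are made up for illustration and are not part of the course data.

```r
# build a small list holding different kinds of objects (hypothetical example)
my_list <- list(
  id     = 1:3,               # a numeric (integer) vector
  label  = c("a", "b", "c"),  # a character vector
  nested = list(flag = TRUE)  # a list inside a list
)

# inspect element names and structure
names(my_list)
str(my_list)

# select elements using $
my_list$label
my_list$nested$flag
```

Because a list can contain other lists, `$` can be chained to reach nested elements.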
]

.pull-right5[

<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/list.png"></img>

]

---

# `list`: Select element using <high>`$`</high>

.pull-left45[

```r
# regression
reg_model <- lm(height ~ sex + age, data = baselers)
reg_results <- summary(reg_model)

# get element names
names(reg_results)
```

```
## [1] "call"         "terms"
## [3] "residuals"    "coefficients"
## [5] "aliased"      "sigma"
## [7] "df"           "r.squared"
```

```r
# select element using $
reg_results$coefficients
```

```
##               Estimate  t value
## (Intercept) 164.171266 499.5339
## sexmale      13.993699  66.4724
## age          -0.003753  -0.5819
```

]

.pull-right5[

<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/list.png"></img>

]

---

.pull-left45[

# `data_frame`

<br><br>

1 - Are `list`s containing <high>`vector`s of equal length</high> representing the variables.

<br><br>

2 - Contain `vector`s of different types: `numeric`, `character`, etc.

<br><br>

3 - Have named elements.

<br><br>

4 - Elements can be <high>inspect</high>ed via `names()`, `str()`, `print()`, `View()`, or `skimr::skim()`.

<br><br>

5 - Elements are (typically) <high>select</high>ed by `$`.

<br><br>

6 - Come in different flavors: `data.frame()`, `data.table()`, `tibble()`.

]

.pull-right45[

<br><br><br>

<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/data_frame.png"></img>

]

---

.pull-left45[

# Inspect content

```r
# inspect baselers via print
baselers
```

```
## # A tibble: 10,000 x 20
##       id sex      age height weight
##    <int> <chr>  <int>  <dbl>  <dbl>
##  1     1 male      44   174.  113.
##  2     2 male      65   180.   75.2
##  3     3 female    31   168.   55.5
##  4     4 male      27   209    93.8
##  5     5 male      24   177.   NA
##  6     6 male      63   187.   67.4
##  7     7 male      71   152.   83.3
##  8     8 female    41   156.   67.8
##  9     9 male      43   176.   69.3
## 10    10 female    31   166.   66.3
## # ... with 9,990 more rows, and 15 more
## #   variables
```

]

.pull-right45[

<br><br><br>

<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/data_frame.png"></img>

]

---

.pull-left45[

# Inspect content

```r
# inspect baselers via View
View(baselers)
```

<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/view.png"></img>

]

.pull-right45[

<br><br><br>

<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/data_frame.png"></img>

]

---

.pull-left45[

# Select via <high>`$`</high>

```r
# select age variable
baselers$age
```

```
##  [1] 44 65 31 27 24 63 71 41 43 31 42 31
## [13] 38 49 39 54 78 62 88 74
```

```r
# select education variable
baselers$education
```

```
##  [1] "SEK_III"
##  [2] "obligatory_school"
##  [3] "SEK_III"
##  [4] "SEK_III"
##  [5] "SEK_III"
##  [6] "SEK_III"
##  [7] "SEK_III"
##  [8] "SEK_III"
##  [9] "apprenticeship"
## [10] "SEK_II"
```

]

.pull-right45[

<br><br><br>

<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/data_frame.png"></img>

]

---

.pull-left45[

# Change/Add via <high>`$`</high>

```r
# change the age variable by doubling it
# (age in months would be age * 12)
baselers$age <- baselers$age * 2

# inspect baselers
baselers
```

```
## # A tibble: 10,000 x 20
##       id sex      age height weight
##    <int> <chr>  <dbl>  <dbl>  <dbl>
##  1     1 male      88   174.  113.
##  2     2 male     130   180.   75.2
##  3     3 female    62   168.   55.5
##  4     4 male      54   209    93.8
##  5     5 male      48   177.   NA
##  6     6 male     126   187.   67.4
##  7     7 male     142   152.   83.3
##  8     8 female    82   156.   67.8
##  9     9 male      86   176.   69.3
## 10    10 female    62   166.   66.3
## # ... with 9,990 more rows, and 15 more
## #   variables
```

]

.pull-right45[

<br><br><br>

<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/data_frame.png"></img>

]

---

.pull-left45[

# Tidy data

1 - Each variable you measure should be in one column.

2 - Each different observation of that variable should be in a different row.

3 - There should be one table for each "kind" of variable.

4 - If you have multiple tables, they should include a column in the table that allows them to be linked.

<br><br>

see <a href="http://worldpece.org/sites/default/files/datastyle.pdf">The Elements of Data Analytic Style</a> by Jeff Leek

]

.pull-right45[

<br><br><br>

<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/data_frame.png"></img>

]

---

# `vector`

.pull-left45[

1 - R's <high>basic and, in a way, only data container</high>.

<br><br>

2 - Can contain only a <high>single type of data</high> and missing values.

<br><br>

3 - Data types

  <high>`numeric`</high> - All numbers<br>
  <high>`character`</high> - All characters (e.g., names)<br>
  <high>`logical`</high> - `TRUE` or `FALSE`<br>
  ...<br>
  <high>`NA`</high> - missing values<br>

]

.pull-right4[

<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/vector.png"></img>

]

---

# Select/Change/(Add) via `[ ]`

.pull-left45[

```r
# extract vector containing age
age <- baselers$age
age
```

```
## [1]  88 130  62  54  48 126 142  82  86
```

```r
# select value
age[2]
```

```
## [1] 130
```

```r
# change value
age[2] <- 2
age
```

```
## [1]  88   2  62  54  48 126 142  82  86
```

<br>

Find more info on indexing [here](http://rspatial.org/intr/rst/4-indexing.html).

]

.pull-right4[

<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/vector.png"></img>

]

---

# Data types: `numeric`

.pull-left45[

`numeric` vectors are used to store numbers and only numbers.

```r
baselers$age
```

```
## [1]  88 130  62  54  48 126 142  82  86
```

```r
# evaluate type
typeof(baselers$age)
```

```
## [1] "double"
```

```r
is.numeric(baselers$age)
```

```
## [1] TRUE
```

]

.pull-right4[

<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/vector_types.png"></img>

]

---

# Data types: `character`

.pull-left45[

`character` vectors are used to store data represented by <high>letters and symbols, and any other data stored as text</high>.

```r
baselers$sex
```

```
## [1] "male"   "male"   "female" "male"
## [5] "male"   "male"   "male"   "female"
```

```r
# convert numbers to character
as.character(baselers$age)
```

```
## [1] "88"  "130" "62"  "54"  "48"  "126"
## [7] "142" "82"  "86"
```

]

.pull-right4[

<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/vector_types.png"></img>

]

---

# Data types: `logical`

.pull-left45[

`logical` vectors are used to <high>*slice* data</high>, i.e., to select elements or rows. `logical` vectors are typically created from other vectors via <high>logical comparisons</high>.

```r
baselers$sex == "male"
```

```
## [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
## [7]  TRUE FALSE
```

```r
# compare age to a threshold
baselers$age < 30
```

```
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [8] FALSE FALSE
```

]

.pull-right4[

<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/vector_types.png"></img>

]

---

# Data types: `logical`

.pull-left45[

`logical` vectors are used to <high>*slice* data</high>, i.e., to select elements or rows. `logical` vectors are typically created from other vectors via <high>logical comparisons</high>.

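
A minimal sketch of slicing with a `logical` vector: a comparison creates the `logical` vector, which is then placed inside `[ ]` to keep only the matching elements. The vectors `age` and `sex` below are small made-up examples, not the `baselers` data.

```r
# hypothetical example vectors (not taken from baselers)
age <- c(44, 65, 31, 27, 24)
sex <- c("male", "male", "female", "male", "male")

# a logical comparison creates a logical vector
is_young <- age < 30
is_young

# use the logical vector inside [ ] to select the matching elements
age[is_young]
sex[is_young]

# combine two comparisons element-wise with & (AND)
age[sex == "male" & age > 40]
```
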
<u>Logical operators</u>

<high>`==`</high> - is equal to<br>
<high>`<`</high>, <high>`>`</high> - smaller/greater than<br>
<high>`<=`</high>, <high>`>=`</high> - smaller/greater than or equal<br>
<high>`&`</high>, <high>`&&`</high> - logical AND<br>
<high>`|`</high>, <high>`||`</high> - logical OR<br>

]

.pull-right4[

<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/vector_types.png"></img>

]

---

.pull-left45[

# Object Classes

<br><br>

1 - R's objects have <high>content and attributes</high>.

<br><br>

2 - Common attributes include <high>names</high>, <high>dimensions</high>, and the <high>class</high> (or type) of the object.

<br><br>

3 - <high>Classes</high> are critical because they determine <high>when and how objects can be used in functions</high>!

]

.pull-right45[

<br><br><br>

<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/object_class.png"></img>

]

---

.pull-left4[

# Functions

Functions have 3 elements:

1 - <high>Name</high>: Used to refer to the function and call (execute) it.

2 - <high>Arguments</high>: Used to provide (data) inputs and to control what the function does. Arguments with default values (e.g., `use = "everything"`) need not be specified. Arguments without default values (e.g., `x`) must be specified. <high>Inputs must have the appropriate class!</high>

3 - <high>Body</high>: The code that uses the inputs (arguments) to produce the desired output. The code in the function's body works on <high>copies of the inputs</high>, which are named according to the argument names.

]

.pull-right55[

<br><br>

<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/function.png"></img>

]

---

# Documentation

.pull-left5[

R documentation (<high>help files</high> and <high>vignettes</high>) will become very easy to use once you are familiar with the basic R vocabulary.

Pay attention to...

<high>Usage</high> - shows how to use the function, its arguments, and their defaults.<br><high>Arguments</high> - describes the arguments and their `class`.<br><high>Value</high> - describes what the function returns.<br><high>Examples</high> - provide working R code.

```r
# access the help file of a function
?name_of_function

# search all help files
??name_of_function
```

]

.pull-right5[

```r
?cor
```

<p align="center"><img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/help_cor.png" width="500"></p>

]

---

# Practical

<p>
<font size=6>
<a href="https://therbootcamp.github.io/BaselRBootcamp_2018July/_sessions/Objects/Objects_practical.html"><b>Link to practical</b></a>
</font>
</p>
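
---

# Functions: a minimal sketch

The three elements from the Functions slide (name, arguments, body) can be illustrated with a small self-written function. This is only a sketch: the function `add_to()`, its arguments `x` and `by`, and the vector `values` are hypothetical names chosen for this example.

```r
# name: add_to
# arguments: x (no default, must be specified), by (default value 1)
add_to <- function(x, by = 1) {
  # body: works on a copy of the input, called x inside the function
  x <- x + by
  x
}

values <- c(1, 2, 3)

# by is not specified, so its default value is used
add_to(values)

# specifying by overrides the default
add_to(values, by = 10)

# the original object is unchanged, because the function modified a copy
values
```

Calling `add_to()` without `x` stops with an error, because `x` has no default value.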