class: center, middle, inverse, title-slide # Recap ### Introduction to Data Science with R
www.therbootcamp.com
@therbootcamp
### October 2018 --- layout: true <div class="my-footer"><span> <a href="https://therbootcamp.github.io/"><font color="#7E7E7E">Introduction to Data Science with R, October 2018</font></a>                           <a href="https://therbootcamp.github.io/"><font color="#7E7E7E">www.therbootcamp.com</font></a> </span></div> --- # Essentials: The 2<sup>4</sup> Lessons of the R Bootcamp .pull-left4[ 1. Everything is an object 2. Use `<-` to create/change objects 3. Name objects using `_` 4. Objects have classes 5. Everything happens through functions 6. Functions have (default) arguments 7. Functions expect certain object classes 8. View help files using `?` 9. Study errors and warnings 10. Data is stored in data frames 11. Select variables (vectors) using `$` 12. Use RStudio and projects 13. Use editor and shortcuts 14. First load packages and data 15. Use auto-complete 16. Comment and format for readability ] --- # 3 Object types for data .pull-left4[ R has 3 main data objects... <high>`list`</high> - R's multi-purpose container - Can carry any data, incl. lists - Often used for function outputs <high>`data_frame`</high> - R's spreadsheet - Specific type of `list` - Typical data format - For multi-variable data sets <high>`vectors`</high> - R's data container - Actually carries the data - Contain data of 1 of many types ] .pull-right55[ <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/main_objects.png"></img> ] --- .pull-left4[ # Functions Functions have 3 elements: 1 - <high>Name</high>: Used to refer to the function and call (execute) it. 2 - <high>Arguments</high>: Used to provide (data) inputs and to control what the function does. Arguments with default values (e.g., `use = "everything"`) need not be specified. Arguments without default values (e.g., `x`) need be specified. <high>Inputs must have the appropriate class!</high> 3 - <high>Body</high>: The code that uses the inputs (arguments) to produce the desired output. The code of the functions body is based <high>copies of the inputs</high>, which are named according to the arguments names. ] .pull-right55[ <br><br> <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/function.png"></img> ] --- # Documentation .pull-left5[ R documentation (<high>help files</high> and <high>vignettes</high>) will become very to use once you are familiar with the basic R vocabulary. Pay attention to... <high>Usage</high> - shows how to use function, its arguments and their defaults.<br><high>Arguments</high> - describes arguments, and their `class`.<br><high>Value</high> - describes what the function returns.<br><high>Examples</high> - provide working R code. ```r # To access help files ?name_of_function # search help files ??name_of_function ``` ] .pull-right5[ ```r ?cor ``` <p align="center"><img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/help_cor.png" width="500"></p> ] --- # Data Input/Output .pull-left45[ Raw data can come in many shapes and sizes, but <high>R's got you covered</high>. <br><br> .pull-left65[ | Package | Description| |:----------------|:-------------------------------------------------------| | `readr`| `.csv`, `.txt`, etc. | | `haven`| `.sav`, `.sas7bdat`, `.dta` | | `readxl` | `.xls`, `.xlsx` | | `R.matlab` | `.mat` | | `jsonlite` | `.json` | | `rvest` | `.html` | | `XML`, `xml2` | `.xml` | ] ] .pull-right45[ <img src="http://blog.datasift.com/wp-content/uploads/2014/10/ms-files-3.jpg"> ] --- # Finding the file path .pull-left4[ 1 - Identify the file path using the <high>auto-complete</high>. 2 - Initiate auto-complete and browse through the folder structure by placing the cursor between two quotation marks and using the <high>tab key</high>. <p align="center"> <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/tab.png" height="80px"></img> </p> 3 - Auto-complete begins with the project folder - <high>place your data inside your project folder!</high> ] .pull-right55[ <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/load_baselers_ss.jpg"></img> ] --- .pull-left45[ # What is wrangling? <font size = 5><high>Transform</high></font> Change column names Add new columns <font size = 5><high>Organise</high></font> Sort data by columns Merging data from two separate dataframes Move data between columns and rows <font size = ><high>Aggregate</high></font> Group data and summarise ] .pull-right5[ <br> <p align="center"> <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/organise_transform_aggregate.png" height = "550px"> </p> ] --- # The Pipe! <high>`%>%`</high> .pull-left4[ `dplyr` makes extensive use of a new operator called the "Pipe" <high>`%>%`</high><br> Read the "Pipe" <high>`%>%`</high> as "And Then..." <br> ```r # Start with data data %>% # AND THEN... DO_SOMETHING %>% # AND THEN... DO_SOMETHING %>% # AND THEN... DO_SOMETHING %>% # AND THEN... ``` ] .pull-right55[ <p align="center"> <img src="https://upload.wikimedia.org/wikipedia/en/thumb/b/b9/MagrittePipe.jpg/300px-MagrittePipe.jpg" width = "450px"><br> This is not a pipe (but %>% is!) </p> ] --- .pull-left35[ <br><br><br><br><br><br><br><br><br> <p align="center"> <font size=8><hfont><high>Questions?</high></hfont></font><br> <font size = 4><a href = "https://therbootcamp.github.io/Intro2DataScience_2018Oct/">Link to schedule</a></font> </p> ] .pull-right6[ <img src="https://github.com/therbootcamp/therbootcamp.github.io/blob/master/_sessions/_image/schedule_2018Oct_Intro2DS.png?raw=true" height="580" align="center"> ]