class: center, middle, inverse, title-slide

# Next Steps
### Introduction to Modern Data Analysis with R
Basel R Bootcamp
### November 2019

---

layout: true

<div class="my-footer">
<span style="text-align:center">
<span>
<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/by-sa.png" height=14 style="vertical-align: middle"/>
</span>
<a href="https://therbootcamp.github.io/">
<span style="padding-left:82px">
<font color="#7E7E7E">
www.therbootcamp.com
</font>
</span>
</a>
<a href="https://therbootcamp.github.io/">
<font color="#7E7E7E">
Introduction to Modern Data Analysis with R | November 2019
</font>
</a>
</span>
</div>

---

.pull-left45[
<br><br><br><br><br>

# Hello Base R experts

After one afternoon, you now know the basics of R!

]

.pull-right4[

<img src="image/schedule.png" height="580" align="center">

]

---

# R can do much more

.pull-left5[

<font size = 6>
1. `tidyverse` core<br>
2. `tidyverse` extended<br>
3. Statistics<br>
4. Machine learning<br>
5. Reporting<br>
</font>

]

.pull-right5[

<p align="center"><img border="0" alt="W3Schools" src="https://www.oreilly.com/library/view/the-art-of/9781593273842/httpatomoreillycomsourcenostarchimages915868.png.jpg" width="400px"></p>

]

---

# `tidyverse` core

The [`tidyverse`](https://www.tidyverse.org/) is, at its core, a collection of high-performance, user-friendly packages designed specifically for more efficient data analysis.

1. `ggplot2` for graphics.
2. `dplyr` for data transformation.
3. `tidyr` for tidying and reshaping data.
4. `readr` for data I/O.
5. `purrr` for functional programming.
6. `tibble` for modern `data.frame`s.
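
`ggplot2`, `dplyr`, and `readr` appear in the examples on the following slides. As a minimal sketch of the remaining three packages, the snippet below builds a small `tibble`, reshapes it with `tidyr`, and summarises columns with `purrr`. The data `tour_mini` and its values are hypothetical toy numbers made up for illustration, not the course's tourism data.

```r
library(tibble)
library(tidyr)
library(purrr)

# Hypothetical toy data (NOT the course data set)
tour_mini <- tibble(
  Land          = c("A", "B"),
  Besucher_2017 = c(100, 200),
  Besucher_2018 = c(110, 190)
)

# tidyr: reshape from wide (one column per year) to long (one row per year)
tour_long <- pivot_longer(
  tour_mini,
  cols         = starts_with("Besucher_"),
  names_to     = "Jahr",
  names_prefix = "Besucher_",
  values_to    = "Besucher"
)

# purrr: apply mean() to each visitor column, returning a named numeric vector
mittel <- map_dbl(tour_mini[c("Besucher_2017", "Besucher_2018")], mean)
```

The long format is what `ggplot2` and `dplyr` typically expect, which is why `tidyr` often comes first in an analysis.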
<br><br>
<img src="http://d33wubrfki0l68.cloudfront.net/0ab849ed51b0b866ef6895c253d3899f4926d397/dbf0f/images/hex-ggplot2.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/071952491ec4a6a532a3f70ecfa2507af4d341f9/c167c/images/hex-dplyr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/5f8c22ec53a1ac61684f3e8d59c623d09227d6b9/b15de/images/hex-tidyr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/66d3133b4a19949d0b9ddb95fc48da074b69fb07/7dfb6/images/hex-readr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/9221ddead578362bd17bafae5b85935334984429/37a68/images/hex-purrr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/f55c43407ae8944b985e2547fe868e5e2b3f9621/720bb/images/hex-tibble.png" height="200px" />

---

# `ggplot2`

.pull-left45[

```r
library(tidyverse) ; library(ggrepel)

# Load tourism data
tour <- read_csv('1_Data/Tourismus.csv')

# Create plot with ggplot2
ggplot(data = tour,
       mapping = aes(x = Besucher,
                     y = Dauer,
                     label = Land)) +
  scale_x_continuous(trans = 'log2') +
  geom_point(size=2)
```

]

.pull-right45[

![](NächsteSchritte_files/figure-html/unnamed-chunk-3-1.png)<!-- -->

]

---

# `ggplot2`

.pull-left45[

```r
library(tidyverse) ; library(ggrepel)

# Load tourism data
tour <- read_csv('1_Data/Tourismus_18.csv')

# Plot duration against visitors
ggplot(data = tour,
       mapping = aes(x = Besucher,
                     y = Dauer,
                     label = Land)) +
  scale_x_continuous(trans = 'log2') +
  geom_point(size=2) +
  geom_label_repel(size = 2, label.padding = 0.1) +
  theme_bw()
```

]

.pull-right45[

![](NächsteSchritte_files/figure-html/unnamed-chunk-5-1.png)<!-- -->

]

---

# `ggplot2`

.pull-left45[

```r
library(tidyverse) ; library(ggrepel)

# Load tourism data
tour <- read_csv('1_Data/Tourismus.csv')

# Plot duration against visitors, one panel per region
ggplot(data = tour,
       mapping = aes(x = Besucher,
                     y = Dauer,
                     label = Land)) +
  scale_x_continuous(trans = 'log2') +
  geom_point(size=2) +
  geom_label_repel(size = 2, label.padding = 0.1) +
  theme_bw() +
  facet_grid(Region ~ .)
```

]

.pull-right45[

![](NächsteSchritte_files/figure-html/unnamed-chunk-7-1.png)<!-- -->

]

---

# `dplyr`

.pull-left45[

```r
library(tidyverse)

# Load tourism data
tour <- read_csv('1_Data/Tourismus.csv')

# Show top 10 countries by nights
tour %>%
  mutate(Nächte = Besucher * Dauer) %>%
  arrange(desc(Nächte)) %>%
  select(Land, Nächte) %>%
  top_n(10)
```

]

.pull-right45[

```
## # A tibble: 10 x 2
##    Land                   Nächte
##    <chr>                   <dbl>
##  1 Deutschland            18059 
##  2 Vereinigte Staaten     15353 
##  3 Vereinigtes Königreich  7981.
##  4 Frankreich              5288 
##  5 Italien                 3224 
##  6 Spanien                 2294 
##  7 Niederlande             2069 
##  8 Kanada                  2006 
##  9 Österreich              1683 
## 10 Indien                  1622
```

]

---

# `dplyr`

.pull-left45[

```r
library(tidyverse)

# Load tourism data
tour <- read_csv('1_Data/Tourismus.csv')

# Compute nights per region
tour %>%
  mutate(Nächte = Besucher * Dauer) %>%
  group_by(Region) %>%
  summarize(
    Nächte_mittel = mean(Nächte),
    Nächte_summe = sum(Nächte)
  )
```

]

.pull-right45[

```
## # A tibble: 5 x 3
##   Region     Nächte_mittel Nächte_summe
##   <chr>              <dbl>        <dbl>
## 1 Afrika              259          1036
## 2 Amerika            2435.        19479
## 3 Asien               464.         9278
## 4 Australien          580          1160
## 5 Europa             1415.        52353
```

]

---

# `dplyr`

.pull-left45[

```r
library(tidyverse) ; library(ggrepel)

# Load tourism data
tour <- read_csv('1_Data/Tourismus.csv')
europa <- read_csv('1_Data/Europa.csv')

# Join nights with equivalised income
tour %>%
  mutate(Nächte = Besucher * Dauer) %>%
  left_join(europa) %>%
  ggplot(aes(x = Äquivalenzeinkommen,
             y = Nächte,
             label = Land)) +
  scale_y_continuous(trans = 'log2') +
  geom_point() +
  geom_label_repel(size = 2) +
  theme_bw()
```

]

.pull-right45[

![](NächsteSchritte_files/figure-html/unnamed-chunk-13-1.png)<!-- -->

]

---

# `tidyverse` extended

The wider [`tidyverse`](https://www.tidyverse.org/) ecosystem includes a collection of further high-performance, user-friendly packages that complement the tidyverse core.

1. `xml2` for processing XML and HTML files.
2. `rvest` for web scraping.
3. `haven` for SPSS, SAS, and Stata files.
4. `readxl` for Excel files.
5. `lubridate` for date-time variables.
6. `tidytext` for text processing.

<br><br>
<img src="https://github.com/rstudio/hex-stickers/raw/master/PNG/rvest.png" height="200px" /><img src="https://github.com/rstudio/hex-stickers/raw/master/PNG/haven.png" height="200px" /><img src="https://github.com/rstudio/hex-stickers/raw/master/PNG/readxl.png" height="200px" /><img src="https://github.com/rstudio/hex-stickers/raw/master/PNG/lubridate.png" height="200px" /><img src="https://pbs.twimg.com/media/DeacLnGU0AAlmS9.png" height="200px" />

---

# Web Scraping

```r
library(tidyverse) ; library(rvest)

# Load a table from Wikipedia
read_html("https://en.wikipedia.org/wiki/R_(programming_language)") %>%
  html_node(xpath = '//*[@id="mw-content-text"]/div/table[2]') %>%
  html_table() %>%
  as_tibble()
```

```
## # A tibble: 17 x 3
##    Release Date      Description
##    <chr>   <chr>     <chr>
##  1 0.16    ""        "This is the last alpha version developed primarily by Ihaka and Gen…
##  2 0.49    1997-04-… This is the oldest source release which is currently available on CR…
##  3 0.60    1997-12-… R becomes an official part of the GNU Project. The code is hosted an…
##  4 0.65.1  1999-10-… First versions of update.packages and install.packages functions for…
##  5 1.0     2000-02-… Considered by its developers stable enough for production use.[49]
##  6 1.4     2001-12-… S4 methods are introduced and the first version for Mac OS X is made…
##  7 1.8     2003-10-… Introduced a flexible condition handling mechanism for signalling an…
##  8 2.0     2004-10-… Introduced lazy loading, which enables fast loading of data with min…
##  9 2.1     2005-04-… Support for UTF-8 encoding, and the beginnings of internationalizati…
## 10 2.11    2010-04-… Support for Windows 64 bit systems.
## # … with 7 more rows
```

---

# Text analysis

.pull-left5[

```r
library(tidytext) ; library(wordcloud)
library(dplyr) ; library(stringr)
library(rvest)

# Count words
counts <- read_html(".../R_(programming_language)") %>%
  html_text() %>%
  tibble() %>%
  unnest_tokens(word, ".") %>%
  filter(!str_detect(word, '[:digit:]')) %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  top_n(100)

# Word cloud
wordcloud(counts$word, counts$n)
```

]

.pull-right4[

![](NächsteSchritte_files/figure-html/unnamed-chunk-18-1.png)<!-- -->

]

---

# Statistics

.pull-left45[

```r
library(tidyverse)

# Load tourism data
tour <- read_csv('1_Data/Tourismus.csv')
länd <- read_csv('1_Data/Länder.csv')

# Join data
data <- tour %>%
  inner_join(länd) %>%
  mutate(Nächte = Besucher * Dauer)

# Regression analysis
model <- lm(Nächte ~ Bevölkerung + Dichte + BIP,
            data = data)
```

]

.pull-right45[

```r
# Coefficient of determination (R squared)
summary(model)$r.squared
```

```
## [1] 0.05719
```

```r
# Coefficients (without the standard error column)
summary(model)$coef[,-2]
```

```
##               Estimate  t value Pr(>|t|)
## (Intercept)  1.213e+01  0.01294   0.9897
## Bevölkerung  2.379e-06  1.36594   0.1775
## Dichte      -3.381e-01 -0.82598   0.4124
## BIP          2.988e+01  1.50779   0.1373
```

]

---

.pull-left45[

# Machine learning

```r
library(tidyverse) ; library(rpart)

# Load tourism data
tour <- read_csv('1_Data/Tourismus.csv')
länd <- read_csv('1_Data/Länder.csv')

# Join data
data <- tour %>%
  inner_join(länd) %>%
  mutate(Nächte = Besucher * Dauer)

# Fit a regression tree
rpart(
  formula = Besucher ~ Bevölkerung + Dichte + BIP,
  data = data)
```

]

.pull-right45[
<br><br>

![](NächsteSchritte_files/figure-html/unnamed-chunk-24-1.png)<!-- -->

]

---

# Reporting tools

R and RStudio also offer excellent tools for creating <high>reports</high>, <high>slides</high>, and even <high>websites</high>.

1. `rmarkdown` for dynamic documents (PDF, HTML, Word).
2. `xaringan` for slides.
3. `shiny` for websites and dashboards.
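
To give a rough idea of what a `shiny` app involves, here is a minimal sketch: a user interface with one slider and one plot, and a server function that redraws the plot whenever the slider moves. The input and output names (`n`, `plot`) and the simulated data are made up for illustration.

```r
library(shiny)

# User interface: one slider input and one plot output
ui <- fluidPage(
  sliderInput("n", "Number of points", min = 10, max = 100, value = 50),
  plotOutput("plot")
)

# Server logic: re-runs whenever input$n changes
server <- function(input, output) {
  output$plot <- renderPlot({
    hist(rnorm(input$n), main = "Simulated values")
  })
}

# Combine both parts into an app object
app <- shinyApp(ui = ui, server = server)
```

In an interactive session, `runApp(app)` starts the app in the browser.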
<br><br>
<img src="https://github.com/rstudio/hex-stickers/raw/master/PNG/rmarkdown.png" height="200px" /><img src="https://github.com/rstudio/hex-stickers/raw/master/PNG/xaringan.png" height="200px" /><img src="https://github.com/rstudio/hex-stickers/raw/master/PNG/shiny.png" height="200px" />

---

.pull-left45[

# `rmarkdown`

<p align="left"><img style="height:440px" src="image/markdown.png"></p>

]

.pull-right5[
<br><br>
<p align="center"><img style="height:510px;box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="image/markdown2.png"></p>

]

---

<iframe width="1000" height="600" src="https://shiny.rstudio.com/gallery/movie-explorer.html" frameborder="0" allowfullscreen></iframe>

---

# Next steps

.pull-left5[

<font size = 6>
1. Apply what you have learned<br>
2. Books<br>
3. Websites<br>
4. Help & consulting<br>
5. Advanced courses<br>
</font>

]

<br>

.pull-right5[

<p align="center"><img border="0" alt="W3Schools" src="http://www.theunmanageableemployee.com/wp-content/uploads/2011/07/5cs-stones-cropped-11052730.jpg?w=112"></p>

]

---

# Books

Here is an incomplete list of good books about R, loosely ordered by the experience they assume.<br><br>

<table width="80%" style="cellspacing:0; cellpadding:0; border:none;">
<tr>
<td> <a href="http://r4ds.had.co.nz/"><img border="0" alt="W3Schools" src="http://r4ds.had.co.nz/cover.png" height="180"></a> </td>
<td> <a href="https://covers.oreillystatic.com/images/0636920028574/cat.gif"><img border="0" alt="W3Schools" src="https://covers.oreillystatic.com/images/0636920028574/cat.gif" height="180"></a> </td>
<td> <a href="http://r4ds.had.co.nz/"><img border="0" alt="W3Schools" src="http://r4ds.had.co.nz/cover.png" height="180"></a> </td>
<td> <a href="https://www.springer.com/de/book/9783540799979"><img border="0" alt="W3Schools" src="https://images.springer.com/sgw/books/medium/9783540799979.jpg" height="180" ></a> </td>
<td> <a href="https://bookdown.org/ndphillips/YaRrr/"><img
border="0" alt="W3Schools" src="https://bookdown.org/ndphillips/YaRrr/images/YaRrr_Cover.jpg" height="180" ></a> </td>
<td> <a href="https://www.orellfuessli.ch/shop/home/artikeldetails/ID35367941.html?ProvID=10917736&gclid=Cj0KCQiAg_HhBRDNARIsAGHLV5238Q26gQmFttHRnYGjcAhz4CslStb-3qBegvuZS5gnCpWSLNlQvF0aAgfOEALw_wcB"><img border="0" alt="W3Schools" src="https://assets.thalia.media/img/35367941-00-00.jpg" height="180" ></a> </td>
</tr>
<tr style="background-color:#ffffff">
<td> <a href="http://appliedpredictivemodeling.com/"><img border="0" alt="W3Schools" src="http://static1.squarespace.com/static/51156277e4b0b8b2ffe11c00/t/51157487e4b0b8b2ffe16829/1509217882069/?format=1500w" height="180" ></a> </td>
<td> <a href="http://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf"><img border="0" alt="W3Schools" src="https://images-na.ssl-images-amazon.com/images/I/41EaH4W9LVL._SX332_BO1,204,203,200_.jpg" height="180" ></a> </td>
<td> <a href="https://www.manning.com/books/deep-learning-with-r"><img border="0" alt="W3Schools" src="https://images-na.ssl-images-amazon.com/images/I/51h5d4dYaoL._SX396_BO1,204,203,200_.jpg" height="180" ></a> </td>
<td> <a href="https://csgillespie.github.io/efficientR/"><img border="0" alt="W3Schools" src="https://csgillespie.github.io/efficientR/figures/f0_web.png" height="180" ></a> </td>
<td> <a href="http://www.rcpp.org/"><img border="0" alt="W3Schools" src="http://t3.gstatic.com/images?q=tbn:ANd9GcSO9T6JQYtpQgcaCXudbqMB-fnvTjGowsnmeh9-BQku3zveR4-J" height="180" ></a> </td>
<td> <a href="http://adv-r.had.co.nz/"><img border="0" alt="W3Schools" src="https://images.tandf.co.uk/common/jackets/amazon/978146658/9781466586963.jpg" height="180" ></a> </td>
</tr>
</table>
<br>

---

.pull-left4[

# Websites

The web is perhaps the best place to find information about R.

<font size = 4><i>Start your search with:</i></font>

[Google](https://www.google.com). Make sure to include *R* or *Rproject* in your search.
<br>
<font size = 4><i>Most of the time you will then be directed to:</i></font>

[R-bloggers](https://www.r-bloggers.com) keeps you informed about the latest R developments. Sign up for the newsletter.

[Stackoverflow](https://stackoverflow.com/questions/tagged/r) is a site for R problems and solutions. Try posting a problem yourself. You will often get an answer surprisingly quickly.

]

.pull-right5[

<p align="left" style="padding: 0 0px"><br><br><br><br><br><br><br><br><br>
<a href="https://www.google.com/"><img border="0" alt="W3Schools" src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/google.png" height="100"></a><br><br>
<a href="https://www.r-bloggers.com/"><img border="0" alt="W3Schools" src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/rbloggers.png" height="105" style="margin-bottom:10px"></a><br>
<a href="https://stackoverflow.com/"><img border="0" alt="W3Schools" src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/stackoverflow.png" height="105"></a>
</p>

]

---

# Help & Consulting

<table class="tg" style="cellspacing:0; cellpadding:0; border:none">
<col width="22%">
<col width="22%">
<col width="22%">
<tr>
<th class="tg-yw4l" valign='top'>
<p align="center"><br>
<a href="http://www.dirkwulff.org"><img border="0" alt="W3Schools" src="https://therbootcamp.github.io/img/team/1.png" height="230" style="border-radius:50%"></a><br>
<p align="center">
<font size = 5>Dr.
Dirk Wulff</font><br><br>
<a href="http://www.dirkwulff.org"><b>dirkwulff.org</b></a><br>
<a href="https://github.com/dwulff"><b>github.com/dwulff</b></a><br>
<font size=4><i>packages: </i></font>
<a href="https://cran.r-project.org/web/packages/cstab/index.html"><b>cstab</b></a>, <br>
<a href="https://cran.r-project.org/web/packages/mousetrap/index.html"><b>mousetrap</b></a>,
<a href="https://cran.r-project.org/web/packages/memnet/index.html"><b>memnet</b></a>,
<a href="https://github.com/dwulff/choicepp"><b>choicepp</b></a>
</p>
</th>
<th class="tg-yw4l" valign='top'><p align="center"><br>
<a href="https://psychologie.unibas.ch/en/persons/markus-steiner/about-me/"><img border="0" alt="W3Schools" src="https://therbootcamp.github.io/img/team/2.png" height="230" style="border-radius:50%"></a><br>
<p align="center">
<font size = 5>Markus Steiner</font><br><br>
<a href="https://github.com/mdsteiner"><b>github.com/mdsteiner</b></a><br>
<font size=4><i>packages: </i></font>
<a href="https://github.com/mdsteiner/ShinyPsych"><b>ShinyPsych</b></a>, <br>
<a href="https://github.com/mdsteiner/EFAdiff"><b>EFAdiff</b></a> <br><br>
</p>
</th>
<th class="tg-yw4l" valign='top'>
<p align="center"><br>
<a href="https://www.schulte-mecklenbeck.com/"><img border="0" alt="W3Schools" src="https://therbootcamp.github.io/img/team/3.png" height="230" style="border-radius:50%"></a><br>
<p align="center">
<font size = 5>Dr.
Michael Schulte-<br>Mecklenbeck</font><br><br>
<a href="https://www.schulte-mecklenbeck.com/"><b>schulte-mecklenbeck.com</b></a><br>
<a href="https://github.com/schultem"><b>github.com/schultem</b></a><br><br>
</p>
</th>
</tr>
</table>

---

# Advanced courses

<table class="tg" style="cellspacing:0; cellpadding:0; border:none;">
<col width="25%">
<col width="25%">
<col width="25%">
<col width="25%">
<tr valign="top">
<td>
<p align="center">
<a class="project-link" href="mailto:therbootcamp@gmail.com?subject=Preregistration for Statistics with R&body=I would like to preregister for the R Bootcamp on 'Data Mining mit R' on February 14/15 in Basel, Switzerland." align="center">
<font style="font-size:20px;weight:700"><br>Exploratory Data Analysis with R</font><br>
<br>
<img src="https://therbootcamp.github.io/img/courses/1.png" height="230px" style="border-radius:50%;border:10px solid #E9ECEF"></img><br><br>
<high>February 14/15, 2020</high><br><br><br>
</a>
</p>
</td>
<td>
<p align="center">
<a class="project-link" href="mailto:therbootcamp@gmail.com?subject=Preregistration for Statistics with R&body=I would like to preregister for the R Bootcamp on 'Statistik mit R' on März 13/14 in Basel, Switzerland." align="center">
<font style="font-size:20px;weight:700"><br>Statistics with R<br></font><br>
<br>
<img src="https://therbootcamp.github.io/img/courses/2.png" height="230px" style="border-radius:50%;border:10px solid #E9ECEF"></img><br><br>
<high>March 20/21, 2019</high><br><br><br>
</a>
</p>
</td>
<td>
<p align="center">
<a class="project-link" href="mailto:therbootcamp@gmail.com?subject=Preregistration for Statistics with R&body=I would like to preregister for the R Bootcamp on 'Statistik mit R' on March 13/14 in Basel, Switzerland."
align="center">
<font style="font-size:20px;weight:700"><br>Machine Learning with R</font><br>
<br>
<img src="https://therbootcamp.github.io/img/courses/3.png" height="230px" style="border-radius:50%;border:10px solid #E9ECEF"></img><br><br>
<high>April 3/4, 2019</high><br><br><br>
</a>
</p>
</td>
<td>
<p align="center">
<a class="project-link" href="mailto:therbootcamp@gmail.com?subject=Preregistration for Statistics with R&body=I would like to preregister for the R Bootcamp on 'Reporting mit R' on March 13/14 in Basel, Switzerland." align="center">
<font style="font-size:20px;weight:700"><br>Reporting with R<br></font><br>
<br>
<img src="https://therbootcamp.github.io/img/courses/4.png" height="230px" style="border-radius:50%;border:10px solid #E9ECEF"></img><br><br>
<high>May 15/16, 2019</high><br><br><br>
</a>
</p>
</td>
</tr>
</table>

---

.pull-left45[

# Please give us feedback

<br><br>

<p align = "center">
<img src="image/feedback.png" height=350px></img><br>
<font style="font-size:10px">from <a href="https://cdn-images-1.medium.com/max/1600/1*5OZNYAfzDZfM1lwJBZEuHQ.png">medium.com</a></font>
</p>

]

.pull-right45[

<p align="center"><br><br>
<iframe src="https://docs.google.com/forms/d/e/1FAIpQLScz5sd7Otys8BXmKLZn0OPxyr1Tw6ilWSQgxm6Zr6q48TD66g/viewform?embedded=true" width="430" height="550" frameborder="0" marginheight="0" marginwidth="0">Loading…</iframe></p>

]

---

class: center, middle

# Thank you!

and one more thing...