class: center, middle, inverse, title-slide

.title[
# Next steps
]
.author[
### Introduction to data analysis with R
The R Bootcamp
]
.date[
### June 2023
]

---

layout: true

<div class="my-footer"> <span style="text-align:center"> <span> <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/by-sa.png" height=14 style="vertical-align: middle"/> </span> <a href="https://therbootcamp.github.io/"> <span style="padding-left:82px"> <font color="#7E7E7E"> www.therbootcamp.com </font> </span> </a> <a href="https://therbootcamp.github.io/"> <font color="#7E7E7E"> Introduction to data analysis with R | June 2023 </font> </a> </span> </div>

---

.pull-left45[

<br><br><br><br><br>

# Congratulations

You now understand the basics of R!

]

.pull-right4[

<img src="image/schedule_fr.png" height="580" align="center" style="padding-top:16px">

]

---

# R can do much more

.pull-left5[

<font size = 6>
- `{tidyverse}` <br>
- Extended `{tidyverse}`<br>
- Web scraping<br>
- Text analysis<br>
- Statistics<br>
- Machine learning<br>
- Reporting<br>
- Websites<br>
</font>

]

.pull-right5[

<p align="center"><img border="0" alt="" src="" width="400px"></p>

]

---

# `tidyverse`

The [`tidyverse`](https://www.tidyverse.org/) is essentially a collection of powerful, user-friendly packages developed specifically to make data analysis more efficient.

1. `ggplot2` for graphics.
2. `dplyr` for data transformation.
3. `tidyr` for data cleaning.
4. `readr` for loading and saving data.
5. `purrr` for functional programming.
6. `tibble` for modern `data.frame`s.
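The core packages listed above are designed to work together. Here is a minimal sketch, not taken from the course materials, that assumes `tibble`, `tidyr`, `dplyr` and `purrr` are installed. It builds a small, made-up toy data set with `tibble`, reshapes it with `tidyr`, summarises it with `dplyr`, and uses `purrr` to apply a function to each column. The names `scores`, `person`, `test_1` and `test_2` are hypothetical.

```r
library(tibble) ; library(tidyr) ; library(dplyr) ; library(purrr)

# Hypothetical toy data (invented for illustration, not course data)
scores <- tibble(
  person = c("A", "B", "C"),
  test_1 = c(10, 12, 9),
  test_2 = c(14, 11, 13)
)

# tidyr: reshape from wide format (one column per test) to long format
scores_long <- scores %>%
  pivot_longer(cols = c(test_1, test_2),
               names_to = "test", values_to = "score")

# dplyr: mean score per test
test_means <- scores_long %>%
  group_by(test) %>%
  summarise(mean_score = mean(score))
test_means

# purrr: apply max() to every test column, returning a named numeric vector
test_max <- map_dbl(select(scores, -person), max)
test_max
```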
<br><br>

<img src="http://d33wubrfki0l68.cloudfront.net/0ab849ed51b0b866ef6895c253d3899f4926d397/dbf0f/images/hex-ggplot2.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/071952491ec4a6a532a3f70ecfa2507af4d341f9/c167c/images/hex-dplyr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/5f8c22ec53a1ac61684f3e8d59c623d09227d6b9/b15de/images/hex-tidyr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/66d3133b4a19949d0b9ddb95fc48da074b69fb07/7dfb6/images/hex-readr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/9221ddead578362bd17bafae5b85935334984429/37a68/images/hex-purrr.png" height="200px" /><img src="http://d33wubrfki0l68.cloudfront.net/f55c43407ae8944b985e2547fe868e5e2b3f9621/720bb/images/hex-tibble.png" height="200px" />

---

# `ggplot2`

.pull-left45[

```r
library(tidyverse) ; library(ggrepel)

# Load the data
tour <- read_csv('1_Data/Tourismus.csv')

# Create a plot with ggplot2
ggplot(data = tour,
       mapping = aes(x = Besucher, y = Dauer, label = Land)) +
  scale_x_continuous(trans = 'log2') +
  geom_point(size=2)
```

]

.pull-right45[

![](NaechsteSchritte_files/figure-html/unnamed-chunk-3-1.png)<!-- -->

]

---

# `ggplot2`

.pull-left45[

```r
library(tidyverse) ; library(ggrepel)

# Load the data
tour <- read_csv('1_Data/Tourismus.csv')

# Plot Dauer (length of stay) against Besucher (visitors)
ggplot(data = tour,
       mapping = aes(x = Besucher, y = Dauer, label = Land)) +
  scale_x_continuous(trans = 'log2') +
  geom_point(size=2) +
  geom_label_repel(size = 2, label.padding = 0.1) +
  theme_bw()
```

]

.pull-right45[

![](NaechsteSchritte_files/figure-html/unnamed-chunk-5-1.png)<!-- -->

]

---

# `ggplot2`

.pull-left45[

```r
library(tidyverse) ; library(ggrepel)

# Load the data
tour <- read_csv('1_Data/Tourismus.csv')

# Plot Dauer against Besucher, with one panel per Region
ggplot(data = tour,
       mapping = aes(x = Besucher, y = Dauer, label = Land)) +
  scale_x_continuous(trans = 'log2') +
  geom_point(size=2) +
  geom_label_repel(size = 2, label.padding = 0.1) +
  theme_bw() +
  facet_grid(Region ~ .)
```

]

.pull-right45[

![](NaechsteSchritte_files/figure-html/unnamed-chunk-7-1.png)<!-- -->

]

---

# `dplyr`

.pull-left45[

```r
library(tidyverse)

# Load the data
tour <- read_csv('1_Data/Tourismus.csv')

# Top 10 countries (Land) by Nachte (nights)
tour %>%
  mutate(Nachte = Besucher * Dauer) %>%
  arrange(desc(Nachte)) %>%
  select(Land, Nachte) %>%
  top_n(10)
```

]

.pull-right45[

```
## # A tibble: 10 x 2
##    Land                   Nachte
##    <chr>                   <dbl>
##  1 Deutschland             18059
##  2 Vereinigte Staaten      15353
##  3 Vereinigtes Königreich   7981
##  4 Frankreich               5288
##  5 Italien                  3224
##  6 Spanien                  2294
##  7 Niederlande              2069
##  8 Kanada                   2006
##  9 Österreich               1683
## 10 Indien                   1622
```

]

---

# `dplyr`

.pull-left45[

```r
library(tidyverse)

# Load the data
tour <- read_csv('1_Data/Tourismus.csv')

# Mean and total Nachte (nights) per Region
tour %>%
  mutate(Nachte = Besucher * Dauer) %>%
  group_by(Region) %>%
  summarize(
    Nachte_mittel = mean(Nachte),
    Nachte_summe = sum(Nachte)
  )
```

]

.pull-right45[

```
## # A tibble: 5 x 3
##   Region     Nachte_mittel Nachte_summe
##   <chr>              <dbl>        <dbl>
## 1 Afrika              259          1036
## 2 Amerika            2435.        19479
## 3 Asien               464.         9278
## 4 Australien          580          1160
## 5 Europa             1415.        52353
```

]

---

# `dplyr` + `ggplot2`

.pull-left45[

```r
library(tidyverse) ; library(ggrepel)

# Load the data
tour <- read_csv("1_Data/Tourismus.csv")
europa <- read_csv("1_Data/Europa.csv")

# Join Nachte (nights) with Aquivalenzeinkommen (equivalised income) and plot
tour %>%
  mutate(Nachte = Besucher * Dauer) %>%
  left_join(europa) %>%
  ggplot(aes(x = Aquivalenzeinkommen, y = Nachte, label = Land)) +
  scale_y_continuous(trans = 'log2') +
  geom_point() +
  geom_label_repel(size = 2) +
  theme_bw()
```

]

.pull-right45[

![](NaechsteSchritte_files/figure-html/unnamed-chunk-13-1.png)<!-- -->

]

---

# Extended `tidyverse`

The [`tidyverse`](https://www.tidyverse.org/) ecosystem also includes other powerful, user-friendly packages that complement the core tidyverse.

1. `xml2` for processing XML and HTML files.
2. `rvest` for web scraping.
3. `haven` for processing SPSS, SAS and Stata data.
4. `readxl` for reading Excel data.
5. `lubridate` for dates and times.
6. `tidytext` for text data.

<br><br>

<img src="https://github.com/rstudio/hex-stickers/raw/master/PNG/rvest.png" height="200px" /><img src="https://github.com/rstudio/hex-stickers/raw/master/PNG/haven.png" height="200px" /><img src="https://github.com/rstudio/hex-stickers/raw/master/PNG/readxl.png" height="200px" /><img src="https://github.com/rstudio/hex-stickers/raw/master/PNG/lubridate.png" height="200px" />

---

# Web Scraping

```r
library(rvest) ; library(dplyr) ; library(tidytext)

# Load the Wikipedia page on R and split its text into words
read_html("https://en.wikipedia.org/wiki/R_(programming_language)") %>%
  html_text() %>%
  tibble() %>%
  unnest_tokens(word, ".")
```

```
## # A tibble: 9,543 x 1
##    word
##    <chr>
##  1 r
##  2 programming
##  3 language
##  4 wikipediadocument.documentelement.classname
##  5 client
##  6 js
##  7 vector
##  8 feature
##  9 language
## 10 in
## # ... with 9,533 more rows
```

---

# Text analysis

.pull-left5[

```r
library(rvest) ; library(dplyr)
library(stringr) ; library(tidytext)

# Count words
read_html("https://en.wikipedia.org/wiki/R_(programming_language)") %>%
  html_text() %>%
  tibble() %>%
  unnest_tokens(word, ".") %>%
  filter(!str_detect(word, '[:digit:]')) %>%  # drop tokens containing digits
  anti_join(stop_words) %>%                   # drop common English stop words
  count(word, sort = TRUE) %>%
  top_n(100)
```

]

.pull-right4[

```
## # A tibble: 107 x 2
##    word            n
##    <chr>       <int>
##  1 output        200
##  2 mw            197
##  3 parser        197
##  4 retrieved      90
##  5 data           60
##  6 programming    51
##  7 hlist          49
##  8 navbox         48
##  9 december       44
## 10 language       42
## # ... with 97 more rows
```

]

---

# Word cloud

.pull-left5[

```r
library(rvest) ; library(dplyr) ; library(tidytext)
library(wordcloud) ; library(stringr)

# Count words
counts <- read_html("https://en.wikipedia.org/wiki/R_(programming_language)") %>%
  html_text() %>%
  tibble() %>%
  unnest_tokens(word, ".") %>%
  filter(!str_detect(word, '[:digit:]')) %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  top_n(100)

# Word cloud (plot margins removed first)
par(mar=c(0,0,0,0))
wordcloud(counts$word, counts$n)
```

]

.pull-right4[

![](NaechsteSchritte_files/figure-html/unnamed-chunk-20-1.png)<!-- -->

]

---

# Statistics

.pull-left45[

```r
library(tidyverse)

# Load the data
tour <- read_csv('1_Data/Tourismus.csv')
land <- read_csv('1_Data/Lander.csv')

# Join the data
data <- tour %>%
  inner_join(land) %>%
  mutate(Nachte = Besucher * Dauer)

# Linear regression
model <- lm(Nachte ~ Bevolkerung + Dichte + BIP,
            data = data)
```

]

.pull-right45[

```r
# Coefficient of determination (R²)
summary(model)$r.squared
```

```
## [1] 0.05719
```

```r
# Coefficients (standard error column dropped)
summary(model)$coef[,-2]
```

```
##               Estimate  t value Pr(>|t|)
## (Intercept)  1.213e+01  0.01294   0.9897
## Bevolkerung  2.379e-06  1.36594   0.1775
## Dichte      -3.381e-01 -0.82598   0.4124
## BIP          2.988e+01  1.50779   0.1373
```

]

---

.pull-left45[

# Machine Learning

```r
library(tidyverse) ; library(rpart)

# Load the data
tour <- read_csv('1_Data/Tourismus.csv')
land <-
  read_csv('1_Data/Lander.csv')

# Join the data
data <- tour %>%
  inner_join(land) %>%
  mutate(Nachte = Besucher * Dauer)

# Regression tree predicting Besucher
rpart(formula = Besucher ~ Bevolkerung + Dichte + BIP,
      data = data)
```

]

.pull-right45[

<br><br>

![](NaechsteSchritte_files/figure-html/unnamed-chunk-25-1.png)<!-- -->

]

---

# Reporting

R and RStudio also provide excellent tools for creating <high>reports</high>, <high>slides</high>, and even <high>websites</high>.

1. `rmarkdown` for dynamically generated documents, such as PDF reports.
2. `xaringan` for slides.
3. `shiny` for interactive websites and dashboards.

<br><br>

<img src="https://github.com/rstudio/hex-stickers/raw/master/PNG/rmarkdown.png" height="200px" /><img src="https://github.com/rstudio/hex-stickers/raw/master/PNG/xaringan.png" height="200px" /><img src="https://github.com/rstudio/hex-stickers/raw/master/PNG/shiny.png" height="200px" />

---

.pull-left45[

# `rmarkdown`

<p align="left"><img style="height:440px" src="image/rmarkdown_fr.png"></p>

]

.pull-right5[

<br><br>

<p align="center"><img style="height:510px;box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="image/pdf_fr.png"></p>

]

---

# Websites

<iframe width="1000" height="600" src="https://vac-lshtm.shinyapps.io/ncov_tracker/?_ga=2.157815026.975657143.1601587486-2064892133.1598629448" frameborder="0" allowfullscreen></iframe>

---

# Next steps

.pull-left5[

<font size = 6>
1. Apply<br>
2. Books<br>
3. Web resources<br>
4. Support and consulting<br>
5. Advanced courses<br>
</font>

]

<br>

.pull-right5[

<p align="center"><img border="0" alt="W3Schools" src="http://www.theunmanageableemployee.com/wp-content/uploads/2011/07/5cs-stones-cropped-11052730.jpg?w=112"></p>

]

---

# Books

Here is a partial list of good books on R, roughly ordered by the level of experience they assume.<br><br>

<table width="80%" style="cellspacing:0; cellpadding:0; border:none;"> <tr> <td> <a href="http://r4ds.had.co.nz/"><img border="0" alt="W3Schools" src="http://r4ds.had.co.nz/cover.png" height="180"></a> </td> <td> <a href="https://covers.oreillystatic.com/images/0636920028574/cat.gif"><img border="0" alt="W3Schools" src="https://covers.oreillystatic.com/images/0636920028574/cat.gif" height="180"></a> </td> <td> <a href="https://ggplot2-book.org/"><img border="0" alt="W3Schools" src="https://images-na.ssl-images-amazon.com/images/I/31uoy-qmhEL._SX331_BO1,204,203,200_.jpg" height="180"></a> </td> <td> <a href="https://www.springer.com/de/book/9783540799979"><img border="0" alt="W3Schools" src="https://media.springernature.com/w306/springer-static/cover-hires/book/978-3-540-79998-6" height="180" ></a> </td> <td> <a href="https://bookdown.org/ndphillips/YaRrr/"><img border="0" alt="W3Schools" src="https://bookdown.org/ndphillips/YaRrr/images/YaRrr_Cover.jpg" height="180" ></a> </td> <td> <a href="https://www.orellfuessli.ch/shop/home/artikeldetails/ID35367941.html?ProvID=10917736&gclid=Cj0KCQiAg_HhBRDNARIsAGHLV5238Q26gQmFttHRnYGjcAhz4CslStb-3qBegvuZS5gnCpWSLNlQvF0aAgfOEALw_wcB"><img border="0" alt="W3Schools" src="https://assets.thalia.media/img/35367941-00-00.jpg" height="180" ></a> </td> </tr> <tr style="background-color:#ffffff"> <td> <a href="http://appliedpredictivemodeling.com/"><img border="0" alt="W3Schools" src="http://static1.squarespace.com/static/51156277e4b0b8b2ffe11c00/t/51157487e4b0b8b2ffe16829/1509217882069/?format=1500w" height="180" ></a> </td> <td> <a
href="http://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf"><img border="0" alt="W3Schools" src="https://images-na.ssl-images-amazon.com/images/I/41EaH4W9LVL._SX332_BO1,204,203,200_.jpg" height="180" ></a> </td> <td> <a href="https://www.manning.com/books/deep-learning-with-r"><img border="0" alt="W3Schools" src="https://images-na.ssl-images-amazon.com/images/I/51h5d4dYaoL._SX396_BO1,204,203,200_.jpg" height="180" ></a> </td> <td> <a href="https://csgillespie.github.io/efficientR/"><img border="0" alt="W3Schools" src="https://csgillespie.github.io/efficientR/figures/f0_web.png" height="180" ></a> </td> <td> <a href="http://www.rcpp.org/"><img border="0" alt="W3Schools" src="https://images-na.ssl-images-amazon.com/images/I/31Y8QSFEMJL._SX328_BO1,204,203,200_.jpg" height="180" ></a> </td> <td> <a href="http://adv-r.had.co.nz/"><img border="0" alt="W3Schools" src="https://images-na.ssl-images-amazon.com/images/I/41RJLhZ32VL._SX329_BO1,204,203,200_.jpg" height="180" ></a> </td> </tr> </table>

<br>

---

.pull-left4[

# Web resources

The web is probably the best place to find information about R.

<font size = 4><i>Naturally, via a search engine:</i></font>

For example, search with [Google](https://www.google.com) or [DuckDuckGo](https://duckduckgo.com), and make sure *R* or *Rproject* appears in your query.

<font size = 4><i>You will often be redirected to:</i></font>

[R-bloggers](https://www.r-bloggers.com): news about the latest developments in R (includes a newsletter).

[Stack Overflow tag R](https://stackoverflow.com/questions/tagged/r): questions about R problems, with solutions proposed by other users. Try posting a problem yourself; you will often get answers surprisingly quickly.
]

.pull-right5[

<p align="left" style="padding: 0 0px"><br><br><br><br><br><br><br><br><br> <a href="https://www.google.com/"><img border="0" alt="W3Schools" src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/google.png" height="100"></a><br><br><br> <a href="https://www.r-bloggers.com/"><img border="0" alt="W3Schools" src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/rbloggers.png" height="105" style="margin-bottom:10px"></a><br> <a href="https://stackoverflow.com/"><img border="0" alt="W3Schools" src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/stackoverflow.png" height="105"></a> </p>

]

---

# Support and consulting

<table class="tg" style="cellspacing:0; cellpadding:0; border:none"> <col width="22%"> <col width="22%"> <col width="22%"> <col width="22%"> <tr> <th class="tg-yw4l" valign='top'> <p align="center"><br> <a href="http://www.dirkwulff.org"><img border="0" alt="W3Schools" src="https://therbootcamp.github.io/img/team/1.png" height="230" style="border-radius:50%"></a><br> <p align="center"> <font size = 5>Dr.
Dirk Wulff</font><br><br> <a href="http://www.dirkwulff.org"><b>dirkwulff.org</b></a><br> <a href="https://github.com/dwulff"><b>github.com/dwulff</b></a><br> <font size=4><i>packages: </i></font> <a href="https://cran.r-project.org/web/packages/cstab/index.html"><b>cstab</b></a>, <a href="https://github.com/dwulff/text2sdg"><b>text2sdg</b></a>, <br> <a href="https://cran.r-project.org/web/packages/mousetrap/index.html"><b>mousetrap</b></a>, <a href="https://cran.r-project.org/web/packages/memnet/index.html"><b>memnet</b></a>, <a href="https://github.com/dwulff/choicepp"><b>choicepp</b></a> </p> </th> <th class="tg-yw4l" valign='top'><p align="center"><br> <a href="https://psychologie.unibas.ch/en/persons/markus-steiner/about-me/"><img border="0" alt="W3Schools" src="https://therbootcamp.github.io/img/team/2.png" height="230" style="border-radius:50%"></a><br> <p align="center"> <font size = 5>Markus Steiner</font><br><br> <a href="https://github.com/mdsteiner"><b>github.com/mdsteiner</b></a><br> <font size=4><i>packages: </i></font> <a href="https://github.com/mdsteiner/ShinyPsych"><b>ShinyPsych</b></a>, <br> <a href="https://github.com/mdsteiner/EFAdiff"><b>EFAdiff</b></a> <br><br> </p> </th> <th class="tg-yw4l" valign='top'> <p align="center"> <p align="center"><br> <a href="https://www.schulte-mecklenbeck.com/"><img border="0" alt="W3Schools" src="https://therbootcamp.github.io/img/team/3.png" height="230" style="border-radius:50%"></a><br> <p align="center"> <font size = 5>Dr.
Michael Schulte-<br>Mecklenbeck</font><br><br> <a href="http://www.schulte-mecklenbeck.com"><b>schulte-mecklenbeck.com</b></a><br> <a href="https://github.com/schultem"><b>github.com/schultem</b></a><br><br> </p> </th> <th class="tg-yw4l" valign='top'> <p align="center"> <p align="center"><br> <a href="https://www.joao-ramalho.ch/"><img border="0" alt="W3Schools" src="https://therbootcamp.github.io/img/team/4.png" height="230" style="border-radius:50%"></a><br> <p align="center"> <font size = 5>João Ramalho</font><br><br> <a href="https://www.joao-ramalho.ch/"><b>joao-ramalho.ch</b></a><br> <font size=4><i>book/package: </i></font> <a href="https://j-ramalho.github.io/industRial/"><b>industRial</b></a> <br> </p> </th> </tr> </table>

---

# Advanced courses

(French adaptation in preparation)

<table class="tg" style="cellspacing:0; cellpadding:0; border:none;"> <tr valign="top"> <col width="20%"> <col width="20%"> <col width="20%"> <col width="20%"> <col width="20%"> <td> <p align="center"> <a class="project-link" href="https://therbootcamp.github.io/#courses" align="center"> <font style="font-size:20px;weight:700"><br>Einführung in die moderne Datenanalyse mit R</font><br> <br> <img src="https://therbootcamp.github.io/img/courses/0.png" height="180px" style="border-radius:50%;border:10px solid #E9ECEF"></img><br><br> <high><br></high><br><br><br><br> </a> </p> </td> <td> <p align="center"> <a class="project-link" href="https://therbootcamp.github.io/#courses" align="center"> <font style="font-size:20px;weight:700"><br>Explorative Datenanalyse mit R<br></font><br> <br> <img src="https://therbootcamp.github.io/img/courses/1.png" height="180px" style="border-radius:50%;border:10px solid #E9ECEF"></img><br><br> <high><br></high><br><br><br><br> </a> </p> </td> <td> <p align="center"> <a class="project-link" href="https://therbootcamp.github.io/#courses" align="center"> <font style="font-size:20px;weight:700"><br>Statistik mit R<br><br></font><br> <br> <img
src="https://therbootcamp.github.io/img/courses/2.png" height="180px" style="border-radius:50%;border:10px solid #E9ECEF"></img><br><br> <high><br></high><br><br><br><br> </a> </p> </td> <td> <p align="center"> <a class="project-link" href="https://therbootcamp.github.io/#courses" align="center"> <font style="font-size:20px;weight:700"><br>Maschinelles Lernen mit R<br></font><br> <br> <img src="https://therbootcamp.github.io/img/courses/3.png" height="180px" style="border-radius:50%;border:10px solid #E9ECEF"></img><br><br> <high><br></high><br><br><br><br> </a> </p> </td> <td> <p align="center"> <a class="project-link" href="https://therbootcamp.github.io/#courses" align="center"> <font style="font-size:20px;weight:700"><br>Reporting mit R<br><br></font><br> <br> <img src="https://therbootcamp.github.io/img/courses/4.png" height="180px" style="border-radius:50%;border:10px solid #E9ECEF"></img><br><br> <high><br></high><br><br><br><br> </a> </p> </td> </tr> </table>

---

.pull-left45[

# Thank you for your feedback

<br><br>

<p align = "center">
<img src="image/feedback.png" height=350px></img><br>
<font style="font-size:10px">from <a href="https://cdn-images-1.medium.com/max/1600/1*5OZNYAfzDZfM1lwJBZEuHQ.png">medium.com</a></font>
</p>

]

.pull-right45[

<p align="center"><br><br></p>

]

---

class: center, middle

# Thank you!