class: center, middle, inverse, title-slide

# Plotting
### Exploratory Data Analysis with R
### The R Bootcamp @ CSS
### December 2019

---

layout: true

<div class="my-footer">
<span style="text-align:center">
<span>
<img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/by-sa.png" height=14 style="vertical-align: middle"/>
</span>
<a href="https://therbootcamp.github.io/">
<span style="padding-left:82px">
<font color="#7E7E7E">
www.therbootcamp.com
</font>
</span>
</a>
<a href="https://therbootcamp.github.io/">
<font color="#7E7E7E">
Exploratory Data Analysis with R @ CSS | December 2019
</font>
</a>
</span>
</div>

---

<p align="center">
<br>
<img src="image/plots.png" height="540px">
<br>
<font style="font-size:10px">from <a href="http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html">r-statistics.co</a></font>
</p>

---

.pull-left4[

# `base` Plotting

<ul>
<li class="m1"><span>The <high>traditional</high> approach to plotting.</span></li>
<li class="m2"><span><high>Separate functions</high> for different plots.</span></li>
</ul>

<br>

```r
# Histogram of age (column alter) in base R
hist(x = basel$alter,
     xlab = "Alter",
     ylab = "Häufigkeit",
     main = "Histogramm Alter")
```

]

.pull-right5[

<br><br><br>

<img src="Plotting_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

]

---

.pull-left4[

# `base` Plotting

<ul>
<li class="m1"><span>The <high>traditional</high> approach to plotting.</span></li>
<li class="m2"><span><high>Separate functions</high> for different plots.</span></li>
</ul>

<br>

```r
# Boxplot of height (groesse) by gender (geschlecht) in base R
boxplot(formula = groesse ~ geschlecht,
        data = basel,
        xlab = "Geschlecht",
        ylab = "Groesse",
        main = "Box plot Groesse")
```

]

.pull-right5[

<br><br><br>

<img src="Plotting_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

]

---

.pull-left4[

# `base` Plotting

<ul>
<li class="m1"><span>The <high>traditional</high> approach to plotting.</span></li>
<li class="m2"><span><high>Separate functions</high> for different plots.</span></li>
</ul>

<br>

```r
# Scatterplot of height (groesse) against income (einkommen) in base R
plot(x = basel$groesse,
     y = basel$einkommen,
     xlab = "Height",
     ylab = "Einkommen",
     main = "Scatterplot Groesse x Einkommen")
```

]

.pull-right5[

<br><br><br>

<img src="Plotting_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

]

---

# Problems with Base R plotting

.pull-left4[

<ul>
<li class="m1"><span>The <high>aesthetics</high> look dated.</span></li>
<li class="m2"><span>Nicer plots = <high>a lot of code.</high></span></li>
<li class="m3"><span>Generally very <high>inflexible.</high></span></li>
</ul>

]

.pull-right5[

<p align="center">
<br>
<img src="image/outdated.jpeg">
<br>
<font style="font-size:10px">from <a href="https://www.healthhosts.com/4-signs-your-website-is-outdated/">healthhosts.com</a></font>
</p>

]

---

# The mighty `tidyverse`

The [`tidyverse`](https://www.tidyverse.org/) is, at its core, a collection of high-performance, user-friendly packages designed specifically to make data analysis more efficient.

1. <high><mono>ggplot2</mono> for graphics</high>.
2. <mono>dplyr</mono> for data manipulation.
3. <mono>tidyr</mono> for tidying and reshaping data.
4. `readr` for data I/O.
5. `purrr` for functional programming.
6. `tibble` for modern `data.frame`s.
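
As a minimal sketch, assuming the packages have already been installed once (for example with `install.packages("tidyverse")`), loading the meta-package attaches the packages listed above. The `mpg` data set used in the rest of these slides ships with `ggplot2`.

```r
# Assumption: the tidyverse packages are already installed on this machine
library(tidyverse)  # attaches ggplot2, dplyr, tidyr, readr, purrr, tibble, ...

# mpg is an example data set that comes with ggplot2
mpg

# Compact overview of all columns and their types
glimpse(mpg)
```

The columns used later in the plots are `displ` (engine displacement), `hwy` (highway miles per gallon) and `class` (vehicle class).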
<br><br>

<table style="cellspacing:0; cellpadding:0; border:none;">
<col width="15%">
<col width="15%">
<col width="15%">
<col width="15%">
<col width="15%">
<col width="15%">
<tr>
<td bgcolor="white"> <img src="image/hex-ggplot2.png" height=160px></img> </td>
<td bgcolor="white"> <img src="image/hex-dplyr.png" height=160px style="opacity:.2"></img> </td>
<td bgcolor="white"> <img src="image/hex-tidyr.png" height=160px style="opacity:.2"></img> </td>
<td bgcolor="white"> <img src="image/hex-readr.png" height=160px style="opacity:.2"></img> </td>
<td bgcolor="white"> <img src="image/hex-purrr.png" height=160px style="opacity:.2"></img> </td>
<td bgcolor="white"> <img src="image/hex-tibble.png" height=160px style="opacity:.2"></img> </td>
</tr>
</table>

---

# Modular graphics in <mono>ggplot2</mono>

.pull-left45[

<ul>
<li class="m1"><span><high>data</high>: The data set</span></li>
<li class="m2"><span><high>mapping</high>: The basic structure of the plot
<br><br>
<ul class="level">
<li><span>What goes on the axes?</span></li>
<li><span>What should size/color represent?</span></li>
</ul>
</span></li>
<li class="m3"><span><high>geoms</high>: Objects in the plot</span></li>
<li class="m4"><span><high>labs</high>: Annotation of the plot</span></li>
<li class="m5"><span><high>themes</high>: Visual styling</span></li>
<li class="m6"><span><high>facets</high>: Splitting the plot into panels</span></li>
<li class="m7"><span><high>scales</high>: Scaling of the axes</span></li>
</ul>

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

]

---

# Goal: create this plot

.pull-left45[

<ul>
<li class="m1"><span><high>data</high>
<br>
<ul class="level">
<li><span>The <mono>mpg</mono> data set</span></li>
</ul>
</span></li>
<li class="m2"><span><high>mapping</high>
<br>
<ul class="level">
<li><span>Engine displacement on the x-axis</span></li>
<li><span>Miles per gallon on the y-axis</span></li>
<li><span>Color of the objects by vehicle class</span></li>
</ul>
</span></li>
<li class="m3"><span><high>geoms</high>
<br>
<ul class="level">
<li><span>Data as points</span></li>
<li><span>Add a regression line</span></li>
</ul>
</span></li>
<li class="m4"><span><high>labs</high>
<br>
<ul class="level">
<li><span>Axis labels and title</span></li>
</ul>
</span></li>
<li class="m5"><span><high>themes</high>
<br>
<ul class="level">
<li><span>Black-and-white look</span></li>
</ul>
</span></li>
</ul>

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

]

---

# `ggplot()`

.pull-left45[

<ul>
<li class="m1"><span>Every plot starts with <highm>ggplot()</highm></span></li>
<li class="m2"><span>Two key arguments
<br><br>
<ul class="level">
<li><span><mono>data</mono> | The data set (<mono>tibble</mono>)</span></li>
<li><span><mono>mapping</mono> | The structure, defined with <mono>aes()</mono></span></li>
</ul>
</span></li>
</ul>

]

.pull-right45[

```r
ggplot(data = mpg)
```

<img src="Plotting_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

]

---

# `aes()`

.pull-left45[

<ul>
<li class="m1"><span><highm>aes()</highm> defines the structure for the <mono>mapping</mono> argument.</span></li>
<li class="m2"><span>Key arguments:
<br><br>
<ul class="level">
<li><span><mono>x,y</mono> | Sets the axes</span></li>
<li><span><mono>color,fill</mono> | Sets colors</span></li>
<li><span><mono>alpha</mono> | Sets transparency</span></li>
<li><span><mono>size</mono> | Sets sizes</span></li>
<li><span><mono>shape</mono> | Sets object types (e.g. circles or squares)</span></li>
</ul>
</span></li>
</ul>

]

.pull-right45[

```r
ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy))
```

<img src="Plotting_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />

]

---

# <mono>+</mono>

.pull-left45[

<ul>
<li class="m1"><span>The <highm>+</highm> operator adds any number of further plot elements to the plot.</span></li>
</ul>

<br>

```r
ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy)) +
  # Plot the data as points
  geom_point()
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />

]

---

# `geom_*()`

.pull-left45[

<ul>
<li class="m1"><span><highm>geom_*()</highm> functions define the geometric objects used to represent the data.</span></li>
<li class="m2"><span>A few <mono>geoms</mono>:
<br><br>
<ul class="level">
<li><span><mono>geom_point()</mono> | for points</span></li>
<li><span><mono>geom_bar()</mono> | for bars</span></li>
<li><span><mono>geom_boxplot()</mono> | for box plots </span></li>
<li><span><mono>geom_count()</mono> | for points scaled by frequency</span></li>
<li><span><mono>geom_smooth()</mono> | for curves</span></li>
</ul>
</span></li>
</ul>

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

]

---

.pull-left45[

<br>

## `geom_count()`

<br>

```r
ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy)) +
  geom_count()
```

<img src="Plotting_files/figure-html/unnamed-chunk-16-1.png" width="100%" style="display: block; margin: auto;" />

]

.pull-right45[

<br>

## `geom_bar()`

<br>

```r
ggplot(data = mpg,
       mapping = aes(x = class)) +
  geom_bar()
```

<img src="Plotting_files/figure-html/unnamed-chunk-17-1.png" width="100%" style="display: block; margin: auto;" />

]

---

.pull-left45[

<br>

## `geom_boxplot()`

<br>

```r
ggplot(data = mpg,
       mapping = aes(x = class,
                     y = hwy)) +
  geom_boxplot()
```

<img src="Plotting_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" />

]

.pull-right45[

<br>

## `geom_violin()`

<br>

```r
ggplot(data = mpg,
       mapping = aes(x = class,
                     y = hwy)) +
  geom_violin()
```

<img src="Plotting_files/figure-html/unnamed-chunk-19-1.png" width="100%" style="display: block; margin: auto;" />

]

---

# What's missing?

.pull-left45[

<img src="Plotting_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" />

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" />

]

---

# `aes()`

.pull-left45[

<ul>
<li class="m1"><span><highm>aes()</highm> defines the structure for the <mono>mapping</mono> argument.</span></li>
<li class="m2"><span>Key arguments:
<br><br>
<ul class="level">
<li><span><mono>x,y</mono> | Sets the axes</span></li>
<li><span><high><mono>color,fill</mono> | Sets colors</high></span></li>
</ul>
</span></li>
</ul>

```r
ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     # Color by class
                     color = class)) +
  geom_point()
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" />

]

---

# What's missing?
.pull-left45[

<img src="Plotting_files/figure-html/unnamed-chunk-24-1.png" style="display: block; margin: auto;" />

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-25-1.png" style="display: block; margin: auto;" />

]

---

# `geom_smooth()`

.pull-left45[

<ul>
<li class="m1"><span>With <highm>geom_smooth()</highm>, curves can be fitted to the data.</span></li>
<li class="m2"><span>Key arguments:
<br><br>
<ul class="level">
<li><span><mono>method</mono> | Type of fit</span></li>
<li><span><mono>color</mono> | Color</span></li>
</ul>
</span></li>
</ul>

```r
ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     col = class)) +
  geom_point() +
  # Add a curve
  geom_smooth(col = "blue")
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-27-1.png" style="display: block; margin: auto;" />

]

---

# `geom_smooth()`

.pull-left45[

<ul>
<li class="m1"><span>With <highm>geom_smooth()</highm>, fitted curves can be added to the plot.</span></li>
<li class="m2"><span>Key arguments:
<br><br>
<ul class="level">
<li><span><mono>method</mono> | Type of fit</span></li>
<li><span><mono>color</mono> | Color</span></li>
</ul>
</span></li>
</ul>

```r
ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     col = class)) +
  geom_point() +
  # Add a curve (linear model fit)
  geom_smooth(col = "blue",
              method = "lm")
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-29-1.png" style="display: block; margin: auto;" />

]

---

# Inheritance

.pull-left45[

<ul>
<li class="m1"><span><highm>geom</highm>s inherit their properties from <mono>mapping</mono>.</span></li>
<li class="m2"><span>Inherited properties can be <high>overridden</high> by a geom's own arguments.</span></li>
</ul>

```r
ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     col = class)) +
  geom_point() +
  geom_smooth()
```

]

.pull-right45[

<p align="center">
<br>
<img src="image/question.png" height=280px>
<br>
<font style="font-size:10px">from <a href="http://catchingfire.ca/the-power-of-a-question-mark/">catchingfire.ca</a></font>
</p>

]

---

# Inheritance

.pull-left45[

<ul>
<li class="m1"><span><highm>geom</highm>s inherit their properties from <mono>mapping</mono>.</span></li>
<li class="m2"><span>Inherited properties can be <high>overridden</high> by a geom's own arguments.</span></li>
</ul>

```r
ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     col = class)) +
  geom_point() +
  geom_smooth()
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-32-1.png" style="display: block; margin: auto;" />

]

---

# What's missing?

.pull-left45[

<img src="Plotting_files/figure-html/unnamed-chunk-33-1.png" style="display: block; margin: auto;" />

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-34-1.png" style="display: block; margin: auto;" />

]

---

# `labs()`

.pull-left45[

<ul>
<li class="m1"><span>With <highm>labs()</highm>, all parts of the plot can be annotated.</span></li>
<li class="m2"><span>Key arguments:
<br><br>
<ul class="level">
<li><span><mono>x,y</mono> | Axis labels</span></li>
<li><span><mono>title, subtitle</mono> | Title and subtitle</span></li>
<li><span><mono>caption</mono> | Caption</span></li>
</ul>
</span></li>
</ul>

```r
# ggplot(...) stands for the plot built so far
ggplot(...) +
  labs(x = "Hubraum in Litern",
       y = "Autobahn Meilen pro Gallone",
       title = "MPG Datensatz",
       subtitle = "Autos mit mehr Hub...",
       caption = "Quelle: mpg Datensatz...")
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-36-1.png" style="display: block; margin: auto;" />

]

---

# What's missing?
.pull-left45[

<img src="Plotting_files/figure-html/unnamed-chunk-37-1.png" style="display: block; margin: auto;" />

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-38-1.png" style="display: block; margin: auto;" />

]

---

# Formatting with `theme_*()`

.pull-left45[

<ul>
<li class="m1"><span>The <highm>theme_*()</highm> functions style all aspects of a plot with a predefined set of settings.</span></li>
<li class="m2"><span>Some <mono>theme</mono>s:
<br><br>
<ul class="level">
<li><span><mono>theme_gray()</mono></span></li>
<li><span><mono>theme_classic()</mono></span></li>
<li><span><mono>theme_void()</mono></span></li>
<li><span><mono>theme_excel() (<mono>ggthemes</mono>)</mono></span></li>
<li><span><mono>theme_economist() (<mono>ggthemes</mono>)</mono></span></li>
<li><span><mono>theme_bw()</mono></span></li>
</ul>
</span></li>
</ul>

]

.pull-right45[

```r
ggplot(...) +
  theme_gray()
```

<img src="Plotting_files/figure-html/unnamed-chunk-40-1.png" style="display: block; margin: auto;" />

]

---

# Formatting with `theme_*()`

.pull-left45[

<ul>
<li class="m1"><span>The <highm>theme_*()</highm> functions style all aspects of a plot with a predefined set of settings.</span></li>
<li class="m2"><span>Some <mono>theme</mono>s:
<br><br>
<ul class="level">
<li><span><mono>theme_gray()</mono></span></li>
<li><span><mono>theme_classic()</mono></span></li>
<li><span><mono>theme_void()</mono></span></li>
<li><span><mono>theme_excel() (<mono>ggthemes</mono>)</mono></span></li>
<li><span><mono>theme_economist() (<mono>ggthemes</mono>)</mono></span></li>
<li><span><mono>theme_bw()</mono></span></li>
</ul>
</span></li>
</ul>

]

.pull-right45[

```r
ggplot(...) +
  theme_classic()
```

<img src="Plotting_files/figure-html/unnamed-chunk-42-1.png" style="display: block; margin: auto;" />

]

---

# Formatting with `theme_*()`

.pull-left45[

<ul>
<li class="m1"><span>The <highm>theme_*()</highm> functions style all aspects of a plot with a predefined set of settings.</span></li>
<li class="m2"><span>Some <mono>theme</mono>s:
<br><br>
<ul class="level">
<li><span><mono>theme_gray()</mono></span></li>
<li><span><mono>theme_classic()</mono></span></li>
<li><span><mono>theme_void()</mono></span></li>
<li><span><mono>theme_excel() (<mono>ggthemes</mono>)</mono></span></li>
<li><span><mono>theme_economist() (<mono>ggthemes</mono>)</mono></span></li>
<li><span><mono>theme_bw()</mono></span></li>
</ul>
</span></li>
</ul>

]

.pull-right45[

```r
ggplot(...) +
  theme_void()
```

<img src="Plotting_files/figure-html/unnamed-chunk-44-1.png" style="display: block; margin: auto;" />

]

---

# Formatting with `theme_*()`

.pull-left45[

<ul>
<li class="m1"><span>The <highm>theme_*()</highm> functions style all aspects of a plot with a predefined set of settings.</span></li>
<li class="m2"><span>Some <mono>theme</mono>s:
<br><br>
<ul class="level">
<li><span><mono>theme_gray()</mono></span></li>
<li><span><mono>theme_classic()</mono></span></li>
<li><span><mono>theme_void()</mono></span></li>
<li><span><mono>theme_excel() (<mono>ggthemes</mono>)</mono></span></li>
<li><span><mono>theme_economist() (<mono>ggthemes</mono>)</mono></span></li>
<li><span><mono>theme_bw()</mono></span></li>
</ul>
</span></li>
</ul>

]

.pull-right45[

```r
ggplot(...) +
  theme_excel()
```

<img src="Plotting_files/figure-html/unnamed-chunk-46-1.png" style="display: block; margin: auto;" />

]

---

# Formatting with `theme_*()`

.pull-left45[

<ul>
<li class="m1"><span>The <highm>theme_*()</highm> functions style all aspects of a plot with a predefined set of settings.</span></li>
<li class="m2"><span>Some <mono>theme</mono>s:
<br><br>
<ul class="level">
<li><span><mono>theme_gray()</mono></span></li>
<li><span><mono>theme_classic()</mono></span></li>
<li><span><mono>theme_void()</mono></span></li>
<li><span><mono>theme_excel() (<mono>ggthemes</mono>)</mono></span></li>
<li><span><mono>theme_economist() (<mono>ggthemes</mono>)</mono></span></li>
<li><span><mono>theme_bw()</mono></span></li>
</ul>
</span></li>
</ul>

]

.pull-right45[

```r
ggplot(...) +
  theme_economist()
```

<img src="Plotting_files/figure-html/unnamed-chunk-48-1.png" style="display: block; margin: auto;" />

]

---

# Formatting with `theme_*()`

.pull-left45[

<ul>
<li class="m1"><span>The <highm>theme_*()</highm> functions style all aspects of a plot with a predefined set of settings.</span></li>
<li class="m2"><span>Some <mono>theme</mono>s:
<br><br>
<ul class="level">
<li><span><mono>theme_gray()</mono></span></li>
<li><span><mono>theme_classic()</mono></span></li>
<li><span><mono>theme_void()</mono></span></li>
<li><span><mono>theme_excel() (<mono>ggthemes</mono>)</mono></span></li>
<li><span><mono>theme_economist() (<mono>ggthemes</mono>)</mono></span></li>
<li><span><mono>theme_bw()</mono></span></li>
</ul>
</span></li>
</ul>

]

.pull-right45[

```r
ggplot(...) +
  theme_bw()
```

<img src="Plotting_files/figure-html/unnamed-chunk-50-1.png" style="display: block; margin: auto;" />

]

---

# Et voilà!
.pull-left45[

```r
ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     col = class)) +
  geom_point() +
  geom_smooth(col = "blue",
              method = "lm") +
  labs(
    x = "Hubraum in Litern",
    y = "Autobahn Meilen pro Gallone",
    title = "MPG Datensatz",
    subtitle = "Autos mit mehr Hub...",
    caption = "Quelle: mpg Datensatz...") +
  theme_bw()
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-52-1.png" style="display: block; margin: auto;" />

]

---

class: middle, center

<h1><a href="https://therbootcamp.github.io/EDA_2019CSS/_sessions/Plotting/Plotting_practical.html">Practical</a></h1>
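
---

# Inheritance: global vs. local `aes()`

The inheritance rule from the slides can be checked directly. The following is a minimal sketch, not part of the original deck: the object names `p_inherited` and `p_local` and the helper `n_curves()` are made up for illustration. It compares a plot where `col = class` is mapped globally in `ggplot()` with one where the color is mapped only inside `geom_point()`, and counts how many curves `geom_smooth()` fits in each case.

```r
library(ggplot2)

# Color mapped globally: geom_smooth() inherits it and fits one line per class
p_inherited <- ggplot(data = mpg,
                      mapping = aes(x = displ, y = hwy, col = class)) +
  geom_point() +
  geom_smooth(method = "lm")

# Color mapped only in geom_point(): geom_smooth() sees just x and y
p_local <- ggplot(data = mpg,
                  mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(col = class)) +
  geom_smooth(method = "lm", col = "blue")

# Hypothetical helper: number of groups (= fitted curves) in the second layer
n_curves <- function(p) {
  length(unique(layer_data(p, i = 2)$group))
}

n_curves(p_inherited)  # one curve per vehicle class
n_curves(p_local)      # a single curve for all points
```

Setting a fixed value such as `col = "blue"` inside `geom_smooth()`, as in the final plot of the deck, has a similar effect: the inherited color mapping is overridden for that geom, so the points stay colored by class while a single blue line is drawn.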