class: center, middle, inverse, title-slide

# ggplot basics

### Intro to data visualization with ggplot2

The R Bootcamp
### November 2021

---

layout: true

<div class="my-footer"> <span style="text-align:center"> <span> <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/by-sa.png" height=14 style="vertical-align: middle"/> </span> <a href="https://therbootcamp.github.io/"> <span style="padding-left:82px"> <font color="#7E7E7E"> therbootcamp.github.io </font> </span> </a> <a href="https://therbootcamp.github.io/"> <font color="#7E7E7E"> Intro to data visualization with ggplot2 | November 2021 </font> </a> </span> </div>

---

.pull-left3[

# Tidyverse

<ul>
<li class="m1"><span>The tidyverse is...</span></li><br>
<ul class="level">
<li><span>A collection of user-friendly <high>packages</high> for analyzing <high>tidy data</high></span></li><br>
<li><span>An <high>ecosystem</high> for analytics and data science with common design principles</span></li><br>
<li><span>A <high>dialect</high> of the R language</span></li>
</ul>
</ul>

]

.pull-right65[

<br><br>

<p align="center"> <img src="image/tidyverse_ggplot.png" height = "520px"> </p>

]

---

# Modular graphics in <mono>ggplot2</mono>

.pull-left45[

<ul>
<li class="m1"><span><highm>data</highm>: the data set</span></li>
<li class="m2"><span><highm>mapping</highm>: the plot's structure</span></li>
<ul class="level">
<li><span>What do the axes represent?</span></li>
<li><span>What do sizes, shapes, colors, etc.
represent?</span></li>
</ul>
<li class="m3"><span><highm>geoms</highm>: geometric shapes illustrating the data</span></li>
<li class="m4"><span><highm>labs</highm>: plot annotation</span></li>
<li class="m5"><span><highm>themes</highm>: aesthetic details</span></li>
<li class="m6"><span><highm>facets</highm>: stratify the plot by a variable</span></li>
<li class="m7"><span><highm>scales</highm>: scaling of dimensions</span></li>
</ul>

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

]

---

# Wrangling

.pull-left45[

```r
sdg_props = project_sdgs %>%
  # create year variable (year() comes from lubridate)
  mutate(year = year(start_date)) %>%
  # count projects by system, sdg, and year
  group_by(system, sdg, year) %>%
  summarize(n = n()) %>%
  # filter year, system, and sdg
  filter(year > 2008 & year < 2022,
         system == "elsevier",
         sdg %in% c("SDG-01", "SDG-06")) %>%
  # normalize within year
  group_by(year) %>%
  mutate(prop = n / sum(n))
```

]

.pull-right45[

```r
sdg_props
```

```
# A tibble: 26 × 5
# Groups:   year [13]
  system   sdg     year     n  prop
  <chr>    <chr>  <dbl> <int> <dbl>
1 elsevier SDG-01  2009    14 0.609
2 elsevier SDG-01  2010    17 0.586
3 elsevier SDG-01  2011    24 0.686
4 elsevier SDG-01  2012    24 0.585
5 elsevier SDG-01  2013    29 0.725
6 elsevier SDG-01  2014    21 0.6
7 elsevier SDG-01  2015    26 0.765
8 elsevier SDG-01  2016    34 0.829
# … with 18 more rows
```

]

---

# `ggplot()`

.pull-left45[

<ul>
<li class="m1"><span>All plots start with <mono>ggplot()</mono></span></li>
<li class="m2"><span>Two main arguments</span></li>
<ul class="level">
<li><span><mono>data</mono> | The data set (<mono>tibble</mono>)</span></li>
<li><span><mono>mapping</mono> | The plot structure, defined using <mono>aes()</mono></span></li>
</ul>
</ul>

]

.pull-right45[

```r
ggplot(data = sdg_props)
```

<img src="Plotting_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

]

---

# `aes()`

.pull-left45[

<ul>
<li class="m1"><span><mono>aes()</mono> helps define the structure of the <highm>mapping</highm> argument.</span></li>
<li class="m2"><span>Key arguments:</span></li>
<ul class="level">
<li><span><mono>x, y</mono> | Define axes</span></li>
<li><span><mono>color, fill</mono> | Define colors</span></li>
<li><span><mono>alpha</mono> | Defines opacity</span></li>
<li><span><mono>size</mono> | Defines sizes</span></li>
<li><span><mono>shape</mono> | Defines shapes (e.g., circles or squares)</span></li>
</ul>
</ul>

<br>

```r
ggplot(data = sdg_props,
       mapping = aes(x = year, y = prop))
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

]

---

# <mono>+</mono>

.pull-left45[

<ul>
<li class="m1"><span>The <mono>+</mono> operator "adds" <high>additional elements</high> to the plot.</span></li>
<li class="m1"><span>Not to be confused with the pipe <mono>%>%</mono>.</span></li>
</ul>

<br>

```r
ggplot(data = sdg_props,
       mapping = aes(x = year, y = prop)) +
  # Show as points
  geom_point()
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

]

---

# `geom_*()`

.pull-left45[

<ul>
<li class="m1"><span><mono>geom_*()</mono> functions define which geometric objects are used to illustrate the data.</span></li>
<li class="m2"><span>A few example <mono>geoms</mono>:</span></li>
<ul class="level">
<li><span><mono>geom_point()</mono> | for points</span></li>
<li><span><mono>geom_line()</mono> | for lines</span></li>
<li><span><mono>geom_smooth()</mono> | for smooth curves</span></li>
<li><span><mono>geom_bar()</mono> | for bars</span></li>
<li><span><mono>geom_boxplot()</mono> | for box plots</span></li>
<li><span><mono>geom_violin()</mono> | for violin plots</span></li>
</ul>
</ul>

```r
ggplot(data = sdg_props,
       mapping = aes(x = year, y = prop)) +
  # Show as points
  geom_point()
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />

]

---

# `geom_*()`

.pull-left45[

<ul>
<li class="m1"><span><mono>geom_*()</mono> functions define which geometric objects are used to illustrate the data.</span></li>
</ul>

<br>

```r
ggplot(data = sdg_props,
       mapping = aes(x = year, y = prop)) +
  # Show as lines
  geom_line()
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />

]

---

# `aes()`

.pull-left45[

<ul>
<li class="m1"><span><mono>aes()</mono> helps define the structure of the <highm>mapping</highm> argument.</span></li>
<li class="m2"><span>Key arguments:</span></li>
<ul class="level">
<li><span><mono>x, y</mono> | Define axes</span></li>
<li><span><mono>color, fill</mono> | Define colors</span></li>
<li><span><mono>alpha</mono> | Defines opacity</span></li>
<li><span><mono>size</mono> | Defines sizes</span></li>
<li><span><mono>shape</mono> | Defines shapes (e.g., circles or squares)</span></li>
</ul>
</ul>

```r
ggplot(data = sdg_props,
       mapping = aes(x = year,
                     y = prop,
                     # add color aesthetic
                     col = sdg)) +
  geom_point()
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" />

]

---

# `geom_*()`

.pull-left45[

<ul>
<li class="m1"><span><mono>geom_*()</mono> functions define which geometric objects are used to illustrate the data.</span></li>
</ul>

<br>

```r
ggplot(data = sdg_props,
       mapping = aes(x = year,
                     y = prop,
                     col = sdg)) +
  # Show as lines
  geom_line()
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" />

]

---

# `geom_*()`

.pull-left45[

<ul>
<li class="m1"><span><mono>geom_*()</mono> functions define which geometric objects are used to illustrate the data.</span></li>
</ul>

<br>

```r
ggplot(data = sdg_props,
       mapping = aes(x = year,
                     y = prop,
                     col = sdg)) +
  # Show as smoothed curve
  geom_smooth()
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" />

]

---

# `geom_*()`

.pull-left45[

<ul>
<li class="m1"><span><mono>geom_*()</mono> functions define which geometric objects are used to illustrate the data.</span></li>
</ul>

<br>

```r
ggplot(data = sdg_props,
       mapping = aes(x = year,
                     y = prop,
                     col = sdg)) +
  # Show as points and lines
  geom_point() +
  geom_line()
```

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" />

]

---

# `facet_*()`

.pull-left45[

<ul>
<li class="m1"><span>Facetting creates the <high>same plot for each group</high> defined by another variable.</span></li>
<li class="m2"><span>Key functions:</span></li>
<ul class="level">
<li><span><mono>facet_wrap()</mono></span></li>
<li><span><mono>facet_grid()</mono></span></li>
</ul>
</ul>

<br>

]

.pull-right45[

<img src="Plotting_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" />

]

---

# Wrangling

.pull-left45[

```r
sdg_props = project_sdgs %>%
  mutate(year = year(start_date)) %>%
  group_by(system, sdg, year) %>%
  summarize(n = n()) %>%
  # no filtering of systems
  filter(year > 2008 & year < 2022,
         sdg %in% c("SDG-01", "SDG-06")) %>%
  # group by year and system
  group_by(year, system) %>%
  mutate(prop = n / sum(n))
```

]

.pull-right45[

```
# A tibble: 130 × 5
# Groups:   year, system [65]
   system sdg     year     n  prop
   <chr>  <chr>  <dbl> <int> <dbl>
 1 aurora SDG-01  2009     8 0.4
 2 aurora SDG-01  2010     4 0.222
 3 aurora SDG-01  2011    11 0.407
 4 aurora SDG-01  2012    17 0.586
 5 aurora SDG-01  2013    16 0.5
 6 aurora SDG-01  2014    24 0.632
 7 aurora SDG-01  2015    15 0.577
 8 aurora SDG-01  2016    13 0.542
 9 aurora SDG-01  2017    16 0.516
10 aurora SDG-01  2018    25 0.658
11 aurora SDG-01  2019    21 0.677
12 aurora SDG-01  2020    16 0.533
13 aurora SDG-01  2021    13 0.565
14 aurora SDG-06  2009    12 0.6
# … with 116 more rows
```

]

---

.pull-left45[

# `facet_*()`

<ul>
<li class="m1"><span>Facetting creates the <high>same plot for each group</high> defined by another variable.</span></li>
</ul>

<br>

```r
ggplot(data = sdg_props,
       mapping = aes(x = year,
                     y = prop,
                     col = sdg)) +
  geom_point() +
  geom_line() +
  # facet by system
  facet_wrap(~system)
```

]

.pull-right45[

<br><br><br>

<img src="Plotting_files/figure-html/unnamed-chunk-28-1.png" style="display: block; margin: auto;" />

]

---

class: middle, center

<h1><a href="https://therbootcamp.github.io/SDGDataViz_2021Nov/">Schedule</a></h1>
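
---

# Putting it together

A minimal, self-contained sketch of the full pipeline from these slides: normalize counts, set the data and mapping, add geoms with `+`, and facet. It assumes only the `dplyr` and `ggplot2` packages. `toy_counts` and `toy_props` are hypothetical stand-ins for `sdg_props`, and their counts are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy counts (made-up values, NOT the project_sdgs data):
# number of projects per system, SDG, and year
toy_counts <- tibble(
  system = rep(c("aurora", "elsevier"), each = 6),
  sdg    = rep(rep(c("SDG-01", "SDG-06"), each = 3), times = 2),
  year   = rep(2019:2021, times = 4),
  n      = c(8, 10, 12, 12, 10, 4,
             14, 17, 24, 9, 12, 11)
)

# Normalize within each year and system, as in the second wrangling slide
toy_props <- toy_counts %>%
  group_by(year, system) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# data + mapping, then geoms, then facets; each element is added with +
p <- ggplot(data = toy_props,
            mapping = aes(x = year, y = prop, col = sdg)) +
  geom_point() +
  geom_line() +
  facet_wrap(~system)

print(p)
```

Swapping `facet_wrap(~system)` for `facet_grid(sdg ~ system)` would instead draw one panel per combination of SDG (rows) and system (columns).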