Source: https://www.rstudio.com/

Slides

Here are the introduction slides for this practical on Plotting 1.0: ggplot!

Overview

In this practical you’ll practice plotting data with the ggplot2 package.

Cheatsheet

If you don’t have it already, you can access the ggplot2 cheatsheet here: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf. It gives a nice overview of all the major functions in ggplot2.

Examples

# -----------------------------------------------
# Examples of using ggplot2 on the mpg data
# ------------------------------------------------

library(tidyverse)         # Load tidyverse (which contains ggplot2!)

mpg # Look at the mpg data

# Just a blank space without any aesthetic mappings
ggplot(data = mpg)

# Now add a mapping where engine displacement (displ) and highway miles per gallon (hwy) are mapped to the x and y aesthetics
ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy))   # Map displ to x-axis and hwy to y-axis

#  Add points with geom_point()
ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy)) +
       geom_point()     

#  Add counts with geom_count(): overlapping points are drawn as larger dots
ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy)) +
       geom_count()   

# Again, but with some additional arguments
# Also using a new theme temporarily

ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy)) +
  geom_point(col = "red",                   # Red points
             size = 3,                      # Larger size
             alpha = .5,                    # Transparent points
             position = "jitter") +         # Jitter the points
  scale_x_continuous(limits = c(1, 15)) +   # Axis limits
  scale_y_continuous(limits = c(0, 50)) +
  theme_minimal()


# Assign class to the color aesthetic and add labels with labs()

ggplot(data = mpg, 
  mapping = aes(x = displ, y = hwy, col = class)) +  # Change color based on class column
  geom_point(size = 3, position = 'jitter') +
  labs(x = "Engine Displacement in Liters",
       y = "Highway miles per gallon",
       title = "MPG data",
       subtitle = "Cars with higher engine displacement tend to have lower highway mpg",
       caption = "Source: mpg data in ggplot2")
  

# Add a regression line for each class

ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point(size = 3, alpha = .9) + 
  geom_smooth(method = "lm")

# Add a regression line for all classes

ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point(size = 3, alpha = .9) + 
  geom_smooth(col = "blue", method = "lm")


# Facet by class
ggplot(data = mpg,
       mapping = aes(x = displ, 
                     y = hwy, 
                     color = factor(cyl))) + 
  geom_point() +
  facet_wrap(~ class) 


# Another fancier example

ggplot(data = mpg, 
       mapping = aes(x = cty, y = hwy)) + 
  geom_count(aes(color = manufacturer)) +      # Add count geom (see ?geom_count)
  geom_smooth() +                              # Smoothed line (add se = FALSE to hide the confidence band)
  geom_text(data = filter(mpg, cty > 25),      # Label only the cars with city mpg > 25
            aes(x = cty, y = hwy, 
                label = rownames(filter(mpg, cty > 25))),
            position = position_nudge(y = -1), 
            check_overlap = TRUE, 
            size = 5) + 
  labs(x = "City miles per gallon", 
       y = "Highway miles per gallon",
       title = "City and Highway miles per gallon", 
       subtitle = "Numbers indicate cars with city mpg > 25",
       caption = "Source: mpg data in ggplot2",
       color = "Manufacturer", 
       size = "Counts")

Tasks

Getting the data and project setup

  1. Open a new R script and save it under the name plotting_practical.R

  2. Load the tidyverse package (which includes ggplot2).

Building a plot step-by-step

  1. The diamonds dataset in the ggplot2 package contains information about almost 54,000 round-cut diamonds. Print the diamonds dataset. It should look like this:
diamonds
# A tibble: 53,940 x 10
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1 0.230 Ideal     E     SI2      61.5   55.   326  3.95  3.98  2.43
 2 0.210 Premium   E     SI1      59.8   61.   326  3.89  3.84  2.31
 3 0.230 Good      E     VS1      56.9   65.   327  4.05  4.07  2.31
 4 0.290 Premium   I     VS2      62.4   58.   334  4.20  4.23  2.63
 5 0.310 Good      J     SI2      63.3   58.   335  4.34  4.35  2.75
 6 0.240 Very Good J     VVS2     62.8   57.   336  3.94  3.96  2.48
 7 0.240 Very Good I     VVS1     62.3   57.   336  3.95  3.98  2.47
 8 0.260 Very Good H     SI1      61.9   55.   337  4.07  4.11  2.53
 9 0.220 Fair      E     VS2      65.1   61.   337  3.87  3.78  2.49
10 0.230 Very Good H     VS1      59.4   61.   338  4.00  4.05  2.39
# ... with 53,930 more rows
  1. Using ggplot(), create the following blank plot using the data and mapping arguments (but no geom).

  1. Now, using geom_point(), add points showing the relationship between each diamond's carat weight (carat) and its price (price).

  1. Make the points transparent using the alpha argument to geom_point()

  1. Using the color aesthetic mapping, color the points by their cut.

  1. Using the facet_wrap() function, create different plots for each value of cut.

  1. Using the geom_smooth() function, add a black, smoothed mean line to each plot (You can also try turning the line into a regression line using the method argument)
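The step-by-step logic of the tasks above can be sketched on a different dataset so the diamonds exercise stays yours to solve. This is only an illustration, assuming the mpg data and its drv column as stand-ins for diamonds and cut; the object names p_base and p_steps are made up for this example.

```r
library(ggplot2)

# Step 1: data and mapping only (this draws an empty panel with axes)
p_base <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))

# Steps 2 to 5: add one layer or component at a time
p_steps <- p_base +
  geom_point(aes(color = drv), alpha = .4) +    # Transparent points, colored by drive train
  facet_wrap(~ drv) +                           # One panel per drive train
  geom_smooth(color = "black", method = "lm")   # Black regression line in each panel

p_steps
```

Leaving out method = "lm" gives the default smoothed line instead of a straight regression line.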

Playing with themes

  1. Look at the theme help menu with ?theme_bw() to see a list of all of the standard ggplot themes. Then, try adding one of these themes to your previous plots to see how they change.

  2. The ggthemes package contains many additional themes. If you don’t have the package already, install it. Then, look at the ggthemes vignette by running the following code:

# Open the ggthemes vignette
vignette("ggthemes", package = "ggthemes")
  1. Now, create the following plot from the mpg data using the Five Thirty Eight theme. Note that cty is on the x axis, and hwy is on the y axis.
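As a rough sketch of how themes are applied: a theme is added to a plot with + like any other component, so the same plot object can be reused with different themes. The object names below are made up for illustration, and the ggthemes part only runs if that package happens to be installed.

```r
library(ggplot2)

# A plain scatterplot to try themes on
p_theme_demo <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point()

# Built-in ggplot2 themes
p_bw      <- p_theme_demo + theme_bw()
p_classic <- p_theme_demo + theme_classic()

p_bw
p_classic

# A theme from ggthemes (skipped if the package is not installed)
if (requireNamespace("ggthemes", quietly = TRUE)) {
  print(p_theme_demo + ggthemes::theme_fivethirtyeight())
}
```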

Density geom with geom_density()

  1. Create the following density plot of carat weights from the diamonds data using the following template:
  • Set the data argument to diamonds
  • Map carat to the x aesthetic
  • Add a density geom with geom_density() and set the fill color to "tomato1"
  • Add labels
  • Use the minimal theme with theme_minimal()
ggplot(data = XX, 
       mapping = aes(x = XX)) + 
       geom_density(fill = "XX") + 
       labs(x = "XX", 
            y = "XX", 
            title = "XX",
            subtitle = "XX",
            caption = "XX") +
  theme_XX()
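If the template feels abstract, here is a filled-in version on a different dataset. It is only a sketch: the mpg data, the hwy column, and the fill color "steelblue" are choices made for this example, not the solution to the task.

```r
library(ggplot2)

p_density <- ggplot(data = mpg, 
                    mapping = aes(x = hwy)) + 
  geom_density(fill = "steelblue") +           # Filled density curve
  labs(x = "Highway miles per gallon", 
       y = "Density", 
       title = "Distribution of highway mpg",
       caption = "Source: mpg data in ggplot2") +
  theme_minimal()

p_density
```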

Boxplot geom with geom_boxplot()

  1. Look at the help menu for geom_boxplot(). Then, create the following boxplot using the following template:
ggplot(data = XX,
  mapping = aes(x = XX, y = log(XX), fill = XX)) + 
  geom_boxplot()  + 
  labs(y = "XX", 
       x = "XX", 
       fill = "XX",
       title = "XX",
       subtitle = "XX") +
  scale_fill_brewer(palette = "XX")
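To see how the pieces of this template fit together, here is a sketch on the mpg data (an assumption for illustration, not the intended solution). The y variable is log-transformed inside aes(), and the fill colors come from a ColorBrewer palette.

```r
library(ggplot2)

p_box <- ggplot(data = mpg,
                mapping = aes(x = drv, y = log(hwy), fill = drv)) + 
  geom_boxplot() +                              # One box per drive train
  labs(y = "log(Highway miles per gallon)", 
       x = "Drive train", 
       fill = "Drive train",
       title = "Highway mpg by drive train") +
  scale_fill_brewer(palette = "Set2")           # ColorBrewer palette for the fill scale

p_box
```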

Violin geom with geom_violin()

  1. Now make the following plot using geom_violin(). You can also change the color palette with the palette argument to the scale_fill_brewer() function. Look at the help menu with ?scale_fill_brewer() to see all the possibilities. The plot below uses "Set1".
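A violin plot uses the same mapping as a boxplot, with geom_violin() in place of geom_boxplot(). The sketch below again assumes the mpg data for illustration. It also lists the palette names that scale_fill_brewer() accepts, using the RColorBrewer package (which should already be installed, since ggplot2 relies on it through the scales package).

```r
library(ggplot2)

# Names of all ColorBrewer palettes, e.g. "Set1", "Blues", "Spectral"
brewer_palettes <- rownames(RColorBrewer::brewer.pal.info)
brewer_palettes

p_violin <- ggplot(data = mpg,
                   mapping = aes(x = drv, y = hwy, fill = drv)) + 
  geom_violin() +                               # Mirrored density per drive train
  scale_fill_brewer(palette = "Set1")

p_violin
```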

Summary statistics

  1. You can use the stat_summary() function to add summary statistics as geoms to plots. Using the following template, create the following plot showing the mean prices of diamonds for each level of clarity.
ggplot(data = XX,
  mapping = aes(x = XX, y = XX)) + 
  stat_summary(fun = "mean",       # In ggplot2 >= 3.3.0, fun replaces the deprecated fun.y
               geom = "bar", 
               fill = "white", 
               col = "black") +
  labs(y = "XX", 
       x = "XX", 
       title = "XX", 
       caption = "XX")

  1. Now, create the following plot from the mpg dataframe

  1. You can easily flip the coordinates of a plot by using coord_flip(). Using coord_flip(), flip the x and y coordinates of your previous plot so it looks like this:
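Here is a small sketch of both ideas from this section, on the mpg data (chosen for illustration only). stat_summary() computes one mean per x value and draws it as a bar, and coord_flip() then swaps the axes. The fun argument assumes ggplot2 3.3.0 or later; older versions call it fun.y.

```r
library(ggplot2)

# Mean highway mpg for each vehicle class, drawn as bars
p_means <- ggplot(data = mpg,
                  mapping = aes(x = class, y = hwy)) + 
  stat_summary(fun = "mean", 
               geom = "bar", 
               fill = "white", 
               col = "black") +
  labs(x = "Vehicle class", 
       y = "Mean highway miles per gallon")

# Same plot with horizontal bars
p_flipped <- p_means + coord_flip()

p_means
p_flipped
```

Flipping is handy when the x-axis labels are long, because they no longer overlap.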

Saving plots as objects

  1. Create the following plot from the mpg dataset, and save it as an object called myplot

  1. Now, using object assignment (<-), add a regression line to the myplot object with geom_smooth(). Then evaluate the object to see the updated version. It should now look like this:

  1. Using ggsave(), save the object as a pdf file called myplot.pdf. Set the width to 6 inches, and the height to 4 inches. Open the pdf outside of RStudio to make sure it worked!
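The workflow in this section can be sketched as follows. The object name myplot_demo and the temporary output path are made up for this example, so it does not overwrite anything in your project folder.

```r
library(ggplot2)

# Store a plot as an object (nothing is drawn yet)
myplot_demo <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# Update the object by adding a layer and assigning the result back
myplot_demo <- myplot_demo + geom_smooth(method = "lm")

# Evaluate the object to draw it
myplot_demo

# Save it as a 6 x 4 inch pdf in a temporary folder
out_file <- file.path(tempdir(), "myplot_demo.pdf")
ggsave(filename = out_file, plot = myplot_demo, 
       width = 6, height = 4, units = "in")
```

ggsave() picks the file type from the file extension, so changing ".pdf" to ".png" would write a png instead.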

Demographic information of midwest counties in the US

  1. Print the midwest dataset and look at the help menu to see what values it contains. It should look like this:
# A tibble: 437 x 28
     PID county    state   area poptotal popdensity popwhite popblack
   <int> <chr>     <chr>  <dbl>    <int>      <dbl>    <int>    <int>
 1   561 ADAMS     IL    0.0520    66090      1271.    63917     1702
 2   562 ALEXANDER IL    0.0140    10626       759.     7054     3496
 3   563 BOND      IL    0.0220    14991       681.    14477      429
 4   564 BOONE     IL    0.0170    30806      1812.    29344      127
 5   565 BROWN     IL    0.0180     5836       324.     5264      547
 6   566 BUREAU    IL    0.0500    35688       714.    35157       50
 7   567 CALHOUN   IL    0.0170     5322       313.     5298        1
 8   568 CARROLL   IL    0.0270    16805       622.    16519      111
 9   569 CASS      IL    0.0240    13437       560.    13384       16
10   570 CHAMPAIGN IL    0.0580   173025      2983.   146506    16559
# ... with 427 more rows, and 20 more variables: popamerindian <int>,
#   popasian <int>, popother <int>, percwhite <dbl>, percblack <dbl>,
#   percamerindan <dbl>, percasian <dbl>, percother <dbl>,
#   popadults <int>, perchsd <dbl>, percollege <dbl>, percprof <dbl>,
#   poppovertyknown <int>, percpovertyknown <dbl>, percbelowpoverty <dbl>,
#   percchildbelowpovert <dbl>, percadultpoverty <dbl>,
#   percelderlypoverty <dbl>, inmetro <int>, category <chr>
  1. Using the following code as a template, create the following plot showing the relationship between college education and poverty
ggplot(data = XX, 
    mapping = aes(x = XX, y = XX)) + 
    geom_point(aes(fill = XX, size = XX), shape = 21, color = "white") + 
    geom_smooth(aes(x = XX, y = XX)) +
    labs(
        x = "XX", 
        y = "XX", 
        title = "XX",
        subtitle = "XX",
        caption = "XX") + 
    scale_fill_brewer(palette = "XX") +     # Fill scale, because the points map fill (not color)
    scale_size(range = c(XX, XX)) +
    guides(size = guide_legend(override.aes = list(col = "black")), 
           fill = guide_legend(override.aes = list(size = 5)))
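Two details of this template are easy to miss, so here is a sketch on the mpg data (an assumption for illustration, not the midwest solution). Points with shape = 21 have a border (color) and an interior (fill), which is why the palette is set with a fill scale. The override.aes lists only change how the legend keys are drawn, not the points in the plot.

```r
library(ggplot2)

p_bubble <- ggplot(data = mpg, 
                   mapping = aes(x = displ, y = hwy)) + 
  geom_point(aes(fill = drv, size = cty),       # Interior color and size vary with the data
             shape = 21, color = "white") +     # Shape 21: filled circle with a white border
  scale_fill_brewer(palette = "Set1") + 
  scale_size(range = c(1, 6)) +                 # Smallest and largest point size
  guides(size = guide_legend(override.aes = list(col = "black")),  # Black borders in the size legend
         fill = guide_legend(override.aes = list(size = 5)))       # Bigger keys in the fill legend

p_bubble
```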

  1. Create the following density plot showing the density of inhabitants with a college education in different states using the following template
ggplot(data = XX, 
       mapping = aes(XX, fill = XX)) + 
  geom_density(alpha = XX) + 
  labs(title = "XX", 
       subtitle = "XX",
       caption = "XX",
       x = "XX",
       y = "XX",
       fill = "XX")

Heatplots with geom_tile()

  1. You can create heatplots using the geom_tile() function. Try creating the following heatplot of statistics of NBA players using the following template:
# Read in nba data
nba_long <- read_csv("https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_data/nba_long.csv")

# Look at the data
nba_long

ggplot(XX, 
       mapping = aes(x = XX, y = XX, fill = XX)) + 
  geom_tile(colour = "XX") + 
  scale_fill_gradientn(colors = c("XX", "XX", "XX"))+ 
  labs(x = "XX", 
       y = "XX", 
       fill = "XX", 
       title = "NBA XX performance",
       subtitle = "XX",
       caption = "XX") +
  coord_flip()
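geom_tile() expects data in long format: one row per tile, with one column for x, one for y, and one for the fill value. The sketch below uses a tiny made-up data frame (scores_long, with invented students, tests, and scores) just to show that shape. It is not the NBA data.

```r
library(ggplot2)

# Made-up long-format data: one row per student-test combination
scores_long <- expand.grid(student = c("Ann", "Ben", "Cat"),
                           test = c("Math", "Reading", "Science"),
                           stringsAsFactors = FALSE)
scores_long$score <- c(0.9, 0.4, 0.6, 0.7, 0.8, 0.3, 0.5, 0.6, 1.0)

p_heat <- ggplot(scores_long, 
                 mapping = aes(x = student, y = test, fill = score)) + 
  geom_tile(colour = "white") +                               # White borders between tiles
  scale_fill_gradientn(colors = c("blue", "white", "red")) +  # Low, middle, high colors
  labs(x = "Student", 
       y = "Test", 
       fill = "Score") +
  coord_flip()

p_heat
```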

  1. Make the following plot of savings data (psavert) from the economics dataset.

  1. Make the following plot from the ACTG175 dataset (the dataset is contained in the speff2trial package). To do this, you’ll need to use both geom_boxplot() and geom_point(). To jitter the points, use the position argument to geom_point(), as well as the position_jitter() function to control how much to jitter the points.

  1. Create the following lollipop chart from the midwest data using the following template:
midwest_IL <- midwest %>% 
  filter(state == "XX") %>%
  mutate(popdensity_z = (popdensity - mean(popdensity)) / sd(popdensity)) %>%
  arrange(desc(popdensity_z)) %>%
  mutate(county = factor(county, levels = county)) %>%
  slice(1:25)

ggplot(XX, aes(x = XX, y = XX)) + 
  geom_segment(aes(y = 0, 
                   x = county, 
                   yend = popdensity_z, 
                   xend = county, 
                   col = popdensity_z), size = XX) +
  geom_point(size = XX, fill = "white", shape = 21)  +
  labs(title = "XX", 
       subtitle = "XX",
       y = "XX",
       x = "XX") + 
  ylim(XX, XX) +
  scale_colour_gradient(low = "XX", high = "XX", limits = c(-.1, 9)) +
  coord_flip() +
  geom_text(aes(label = 1:25)) +
  guides(col = "none") +
  theme_XX() +
  theme(panel.grid = element_blank())
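For the boxplot-with-points task earlier in this section, the jittering can be sketched like this. The example assumes the mpg data rather than ACTG175, so it does not require the speff2trial package. position_jitter() takes a width and a height that control how far points are moved in each direction.

```r
library(ggplot2)

p_jitter <- ggplot(data = mpg, 
                   mapping = aes(x = drv, y = hwy)) + 
  geom_boxplot(outlier.shape = NA) +                         # Hide outliers so they are not drawn twice
  geom_point(position = position_jitter(width = .2,          # Spread points sideways
                                        height = 0),         # Leave the y values untouched
             alpha = .4)

p_jitter
```

Setting height = 0 keeps every point at its true y value, which is usually what you want when y is the measured variable.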

References and Further Reading