In this practical you’ll practice plotting data with the ggplot2
package.
If you don’t have it already, you can access the ggplot2
cheatsheet here https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf. This has a nice overview of all the major functions in ggplot2.
ggplot2
. Try to go through each line of code and see how it works!# -----------------------------------------------
# Examples of using ggplot2 on the mpg data
# ------------------------------------------------
library(tidyverse) # Load tidyverse (which contains ggplot2!)
mpg # Look at the mpg data
# Just a blank space without any aesthetic mappings
ggplot(data = mpg)
# Now add a mapping where engine displacement (displ) and highway miles per gallon (hwy) are mapped to the x and y aesthetics
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy)) # Map displ to x-axis and hwy to y-axis
# Add points with geom_point()
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy)) +
geom_point()
# Add points with geom_count()
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy)) +
geom_count()
# Again, but with some additional arguments
# Also using a new theme temporarily
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy)) +
geom_point(col = "red", # Red points
size = 3, # Larger size
alpha = .5, # Transparent points
position = "jitter") + # Jitter the points
scale_x_continuous(limits = c(1, 15)) + # Axis limits
scale_y_continuous(limits = c(0, 50)) +
theme_minimal()
# Assign class to the color aesthetic and add labels with labs()
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy, col = class)) + # Change color based on class column
geom_point(size = 3, position = 'jitter') +
labs(x = "Engine Displacement in Liters",
y = "Highway miles per gallon",
title = "MPG data",
subtitle = "Cars with higher engine displacement tend to have lower highway mpg",
caption = "Source: mpg data in ggplot2")
# Add a regression line for each class
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy, color = class)) +
geom_point(size = 3, alpha = .9) +
geom_smooth(method = "lm")
# Add a regression line for all classes
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy, color = class)) +
geom_point(size = 3, alpha = .9) +
geom_smooth(col = "blue", method = "lm")
# Facet by class
ggplot(data = mpg,
mapping = aes(x = displ,
y = hwy,
color = factor(cyl))) +
geom_point() +
facet_wrap(~ class)
# Another fancier example
ggplot(data = mpg,
mapping = aes(x = cty, y = hwy)) +
geom_count(aes(color = manufacturer)) + # Add count geom (see ?geom_count)
geom_smooth() + # smoothed line without confidence interval
geom_text(data = filter(mpg, cty > 25),
aes(x = cty,y = hwy,
label = rownames(filter(mpg, cty > 25))),
position = position_nudge(y = -1),
check_overlap = TRUE,
size = 5) +
labs(x = "City miles per gallon",
y = "Highway miles per gallon",
title = "City and Highway miles per gallon",
subtitle = "Numbers indicate cars with highway mpg > 25",
caption = "Source: mpg data in ggplot2",
color = "Manufacturer",
size = "Counts")
Open a new R script and save it under the name plotting_practical.R
Load the tidyverse
package (which includes ggplot2).
diamonds
dataset in the ggplot2
package shows information about 50,000 round cut diamonds. Print the diamonds
dataset, it should look like this:diamonds
# A tibble: 53,940 x 10
carat cut color clarity depth table price x y z
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.230 Ideal E SI2 61.5 55. 326 3.95 3.98 2.43
2 0.210 Premium E SI1 59.8 61. 326 3.89 3.84 2.31
3 0.230 Good E VS1 56.9 65. 327 4.05 4.07 2.31
4 0.290 Premium I VS2 62.4 58. 334 4.20 4.23 2.63
5 0.310 Good J SI2 63.3 58. 335 4.34 4.35 2.75
6 0.240 Very Good J VVS2 62.8 57. 336 3.94 3.96 2.48
7 0.240 Very Good I VVS1 62.3 57. 336 3.95 3.98 2.47
8 0.260 Very Good H SI1 61.9 55. 337 4.07 4.11 2.53
9 0.220 Fair E VS2 65.1 61. 337 3.87 3.78 2.49
10 0.230 Very Good H VS1 59.4 61. 338 4.00 4.05 2.39
# ... with 53,930 more rows
ggplot()
, create the following blank plot using the data
and mapping
arguments (but no geom).geom_point()
, add points showing the relationship between the number of carats in the diamonds (carat
) and its price (price
)alpha
argument to geom_point()
color
aesthetic mapping, color the points by their cut.facet_wrap()
function, create different plots for each value of cut
.geom_smooth()
function, add a black, smoothed mean line to each plot (You can also try turning the line into a regression line using the method
argument)Look at the theme help menu with ?theme_bw()
to see a list of all of the standard ggplot themes. Then, try adding one of these themes to your previous plots to see how they change.
The ggthemes
package contains many additional themes. If you don’t have the package already, install it. Then, look at the ggthemes()
vignette by running the following code:
# Open the ggthemes vignette
vignette("ggthemes", package = "ggthemes")
mpg
data using the using the Five Thirty Eight theme. Note that cty
is on the x axis, and hwy
is on the y axis.geom_density()
diamonds
data using the following template:data
argument to diamonds
carat
to the x aestheticgeom_density()
and set the fill color to "tomato1"
theme_minimal()
ggplot(data = XX,
mapping = aes(x = XX)) +
geom_density(fill = "XX") +
labs(x = "XX",
y = "XX",
title = "XX",
subtitle = "XX",
caption = "XX") +
theme_XX()
geom_boxplot()
geom_boxplot()
. Then, create the following boxplot using the following templateggplot(data = XX,
mapping = aes(x = XX, y = log(XX), fill = XX)) +
geom_boxplot() +
labs(y = "XX",
x = "XX",
fill = "XX",
title = "XX",
subtitle = "XX") +
scale_fill_brewer(palette = "XX")
geom_violin()
geom_violin()
. You can also change the color palette in the palette
argument to the scale_fill_brewer()
function. Look at the help menu with ?scale_fill_brewer()
to see all the possibilities. In the plot below, I’m using "Set1"
stat_summary()
function to add summary statistics as geoms to plots. Using the following template, create the following plot showing the mean prices of diamonds for each level of clarity.ggplot(data = XX,
mapping = aes(x = XX, y = XX)) +
stat_summary(fun.y = "mean",
geom = "bar",
fill = "white",
col = "black") +
labs(y = "XX",
x = "XX",
title = "XX",
caption = "XX")
mpg
dataframecoord_flip()
. Using coord_flip()
, flip the x and y coordinates of your previous plot so it looks like this:mpg
dataset, and save it as an object called myplot
<-
add a regression line to the myplot
object with geom_smooth()
. Then evaluate the object to see the updated version. It should now look like this:ggsave()
, save the object as a pdf file called myplot.pdf
. Set the width to 6 inches, and the height to 4 inches. Open the pdf outside of RStudio to make sure it worked!midwest
dataset and look at the help menu to see what values it contains. It should look like this:# A tibble: 437 x 28
PID county state area poptotal popdensity popwhite popblack
<int> <chr> <chr> <dbl> <int> <dbl> <int> <int>
1 561 ADAMS IL 0.0520 66090 1271. 63917 1702
2 562 ALEXANDER IL 0.0140 10626 759. 7054 3496
3 563 BOND IL 0.0220 14991 681. 14477 429
4 564 BOONE IL 0.0170 30806 1812. 29344 127
5 565 BROWN IL 0.0180 5836 324. 5264 547
6 566 BUREAU IL 0.0500 35688 714. 35157 50
7 567 CALHOUN IL 0.0170 5322 313. 5298 1
8 568 CARROLL IL 0.0270 16805 622. 16519 111
9 569 CASS IL 0.0240 13437 560. 13384 16
10 570 CHAMPAIGN IL 0.0580 173025 2983. 146506 16559
# ... with 427 more rows, and 20 more variables: popamerindian <int>,
# popasian <int>, popother <int>, percwhite <dbl>, percblack <dbl>,
# percamerindan <dbl>, percasian <dbl>, percother <dbl>,
# popadults <int>, perchsd <dbl>, percollege <dbl>, percprof <dbl>,
# poppovertyknown <int>, percpovertyknown <dbl>, percbelowpoverty <dbl>,
# percchildbelowpovert <dbl>, percadultpoverty <dbl>,
# percelderlypoverty <dbl>, inmetro <int>, category <chr>
ggplot(data = XX,
mapping = aes(x = XX, y = XX)) +
geom_point(aes(fill = XX, size = XX), shape = 21, color = "white") +
geom_smooth(aes(x = XX, y = XX)) +
labs(
x = "XX",
y = "XX",
title = "XX",
subtitle = "XX",
caption = "XX") +
scale_color_brewer(palette = "XX") +
scale_size(range = c(XX, XX)) +
guides(size = guide_legend(override.aes = list(col = "black")),
fill = guide_legend(override.aes = list(size = 5)))
ggplot(data = XX,
mapping = aes(XX, fill = XX)) +
geom_density(alpha = XX) +
labs(title = "XX",
subtitle = "XX",
caption = "XX",
x = "XX",
y = "XX",
fill = "XX")
geom_tile()
geom_tile()
function. Try creating the following heatplot of statistics of NBA players using the following template:# Read in nba data
nba_long <- read_csv("https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_data/nba_long.csv")
# Look at the data
nba_long
ggplot(XX,
mapping = aes(x = XX, y = XX, fill = XX)) +
geom_tile(colour = "XX") +
scale_fill_gradientn(colors = c("XX", "XX", "XX"))+
labs(x = "XX",
y = "XX",
fill = "XX",
title = "NBA XX performance",
subtitle = "XX",
caption = "XX") +
coord_flip()
psavert
) from the economics
dataset.ACTG175
dataset (the dataset is contained in the speff2trial
package). To do this, you’ll need to use both geom_boxplot()
and geom_point()
. To jitter the points, use the position
argument to geom_point()
, as well as the position_jitter()
function to control how much to jitter the points.midwest_IL <- midwest %>%
filter(state == "XX") %>%
mutate(popdensity_z = (popdensity - mean(popdensity)) / sd(popdensity)) %>%
arrange(desc(popdensity_z)) %>%
mutate(county = factor(county, levels = county)) %>%
slice(1:25)
ggplot(XX, aes(x = XX, y = XX)) +
geom_segment(aes(y = 0,
x = county,
yend = popdensity_z,
xend = county,
col = popdensity_z), size = XX) +
geom_point(size = XX, fill = "white", shape = 21) +
labs(title = "XX",
subtitle = "XX",
Y = "XX",
X = "XX") +
ylim(XX, XX) +
scale_colour_gradient(low = "XX", high = "XX", limits = c(-.1, 9)) +
coord_flip() +
geom_text(aes(label = 1:25)) +
guides(col = FALSE) +
theme_XX() +
theme(panel.grid = element_blank())
Many of the plots in this practical were taken from Selva Prabhakaran’s website http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html
The main ggplot2
webpage at http://ggplot2.tidyverse.org/ has great tutorials and examples.
ggplot2
is also great for making maps. For examples, check out Eric Anderson’s page at http://eriqande.github.io/rep-res-web/lectures/making-maps-with-R.html