In this practical, you’ll practice plotting data with the amazing ggplot2 package.
Datasets
File | Rows | Columns |
---|---|---|
mcdonalds.csv | 260 | 24 |
Packages
Package | Installation |
---|---|
tidyverse | install.packages("tidyverse") |
ggthemes | install.packages("ggthemes") |
plotly | install.packages("plotly") |
The following examples will take you through the steps of creating both simple and complex plots with ggplot2. Try to go through each line of code and see how it works!
# -----------------------------------------------
# Examples of using ggplot2 on the mpg data
# ------------------------------------------------
library(tidyverse) # Load tidyverse (which contains ggplot2!)
mpg # Look at the mpg data
# Just a blank space without any aesthetic mappings
ggplot(data = mpg)
# Now add a mapping where engine displacement (displ) and highway miles per gallon (hwy) are
# mapped to the x and y aesthetics
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy)) # Map displ to x-axis and hwy to y-axis
# Add points with geom_point()
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy)) +
geom_point()
# Add points with geom_count()
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy)) +
geom_count()
# Again, but with some additional arguments
# Also using a new theme temporarily
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy)) +
geom_point(col = "red", # Red points
size = 3, # Larger size
alpha = .5, # Transparent points
position = "jitter") + # Jitter the points
scale_x_continuous(limits = c(1, 15)) + # Axis limits
scale_y_continuous(limits = c(0, 50)) +
theme_minimal()
# Assign class to the color aesthetic and add labels with labs()
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy, col = class)) + # Change color based on class column
geom_point(size = 3, position = 'jitter') +
labs(x = "Engine Displacement in Liters",
y = "Highway miles per gallon",
title = "MPG data",
subtitle = "Cars with higher engine displacement tend to have lower highway mpg",
caption = "Source: mpg data in ggplot2")
# Add a regression line for each class
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy, color = class)) +
geom_point(size = 3, alpha = .9) +
geom_smooth(method = "lm")
# Add a regression line for all classes
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy, color = class)) +
geom_point(size = 3, alpha = .9) +
geom_smooth(col = "blue", method = "lm")
# Facet by class
ggplot(data = mpg,
mapping = aes(x = displ,
y = hwy,
color = factor(cyl))) +
geom_point() +
facet_wrap(~ class)
# Another fancier example
ggplot(data = mpg,
mapping = aes(x = cty, y = hwy)) +
geom_count(aes(color = manufacturer)) + # Add count geom (see ?geom_count)
geom_smooth(se = FALSE) + # Smoothed line without confidence interval
geom_text(data = mpg %>%
mutate(row = row_number()) %>% # Keep each car's original row number in mpg
filter(cty > 25), # Only label cars with city mpg > 25
aes(label = row), # x and y are inherited from the main call
position = position_nudge(y = -1),
check_overlap = TRUE,
size = 5) +
labs(x = "City miles per gallon",
y = "Highway miles per gallon",
title = "City and Highway miles per gallon",
subtitle = "Numbers indicate the row numbers of cars with city mpg > 25",
caption = "Source: mpg data in ggplot2",
color = "Manufacturer",
size = "Counts")
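The two regression examples above differ only in whether color is *mapped* or *set*. A minimal sketch of why that matters: when `col = "blue"` is passed to `geom_smooth()` as a fixed value, ggplot2 ignores the inherited `color = class` mapping for that layer, so the points are no longer split into one group per class. The object names `p_by_class` and `p_overall` are just illustrative, and `layer_data()` is used here only to count how many groups each smoothing layer was fit to.

```r
library(ggplot2)

# Color mapped to class in the main call: geom_smooth() inherits it,
# so one regression line is fit per class
p_by_class <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Color set to a constant inside geom_smooth(): the class mapping is
# overridden for this layer only, so a single line is fit to all points
p_overall <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, col = "blue")

# layer_data(plot, 2) returns the computed data of the second layer (the smooth)
n_lines_by_class <- length(unique(layer_data(p_by_class, 2)$group))
n_lines_overall  <- length(unique(layer_data(p_overall, 2)$group))
n_lines_by_class
n_lines_overall
```

The points keep their class colors in both versions, because only the smoothing layer's color is fixed.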
The main ggplot2 webpage at http://ggplot2.tidyverse.org/ has great tutorials and examples.
Check out Selva Prabhakaran’s website for a nice gallery of ggplot2 graphics: http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html
ggplot2 is also great for making maps. For examples, check out Eric Anderson’s page at http://eriqande.github.io/rep-res-web/lectures/making-maps-with-R.html
Open your baselrbootcamp R project. It should already have the folders 1_Data and 2_Code. Make sure that the data files listed in the Datasets section above are in your 1_Data folder.
Open a new R script. At the top of the script, using comments, write your name and the date. Save it as a new file called plotting_practical.R in the 2_Code folder.
Using library(), load the set of packages for this practical listed in the Packages section above.
## NAME
## DATE
## Plotting Practical
library(XX)
library(XX)
#...
For this practical, we’ll use the mcdonalds.csv data, which contains nutrition information about items from McDonald’s. Using read_csv(), load the data into R and store it as a new object called mcdonalds.
Take a look at the first few rows of the dataset(s) by printing them to the console.
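If read_csv() is new to you, here is a minimal sketch of the same call on a tiny made-up file. The item names and numbers are placeholders, not real McDonald’s data, and the file is written to a temporary location so the example runs anywhere. read_csv() comes from the readr package, which library(tidyverse) loads. For the practical itself you would point read_csv() at your 1_Data folder instead (assuming your working directory is the project root).

```r
library(readr)

# Write a tiny, made-up CSV to a temporary file (placeholder values only)
toy_path <- tempfile(fileext = ".csv")
writeLines(c("Item,Calories,Total Fat",
             "Item A,300,13",
             "Item B,450,22",
             "Item C,150,4"),
           toy_path)

# read_csv() returns a tibble; column types are guessed from the contents
toy <- read_csv(toy_path)
toy  # printing a tibble shows the first rows and each column's type

# In the practical, the equivalent call would be something like
# (path assumed relative to the project root):
# mcdonalds <- read_csv("1_Data/mcdonalds.csv")
```

Notice that the column name Total Fat keeps its space, which is exactly the kind of name the cleanup step below deals with.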
You’ll notice that the mcdonalds data frame has many column names with spaces and ‘bad’ characters like parentheses. Run the following code to fix that!
# Clean up the names of mcdonalds
mcdonalds <- mcdonalds %>%
select(-contains("% Daily Value")) %>% # Remove all '% Daily Value' columns
rename_all(.funs = ~ gsub(" ", "", .)) # no more spaces!
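To see what the cleanup pipeline does, here is a minimal sketch that runs it on a small made-up tibble whose column names mimic the style of mcdonalds.csv (the names and values are illustrative, not taken from the real file).

```r
library(dplyr)
library(tibble)

# Made-up columns that mimic the kinds of names in mcdonalds.csv
toy <- tibble(
  Item = c("Item A", "Item B"),
  `Total Fat` = c(13, 22),
  `Total Fat (% Daily Value)` = c(20, 34),
  `Saturated Fat` = c(5, 9)
)

toy_clean <- toy %>%
  select(-contains("% Daily Value")) %>%  # drops the "(% Daily Value)" column
  rename_all(.funs = ~ gsub(" ", "", .))  # "Total Fat" -> "TotalFat", etc.

names(toy_clean)
```

Removing the "% Daily Value" columns also removes the names with parentheses, so only spaces are left to strip. In dplyr 1.0.0 and later, rename_all() is superseded by rename_with(); `toy %>% rename_with(~ gsub(" ", "", .x))` strips the spaces in the same way.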
In this section, you’ll build the following plot step by step.
Using ggplot(), create the following blank plot using the data and mapping arguments (but no geom). Use Calories for the x aesthetic and SaturatedFat for the y aesthetic.
ggplot(data = mcdonalds,
mapping = aes(x = XX, y = XX))
Using geom_point(), add points to the plot.
ggplot(data = mcdonalds,
mapping = aes(x = XX, y = XX)) +
geom_point()
Using the color aesthetic mapping, color the points by their Category.
ggplot(mcdonalds, aes(x = XX, y = XX, col = XX)) +
geom_point()
Add a smoothed line with geom_smooth().
ggplot(mcdonalds, aes(x = XX, y = XX, col = XX)) +
geom_point() +
geom_smooth()
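Because col is mapped to Category, you get one smoothed line per category. To get a single line for all items, set the col argument of geom_smooth() to one color.
ggplot(mcdonalds, aes(x = XX, y = XX, col = XX)) +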
geom_point() +
geom_smooth(col = "XX")
Add a title, subtitle, and caption with the labs() function.
ggplot(mcdonalds, aes(x = XX, y = XX, col = XX)) +
geom_point() +
geom_smooth(col = "XX") +
labs(title = "XX",
subtitle = "XX",
caption = "XX")
Set the range of the x-axis with xlim().
ggplot(mcdonalds, aes(x = XX, y = XX, col = XX)) +
geom_point() +
geom_smooth(col = "XX") +
labs(title = "XX",
subtitle = "XX",
caption = "XX") +
xlim(XX, XX)
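One thing worth knowing about xlim(): it sets the limits of the x scale, so points outside the range are turned into missing values and dropped from the plot (ggplot2 warns about removed rows). This also changes the data that geom_smooth() is fit to. A minimal sketch on the mpg data contrasting it with coord_cartesian(xlim = ...), which only zooms the view and keeps every point; the range 2 to 5 is an arbitrary choice.

```r
library(ggplot2)

# xlim() changes the scale: displ values outside 2-5 become NA
p_limits <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  xlim(2, 5)

# coord_cartesian() only zooms the viewport: the data are left untouched
p_zoom <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  coord_cartesian(xlim = c(2, 5))

# Count how many points each version has marked as missing on the x axis
sum(is.na(layer_data(p_limits)$x))
sum(is.na(layer_data(p_zoom)$x))
```

If you only want to zoom in on part of the McDonald’s plot without changing the smoothed line, coord_cartesian() is the safer choice.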
Finally, add theme_minimal(). You should now have the final plot!
ggplot(mcdonalds, aes(x = XX, y = XX, col = XX)) +
geom_point() +
geom_smooth(col = "XX") +
labs(title = "XX",
subtitle = "XX",
caption = "XX") +
xlim(XX, XX) +
theme_minimal()
Now create a violin plot with geom_violin(), filling each violin by Category:
ggplot(data = mcdonalds, aes(x = XX, y = XX, fill = XX)) +
geom_violin() +
guides(fill = "none") +
labs(title = "XX",
subtitle = "XX")
Add + stat_summary(fun = "mean", geom = "point", col = "white", size = 4) to include points showing the mean of each distribution.
Now add + geom_jitter(width = .1, alpha = .5) to your plot. What do you see?
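To see what stat_summary() adds, here is a minimal sketch of the same idea on the built-in mpg data, with class and hwy standing in for the McDonald’s columns. It uses the fun argument (older code and tutorials use fun.y, which newer ggplot2 versions deprecate) and inspects the summary layer with layer_data().

```r
library(ggplot2)

p_violin <- ggplot(mpg, aes(x = class, y = hwy, fill = class)) +
  geom_violin() +
  guides(fill = "none") +                   # the x axis already names each class
  stat_summary(fun = mean, geom = "point",  # one point per class at its mean hwy
               col = "white", size = 4)

# The second layer holds one row per class, with y equal to the class mean
means_layer <- layer_data(p_violin, 2)
means_layer[, c("x", "y")]
```

Swapping `fun = mean` for `fun = median` moves each white point to the median of its distribution instead.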
Play around with your plotting arguments to see how the results change! Each time you make a change, run the plot again to see your new output!
Change the fun argument in stat_summary() from “mean” to “median”.
Change the size argument in stat_summary() to something much bigger (or smaller).
Change the width argument in geom_jitter() to width = 0.
Instead of geom_violin(), try geom_boxplot().
Remove the fill = Category aesthetic entirely.
Create the following scatterplot, split into panels with facet_wrap():
ggplot(XX, aes(x = XX, y = XX)) +
geom_point(alpha = .2) +
facet_wrap(~ XX) +
labs(title = "XX",
subtitle = "XX") +
theme_minimal()
Experiment with this plot: for example, facet by Category and add a smoothed line with geom_smooth().
Create a scatterplot showing the relationship between Cholesterol and Protein.
Color the points according to their Calories by specifying the col aesthetic.
Change the colors by adding + scale_colour_gradient(low = "blue", high = "red").
For a gray palette, give scale_colour_gradient() gray endpoints instead (for example, low = "gray80", high = "gray10"). Note that scale_color_grey() only works for discrete variables such as Category, not for a continuous variable like Calories.
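Customize! Run colors() to see the names of all the colors built into R. Then, use two new colors in your plot.
Create a bar plot showing the mean of a variable for each group with stat_summary():
ggplot(XX, aes(x = XX, y = XX)) +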
stat_summary(geom = "bar",
fun = "mean") +
labs(title = "XX")
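The color-scale steps above work the same way for any continuous variable. A minimal sketch on the mpg data, with cty standing in for Calories: scale_colour_gradient() interpolates between two named colors, and colors() lists the color names R recognizes. The particular colors (steelblue, tomato, gray80, gray10) are arbitrary picks.

```r
library(ggplot2)

# All color names known to R; check that the ones we want to use exist
head(colors())
c("steelblue", "tomato", "gray80", "gray10") %in% colors()

# Continuous variable mapped to color, with a custom two-color gradient
p_gradient <- ggplot(mpg, aes(x = displ, y = hwy, col = cty)) +
  geom_point() +
  scale_colour_gradient(low = "steelblue", high = "tomato")

# A gray version of the same continuous scale. scale_color_grey() is a
# discrete scale, so for a continuous variable we use gray endpoints instead
p_gray <- ggplot(mpg, aes(x = displ, y = hwy, col = cty)) +
  geom_point() +
  scale_colour_gradient(low = "gray80", high = "gray10")

# Each point gets a color computed from its cty value
length(unique(layer_data(p_gradient)$colour))
```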
Create a plot using geom_point(), geom_count() or geom_jitter(), and store it as a new object called mcdonalds_gg using mcdonalds_gg <- ggplot(...).
mcdonalds_gg <- ggplot(...) # Include your plotting code here
Evaluate your mcdonalds_gg object to see that it does indeed contain your plot.
Save your plot to a pdf file called mcdonalds.pdf using ggsave(). When you finish, open your plot to see how it looks!
# Save mcdonalds_gg to a pdf file
ggsave(filename = "mcdonalds.pdf",
device = "pdf",
plot = mcdonalds_gg,
width = 4,
height = 4,
units = "in")
Play around with the width and height arguments to change the dimensions of the plot.
Customize your code to create a jpeg image called mcdonalds.jpeg.
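A minimal sketch of the store-then-save workflow, using mpg so it runs without the McDonald’s data. The plot is stored in an object first and then written with ggsave(); the file goes to a temporary directory here, and the names mpg_gg and mpg_plot.pdf are only illustrative. ggsave() picks the file format from the extension, so changing .pdf to .jpeg is enough to get a JPEG, and its dpi argument controls the resolution of raster formats like JPEG.

```r
library(ggplot2)

# Store the plot in an object; nothing is drawn until it is printed or saved
mpg_gg <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

class(mpg_gg)   # a ggplot object you can keep adding layers to

# Write a 4 x 4 inch PDF; the extension tells ggsave() which device to use
out_file <- file.path(tempdir(), "mpg_plot.pdf")
ggsave(filename = out_file, plot = mpg_gg,
       width = 4, height = 4, units = "in")

file.exists(out_file)
```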
Let’s create the following plot with additional point labels using geom_text():
ggplot(mcdonalds, aes(x = XX,
y = XX,
col = XX)) +
geom_point() +
xlim(XX, XX) +
ylim(XX, XX) +
theme_minimal() +
labs(title = "XX")
Try adding labels to the plot indicating which item each point represents by adding + geom_text().
Where are the labels? Ah, we didn’t tell ggplot which column in the data represents the item descriptions. Fix this by specifying the label aesthetic in your first call to the aes() function. That is, include label = Item underneath col = XX. Now you should see lots of labels!
Customize your geom_text() by including the arguments: geom_text(col = "black", check_overlap = TRUE, hjust = "left").
Using the data argument in geom_text(), specify that the labels should only apply to items over 1100 calories (hint: geom_text(data = mcdonalds %>% filter(XX > XX))).
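The data argument of geom_text() lets a single layer use a subset of rows while the points still show everything. A minimal sketch on the mpg data, labeling only cars with hwy above 40; the cutoff and the model label column are arbitrary stand-ins for Calories > 1100 and Item.

```r
library(ggplot2)
library(dplyr)

p_labels <- ggplot(mpg, aes(x = displ, y = hwy, label = model)) +
  geom_point() +
  # Only the filtered rows are passed to this layer, so only they get labels;
  # the label aesthetic is still inherited from the main aes() call
  geom_text(data = mpg %>% filter(hwy > 40),
            col = "black", check_overlap = TRUE, hjust = "left")

# The text layer holds one row per labeled car
nrow(layer_data(p_labels, 2))
```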
Play around!
Try mapping a variable to the size aesthetic.
Facet the plot with facet_wrap(~ Category).
Try theme_excel(), included in the ggthemes package.
Interactive plots with plotly::ggplotly()
Using the ggplotly() function from the plotly package, you can turn any ggplot object into an interactive plot! Run the following code to see it in action.
# Create a standard ggplot object
MyPlot <- ggplot(data = mcdonalds,
aes(x = Calories, y = TotalFat, col = Category)) +
geom_point()
# Make it interactive with ggplotly()!
library(plotly)
ggplotly(MyPlot)
Play around with your plot! See what happens when you hover over the points with your mouse. You can even zoom in by dragging your mouse.
Try turning one of your favorite previous plots into an interactive Plotly plot using the ggplotly() function!