In the spirit of Emily Riederer’s ugliest ggplot ever, we’ll play around with ggplot code in order to learn how it works. The goal: make the ugliest plot possible.
We’ll load in our packages below:
# general use
library(tidyverse) # general tidying and visualization: ggplot2 is attached automatically with the tidyverse
library(lterdatasampler) # data we're using comes from this package
library(lubridate) # working with dates
library(here) # folder organization
# extras
library(patchwork) # arranging plots
library(magick) # putting images into ggplots
Note: lterdatasampler has to be installed from the GitHub repo using the code below (copy, paste, and run in the console):
remotes::install_github("lter/lterdatasampler")
Today, we’ll use the Plum Island fiddler crab data from lterdatasampler to visualize relationships between latitude and crab size. Read the linked vignette to learn about Bergmann’s rule!
The Plum Island LTER has a data set of crab sizes (column size) from Summer 2016 at 13 different marshes spanning 12 degrees of latitude. A random sample of five rows is below, but try View(pie_crab) in the console to see the whole data frame.
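To check the site count and latitude span for yourself, a quick dplyr summary works. This is a minimal sketch: the columns site and latitude come from the printout of pie_crab, while site_summary and the summary column names are just illustrative names chosen here.

```r
library(tidyverse)       # dplyr verbs and the %>% pipe
library(lterdatasampler) # provides the pie_crab data frame

# one row describing how many marshes were sampled and how far apart they are
site_summary <- pie_crab %>%
  summarize(
    n_sites   = n_distinct(site),              # number of distinct marsh codes
    min_lat   = min(latitude),                 # southernmost site
    max_lat   = max(latitude),                 # northernmost site
    lat_range = max(latitude) - min(latitude), # span in degrees of latitude
    n_crabs   = n()                            # total number of crabs measured
  )

site_summary
```

Swapping summarize() for count(site, latitude) would instead list each marsh with its latitude and number of crabs.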
pie_crab %>%
  slice_sample(n = 5)
## # A tibble: 5 × 9
## date latitude site size air_temp air_temp_sd water_temp water_…¹ name
## <date> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 2016-07-28 41.6 NB 17.0 12.2 9.48 17.5 7.86 Narr…
## 2 2016-08-12 42.2 BC 11.4 11.6 9.53 14.0 6.9 Bare…
## 3 2016-08-09 37.2 VCR 14.0 15.0 8.41 17.6 8.43 Virg…
## 4 2016-08-13 42.7 PIE 17.7 10.3 9.45 14.3 4.84 Plum…
## 5 2016-08-01 34.7 RC 17.4 18.6 8.40 20.5 7.00 Rach…
## # … with abbreviated variable name ¹water_temp_sd
Just to make things a little more interesting, I’m going to extract the month from each date and save the result as a new data frame, crab_data.
crab_data <- pie_crab %>%
  # extracting month from the date column using lubridate::month()
  # also making this a factor (instead of numeric) using as.factor()
  mutate(month = as.factor(month(date)))
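The code above only extracts the month. To split each date into year, month, and day columns, lubridate has matching extractor functions. The sketch below is an illustration rather than part of the original workflow; crab_dates and the new column names are hypothetical. month(date, label = TRUE) returns an ordered factor of month abbreviations (such as “Jul” and “Aug”, depending on your locale) instead of numbers, which can make facet labels easier to read.

```r
library(tidyverse)       # dplyr::mutate(), select(), and the %>% pipe
library(lubridate)       # year(), month(), day()
library(lterdatasampler) # pie_crab data

# hypothetical data frame with the date split into its parts
crab_dates <- pie_crab %>%
  mutate(
    year        = year(date),                # numeric year
    month_num   = month(date),               # numeric month, 1-12
    month_label = month(date, label = TRUE), # ordered factor of month abbreviations
    day         = day(date)                  # day of the month, 1-31
  )

crab_dates %>%
  select(date, year, month_num, month_label, day) %>%
  head()
```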
ggplot grammar
ggplot works in layers. The code to make a plot can vary, but it always includes:
1. the ggplot() call: this tells R that you want to use the function ggplot() from the ggplot2 package to plot things.
2. data and aesthetics, usually within that ggplot() call: these tell ggplot which data frame to use and which variables in that data frame should be represented in the plot (for example, the x- and y-axes, colors, and shapes).
3. a geom_(): short for “geometry”, geom_() calls tell ggplot what kind of plot you want to make. Try ??geom_ in the console (or type geom_ and press Tab in RStudio) to see the different options.
# step 1: call ggplot
ggplot(
  # step 2: specify the data and the aesthetics
  # plotting latitude on the x-axis and crab size on the y-axis
  data = crab_data, aes(x = latitude, y = size)) +
  # step 3: specify a geom - in this case, we're creating a scatter plot
  geom_point()
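Because the geom decides what kind of plot you get, swapping it changes the picture while the data and aesthetics stay the same. As a sketch that is not part of the original post, here is the same data with site on the x-axis and geom_boxplot() summarizing the size distribution at each marsh. Ordering the sites by latitude with reorder() is an added choice meant to make the latitude gradient visible, and box_plot is a hypothetical name.

```r
library(tidyverse)       # ggplot2
library(lterdatasampler) # pie_crab data

# same data, different geom: one box per site instead of one point per crab
box_plot <- ggplot(data = pie_crab,
                   aes(x = reorder(site, latitude), y = size)) +
  geom_boxplot() +
  labs(x = "site (ordered by latitude)", y = "size")

box_plot
```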
Note that when you’re adding layers in ggplot, you’ll use the + operator instead of %>%. This is because ggplot() creates a plot object, and everything else you add on (geoms, facets, themes) is a component combined with that object. The %>% pipe, by contrast, passes the result of one function into the first argument of the next function.
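One way to see that + builds on a plot object is to save the base plot to a variable and add layers to it step by step. In this sketch, base_plot, point_plot, and trend_plot are hypothetical names, and the geom_smooth(method = "lm") layer, which draws a straight-line fit for eyeballing the Bergmann’s rule pattern, is an addition not found in the original post.

```r
library(tidyverse)       # ggplot2
library(lterdatasampler) # pie_crab data

# the ggplot() call alone sets the data and aesthetics but has no layers
base_plot <- ggplot(data = pie_crab, aes(x = latitude, y = size))

# each + returns a new plot object with one more component
point_plot <- base_plot + geom_point()
trend_plot <- point_plot +
  geom_smooth(method = "lm", formula = y ~ x) # linear fit with a confidence band

length(base_plot$layers)  # 0: no layers yet
length(trend_plot$layers) # 2: points and the fitted line

trend_plot
```

Since base_plot is never modified, it can be reused as the starting point for other versions of the plot.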
So we’ve just made a basic scatter plot. But how can we make it worse? ggplot() maps variables in the data frame to aesthetics, so I’m going to color the points by site and make the shapes represent month. I’m also going to make a jitter plot, which is a scatter plot with the points “jittered”, or randomly shifted by a small amount so that overlapping points are easier to see (or just to add chaos). Finally, I’m going to facet the plot by month using facet_wrap(), which splits the plot into a separate panel for each value of a variable so you can compare groups side by side. facet_grid() does something similar, but arranges the panels in a grid of rows and columns defined by one or two variables.
ggplot(data = crab_data, aes(x = latitude, y = size)) +
  # putting the aesthetics in here: color points by site, shape points by month
  geom_jitter(aes(color = site, shape = month),
              # anything that doesn't have to do with variables (like point size or transparency) goes outside the aesthetics
              size = 3, alpha = 0.6) +
  # facet by month
  facet_wrap(~ month)
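facet_wrap() wraps a one-dimensional sequence of panels onto the page, while facet_grid() places panels in a fixed grid of rows and columns. As a sketch, the same jittered plot can be split into one row of panels per month with facet_grid(rows = vars(month)); using cols = vars(month) instead would put the months side by side. The object name grid_plot is hypothetical.

```r
library(tidyverse)       # ggplot2 and dplyr
library(lubridate)       # month()
library(lterdatasampler) # pie_crab data

crab_data <- pie_crab %>%
  mutate(month = as.factor(month(date)))

grid_plot <- ggplot(data = crab_data, aes(x = latitude, y = size)) +
  geom_jitter(aes(color = site, shape = month), size = 3, alpha = 0.6) +
  # one row of panels per month, stacked so they share the x-axis
  facet_grid(rows = vars(month))

grid_plot
```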