This is a continuation of the previous tutorial. In the previous post, we made a histogram and added annotations inside the graph showing the length at first maturity as well as the sections for juveniles, mature fish, and mega-spawners. In this tutorial, I am going to show how to use facets to display subsets of the data, also known as small multiples. This type of graph is especially useful for comparing data across groups, for example, the length frequency distribution of a species per fishing gear.
Extracted from the official reference of ggplot2:
Facetting generates small multiples, each displaying a different subset of the data. Facets are an alternative to aesthetics for displaying additional discrete variables.
There are two functions: facet_wrap() and facet_grid(). In this tutorial, we will only be dealing with facet_wrap().
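As a minimal sketch of what facet_wrap() does, the example below uses the mpg dataset that ships with ggplot2 (not the cisco data) and splits one histogram into a panel per vehicle class. The object name wrap_demo is hypothetical and only serves this illustration.

```r
library(ggplot2)

# mpg is a built-in ggplot2 dataset, used here only for illustration
wrap_demo <- ggplot(data = mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2, color = "black", fill = "gray") +
  facet_wrap(~ class)  # one panel per level of `class`

print(wrap_demo)
```

Each panel shares the same axes, which is what makes the groups easy to compare at a glance.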
Preliminaries
To begin, let us load the required packages.
library(ggplot2)
library(dplyr)
library(magrittr)
library(ggthemes)
We will again use the data on Coregonus artedi.
head(cisco_data)
## X lakeid year4 sampledate gearid spname length weight sex
## 1 1 TR 1981 8/11/1981 VGN032 CISCO 140 21.4 F
## 2 2 TR 1981 8/10/1981 VGN032 CISCO 146 22.3 F
## 3 3 TR 1981 8/11/1981 VGN032 CISCO 147 23.3 F
## 4 4 TR 1981 8/19/1981 VGN032 CISCO 153 23.5 F
## 5 5 TR 1981 8/19/1981 VGN032 CISCO 150 24.0 F
## 6 6 TR 1981 8/19/1981 VGN032 CISCO 152 24.0 F
Furthermore, the objects (variables) from the previous tutorial will be used, as well as the custom theme. I modified the theme, changing axis.text.x = element_text(size = 10, angle = 25) to axis.text.x = element_text(size = 10).
cisco_data_range <- max(cisco_data$length) - min(cisco_data$length)
class_size <- 20
class_interval <- cisco_data_range / class_size
cisco_lm_mm <- 171
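To make the arithmetic behind the bin width concrete, here is a small worked sketch on a hypothetical vector of lengths (toy_lengths and the other toy_ names are made up for this example). Note that dividing the range by 20 means the value called class_size above effectively acts as the number of classes, and the result is the width of each class.

```r
# Hypothetical lengths in mm, for illustration only
toy_lengths <- c(100, 140, 171, 220, 300)

toy_range    <- max(toy_lengths) - min(toy_lengths)  # 300 - 100 = 200
toy_classes  <- 20                                   # same role as class_size
toy_interval <- toy_range / toy_classes              # 200 / 20 = 10

toy_interval
```

With these toy values, each histogram bar would span 10 mm.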
Plotting
Length frequency distribution by fishing gear
We will begin by displaying the length frequency distribution by gear ID. In our data, there are 9 fishing gears used to catch this species.
First, we will make an object containing the title for our graph:
my_title_gear <- expression(paste("Length frequency of Cisco (", italic("Coregonus artedi"), ") per gear"))
gear_multiples <- ggplot(data = cisco_data, aes(x = length)) +
  geom_histogram(binwidth = class_interval, color = "black", fill = "gray") +
  labs(title = my_title_gear,
       subtitle = "The dotted red line represents the length at first maturity (171 mm)",
       x = "Total Length (mm)",
       y = "Frequency") +
  # vertical reference line at the length at first maturity
  geom_vline(aes(xintercept = cisco_lm_mm), color = "red",
             linetype = "dotted", size = 0.5) +
  facet_wrap(~ gearid) +  # one panel per fishing gear
  theme_pub()             # custom theme from the previous tutorial
print(gear_multiples)
According to the book R for Data Science by Grolemund and Wickham, on the topic Facets:
To facet your plot by a single variable, use facet_wrap(). The first argument of facet_wrap() should be a formula, which you create with ~ followed by a variable name (here “formula” is the name of a data structure in R, not a synonym for “equation”). The variable that you pass to facet_wrap() should be discrete.
In our case, gearid is the variable that must be supplied to facet_wrap() to create one length frequency panel for each fishing gear used.
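The quoted point about formulas can be checked directly in R. The sketch below builds the one-sided formula on its own and inspects it; it also shows the vars() form, which ggplot2 accepts as an alternative way of naming the faceting variable. The object names gear_formula, facet_a, and facet_b are hypothetical.

```r
library(ggplot2)

# A one-sided formula is an ordinary R object
gear_formula <- ~ gearid
class(gear_formula)     # "formula"
all.vars(gear_formula)  # "gearid"

# Two ways of writing the same facet specification
facet_a <- facet_wrap(~ gearid)
facet_b <- facet_wrap(vars(gearid))
```

Nothing is evaluated against the data at this point; gearid is only looked up once the facet is added to a plot.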
Length frequency distribution by year
Next, we will look at the length frequencies per year.
my_title_year <- expression(paste("Length frequency of Cisco (", italic("Coregonus artedi"), ") per year"))
year_multiples <- ggplot(data = cisco_data, aes(x = length)) +
  geom_histogram(binwidth = class_interval, color = "black", fill = "gray") +
  labs(title = my_title_year,
       subtitle = "The dotted red line represents the length at first maturity (171 mm)",
       x = "Total Length (mm)",
       y = "Frequency") +
  # vertical reference line at the length at first maturity
  geom_vline(aes(xintercept = cisco_lm_mm), color = "red",
             linetype = "dotted", size = 0.5) +
  facet_wrap(~ year4, nrow = 5) +  # one panel per year, arranged in 5 rows
  theme_pub()                      # custom theme from the previous tutorial
print(year_multiples)
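The nrow argument used above fixes the number of rows in the panel grid, and facet_wrap() works out the number of columns from the number of panels. Here is a small sketch with simulated data (toy_years is hypothetical and has ten years, which is an assumption made only for this example): ten panels in five rows need two columns.

```r
library(ggplot2)

# Simulated lengths for ten hypothetical years, for illustration only
set.seed(1)
toy_years <- data.frame(
  year4  = rep(1981:1990, each = 30),
  length = rnorm(300, mean = 170, sd = 25)
)

year_demo <- ggplot(toy_years, aes(x = length)) +
  geom_histogram(binwidth = 10, color = "black", fill = "gray") +
  facet_wrap(~ year4, nrow = 5)  # fix the grid at 5 rows

# Inspect the panel grid that ggplot2 computed
panel_layout <- ggplot_build(year_demo)$layout$layout
max(panel_layout$ROW)
max(panel_layout$COL)
```

The ncol argument works the same way if you would rather fix the number of columns.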
Length frequency distribution by sex
Lastly, we are going to compare the length frequencies of this species per sex. This is tricky because the column for sex contains NAs. In this tutorial, we will only get the length frequencies of male and female Coregonus artedi, indicated by M and F, respectively, in the data.
We will make a new object containing the sexes of the species:
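cisco_sex <- cisco_data %>%
  filter(sex %in% c("F", "M"))
The filter function is part of the dplyr package, which is included in the tidyverse. As you might guess, it filters the rows of the data frame, keeping only the rows whose value in the sex column is one of those inside c("F", "M"), that is, female or male. The %in% operator tests whether each value on its left matches any of the values on its right. It is used here instead of the relational operator == (equal to), because == would compare sex against c("F", "M") position by position, recycling the shorter vector, and would silently drop rows that should be kept. Rows with NA in sex are removed as well, since NA %in% c("F", "M") is FALSE.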
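To see how == and %in% behave when the right-hand side holds more than one value, here is a small sketch on a hypothetical six-row data frame (toy_sex is not part of the cisco data). With ==, the two-element vector is recycled down the column and compared position by position, so only rows that happen to line up are kept; %in% keeps every row whose value is in the set.

```r
library(dplyr)

# Hypothetical data, for illustration only
toy_sex <- data.frame(
  sex    = c("F", "F", "M", "M", NA, "F"),
  length = c(150, 160, 170, 180, 190, 200)
)

# c("F", "M") is recycled to F, M, F, M, F, M and compared row by row
recycled <- toy_sex %>% filter(sex == c("F", "M"))

# every row whose sex is either "F" or "M" is kept; the NA row is dropped
matched <- toy_sex %>% filter(sex %in% c("F", "M"))

nrow(recycled)  # 2
nrow(matched)   # 5
```

A quick check such as table(cisco_sex$sex) against table(cisco_data$sex) is a simple way to confirm that no F or M rows were lost.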
As above, we will make a separate title for the graph:
my_title_sex <- expression(paste("Length frequency of Cisco (", italic("Coregonus artedi"), ") per sex"))
sex_multiples <- ggplot(data = cisco_sex, aes(x = length)) +
  geom_histogram(binwidth = class_interval, color = "black", fill = "gray") +
  labs(title = my_title_sex,
       subtitle = "The dotted red line represents the length at first maturity (171 mm)",
       x = "Total Length (mm)",
       y = "Frequency") +
  # vertical reference line at the length at first maturity
  geom_vline(aes(xintercept = cisco_lm_mm), color = "red",
             linetype = "dotted", size = 0.5) +
  facet_wrap(~ sex) +  # one panel per sex
  theme_pub()          # custom theme from the previous tutorial
print(sex_multiples)
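The three plots in this tutorial differ only in the data, the title, and the faceting variable, so the repeated code could be folded into a helper. The function below is a hypothetical sketch, not part of the original tutorial: plot_length_facets and toy_cisco are made-up names, the custom theme_pub() is left out because it is defined in the previous post, and reformulate() from base R is used to turn a column name into the one-sided formula that facet_wrap() expects.

```r
library(ggplot2)

# Hypothetical helper: a faceted length frequency histogram for any grouping column
plot_length_facets <- function(data, facet_var, binwidth, lm_mm, title) {
  ggplot(data = data, aes(x = length)) +
    geom_histogram(binwidth = binwidth, color = "black", fill = "gray") +
    geom_vline(xintercept = lm_mm, color = "red", linetype = "dotted") +
    labs(title = title, x = "Total Length (mm)", y = "Frequency") +
    facet_wrap(reformulate(facet_var))  # e.g. "sex" becomes ~ sex
}

# Toy data, for illustration only
toy_cisco <- data.frame(
  length = c(140, 146, 147, 153, 150, 152, 180, 175),
  sex    = c("F", "F", "F", "F", "M", "M", "M", "M")
)

sex_demo <- plot_length_facets(toy_cisco, "sex", binwidth = 10,
                               lm_mm = 171, title = "Toy data per sex")
print(sex_demo)
```

With the real data, a call would pass cisco_data, a column name such as "gearid", class_interval, and cisco_lm_mm.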
Conclusion
Hooray! We made it. As you can see, it is very easy to make small multiples using facet_wrap() if we are faceting by a single variable. It is also easy to reproduce the graph even if you update the original data: you will not need to make a series of manual adjustments when your data changes.
I hope you enjoyed this short tutorial.