This tutorial continues from the previous one, where we created a histogram with annotations for length at first maturity, juveniles, mature fish, and mega-spawners. In this tutorial, we will learn how to use facets to split a plot into panels, each showing a different subset of the data. These panels are also known as small multiples. This type of graph is useful for comparing data across groups, such as the length frequency distribution of a species by fishing gear.
Extracted from the official reference of ggplot2:
Facetting generates small multiples, each displaying a different subset of the data. Facets are an alternative to aesthetics for displaying additional discrete variables.
There are two functions: facet_wrap() and facet_grid(). In this tutorial, we will only be dealing with facet_wrap().
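To make the difference between the two functions concrete, here is a minimal sketch. It uses the mpg data set that ships with ggplot2 as a stand-in (an assumption for illustration, not the Cisco data), and the object names are hypothetical.

```r
library(ggplot2)

# mpg ships with ggplot2; it is used here only as a stand-in data set.
base_plot <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)

# facet_wrap(): panels for one discrete variable, wrapped into rows and columns.
wrapped <- base_plot + facet_wrap(~ class)

# facet_grid(): a grid whose rows and columns are each defined by a variable.
gridded <- base_plot + facet_grid(drv ~ cyl)

print(wrapped)
print(gridded)
```

facet_wrap() is usually the more convenient choice when there is a single grouping variable, which is the case throughout this tutorial.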
Preliminaries
To begin, let us load the required packages.
library(ggplot2)
library(dplyr)
library(magrittr)
library(ggthemes)
We will again use the Coregonus artedi data.
head(cisco_data)
## V1 lakeid year4 sampledate gearid spname length weight sex
## <int> <char> <int> <char> <char> <char> <int> <num> <char>
## 1: 1 TR 1981 8/11/1981 VGN032 CISCO 140 21.4 F
## 2: 2 TR 1981 8/10/1981 VGN032 CISCO 146 22.3 F
## 3: 3 TR 1981 8/11/1981 VGN032 CISCO 147 23.3 F
## 4: 4 TR 1981 8/19/1981 VGN032 CISCO 153 23.5 F
## 5: 5 TR 1981 8/19/1981 VGN032 CISCO 150 24.0 F
## 6: 6 TR 1981 8/19/1981 VGN032 CISCO 152 24.0 F
Additionally, we will use the objects (variables) from the previous tutorial, as well as the custom theme. I made one modification to the theme: axis.text.x = element_text(size = 10, angle = 25) is now axis.text.x = element_text(size = 10), so the x-axis labels are no longer rotated.
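# Range of observed lengths (mm)
cisco_data_range <- max(cisco_data$length) - min(cisco_data$length)
# Number of length classes (bins)
class_size <- 20
# Width of each class, used below as the histogram binwidth
class_interval <- cisco_data_range / class_size
# Length at first maturity (mm)
cisco_lm_mm <- 171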
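The bin width used for the histograms comes from simple arithmetic: the range of lengths divided by the number of classes. Here is a minimal worked sketch with made-up lengths. The toy_ and n_classes names and the values are assumptions for illustration, not part of the Cisco data.

```r
# Made-up lengths (mm), used only to illustrate the arithmetic.
toy_length <- c(100, 140, 180, 220, 300)

toy_range <- max(toy_length) - min(toy_length)  # 300 - 100 = 200
n_classes <- 20                                  # plays the role of class_size
toy_binwidth <- toy_range / n_classes            # 200 / 20 = 10 mm per bin

toy_binwidth
```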
Plotting
Length frequency distribution by fishing gear
We will now display the length frequency distribution by gear ID. There are 9 fishing gears used to catch this species in our data.
First, we will create an object that contains the title for our graph.
my_title_gear <- expression(paste("Length frequency of Cisco (", italic("Coregonus artedi"), ") per gear"))
gear_multiples <- ggplot(data = cisco_data, aes(x = length)) +
  geom_histogram(binwidth = class_interval, color = "black", fill = "gray") +
  labs(title = my_title_gear,
       subtitle = "The dotted red line represents the length at first maturity (171 mm)",
       x = "Total Length (mm)",
       y = "Frequency") +
  # A constant intercept goes outside aes(); linewidth replaces the
  # deprecated size argument for lines (ggplot2 3.4.0 and later)
  geom_vline(xintercept = cisco_lm_mm, color = "red",
             linetype = "dotted", linewidth = 0.5) +
  facet_wrap(~ gearid) +
  theme_pub()
print(gear_multiples)
According to the book R for Data Science by Wickham and Grolemund, in the section on facets:
To facet your plot by a single variable, use facet_wrap(). The first argument of facet_wrap() should be a formula, which you create with ~ followed by a variable name (here “formula” is the name of a data structure in R, not a synonym for “equation”). The variable that you pass to facet_wrap() should be discrete.
In our case, the gearid variable is the input to the facet_wrap() function, which draws one panel of the length frequency distribution for each fishing gear in the data.
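The one-panel-per-value behaviour can be checked on a small example. The sketch below uses a made-up data frame with three hypothetical gear codes (not the real gearid values). It also shows vars(), which ggplot2 accepts as an alternative to the formula interface.

```r
library(ggplot2)

# Made-up data: three hypothetical gear codes with three fish each.
toy <- data.frame(
  length = c(120, 135, 150, 160, 175, 190, 205, 220, 240),
  gear   = rep(c("GEAR_A", "GEAR_B", "GEAR_C"), each = 3)
)

# Formula interface, as used in this tutorial.
p_formula <- ggplot(toy, aes(x = length)) +
  geom_histogram(binwidth = 20) +
  facet_wrap(~ gear)

# vars() interface, which produces the same panels.
p_vars <- ggplot(toy, aes(x = length)) +
  geom_histogram(binwidth = 20) +
  facet_wrap(vars(gear))

# The built plot holds one layout row per panel, so the number of panels
# can be compared with the number of distinct gear codes.
n_panels <- nrow(ggplot_build(p_formula)$layout$layout)
n_panels == length(unique(toy$gear))
```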
Length frequency distribution by year
Next, we will look at the length frequencies per year.
my_title_year <- expression(paste("Length frequency of Cisco (", italic("Coregonus artedi"), ") per year"))
year_multiples <- ggplot(data = cisco_data, aes(x = length)) +
  geom_histogram(binwidth = class_interval, color = "black", fill = "gray") +
  labs(title = my_title_year,
       subtitle = "The dotted red line represents the length at first maturity (171 mm)",
       x = "Total Length (mm)",
       y = "Frequency") +
  # A constant intercept goes outside aes(); linewidth replaces the
  # deprecated size argument for lines (ggplot2 3.4.0 and later)
  geom_vline(xintercept = cisco_lm_mm, color = "red",
             linetype = "dotted", linewidth = 0.5) +
  facet_wrap(~ year4, nrow = 5) +
  theme_pub()
print(year_multiples)
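The nrow argument fixes the number of rows in the panel layout, and facet_wrap() works out how many columns are needed to fit all panels. Here is a minimal sketch with made-up data covering ten hypothetical years. The names and values are assumptions for illustration only.

```r
library(ggplot2)

# Made-up data: ten hypothetical years with three fish each.
toy <- data.frame(
  length = rep(c(140, 170, 200), times = 10),
  year   = rep(2001:2010, each = 3)
)

p_rows <- ggplot(toy, aes(x = length)) +
  geom_histogram(binwidth = 20) +
  facet_wrap(~ year, nrow = 5)

# The layout table records the row and column of every panel.
panel_layout <- ggplot_build(p_rows)$layout$layout
grid_dims <- c(rows = max(panel_layout$ROW), cols = max(panel_layout$COL))

grid_dims
```

With ten panels and nrow = 5, the panels are arranged in five rows and two columns. The ncol argument works the same way if you would rather fix the number of columns.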
Length frequency distribution by sex
Finally, we will compare the length frequencies of this species by sex. This is a bit tricky because the sex column contains some NA values. In this tutorial, we will only consider the length frequencies of male and female Coregonus artedi, which are indicated by M and F, respectively, in the data.
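To do this, we will create a new object containing only the rows where sex is recorded as F or M:
cisco_sex <- cisco_data %>%
  filter(sex %in% c("F", "M"))
The filter() function is part of the dplyr package. As you can guess, this function filters the rows of a data frame. In this case, it keeps the rows whose sex column contains F or M, which represent female and male, respectively. The %in% operator tests whether each value on its left appears in the vector on its right, so rows with NA are dropped. Note that == (the “equal to” relational operator) is not suitable here: sex == c("F", "M") compares the column with the two-element vector position by position, recycling it, and would silently discard many valid rows.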
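When filtering on more than one value, it matters whether the comparison uses == or %in%. The sketch below uses a small made-up data frame (the names and values are assumptions for illustration, not the Cisco data) to show how many rows each approach keeps.

```r
library(dplyr)

# Made-up data: five fish with a recorded sex and one with NA.
toy <- data.frame(
  length = c(140, 146, 147, 153, 150, 152),
  sex    = c("F", "M", "M", "F", NA, "F")
)

# == recycles c("F", "M") along the column, so row 1 is compared with "F",
# row 2 with "M", row 3 with "F" again, and so on.
kept_equal <- toy %>% filter(sex == c("F", "M"))

# %in% tests every value for membership in the set, regardless of position.
kept_in <- toy %>% filter(sex %in% c("F", "M"))

nrow(kept_equal)  # 2: only the rows that happen to line up with the recycled vector
nrow(kept_in)     # 5: every row with F or M; the NA row is dropped
```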
We will also create a separate title for the graph. This will make it easier to identify and understand the data that is being represented.
my_title_sex <- expression(paste("Length frequency of Cisco (", italic("Coregonus artedi"), ") per sex"))
sex_multiples <- ggplot(data = cisco_sex, aes(x = length)) +
  geom_histogram(binwidth = class_interval, color = "black", fill = "gray") +
  labs(title = my_title_sex,
       subtitle = "The dotted red line represents the length at first maturity (171 mm)",
       x = "Total Length (mm)",
       y = "Frequency") +
  # A constant intercept goes outside aes(); linewidth replaces the
  # deprecated size argument for lines (ggplot2 3.4.0 and later)
  geom_vline(xintercept = cisco_lm_mm, color = "red",
             linetype = "dotted", linewidth = 0.5) +
  facet_wrap(~ sex) +
  theme_pub()
print(sex_multiples)
Conclusion
Congratulations! We have successfully created small multiples plots using the facet_wrap() function. As you can see, this is a very easy process, especially when faceting by a single variable.
One of the best things about this method is that it is very easy to reproduce the plot even if you update the original data. This is because the panels are generated from the values in the data frame rather than specified by hand: if a new gear or year appears in the data, rerunning the code adds the corresponding panel. As a result, you will not need to make manual adjustments to the panels when your data changes.
I hope you enjoyed this short tutorial on how to create small multiples plots using the facet_wrap() function in R.