Small multiple charts for length frequency distribution

This tutorial continues from the previous one, where we created a histogram with annotations for length at first maturity, juveniles, mature fish, and mega-spawners. In this tutorial, we will learn how to use facets to split the data into subsets and draw one panel per subset, a layout known as small multiples. Small multiples are useful for comparing a distribution across groups, such as the length frequency distribution of a species by fishing gear.

From the official ggplot2 reference:

Facetting generates small multiples, each displaying a different subset of the data. Facets are an alternative to aesthetics for displaying additional discrete variables.

ggplot2 has two faceting functions: facet_wrap() and facet_grid(). In this tutorial, we will only use facet_wrap().

Preliminaries

To begin, let us load the required packages.

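library(ggplot2)  # plotting
library(dplyr)    # data manipulation, e.g. filter()
library(magrittr) # pipe operator %>%
library(ggthemes) # additional themes and theme elements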

We will again use the Cisco (Coregonus artedi) data from the previous tutorial.

head(cisco_data)
##       V1 lakeid year4 sampledate gearid spname length weight    sex
##    <int> <char> <int>     <char> <char> <char>  <int>  <num> <char>
## 1:     1     TR  1981  8/11/1981 VGN032  CISCO    140   21.4      F
## 2:     2     TR  1981  8/10/1981 VGN032  CISCO    146   22.3      F
## 3:     3     TR  1981  8/11/1981 VGN032  CISCO    147   23.3      F
## 4:     4     TR  1981  8/19/1981 VGN032  CISCO    153   23.5      F
## 5:     5     TR  1981  8/19/1981 VGN032  CISCO    150   24.0      F
## 6:     6     TR  1981  8/19/1981 VGN032  CISCO    152   24.0      F

We will also reuse the objects (variables) from the previous tutorial, as well as the custom theme, theme_pub(). I made one change to the theme: axis.text.x = element_text(size = 10, angle = 25) became axis.text.x = element_text(size = 10), so the x-axis labels are no longer rotated.
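theme_pub() is defined in the previous tutorial and is not reproduced here. If you are following along without it, the sketch below defines a hypothetical stand-in with the same name so the plotting code runs. Apart from axis.text.x = element_text(size = 10), every setting in it is an assumption, not the original theme.

```r
library(ggplot2)

# Hypothetical stand-in for theme_pub(): the real theme comes from the
# previous tutorial. Only the axis.text.x setting is taken from it; the
# remaining settings are assumptions chosen for readability.
theme_pub <- function(base_size = 12) {
  theme_bw(base_size = base_size) +
    theme(
      plot.title = element_text(face = "bold"),
      axis.text.x = element_text(size = 10),            # x-axis labels, not rotated
      panel.grid.minor = element_blank(),               # drop minor grid lines
      strip.background = element_rect(fill = "grey90") # facet label boxes
    )
}

# Quick check on a few made-up lengths (not the Cisco data)
demo <- data.frame(length = c(120, 150, 171, 190, 230))
p_demo <- ggplot(demo, aes(x = length)) +
  geom_histogram(binwidth = 20, color = "black", fill = "gray") +
  theme_pub()
```

Because the stand-in only adjusts a built-in theme, it can be dropped as soon as the real theme_pub() is available.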

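cisco_data_range <- max(cisco_data$length, na.rm = TRUE) - min(cisco_data$length, na.rm = TRUE)  # range of total length (mm)
class_size <- 20                                 # number of length classes (bins)
class_interval <- cisco_data_range / class_size  # bin width (mm), passed to binwidth
cisco_lm_mm <- 171                               # length at first maturity (mm)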
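Despite its name, class_size is the number of length classes, and class_interval is the resulting bin width that geom_histogram() receives through binwidth. A worked example with a few made-up lengths (not the Cisco data):

```r
# Made-up total lengths in mm, for illustration only
lengths_mm <- c(120, 300, 175, 210)

n_classes <- 20                                    # same role as class_size
length_range <- max(lengths_mm) - min(lengths_mm)  # 300 - 120 = 180 mm
bin_width <- length_range / n_classes              # 180 / 20 = 9 mm per class

bin_width
```

A wider range or fewer classes gives wider bins; the number of bars ggplot2 draws can differ slightly from n_classes because of how bin edges are placed.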

Plotting

Length frequency distribution by fishing gear

We will now display the length frequency distribution by gear ID. There are 9 fishing gears used to catch this species in our data.
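The number of gears, and therefore the number of panels facet_wrap() will draw, can be checked directly from the data. The sketch below uses dplyr's count() and n_distinct() on a small made-up data frame whose column names mirror cisco_data; the gear IDs A, B, and C are hypothetical. On the real data, use cisco_data instead of toy_catch.

```r
library(dplyr)

# Made-up catch records; only the column names mirror cisco_data
toy_catch <- data.frame(
  gearid = c("A", "A", "B", "C", "C", "C"),
  length = c(140, 146, 180, 152, 200, 210)
)

# Number of fish caught by each gear
gear_counts <- toy_catch %>% count(gearid)
gear_counts

# Number of distinct gears (one facet panel per gear)
n_gears <- n_distinct(toy_catch$gearid)
n_gears
```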

First, we will create an object that contains the title for our graph.

my_title_gear <- expression(paste("Length frequency of Cisco (", italic("Coregonus artedi"), ") per gear"))

gear_multiples <- ggplot(data = cisco_data, aes(x = length)) +
  # histogram using the bin width computed above
  geom_histogram(binwidth = class_interval, color = "black", fill = "gray") +
  labs(title = my_title_gear,
       subtitle = "The dotted red line represents the length at first maturity (171 mm)",
       x = "Total Length (mm)",
       y = "Frequency") +
  # a constant reference line is set outside aes(); linewidth replaces size in ggplot2 >= 3.4.0
  geom_vline(xintercept = cisco_lm_mm, color = "red",
             linetype = "dotted", linewidth = 0.5) +
  # one panel per fishing gear
  facet_wrap(~ gearid) +
  theme_pub()

print(gear_multiples)
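A note on geom_vline(): a constant such as cisco_lm_mm written inside aes() is treated as a mapping and recycled to every row of the data, so the same line is drawn once per fish. Passing it as an ordinary argument draws a single line in each panel. The check below uses made-up lengths and ggplot2's layer_data() to count the rows of the line layer in each version.

```r
library(ggplot2)

toy <- data.frame(length = c(140, 150, 160, 175, 190, 205))  # made-up lengths
lm_mm <- 171

# Constant mapped inside aes(): one vline row per data row
p_mapped <- ggplot(toy, aes(x = length)) +
  geom_histogram(binwidth = 10) +
  geom_vline(aes(xintercept = lm_mm))

# Constant set as a parameter: a single vline
p_param <- ggplot(toy, aes(x = length)) +
  geom_histogram(binwidth = 10) +
  geom_vline(xintercept = lm_mm)

n_mapped <- nrow(layer_data(p_mapped, 2))  # the vline is the second layer
n_param  <- nrow(layer_data(p_param, 2))
```

The plots look the same, but the mapped version stacks six identical lines, which adds rendering work on large data sets.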

In R for Data Science, Grolemund and Wickham describe facet_wrap() in the section on facets:

To facet your plot by a single variable, use facet_wrap(). The first argument of facet_wrap() should be a formula, which you create with ~ followed by a variable name (here “formula” is the name of a data structure in R, not a synonym for “equation”). The variable that you pass to facet_wrap() should be discrete.

In our case, gearid is passed to facet_wrap(), which splits the data by fishing gear and draws one length frequency histogram per gear.
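Since ggplot2 3.0.0, facet_wrap() also accepts the faceting variable wrapped in vars() instead of a formula. The sketch below, on made-up data with hypothetical gear IDs, builds the plot both ways and uses layer_data() to confirm that each version creates one panel per gear.

```r
library(ggplot2)

# Made-up records with three hypothetical gear IDs
toy <- data.frame(
  gearid = c("A", "A", "B", "B", "C", "C"),
  length = c(140, 160, 150, 175, 190, 210)
)

base <- ggplot(toy, aes(x = length)) + geom_histogram(binwidth = 10)

p_formula <- base + facet_wrap(~ gearid)      # formula interface
p_vars    <- base + facet_wrap(vars(gearid))  # vars() interface

# Count the panels each version creates
panels_formula <- length(unique(layer_data(p_formula)$PANEL))
panels_vars    <- length(unique(layer_data(p_vars)$PANEL))
```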

Length frequency distribution by year

Next, we will look at the length frequencies per year.

my_title_year <- expression(paste("Length frequency of Cisco (", italic("Coregonus artedi"), ") per year"))

year_multiples <- ggplot(data = cisco_data, aes(x = length)) +
  geom_histogram(binwidth = class_interval, color = "black", fill = "gray") +
  labs(title = my_title_year,
       subtitle = "The dotted red line represents the length at first maturity (171 mm)",
       x = "Total Length (mm)",
       y = "Frequency") +
  # constant reference line outside aes(); linewidth replaces size in ggplot2 >= 3.4.0
  geom_vline(xintercept = cisco_lm_mm, color = "red",
             linetype = "dotted", linewidth = 0.5) +
  # one panel per year, arranged in 5 rows
  facet_wrap(~ year4, nrow = 5) +
  theme_pub()

print(year_multiples)
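The nrow argument fixes the number of rows in the panel grid, and facet_wrap() derives the number of columns from the number of panels. ggplot2 exports this calculation as wrap_dims(); the panel count of 10 below is hypothetical, not the number of years in the data.

```r
library(ggplot2)

n_panels <- 10  # hypothetical number of years

# Grid chosen automatically when nrow and ncol are left unset
auto_grid <- wrap_dims(n_panels)

# Grid with 5 rows, as in facet_wrap(~ year4, nrow = 5)
five_rows <- wrap_dims(n_panels, nrow = 5)

auto_grid  # c(rows, columns)
five_rows
```

Fixing nrow (or ncol) is handy when many panels would otherwise produce a wide, cramped layout.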

Length frequency distribution by sex

Finally, we will compare the length frequencies of this species by sex. This is a bit tricky because the sex column contains some NA values. In this tutorial, we will only consider male and female Coregonus artedi, which are coded as M and F, respectively, in the data.
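Why the NA values matter: facet_wrap() treats NA as a level of its own and gives it a separate panel. A quick check on made-up data:

```r
library(ggplot2)

# Made-up records; one fish has no recorded sex
toy <- data.frame(
  sex = c("F", "F", "M", "M", NA),
  length = c(150, 172, 160, 185, 140)
)

p_all <- ggplot(toy, aes(x = length)) +
  geom_histogram(binwidth = 10) +
  facet_wrap(~ sex)

# Panels created: F, M, and NA
n_panels <- length(unique(layer_data(p_all)$PANEL))
```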

To keep only females and males, we create a new data frame:

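cisco_sex <- cisco_data %>% 
  filter(sex %in% c("F", "M"))  # keep females and males; rows with NA are dropped

The filter() function from the dplyr package keeps the rows of a data frame that meet a condition. Here, the condition is that the sex column contains F or M, which represent female and male, respectively. The %in% operator tests whether each value belongs to the set c("F", "M") and returns FALSE for NA, so rows with a missing sex are removed. Writing sex == c("F", "M") instead would be a mistake: the == (double equals sign) relational operator compares element by element and recycles c("F", "M"), so each row is checked against only one of the two codes and many valid rows are silently dropped.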
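The difference between == and %in% is easy to see on a short made-up vector of sex codes:

```r
library(dplyr)

# Made-up sex codes, including a missing value
toy <- data.frame(sex = c("F", "F", "M", "M", NA, "F"))

# == compares element-wise, recycling c("F", "M") as F, M, F, M, F, M
# gives TRUE FALSE FALSE TRUE NA FALSE
toy$sex == c("F", "M")

# %in% tests each value against the whole set
# gives TRUE TRUE TRUE TRUE FALSE TRUE
toy$sex %in% c("F", "M")

kept_eq <- nrow(filter(toy, sex == c("F", "M")))    # 2 rows: valid F/M rows lost
kept_in <- nrow(filter(toy, sex %in% c("F", "M")))  # 5 rows: only the NA is dropped
```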

We will also create a separate title for this graph so that it is clear which grouping the panels show.

my_title_sex <- expression(paste("Length frequency of Cisco (", italic("Coregonus artedi"), ") per sex"))

sex_multiples <- ggplot(data = cisco_sex, aes(x = length)) +
  geom_histogram(binwidth = class_interval, color = "black", fill = "gray") +
  labs(title = my_title_sex,
       subtitle = "The dotted red line represents the length at first maturity (171 mm)",
       x = "Total Length (mm)",
       y = "Frequency") +
  # constant reference line outside aes(); linewidth replaces size in ggplot2 >= 3.4.0
  geom_vline(xintercept = cisco_lm_mm, color = "red",
             linetype = "dotted", linewidth = 0.5) +
  # one panel each for F and M
  facet_wrap(~ sex) +
  theme_pub()

print(sex_multiples)

Conclusion

Congratulations! We have created small multiple plots using the facet_wrap() function. As you can see, the process is straightforward, especially when faceting by a single variable.

One advantage of this approach is that the plot is easy to reproduce when the data change. The panels are derived from the values of the faceting variable in the data frame rather than typed in by hand, so adding records, gears, or years and re-running the code updates the panels automatically. Only fixed settings, such as nrow or the length at first maturity written in the subtitle, may need adjusting.
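One way to see this is to wrap the plotting code in a function and call it again after new records arrive. The helper plot_by_gear() and the data below are hypothetical, not part of the original tutorial.

```r
library(ggplot2)

# Hypothetical helper: the same faceted histogram for any data frame
plot_by_gear <- function(df, binwidth = 10) {
  ggplot(df, aes(x = length)) +
    geom_histogram(binwidth = binwidth, color = "black", fill = "gray") +
    facet_wrap(~ gearid)
}

# Number of panels in a built plot
count_panels <- function(p) length(unique(layer_data(p)$PANEL))

# Made-up initial data with two gears
catch_v1 <- data.frame(gearid = c("A", "A", "B"), length = c(140, 160, 180))

# Updated data: new records from a third gear
catch_v2 <- rbind(catch_v1, data.frame(gearid = c("C", "C"), length = c(150, 200)))

panels_v1 <- count_panels(plot_by_gear(catch_v1))
panels_v2 <- count_panels(plot_by_gear(catch_v2))
```

The plotting code is unchanged between the two calls; the extra panel comes entirely from the new gearid value.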

I hope you enjoyed this short tutorial on creating small multiple plots with the facet_wrap() function in R.


See also