library(data.table)
Small multiple charts for length frequency distribution
This tutorial continues from the previous one, where we created a histogram with annotations for length-at-first maturity, juveniles, mature, and mega-spawners. In this tutorial, we will learn how to display a subset of the data, also known as small multiples. This type of graph is useful for comparing data across groups, such as the length frequency distribution of a species by fishing gear.
Preliminaries
To begin, let us load the required packages.
We will use again the data of Coregonus artedii.
<- data.table::fread("ciscoTL.csv") cisco_data
Additionally, we will use the objects (variables) from the previous tutorial:
<- 1
class_interval <- 17.1 cisco_lm_cm
Plotting
Length frequency distribution by fishing gear
We will now display the length frequency distribution by gear ID. There are 9 fishing gears used to catch this species in our data.
# Get the unique fishing gears in the data
<- unique(cisco_data$gearid)
unique_gears
# Prepare histogram data for each Gear ID with specified breaks
<- lapply(unique_gears, function(gear) {
gear_hist_data hist(subset(cisco_data, gearid == gear)$length_cm, breaks = brks, plot = FALSE)
})
# Find the maximum frequency across all gears
<- max(sapply(gear_hist_data, function(hist_data) {
max_frequency max(hist_data$counts)
}))
# get the number of unique fishing gears
<- length(unique_gears)
num_gears
# Calculate layout dimensions
<- ceiling(sqrt(num_gears))
num_cols_gears <- ceiling(num_gears / num_cols_gears)
num_rows_gears
par(mfrow = c(num_rows_gears, num_cols_gears), mai = c(0.5, 0.5, 0.35, 0.5),
omi = c(0.25, 0.25, 0.5, 0.5), mgp = c(2.5, 0.5, 0))
# Plot histograms with dynamic y-axis limits
for (i in 1:length(unique_gears)) {
hist(subset(cisco_data, gearid == unique_gears[i])$length_cm,
breaks = brks,
freq = TRUE,
ann = FALSE,
ylim = c(0, max_frequency),
col = "lightblue",
border = "black")
mtext(unique_gears[i], side = 3, line = 0.25, font = 3, outer = FALSE)
}
mtext("Length (in cm)", side = 1, font = 2, adj = 0.5, outer = TRUE, xpd = TRUE)
mtext("Frequency", side = 2, font = 2, adj = 0.5, outer = TRUE, xpd = TRUE)
mtext(expression(paste("Length Frequency of Cisco (", italic("Coregonus artedi"), ") per Gear")),
side = 3, font = 2, adj = 0.5, cex = 1.2, outer = TRUE, xpd = TRUE)
Let’s break down and explain what each section of this code does:
- Identify unique gear IDs
<- unique(cisco_data$gearid) unique_gears
This code identifies the unique gear IDs present in the dataset cisco_data
. unique(cisco_data$gearid)
returns a vector of unique gear IDs, which is stored in the variable unique_gears
.
- Prepare histogram data for each gear ID
# Prepare histogram data for each Gear ID with specified breaks
<- lapply(unique_gears, function(gear) {
gear_hist_data hist(subset(cisco_data, gearid == gear)$length_cm, breaks = brks, plot = FALSE)
})
This section uses lapply()
to iterate over each unique gear ID, generating histogram data for the length of fish (length_cm
) caught by each gear. The hist()
function calculates the histogram without plotting it (plot = FALSE
), and the results are stored in gear_hist_data
. The histograms use predefined breaks (brks
).
- Find the maximum frequency across all gears
# Find the maximum frequency across all gears
<- max(sapply(gear_hist_data, function(hist_data) {
max_frequency max(hist_data$counts)
}))
This calculates the maximum frequency (the highest count in any bin of the histograms) across all gears. sapply()
is used to extract the maximum count from each histogram’s counts vector, and max()
finds the highest value among these maxima, stored in max_frequency
.
- Calculate layout dimensions
# Get the number of unique fishing gears
<- length(unique_gears)
num_gears
# Calculate layout dimensions
<- ceiling(sqrt(num_gears))
num_cols_gears <- ceiling(num_gears / num_cols_gears) num_rows_gears
This part calculates the layout dimensions for plotting multiple histograms. It determines the number of rows and columns needed to fit all histograms in a grid, aiming for a nearly square layout. num_cols_gears
and num_rows_gears
represent the number of columns and rows in the grid, respectively.
- Set up plotting area
par(mfrow = c(num_rows_gears, num_cols_gears), mai = c(0.5, 0.5, 0.35, 0.5),
omi = c(0.25, 0.25, 0.5, 0.5), mgp = c(2.5, 0.5, 0))
This configures the plotting parameters using par:
mfrow = c(num_rows_gears, num_cols_gears)
sets the layout to a grid with the specified number of rows and columns.mai
specifies the margins of individual plots (bottom, left, top, right), in inches.omi
sets the outer margins for the entire plotting area, in inches.mgp
adjusts the margin line settings for axis labels.
- Plot histograms
# Plot histograms with dynamic y-axis limits
for (i in 1:length(unique_gears)) {
hist(subset(cisco_data, gearid == unique_gears[i])$length_cm,
breaks = brks,
freq = TRUE,
ann = FALSE,
ylim = c(0, max_frequency),
col = "lightblue",
border = "black")
mtext(unique_gears[i], side = 3, line = 0.25, font = 3, outer = FALSE)
}
This loop iterates over each unique Gear ID, plotting a histogram for each one:
subset(cisco_data, gearid == unique_gears[i])$length_cm
extracts the length data for the current gear.breaks = brks
uses the predefined breaks.ylim = c(0, max_frequency)
sets the y-axis limit to the maximum frequency found earlier.col = "lightblue"
andborder = "black"
set the colors of the histogram bars and borders.mtext(unique_gears[i], side = 3, line = 0.25, font = 3, outer = FALSE)
adds the Gear ID as a title above each histogram.
- Add common axis labels and title
mtext("Length (in cm)", side = 1, font = 2, adj = 0.5, outer = TRUE, xpd = TRUE)
mtext("Frequency", side = 2, font = 2, adj = 0.5, outer = TRUE, xpd = TRUE)
mtext(expression(paste("Length Frequency of Cisco (", italic("Coregonus artedi"), ") per Gear")),
side = 3, font = 2, adj = 0.5, cex = 1.2, outer = TRUE, xpd = TRUE)
Length frequency distribution by year
To gain further insights, we will analyze the length frequencies of fish caught per year. Given the extensive dataset spanning 26 years, we will focus on a subset of five years for simplicity and clarity.
For this tutorial, we will create a new data frame focusing on the years 2002, 2003, 2004, 2005, and 2006.
<- cisco_data[year4 %in% c(2002, 2003, 2004, 2005, 2006)] cisco_data5
This code filters the original dataset to include only the data from the specified years, storing the result in cisco_data5
.
Here’s how to plot a histogram for each year. I won’t go into detail about each line of code, as the process is similar to what was explained earlier. The primary differences are the data source and variable names.
# Get the unique years in the data
<- unique(cisco_data5$year4)
unique_years
# Prepare histogram data for each year with specified breaks
<- lapply(unique_years, function(year) {
year_hist_data hist(subset(cisco_data5, year4 == year)$length_cm, breaks = brks, plot = FALSE)
})
# Find the maximum frequency across all years
<- max(sapply(year_hist_data, function(hist_data) {
max_frequency max(hist_data$counts)
}))
# get the number of unique years
<- length(unique_years)
num_years
# Calculate layout dimensions
<- ceiling(sqrt(num_years))
num_cols_years <- ceiling(num_years / num_cols_years)
num_rows_years
par(mfrow = c(num_rows_years, num_cols_years), mai = c(0.5, 0.5, 0.5, 0.2),
omi = c(0.25, 0.25, 0.5, 0.25), mgp = c(2.5, 0.5, 0))
for (i in 1:length(unique_years)) {
hist(subset(cisco_data, year4 == unique_years[i])$length_cm,
breaks = brks,
freq = TRUE,
ann = FALSE,
ylim = c(0, max_frequency),
col = "lightblue",
border = "black")
mtext(unique_years[i], side = 3, line = 0.25, font = 3, outer = FALSE)
}
mtext("Length (in cm)", side = 1, font = 2, adj = 0.5, outer = TRUE, xpd = TRUE)
mtext("Frequency", side = 2, font = 2, adj = 0.5, outer = TRUE, xpd = TRUE)
mtext(expression(paste("Length frequency of Cisco (", italic("Coregonus artedi"), ") per Year")),
side = 3, font = 2, adj = 0.5, cex = 1.2, outer = TRUE, xpd = TRUE)
Conclusion
Congratulations! You’ve successfully created a small multiples plot using base R graphics functions. This process is straightforward, particularly when subsetting a single variable.
One of the greatest advantages of this method is its reproducibility. Since the plot is generated from the data frame rather than individual values, you can easily update the original data without needing to make manual adjustments to the plot.
I hope you found this short tutorial enjoyable!