Here we define our Knit settings, to make the output more user friendly, and to cache output for faster knitting.
This is the R Markdown script, written in RStudio (2023.09.0+463 “Desert Sunflower” Release), used to summarise the systematic map database from Martin et al. 2024 “Evidence of the impacts of pharmaceuticals on aquatic animal behaviour (EIPAAB): a systematic map and open access database” (doi: 🚧to be added🚧).
It is designed to act as a starting point for anyone who wishes to use the ‘Evidence of the Impacts of Pharmaceuticals on Aquatic Animal Behaviour’ (EIPAAB) database for their own projects, and also as a description of how the summary presented in the manuscript was produced.
I recommend you read the README files before you work with the EIPAAB database. There are two, ‘README.md’ and ‘README.csv’; start with the ‘.md’ file (https://github.com/JakeMartinResearch/EIPAAB-database/blob/main/README.md).
P.S. Apologies for any spelling mistakes in the script; I (Jake Martin) am dyslexic and this is a very long document.
Jake M. Martin
📧 Email: jake.martin@deakin.edu.au
📧 Alt Email: jake.martin.research@gmail.com
🌐 Web: jakemartin.org
🐙 GitHub: JakeMartinResearch
If you are not familiar with R, here’s a beginners guide 📹
Here’s a link to download R 📥 and download R studio 📥
This is an R Markdown file, which makes annotating and running R code more user friendly; it is also easy to reproduce and share in a variety of formats (e.g. HTML). The R code is embedded within chunks, and the output of the code is shown under each chunk.
# This is a chunk
All other text outside of the chunks is annotation (like this). Hashtags used outside of chunks create headers and structure the file. Hashtags within the chunks are used for more precise annotation within the code.
If you are not familiar with R markdown, here’s a guide
These are the R packages required to run the script. I have listed them so that they can all be installed and loaded in one go with pacman::p_load(), which loads every package in the list and first installs any that are not already installed.
If you want to install and load each package separately, you can use install.packages() and require(); I have given an example below.
To run this code you will need to install pacman first
install.packages("pacman")
# install.packages('pacman')
pacman::p_load(tidyverse, ggraph, igraph, ggrepel, RColorBrewer, ggtree, treeio,
ape, gridExtra, ggdist, highcharter, pander, here, gt, readxl)
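If you would rather not depend on pacman, the install-then-load pattern described above can be sketched in base R. The helper name install_and_load is hypothetical (it is not part of the original script), and the demo only uses packages that ship with R so nothing is downloaded.

```r
# Hypothetical helper (an assumption, not from the original script):
# install a package only if it is missing, then attach it.
install_and_load <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
  invisible(pkg)
}

# 'stats' and 'utils' ship with R, so this demo never triggers an install
loaded <- vapply(c("stats", "utils"), install_and_load, character(1))
```

To use it for the script's packages you would pass their names instead, e.g. install_and_load("here").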
For ggtree and treeio you may need to run this code for installation
# if (!requireNamespace('BiocManager', quietly = TRUE))
#   install.packages('BiocManager')
# BiocManager::install('ggtree')
Creating our input and output directories. They will be made within the current parent directory (i.e. where the R script is saved)
This code creates a folder and saves the directory as figures_path. This is where we will export our figures
figures_path <- here("figures")
if (!dir.exists(figures_path)) {
dir.create(figures_path)
}
This code creates a folder and saves the directory as output_path. This is where we will export our data
output_path <- here("output-data")
if (!dir.exists(output_path)) {
dir.create(output_path)
}
input_path <- here("input-data")
if (!dir.exists(input_path)) {
dir.create(input_path)
}
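The folder-creation pattern used for the directories above can be wrapped in a small helper so that each path is checked against itself. This is only a sketch: ensure_dir is a hypothetical name, and the demo writes to a temporary folder rather than to the project directory.

```r
# Hypothetical helper (an assumption, not from the original script):
# create a directory if it does not exist and return its path.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  path
}

# Demo inside a temporary folder so nothing is written to the project
demo_root <- file.path(tempdir(), "eipaab-demo")
demo_paths <- vapply(
  file.path(demo_root, c("figures", "output-data", "input-data")),
  ensure_dir,
  character(1)
)
```

In the script itself the equivalent call would be something like input_path <- ensure_dir(here("input-data")).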
The ‘Evidence of the Impacts of Pharmaceuticals on Aquatic Animal Behaviour’ (EIPAAB) database has 111 columns and 1754 rows. The columns represent various forms of metadata extracted from articles that were included in Martin et al. 2024 “Evidence of the impacts of pharmaceuticals on aquatic animal behaviour: a systematic map and open access database” (doi: 🚧to be added🚧).
The 💿 READ-ME.csv file, which explains what each metadata column is, how it was extracted, what structure it has, and at what level it applies, is available on GitHub (https://github.com/JakeMartinResearch/EIPAAB-database/tree/main/input-data) or on the OSF (10.17605/OSF.IO/ATWY6). Below I have imported the READ-ME for accessibility. I highly recommend you read the READ-ME before conducting any of your own meta-analysis to make sure you have interpreted the data correctly.
More generally, column names that start with ‘validity’ are metadata relating to study validity, those that start with ‘species’ relate to species information (population), those that start with ‘compound’ relate to the chemical information (exposure), and those that start with ‘behav’ relate to behaviour information (outcome). The order of columns reflects both the level the metadata is extracted at (i.e. article level or species by compound level; see level in READ-ME), as well as the general category of metadata (i.e. validity, species, compound, behaviour).
Here are the first six rows of the READ-ME file as an example
setwd(input_path)
READ_ME <- read.csv("READ-ME.csv", na = "NA") # loading the READ-ME file
READ_ME %>%
head() %>%
gt()
column_name | type | structure | level | description | validity_assessment | CRED_criteria |
---|---|---|---|---|---|---|
article_id | string | free | article | A unique article id assigned during title and abstract screening | 0 | NA |
response_id | string | free | article | A unique response id generated by the survey used to extract the article | 0 | NA |
doi | string | free | article | The DOI of the article | 0 | NA |
title | string | free | article | The title of the atricle | 0 | NA |
year | interger | free | article | The year the article was publish | 0 | NA |
journal | string | free | article | The name of the journal in which the article was published | 0 | NA |
Importing the 💿 EIPAAB-database.csv database
It can be accessed from…
GitHub (https://github.com/JakeMartinResearch/EIPAAB-database/tree/main/input-data)
OSF (https://osf.io/atwy6/)
My personal webpage (https://jakemartin.org/eipaab-database/)
If the CSV files are in the same working directory (wd) as this R script, you will not need to use setwd(), but if the files are located elsewhere you will need to specify this in setwd(), and run all lines at once. In R Markdown the working directory changes back to the default after the chunk is run.
setwd(input_path)
EIPAAB_database <- read.csv("EIPAAB-database.csv", na = "NA")
The first thing we will look at is how many unique (distinct) articles there are in the database, and how many rows of data there are.
There are 901 articles
EIPAAB_database %>%
dplyr::distinct(article_id) %>% # Returns a list of distinct article_id
nrow(.) # Returns the length of the current file (which is the list of distinct article_id)
## [1] 901
There are 1739 rows of data
EIPAAB_database %>%
nrow(.) # Returns the length of the current file (which is the length of the whole datafile)
## [1] 1739
Each row represents a unique species by compound combination within a given article. This is identified by the column unique_row_id, which is a combination of the extractor’s response id, the species, and the compound. For example, in R_0Bqz2RQ4JxPfBkZ_Danio_rerio_Diazepam, response id = R_0Bqz2RQ4JxPfBkZ, species = Danio rerio, and compound = Diazepam.
EIPAAB_database %>%
dplyr::select(unique_row_id) %>% # selects just the unique_row_id column
dplyr::arrange(unique_row_id) %>% # arranges the column alphabetically so the same examples will be given every time
dplyr::slice(1:10) %>% # Returns only the first 10 rows
gt()
unique_row_id |
---|
R_0Bqz2RQ4JxPfBkZ_Danio_rerio_Diazepam |
R_0CHlDBs9ipt4suZ_Astyanax_mexicanus_Aripiprazole |
R_0Ck0AOjLDWukBUt_Procambarus_clarkii_Chlordiazepoxide |
R_0JvaI9dlvTbozUl_Daphnia_magna_Fluoxetine |
R_0JvaI9dlvTbozUl_Daphnia_magna_Sertraline |
R_0Srt7zn9MwHKne1_Danio_rerio_Escitalopram |
R_0p8ZEROmCGlSR7r_Oryzias_latipes_Fluoxetine |
R_10C0XxjAUoZmibO_Amphiprion_ocellaris_17-alpha-ethinylestradiol |
R_10GdzsXlrkwamUt_Daphnia_magna_Cisplatin |
R_10NOT0XWL5TXN5m_Coenagrion_hastulatum_Diphenhydramine |
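As a small illustration of how such an identifier is put together, the sketch below rebuilds the Danio rerio example from its three parts. It assumes that spaces in the species name are replaced with underscores, which matches the ids listed above but is not confirmed as the exact rule used during extraction.

```r
# Parts of the identifier, taken from the example in the text
response_id <- "R_0Bqz2RQ4JxPfBkZ"
species     <- "Danio rerio"
compound    <- "Diazepam"

# Assumption: spaces in the species name become underscores
unique_row_id <- paste(response_id, gsub(" ", "_", species), compound, sep = "_")
unique_row_id
```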
Now for the total number of treatments represented in the data; this is the total number of unique doses per species by compound combination.
In the map the number of treatments was only extracted for water-borne exposures; the NAs represent other exposure routes. The number of water-borne exposure treatments is 6152, and there are an additional 226 rows (species by compound combinations) that don’t have treatment numbers. We know they all have at least two treatments, a control and a compound of interest, because that is part of the inclusion criteria. So we could add the number of NAs * 2 to the total, which would give 6604 total treatment groups, although this would likely be an underestimate of the true total.
EIPAAB_database %>%
dplyr::summarise(
groups = sum(compound_treatment_levels, na.rm = TRUE), # Calculate the sum of 'compound_treatment_levels' while ignoring NA values
nas = sum(is.na(compound_treatment_levels)), # Count the number of NA values in 'compound_treatment_levels'
total = groups + (nas * 2) # Calculate the total by adding 'groups' to twice the number of NAs
) %>%
gt()
groups | nas | total |
---|---|---|
6152 | 226 | 6604 |
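The total in the table is simply groups + 2 × nas, i.e. 6152 + 2 × 226 = 6604. The same arithmetic on a made-up five-row data frame (the values are invented for illustration only) looks like this in base R:

```r
# Toy data: three rows with a known number of treatments, two rows with NA
toy <- data.frame(compound_treatment_levels = c(4, 3, NA, 5, NA))

groups <- sum(toy$compound_treatment_levels, na.rm = TRUE) # known treatments
nas    <- sum(is.na(toy$compound_treatment_levels))        # rows without a count
total  <- groups + nas * 2 # assume the minimum of two treatments per NA row
```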
Let’s look at how the evidence collected breaks down by the three study motivations
total_atricles <- EIPAAB_database %>%
dplyr::distinct(article_id) %>%
nrow()
# Analyze the study motivations in the dataset
EIPAAB_database %>%
dplyr::group_by(article_id) %>% # Group the data by 'article_id'
dplyr::sample_n(1) %>% # Randomly sample one row from each group (i.e., each unique 'article_id')
dplyr::ungroup() %>% # Ungroup the data to remove the previous grouping
dplyr::group_by(study_motivation) %>% # Group the data by 'study_motivation'
dplyr::reframe( # Create a summary data frame with the count and percentage of each study motivation
n = length(study_motivation), # Count the number of occurrences of each study motivation
`%` = round(n / total_atricles, 3) * 100 # Calculate the percentage of total articles
) %>%
dplyr::arrange(desc(n)) %>% # Arrange the resulting data frame in descending order of the count
gt()
study_motivation | n | % |
---|---|---|
Environmental | 510 | 56.6 |
Medical | 233 | 25.9 |
Basic research | 158 | 17.5 |
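The reason for keeping one row per article before counting is that an article with several species by compound rows would otherwise be counted several times. A minimal base R sketch on invented data, assuming (as in the summary above) that study_motivation is constant within an article:

```r
# Toy data: article A1 has two rows and A3 has three rows
toy <- data.frame(
  article_id = c("A1", "A1", "A2", "A3", "A3", "A3"),
  study_motivation = c("Environmental", "Environmental", "Medical",
                       "Environmental", "Environmental", "Environmental")
)

# Keep the first row of each article, then count motivations
one_per_article <- toy[!duplicated(toy$article_id), ]
n_articles <- nrow(one_per_article)
counts  <- table(one_per_article$study_motivation)
percent <- round(counts / n_articles, 3) * 100
```

Counting on the raw rows instead (table(toy$study_motivation)) would report five Environmental entries rather than two articles.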
Here we are changing the order of these in the database to “Environmental”, “Medical”, “Basic research” for plots.
EIPAAB_database <- EIPAAB_database %>%
dplyr::mutate(study_motivation = fct_relevel(study_motivation, "Environmental",
"Medical", "Basic research"))
Year range is 1974 to 2022, so 48 years worth of empirical research has contributed to this evidence base.
EIPAAB_database %>%
dplyr::reframe(min_year = min(year), max_year = max(year), total_years = max_year -
min_year) %>%
gt()
min_year | max_year | total_years |
---|---|---|
1974 | 2022 | 48 |
Now making a summary for the number of publications per year based on study motivation
# Create a complete sequence of years and all unique study motivations
all_years <- as.character(1974:2022)
all_study_motivations <- unique(EIPAAB_database$study_motivation)
# Create a data frame with all combinations of year and study motivation
all_combinations <- expand.grid(year = all_years, study_motivation = all_study_motivations,
stringsAsFactors = FALSE)
# Summarize the data
pub_year <- EIPAAB_database %>%
group_by(year, study_motivation) %>%
summarize(n = length(unique(article_id)), .groups = "drop") %>%
mutate(year = as.character(year))
# Join with the complete grid of years and study motivations
pub_year_complete <- all_combinations %>%
left_join(pub_year, by = c("year", "study_motivation")) %>%
mutate(n = if_else(is.na(n), 0, n), year = as.numeric(year))
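The point of joining onto a complete grid is that year by motivation combinations with no publications appear as explicit zeros rather than being absent. A small base R sketch of the same idea on invented counts (merge() stands in for left_join() here):

```r
# Toy counts: nothing was published in 2021, and no Medical article in 2022
observed <- data.frame(
  year = c(2020L, 2020L, 2022L),
  study_motivation = c("Environmental", "Medical", "Environmental"),
  n = c(3, 1, 2)
)

# Every combination of year and study motivation
grid <- expand.grid(
  year = 2020:2022,
  study_motivation = c("Environmental", "Medical"),
  stringsAsFactors = FALSE
)

# Left join, then turn the missing counts into zeros
complete <- merge(grid, observed, by = c("year", "study_motivation"), all.x = TRUE)
complete$n[is.na(complete$n)] <- 0
complete <- complete[order(complete$study_motivation, complete$year), ]
```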
Here’s a summary figure for the manuscript (MS).
# Define the colour palette
motivation_colour_theme <- c("#60BD6C", "#D359A1", "#3C82C4") # Making colour theme to apply to plot
# Create the plot
pub_year_fig <- pub_year_complete %>%
# Pool 1996 and all earlier years into a single group and reformat the year column
dplyr::mutate(
year = as.character(if_else(year < 1996, 1996, year)), # Fold years before 1996 into 1996
year = if_else(year == "1996", "<1997", year) # Label the pooled group (1996 and earlier) as "<1997"
) %>%
# Creating the plot
ggplot(aes(y = n, x = year, fill = study_motivation)) +
geom_bar(stat = "identity", width = 0.9) +
# Apply the custom colours
scale_fill_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() +
# Customizing the theme
theme(
legend.position = c(0.05, 1), # Positioning the legend in the top-left corner within the plot
legend.justification = c(0, 1), # Ensuring the legend box aligns properly at the top-left corner
axis.text.x = element_text(angle = 90, vjust = 0.5) # Rotating x-axis labels for better readability
) +
# Adding axis labels
labs(
x = "Year of publication",
y = "Number of articles"
)
# Display the plot
pub_year_fig
Saving the figure as a PDF. I have commented this out (with hashes) so I don’t re-write the file each time.
# setwd(figures_path)
# ggsave('pub_year_fig.png', plot = pub_year_fig, width = 10, height = 5, dpi = 300) # if you want to save a png
# ggsave('study_pub_year_fig.pdf', plot = pub_year_fig, width = 10, height = 5)
Making values for cumulative and relative growth in articles. This is the cumulative number of articles per year for each study motivation, as well as the relative growth based on 2007. We selected 2007 for a 15-year overview of growth.
pub_year_growth <- pub_year_complete %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(
n_cumulative = cumsum(n), # Calculate the cumulative sum of 'n'
n_cumulative_prop = n_cumulative / max(n_cumulative), # Calculate the cumulative proportion
n_2007 = ifelse(year == 2007, n, NA_real_), # Get n value for year 2007
n_2007 = first(na.omit(n_2007)), # Propagate the n_2007 value within the group
n_ratio_to_2007 = n / n_2007 # Calculate number of articles relative to that of 2007
) %>%
dplyr::ungroup() %>%
dplyr::select(study_motivation, year, n, n_cumulative, n_cumulative_prop, n_ratio_to_2007)
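To make the three derived columns concrete, here is a base R sketch on invented counts for two motivations over four years. It assumes rows are already ordered by year within each group, which the cumulative sum relies on.

```r
# Toy yearly counts (invented numbers)
toy <- data.frame(
  study_motivation = rep(c("Environmental", "Medical"), each = 4),
  year = rep(2006:2009, times = 2),
  n = c(1, 2, 4, 6, 2, 2, 3, 1)
)

# Cumulative number of articles within each motivation
toy$n_cumulative <- ave(toy$n, toy$study_motivation, FUN = cumsum)

# Cumulative count as a proportion of the group's final total
toy$n_cumulative_prop <- toy$n_cumulative /
  ave(toy$n_cumulative, toy$study_motivation, FUN = max)

# Baseline: each motivation's count in 2007, looked up by name
baseline <- toy$n[toy$year == 2007]
names(baseline) <- as.character(toy$study_motivation[toy$year == 2007])
toy$n_ratio_to_2007 <- unname(toy$n / baseline[as.character(toy$study_motivation)])
```

A ratio of 3 in 2009 therefore means three times as many articles as in the 2007 baseline year for that motivation.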
Making a plot for each motivation cumulative growth since 1974 (the first identified study)
cumulative_articles_fig <- pub_year_growth %>%
ggplot(aes(y = n_cumulative, x = year, colour = study_motivation)) +
geom_line(stat = "identity", linewidth = 1.5) +
geom_hline(yintercept = 0) +
scale_x_continuous(breaks = seq(1974, 2022, by = 1)) +
scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() +
# Customizing the theme
theme(
legend.position = c(0.05, 1), # Positioning the legend in the top-left corner within the plot
legend.justification = c(0, 1), # Ensuring the legend box aligns properly at the top-left corner
axis.text.x = element_text(angle = 90, vjust = 0.5) # Rotating x-axis labels for better readability
) +
# Adding axis labels
labs(
x = "Year of publication",
y = "Articles cumulative growth"
)
cumulative_articles_fig
Saving this plot
# setwd(figures_path)
# ggsave('study_cumulative_articles_fig.pdf', plot = cumulative_articles_fig, width = 10, height = 5)
Let’s look at relative growth compared to the research area more broadly
I will identify the most common research area based on each study motivation.
Environmental motivation = Environmental Sciences & Ecology
Medical motivation = Neurosciences & Neurology
Basic research = Neurosciences & Neurology
EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(study_motivation, wos_research_areas) %>%
dplyr::reframe(n = length(doi)) %>%
dplyr::mutate(wos_research_areas = str_trim(wos_research_areas)) %>%
tidyr::separate_rows(wos_research_areas, sep = ";") %>%
dplyr::mutate(wos_research_areas = str_trim(wos_research_areas)) %>%
dplyr::group_by(study_motivation, wos_research_areas) %>%
dplyr::reframe(n_total = sum(n)) %>%
dplyr::arrange(desc(n_total)) %>%
dplyr::arrange(study_motivation) %>%
dplyr::group_by(study_motivation) %>%
dplyr::slice(1:2) %>%
dplyr::ungroup() %>%
gt()
study_motivation | wos_research_areas | n_total |
---|---|---|
Environmental | Environmental Sciences & Ecology | 321 |
Environmental | Toxicology | 193 |
Medical | Neurosciences & Neurology | 106 |
Medical | Pharmacology & Pharmacy | 72 |
Basic research | Neurosciences & Neurology | 56 |
Basic research | Behavioral Sciences | 43 |
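Because one article can list several research areas in a single semicolon-separated string, the summary above first splits each string into one row per area and then counts. A base R sketch of that step on invented rows (strsplit() stands in for tidyr::separate_rows()):

```r
# Toy data: research areas stored as semicolon-separated strings
toy <- data.frame(
  study_motivation = c("Environmental", "Environmental", "Medical"),
  wos_research_areas = c(
    "Environmental Sciences & Ecology; Toxicology",
    "Toxicology",
    "Neurosciences & Neurology; Pharmacology & Pharmacy"
  ),
  stringsAsFactors = FALSE
)

# One row per (article, research area), with surrounding spaces trimmed
areas <- strsplit(toy$wos_research_areas, ";", fixed = TRUE)
long <- data.frame(
  study_motivation = rep(toy$study_motivation, lengths(areas)),
  wos_research_areas = trimws(unlist(areas)),
  stringsAsFactors = FALSE
)

# Count how often each area appears within each motivation
long$n <- 1
counts <- aggregate(n ~ study_motivation + wos_research_areas, data = long, FUN = sum)
```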
We will now compare the proportion cumulative growth of each study motivation against the most common research areas based on WoS.
I searched for articles published within these research areas from 1992 to 2022, and created a database to compare against.
Each search included only a date range (e.g. PY=(1992-2021)) AND the given Web of Science research area (e.g. WC=(Pharmacology & Pharmacy)). Searches were done on 04/07/2024. Only the total number of articles each year was taken.
First we will import the database I created of the annual number of articles per research field (martin-et-al-additional-file-10-wos-research-areas-1992-2022.xlsx). It is provided as supplementary file 9.
setwd(input_path)
wos_research_areas_n <- read_excel("martin-et-al-additional-file-10-wos-research-areas-1992-2022.xlsx", sheet = "sheet1") %>%
dplyr::arrange(year) %>%
dplyr::group_by(research_area) %>%
dplyr::mutate(
n_cumulative = cumsum(n), # Calculate the cumulative sum of 'n'
n_cumulative_prop = n_cumulative / max(n_cumulative), # Calculate the cumulative proportion
n_2007 = ifelse(year == 2007, n, NA_real_), # Get n value for year 2007
n_2007= first(na.omit(n_2007)), # Propagate the n_2007 value within the group
n_ratio_to_2007 = n / n_2007 # Calculate number of articles relative to that of 2007
) %>%
dplyr::ungroup() %>%
dplyr::select(research_area, year, n, n_cumulative, n_cumulative_prop, n_ratio_to_2007)
Combining the number of articles with those in the EIPAAB database
pub_year_growth_comp <- pub_year_growth %>%
dplyr::rename(research_area = study_motivation)
wos_research_areas_comp <- wos_research_areas_n %>%
rbind(., pub_year_growth_comp) %>%
dplyr::mutate(research_area = factor(research_area, levels = c("Environmental",
"Medical", "Basic research", "Environmental Sciences and Ecology", "Toxicology",
"Neurosciences and Neurology", "Pharmacology and Pharmacy", "All Research Areas")))
Let’s see what the relative growth was in 2022 (the latest included year in the evidence base)
wos_research_areas_comp %>%
dplyr::filter(year == 2022) %>%
gt()
research_area | year | n | n_cumulative | n_cumulative_prop | n_ratio_to_2007 |
---|---|---|---|---|---|
Pharmacology and Pharmacy | 2022 | 80433 | 1458833 | 1 | 1.765779 |
Neurosciences and Neurology | 2022 | 24448 | 450515 | 1 | 1.604094 |
Toxicology | 2022 | 16913 | 391858 | 1 | 1.409299 |
Environmental Sciences and Ecology | 2022 | 4599 | 103123 | 1 | 1.066064 |
All Research Areas | 2022 | 3618082 | 66839129 | 1 | 1.790462 |
Environmental | 2022 | 57 | 510 | 1 | 19.000000 |
Medical | 2022 | 30 | 233 | 1 | 10.000000 |
Basic research | 2022 | 9 | 158 | 1 | 2.250000 |
Let’s now compare the relative growth in Environmental research
# Define the colour palette
env_colour_theme <- c("#60BD6C", "#2E4B22", "black") # Making colour theme to apply to plot
enviro_comp <- c("Environmental", "Environmental Sciences and Ecology", "All Research Areas")
line_types <- c("Environmental" = "solid",
"Environmental Sciences and Ecology" = "dashed",
"All Research Areas" = "solid")
relative_growth_env_fig <- wos_research_areas_comp %>%
dplyr::filter(research_area %in% enviro_comp, year > 2006) %>%
ggplot(aes(y = n_ratio_to_2007, x = year, colour = research_area, linetype = research_area)) +
geom_line(stat = "identity", linewidth = 1.5) +
scale_x_continuous(breaks = seq(2007, 2022, by = 1)) +
scale_y_continuous(limits = c(0, 22), breaks = seq(0, 22, by = 4)) + # Scale Y axis from 0 to 22
scale_colour_manual(values = env_colour_theme, name = "Research Area") +
scale_linetype_manual(values = line_types) + # Apply manual line types
guides(linetype = "none") + # Remove legend for line types
theme_classic() +
# Customizing the theme
theme(
legend.position = c(0.05, 1), # Positioning the legend in the top-left corner within the plot
legend.justification = c(0, 1), # Ensuring the legend box aligns properly at the top-left corner
axis.text.x = element_text(angle = 90, vjust = 0.5) # Rotating x-axis labels for better readability
) +
# Adding axis labels
labs(
x = "Year of publication",
y = "Relative growth compared to 2007 (15 year baseline)"
)
relative_growth_env_fig
Save this figure
# setwd(figures_path)
# ggsave('study_relative_growth_env_fig.pdf', plot = relative_growth_env_fig, width = 5, height = 5)
Comparing the relative growth in medical research
# Define the colour palette
med_colour_theme <- c("#D359A1", "#D2137F", "black") # Making colour theme to apply to plot
med_comp <- c("Medical", "Neurosciences and Neurology", "All Research Areas")
line_types <- c("Medical" = "solid",
"Neurosciences and Neurology" = "dashed",
"All Research Areas" = "solid")
relative_growth_med_fig <- wos_research_areas_comp %>%
dplyr::filter(research_area %in% med_comp, year > 2006) %>%
ggplot(aes(y = n_ratio_to_2007, x = year, colour = research_area, linetype = research_area)) +
geom_line(stat = "identity", linewidth = 1.5) +
scale_x_continuous(breaks = seq(2007, 2022, by = 1)) +
scale_y_continuous(limits = c(0, 22), breaks = seq(0, 22, by = 4)) + # Scale Y axis from 0 to 22
scale_colour_manual(values = med_colour_theme, name = "Research Area") +
scale_linetype_manual(values = line_types) + # Apply manual line types
guides(linetype = "none") + # Remove legend for line types
theme_classic() +
# Customizing the theme
theme(
legend.position = c(0.05, 1), # Positioning the legend in the top-left corner within the plot
legend.justification = c(0, 1), # Ensuring the legend box aligns properly at the top-left corner
axis.text.x = element_text(angle = 90, vjust = 0.5) # Rotating x-axis labels for better readability
) +
# Adding axis labels
labs(
x = "Year of publication",
y = "Relative growth compared to 2007 (15 year baseline)"
)
relative_growth_med_fig
Save this figure
# setwd(figures_path)
# ggsave('study_relative_growth_med_fig.pdf', plot = relative_growth_med_fig, width = 5, height = 5)
Comparing relative growth in basic research
# Define the colour palette
basic_colour_theme <- c("#3C82C4", "#26276D", "black") # Making colour theme to apply to plot
basic_comp <- c("Basic research", "Neurosciences and Neurology", "All Research Areas")
line_types <- c("Basic research" = "solid",
"Neurosciences and Neurology" = "dashed",
"All Research Areas" = "solid")
relative_growth_base_fig <- wos_research_areas_comp %>%
dplyr::filter(research_area %in% basic_comp, year > 2006) %>%
ggplot(aes(y = n_ratio_to_2007, x = year, colour = research_area, linetype = research_area)) +
geom_line(stat = "identity", linewidth = 1.5) +
scale_x_continuous(breaks = seq(2007, 2022, by = 1)) +
scale_y_continuous(limits = c(0, 22), breaks = seq(0, 22, by = 4)) + # Scale Y axis from 0 to 22
scale_colour_manual(values = basic_colour_theme, name = "Research Area") +
scale_linetype_manual(values = line_types) + # Apply manual line types
guides(linetype = "none") + # Remove legend for line types
theme_classic() +
# Customizing the theme
theme(
legend.position = c(0.05, 1), # Positioning the legend in the top-left corner within the plot
legend.justification = c(0, 1), # Ensuring the legend box aligns properly at the top-left corner
axis.text.x = element_text(angle = 90, vjust = 0.5) # Rotating x-axis labels for better readability
) +
# Adding axis labels
labs(
x = "Year of publication",
y = "Relative growth compared to 2007 (15 year baseline)"
)
relative_growth_base_fig
Save this figure
# setwd(figures_path)
# ggsave('study_relative_growth_base_fig.pdf', plot = relative_growth_base_fig, width = 5, height = 5)
Looking at the link between the PEO (population, exposure, outcome) elements: what was the most common study design?
Below I have made a table that groups by these elements to show the most common study designs
It was 1 compound, 1 species and 1 behavioural class (41.5%)
behav_boolean <- c("behav_movement_boolean", "behav_boldness_boolean", "behav_foraging_boolean", "behav_antipredator_boolean", "behav_mating_boolean", "behav_post_mating_boolean", "behav_agression_boolean", "behav_sociality_boolean", "behav_cognition_boolean", "behav_noncat_boolean")
EIPAAB_database %>%
dplyr::mutate(behav_n = rowSums(across(all_of(behav_boolean)), na.rm = TRUE)) %>% # how many behav class measured
dplyr::group_by(article_id) %>%
dplyr::arrange(desc(behav_n)) %>%
dplyr::slice(1) %>%
dplyr::ungroup() %>%
dplyr::select(compound_n, species_n, behav_n) %>%
dplyr::group_by(compound_n, species_n, behav_n) %>%
dplyr::reframe(n= length(compound_n),
'%' = round(n/901*100,1)) %>% # 901 = number of distinct articles
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:10) %>% # Only the 10 most common combinations
gt()
compound_n | species_n | behav_n | n | % |
---|---|---|---|---|
1 | 1 | 1 | 374 | 41.5 |
1 | 1 | 2 | 160 | 17.8 |
2 | 1 | 1 | 78 | 8.7 |
1 | 1 | 3 | 63 | 7.0 |
3 | 1 | 1 | 45 | 5.0 |
2 | 1 | 2 | 34 | 3.8 |
4 | 1 | 1 | 24 | 2.7 |
3 | 1 | 2 | 15 | 1.7 |
1 | 2 | 1 | 13 | 1.4 |
5 | 1 | 1 | 9 | 1.0 |
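The behav_n column used above is a row-wise count of the behaviour flags, and one row per article is then kept (the one with the most behavioural classes). A base R sketch on invented data, assuming the ‘_boolean’ columns are coded 0/1 with NA for missing:

```r
# Toy data: three behaviour flags, two rows for article A1
toy <- data.frame(
  article_id = c("A1", "A1", "A2"),
  behav_movement_boolean = c(1, 1, 0),
  behav_boldness_boolean = c(0, 1, NA),
  behav_foraging_boolean = c(0, 1, 1)
)

# Number of behavioural classes measured in each row (NA counts as 0)
behav_cols <- grep("^behav_.*_boolean$", names(toy), value = TRUE)
toy$behav_n <- rowSums(toy[, behav_cols], na.rm = TRUE)

# Keep, for each article, the row with the largest behav_n
ordered_toy <- toy[order(toy$article_id, -toy$behav_n), ]
per_article <- ordered_toy[!duplicated(ordered_toy$article_id), ]
```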
A summary data frame that has the number of PEO elements in each of the 901 studies
PEO_element_summary <- EIPAAB_database %>%
dplyr::mutate(behav_n = rowSums(across(all_of(behav_boolean)), na.rm = TRUE)) %>%
dplyr::group_by(article_id) %>%
dplyr::arrange(desc(behav_n)) %>%
dplyr::slice(1) %>%
dplyr::ungroup() %>%
dplyr::select(compound_n, species_n, behav_n)
Looking at the number of species used
PEO_element_summary %>%
dplyr::group_by(species_n) %>%
dplyr::reframe(n = length(species_n), `%` = n/901) %>%
gt()
species_n | n | % |
---|---|---|
1 | 873 | 0.968923418 |
2 | 25 | 0.027746948 |
3 | 1 | 0.001109878 |
4 | 1 | 0.001109878 |
5 | 1 | 0.001109878 |
Looking at the number of compounds used
PEO_element_summary %>%
dplyr::group_by(compound_n) %>%
dplyr::reframe(n = length(compound_n), `%` = n/901) %>%
dplyr::slice(1:10) %>%
gt()
compound_n | n | % |
---|---|---|
1 | 624 | 0.692563818 |
2 | 127 | 0.140954495 |
3 | 67 | 0.074361820 |
4 | 32 | 0.035516093 |
5 | 16 | 0.017758047 |
6 | 8 | 0.008879023 |
7 | 6 | 0.006659267 |
8 | 5 | 0.005549390 |
9 | 1 | 0.001109878 |
10 | 2 | 0.002219756 |
The number of different behaviours
PEO_element_summary %>%
dplyr::group_by(behav_n) %>%
dplyr::reframe(n = length(species_n), `%` = n/901) %>%
gt()
behav_n | n | % |
---|---|---|
1 | 583 | 0.647058824 |
2 | 227 | 0.251942286 |
3 | 78 | 0.086570477 |
4 | 10 | 0.011098779 |
5 | 2 | 0.002219756 |
7 | 1 | 0.001109878 |
Let’s take a closer look at species information
There are 173 species in the EIPAAB database
EIPAAB_database %>%
dplyr::distinct(species_name) %>%
nrow()
## [1] 173
The number of species (spp) for each study motivation
EIPAAB_database %>%
dplyr::group_by(study_motivation) %>%
dplyr::reframe(n_spp = length(unique(species_name)), total_study = length(unique(article_id)),
rel_n = n_spp/total_study) %>%
gt()
study_motivation | n_spp | total_study | rel_n |
---|---|---|---|
Environmental | 143 | 510 | 0.2803922 |
Medical | 25 | 233 | 0.1072961 |
Basic research | 43 | 158 | 0.2721519 |
There are 21 classes
EIPAAB_database %>%
dplyr::distinct(species_class) %>%
nrow()
## [1] 21
There are 935 different groups of animals used across all 901 studies (i.e. some studies had more than one species)
EIPAAB_database %>%
dplyr::distinct(unique_population_id) %>%
nrow(.)
## [1] 935
Let’s make a Cladogram to get an overview of what taxa are in the database
spp_taxonomy <- EIPAAB_database %>%
dplyr::group_by(species_name) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::filter(!is.na(species_family), !str_detect(species_species, "spp."), species_kingdom !=
"Chromista") %>%
dplyr::select("species_kingdom", "species_phylum", "species_class", "species_order",
"species_family", "species_genus", "species_species") %>%
dplyr::mutate(species_species = paste0(substr(species_genus, 1, 1), ". ", sub("^[^ ]+ ",
"", species_species)))
# The mutate changes the spp name to abbreviate the Genus (e.g. Aeshna cyanea
# to A. cyanea)
Create a hierarchical structure for the plot
taxonomy <- spp_taxonomy[, c("species_kingdom", "species_phylum", "species_class",
"species_order", "species_family", "species_genus", "species_species")]
taxonomy[] <- lapply(taxonomy, factor)
# Create a phylogenetic tree
phylo_tree <- as.phylo.formula(~species_kingdom/species_phylum/species_class/species_order/species_family/species_genus/species_species,
data = taxonomy)
Manually creating a cladogram from phylo_tree (with equal branch lengths)
ggtree_obj <- ggtree(phylo_tree, branch.length='none', layout='circular')
# Extract the class information for coloring
#taxonomy$label <- paste0(substr(spp_taxonomy$species_genus, 1, 1), ". ", spp_taxonomy$species_species)
class_info <- taxonomy$species_class[match(phylo_tree$tip.label, taxonomy$species_species)]
# Add the class information to the ggtree object
ggtree_obj <- ggtree_obj %<+% data.frame(label = phylo_tree$tip.label, class = class_info)
# Create a color vector for the class levels
class_colors <- rainbow(length(unique(class_info)))
names(class_colors) <- unique(class_info)
# Plot the cladogram with colored branches
spp_cladogram <- ggtree_obj +
geom_tiplab(size=3) + #If you want to add species names
geom_tree(aes(color=class)) +
scale_color_manual(values = class_colors) +
theme(legend.position = "right")
spp_cladogram
Save the figure
# setwd(figures_path)
# ggsave('spp_cladogram.pdf', plot = spp_cladogram, width = 8, height = 5)
Let’s group by class to see the major taxonomic Classes used
First removing cases where species_species was “spp.” replacing with NA for taxonomic classification
EIPAAB_database <- EIPAAB_database %>%
dplyr::mutate(species_species = if_else(species_species == "spp.", NA, species_species))
Let’s look at the major Class
# Total number of spp
n_spp <- EIPAAB_database %>%
dplyr::distinct(species_name) %>%
nrow()
# Number of spp per Class and per phylum
spp_classes <- EIPAAB_database %>%
dplyr::group_by(species_class, species_phylum) %>%
dplyr::reframe(count_class = length(unique(species_name)), percent_class = round(count_class/n_spp *
100, 1)) %>%
dplyr::group_by(species_phylum) %>%
dplyr::mutate(count_phylum = sum(count_class), percent_phylum = round(count_phylum/n_spp *
100, 1)) %>%
dplyr::ungroup() %>%
dplyr::arrange(desc(percent_class))
spp_classes %>%
dplyr::slice(1:10) %>%
gt()
species_class | species_phylum | count_class | percent_class | count_phylum | percent_phylum |
---|---|---|---|---|---|
Actinopterygii | Chordata | 71 | 41.0 | 87 | 50.3 |
Malacostraca | Arthropoda | 21 | 12.1 | 42 | 24.3 |
Gastropoda | Mollusca | 19 | 11.0 | 28 | 16.2 |
Amphibia | Chordata | 12 | 6.9 | 87 | 50.3 |
Branchiopoda | Arthropoda | 10 | 5.8 | 42 | 24.3 |
Bivalvia | Mollusca | 8 | 4.6 | 28 | 16.2 |
Insecta | Arthropoda | 8 | 4.6 | 42 | 24.3 |
Rhabditophora | Platyhelminthes | 5 | 2.9 | 6 | 3.5 |
Reptilia | Chordata | 3 | 1.7 | 87 | 50.3 |
Copepoda | Arthropoda | 2 | 1.2 | 42 | 24.3 |
Making a figure for the 15 most diverse Classes, in terms of the number of species in the EIPAAB database
class_n_spp_fig <- spp_classes %>%
dplyr::arrange(desc(percent_class)) %>% # arrange the dataset
dplyr::slice(1:15) %>% # Take only the most diverse 15 Class
dplyr::mutate(species_class = fct_reorder(species_class, percent_class), # Order by diversity
species_phylum = fct_reorder(species_phylum, desc(percent_phylum))) %>% # Order by diversity
ggplot(aes(x=species_class, y=count_class, color = species_phylum)) +
geom_segment(aes(x=species_class, xend=species_class, y=0, yend=count_class)) +
geom_point(size=4) +
geom_text(aes(label = count_class),
hjust=-1.2,
size=3.5,
color="black") +
scale_colour_brewer(palette= "Dark2") +
coord_flip() +
ylim(0, 75) +
theme_classic() +
labs(
x = "",
y = "Number of distinct species in the database"
) +
theme(legend.position = c(0., 0.05), # Positioning the legend inside the plot
legend.justification = c(-3, 0), # Bottom left inside the plot
legend.box.just = "right",
legend.background = element_rect(fill=alpha('white', 0.5))
)
class_n_spp_fig
Save the figure
# setwd(figures_path)
# ggsave('spp_class_n_spp_fig.pdf', plot = class_n_spp_fig, width = 5, height = 5)
Here’s a look at the % at the phylum level
# In the class summary we also included percent_phylum
spp_classes %>%
dplyr::group_by(species_phylum) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::select(-species_class, -count_class, -percent_class) %>%
dplyr::arrange(desc(percent_phylum)) %>%
gt()
species_phylum | count_phylum | percent_phylum |
---|---|---|
Chordata | 87 | 50.3 |
Arthropoda | 42 | 24.3 |
Mollusca | 28 | 16.2 |
Platyhelminthes | 6 | 3.5 |
Annelida | 3 | 1.7 |
Echinodermata | 3 | 1.7 |
Cnidaria | 2 | 1.2 |
Rotifera | 2 | 1.2 |
A ring chart at the phylum level
ring_plot_df <- spp_classes %>%
dplyr::slice(1:15) %>%
dplyr::group_by(species_phylum) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::select(species_phylum, count_phylum) %>%
dplyr::mutate(percent_phylum = count_phylum/sum(count_phylum)) %>%
dplyr::arrange(desc(percent_phylum))
# Compute the cumulative percentages (top of each rectangle)
ring_plot_df$ymax = cumsum(ring_plot_df$percent_phylum)
# Compute the bottom of each rectangle
ring_plot_df$ymin = c(0, head(ring_plot_df$ymax, n = -1))
# Compute label position
ring_plot_df$labelPosition <- (ring_plot_df$ymax + ring_plot_df$ymin)/2
# Compute a good label
ring_plot_df$label <- paste0(ring_plot_df$species_phylum, "\n (n = ", ring_plot_df$count_phylum,
")")
phylum_ring_fig <- ring_plot_df %>%
dplyr::mutate(species_phylum = fct_reorder(species_phylum, desc(percent_phylum))) %>%
ggplot(aes(ymax = ymax, ymin = ymin, xmax = 4, xmin = 3, fill = species_phylum)) +
geom_rect() + coord_polar(theta = "y") + geom_label(x = 5, aes(y = labelPosition,
label = label), size = 3, alpha = 0.8) + scale_fill_brewer(palette = "Dark2") +
xlim(c(2, 5)) + theme_void() + theme(legend.position = "none")
phylum_ring_fig
Save this plot
# setwd(figures_path) ggsave('spp_phylum_ring_fig.pdf', plot = phylum_ring_fig,
# width = 5, height = 5)
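The segment positions for the ring chart can be checked by hand on a small example. This is a minimal sketch using only the three largest phyla from the table above; `ring_toy` is a made-up name for illustration and is not part of the script. Each segment's top edge (`ymax`) is the running total of the shares, its bottom edge (`ymin`) is the previous segment's top edge, and the label sits half way between the two.

```r
# Toy phylum counts (three largest phyla from the phylum table)
ring_toy <- data.frame(
  species_phylum = c("Chordata", "Arthropoda", "Mollusca"),
  count_phylum = c(87, 42, 28)
)

# Share of each phylum among the rows kept for the plot
ring_toy$percent_phylum <- ring_toy$count_phylum / sum(ring_toy$count_phylum)

# Top edge of each segment is the running total of the shares
ring_toy$ymax <- cumsum(ring_toy$percent_phylum)

# Bottom edge is the previous segment's top edge, starting at 0
ring_toy$ymin <- c(0, head(ring_toy$ymax, n = -1))

# Labels sit at the midpoint of each segment
ring_toy$labelPosition <- (ring_toy$ymax + ring_toy$ymin) / 2

ring_toy
```

Because the shares are recomputed from the rows kept for the plot, the last `ymax` is always 1 and the ring closes, even if some phyla were dropped beforehand.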
Now we will look at how many times each phylum, class, order, family, genus, species appear in the database
First we will make a dataset that counts the number of species within each phylum, class, order, family, and genus.
# Step 1: Separate and pivot the data
lineage_data <- EIPAAB_database %>%
pivot_longer(cols = c("species_phylum", "species_class", "species_order", "species_family",
"species_genus", "species_species"), names_to = "lineage_level", values_to = "classification") %>%
dplyr::mutate(lineage_level = str_remove(lineage_level, "species_"))
# Define the order
lineage_levels_order <- c("phylum", "class", "order", "family", "genus", "species")
# Step 2: Create parent-child relationships
lineage_data <- lineage_data %>%
group_by(unique_row_id) %>%
mutate(parent = case_when(lineage_level == "phylum" ~ "Animalia", lineage_level ==
"class" ~ lag(classification, 1), lineage_level == "order" ~ lag(classification,
1), lineage_level == "family" ~ lag(classification, 1), lineage_level ==
"genus" ~ lag(classification, 1), lineage_level == "species" ~ lag(classification,
1), )) %>%
ungroup()
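To see what the parent-child step does, here is a minimal sketch on a two-row toy table. `toy_db` and `toy_lineage` are made-up names, the toy table only has three of the six ranks, and `if_else()` stands in for the longer `case_when()` used in the script. After `pivot_longer()` the ranks of one row sit in column order, so within each `unique_row_id` the previous value is the parent.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical two-row stand-in for EIPAAB_database (three ranks only)
toy_db <- tibble(
  unique_row_id = c("r1", "r2"),
  species_phylum = c("Chordata", "Arthropoda"),
  species_class = c("Actinopterygii", "Branchiopoda"),
  species_order = c("Cypriniformes", "Diplostraca")
)

toy_lineage <- toy_db %>%
  pivot_longer(cols = starts_with("species_"),
               names_to = "lineage_level", values_to = "classification") %>%
  mutate(lineage_level = str_remove(lineage_level, "species_")) %>%
  group_by(unique_row_id) %>%
  # Phylum gets a fixed root; every other rank takes the previous rank as parent
  mutate(parent = if_else(lineage_level == "phylum", "Animalia", lag(classification))) %>%
  ungroup()

toy_lineage
```

The grouping matters: without `group_by(unique_row_id)` the `lag()` would only be safe because the phylum rows are overwritten with "Animalia", so grouping keeps the intent explicit.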
Here we count how many times each taxonomic classification appears in the database (one count per row), and express this as a percentage of all rows
n_rows <- EIPAAB_database %>%
nrow()
lineage_count_use <- lineage_data %>%
dplyr::group_by(classification, lineage_level, parent) %>%
dplyr::reframe(classification_count = length(unique_row_id), classification_percent = round(classification_count/n_rows *
100, 1))
Making a plot to look at the 15 most commonly used classes, but you can do this at any of the taxonomic levels
class_use_fig <- lineage_count_use %>%
dplyr::filter(lineage_level == "class") %>%
dplyr::arrange(desc(classification_percent)) %>%
dplyr::slice(1:15) %>%
dplyr::mutate(classification = fct_reorder(classification, classification_percent),
parent = fct_reorder(parent, desc(classification_percent))) %>%
ggplot(aes(x=classification, y=classification_percent, color = parent)) +
geom_segment(aes(x=classification, xend=classification, y=0, yend=classification_percent)) +
geom_point(size=4) +
geom_text(aes(label = paste0(round(classification_percent,2), "%")),
hjust=-0.5,
size=3.5,
color="black") +
scale_colour_brewer(palette= "Dark2") +
coord_flip() +
ylim(0, 100) +
theme_classic() +
labs(
x = "",
y = "Percentage representation in the database"
) +
theme(legend.position = c(0., 0.05), # Positioning the legend inside the plot
legend.justification = c(-3, 0), # Bottom left inside the plot
legend.box.just = "right",
legend.background = element_rect(fill=alpha('white', 0.5))
)
class_use_fig
Save plot
# setwd(figures_path) ggsave('spp_class_use_fig.pdf', plot = class_use_fig,
# width = 5, height = 5)
Table of occurrence, and percentage occurrence
lineage_count_use %>%
dplyr::filter(lineage_level == "class") %>%
dplyr::group_by(classification) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::arrange(desc(classification_percent)) %>%
gt()
classification | lineage_level | parent | classification_count | classification_percent |
---|---|---|---|---|
Actinopterygii | class | Chordata | 1311 | 75.4 |
Branchiopoda | class | Arthropoda | 130 | 7.5 |
Gastropoda | class | Mollusca | 69 | 4.0 |
Malacostraca | class | Arthropoda | 61 | 3.5 |
Amphibia | class | Chordata | 44 | 2.5 |
Rhabditophora | class | Platyhelminthes | 29 | 1.7 |
Bivalvia | class | Mollusca | 25 | 1.4 |
Hydrozoa | class | Cnidaria | 22 | 1.3 |
Insecta | class | Arthropoda | 17 | 1.0 |
Cephalopoda | class | Mollusca | 7 | 0.4 |
Reptilia | class | Chordata | 6 | 0.3 |
Polychaeta | class | Annelida | 4 | 0.2 |
Trematoda | class | Platyhelminthes | 3 | 0.2 |
Asteroidea | class | Echinodermata | 1 | 0.1 |
Clitellata | class | Annelida | 1 | 0.1 |
Copepoda | class | Arthropoda | 2 | 0.1 |
Echinoidea | class | Echinodermata | 1 | 0.1 |
Holothuroidea | class | Echinodermata | 1 | 0.1 |
Mammalia | class | Chordata | 2 | 0.1 |
Monogononta | class | Rotifera | 2 | 0.1 |
Thecostraca | class | Arthropoda | 1 | 0.1 |
Here’s a breakdown by phylum
ring_use_plot_df <- lineage_count_use %>%
dplyr::filter(lineage_level == "phylum") %>%
dplyr::arrange(desc(classification_percent)) %>%
dplyr::mutate(classification_percent = round(classification_percent, 2)) %>%
dplyr::slice(1:8)
# Compute the cumulative percentages (top of each rectangle)
ring_use_plot_df$ymax = cumsum(ring_use_plot_df$classification_percent)
# Compute the bottom of each rectangle
ring_use_plot_df$ymin = c(0, head(ring_use_plot_df$ymax, n = -1))
# Compute label position
ring_use_plot_df$labelPosition <- (ring_use_plot_df$ymax + ring_use_plot_df$ymin)/2
# Compute a good label
ring_use_plot_df$label <- paste0(ring_use_plot_df$classification, "\n (", ring_use_plot_df$classification_percent,
"%)")
phylum_use_ring_fig <- ring_use_plot_df %>%
dplyr::mutate(classification = fct_reorder(classification, desc(classification_percent))) %>%
ggplot(aes(ymax = ymax, ymin = ymin, xmax = 4, xmin = 3, fill = classification)) +
geom_rect() + coord_polar(theta = "y") + geom_label(x = 5, aes(y = labelPosition,
label = label), size = 3, alpha = 0.8) + scale_fill_brewer(palette = "Dark2") +
xlim(c(2, 5)) + theme_void()
# theme(legend.position = 'none')
phylum_use_ring_fig
Save the figure
# setwd(figures_path) ggsave('spp_phylum_use_ring_fig.pdf', plot =
# phylum_use_ring_fig, width = 5, height = 5)
Making a data set that looks at relative representation by each motivation, and the total
lineage_count_use_motivation <- lineage_data %>%
dplyr::group_by(study_motivation, lineage_level, parent, classification) %>%
dplyr::reframe(classification_count = length(unique_row_id)) %>%
dplyr::group_by(study_motivation, lineage_level) %>%
dplyr::mutate(n_motivation = sum(classification_count)) %>%
dplyr::ungroup() %>%
dplyr::mutate(classification_percent = (classification_count/n_motivation) *
100)
lineage_count_use_all <- lineage_data %>%
dplyr::group_by(lineage_level, parent, classification) %>%
dplyr::reframe(classification_count = length(unique_row_id)) %>%
dplyr::group_by(lineage_level) %>%
dplyr::mutate(n_motivation = sum(classification_count)) %>%
dplyr::ungroup() %>%
dplyr::mutate(classification_percent = (classification_count/n_motivation) *
100) %>%
dplyr::mutate(study_motivation = "All")
lineage_count_use_motivation <- lineage_count_use_motivation %>%
rbind(., lineage_count_use_all) %>%
dplyr::mutate(study_motivation = fct_relevel(study_motivation, "All", "Environmental",
"Medical", "Basic research"))
Here’s a tile plot to compare the use of different taxa across the study motivations
class_order_df <- lineage_count_use %>%
dplyr::filter(lineage_level == "class") %>%
dplyr::arrange(classification_percent) %>%
dplyr::mutate(class_order = 1:nrow(.)) %>%
dplyr::select(classification, class_order)
class_use_motivation_fig <- lineage_count_use_motivation %>%
dplyr::filter(lineage_level == "class") %>%
dplyr::full_join(., class_order_df, by = "classification") %>%
dplyr::mutate(classification = fct_reorder(classification, class_order), parent = fct_reorder(parent,
desc(class_order))) %>%
ggplot(aes(x = study_motivation, y = classification, fill = classification_percent)) +
geom_tile() + geom_text(aes(label = paste0(round(classification_percent, 1),
"%"))) + scale_fill_gradient(name = expression("Relative\nabundance (%)"), low = "#FFFFFF",
high = "#231F20") + theme_classic() + labs(x = "Study motivation", y = "Taxonomic class")
class_use_motivation_fig
Save the figure
# setwd(figures_path) ggsave('spp_class_use_motivation_fig.pdf', plot =
# class_use_motivation_fig, width = 5, height = 5)
Let’s see what the most common species were. This is calculated at the population level using unique_population_id (i.e. a species is not counted multiple times when multiple compounds were used in a single article)
n_total <- EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>%
dplyr::reframe(n = length(unique_population_id)) %>%
nrow(.)
n_spp <- EIPAAB_database %>%
dplyr::group_by(unique_population_id, species_ncbi_taxonomy_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(species_name, species_ncbi_taxonomy_id) %>%
dplyr::reframe(n = length(species_name), percent = round(n/n_total * 100, 1)) %>%
dplyr::arrange(desc(n)) %>%
dplyr::mutate(spp_number = 1:nrow(.))
n_spp %>%
dplyr::slice(1:15) %>%
gt()
species_name | species_ncbi_taxonomy_id | n | percent | spp_number |
---|---|---|---|---|
Danio rerio | NCBI:txid7955 | 412 | 44.1 | 1 |
Daphnia magna | NCBI:txid35525 | 54 | 5.8 | 2 |
Pimephales promelas | NCBI:txid90988 | 32 | 3.4 | 3 |
Betta splendens | NCBI:txid158456 | 26 | 2.8 | 4 |
Poecilia reticulata | NCBI:txid8081 | 26 | 2.8 | 5 |
Gambusia holbrooki | NCBI:txid37273 | 18 | 1.9 | 6 |
Carassius auratus | NCBI:txid7957 | 16 | 1.7 | 7 |
Oryzias latipes | NCBI:txid8090 | 16 | 1.7 | 8 |
Gasterosteus aculeatus | NCBI:txid69293 | 14 | 1.5 | 9 |
Oncorhynchus mykiss | NCBI:txid8022 | 12 | 1.3 | 10 |
Perca fluviatilis | NCBI:txid8168 | 11 | 1.2 | 11 |
Xenopus laevis | NCBI:txid8355 | 10 | 1.1 | 12 |
Salmo trutta | NCBI:txid8032 | 8 | 0.9 | 13 |
Salmo salar | NCBI:txid8030 | 7 | 0.7 | 14 |
Sepia officinalis | NCBI:txid6610 | 7 | 0.7 | 15 |
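The `group_by()` plus `sample_n(1)` pattern above keeps one arbitrary row per population. Here is a minimal sketch of a deterministic alternative using `distinct()`, which should give the same counts because only the grouping columns are used afterwards. `toy_db`, `toy_pop`, `toy_counts` and the `compound_name` column are made up for illustration and are not taken from the database.

```r
library(dplyr)

# Hypothetical stand-in: population p1 has two rows because two compounds were tested
toy_db <- tibble(
  unique_population_id = c("p1", "p1", "p2", "p3"),
  species_name = c("Danio rerio", "Danio rerio", "Danio rerio", "Daphnia magna"),
  compound_name = c("A", "B", "A", "A")
)

# Keep the first row of each population (no random sampling involved)
toy_pop <- toy_db %>%
  distinct(unique_population_id, .keep_all = TRUE)

# Total number of populations, used as the denominator
n_total_toy <- nrow(toy_pop)

# Number and percentage of populations per species
toy_counts <- toy_pop %>%
  count(species_name, name = "n") %>%
  mutate(percent = round(n / n_total_toy * 100, 1)) %>%
  arrange(desc(n))

toy_counts
```

In the toy data *Danio rerio* appears in three rows but only two populations, so it is counted twice, not three times.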
Making a broad category of abundance (article_n_group) to make the summary and figure more digestible
n_spp_fig_data <- n_spp %>%
dplyr::mutate(species_ncbi_taxonomy_id = fct_reorder(species_ncbi_taxonomy_id,
desc(n))) %>%
dplyr::mutate(article_n_group = case_when(n == 1 ~ "One only", n >= 2 & n <=
5 ~ "Between 2 and 5", n >= 6 & n <= 10 ~ "Between 6 and 10", n > 10 ~ "Greater than 10",
TRUE ~ "Others"))
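`case_when()` returns the label of the first condition that is TRUE, so the order of the conditions decides what happens at the boundaries (5 and 10 here). Below is a minimal sketch on a few made-up counts, written with non-overlapping conditions so the result does not depend on that order. `toy_n` and `toy_bins` are illustrative names only.

```r
library(dplyr)

# Made-up article counts that include the boundary values 5, 6, 10 and 11
toy_n <- tibble(n = c(1, 2, 5, 6, 10, 11, 412))

toy_bins <- toy_n %>%
  mutate(article_n_group = case_when(
    n == 1 ~ "One only",
    n >= 2 & n <= 5 ~ "Between 2 and 5",
    n >= 6 & n <= 10 ~ "Between 6 and 10",
    n > 10 ~ "Greater than 10",
    TRUE ~ "Others" # fallback, e.g. for missing counts
  ))

toy_bins
```

A count of 5 lands in "Between 2 and 5" and a count of 10 lands in "Between 6 and 10", which matches the category names.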
This is the number of species in each category
article_n_group_summary <- n_spp_fig_data %>%
dplyr::group_by(article_n_group) %>%
dplyr::reframe(n_cat = length(species_name))
article_n_group_summary %>%
gt()
article_n_group | n_cat |
---|---|
Between 2 and 5 | 53 |
Between 6 and 10 | 6 |
Greater than 10 | 11 |
One only | 103 |
Making a plot to show the distribution of species use in the EIPAAB database
n_spp_fig <- n_spp_fig_data %>%
dplyr::mutate(article_n_group = fct_relevel(article_n_group, "Greater than 10", "Between 6 and 10", "Between 2 and 5", "One only")) %>%
ggplot(aes(y = n, x = spp_number, color = article_n_group)) +
geom_line(linewidth = 1, alpha = 0.2) +
geom_point(stat = "identity", size = 1, alpha = 0.8) +
scale_color_manual(
values = c(
"One only" = "#E94039",
"Between 2 and 5" = "#F18E76",
"Between 6 and 10" = "#877FBC",
"Greater than 10" = "#4D479D"
),
labels = c(
"One only" = "One only (n = 103)",
"Between 2 and 5" = "Between 2 and 5 (n = 53)",
"Between 6 and 10" = "Between 6 and 10 (n = 6)",
"Greater than 10" = "Greater than 10 (n = 11)"
)
) + # Set colours for each category
theme_classic() +
theme(
legend.position = c(-0.3, 0.4), # Positioning the legend inside the plot
legend.justification = c(-3, 0), # Bottom left inside the plot
legend.box.just = "right"
) +
labs(
x = paste0("Species (1-", nrow(n_spp_fig_data), ")"),
y = "Number of articles",
color = "Article number category"
)
n_spp_fig
Save the figure
# setwd(figures_path) ggsave('spp_n_spp_fig.pdf', plot = n_spp_fig, width = 5,
# height = 5)
Making a list of the 15 most common species
n_spp_used <- EIPAAB_database %>%
dplyr::distinct(unique_population_id) %>%
nrow(.)
top_15_spp_list <- EIPAAB_database %>%
dplyr::group_by(species_name) %>%
dplyr::summarise(n = length(unique(unique_population_id)), .groups = "drop") %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:15) %>%
dplyr::pull(species_name) %>%
as.list()
Making a summary dataframe based on study motivation
common_species <- EIPAAB_database %>%
dplyr::filter(species_name %in% top_15_spp_list) %>%
dplyr::group_by(species_name, study_motivation) %>%
dplyr::summarise(n = length(unique(article_id)), .groups = "drop") %>%
tidyr::complete(species_name, study_motivation, fill = list(n = 0))
species_order <- common_species %>%
group_by(species_name) %>%
summarise(total_n = sum(n), .groups = "drop") %>%
arrange(desc(total_n)) %>%
ungroup()
common_species <- common_species %>%
inner_join(species_order, by = "species_name") %>%
mutate(species_name = fct_reorder(species_name, total_n), study_motivation = fct_relevel(study_motivation,
"Environmental", "Medical", "Basic research"))
A plot of the number of times each of the 15 overall most common species appeared in articles within the EIPAAB database, by study motivation. It’s a little hard to see in the chunk output, so try viewing it in an external window.
top_15_spp_fig <- common_species %>%
ggplot(aes(x = species_name, y = n, colour = study_motivation, fill = study_motivation,
group = study_motivation)) + geom_col(position = position_dodge(width = 0.8),
width = 0.1) + geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = n), hjust = -0.6, size = 3.5, color = "black", position = position_dodge(width = 0.8)) +
scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
scale_fill_manual(values = motivation_colour_theme, name = "Study motivation") +
coord_flip() + theme_classic() + labs(x = "", y = "Number of studies") + theme()
top_15_spp_fig
Save the figure
# setwd(figures_path) ggsave('spp_top_15_spp_fig.pdf', plot = top_15_spp_fig,
# width = 5, height = 10)
Summarising the top 10 in each motivation more specifically
spp_moitivation_summary <- EIPAAB_database %>%
dplyr::group_by(species_name, study_motivation) %>%
dplyr::summarise(n = length(unique(unique_population_id)), .groups = "drop") %>%
tidyr::complete(species_name, study_motivation, fill = list(n = 0)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(prop = n/total) %>%
dplyr::select(-total)
This plot shows the top 10 in each motivation
top_10_env_spp <- spp_moitivation_summary %>%
dplyr::filter(study_motivation == "Environmental") %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:10) %>%
dplyr::mutate(species_name = fct_reorder(species_name, n)) %>%
ggplot(aes(x = species_name, y = n)) + geom_col(width = 0.01, colour = "#60BD6C",
fill = "#60BD6C") + geom_point(size = 2, colour = "#60BD6C", fill = "#60BD6C") +
geom_text(aes(label = n), hjust = -0.6, size = 3.5, color = "black") + coord_flip() +
theme_classic() + labs(x = "", y = "Number of studies", title = "Environmental") +
theme(plot.title = element_text(size = 12))
top_10_med_spp <- spp_moitivation_summary %>%
dplyr::filter(study_motivation == "Medical") %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:10) %>%
dplyr::mutate(species_name = fct_reorder(species_name, n)) %>%
ggplot(aes(x = species_name, y = n)) + geom_col(width = 0.01, colour = "#D359A1",
fill = "#D359A1") + geom_point(size = 2, colour = "#D359A1", fill = "#D359A1") +
geom_text(aes(label = n), hjust = -0.6, size = 3.5, color = "black") + coord_flip() +
theme_classic() + labs(x = "", y = "Number of studies", title = "Medical") +
theme(plot.title = element_text(size = 12))
top_10_base_spp <- spp_moitivation_summary %>%
dplyr::filter(study_motivation == "Basic research") %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:10) %>%
dplyr::mutate(species_name = fct_reorder(species_name, n)) %>%
ggplot(aes(x = species_name, y = n)) + geom_col(width = 0.01, colour = "#3C82C4",
fill = "#3C82C4") + geom_point(size = 2, colour = "#3C82C4", fill = "#3C82C4") +
geom_text(aes(label = n), hjust = -0.6, size = 3.5, color = "black") + coord_flip() +
theme_classic() + labs(x = "", y = "Number of studies", title = "Basic") + theme(plot.title = element_text(size = 12))
Here’s a plot to compare the top 10 spp in each study motivation more specifically
top_10_combind_plot <- grid.arrange(top_10_env_spp, top_10_med_spp, top_10_base_spp,
ncol = 3)
Looking at the number of distinct species used in each motivation group
EIPAAB_database %>%
dplyr::group_by(study_motivation) %>%
dplyr::reframe(n_motivation = n_distinct(article_id), n_distinct_spp = n_distinct(species_name)) %>%
gt()
study_motivation | n_motivation | n_distinct_spp |
---|---|---|
Environmental | 510 | 143 |
Medical | 233 | 25 |
Basic research | 158 | 43 |
First checking how many species have IUCN data, which we will use to assess habitat differences
species_iucn_summary <- EIPAAB_database %>%
dplyr::group_by(species_name) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::mutate(species_iucn_bin = if_else(is.na(species_iucn_doi), "No", "Yes")) %>%
dplyr::group_by(species_iucn_bin) %>%
dplyr::reframe(n = length(species_iucn_bin))
species_iucn_summary %>%
gt()
species_iucn_bin | n |
---|---|
No | 67 |
Yes | 106 |
Summarising the IUCN habitat type. Some species have multiple habitats, so we split the string by the separator (;)
habitat_summary <- EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>% # sampling one row per article per species (i.e. ignoring multiple rows per article for compounds)
dplyr::ungroup() %>%
dplyr::group_by(species_iucn_habitat) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(species_iucn_habitat = str_trim(species_iucn_habitat)) %>%
tidyr::separate_rows(species_iucn_habitat, sep = ";") %>% # each spp can have multiple habitats, so the string needs splitting
dplyr::group_by(species_iucn_habitat) %>%
dplyr::reframe(n_articles = sum(n)) %>% # now a sum for each habitat
arrange(desc(n_articles))
habitat_summary %>%
gt()
species_iucn_habitat | n_articles |
---|---|
Wetlands inland | 738 |
Artificial or Aquatic and Marine | 187 |
NA | 164 |
Marine Neritic | 99 |
Marine Coastal or Supratidal | 44 |
Marine Intertidal | 27 |
Forest | 23 |
Grassland | 22 |
Artificial or Terrestrial | 21 |
Shrubland | 18 |
Savanna | 13 |
Marine Oceanic | 10 |
Unknown | 2 |
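Here is a minimal sketch of the split-and-sum step on made-up data, to show why the habitat counts can add up to more than the number of populations. `toy_hab` and `toy_hab_summary` are illustrative names, and trimming after the split (instead of before) is my own defensive variation in case a separator is followed by a space.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical stand-in: one row per population, habitats separated by ";"
toy_hab <- tibble(
  unique_population_id = c("p1", "p2", "p3"),
  species_iucn_habitat = c("Wetlands inland; Marine Neritic", "Wetlands inland", NA)
)

toy_hab_summary <- toy_hab %>%
  count(species_iucn_habitat, name = "n") %>%              # populations per habitat string
  separate_rows(species_iucn_habitat, sep = ";") %>%       # one row per single habitat
  mutate(species_iucn_habitat = str_trim(species_iucn_habitat)) %>% # drop stray spaces
  group_by(species_iucn_habitat) %>%
  summarise(n_articles = sum(n), .groups = "drop") %>%     # sum across habitat strings
  arrange(desc(n_articles))

toy_hab_summary
```

Population p1 is counted under both of its habitats, so the column total is larger than the three populations that went in.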
Checking how many freshwater vs marine species there are. The ‘Wetlands inland’ category covers freshwater bodies, whereas marine habitats span multiple categories (Marine Neritic, Marine Coastal or Supratidal, Marine Intertidal, Marine Oceanic).
habitat_summary %>%
dplyr::filter(str_starts(species_iucn_habitat, "Marine") | species_iucn_habitat == "Wetlands inland") %>% # Only habitats of interest
dplyr::mutate(aquatic_type = if_else(species_iucn_habitat == "Wetlands inland", "Freshwater", "Marine")) %>% # New category
dplyr::group_by(aquatic_type) %>%
dplyr::reframe(n_articles = sum(n_articles)) %>% # Final sums
dplyr::ungroup() %>%
dplyr::mutate(n_total = sum(n_articles),
percent = round(n_articles/n_total*100,1)) %>%
gt()
aquatic_type | n_articles | n_total | percent |
---|---|---|---|
Freshwater | 738 | 918 | 80.4 |
Marine | 180 | 918 | 19.6 |
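As a check on the freshwater vs marine split, here is a minimal sketch that reuses a few counts from the habitat table. `toy_habitat_summary` and `toy_aquatic` are illustrative names. Note that `str_starts()` only matches habitats whose name begins with "Marine", so "Artificial or Aquatic and Marine" is not counted as marine.

```r
library(dplyr)
library(stringr)

# Habitat counts copied from the habitat table (a subset is enough here)
toy_habitat_summary <- tibble(
  species_iucn_habitat = c("Wetlands inland", "Artificial or Aquatic and Marine",
                           "Marine Neritic", "Marine Coastal or Supratidal",
                           "Marine Intertidal", "Marine Oceanic", "Forest"),
  n_articles = c(738, 187, 99, 44, 27, 10, 23)
)

toy_aquatic <- toy_habitat_summary %>%
  # Keep freshwater plus every habitat whose name starts with "Marine"
  filter(str_starts(species_iucn_habitat, "Marine") |
           species_iucn_habitat == "Wetlands inland") %>%
  mutate(aquatic_type = if_else(species_iucn_habitat == "Wetlands inland",
                                "Freshwater", "Marine")) %>%
  group_by(aquatic_type) %>%
  summarise(n_articles = sum(n_articles), .groups = "drop") %>%
  mutate(percent = round(n_articles / sum(n_articles) * 100, 1))

toy_aquatic
```

The four marine categories sum to 180 (99 + 44 + 27 + 10), which matches the table above.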
Let’s break this up by study motivation
habitat_summary_all <- EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>% # sampling one row per article per species (i.e. ignoring multiple rows per article for compounds)
dplyr::ungroup() %>%
dplyr::group_by(study_motivation, species_iucn_habitat) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(species_iucn_habitat = str_trim(species_iucn_habitat)) %>%
tidyr::separate_rows(species_iucn_habitat, sep = ";") %>% # each spp can have multiple habitats, so the string needs splitting
dplyr::group_by(species_iucn_habitat, study_motivation) %>%
dplyr::reframe(n_articles = sum(n)) %>% # now a sum for each habitat
arrange(desc(n_articles))
habitat_summary_all %>%
gt()
species_iucn_habitat | study_motivation | n_articles |
---|---|---|
Wetlands inland | Environmental | 384 |
Wetlands inland | Medical | 223 |
Artificial or Aquatic and Marine | Environmental | 135 |
Wetlands inland | Basic research | 131 |
NA | Environmental | 131 |
Marine Neritic | Environmental | 80 |
Artificial or Aquatic and Marine | Basic research | 41 |
Marine Coastal or Supratidal | Environmental | 33 |
NA | Basic research | 22 |
Marine Intertidal | Environmental | 19 |
Artificial or Terrestrial | Environmental | 15 |
Marine Neritic | Basic research | 15 |
Grassland | Environmental | 13 |
Shrubland | Environmental | 13 |
Forest | Environmental | 12 |
Artificial or Aquatic and Marine | Medical | 11 |
NA | Medical | 11 |
Savanna | Environmental | 8 |
Forest | Basic research | 7 |
Marine Coastal or Supratidal | Basic research | 7 |
Marine Intertidal | Basic research | 7 |
Grassland | Basic research | 5 |
Artificial or Terrestrial | Basic research | 4 |
Forest | Medical | 4 |
Grassland | Medical | 4 |
Marine Coastal or Supratidal | Medical | 4 |
Marine Neritic | Medical | 4 |
Marine Oceanic | Environmental | 4 |
Marine Oceanic | Medical | 4 |
Savanna | Basic research | 3 |
Shrubland | Basic research | 3 |
Artificial or Terrestrial | Medical | 2 |
Marine Oceanic | Basic research | 2 |
Savanna | Medical | 2 |
Shrubland | Medical | 2 |
Unknown | Environmental | 2 |
Marine Intertidal | Medical | 1 |
Selecting only habitats of interest and allocating to Freshwater or Marine
freshwater_marine <- habitat_summary_all %>%
dplyr::filter(str_starts(species_iucn_habitat, "Marine") | species_iucn_habitat == "Wetlands inland") %>% # Only habitats of interest
dplyr::mutate(aquatic_type = if_else(species_iucn_habitat == "Wetlands inland", "Freshwater", "Marine")) %>% # New category
dplyr::group_by(aquatic_type, study_motivation) %>%
dplyr::reframe(n_articles = sum(n_articles)) %>% # Final sums
dplyr::group_by(study_motivation) %>%
dplyr::mutate(n_cat = sum(n_articles)) %>%
dplyr::ungroup() %>%
dplyr::mutate(prop = n_articles/n_cat) # A proportion of those identify as freshwater vs marine
freshwater_marine %>%
gt()
aquatic_type | study_motivation | n_articles | n_cat | prop |
---|---|---|---|---|
Freshwater | Environmental | 384 | 520 | 0.73846154 |
Freshwater | Medical | 223 | 236 | 0.94491525 |
Freshwater | Basic research | 131 | 162 | 0.80864198 |
Marine | Environmental | 136 | 520 | 0.26153846 |
Marine | Medical | 13 | 236 | 0.05508475 |
Marine | Basic research | 31 | 162 | 0.19135802 |
Here’s a figure version of this summary
aquatic_type_order <- c("Freshwater", "Marine")
# Calculate cumulative positions for text labels
freshwater_marine <- freshwater_marine %>%
dplyr::mutate(aquatic_type = factor(aquatic_type, levels = aquatic_type_order)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::arrange(desc(aquatic_type)) %>%
dplyr::mutate(cumulative_prop = cumsum(prop) - prop/2)
# Define the colour theme for the two habitat types
color_theme <- c("#7FAB91", "#2A4A64")
# Create the plot
habitat_fig <- freshwater_marine %>%
mutate(study_motivation = fct_relevel(study_motivation, "Basic research", "Medical",
"Environmental")) %>%
ggplot(aes(y = prop, x = study_motivation, fill = aquatic_type, group = aquatic_type)) +
geom_bar(stat = "identity", width = 0.9) + geom_text(aes(label = round(prop,
2), y = cumulative_prop), color = "white", size = 3) + scale_fill_manual(values = color_theme,
name = "Habitat") + theme_classic() + theme(legend.position = "right") + labs(x = "Study motivation",
y = "Proportion of all species assigned to freshwater or marine habitat") + coord_flip()
habitat_fig
Save the figure
# setwd(figures_path) ggsave('spp_habitat_fig.pdf', plot = habitat_fig, width =
# 10, height = 5)
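The label positions in the stacked bar are the midpoints of each bar segment. Here is a minimal sketch for the Environmental bar only, using the counts from the table above. `toy_props` and `toy_labels` are illustrative names. The rows are sorted in descending factor order because ggplot2 stacks the last factor level at the bottom of the bar, so the running total has to start with that level.

```r
library(dplyr)

# Environmental counts from the freshwater vs marine table (384 and 136 of 520)
toy_props <- tibble(
  study_motivation = "Environmental",
  aquatic_type = factor(c("Freshwater", "Marine"), levels = c("Freshwater", "Marine")),
  prop = c(384, 136) / 520
)

toy_labels <- toy_props %>%
  group_by(study_motivation) %>%
  arrange(desc(aquatic_type)) %>%                       # Marine first (bottom of the stack)
  mutate(cumulative_prop = cumsum(prop) - prop / 2) %>% # top edge minus half the segment
  ungroup()

toy_labels
```

If I am not mistaken, passing `position = position_stack(vjust = 0.5)` to `geom_text()` should give the same centred labels without computing the positions by hand, but I have not used that in this script.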
We should also consider how many records did not include IUCN reports, and thus have no habitat information.
Let’s make a plot that shows how many didn’t have an assigned IUCN habitat, to add to the above figure.
no_habitat <- EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>% # sampling one row per article per species (i.e. ignoring multiple rows per article for compounds)
dplyr::ungroup() %>%
dplyr::mutate(species_iucn_bin = if_else(is.na(species_iucn_doi), "No", "Yes")) %>%
dplyr::group_by(species_iucn_bin) %>%
dplyr::reframe(n = length(species_iucn_bin))
n_total <- EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>% # sampling one row per article per species (i.e. ignoring multiple rows per article for compounds)
dplyr::ungroup() %>%
nrow(.)
no_habitat <- no_habitat %>%
dplyr::mutate(prop = n/n_total)
no_habitat %>%
gt()
species_iucn_bin | n | prop |
---|---|---|
No | 163 | 0.1743316 |
Yes | 772 | 0.8256684 |
Here’s the plot
# Define the black and grey color theme
black_and_grey <- c("#BCBEC0", "#414042")
yes_order <- c("Yes", "No")
# Calculate cumulative positions for text labels
habitat_info_df_fig <- no_habitat %>%
dplyr::mutate(species_iucn_bin = factor(species_iucn_bin, levels = yes_order)) %>%
dplyr::arrange(desc(species_iucn_bin)) %>%
dplyr::mutate(cumulative_prop = cumsum(prop) - prop / 2)
habitat_info_fig <- habitat_info_df_fig %>%
dplyr::mutate(species_iucn_bin = factor(species_iucn_bin, levels = yes_order)) %>%
ggplot(aes(y = prop, x = 1, fill = species_iucn_bin)) +
geom_bar(stat = "identity", width = 0.1) +
geom_text(aes(label = round(prop, 2), y = cumulative_prop), color = "white") +
scale_fill_manual(values = black_and_grey, name = "Habitat") +
theme_classic() +
theme(legend.position = "right",
axis.text.x = element_blank(), # Remove x-axis text
axis.ticks.x = element_blank() # Remove x-axis ticks
) +
labs(
x = "",
y = "Proportion of species tested in the database"
)
habitat_info_fig
Save the plot
# setwd(figures_path) ggsave('spp_habitat_info_fig.pdf', plot =
# habitat_info_fig, width = 5, height = 10)
Let’s get an overall summary first, without Unknown or not specified life stages
EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>% # sampling one row per article per species (i.e. ignoring multiple rows per article for compounds)
dplyr::ungroup() %>%
dplyr::group_by(species_stage) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(species_stage = str_trim(species_stage)) %>%
tidyr::separate_rows(species_stage, sep = ";") %>% # a population can include multiple life stages, so the string needs splitting
dplyr::filter(species_stage != "Unknown or not specified") %>%
dplyr::group_by(species_stage) %>%
dplyr::reframe(n_articles = sum(n)) %>% # now a sum for each life stage
dplyr::mutate(total_stages = sum(n_articles)) %>%
dplyr::ungroup() %>%
dplyr::mutate(overall_percent = round(n_articles/total_stages*100,1)) %>%
gt()
species_stage | n_articles | total_stages | overall_percent |
---|---|---|---|
Adult | 443 | 831 | 53.3 |
Egg or embryo | 46 | 831 | 5.5 |
Juvenile | 123 | 831 | 14.8 |
Larvae | 219 | 831 | 26.4 |
Let’s take a look at the species life stages used in the EIPAAB database by study motivation
First those that didn’t report it
stage_summary_all <- EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>% # sampling one row per article per species (i.e. ignoring multiple rows per article for compounds)
dplyr::ungroup() %>%
dplyr::group_by(study_motivation, species_stage) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(species_stage = str_trim(species_stage)) %>%
tidyr::separate_rows(species_stage, sep = ";") %>% # a population can include multiple life stages, so the string needs splitting
dplyr::group_by(species_stage, study_motivation) %>%
dplyr::reframe(n_total = sum(n)) %>% # now a sum for each life stage
dplyr::group_by(study_motivation) %>%
dplyr::mutate(n_motivation = sum(n_total),
percent = round(n_total/n_motivation*100,1)) %>%
dplyr::ungroup()
stage_summary_all %>%
dplyr::filter(species_stage == "Unknown or not specified") %>%
dplyr::mutate(percent_reported = 100-percent) %>%
dplyr::rename(percent_not_report = percent) %>%
gt()
species_stage | study_motivation | n_total | n_motivation | percent_not_report | percent_reported |
---|---|---|---|---|---|
Unknown or not specified | Environmental | 100 | 586 | 17.1 | 82.9 |
Unknown or not specified | Medical | 22 | 245 | 9.0 | 91.0 |
Unknown or not specified | Basic research | 44 | 166 | 26.5 | 73.5 |
A table by study motivation and life stage
stage_order <- c("Adult", "Juvenile", "Larvae", "Egg or embryo")
stage_summary_reported <- stage_summary_all %>%
dplyr::filter(species_stage != "Unknown or not specified") %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(n_motivation = sum(n_total), percent = round(n_total/n_motivation *
100, 1), species_stage = factor(species_stage, levels = stage_order)) %>%
dplyr::ungroup() %>%
dplyr::arrange(species_stage)
stage_summary_reported %>%
gt()
species_stage | study_motivation | n_total | n_motivation | percent |
---|---|---|---|---|
Adult | Environmental | 236 | 486 | 48.6 |
Adult | Medical | 134 | 223 | 60.1 |
Adult | Basic research | 73 | 122 | 59.8 |
Juvenile | Environmental | 92 | 486 | 18.9 |
Juvenile | Medical | 14 | 223 | 6.3 |
Juvenile | Basic research | 17 | 122 | 13.9 |
Larvae | Environmental | 127 | 486 | 26.1 |
Larvae | Medical | 64 | 223 | 28.7 |
Larvae | Basic research | 28 | 122 | 23.0 |
Egg or embryo | Environmental | 31 | 486 | 6.4 |
Egg or embryo | Medical | 11 | 223 | 4.9 |
Egg or embryo | Basic research | 4 | 122 | 3.3 |
The same data presented as a figure
For life stages that are described, what is the breakdown?
# Define the colour theme for the four life stages
color_theme <- c("#A14323", "#D86C2F", "#EE9E5A", "#F3E9A5")
# Calculate cumulative positions for text labels
stage_summary_reported <- stage_summary_reported %>%
dplyr::group_by(study_motivation) %>%
dplyr::arrange(desc(species_stage)) %>%
dplyr::mutate(cumulative_percent = cumsum(percent) - percent/4)
# Create the plot
life_stage_fig <- stage_summary_reported %>%
mutate(study_motivation = fct_relevel(study_motivation, "Basic research", "Medical",
"Environmental")) %>%
ggplot(aes(y = percent, x = study_motivation, fill = species_stage, group = species_stage)) +
geom_bar(stat = "identity", width = 0.9) + geom_text(aes(label = percent, y = cumulative_percent),
color = "white", size = 3) + scale_fill_manual(values = color_theme, name = "Life stage") +
theme_classic() + theme(legend.position = "right") + labs(x = "Study motivation",
y = "Percent of all species assigned to a life stage") + coord_flip()
life_stage_fig
Save the figure
# setwd(figures_path) ggsave('spp_life_stage_fig.pdf', plot = life_stage_fig,
# width = 10, height = 5)
Let’s also look at how many were unknown or not described overall
stage_summary_info <- stage_summary_all %>%
dplyr::mutate(stage_reported = if_else(species_stage == "Unknown or not specified",
"No", "Yes")) %>%
dplyr::group_by(stage_reported) %>%
dplyr::reframe(n = sum(n_total))
n_total <- stage_summary_info %>%
dplyr::reframe(n_total = sum(n)) %>%
pull(n_total)
stage_summary_info <- stage_summary_info %>%
dplyr::mutate(prop = n/n_total)
Here’s the plot
# Define the black and grey color theme
black_and_grey <- c("#BCBEC0", "#414042")
yes_order <- c("Yes", "No")
# Calculate cumulative positions for text labels
stage_summary_info <- stage_summary_info %>%
dplyr::mutate(stage_reported = factor(stage_reported, levels = yes_order)) %>%
dplyr::arrange(desc(stage_reported)) %>%
dplyr::mutate(cumulative_prop = cumsum(prop) - prop / 2)
stage_info_fig <- stage_summary_info %>%
dplyr::mutate(stage_reported = factor(stage_reported, levels = yes_order)) %>%
ggplot(aes(y = prop, x = 1, fill = stage_reported)) +
geom_bar(stat = "identity", width = 0.9) +
geom_text(aes(label = round(prop, 2), y = cumulative_prop), color = "white") +
scale_fill_manual(values = black_and_grey, name = "Stage reported") +
theme_classic() +
theme(legend.position = "right",
axis.text.x = element_blank(), # Remove x-axis text
axis.ticks.x = element_blank() # Remove x-axis ticks
) +
labs(
x = "",
y = "Proportion of species tested in the database"
)
stage_info_fig
Save the figure
# setwd(figures_path)
# ggsave('spp_stage_info_fig.pdf', plot = stage_info_fig, width = 2.5, height = 5)
First, the overall breakdown by female and male, excluding unreported and hermaphroditic animals
EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>% # sampling one row per article per species (i.e. ignoring multiple rows per article for compounds)
dplyr::ungroup() %>%
dplyr::group_by(species_sex) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(species_sex = str_trim(species_sex)) %>%
tidyr::separate_rows(species_sex, sep = ";") %>% # a population can list more than one sex, so the string needs splitting
dplyr::group_by(species_sex) %>%
dplyr::summarise(n_articles = sum(n), .groups = 'drop') %>% # now a sum for each sex
tidyr::complete(species_sex,
fill = list(n_articles = 0)) %>% # make a full df with empty categories = 0
dplyr::ungroup() %>%
dplyr::filter(species_sex == "Female" | species_sex == "Male") %>%
dplyr::mutate(total_sex = sum(n_articles)) %>%
dplyr::ungroup() %>%
dplyr::mutate(overall_percent = round(n_articles/total_sex*100,1)) %>%
gt()
species_sex | n_articles | total_sex | overall_percent |
---|---|---|---|
Female | 280 | 623 | 44.9 |
Male | 343 | 623 | 55.1 |
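The same three steps recur throughout this summary: keep one row per population, split the `;`-separated values, then count. Note that `sample_n(1)` draws a random row per group, so the result is only stable if the population-level columns are identical within each `unique_population_id` (I am assuming that is the case here). Below is a deterministic sketch of the same idea using `distinct()`; the `toy_db` data frame is hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical mini-database: several rows per population (one per compound)
toy_db <- data.frame(
  unique_population_id = c("p1", "p1", "p2", "p2", "p3"),
  compound_name = c("A", "B", "A", "C", "A"),
  species_sex = c("Female;Male", "Female;Male", "Male", "Male",
                  "Unknown or not specified")
)

sex_counts <- toy_db %>%
  dplyr::distinct(unique_population_id, .keep_all = TRUE) %>% # first row per population
  tidyr::separate_rows(species_sex, sep = ";") %>%            # one row per listed sex
  dplyr::mutate(species_sex = trimws(species_sex)) %>%        # trim after splitting
  dplyr::count(species_sex, name = "n_populations")

sex_counts
```

Because population `p1` lists both sexes, it contributes one row to `Female` and one to `Male`, which is why totals after splitting can exceed the number of populations.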
Let’s take a look at the sex of the species used in the EIPAAB database, split by study motivation
sex_summary_all <- EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>% # sampling one row per article per species (i.e. ignoring multiple rows per article for compounds)
dplyr::ungroup() %>%
dplyr::group_by(study_motivation, species_sex) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(species_sex = str_trim(species_sex)) %>%
tidyr::separate_rows(species_sex, sep = ";") %>% # a population can list more than one sex, so the string needs splitting
dplyr::group_by(species_sex, study_motivation) %>%
dplyr::summarise(n_articles = sum(n), .groups = 'drop') %>% # now a sum for each sex
tidyr::complete(species_sex, study_motivation,
fill = list(n_articles = 0)) %>% # make a full df with empty categories = 0
dplyr::ungroup() %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_sex = sum(n_articles)) %>%
dplyr::ungroup() %>%
dplyr::mutate(overall_prop = n_articles/total_sex)
sex_summary_all %>%
gt()
species_sex | study_motivation | n_articles | total_sex | overall_prop |
---|---|---|---|---|
Female | Environmental | 136 | 645 | 0.21085271 |
Female | Medical | 91 | 322 | 0.28260870 |
Female | Basic research | 53 | 206 | 0.25728155 |
Hermaphrodites | Environmental | 4 | 645 | 0.00620155 |
Hermaphrodites | Medical | 0 | 322 | 0.00000000 |
Hermaphrodites | Basic research | 0 | 206 | 0.00000000 |
Male | Environmental | 180 | 645 | 0.27906977 |
Male | Medical | 99 | 322 | 0.30745342 |
Male | Basic research | 64 | 206 | 0.31067961 |
Unknown or not specified | Environmental | 325 | 645 | 0.50387597 |
Unknown or not specified | Medical | 132 | 322 | 0.40993789 |
Unknown or not specified | Basic research | 89 | 206 | 0.43203883 |
Let’s look at just the proportion of those defined as male and female
sex_male_female <- sex_summary_all %>%
dplyr::filter(species_sex == "Female" | species_sex == "Male") %>%
dplyr::select(-total_sex, -overall_prop) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_male_female = sum(n_articles)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n_articles/total_male_female * 100, 1))
sex_male_female %>%
gt()
species_sex | study_motivation | n_articles | total_male_female | percent |
---|---|---|---|---|
Female | Environmental | 136 | 316 | 43.0 |
Female | Medical | 91 | 190 | 47.9 |
Female | Basic research | 53 | 117 | 45.3 |
Male | Environmental | 180 | 316 | 57.0 |
Male | Medical | 99 | 190 | 52.1 |
Male | Basic research | 64 | 117 | 54.7 |
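As a quick check of the percentages in this table, the arithmetic can be redone by hand from the counts. This sketch only reuses the numbers printed above; the vector names are mine.

```r
# Counts copied from the table above
female <- c(Environmental = 136, Medical = 91, `Basic research` = 53)
male   <- c(Environmental = 180, Medical = 99, `Basic research` = 64)

# Denominator: only populations assigned to female or male
total_male_female <- female + male

percent_female <- round(female / total_male_female * 100, 1)
percent_male   <- round(male / total_male_female * 100, 1)

rbind(total_male_female, percent_female, percent_male)
```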
Making the plot
sex_order <- c("Female", "Male")
color_theme <- c("#eb4729", "#1b909a")
# Calculate cumulative positions for text labels
sex_male_female <- sex_male_female %>%
dplyr::mutate(species_sex = factor(species_sex, levels = sex_order)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::arrange(desc(species_sex)) %>%
dplyr::mutate(cumulative_percent = cumsum(percent) - percent/2)
# Create the plot
sex_fig <- sex_male_female %>%
mutate(study_motivation = fct_relevel(study_motivation, "Basic research", "Medical",
"Environmental")) %>%
ggplot(aes(y = percent, x = study_motivation, fill = species_sex, group = species_sex)) +
geom_bar(stat = "identity", width = 0.9) + geom_text(aes(label = round(percent,
2), y = cumulative_percent), color = "white", size = 3) + scale_fill_manual(values = color_theme,
name = "Sex") + theme_classic() + theme(legend.position = "right") + labs(x = "Study motivation",
y = "Percentage of all species assigned to female or male") + coord_flip()
sex_fig
Save the figure
# setwd(figures_path)
# ggsave('spp_sex_fig.pdf', plot = sex_fig, width = 10, height = 5)
Now let’s look at those not assigned to a sex
sex_summary_info <- sex_summary_all %>%
dplyr::mutate(sex_reported = if_else(species_sex == "Unknown or not specified",
"No", "Yes")) %>%
dplyr::group_by(sex_reported) %>%
dplyr::reframe(n = sum(n_articles))
n_total <- sex_summary_info %>%
dplyr::reframe(n_total = sum(n)) %>%
dplyr::pull(n_total)
sex_summary_info <- sex_summary_info %>%
dplyr::mutate(prop = n/n_total)
Here’s the plot
# Define the black and grey color theme
black_and_grey <- c("#BCBEC0", "#414042")
yes_order <- c("Yes", "No")
# Calculate cumulative positions for text labels
sex_summary_info <- sex_summary_info %>%
dplyr::mutate(sex_reported = factor(sex_reported, levels = yes_order)) %>%
dplyr::arrange(desc(sex_reported)) %>%
dplyr::mutate(cumulative_prop = cumsum(prop) - prop / 2)
sex_info_fig <- sex_summary_info %>%
dplyr::mutate(sex_reported = factor(sex_reported, levels = yes_order)) %>%
ggplot(aes(y = prop, x = 1, fill = sex_reported)) +
geom_bar(stat = "identity", width = 0.9) +
geom_text(aes(label = round(prop, 2), y = cumulative_prop), color = "white") +
scale_fill_manual(values = black_and_grey, name = "Sex reported") +
theme_classic() +
theme(legend.position = "right",
axis.text.x = element_blank(), # Remove x-axis text
axis.ticks.x = element_blank() # Remove x-axis ticks
) +
labs(
x = "",
y = "Proportion of all species"
)
sex_info_fig
Save the figure
# setwd(figures_path)
# ggsave('spp_sex_info_fig.pdf', plot = sex_info_fig, width = 2.5, height = 5)
Overall breakdown of animal source, excluding populations with no reported source
EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>% # sampling one row per article per species (i.e. ignoring multiple rows per article for compounds)
dplyr::ungroup() %>%
dplyr::group_by(species_source) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(species_source = str_trim(species_source)) %>%
tidyr::separate_rows(species_source, sep = ";") %>% # a population can list more than one source, so the string needs splitting
dplyr::group_by(species_source) %>%
dplyr::summarise(n_articles = sum(n), .groups = 'drop') %>% # now a sum for each source
tidyr::complete(species_source,
fill = list(n_articles = 0)) %>% # make a full df with empty categories = 0
dplyr::filter(species_source != "Not reported") %>%
dplyr::ungroup() %>%
dplyr::mutate(total_source = sum(n_articles)) %>%
dplyr::ungroup() %>%
dplyr::mutate(overall_percent = round(n_articles/total_source*100, 1)) %>%
dplyr::arrange(desc(overall_percent)) %>%
gt()
species_source | n_articles | total_source | overall_percent |
---|---|---|---|
Commercial supplier or fish farm | 306 | 802 | 38.2 |
Lab stock of undisclosed origin | 213 | 802 | 26.6 |
Wild collected | 195 | 802 | 24.3 |
Lab stock from commercial supplier | 55 | 802 | 6.9 |
Lab stock from wild population | 33 | 802 | 4.1 |
Let’s look at where the animals were sourced for articles in the EIPAAB database
source_summary_all <- EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>% # sampling one row per article per species (i.e. ignoring multiple rows per article for compounds)
dplyr::ungroup() %>%
dplyr::group_by(study_motivation, species_source) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(species_source = str_trim(species_source)) %>%
tidyr::separate_rows(species_source, sep = ";") %>% # a population can list more than one source, so the string needs splitting
dplyr::group_by(species_source, study_motivation) %>%
dplyr::summarise(n_articles = sum(n), .groups = 'drop') %>% # now a sum for each source
tidyr::complete(species_source, study_motivation,
fill = list(n_articles = 0)) %>% # make a full df with empty categories = 0
dplyr::ungroup() %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_source = sum(n_articles)) %>%
dplyr::ungroup() %>%
dplyr::mutate(overall_percent = round(n_articles/total_source*100,1))
source_summary_all %>%
gt()
species_source | study_motivation | n_articles | total_source | overall_percent |
---|---|---|---|---|
Commercial supplier or fish farm | Environmental | 134 | 548 | 24.5 |
Commercial supplier or fish farm | Medical | 101 | 240 | 42.1 |
Commercial supplier or fish farm | Basic research | 71 | 162 | 43.8 |
Lab stock from commercial supplier | Environmental | 29 | 548 | 5.3 |
Lab stock from commercial supplier | Medical | 16 | 240 | 6.7 |
Lab stock from commercial supplier | Basic research | 10 | 162 | 6.2 |
Lab stock from wild population | Environmental | 25 | 548 | 4.6 |
Lab stock from wild population | Medical | 1 | 240 | 0.4 |
Lab stock from wild population | Basic research | 7 | 162 | 4.3 |
Lab stock of undisclosed origin | Environmental | 119 | 548 | 21.7 |
Lab stock of undisclosed origin | Medical | 63 | 240 | 26.2 |
Lab stock of undisclosed origin | Basic research | 31 | 162 | 19.1 |
Not reported | Environmental | 72 | 548 | 13.1 |
Not reported | Medical | 51 | 240 | 21.2 |
Not reported | Basic research | 25 | 162 | 15.4 |
Wild collected | Environmental | 169 | 548 | 30.8 |
Wild collected | Medical | 8 | 240 | 3.3 |
Wild collected | Basic research | 18 | 162 | 11.1 |
Only those with reported source
source_order <- c("Wild collected", "Lab stock from wild population", "Lab stock of undisclosed origin",
"Lab stock from commercial supplier", "Commercial supplier or fish farm")
source_summary <- source_summary_all %>%
dplyr::filter(species_source != "Not reported") %>%
dplyr::select(species_source, study_motivation, n_articles) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_reported = sum(n_articles)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n_articles/total_reported * 100, 1), species_source = factor(species_source,
levels = source_order)) %>%
dplyr::arrange(species_source)
source_summary %>%
gt()
species_source | study_motivation | n_articles | total_reported | percent |
---|---|---|---|---|
Wild collected | Environmental | 169 | 476 | 35.5 |
Wild collected | Medical | 8 | 189 | 4.2 |
Wild collected | Basic research | 18 | 137 | 13.1 |
Lab stock from wild population | Environmental | 25 | 476 | 5.3 |
Lab stock from wild population | Medical | 1 | 189 | 0.5 |
Lab stock from wild population | Basic research | 7 | 137 | 5.1 |
Lab stock of undisclosed origin | Environmental | 119 | 476 | 25.0 |
Lab stock of undisclosed origin | Medical | 63 | 189 | 33.3 |
Lab stock of undisclosed origin | Basic research | 31 | 137 | 22.6 |
Lab stock from commercial supplier | Environmental | 29 | 476 | 6.1 |
Lab stock from commercial supplier | Medical | 16 | 189 | 8.5 |
Lab stock from commercial supplier | Basic research | 10 | 137 | 7.3 |
Commercial supplier or fish farm | Environmental | 134 | 476 | 28.2 |
Commercial supplier or fish farm | Medical | 101 | 189 | 53.4 |
Commercial supplier or fish farm | Basic research | 71 | 137 | 51.8 |
A figure with the same information
source_order <- c("Wild collected", "Lab stock from wild population", "Lab stock of undisclosed origin",
"Lab stock from commercial supplier", "Commercial supplier or fish farm")
# Define the colour theme for the five source categories
color_theme <- c("#607C3B", "#A7D271", "#6B6E70", "#A66EAF", "#61346B")
# Calculate cumulative positions for text labels
source_summary <- source_summary %>%
dplyr::group_by(study_motivation) %>%
dplyr::arrange(desc(species_source)) %>%
dplyr::mutate(cumulative_percent = cumsum(percent) - percent/2) # midpoint of each stacked segment
# Create the plot
source_fig <- source_summary %>%
mutate(study_motivation = fct_relevel(study_motivation, "Basic research", "Medical",
"Environmental")) %>%
ggplot(aes(y = percent, x = study_motivation, fill = species_source, group = species_source)) +
geom_bar(stat = "identity", width = 0.9) + geom_text(aes(label = round(percent,
2), y = cumulative_percent), color = "white", size = 3) + scale_fill_manual(values = color_theme,
name = "Animal source") + theme_classic() + theme(legend.position = "right") + labs(x = "Study motivation",
y = "Percentage of all species with a described source") + coord_flip()
source_fig
Save the figure
# setwd(figures_path)
# ggsave('spp_source_fig.pdf', plot = source_fig, width = 10, height = 5)
Now let’s look at those not assigned a source
source_summary_info <- source_summary_all %>%
dplyr::mutate(source_reported = if_else(species_source == "Not reported", "No",
"Yes")) %>%
dplyr::group_by(source_reported) %>%
dplyr::reframe(n = sum(n_articles))
n_total <- source_summary_info %>%
dplyr::reframe(n_total = sum(n)) %>%
dplyr::pull(n_total)
source_summary_info <- source_summary_info %>%
dplyr::mutate(prop = n/n_total)
Here’s the plot
# Define the black and grey color theme
black_and_grey <- c("#BCBEC0", "#414042")
yes_order <- c("Yes", "No")
# Calculate cumulative positions for text labels
source_summary_info <- source_summary_info %>%
dplyr::mutate(source_reported = factor(source_reported, levels = yes_order)) %>%
dplyr::arrange(desc(source_reported)) %>%
dplyr::mutate(cumulative_prop = cumsum(prop) - prop / 2)
source_info_fig <- source_summary_info %>%
dplyr::mutate(source_reported = factor(source_reported, levels = yes_order)) %>%
ggplot(aes(y = prop, x = 1, fill = source_reported)) +
geom_bar(stat = "identity", width = 0.9) +
geom_text(aes(label = round(prop, 2), y = cumulative_prop), color = "white") +
scale_fill_manual(values = black_and_grey, name = "Source reported") +
theme_classic() +
theme(legend.position = "right",
axis.text.x = element_blank(), # Remove x-axis text
axis.ticks.x = element_blank() # Remove x-axis ticks
) +
labs(
x = "",
y = "Proportion of all species"
)
source_info_fig
Save the figure
# setwd(figures_path)
# ggsave('spp_source_info_fig.pdf', plot = source_info_fig, width = 5, height = 10)
There are 426 distinct compounds in the database
EIPAAB_database %>%
dplyr::distinct(compound_name) %>%
nrow()
## [1] 426
article_n <- EIPAAB_database %>%
dplyr::distinct(article_id) %>%
nrow(.)
compound_n_summary <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(compound_n) %>%
dplyr::reframe(n = n(),
percent = round((n/article_n)*100,1))
compound_n_summary %>%
dplyr::slice(1:10) %>% # first 10 rows, i.e. articles using 1 to 10 compounds
gt()
compound_n | n | percent |
---|---|---|
1 | 624 | 69.3 |
2 | 127 | 14.1 |
3 | 67 | 7.4 |
4 | 32 | 3.6 |
5 | 16 | 1.8 |
6 | 8 | 0.9 |
7 | 6 | 0.7 |
8 | 5 | 0.6 |
9 | 1 | 0.1 |
10 | 2 | 0.2 |
How many used more than 5?
compound_n_summary %>%
dplyr::filter(compound_n > 5) %>%
reframe(n = sum(n), percent = round((n/article_n) * 100, 1)) %>%
gt()
n | percent |
---|---|
35 | 3.9 |
Summary data frame by study motivation
compound_n_summary <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(compound_n, study_motivation) %>%
dplyr::summarise(n = n(), .groups = "drop") %>%
tidyr::complete(compound_n, study_motivation, fill = list(n = 0)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(prop_motivation = n/total_motivation)
# Define the colour palette
motivation_colour_theme <- c("#60BD6C", "#D359A1", "#3C82C4") # Making colour theme to apply to plot
compound_n_oder <- c(1:9, "10+")
compound_n_fig <- compound_n_summary %>%
dplyr::mutate(
compound_n = as.character(if_else(compound_n > 10, 10, compound_n)), # Grouping cases of 10 or more
compound_n = if_else(compound_n == "10", "10+", compound_n)
) %>%
dplyr::group_by(compound_n, study_motivation) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(compound_n = factor(compound_n, levels = compound_n_oder)) %>%
ggplot(aes(x=compound_n, y=n, colour = study_motivation, fill = study_motivation, group = study_motivation)) +
geom_col(position = position_dodge(width = 0.8), width = 0.1) +
geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = n), vjust=-0.6, size=3.5, color="black", position = position_dodge(width = 0.8)) +
scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
scale_fill_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() +
labs(
x = "",
y = "Number of studies"
) +
theme()
compound_n_fig
Save the figure
# setwd(figures_path)
# ggsave('comp_compound_n_fig.pdf', plot = compound_n_fig, width = 10, height = 5)
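The figure code folds every article with ten or more compounds into one final bin. The same binning can be sketched in base R as follows; the input vector is made up, and the `10+` label is only one way to name the open-ended bin.

```r
# Hypothetical numbers of compounds per article
compound_n <- c(1, 1, 2, 9, 10, 12, 27)

# Everything from 10 upwards goes into a single open-ended bin
compound_bin <- ifelse(compound_n >= 10, "10+", as.character(compound_n))

# Fix the level order so the bins plot from 1 to 10+ rather than alphabetically
compound_bin <- factor(compound_bin, levels = c(1:9, "10+"))

bin_counts <- table(compound_bin)
bin_counts
```

Setting the factor levels explicitly also keeps empty bins (here 3 to 8) in the output, so the x-axis stays complete.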
First let’s see how many compounds have an ATC classification.
305 out of 426 (71.6%)
EIPAAB_database %>%
dplyr::group_by(compound_name) %>%
dplyr::sample_n(1) %>% # sampling one row per compound (i.e. ignoring multiple rows for the same compound)
dplyr::ungroup() %>%
dplyr::group_by(compound_atc_boolean) %>%
dplyr::reframe(n = length(compound_atc_boolean)) %>%
gt()
compound_atc_boolean | n |
---|---|
No | 121 |
Yes | 305 |
Let’s see how many classes there are at the Anatomical Therapeutic Chemical (ATC) level 1
There are 14 classes at the 1st ATC level (the highest level of the ATC hierarchy), which is every class that exists at that level
n_compound_atc <- EIPAAB_database %>%
dplyr::filter(compound_atc_boolean == "Yes") %>%
dplyr::distinct(compound_name) %>%
nrow(.)
compound_ATC_L1_summary <- EIPAAB_database %>%
dplyr::group_by(compound_name) %>%
dplyr::sample_n(1) %>% # sampling one row per compound (i.e. ignoring multiple rows for the same compound)
dplyr::ungroup() %>%
dplyr::filter(compound_atc_boolean == "Yes") %>%
dplyr::group_by(compound_atc_level_1) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(compound_atc_level_1 = str_trim(compound_atc_level_1)) %>%
tidyr::separate_rows(compound_atc_level_1, sep = ";") %>%
dplyr::group_by(compound_atc_level_1) %>%
dplyr::reframe(n = sum(n),
percent = round(n/n_compound_atc*100,1),
measure = "compounds") %>%
arrange(desc(n))
compound_ATC_L1_summary %>%
gt()
compound_atc_level_1 | n | percent | measure |
---|---|---|---|
n nervous system | 137 | 44.9 | compounds |
c cardiovascular system | 49 | 16.1 | compounds |
a alimentary tract and metabolism | 35 | 11.5 | compounds |
s sensory organs | 34 | 11.1 | compounds |
g genito urinary system and sex hormones | 30 | 9.8 | compounds |
j antiinfectives for systemic use | 28 | 9.2 | compounds |
d dermatologicals | 27 | 8.9 | compounds |
r respiratory system | 26 | 8.5 | compounds |
l antineoplastic and immunomodulating agents | 19 | 6.2 | compounds |
m musculo-skeletal system | 12 | 3.9 | compounds |
v various | 9 | 3.0 | compounds |
h systemic hormonal preparations, excl. sex hormones and insulins | 8 | 2.6 | compounds |
p antiparasitic products, insecticides and repellents | 6 | 2.0 | compounds |
b blood and blood forming organs | 4 | 1.3 | compounds |
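Note that the `n` column in this table sums to 424 and the `percent` column to 139.0, even though only 305 compounds have an ATC classification. That is what you would expect if a compound with several `;`-separated classes is counted once in each class after `separate_rows()`. A small sketch of the effect, using a made-up data frame (`toy_atc` is hypothetical):

```r
library(dplyr)
library(tidyr)

# Hypothetical compounds: "y" belongs to two level 1 classes
toy_atc <- data.frame(
  compound_name = c("x", "y", "z"),
  compound_atc_level_1 = c("n nervous system",
                           "n nervous system;r respiratory system",
                           "c cardiovascular system")
)
n_compounds <- nrow(toy_atc)

toy_summary <- toy_atc %>%
  tidyr::separate_rows(compound_atc_level_1, sep = ";") %>% # one row per class
  dplyr::count(compound_atc_level_1) %>%
  dplyr::mutate(percent = round(n / n_compounds * 100, 1))

sum(toy_summary$n)        # 4 class assignments from 3 compounds
sum(toy_summary$percent)  # more than 100
```

So each percentage reads as "the share of compounds that fall in this class", not as a slice of a whole that adds up to 100.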
Now we will make a similar data file to look at the overall use in the database at each ATC level 1
n_data_atc <- EIPAAB_database %>%
dplyr::filter(compound_atc_boolean == "Yes") %>%
nrow(.)
compound_ATC_L1_data_summary <- EIPAAB_database %>%
dplyr::filter(compound_atc_boolean == "Yes") %>%
dplyr::group_by(compound_atc_level_1) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(compound_atc_level_1 = str_trim(compound_atc_level_1)) %>%
tidyr::separate_rows(compound_atc_level_1, sep = ";") %>% # a compound can have multiple ATC classes, so the string needs splitting
dplyr::group_by(compound_atc_level_1) %>%
dplyr::reframe(n = sum(n),
percent = round(n/n_data_atc*100,1),
measure = "data") %>% # now a sum for each ATC class
arrange(desc(percent))
compound_ATC_L1_data_summary %>%
dplyr::select(-measure) %>%
gt()
compound_atc_level_1 | n | percent |
---|---|---|
n nervous system | 1119 | 72.9 |
g genito urinary system and sex hormones | 201 | 13.1 |
c cardiovascular system | 158 | 10.3 |
d dermatologicals | 130 | 8.5 |
s sensory organs | 102 | 6.6 |
a alimentary tract and metabolism | 93 | 6.1 |
l antineoplastic and immunomodulating agents | 89 | 5.8 |
r respiratory system | 76 | 4.9 |
m musculo-skeletal system | 58 | 3.8 |
j antiinfectives for systemic use | 57 | 3.7 |
v various | 32 | 2.1 |
h systemic hormonal preparations, excl. sex hormones and insulins | 15 | 1.0 |
b blood and blood forming organs | 12 | 0.8 |
p antiparasitic products, insecticides and repellents | 7 | 0.5 |
ATC_L1_summary <- compound_ATC_L1_summary %>%
rbind(., compound_ATC_L1_data_summary) %>%
dplyr::mutate(value = if_else(measure == "compounds", n, percent))
This plot shows the number of different compounds in each ATC level 1 class, as well as the percentage of the data that each class makes up
measure_colour_theme <- c("black", "grey") # Making colour theme to apply to plot
# Making a list of ATC names in the order that we want them in the plot
level_1_order <- ATC_L1_summary %>%
dplyr::filter(measure == "data") %>%
dplyr::arrange(value) %>%
dplyr::pull(compound_atc_level_1)
# Making the plot
atc_level_1_fig <- ATC_L1_summary %>%
dplyr::mutate(compound_atc_level_1 = factor(compound_atc_level_1, levels = level_1_order)) %>%
ggplot(aes(x = compound_atc_level_1, y = value, colour = measure, fill = measure, group = measure)) +
geom_col(position = position_dodge(width = 0.8), width = 0.2, colour = NA) +
geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = value), hjust = -0.6, size = 3.5, color = "black", position = position_dodge(width = 0.8)) +
scale_colour_manual(values = measure_colour_theme, name = "Value type") +
scale_fill_manual(values = measure_colour_theme, name = "Value type") +
coord_flip() +
scale_y_continuous(name = "Number of distinct compounds",
sec.axis = sec_axis(~., name = "Percentage of the database") # Identity transform: both axes share one scale
) +
theme_classic() +
labs(x = "")
atc_level_1_fig
# setwd(figures_path)
# ggsave('comp_atc_level_1_fig.pdf', plot = atc_level_1_fig, width = 10, height = 10)
Let’s see how many classes there are at the Anatomical Therapeutic Chemical (ATC) level 3
There are 131 distinct classes, I am creating a table with the top 15.
n_compound_atc <- EIPAAB_database %>%
dplyr::filter(compound_atc_boolean == "Yes") %>%
dplyr::distinct(compound_name) %>%
nrow(.)
compound_ATC_L3_summary <- EIPAAB_database %>%
dplyr::group_by(compound_name) %>%
dplyr::sample_n(1) %>% # sampling one row per compound (i.e. ignoring multiple rows for the same compound)
dplyr::ungroup() %>%
dplyr::filter(compound_atc_boolean == "Yes") %>%
dplyr::group_by(compound_atc_level_3) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(compound_atc_level_3 = str_trim(compound_atc_level_3)) %>%
tidyr::separate_rows(compound_atc_level_3, sep = ";") %>% # a compound can have multiple ATC classes, so the string needs splitting
dplyr::group_by(compound_atc_level_3) %>%
dplyr::reframe(n = sum(n),
percent = round(n/n_compound_atc*100,1),
measure = "compounds") %>% # now a sum for each ATC class
arrange(desc(n))
compound_ATC_L3_summary %>%
dplyr::select(-measure) %>%
dplyr::slice(1:15) %>% # only the first 15
gt()
compound_atc_level_3 | n | percent |
---|---|---|
n06a antidepressants | 27 | 8.9 |
n03a antiepileptics | 18 | 5.9 |
n05a antipsychotics | 14 | 4.6 |
a01a stomatological preparations | 12 | 3.9 |
n05b anxiolytics | 11 | 3.6 |
n05c hypnotics and sedatives | 11 | 3.6 |
n06b psychostimulants, agents used for adhd and nootropics | 11 | 3.6 |
r06a antihistamines for systemic use | 11 | 3.6 |
c07a beta blocking agents | 9 | 3.0 |
d04a antipruritics, incl. antihistamines, anesthetics, etc. | 9 | 3.0 |
s01e antiglaucoma preparations and miotics | 9 | 3.0 |
c05a agents for treatment of hemorrhoids and anal fissures for topical use | 8 | 2.6 |
g03c estrogens | 8 | 2.6 |
n01a anesthetics, general | 8 | 2.6 |
c10a lipid modifying agents, plain | 7 | 2.3 |
Now we will make a similar data file to look at the overall use in the database at each ATC level 3.
Below I make a table with the top 15
n_data_atc <- EIPAAB_database %>%
dplyr::filter(compound_atc_boolean == "Yes") %>%
nrow(.)
compound_ATC_L3_data_summary <- EIPAAB_database %>%
dplyr::filter(compound_atc_boolean == "Yes") %>%
dplyr::group_by(compound_atc_level_3) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(compound_atc_level_3 = str_trim(compound_atc_level_3)) %>%
tidyr::separate_rows(compound_atc_level_3, sep = ";") %>% # a compound can have multiple ATC classes, so the string needs splitting
dplyr::group_by(compound_atc_level_3) %>%
dplyr::reframe(n = sum(n),
percent = round(n/n_data_atc*100,1),
measure = "data") %>% # now a sum for each ATC class
arrange(desc(percent))
compound_ATC_L3_data_summary %>%
dplyr::select(-measure) %>%
dplyr::slice(1:15) %>%
gt()
compound_atc_level_3 | n | percent |
---|---|---|
n06a antidepressants | 424 | 27.6 |
n03a antiepileptics | 164 | 10.7 |
n05b anxiolytics | 149 | 9.7 |
g03c estrogens | 121 | 7.9 |
n06b psychostimulants, agents used for adhd and nootropics | 84 | 5.5 |
l02a hormones and related agents | 64 | 4.2 |
d11a other dermatological preparations | 62 | 4.0 |
n05a antipsychotics | 60 | 3.9 |
m01a antiinflammatory and antirheumatic products, non-steroids | 50 | 3.3 |
n02a opioids | 51 | 3.3 |
m02a topical products for joint and muscular pain | 48 | 3.1 |
n02b other analgesics and antipyretics | 45 | 2.9 |
n05c hypnotics and sedatives | 40 | 2.6 |
c07a beta blocking agents | 38 | 2.5 |
r02a throat preparations | 35 | 2.3 |
Here we make a new column called value that holds the count of distinct compounds for the "compounds" rows and the percentage of data for the "data" rows
ATC_L3_summary <- compound_ATC_L3_summary %>%
rbind(., compound_ATC_L3_data_summary) %>%
dplyr::mutate(value = if_else(measure == "compounds", n, percent))
This plot shows the number of different compounds in each ATC classification as well as the total proportion of data it makes up. This is done for only the 15 most commonly used groups.
measure_colour_theme <- c("black", "grey") # Making colour theme to apply to plot
# Making a list of ATC names in the order that we want them in the plot
level_3_order_top_15 <- ATC_L3_summary %>%
dplyr::filter(measure == "data") %>%
dplyr::arrange(desc(value)) %>%
dplyr::slice(1:15) %>%
dplyr::pull(compound_atc_level_3)
# Making the plot
atc_level_3_fig <- ATC_L3_summary %>%
dplyr::filter(compound_atc_level_3 %in% level_3_order_top_15) %>%
dplyr::mutate(compound_atc_level_3 = factor(compound_atc_level_3, levels = level_3_order_top_15)) %>%
ggplot(aes(x=compound_atc_level_3, y=value, fill = measure, colour = measure,
group = measure)) +
geom_col(position = position_dodge(width = 1), width = 0.2, colour = NA) +
geom_point(position = position_dodge(width = 1), size = 3) +
geom_text(aes(label = value), vjust=-0.6, size=3.5, position = position_dodge(width = 1)) +
scale_colour_manual(values = measure_colour_theme, name = "Value type",
labels = c("Compounds (n)", "Percentage of data")) +
scale_fill_manual(values = measure_colour_theme, name = "Value type",
labels = c("Compounds (n)", "Percentage of data")) +
scale_y_continuous(
name = "Distinct compounds",
limits = c(0, 30), # Set the range of y-axis
breaks = c(0, 10, 20, 30), # Set the labels at 0, 10, 20, 30
sec.axis = sec_axis(~ . , name = "Percentage of database") # Adjust scaling if needed
) +
theme_classic() +
labs(
x = "",
y = ""
) +
theme(
axis.text.y = element_text(size = 8), # Change y-axis labels size
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1, size = 8), # Change x-axis labels orientation
legend.title = element_text(size = 12), # Change legend title size if needed
legend.text = element_text(size = 10)
)
atc_level_3_fig
Save the figure
# setwd(figures_path)
# ggsave('atc_level_3_fig.pdf', plot = atc_level_3_fig, width = 10, height = 5)
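In both ATC figures the secondary axis uses the identity transform (`~ .`), so the counts and the percentages are read off the same numeric scale. That works here because the two measures have similar ranges. If they did not, one option (not used in this script) would be to rescale one measure and undo the rescaling on the secondary axis. A rough sketch with made-up values; `toy` and `scale_factor` are hypothetical names.

```r
library(ggplot2)

# Hypothetical values on two very different scales
toy <- data.frame(
  atc_class = c("class a", "class b", "class c"),
  n_compounds = c(27, 18, 11),     # counts
  percent_data = c(2.8, 1.1, 0.9)  # percentages on a much smaller range
)

# Factor that maps the percentage range onto the count range
scale_factor <- max(toy$n_compounds) / max(toy$percent_data)

dual_axis_fig <- ggplot(toy, aes(x = atc_class)) +
  geom_col(aes(y = n_compounds), width = 0.2, fill = "black") +
  geom_point(aes(y = percent_data * scale_factor), colour = "grey", size = 3) +
  scale_y_continuous(
    name = "Distinct compounds",
    # Undo the rescaling so the right-hand labels show real percentages
    sec.axis = sec_axis(~ . / scale_factor, name = "Percentage of database")
  ) +
  theme_classic()

dual_axis_fig
```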
Overall the most common compounds are Fluoxetine, Diazepam and 17-alpha-ethinylestradiol
Below is a table of the top 15 compounds
n_row <- EIPAAB_database %>%
nrow(.)
compound_use <- EIPAAB_database %>%
dplyr::group_by(compound_name) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(prop = n/n_row) %>%
arrange(desc(prop))
compound_use %>%
dplyr::slice(1:15) %>%
gt()
compound_name | n | prop |
---|---|---|
Fluoxetine | 200 | 0.11500863 |
Diazepam | 67 | 0.03852789 |
17-alpha-ethinylestradiol | 63 | 0.03622772 |
Caffeine | 45 | 0.02587694 |
Venlafaxine | 43 | 0.02472685 |
Citalopram | 42 | 0.02415181 |
Sertraline | 39 | 0.02242668 |
Carbamazepine | 38 | 0.02185164 |
Buspirone | 30 | 0.01725129 |
Morphine | 27 | 0.01552616 |
Oxazepam | 26 | 0.01495112 |
Valproate | 24 | 0.01380104 |
17-beta-estradiol | 23 | 0.01322599 |
Ibuprofen | 20 | 0.01150086 |
Nicotine | 19 | 0.01092582 |
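For anyone adapting this to their own subset of the database, the same top-n table can be written a little more compactly with `dplyr::count()`. This is only a sketch on made-up rows (`toy_rows` is hypothetical), not the code used for the table above.

```r
library(dplyr)

# Hypothetical rows: one row per article x species x compound
toy_rows <- data.frame(
  compound_name = c("Fluoxetine", "Fluoxetine", "Diazepam",
                    "Caffeine", "Fluoxetine", "Diazepam")
)

toy_use <- toy_rows %>%
  dplyr::count(compound_name, sort = TRUE) %>% # group, count and sort in one step
  dplyr::mutate(prop = n / sum(n)) %>%         # share of all rows
  dplyr::slice_head(n = 2)                     # keep the two most used compounds

toy_use
```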
Let’s see what the numbers are for each motivation, but let’s also maintain the overall numbers so we can add it to the figure
n_row <- EIPAAB_database %>%
nrow(.)
compound_use_motivation <- EIPAAB_database %>%
dplyr::group_by(study_motivation, compound_name) %>%
dplyr::summarise(n = n(), .groups = "drop") %>%
tidyr::complete(compound_name, study_motivation, fill = list(n = 0)) # Making sure we have a complete dataframe
compound_use_motivation <- compound_use %>%
dplyr::select(-prop) %>%
dplyr::mutate(study_motivation = "All") %>%
rbind(., compound_use_motivation)
The top 10 based on each study motivation as well as the overall total
top_10_comp_all_fig <- compound_use_motivation %>%
dplyr::filter(study_motivation == "All") %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:10) %>%
dplyr::mutate(compound_name = fct_reorder(compound_name, n)) %>%
ggplot(aes(x = compound_name, y = n)) + geom_col(width = 0.1, colour = NA, fill = "grey") +
geom_point(size = 3, colour = "grey", fill = "grey") + geom_text(aes(label = n),
hjust = -0.6, size = 3.5, color = "black") + coord_flip() + theme_classic() +
labs(title = "All", x = "", y = "Total use in the database") + theme(plot.title = element_text(size = 11))
top_10_comp_env_fig <- compound_use_motivation %>%
dplyr::filter(study_motivation == "Environmental") %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:10) %>%
dplyr::mutate(compound_name = fct_reorder(compound_name, n)) %>%
ggplot(aes(x = compound_name, y = n)) + geom_col(width = 0.1, colour = NA, fill = "#60BD6C") +
geom_point(size = 3, colour = "#60BD6C", fill = "#60BD6C") + geom_text(aes(label = n),
hjust = -0.6, size = 3.5, color = "black") + coord_flip() + theme_classic() +
labs(title = "Environmental", x = "", y = "Total use in the database") + theme(plot.title = element_text(size = 11))
top_10_comp_med_fig <- compound_use_motivation %>%
dplyr::filter(study_motivation == "Medical") %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:10) %>%
dplyr::mutate(compound_name = fct_reorder(compound_name, n)) %>%
ggplot(aes(x = compound_name, y = n)) + geom_col(width = 0.1, colour = NA, fill = "#D359A1") +
geom_point(size = 3, colour = "#D359A1", fill = "#D359A1") + geom_text(aes(label = n),
hjust = -0.6, size = 3.5, color = "black") + coord_flip() + theme_classic() +
labs(title = "Medical", x = "", y = "Total use in the database") + theme(plot.title = element_text(size = 11))
top_10_comp_base_fig <- compound_use_motivation %>%
dplyr::filter(study_motivation == "Basic research") %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:10) %>%
dplyr::mutate(compound_name = fct_reorder(compound_name, n)) %>%
ggplot(aes(x = compound_name, y = n)) + geom_col(width = 0.1, colour = NA, fill = "#3C82C4") +
geom_point(size = 3, colour = "#3C82C4", fill = "#3C82C4") + geom_text(aes(label = n),
hjust = -0.6, size = 3.5, color = "black") + coord_flip() + theme_classic() +
labs(title = "Basic research", x = "", y = "Total use in the database") + theme(plot.title = element_text(size = 11))
Here are the resulting figures
top_10_comp_all_fig
top_10_comp_env_fig
top_10_comp_med_fig
top_10_comp_base_fig
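The four plots above share the same steps and differ only in the motivation and the colour. As a minimal sketch (the helper plot_top_compounds and the toy data are hypothetical, not part of the original script), the repeated code could be wrapped in a function like this:

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical helper (not in the original script): one top-n lollipop plot
# for a given study motivation and colour
plot_top_compounds <- function(data, motivation, fill_colour, n_top = 10) {
  data %>%
    dplyr::filter(study_motivation == motivation) %>%
    dplyr::arrange(desc(n)) %>%
    dplyr::slice(1:n_top) %>%
    dplyr::mutate(compound_name = forcats::fct_reorder(compound_name, n)) %>%
    ggplot(aes(x = compound_name, y = n)) +
    geom_col(width = 0.1, colour = NA, fill = fill_colour) +
    geom_point(size = 3, colour = fill_colour) +
    geom_text(aes(label = n), hjust = -0.6, size = 3.5, colour = "black") +
    coord_flip() +
    theme_classic() +
    labs(title = motivation, x = "", y = "Total use in the database") +
    theme(plot.title = element_text(size = 11))
}

# Toy data standing in for compound_use_motivation (values are made up)
toy_use <- data.frame(
  study_motivation = c("All", "All", "Medical", "Medical"),
  compound_name = c("Compound A", "Compound B", "Compound A", "Compound B"),
  n = c(12, 7, 5, 2)
)

toy_fig <- plot_top_compounds(toy_use, "Medical", "#D359A1")
```

With the real data, the same call would take compound_use_motivation as its first argument.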
Saving as PDFs
# setwd(figures_path)
# ggsave('comp_top_10_comp_all_fig.pdf', plot = top_10_comp_all_fig, width = 5, height = 10)
# ggsave('comp_top_10_comp_env_fig.pdf', plot = top_10_comp_env_fig, width = 5, height = 10)
# ggsave('comp_top_10_comp_med_fig.pdf', plot = top_10_comp_med_fig, width = 5, height = 10)
# ggsave('comp_top_10_comp_base_fig.pdf', plot = top_10_comp_base_fig, width = 5, height = 10)
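It was recorded whether the animals were also exposed to compound mixtures. Here one row is sampled per article, and mixture_percent is the number of articles that used mixtures relative to the number that did not (mixture_yes/mixture_no), not relative to all articles.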
EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::reframe(mixture_yes = sum(compond_mixture == "Yes", na.rm = TRUE), mixture_no = sum(compond_mixture ==
"No", na.rm = TRUE), mixture_percent = (mixture_yes/mixture_no) * 100) %>%
gt()
mixture_yes | mixture_no | mixture_percent |
---|---|---|
165 | 736 | 22.41848 |
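To make clear what mixture_percent represents, here is a small worked example using the two counts from the table. The first value is what the code returns (yes relative to no); the second is an alternative, the share of all articles with a Yes/No answer, which I have added only for comparison.

```r
# Counts taken from the table above
mixture_yes <- 165
mixture_no <- 736

# What the code calculates: articles with mixtures relative to articles without
ratio_yes_to_no <- mixture_yes / mixture_no * 100

# Alternative for comparison: articles with mixtures as a share of all articles
share_of_total <- mixture_yes / (mixture_yes + mixture_no) * 100

round(c(ratio_yes_to_no = ratio_yes_to_no, share_of_total = share_of_total), 1)
```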
Medical articles have a much higher use of mixtures
EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(study_motivation) %>%
dplyr::reframe(mixture_yes = sum(compond_mixture == "Yes", na.rm = TRUE), mixture_no = sum(compond_mixture ==
"No", na.rm = TRUE), mixture_percent = round((mixture_yes/mixture_no) * 100,
1)) %>%
gt()
study_motivation | mixture_yes | mixture_no | mixture_percent |
---|---|---|---|
Environmental | 57 | 453 | 12.6 |
Medical | 76 | 157 | 48.4 |
Basic research | 32 | 126 | 25.4 |
Data on the method of exposure was also extracted
nrow <- EIPAAB_database %>%
nrow(.)
EIPAAB_database %>%
dplyr::group_by(compound_expose_route) %>%
dplyr::reframe(n = n(), percent = round((n/nrow) * 100, 1)) %>%
gt()
compound_expose_route | n | percent |
---|---|---|
Other exposure route | 223 | 12.8 |
Waterborne only | 1500 | 86.3 |
Waterborne plus any other route | 16 | 0.9 |
The database has both the minimum and maximum duration of exposure prior to behavioural measure (compound_min_duration_exposure and compound_max_duration_exposure). Here we will focus on the maximum duration
These are the different categories of exposure length
EIPAAB_database %>%
dplyr::distinct(compound_max_duration_exposure) %>%
gt()
compound_max_duration_exposure |
---|
Less than 6 hours |
1 to 3 months |
3 to 8 days |
22 to 29 days |
Multigenerational |
6 to 24 hours |
1 to 3 days |
8 to 15 days |
Not stated |
15 to 22 days |
Transgenerational |
3 to 6 months |
Lifetime |
Some articles did not report the exposure duration at all, or in sufficient detail to extract.
In total this occurred in 108 cases
EIPAAB_database %>%
dplyr::filter(compound_min_duration_exposure == "Not stated" | compound_max_duration_exposure ==
"Not stated") %>%
nrow(.)
## [1] 108
Summary table
exposure_duration_order <- c("Less than 6 hours", "6 to 24 hours", "1 to 3 days",
"3 to 8 days", "8 to 15 days", "15 to 22 days", "22 to 29 days", "1 to 3 months",
"3 to 6 months", "Lifetime", "Transgenerational", "Multigenerational")
nrow <- EIPAAB_database %>%
dplyr::filter(compound_max_duration_exposure != "Not stated") %>%
nrow(.)
exposure_duration_summary <- EIPAAB_database %>%
dplyr::filter(compound_max_duration_exposure != "Not stated") %>%
dplyr::group_by(compound_max_duration_exposure) %>%
dplyr::reframe(n = n(), percent = round((n/nrow) * 100, 1)) %>%
dplyr::mutate(compound_max_duration_exposure = factor(compound_max_duration_exposure,
levels = exposure_duration_order)) %>%
dplyr::arrange(compound_max_duration_exposure) %>%
dplyr::mutate(study_motivation = "All")
exposure_duration_summary %>%
gt()
compound_max_duration_exposure | n | percent | study_motivation |
---|---|---|---|
Less than 6 hours | 679 | 41.3 | All |
6 to 24 hours | 129 | 7.9 | All |
1 to 3 days | 106 | 6.5 | All |
3 to 8 days | 317 | 19.3 | All |
8 to 15 days | 101 | 6.1 | All |
15 to 22 days | 113 | 6.9 | All |
22 to 29 days | 59 | 3.6 | All |
1 to 3 months | 83 | 5.1 | All |
3 to 6 months | 23 | 1.4 | All |
Lifetime | 9 | 0.5 | All |
Transgenerational | 15 | 0.9 | All |
Multigenerational | 9 | 0.5 | All |
By motivation
exposure_duration_order <- c("Less than 6 hours", "6 to 24 hours", "1 to 3 days",
"3 to 8 days", "8 to 15 days", "15 to 22 days", "22 to 29 days", "1 to 3 months",
"3 to 6 months", "Lifetime", "Transgenerational", "Multigenerational")
exposure_duration_motivation_summary <- EIPAAB_database %>%
dplyr::filter(compound_max_duration_exposure != "Not stated") %>%
dplyr::group_by(compound_max_duration_exposure, study_motivation) %>%
dplyr::summarise(n = n(), .groups = "drop") %>%
tidyr::complete(compound_max_duration_exposure, study_motivation, fill = list(n = 0)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round((n/total_motivation) * 100, 1)) %>%
dplyr::select(-total_motivation)
exp_duration_motivation_summary <- exposure_duration_motivation_summary %>%
rbind(exposure_duration_summary) %>%
dplyr::mutate(compound_max_duration_exposure = factor(compound_max_duration_exposure,
levels = rev(exposure_duration_order))) %>%
dplyr::arrange(desc(compound_max_duration_exposure))
Making plots for each study motivation
exp_duration_all_fig <- exp_duration_motivation_summary %>%
dplyr::filter(study_motivation == "All") %>%
ggplot(aes(x = compound_max_duration_exposure, y = percent)) + geom_col(width = 0.1,
colour = NA, fill = "grey") + geom_point(size = 3, colour = "grey", fill = "grey") +
geom_text(aes(label = percent), vjust = -0.6, size = 3.5, color = "black") +
theme_classic() + coord_flip() + labs(title = "All", x = "", y = "Total percentage") +
theme(plot.title = element_text(size = 11))
exp_duration_env_fig <- exp_duration_motivation_summary %>%
dplyr::filter(study_motivation == "Environmental") %>%
ggplot(aes(x = compound_max_duration_exposure, y = percent)) + geom_col(width = 0.1,
colour = NA, fill = "#60BD6C") + geom_point(size = 3, colour = "#60BD6C", fill = "#60BD6C") +
geom_text(aes(label = percent), vjust = -0.6, size = 3.5, color = "black") +
theme_classic() + coord_flip() + labs(title = "Environmental", x = "", y = "Total percentage") +
theme(plot.title = element_text(size = 11))
exp_duration_med_fig <- exp_duration_motivation_summary %>%
dplyr::filter(study_motivation == "Medical") %>%
ggplot(aes(x = compound_max_duration_exposure, y = percent)) + geom_col(width = 0.1,
colour = NA, fill = "#D359A1") + geom_point(size = 3, colour = "#D359A1", fill = "#D359A1") +
geom_text(aes(label = percent), vjust = -0.6, size = 3.5, color = "black") +
theme_classic() + coord_flip() + labs(title = "Medical", x = "", y = "Total percentage") +
theme(plot.title = element_text(size = 11))
exp_duration_base_fig <- exp_duration_motivation_summary %>%
dplyr::filter(study_motivation == "Basic research") %>%
ggplot(aes(x = compound_max_duration_exposure, y = percent)) + geom_col(width = 0.1,
colour = NA, fill = "#3C82C4") + geom_point(size = 3, colour = "#3C82C4", fill = "#3C82C4") +
geom_text(aes(label = percent), vjust = -0.6, size = 3.5, color = "black") +
theme_classic() + coord_flip() + labs(title = "Basic research", x = "", y = "Total percentage") +
theme(plot.title = element_text(size = 11))
exp_duration_all_fig
exp_duration_env_fig
exp_duration_med_fig
exp_duration_base_fig
Save plots
# setwd(figures_path)
# ggsave('comp_exp_duration_all_fig.pdf', plot = exp_duration_all_fig, width = 8.3/3, height = 11.7/3)
# ggsave('comp_exp_duration_env_fig.pdf', plot = exp_duration_env_fig, width = 8.3/3, height = 11.7/3)
# ggsave('comp_exp_duration_med_fig.pdf', plot = exp_duration_med_fig, width = 8.3/3, height = 11.7/3)
# ggsave('comp_exp_duration_base_fig.pdf', plot = exp_duration_base_fig, width = 8.3/3, height = 11.7/3)
Here I will look at the number of doses used. This was measured as the total number of treatments (i.e. including the control), so to get the number of doses for the compound we need to subtract 1. I have done this below.
n_doses_summary <- EIPAAB_database %>%
dplyr::filter(!is.na(compound_treatment_levels)) %>%
dplyr::mutate(n_doses = compound_treatment_levels - 1) %>%
dplyr::group_by(n_doses) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(total = sum(n), percent = round((n/total) * 100, 1), study_motivation = "All")
n_doses_summary %>%
gt()
n_doses | n | total | percent | study_motivation |
---|---|---|---|---|
1 | 449 | 1513 | 29.7 | All |
2 | 203 | 1513 | 13.4 | All |
3 | 377 | 1513 | 24.9 | All |
4 | 172 | 1513 | 11.4 | All |
5 | 163 | 1513 | 10.8 | All |
6 | 46 | 1513 | 3.0 | All |
7 | 50 | 1513 | 3.3 | All |
8 | 14 | 1513 | 0.9 | All |
9 | 11 | 1513 | 0.7 | All |
10 | 10 | 1513 | 0.7 | All |
11 | 10 | 1513 | 0.7 | All |
12 | 5 | 1513 | 0.3 | All |
13 | 2 | 1513 | 0.1 | All |
17 | 1 | 1513 | 0.1 | All |
Let’s see how many use more than 5 doses
n_doses_summary %>%
dplyr::filter(n_doses > 5) %>%
dplyr::summarise(over_5_percent = sum(percent)) %>%
gt()
over_5_percent |
---|
9.8 |
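The value above is the sum of percentages that were already rounded. As a quick check, here is the same figure calculated directly from the counts in the n_doses_summary table (rows with 6 or more doses):

```r
# Counts for 6 or more doses, copied from the n_doses_summary table above
n_over_5 <- c(46, 50, 14, 11, 10, 10, 5, 2, 1)
total <- 1513

over_5_percent <- round(sum(n_over_5) / total * 100, 1)
over_5_percent
```

Both approaches agree here, but with other data summing rounded values can drift slightly from the value calculated on the raw counts.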
Looking by study motivation
n_doses_motivation_summary <- EIPAAB_database %>%
dplyr::filter(!is.na(compound_treatment_levels)) %>%
dplyr::mutate(n_doses = compound_treatment_levels - 1) %>%
dplyr::group_by(n_doses, study_motivation) %>%
dplyr::summarise(n = n(), .groups = "drop") %>%
tidyr::complete(n_doses, study_motivation, fill = list(n = 0)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round((n/total) * 100, 1))
Making a plot
dose_order <- c(1:12, ">12")
doses_fig <- n_doses_motivation_summary %>%
dplyr::mutate(n_doses = as.character(if_else(n_doses>13, 13, n_doses)),
n_doses = if_else(n_doses == "13", ">12", n_doses),
n_doses = factor(n_doses, levels = dose_order)
)%>%
ggplot(aes(x=n_doses, y=percent, colour = study_motivation,
fill = study_motivation, group = study_motivation)) +
geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_line(linewidth = 1) +
geom_text(aes(label = percent), vjust=-0.6, size=3.5, color="black") +
scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
scale_fill_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() +
facet_wrap(~study_motivation) +
theme(
legend.position = c(0.8, 0.9), # Positioning the legend towards the top-right of the plot area
legend.justification = c(0, 1) # Anchoring the legend box by its top-left corner
) +
labs(
x = "",
y = "Percentage of data"
) +
theme()
doses_fig
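In the plot code, both 13 and 17 doses are relabelled as ">12", so each motivation ends up with more than one row in that bin. If a single value per bin is preferred, the counts could be summed before calculating the percentages. This is only a sketch on made-up data (toy_doses and dose_bin are hypothetical names), not what was done for the figure:

```r
library(dplyr)

# Toy summary standing in for n_doses_motivation_summary (values are made up)
toy_doses <- data.frame(
  n_doses = c(1, 12, 13, 17),
  study_motivation = "Environmental",
  n = c(50, 3, 2, 1)
)

toy_binned <- toy_doses %>%
  # Collapse everything above 12 doses into one bin
  dplyr::mutate(dose_bin = if_else(n_doses > 12, ">12", as.character(n_doses)),
                dose_bin = factor(dose_bin, levels = c(1:12, ">12"))) %>%
  # Sum the counts within each bin, then recalculate the percentages
  dplyr::group_by(study_motivation, dose_bin) %>%
  dplyr::summarise(n = sum(n), .groups = "drop") %>%
  dplyr::group_by(study_motivation) %>%
  dplyr::mutate(percent = round(n / sum(n) * 100, 1)) %>%
  dplyr::ungroup()

toy_binned
```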
Save the plot
# setwd(figures_path)
# ggsave('comp_doses_fig.pdf', plot = doses_fig, width = 8.3, height = 11.7/3)
Here I will have a look at the minimum, maximum, and range of doses used in the database. For the MS, I am including only studies that reported doses as a mass per water volume, so we can compare standardised units (ug/L). This was the most common reporting method (62% of all data; 1076 rows, see the table below).
nrow <- EIPAAB_database %>%
nrow()
EIPAAB_database %>%
dplyr::group_by(compound_min_dose_unit_std) %>%
dplyr::reframe(n = n(), prop = n/nrow) %>%
dplyr::arrange(desc(n)) %>%
gt()
compound_min_dose_unit_std | n | prop |
---|---|---|
ug/L | 1076 | 0.618746406 |
uM | 397 | 0.228292122 |
NA | 228 | 0.131109833 |
uM/L | 25 | 0.014376078 |
ppm | 9 | 0.005175388 |
uL/L | 2 | 0.001150086 |
ug/g | 2 | 0.001150086 |
First, adding a range variable to the database (maximum dose minus minimum dose)
EIPAAB_database <- EIPAAB_database %>%
dplyr::mutate(range = compound_max_dose_std - compound_min_dose_std)
Dose summary by motivation
EIPAAB_database %>%
dplyr::filter(compound_min_dose_unit_std == "ug/L", compound_max_dose_unit_std ==
"ug/L") %>%
dplyr::mutate(range = compound_max_dose_std - compound_min_dose_std) %>%
dplyr::group_by(study_motivation) %>%
dplyr::reframe(median_min = median(compound_min_dose_std, na.rm = T), sd_min = sd(compound_min_dose_std,
na.rm = T), min_min = min(compound_min_dose_std, na.rm = T), max_min = max(compound_min_dose_std,
na.rm = T), median_max = median(compound_max_dose_std, na.rm = T), sd_max = sd(compound_max_dose_std,
na.rm = T), min_max = min(compound_max_dose_std, na.rm = T), max_max = max(compound_max_dose_std,
na.rm = T), median_range = median(range, na.rm = T), sd_range = sd(range,
na.rm = T), min_range = min(range, na.rm = T), max_range = max(range, na.rm = T)) %>%
gt()
study_motivation | median_min | sd_min | min_min | max_min | median_max | sd_max | min_max | max_max | median_range | sd_range | min_range | max_range |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Environmental | 0.995 | 25255.8 | 3.13e-06 | 5.0e+05 | 100 | 140651.2 | 0.001 | 1600000 | 51.69 | 135347.17 | -1e-05 | 1579600 |
Medical | 1000.000 | 74500.7 | 5.00e-02 | 5.4e+05 | 5000 | 206906.9 | 0.050 | 1943000 | 0.00 | 196100.58 | 0e+00 | 1942030 |
Basic research | 3000.000 | 5301683.4 | 1.00e-02 | 6.0e+07 | 10000 | 5299563.0 | 0.010 | 60000000 | 72.00 | 84083.83 | 0e+00 | 600000 |
A plot for minimum doses. It is on a log axis because the distribution is highly skewed.
motivation_colour_theme <- c("#60BD6C", "#D359A1", "#3C82C4")
min_conc_fig <- EIPAAB_database %>%
dplyr::filter(compound_min_dose_unit_std == "ug/L") %>%
ggplot(aes(x = log(compound_min_dose_std), fill = study_motivation, colour = study_motivation)) +
stat_slab(alpha = 0.6, linewidth = 1.5, colour = NA) + stat_pointinterval(point_interval = "median_qi",
position = position_dodge(width = 0.4, preserve = "single"), .width = c(0.89,
0.95)) + scale_fill_manual(values = motivation_colour_theme, name = "Study motivation",
guide = "none") + scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() + labs(x = "Natural log minimum dose (ug/L)", y = "Density") + theme(legend.position = "bottom")
min_conc_fig
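A note on the axis: log() in R is the natural logarithm, so the values on the x-axis are natural-log doses, and exp() is the matching back-transformation. The short example below (the doses are made up) shows the difference from log10():

```r
doses <- c(0.001, 1, 1000)  # example concentrations in ug/L (made up)

log(doses)    # natural log, as used in the plotting code
log10(doses)  # base-10 log

# The back-transformation must match the base that was used
back_natural <- exp(log(doses))
back_base10 <- 10^log10(doses)
```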
Save the figure
# setwd(figures_path)
# ggsave('comp_min_conc_fig.pdf', plot = min_conc_fig, width = 5, height = 6)
A summary table so we can see what the corresponding raw values are in the plot
min_conc_summary <- EIPAAB_database %>%
dplyr::filter(compound_min_dose_unit_std == "ug/L") %>%
dplyr::group_by(study_motivation) %>%
# Quantile probabilities match the 89% and 95% intervals drawn in the plot
dplyr::summarise(median = median(log(compound_min_dose_std), na.rm = TRUE), lower_89 = quantile(log(compound_min_dose_std),
probs = 0.055, na.rm = TRUE), upper_89 = quantile(log(compound_min_dose_std),
probs = 0.945, na.rm = TRUE), lower_95 = quantile(log(compound_min_dose_std),
probs = 0.025, na.rm = TRUE), upper_95 = quantile(log(compound_min_dose_std),
probs = 0.975, na.rm = TRUE), .groups = "drop") %>%
# Transform to a format suitable for ggplot annotation
pivot_longer(cols = -study_motivation, names_to = "stat", values_to = "value") %>%
dplyr::mutate(vaule_raw = format(exp(value), scientific = FALSE))
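The point intervals in the plot come from median_qi, which for a central interval of width w uses the quantiles at (1 - w)/2 and 1 - (1 - w)/2. A minimal base R sketch (the helper interval_probs is hypothetical, not part of the script) shows which probabilities correspond to the 89% and 95% intervals:

```r
# Lower and upper quantile probabilities for a central interval of a given width
interval_probs <- function(width) {
  c(lower = (1 - width) / 2, upper = 1 - (1 - width) / 2)
}

interval_probs(0.89)
interval_probs(0.95)

# Example with made-up log-scale values
set.seed(1)
x <- rnorm(1000)
quantile(x, probs = interval_probs(0.89))
```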
A plot for maximum doses. It is on a log axis because the distribution is highly skewed.
motivation_colour_theme <- c("#60BD6C", "#D359A1", "#3C82C4")
max_conc_fig <- EIPAAB_database %>%
dplyr::filter(compound_max_dose_unit_std == "ug/L") %>%
ggplot(aes(x = log(compound_max_dose_std), fill = study_motivation, colour = study_motivation)) +
stat_slab(alpha = 0.6, linewidth = 1.5, colour = NA) + stat_pointinterval(point_interval = "median_qi",
position = position_dodge(width = 0.4, preserve = "single"), .width = c(0.89,
0.95)) + scale_fill_manual(values = motivation_colour_theme, name = "Study motivation",
guide = "none") + scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() + labs(x = "Natural log maximum dose (ug/L)", y = "Density") + theme(legend.position = "bottom")
max_conc_fig
Save the figure
# setwd(figures_path)
# ggsave('comp_max_conc_fig.pdf', plot = max_conc_fig, width = 5, height = 6)
A summary table so we can see what the corresponding raw values are in the plot
max_conc_summary <- EIPAAB_database %>%
dplyr::filter(compound_max_dose_unit_std == "ug/L") %>%
dplyr::group_by(study_motivation) %>%
# Quantile probabilities match the 89% and 95% intervals drawn in the plot
dplyr::summarise(median = median(log(compound_max_dose_std), na.rm = TRUE), lower_89 = quantile(log(compound_max_dose_std),
probs = 0.055, na.rm = TRUE), upper_89 = quantile(log(compound_max_dose_std),
probs = 0.945, na.rm = TRUE), lower_95 = quantile(log(compound_max_dose_std),
probs = 0.025, na.rm = TRUE), upper_95 = quantile(log(compound_max_dose_std),
probs = 0.975, na.rm = TRUE), .groups = "drop") %>%
# Transform to a format suitable for ggplot annotation
pivot_longer(cols = -study_motivation, names_to = "stat", values_to = "value") %>%
dplyr::mutate(vaule_raw = format(exp(value), scientific = FALSE))
A plot for the range of doses. It is on a log axis because the distribution is highly skewed. This includes only studies that had more than one dose and reported concentration in a mass to volume metric.
motivation_colour_theme <- c("#60BD6C", "#D359A1", "#3C82C4")
range_conc_fig <- EIPAAB_database %>%
dplyr::filter(compound_min_dose_unit_std == "ug/L") %>%
dplyr::filter(range > 0) %>%
ggplot(aes(x = log(range), fill = study_motivation, colour = study_motivation)) +
stat_slab(alpha = 0.6, linewidth = 1.5, colour = NA) + stat_pointinterval(point_interval = "median_qi",
position = position_dodge(width = 0.4, preserve = "single"), .width = c(0.89,
0.95)) + scale_fill_manual(values = motivation_colour_theme, name = "Study motivation",
guide = "none") + scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() + labs(x = "Natural log range (ug/L)", y = "Density") + theme(legend.position = "bottom")
range_conc_fig
Save the figure
# setwd(figures_path)
# ggsave('comp_range_conc_fig.pdf', plot = range_conc_fig, width = 5, height = 6)
A summary table so we can see what the corresponding raw values are in the plot
range_conc_summary <- EIPAAB_database %>%
dplyr::filter(compound_min_dose_unit_std == "ug/L", range > 0) %>%
dplyr::group_by(study_motivation) %>%
summarise(median = median(log(range), na.rm = TRUE), lower_89 = quantile(log(range),
probs = 0.055, na.rm = TRUE), upper_89 = quantile(log(range), probs = 0.945,
na.rm = TRUE), lower_95 = quantile(log(range), probs = 0.025, na.rm = TRUE),
upper_95 = quantile(log(range), probs = 0.975, na.rm = TRUE)) %>%
# Transform to a format suitable for ggplot annotation
pivot_longer(cols = -study_motivation, names_to = "stat", values_to = "value") %>%
dplyr::mutate(vaule_raw = format(exp(value), scientific = FALSE))
Range of minimum doses used in environmentally motivated studies
env_min_conc_fig <- EIPAAB_database %>%
dplyr::filter(study_motivation == "Environmental") %>%
dplyr::filter(compound_min_dose_unit_std == "ug/L") %>%
ggplot(aes(x = log(compound_min_dose_std))) + stat_slab(alpha = 0.8, linewidth = 1.5) +
stat_pointinterval(point_interval = "median_qi", position = position_dodge(width = 0.4,
preserve = "single"), .width = c(0.89, 0.95)) + theme_classic() + theme(legend.position = "none")
env_min_conc_fig
A summary table so we can see what the corresponding raw values are in the plot
env_conc_summary <- EIPAAB_database %>%
filter(study_motivation == "Environmental", compound_min_dose_unit_std == "ug/L") %>%
summarise(median = median(log(compound_min_dose_std), na.rm = TRUE), lower_89 = quantile(log(compound_min_dose_std),
probs = 0.055, na.rm = TRUE), upper_89 = quantile(log(compound_min_dose_std),
probs = 0.945, na.rm = TRUE), lower_95 = quantile(log(compound_min_dose_std),
probs = 0.025, na.rm = TRUE), upper_95 = quantile(log(compound_min_dose_std),
probs = 0.975, na.rm = TRUE)) %>%
# Transform to a format suitable for ggplot annotation
pivot_longer(cols = everything(), names_to = "stat", values_to = "value") %>%
dplyr::mutate(vaule_raw = format(exp(value), scientific = FALSE))
env_conc_summary %>%
gt()
stat | value | vaule_raw |
---|---|---|
median | -0.005025168 | 0.9949874371 |
lower_89 | -6.907755279 | 0.0010000000 |
upper_89 | 7.793933105 | 2425.8399279662 |
lower_95 | -7.489709151 | 0.0005588055 |
upper_95 | 9.674172499 | 15901.5600424659 |
Where the exposure itself was conducted
EIPAAB_database %>%
dplyr::group_by(compound_exposure_location) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(total = sum(n), percent = round(n/total * 100, 1)) %>%
gt()
compound_exposure_location | n | total | percent |
---|---|---|---|
Indoor laboratory setting or assumed indoors | 1729 | 1739 | 99.4 |
Outdoor natural setting | 4 | 1739 | 0.2 |
Outdoor restricted setting (cannot interact with wild species) | 6 | 1739 | 0.3 |
First I will make a new variable called behav_category_n, which counts how many of our 10 broad behavioural categories were measured in the article.
The 10 over-arching categories were: (1) movement and locomotion, (2) pre-mating and mating behaviour, (3) post-mating behaviour, (4) aggression, (5) sociality, (6) cognition and learning, (7) anxiety and boldness, (8) foraging and feeding, (9) antipredator behaviour, and (10) other behaviours not categorised
This takes the sum of all the behaviour category indicators, so it can range from 1 (a single behavioural category) to 10 (all categories).
EIPAAB_database <- EIPAAB_database %>%
dplyr::mutate(behav_category_n = rowSums(across(starts_with("behav_") & ends_with("_boolean"))))
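To show what this row-wise sum does, here is a minimal sketch on a made-up data frame (toy_behav and its values are hypothetical). Only columns that start with behav_ and end with _boolean are counted:

```r
library(dplyr)

# Toy data: two boolean behaviour columns plus one behav_ column that is not boolean
toy_behav <- data.frame(
  article_id = c("a", "b", "c"),
  behav_movement_boolean = c(1, 0, 1),
  behav_boldness_boolean = c(0, 1, 1),
  behav_test_location = c("Lab", "Lab", "Field")  # ignored: does not end in _boolean
)

toy_behav <- toy_behav %>%
  dplyr::mutate(behav_category_n = rowSums(across(starts_with("behav_") & ends_with("_boolean"))))

toy_behav$behav_category_n
```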
The majority of evidence seems to be based on a single behavioural category
EIPAAB_database %>%
dplyr::group_by(behav_category_n) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(percent = round(n/sum(n) * 100, 1)) %>%
gt()
behav_category_n | n | percent |
---|---|---|
1 | 1205 | 69.3 |
2 | 400 | 23.0 |
3 | 115 | 6.6 |
4 | 16 | 0.9 |
5 | 2 | 0.1 |
7 | 1 | 0.1 |
Is this the same by study motivation?
EIPAAB_database %>%
dplyr::group_by(behav_category_n, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
gt()
behav_category_n | study_motivation | n | total_motivation | percent |
---|---|---|---|---|
1 | Environmental | 593 | 858 | 69.1 |
1 | Medical | 378 | 518 | 73.0 |
1 | Basic research | 234 | 363 | 64.5 |
2 | Environmental | 180 | 858 | 21.0 |
2 | Medical | 110 | 518 | 21.2 |
2 | Basic research | 110 | 363 | 30.3 |
3 | Environmental | 75 | 858 | 8.7 |
3 | Medical | 25 | 518 | 4.8 |
3 | Basic research | 15 | 363 | 4.1 |
4 | Environmental | 8 | 858 | 0.9 |
4 | Medical | 4 | 518 | 0.8 |
4 | Basic research | 4 | 363 | 1.1 |
5 | Environmental | 2 | 858 | 0.2 |
7 | Medical | 1 | 518 | 0.2 |
Here I make a data frame where I have pivoted the data to long format based on each of the 10 behaviour categories. This data frame can be used to ask more specific questions about the relationship between species, compound, and behaviour.
But first let’s use it to see what behaviours are most common overall, and within each study motivation.
I have printed the first 6 rows as an example of what this looks like
binary_behav <- EIPAAB_database %>%
dplyr::select((starts_with("behav_") & ends_with("_boolean"))) %>%
colnames()
PICO_long <- EIPAAB_database %>%
tidyr::pivot_longer(., cols = all_of(binary_behav), names_to = "behav_category",
values_to = "value") %>%
dplyr::select(article_id, study_motivation, species_name, species_class, compound_name,
compound_atc_level_3, behav_category, value) %>%
dplyr::mutate(behav_category = behav_category %>%
str_remove("behav_") %>%
str_remove("_boolean"))
PICO_long %>%
head() %>%
gt()
article_id | study_motivation | species_name | species_class | compound_name | compound_atc_level_3 | behav_category | value |
---|---|---|---|---|---|---|---|
236660465 | Environmental | Danio rerio | Actinopterygii | Buspirone | n05b anxiolytics | movement | 0 |
236660465 | Environmental | Danio rerio | Actinopterygii | Buspirone | n05b anxiolytics | boldness | 1 |
236660465 | Environmental | Danio rerio | Actinopterygii | Buspirone | n05b anxiolytics | foraging | 0 |
236660465 | Environmental | Danio rerio | Actinopterygii | Buspirone | n05b anxiolytics | antipredator | 0 |
236660465 | Environmental | Danio rerio | Actinopterygii | Buspirone | n05b anxiolytics | mating | 0 |
236660465 | Environmental | Danio rerio | Actinopterygii | Buspirone | n05b anxiolytics | post_mating | 0 |
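If the pivot step is unfamiliar, here is a minimal sketch of the same idea on a made-up wide data frame (toy_wide is hypothetical). Each boolean column becomes one row per article, and the prefix and suffix are stripped from the category name:

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy wide data: one row per article, one column per behaviour category
toy_wide <- data.frame(
  article_id = c("a", "b"),
  behav_movement_boolean = c(1, 0),
  behav_boldness_boolean = c(0, 1)
)

toy_long <- toy_wide %>%
  tidyr::pivot_longer(cols = starts_with("behav_"), names_to = "behav_category",
    values_to = "value") %>%
  dplyr::mutate(behav_category = behav_category %>%
    str_remove("behav_") %>%
    str_remove("_boolean"))

toy_long
```

The long version has one row for every article by category combination, so summing value within behav_category gives the number of times each behaviour was measured.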
Overall use of behaviour
behav_overall <- PICO_long %>%
dplyr::group_by(behav_category) %>%
reframe(n = sum(value)) %>%
dplyr::mutate(percent = round(n/sum(n) * 100, 1)) %>%
dplyr::arrange(desc(n))
behav_overall %>%
gt()
behav_category | n | percent |
---|---|---|
movement | 983 | 40.4 |
boldness | 567 | 23.3 |
foraging | 190 | 7.8 |
agression | 145 | 6.0 |
sociality | 143 | 5.9 |
mating | 122 | 5.0 |
noncat | 96 | 3.9 |
cognition | 90 | 3.7 |
antipredator | 85 | 3.5 |
post_mating | 10 | 0.4 |
By study motivation
behav_motivation <- PICO_long %>%
dplyr::group_by(behav_category, study_motivation) %>%
dplyr::summarise(n = sum(value), .groups = "drop") %>%
tidyr::complete(behav_category, study_motivation, fill = list(n = 0)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation) %>%
dplyr::arrange(desc(study_motivation))
behav_overall <- behav_motivation %>%
dplyr::group_by(behav_category) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(percent = round(n/sum(n) * 100, 1), study_motivation = "Overall")
behav_motivation <- rbind(behav_overall, behav_motivation)
motivation_colour_theme <- c("grey", "#60BD6C", "#D359A1", "#3C82C4")
behav_order <- c("movement", "boldness", "foraging", "antipredator", "mating", "post_mating",
"agression", "sociality", "cognition", "noncat")
study_motivation_order <- c("Overall", "Environmental", "Medical", "Basic research")
behav_motivation_fig <- behav_motivation %>%
dplyr::mutate(behav_category = factor(behav_category, levels = rev(behav_order)),
study_motivation = factor(study_motivation, levels = study_motivation_order)) %>%
ggplot(aes(x = behav_category, y = percent, colour = study_motivation, fill = study_motivation,
group = study_motivation)) + geom_col(width = 0.1, colour = NA) + geom_point(size = 3) +
geom_text(aes(label = percent), vjust = -0.3, size = 3.5, color = "black") +
theme_classic() + facet_grid(cols = vars(study_motivation)) + scale_colour_manual(values = motivation_colour_theme,
name = "Study motivation") + scale_fill_manual(values = motivation_colour_theme,
name = "Study motivation") + coord_flip() + labs(x = "", y = "Percentage") +
theme(plot.title = element_text(size = 11))
behav_motivation_fig
Save the figure
# setwd(figures_path)
# ggsave('behav_motivation_fig.pdf', plot = behav_motivation_fig, width = 10, height = 5)
behav_select <- EIPAAB_database %>%
dplyr::select((starts_with("behav_") & !ends_with("_boolean") & !ends_with("is_social_context") &
!ends_with("test_location") & !ends_with("category_n"))) %>%
colnames()
behav_sub_cat_long <- EIPAAB_database %>%
tidyr::pivot_longer(., cols = all_of(behav_select), names_to = "parent_category",
values_to = "sub_category") %>%
dplyr::select(article_id, study_motivation, species_name, species_class, compound_name,
compound_atc_level_3, parent_category, sub_category) %>%
dplyr::mutate(parent_category = parent_category %>%
str_remove("behav_")) %>%
tidyr::separate_rows(sub_category, sep = ";") %>%
dplyr::filter(!is.na(sub_category))
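The key step above is separate_rows, which splits cells holding several sub-categories (separated by ";") into one row each. A minimal sketch on made-up data (toy_sub is hypothetical):

```r
library(dplyr)
library(tidyr)

# Toy data: article "a" has two sub-categories in one cell, article "b" has none
toy_sub <- data.frame(
  article_id = c("a", "b"),
  sub_category = c("aggression towards a mirror;locomotor activity within this context", NA)
)

toy_sub_long <- toy_sub %>%
  tidyr::separate_rows(sub_category, sep = ";") %>%
  dplyr::filter(!is.na(sub_category))

toy_sub_long
```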
Now a summary data frame with sub-categories.
I have printed the first 6 rows as an example
behav_sub_cat_summary <- behav_sub_cat_long %>%
dplyr::group_by(study_motivation, parent_category, sub_category) %>%
dplyr::summarise(n_sub_cat = n(), .groups = "drop") %>%
dplyr::group_by(study_motivation, parent_category) %>%
dplyr::mutate(n_parent = sum(n_sub_cat)) %>%
dplyr::ungroup() %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(n_motivation = sum(n_sub_cat)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent_sub_cat = n_sub_cat/n_parent, percent_parent = n_parent/n_motivation)
behav_sub_cat_summary %>%
head() %>%
gt()
study_motivation | parent_category | sub_category | n_sub_cat | n_parent | n_motivation | percent_sub_cat | percent_parent |
---|---|---|---|---|---|---|---|
Environmental | agression | aggression towards a live competitor free to interact | 27 | 70 | 1511 | 0.38571429 | 0.04632694 |
Environmental | agression | aggression towards a live competitor with physical barrier | 10 | 70 | 1511 | 0.14285714 | 0.04632694 |
Environmental | agression | aggression towards a mirror | 20 | 70 | 1511 | 0.28571429 | 0.04632694 |
Environmental | agression | aggression towards a model or video | 7 | 70 | 1511 | 0.10000000 | 0.04632694 |
Environmental | agression | locomotor activity within this context | 6 | 70 | 1511 | 0.08571429 | 0.04632694 |
Environmental | antipredator | locomotor activity within this context | 22 | 100 | 1511 | 0.22000000 | 0.06618134 |
Formatting the data for a ring plot.
ring_plot_subcat <- behav_sub_cat_summary %>%
dplyr::group_by(study_motivation, parent_category) %>%
dplyr::mutate(ymax = cumsum(percent_sub_cat), ymin = lag(ymax, 1), ymin = if_else(is.na(ymin),
0, ymin), labelPosition = (ymax + ymin)/2, label = paste0(sub_category, "\n (n = ",
n_sub_cat, ")")) %>%
dplyr::ungroup()
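The ring segments are defined by cumulative proportions: each segment starts where the previous one ended. Here is a minimal sketch with made-up proportions (toy_ring is hypothetical; I use the default argument of lag as a shortcut for replacing the first NA with 0):

```r
library(dplyr)

toy_ring <- data.frame(
  sub_category = c("x", "y", "z"),
  percent_sub_cat = c(0.5, 0.3, 0.2)
) %>%
  dplyr::mutate(ymax = cumsum(percent_sub_cat),      # end of each segment
    ymin = dplyr::lag(ymax, 1, default = 0),         # start = end of the previous segment
    labelPosition = (ymax + ymin)/2)                 # label sits at the segment midpoint

toy_ring
```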
First making a complete dataset (adding zeros for sub-categories missing in each motivation), and ordering by the overall prevalence of sub-categories.
sub_category_order <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "movement") %>%
dplyr::group_by(sub_category) %>%
dplyr::reframe(n = sum(n_sub_cat)) %>%
dplyr::arrange(n) %>%
dplyr::pull(sub_category)
movement_subcat <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "movement") %>%
dplyr::select(study_motivation, sub_category, percent_sub_cat, n_sub_cat, percent_parent) %>%
tidyr::complete(sub_category, study_motivation, fill = list(percent_sub_cat = 0,
n_sub_cat = 0)) %>%
dplyr::mutate(sub_category = factor(sub_category, levels = sub_category_order))
beh_movement_subcat_fig <- movement_subcat %>%
dplyr::mutate(percent_sub_cat = round(percent_sub_cat, 3) * 100) %>%
ggplot(aes(x = sub_category, y = percent_sub_cat, colour = study_motivation,
fill = study_motivation, group = study_motivation)) + geom_col(position = position_dodge(width = 0.8),
width = 0.1) + geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = percent_sub_cat), hjust = -0.6, size = 3.5, color = "black",
position = position_dodge(width = 0.8)) + scale_colour_manual(values = motivation_colour_theme,
name = "Study motivation") + scale_fill_manual(values = motivation_colour_theme,
name = "Study motivation") + theme_classic() + coord_flip() + labs(x = "", y = "Percentage of data") +
theme(legend.position = "none")
beh_movement_subcat_fig
Save the figure
# n_subcat <- movement_subcat %>% dplyr::distinct(sub_category) %>% nrow(.)/10
# setwd(figures_path)
# ggsave('beh_movement_subcat_fig.pdf', plot = beh_movement_subcat_fig, width = 10, height = 11.7*n_subcat)
If you would like to make a doughnut chart, here’s the code. However, for categories that have 5 or more sub-categories, like movement, I don’t think this is the clearest way to present the data.
movement_subcat %>%
dplyr::arrange(sub_category) %>%
dplyr::arrange(study_motivation) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(ymax = cumsum(percent_sub_cat), ymin = lag(ymax, 1), ymin = if_else(is.na(ymin),
0, ymin), labelPosition = (ymax + ymin)/2, label = if_else(n_sub_cat == 0,
NA, n_sub_cat)) %>%
dplyr::ungroup() %>%
ggplot(aes(ymax = ymax, ymin = ymin, xmax = 4, xmin = 3, fill = sub_category)) +
geom_rect() + coord_polar(theta = "y") + geom_label(x = 4, aes(y = labelPosition,
label = label), size = 3, alpha = 0.8) + facet_wrap(~study_motivation) + xlim(c(2,
5)) + theme_void() + theme(legend.position = "bottom")
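The doughnut chart relies on turning each percentage into an arc with a start (ymin) and an end (ymax) within each motivation. A minimal sketch of just that step is below; the toy values and object names are assumptions for illustration, and lag() with default = 0 is used here as a shorter equivalent of replacing the leading NA afterwards.

```r
library(dplyr)

# Toy percentages per motivation (invented values; each group sums to 1)
toy_slices <- tibble::tibble(
  study_motivation = c("A", "A", "A", "B", "B"),
  sub_category     = c("x", "y", "z", "x", "y"),
  percent_sub_cat  = c(0.5, 0.3, 0.2, 0.6, 0.4)
)

toy_arcs <- toy_slices %>%
  dplyr::group_by(study_motivation) %>%
  dplyr::mutate(
    ymax = cumsum(percent_sub_cat),          # where each arc ends
    ymin = dplyr::lag(ymax, default = 0),    # where it starts (previous end)
    label_position = (ymax + ymin) / 2       # midpoint, used to place the label
  ) %>%
  dplyr::ungroup()

toy_arcs
```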
sub_category_order <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "boldness") %>%
dplyr::group_by(sub_category) %>%
dplyr::reframe(n = sum(n_sub_cat)) %>%
dplyr::arrange(n) %>%
dplyr::pull(sub_category)
boldness_subcat <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "boldness") %>%
dplyr::select(study_motivation, sub_category, percent_sub_cat, n_sub_cat, percent_parent) %>%
tidyr::complete(sub_category, study_motivation, fill = list(percent_sub_cat = 0,
n_sub_cat = 0)) %>%
dplyr::mutate(sub_category = factor(sub_category, levels = sub_category_order))
beh_boldness_subcat_fig <- boldness_subcat %>%
dplyr::mutate(percent_sub_cat = round(percent_sub_cat, 3) * 100) %>%
ggplot(aes(x = sub_category, y = percent_sub_cat, colour = study_motivation,
fill = study_motivation, group = study_motivation)) + geom_col(position = position_dodge(width = 0.8),
width = 0.1) + geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = percent_sub_cat), hjust = -0.6, size = 3.5, color = "black",
position = position_dodge(width = 0.8)) + scale_colour_manual(values = motivation_colour_theme,
name = "Study motivation") + scale_fill_manual(values = motivation_colour_theme,
name = "Study motivation") + theme_classic() + coord_flip() + labs(x = "", y = "Percentage of data") +
theme(legend.position = "none")
beh_boldness_subcat_fig
Save the figure
# n_subcat <- boldness_subcat %>% dplyr::distinct(sub_category) %>% nrow(.)/10
# setwd(figures_path)
# ggsave('beh_boldness_subcat_fig.pdf', plot = beh_boldness_subcat_fig, width = 10, height = 11.7*n_subcat)
sub_category_order <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "foraging") %>%
dplyr::group_by(sub_category) %>%
dplyr::reframe(n = sum(n_sub_cat)) %>%
dplyr::arrange(n) %>%
dplyr::pull(sub_category)
foraging_subcat <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "foraging") %>%
dplyr::select(study_motivation, sub_category, percent_sub_cat, n_sub_cat, percent_parent) %>%
tidyr::complete(sub_category, study_motivation, fill = list(percent_sub_cat = 0,
n_sub_cat = 0)) %>%
dplyr::mutate(sub_category = factor(sub_category, levels = sub_category_order))
beh_foraging_subcat_fig <- foraging_subcat %>%
dplyr::mutate(percent_sub_cat = round(percent_sub_cat, 3) * 100) %>%
ggplot(aes(x = sub_category, y = percent_sub_cat, colour = study_motivation,
fill = study_motivation, group = study_motivation)) + geom_col(position = position_dodge(width = 0.8),
width = 0.1) + geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = percent_sub_cat), hjust = -0.6, size = 3.5, color = "black",
position = position_dodge(width = 0.8)) + scale_colour_manual(values = motivation_colour_theme,
name = "Study motivation") + scale_fill_manual(values = motivation_colour_theme,
name = "Study motivation") + theme_classic() + coord_flip() + labs(x = "", y = "Percentage of data") +
theme(legend.position = "none")
beh_foraging_subcat_fig
Save the figure
# n_subcat <- foraging_subcat %>% dplyr::distinct(sub_category) %>% nrow(.)/10
# setwd(figures_path)
# ggsave('beh_foraging_subcat_fig.pdf', plot = beh_foraging_subcat_fig, width = 10, height = 11.7*n_subcat)
sub_category_order <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "antipredator") %>%
dplyr::group_by(sub_category) %>%
dplyr::reframe(n = sum(n_sub_cat)) %>%
dplyr::arrange(n) %>%
dplyr::pull(sub_category)
antipredator_subcat <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "antipredator") %>%
dplyr::select(study_motivation, sub_category, percent_sub_cat, n_sub_cat, percent_parent) %>%
tidyr::complete(sub_category, study_motivation, fill = list(percent_sub_cat = 0,
n_sub_cat = 0)) %>%
dplyr::mutate(sub_category = factor(sub_category, levels = sub_category_order))
beh_antipredator_subcat_fig <- antipredator_subcat %>%
dplyr::mutate(percent_sub_cat = round(percent_sub_cat, 3) * 100) %>%
ggplot(aes(x = sub_category, y = percent_sub_cat, colour = study_motivation,
fill = study_motivation, group = study_motivation)) + geom_col(position = position_dodge(width = 0.8),
width = 0.1) + geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = percent_sub_cat), hjust = -0.6, size = 3.5, color = "black",
position = position_dodge(width = 0.8)) + scale_colour_manual(values = motivation_colour_theme,
name = "Study motivation") + scale_fill_manual(values = motivation_colour_theme,
name = "Study motivation") + theme_classic() + coord_flip() + labs(x = "", y = "Percentage of data") +
theme(legend.position = "none")
beh_antipredator_subcat_fig
Save figure
# n_subcat <- antipredator_subcat %>% dplyr::distinct(sub_category) %>% nrow(.)/10
# setwd(figures_path)
# ggsave('beh_antipredator_subcat_fig.pdf', plot = beh_antipredator_subcat_fig, width = 10, height = 11.7*n_subcat)
sub_category_order <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "mating") %>%
dplyr::group_by(sub_category) %>%
dplyr::reframe(n = sum(n_sub_cat)) %>%
dplyr::arrange(n) %>%
dplyr::pull(sub_category)
mating_subcat <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "mating") %>%
dplyr::select(study_motivation, sub_category, percent_sub_cat, n_sub_cat, percent_parent) %>%
tidyr::complete(sub_category, study_motivation, fill = list(percent_sub_cat = 0,
n_sub_cat = 0)) %>%
dplyr::mutate(sub_category = factor(sub_category, levels = sub_category_order))
beh_mating_subcat_fig <- mating_subcat %>%
dplyr::mutate(percent_sub_cat = round(percent_sub_cat, 3) * 100) %>%
ggplot(aes(x = sub_category, y = percent_sub_cat, colour = study_motivation,
fill = study_motivation, group = study_motivation)) + geom_col(position = position_dodge(width = 0.8),
width = 0.1) + geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = percent_sub_cat), hjust = -0.6, size = 3.5, color = "black",
position = position_dodge(width = 0.8)) + scale_colour_manual(values = motivation_colour_theme,
name = "Study motivation") + scale_fill_manual(values = motivation_colour_theme,
name = "Study motivation") + theme_classic() + coord_flip() + labs(x = "", y = "Percentage of data") +
theme(legend.position = "none")
beh_mating_subcat_fig
Save the figure
# n_subcat <- mating_subcat %>% dplyr::distinct(sub_category) %>% nrow(.)/10
# setwd(figures_path)
# ggsave('beh_mating_subcat_fig.pdf', plot = beh_mating_subcat_fig, width = 10, height = 11.7*n_subcat)
sub_category_order <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "post_mating") %>%
dplyr::group_by(sub_category) %>%
dplyr::reframe(n = sum(n_sub_cat)) %>%
dplyr::arrange(n) %>%
dplyr::pull(sub_category)
post_mating_subcat <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "post_mating") %>%
dplyr::select(study_motivation, sub_category, percent_sub_cat, n_sub_cat, percent_parent) %>%
tidyr::complete(sub_category, study_motivation, fill = list(percent_sub_cat = 0,
n_sub_cat = 0)) %>%
dplyr::mutate(sub_category = factor(sub_category, levels = sub_category_order))
beh_post_mating_subcat_fig <- post_mating_subcat %>%
dplyr::mutate(percent_sub_cat = round(percent_sub_cat, 3) * 100) %>%
ggplot(aes(x = sub_category, y = percent_sub_cat, colour = study_motivation,
fill = study_motivation, group = study_motivation)) + geom_col(position = position_dodge(width = 0.8),
width = 0.1) + geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = percent_sub_cat), hjust = -0.6, size = 3.5, color = "black",
position = position_dodge(width = 0.8)) + scale_colour_manual(values = motivation_colour_theme,
name = "Study motivation") + scale_fill_manual(values = motivation_colour_theme,
name = "Study motivation") + theme_classic() + coord_flip() + labs(x = "", y = "Percentage of data") +
theme(legend.position = "none")
beh_post_mating_subcat_fig
Save the figure
# n_subcat <- post_mating_subcat %>% dplyr::distinct(sub_category) %>% nrow(.)/10
# setwd(figures_path)
# ggsave('beh_post_mating_subcat_fig.pdf', plot = beh_post_mating_subcat_fig, width = 10, height = 11.7*n_subcat)
sub_category_order <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "agression") %>%
dplyr::group_by(sub_category) %>%
dplyr::reframe(n = sum(n_sub_cat)) %>%
dplyr::arrange(n) %>%
dplyr::pull(sub_category)
agression_subcat <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "agression") %>%
dplyr::select(study_motivation, sub_category, percent_sub_cat, n_sub_cat, percent_parent) %>%
tidyr::complete(sub_category, study_motivation, fill = list(percent_sub_cat = 0,
n_sub_cat = 0)) %>%
dplyr::mutate(sub_category = factor(sub_category, levels = sub_category_order))
beh_agression_subcat_fig <- agression_subcat %>%
dplyr::mutate(percent_sub_cat = round(percent_sub_cat, 3) * 100) %>%
ggplot(aes(x = sub_category, y = percent_sub_cat, colour = study_motivation,
fill = study_motivation, group = study_motivation)) + geom_col(position = position_dodge(width = 0.8),
width = 0.1) + geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = percent_sub_cat), hjust = -0.6, size = 3.5, color = "black",
position = position_dodge(width = 0.8)) + scale_colour_manual(values = motivation_colour_theme,
name = "Study motivation") + scale_fill_manual(values = motivation_colour_theme,
name = "Study motivation") + theme_classic() + coord_flip() + labs(x = "", y = "Percentage of data") +
theme(legend.position = "none")
beh_agression_subcat_fig
Save the figure
# n_subcat <- agression_subcat %>% dplyr::distinct(sub_category) %>% nrow(.)/10
# setwd(figures_path)
# ggsave('beh_agression_subcat_fig.pdf', plot = beh_agression_subcat_fig, width = 10, height = 11.7*n_subcat)
sub_category_order <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "cognition") %>%
dplyr::group_by(sub_category) %>%
dplyr::reframe(n = sum(n_sub_cat)) %>%
dplyr::arrange(n) %>%
dplyr::pull(sub_category)
cognition_subcat <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "cognition") %>%
dplyr::select(study_motivation, sub_category, percent_sub_cat, n_sub_cat, percent_parent) %>%
tidyr::complete(sub_category, study_motivation, fill = list(percent_sub_cat = 0,
n_sub_cat = 0)) %>%
dplyr::mutate(sub_category = factor(sub_category, levels = sub_category_order))
beh_cognition_subcat_fig <- cognition_subcat %>%
dplyr::mutate(percent_sub_cat = round(percent_sub_cat, 3) * 100) %>%
ggplot(aes(x = sub_category, y = percent_sub_cat, colour = study_motivation,
fill = study_motivation, group = study_motivation)) + geom_col(position = position_dodge(width = 0.8),
width = 0.1) + geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = percent_sub_cat), hjust = -0.6, size = 3.5, color = "black",
position = position_dodge(width = 0.8)) + scale_colour_manual(values = motivation_colour_theme,
name = "Study motivation") + scale_fill_manual(values = motivation_colour_theme,
name = "Study motivation") + theme_classic() + coord_flip() + labs(x = "", y = "Percentage of data") +
theme(legend.position = "none")
beh_cognition_subcat_fig
Save the figure
# n_subcat <- cognition_subcat %>% dplyr::distinct(sub_category) %>% nrow(.)/10
# setwd(figures_path)
# ggsave('beh_cognition_subcat_fig.pdf', plot = beh_cognition_subcat_fig, width = 10, height = 11.7*n_subcat)
sub_category_order <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "noncat") %>%
dplyr::group_by(sub_category) %>%
dplyr::reframe(n = sum(n_sub_cat)) %>%
dplyr::arrange(n) %>%
dplyr::pull(sub_category)
noncat_subcat <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "noncat") %>%
dplyr::select(study_motivation, sub_category, percent_sub_cat, n_sub_cat, percent_parent) %>%
tidyr::complete(sub_category, study_motivation, fill = list(percent_sub_cat = 0,
n_sub_cat = 0)) %>%
dplyr::mutate(sub_category = factor(sub_category, levels = sub_category_order))
beh_noncat_subcat_fig <- noncat_subcat %>%
dplyr::mutate(percent_sub_cat = round(percent_sub_cat, 3) * 100) %>%
ggplot(aes(x = sub_category, y = percent_sub_cat, colour = study_motivation,
fill = study_motivation, group = study_motivation)) + geom_col(position = position_dodge(width = 0.8),
width = 0.1) + geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = percent_sub_cat), hjust = -0.6, size = 3.5, color = "black",
position = position_dodge(width = 0.8)) + scale_colour_manual(values = motivation_colour_theme,
name = "Study motivation") + scale_fill_manual(values = motivation_colour_theme,
name = "Study motivation") + theme_classic() + coord_flip() + labs(x = "", y = "Percentage of data") +
theme(legend.position = "none")
beh_noncat_subcat_fig
Save the figure
# n_subcat <- noncat_subcat %>% dplyr::distinct(sub_category) %>% nrow(.)/10
# setwd(figures_path)
# ggsave('beh_noncat_subcat_fig.pdf', plot = beh_noncat_subcat_fig, width = 10, height = 11.7*n_subcat)
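The data preparation above is repeated almost word for word for each parent category. If you are adapting the script, one option is to wrap it in a small function. The sketch below is only a suggestion: prep_subcat() is a hypothetical helper that does not exist in the original script, and the toy table is invented so the example can run on its own.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (not part of the original script): filters one parent
# category, completes missing sub-category x motivation combinations with
# zeros, and orders sub-categories by overall prevalence
prep_subcat <- function(summary_df, parent) {
  cat_df <- summary_df %>%
    dplyr::filter(parent_category == parent)

  sub_category_order <- cat_df %>%
    dplyr::group_by(sub_category) %>%
    dplyr::reframe(n = sum(n_sub_cat)) %>%
    dplyr::arrange(n) %>%
    dplyr::pull(sub_category)

  cat_df %>%
    dplyr::select(study_motivation, sub_category, percent_sub_cat, n_sub_cat) %>%
    tidyr::complete(sub_category, study_motivation,
                    fill = list(percent_sub_cat = 0, n_sub_cat = 0)) %>%
    dplyr::mutate(sub_category = factor(sub_category, levels = sub_category_order))
}

# Toy stand-in for behav_sub_cat_summary (invented values)
toy_summary <- tibble::tibble(
  parent_category  = c("movement", "movement", "movement", "boldness"),
  study_motivation = c("Environmental", "Environmental", "Medical", "Medical"),
  sub_category     = c("activity", "exploration", "activity", "shelter use"),
  n_sub_cat        = c(8, 2, 5, 3),
  percent_sub_cat  = c(0.8, 0.2, 1.0, 1.0)
)

toy_movement <- prep_subcat(toy_summary, "movement")
toy_movement
```

With the real data, a call such as prep_subcat(behav_sub_cat_summary, "movement") should give a table equivalent to movement_subcat (minus the percent_parent column, which the sketch drops for brevity).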
Check where behaviour was measured.
behav_location_summary <- EIPAAB_database %>%
dplyr::group_by(study_motivation, behav_test_location) %>%
dplyr::reframe(n = n()) %>%
tidyr::separate_rows(behav_test_location, sep = ";") %>%
dplyr::group_by(study_motivation, behav_test_location) %>%
dplyr::reframe(n = sum(n)) %>%
tidyr::complete(behav_test_location, study_motivation, fill = list(n = 0)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation)
behav_location_overall <- behav_location_summary %>%
dplyr::group_by(behav_test_location) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(percent = round(n/sum(n) * 100, 1), study_motivation = "Overall")
behav_location_summary <- rbind(behav_location_overall, behav_location_summary) %>%
dplyr::arrange(study_motivation)
behav_location_summary %>%
gt()
behav_test_location | n | percent | study_motivation |
---|---|---|---|
indoor laboratory setting or assumed indoors | 363 | 99.7 | Basic research |
outdoor natural setting | 1 | 0.3 | Basic research |
outdoor restricted setting (cannot interact with wild species) | 0 | 0.0 | Basic research |
indoor laboratory setting or assumed indoors | 852 | 98.7 | Environmental |
outdoor natural setting | 8 | 0.9 | Environmental |
outdoor restricted setting (cannot interact with wild species) | 3 | 0.3 | Environmental |
indoor laboratory setting or assumed indoors | 518 | 99.6 | Medical |
outdoor natural setting | 1 | 0.2 | Medical |
outdoor restricted setting (cannot interact with wild species) | 1 | 0.2 | Medical |
indoor laboratory setting or assumed indoors | 1733 | 99.2 | Overall |
outdoor natural setting | 10 | 0.6 | Overall |
outdoor restricted setting (cannot interact with wild species) | 4 | 0.2 | Overall |
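For readers unfamiliar with tidyr::separate_rows(), here is a minimal sketch of how semicolon-separated entries are counted in the summary above. The toy table and the shortened location labels are invented for illustration.

```r
library(dplyr)
library(tidyr)

# Toy rows: the second row lists two test locations in one cell
toy_db <- tibble::tibble(
  study_motivation    = c("Environmental", "Environmental", "Medical"),
  behav_test_location = c("indoor", "indoor;outdoor", "indoor")
)

toy_location <- toy_db %>%
  dplyr::group_by(study_motivation, behav_test_location) %>%
  dplyr::reframe(n = dplyr::n()) %>%
  # Split "indoor;outdoor" into two rows, each keeping the original count
  tidyr::separate_rows(behav_test_location, sep = ";") %>%
  dplyr::group_by(study_motivation, behav_test_location) %>%
  dplyr::reframe(n = sum(n)) %>%
  tidyr::complete(behav_test_location, study_motivation, fill = list(n = 0)) %>%
  dplyr::group_by(study_motivation) %>%
  dplyr::mutate(percent = round(n / sum(n) * 100, 1)) %>%
  dplyr::ungroup()

toy_location
```

Note that a row listing two locations contributes to both, so the percentages are relative to the number of location entries rather than the number of rows in the database.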
Check how behaviour was scored (the behavioural scoring method), using one row per article
behav_behav_scoring_summary <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(study_motivation, validity_behav_scoring_method) %>%
dplyr::reframe(n = n()) %>%
tidyr::separate_rows(validity_behav_scoring_method, sep = ";") %>%
dplyr::group_by(study_motivation, validity_behav_scoring_method) %>%
dplyr::reframe(n = sum(n)) %>%
tidyr::complete(validity_behav_scoring_method, study_motivation, fill = list(n = 0)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation)
behav_behav_scoring_overall <- behav_behav_scoring_summary %>%
dplyr::group_by(validity_behav_scoring_method) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(percent = round(n/sum(n) * 100, 1), study_motivation = "Overall")
behav_behav_scoring_summary <- rbind(behav_behav_scoring_overall, behav_behav_scoring_summary) %>%
dplyr::arrange(study_motivation)
behav_behav_scoring_summary %>%
dplyr::filter(study_motivation == "Overall") %>%
gt()
validity_behav_scoring_method | n | percent | study_motivation |
---|---|---|---|
acoustic analysis software | 1 | 0.1 | Overall |
live scoring in real time | 84 | 8.6 | Overall |
manual or human scoring from videos or image | 259 | 26.6 | Overall |
not specified | 221 | 22.7 | Overall |
other | 1 | 0.1 | Overall |
quantifying food consumption | 21 | 2.2 | Overall |
sensory for physical movement | 7 | 0.7 | Overall |
supervised automated tracking approaches | 378 | 38.9 | Overall |
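The summary above keeps one row per article with sample_n(1), which picks that row at random. If the column of interest can differ between rows of the same article, the counts may change slightly between runs. A deterministic alternative, shown here as a sketch on invented data, is to keep the first row per article with dplyr::distinct(); this is a swapped-in technique and not what the original script does.

```r
library(dplyr)

# Toy database: several rows per article (invented values)
toy_db <- tibble::tibble(
  article_id = c("a1", "a1", "a2", "a3", "a3", "a3"),
  validity_behav_scoring_method = c("live", "live", "video",
                                    "tracking", "tracking", "tracking")
)

# Keep the first row for each article; the result is the same on every run
one_per_article <- toy_db %>%
  dplyr::distinct(article_id, .keep_all = TRUE)

one_per_article
```

Another option is to call set.seed() before the random sampling, which keeps sample_n(1) but makes the draw repeatable.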
Making a data frame for a flow diagram (sankey plot)
PICO_df <- EIPAAB_database %>%
dplyr::mutate(behav_cat = case_when(behav_movement_boolean == 1 ~ "Movement",
behav_boldness_boolean == 1 ~ "Boldness", behav_foraging_boolean == 1 ~ "Foraging",
behav_antipredator_boolean == 1 ~ "Antipredator", behav_mating_boolean ==
1 ~ "Mating", behav_post_mating_boolean == 1 ~ "Post mating", behav_agression_boolean ==
1 ~ "Agression", behav_sociality_boolean == 1 ~ "Sociality", behav_cognition_boolean ==
1 ~ "Cognition", behav_noncat_boolean == 1 ~ "Not categorised", )) %>%
dplyr::select(study_motivation, compound_name, compound_atc_level_3, species_name,
species_class, behav_cat)
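One thing to keep in mind with case_when() is that it returns the first condition that is true, so the order of the conditions matters. The sketch below uses invented rows to show this; whether a single row of the EIPAAB database can have more than one behaviour flag set is an assumption I have not checked here.

```r
library(dplyr)

# Toy rows: the third row has both flags set
toy_rows <- tibble::tibble(
  behav_movement_boolean = c(1, 0, 1),
  behav_boldness_boolean = c(0, 1, 1)
)

toy_cat <- toy_rows %>%
  dplyr::mutate(behav_cat = dplyr::case_when(
    behav_movement_boolean == 1 ~ "Movement",  # checked first
    behav_boldness_boolean == 1 ~ "Boldness"   # only used if the line above is FALSE
  ))

toy_cat
```

In this toy example the third row is labelled "Movement" only, so a row with several flags would appear once in the flow diagram, under the first matching category.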
Let’s look at the 10 most common classes and ATCs
PICO_class_atc <- PICO_df %>%
dplyr::filter(!is.na(compound_atc_level_3), !is.na(species_class)) %>%
tidyr::separate_rows(compound_atc_level_3, sep = ";") %>%
dplyr::mutate(compound_atc_level_3 = str_trim(compound_atc_level_3))
PICO_atc_10 <- PICO_class_atc %>%
dplyr::group_by(compound_atc_level_3) %>%
dplyr::reframe(n = n()) %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:10) %>%
dplyr::pull(compound_atc_level_3)
PICO_class_10 <- PICO_class_atc %>%
dplyr::group_by(species_class) %>%
dplyr::reframe(n = n()) %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:10) %>%
dplyr::pull(species_class)
behav_cat_order <- PICO_class_atc %>%
dplyr::filter(compound_atc_level_3 %in% PICO_atc_10 & species_class %in% PICO_class_10) %>%
dplyr::group_by(behav_cat) %>%
dplyr::reframe(n = n()) %>%
dplyr::arrange(desc(n)) %>%
dplyr::pull(behav_cat)
PICO_class_atc_10 <- PICO_class_atc %>%
dplyr::filter(compound_atc_level_3 %in% PICO_atc_10 & species_class %in% PICO_class_10) %>%
dplyr::mutate(compound_atc_level_3 = factor(compound_atc_level_3, levels = PICO_atc_10),
species_class = factor(species_class, levels = PICO_class_10), behav_cat = factor(behav_cat,
levels = behav_cat_order)) %>%
dplyr::select(compound_atc_level_3, behav_cat, species_class)
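The "top 10, then filter, then set factor levels" pattern used above can be seen more easily on a small example. This sketch uses invented rows and a top 2 instead of a top 10; dplyr::count() with sort = TRUE is used as a shorter equivalent of the group_by(), reframe(), arrange() steps.

```r
library(dplyr)

# Toy rows, one per observation (invented values)
toy_pico <- tibble::tibble(
  species_class = c("Actinopterygii", "Actinopterygii", "Actinopterygii",
                    "Malacostraca", "Malacostraca", "Bivalvia")
)

# The two most common classes, most common first
top_2 <- toy_pico %>%
  dplyr::count(species_class, sort = TRUE) %>%
  dplyr::slice_head(n = 2) %>%
  dplyr::pull(species_class)

# Keep only those classes and fix their order for plotting
toy_top <- toy_pico %>%
  dplyr::filter(species_class %in% top_2) %>%
  dplyr::mutate(species_class = factor(species_class, levels = top_2))

toy_top
```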
PICO_atc_class_sankey <- highcharter::hchart(data_to_sankey(PICO_class_atc_10), "sankey")
PICO_atc_class_sankey
Save interactive version
# setwd(figures_path)
# htmlwidgets::saveWidget(widget = PICO_atc_class_sankey, file = 'PICO_atc_class_sankey.html')
Save static version
# setwd(figures_path)
# # Make a webshot in pdf: high quality, but the printed zone cannot be chosen
# webshot::webshot('PICO_atc_class_sankey.html', 'PICO_atc_class_sankey.pdf', delay = 10)
Let’s take a closer look at the 3 most common compounds
EIPAAB_database %>%
dplyr::group_by(compound_name) %>%
dplyr::reframe(n = n()) %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:3) %>%
gt()
compound_name | n |
---|---|
Fluoxetine | 200 |
Diazepam | 67 |
17-alpha-ethinylestradiol | 63 |
spp_order <- PICO_df %>%
dplyr::filter(compound_name == "Fluoxetine") %>%
dplyr::group_by(species_name) %>%
dplyr::reframe(n = n()) %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:5) %>%
pull(species_name)
beh_order <- PICO_df %>%
dplyr::filter(compound_name == "Fluoxetine" & species_name %in% spp_order) %>%
dplyr::group_by(behav_cat) %>%
dplyr::reframe(n = n()) %>%
dplyr::arrange(desc(n)) %>%
pull(behav_cat)
PICO_fluoxetine <- PICO_df %>%
dplyr::filter(compound_name == "Fluoxetine" & species_name %in% spp_order) %>%
dplyr::mutate(species_name = factor(species_name, levels = spp_order), behav_cat = factor(behav_cat,
levels = beh_order), ) %>%
dplyr::select(species_name, behav_cat)
PICO_fluoxetine_sankey <- highcharter::hchart(data_to_sankey(PICO_fluoxetine), "sankey",
name = "PICO")
PICO_fluoxetine_sankey
Save interactive plot
# setwd(figures_path)
# htmlwidgets::saveWidget(widget = PICO_fluoxetine_sankey, file = 'PICO_fluoxetine_sankey.html')
Save static plot
# setwd(figures_path)
# # Make a webshot in pdf: high quality, but the printed zone cannot be chosen
# webshot::webshot('PICO_fluoxetine_sankey.html', 'PICO_fluoxetine_sankey.pdf', delay = 10)
spp_order <- PICO_df %>%
dplyr::filter(compound_name == "Diazepam") %>%
dplyr::group_by(species_name) %>%
dplyr::reframe(n = n()) %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:5) %>%
pull(species_name)
beh_order <- PICO_df %>%
dplyr::filter(compound_name == "Diazepam" & species_name %in% spp_order) %>%
dplyr::group_by(behav_cat) %>%
dplyr::reframe(n = n()) %>%
dplyr::arrange(desc(n)) %>%
pull(behav_cat)
PICO_diazepam <- PICO_df %>%
dplyr::filter(compound_name == "Diazepam" & species_name %in% spp_order) %>%
dplyr::mutate(species_name = factor(species_name, levels = spp_order), behav_cat = factor(behav_cat,
levels = beh_order), ) %>%
dplyr::select(species_name, behav_cat)
PICO_diazepam_sankey <- highcharter::hchart(data_to_sankey(PICO_diazepam), "sankey",
name = "PICO")
PICO_diazepam_sankey
Save interactive plot
# setwd(figures_path)
# htmlwidgets::saveWidget(widget = PICO_diazepam_sankey, file = 'PICO_diazepam_sankey.html')
Save static plot
# setwd(figures_path)
# # Make a webshot in pdf: high quality, but the printed zone cannot be chosen
# webshot::webshot('PICO_diazepam_sankey.html', 'PICO_diazepam_sankey.pdf', delay = 10)
spp_order <- PICO_df %>%
dplyr::filter(compound_name == "17-alpha-ethinylestradiol") %>%
dplyr::group_by(species_name) %>%
dplyr::reframe(n = n()) %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:5) %>%
pull(species_name)
beh_order <- PICO_df %>%
dplyr::filter(compound_name == "17-alpha-ethinylestradiol" & species_name %in%
spp_order) %>%
dplyr::group_by(behav_cat) %>%
dplyr::reframe(n = n()) %>%
dplyr::arrange(desc(n)) %>%
pull(behav_cat)
PICO_EE2 <- PICO_df %>%
dplyr::filter(compound_name == "17-alpha-ethinylestradiol" & species_name %in%
spp_order) %>%
dplyr::mutate(species_name = factor(species_name, levels = spp_order), behav_cat = factor(behav_cat,
levels = beh_order), ) %>%
dplyr::select(species_name, behav_cat)
PICO_EE2_sankey <- highcharter::hchart(data_to_sankey(PICO_EE2), "sankey", name = "PICO")
PICO_EE2_sankey
Save interactive plot
# setwd(figures_path)
# htmlwidgets::saveWidget(widget = PICO_EE2_sankey, file = 'PICO_EE2_sankey.html')
Save static plot
# setwd(figures_path)
# # Make a webshot in pdf: high quality, but the printed zone cannot be chosen
# webshot::webshot('PICO_EE2_sankey.html', 'PICO_EE2_sankey.pdf', delay = 10)
Identify knowledge clusters and gaps.
We will also do this by study motivation, because the knowledge gaps will be motivation specific.
First look by species class
Making a data frame
behav_cat_class_long <- PICO_df %>%
dplyr::group_by(study_motivation, species_class, behav_cat) %>%
dplyr::reframe(count = n()) %>%
tidyr::complete(study_motivation, species_class, behav_cat, fill = list(count = 0)) %>%
dplyr::group_by(study_motivation, species_class) %>%
dplyr::mutate(total_class_motivation = sum(count)) %>%
dplyr::ungroup() %>%
dplyr::mutate(rel_percent = round(count/total_class_motivation * 100, 0), rel_percent = if_else(is.finite(rel_percent),
rel_percent, 0))
class_order <- behav_cat_class_long %>%
dplyr::group_by(species_class) %>%
dplyr::reframe(n = sum(count)) %>%
dplyr::arrange(n) %>%
dplyr::pull(species_class)
behav_cat_order <- behav_cat_class_long %>%
dplyr::group_by(behav_cat) %>%
dplyr::reframe(n = sum(count)) %>%
dplyr::arrange(desc(n)) %>%
dplyr::pull(behav_cat)
behav_cat_class_long <- behav_cat_class_long %>%
dplyr::mutate(species_class = factor(species_class, levels = class_order), behav_cat = factor(behav_cat,
levels = behav_cat_order))
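The is.finite() check in the data frame above is there because a class with no data for a given motivation has a total of zero, and 0/0 in R gives NaN. A minimal sketch on invented counts (raw_percent is a name introduced here only to show the intermediate value):

```r
library(dplyr)

# Toy counts: Bivalvia has no data at all, so its total is 0
toy_counts <- tibble::tibble(
  species_class = c("Actinopterygii", "Actinopterygii", "Bivalvia", "Bivalvia"),
  behav_cat     = c("Movement", "Boldness", "Movement", "Boldness"),
  count         = c(3, 1, 0, 0)
)

toy_rel <- toy_counts %>%
  dplyr::group_by(species_class) %>%
  dplyr::mutate(total = sum(count)) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(
    raw_percent = round(count / total * 100, 0),  # 0/0 gives NaN for Bivalvia
    rel_percent = dplyr::if_else(is.finite(raw_percent), raw_percent, 0)
  )

toy_rel
```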
Creating a tile plot
cust_col <- colorRampPalette(c("#FDEDF4", "#F068A7"))(30)
behav_class_hm <- behav_cat_class_long %>%
ggplot(aes(x = behav_cat, y = species_class, fill = count)) + geom_tile() + scale_fill_gradientn(colors = cust_col,
na.value = "white", limits = c(1, max(behav_cat_class_long$count, na.rm = TRUE)),
guide = "none") + theme_bw() + facet_wrap(~study_motivation) + theme(axis.text.x = element_text(angle = 45,
hjust = 1)) + labs(x = "Behaviour", y = "Species Class", fill = "Count")
behav_class_hm
# setwd(figures_path)
# ggsave('behav_class_hm.pdf', plot = behav_class_hm, width = 8.3, height = 11.7/2)
This one uses relative values for each class (i.e. each row in the heat map)
cust_col <- colorRampPalette(brewer.pal(4, "Oranges"))(30)
behav_class_rel_hm <- behav_cat_class_long %>%
ggplot(aes(x = behav_cat, y = species_class, fill = rel_percent)) + geom_tile() +
# geom_text(aes(label = ifelse(rel_percent == 0, NA, rel_percent)), color =
# 'black', size = 3) +
# Scale limits are based on rel_percent (the plotted variable), not the raw counts
scale_fill_gradientn(colors = cust_col, na.value = "white", limits = c(1, max(behav_cat_class_long$rel_percent,
na.rm = TRUE)), guide = "none") + theme_bw() + facet_wrap(~study_motivation) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + labs(x = "Behaviour",
y = "Species Class", fill = "Relative percent")
behav_class_rel_hm
# setwd(figures_path)
# ggsave('behav_class_rel_hm.pdf', plot = behav_class_rel_hm, width = 8.3, height = 11.7/2)
Now looking by compound
Making a data frame
behav_atc_long <- PICO_df %>%
separate_rows(compound_atc_level_3, sep = ";") %>%
dplyr::group_by(study_motivation, compound_atc_level_3, behav_cat) %>%
dplyr::reframe(count = n()) %>%
tidyr::complete(study_motivation, compound_atc_level_3, behav_cat, fill = list(count = 0)) %>%
dplyr::group_by(study_motivation, compound_atc_level_3) %>%
dplyr::mutate(total_atc_motivation = sum(count)) %>%
dplyr::ungroup() %>%
dplyr::mutate(rel_percent = round(count/total_atc_motivation * 100, 0), rel_percent = if_else(is.finite(rel_percent),
rel_percent, 0))
atc_order <- behav_atc_long %>%
dplyr::group_by(compound_atc_level_3) %>%
dplyr::reframe(n = sum(count)) %>%
dplyr::arrange(n) %>%
dplyr::pull(compound_atc_level_3)
behav_cat_order <- behav_atc_long %>%
dplyr::group_by(behav_cat) %>%
dplyr::reframe(n = sum(count)) %>%
dplyr::arrange(desc(n)) %>%
dplyr::pull(behav_cat)
behav_atc_long <- behav_atc_long %>%
dplyr::mutate(compound_atc_level_3 = factor(compound_atc_level_3, levels = atc_order),
behav_cat = factor(behav_cat, levels = behav_cat_order))
There are 132 ATC level 3 codes. So a tile plot is not going to be very informative. It could be useful if you are only interested in a few ATC groups.
behav_atc_long %>%
dplyr::distinct(compound_atc_level_3) %>%
nrow()
## [1] 132
Code for a tile plot, if you would like to make one.
# cust_col <- colorRampPalette(c('#FDEDF4', '#F068A7'))(30)
# behav_atc_hm <- behav_atc_long %>%
#   ggplot(aes(x = behav_cat, y = compound_atc_level_3, fill = count)) +
#   geom_tile() +
#   # geom_text(aes(label = ifelse(count == 0, NA, count)), color = 'black', size = 3) +
#   scale_fill_gradientn(colors = cust_col, na.value = 'white',
#     limits = c(1, max(behav_atc_long$count, na.rm = TRUE)), guide = 'none') +
#   theme_void() + facet_wrap(~study_motivation) +
#   theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
#   labs(x = 'Behaviour', y = 'ATC level 3', fill = 'Count')
# behav_atc_hm
Save figure
# setwd(figures_path)
# ggsave('behav_atc_hm.pdf', plot = behav_atc_hm, width = 8.3, height = 11.7/2)
How many articles measured additional biomarkers (the percent column is reported as a proportion between 0 and 1)
EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::group_by(additional_biomarkers) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(total = sum(n), percent = n/total) %>%
gt()
additional_biomarkers | n | total | percent |
---|---|---|---|
No | 435 | 901 | 0.4827969 |
Yes | 466 | 901 | 0.5172031 |
How many articles measured survival, growth, or reproduction (the percent column is reported as a proportion between 0 and 1)
EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::group_by(validity_survival_growth_reproduction) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(total = sum(n), percent = n/total) %>%
gt()
validity_survival_growth_reproduction | n | total | percent |
---|---|---|---|
No | 543 | 901 | 0.6026637 |
Yes | 358 | 901 | 0.3973363 |
There are 19 metadata columns relating to validity: “validity_guideline”, “validity_good_laboratory_practice”, “validity_survival_growth_reproduction”, “validity_animal_feeding”, “validity_water_quality”, “validity_light_cycle”, “validity_randomization”, “validity_behav_scoring_method”, “validity_behav_blinding”, “validity_conflict_statement”, “species_source”, “species_stage”, “species_sex”, “compound_min_duration_exposure”, “compound_max_duration_exposure”, “validity_compound_cas_reported”, “validity_compound_purity_reported”, “validity_compound_water_verification”, “validity_compound_animal_verification”.
guideline <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_guideline, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation)
guideline_all <- guideline %>%
dplyr::group_by(validity_guideline) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall", percent = round(n/sum(n) * 100, 1))
guideline_all <- rbind(guideline_all, guideline) %>%
dplyr::filter(validity_guideline == "Yes")
guideline_all %>%
gt()
validity_guideline | n | study_motivation | percent |
---|---|---|---|
Yes | 135 | Overall | 15.0 |
Yes | 111 | Environmental | 21.8 |
Yes | 14 | Medical | 6.0 |
Yes | 10 | Basic research | 6.3 |
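The same summary is repeated below for several of the Yes/No validity columns. If you want to reuse it for your own columns, it could be wrapped in a function along these lines. This is only a sketch: summarise_validity() is a hypothetical helper that is not in the original script, the toy data are invented, and it keeps the first row per article with distinct() rather than a random row with sample_n(1).

```r
library(dplyr)

# Hypothetical helper (not part of the original script): percentage of articles
# answering "Yes" for a validity column, overall and by study motivation
summarise_validity <- function(db, validity_col) {
  by_motivation <- db %>%
    dplyr::distinct(article_id, .keep_all = TRUE) %>%   # one row per article
    dplyr::count({{ validity_col }}, study_motivation, name = "n") %>%
    dplyr::group_by(study_motivation) %>%
    dplyr::mutate(percent = round(n / sum(n) * 100, 1)) %>%
    dplyr::ungroup()

  overall <- by_motivation %>%
    dplyr::group_by({{ validity_col }}) %>%
    dplyr::reframe(n = sum(n)) %>%
    dplyr::mutate(study_motivation = "Overall",
                  percent = round(n / sum(n) * 100, 1))

  dplyr::bind_rows(overall, by_motivation) %>%
    dplyr::filter({{ validity_col }} == "Yes")
}

# Toy database (invented values); article a1 has two rows
toy_db <- tibble::tibble(
  article_id         = c("a1", "a1", "a2", "a3", "a4"),
  study_motivation   = c("Environmental", "Environmental", "Environmental",
                         "Medical", "Medical"),
  validity_guideline = c("Yes", "Yes", "No", "No", "Yes")
)

toy_guideline <- summarise_validity(toy_db, validity_guideline)
toy_guideline
```

With the real data, a call such as summarise_validity(EIPAAB_database, validity_guideline) should reproduce the structure of the table above, provided the validity column is constant within each article.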
GLP <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_good_laboratory_practice, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation)
GLP_all <- GLP %>%
dplyr::group_by(validity_good_laboratory_practice) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall", percent = round(n/sum(n) * 100, 1))
GLP_all <- rbind(GLP_all, GLP) %>%
dplyr::filter(validity_good_laboratory_practice == "Yes")
GLP_all %>%
gt()
validity_good_laboratory_practice | n | study_motivation | percent |
---|---|---|---|
Yes | 6 | Overall | 0.7 |
Yes | 5 | Environmental | 1.0 |
Yes | 1 | Medical | 0.4 |
survival_growth_reproduction <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_survival_growth_reproduction, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation)
survival_growth_reproduction_all <- survival_growth_reproduction %>%
dplyr::group_by(validity_survival_growth_reproduction) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall", percent = round(n/sum(n) * 100, 1))
survival_growth_reproduction_all <- rbind(survival_growth_reproduction_all, survival_growth_reproduction) %>%
dplyr::filter(validity_survival_growth_reproduction == "Yes")
survival_growth_reproduction_all %>%
gt()
validity_survival_growth_reproduction | n | study_motivation | percent |
---|---|---|---|
Yes | 358 | Overall | 39.7 |
Yes | 273 | Environmental | 53.5 |
Yes | 61 | Medical | 26.2 |
Yes | 24 | Basic research | 15.2 |
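# Articles with "Yes" for validity_compound_cas_reported, by study motivation (one row sampled per article)
CAS <- EIPAAB_database %>%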
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_compound_cas_reported, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation)
CAS_all <- CAS %>%
dplyr::group_by(validity_compound_cas_reported) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall", percent = round(n/sum(n) * 100, 1))
CAS_all <- rbind(CAS_all, CAS) %>%
dplyr::filter(validity_compound_cas_reported == "Yes")
CAS_all %>%
gt()
validity_compound_cas_reported | n | study_motivation | percent |
---|---|---|---|
Yes | 223 | Overall | 24.8 |
Yes | 187 | Environmental | 36.7 |
Yes | 25 | Medical | 10.7 |
Yes | 11 | Basic research | 7.0 |
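# Articles with "Yes" for validity_compound_purity_reported, by study motivation (one row sampled per article)
purity <- EIPAAB_database %>%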
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_compound_purity_reported, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation)
purity_all <- purity %>%
dplyr::group_by(validity_compound_purity_reported) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall", percent = round(n/sum(n) * 100, 1))
purity_all <- rbind(purity_all, purity) %>%
dplyr::filter(validity_compound_purity_reported == "Yes")
purity_all %>%
gt()
validity_compound_purity_reported | n | study_motivation | percent |
---|---|---|---|
Yes | 229 | Overall | 25.4 |
Yes | 200 | Environmental | 39.2 |
Yes | 20 | Medical | 8.6 |
Yes | 9 | Basic research | 5.7 |
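# Life stage: share of "Unknown or not specified", by study motivation
# (one row sampled per unique population; multiple stages in one cell are split on ";")
stage <- EIPAAB_database %>%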
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(species_stage, study_motivation) %>%
dplyr::reframe(n = n()) %>%
tidyr::separate_rows(species_stage, sep = ";") %>%
dplyr::group_by(species_stage, study_motivation) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation)
stage_all <- stage %>%
dplyr::group_by(species_stage) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall", percent = round(n/sum(n) * 100, 1))
stage_all <- rbind(stage_all, stage) %>%
dplyr::filter(species_stage == "Unknown or not specified") %>%
dplyr::mutate(percent_reported = 100 - percent)
stage_all %>%
gt()
species_stage | n | study_motivation | percent | percent_reported |
---|---|---|---|---|
Unknown or not specified | 166 | Overall | 16.6 | 83.4 |
Unknown or not specified | 100 | Environmental | 17.1 | 82.9 |
Unknown or not specified | 22 | Medical | 9.0 | 91.0 |
Unknown or not specified | 44 | Basic research | 26.5 | 73.5 |
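# Sex: share of "Unknown or not specified", by study motivation
# (one row sampled per unique population; multiple values in one cell are split on ";")
sex <- EIPAAB_database %>%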
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(species_sex, study_motivation) %>%
dplyr::reframe(n = n()) %>%
tidyr::separate_rows(species_sex, sep = ";") %>%
dplyr::group_by(species_sex, study_motivation) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation)
sex_all <- sex %>%
dplyr::group_by(species_sex) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall", percent = round(n/sum(n) * 100, 1))
sex_all <- rbind(sex_all, sex) %>%
dplyr::filter(species_sex == "Unknown or not specified") %>%
dplyr::mutate(percent_reported = 100 - percent)
sex_all %>%
gt()
species_sex | n | study_motivation | percent | percent_reported |
---|---|---|---|---|
Unknown or not specified | 546 | Overall | 46.5 | 53.5 |
Unknown or not specified | 325 | Environmental | 50.4 | 49.6 |
Unknown or not specified | 132 | Medical | 41.0 | 59.0 |
Unknown or not specified | 89 | Basic research | 43.2 | 56.8 |
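# Animal source: share of "Not reported", by study motivation
# (one row sampled per unique population; multiple values in one cell are split on ";")
source <- EIPAAB_database %>%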
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(species_source, study_motivation) %>%
dplyr::reframe(n = n()) %>%
tidyr::separate_rows(species_source, sep = ";") %>%
dplyr::group_by(species_source, study_motivation) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation)
source_all <- source %>%
dplyr::group_by(species_source) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall", percent = round(n/sum(n) * 100, 1))
source_all <- rbind(source_all, source) %>%
dplyr::filter(species_source == "Not reported") %>%
dplyr::mutate(percent_reported = 100 - percent)
source_all %>%
gt()
species_source | n | study_motivation | percent | percent_reported |
---|---|---|---|---|
Not reported | 148 | Overall | 15.6 | 84.4 |
Not reported | 72 | Environmental | 13.1 | 86.9 |
Not reported | 51 | Medical | 21.2 | 78.8 |
Not reported | 25 | Basic research | 15.4 | 84.6 |
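In the three population-level summaries above (`species_stage`, `species_sex` and `species_source`), a single cell can hold several values separated by `;`. After `tidyr::separate_rows()`, a population with two values contributes two rows, so the percentages are calculated over population-by-value rows and not over populations. The toy sketch below illustrates the difference between the two denominators. The data frame `toy_pop` and the column `percent_of_populations` are made up for illustration and do not come from the database or the original script.

```r
library(dplyr)
library(tidyr)

# Toy data for illustration only: population "p2" has two life stages
toy_pop <- tibble::tibble(
  unique_population_id = c("p1", "p2", "p3"),
  species_stage        = c("Adult", "Adult;Juvenile", "Unknown or not specified")
)

stage_counts <- toy_pop %>%
  # One row per population-by-stage combination (3 populations become 4 rows)
  tidyr::separate_rows(species_stage, sep = ";") %>%
  dplyr::count(species_stage, name = "n") %>%
  dplyr::mutate(
    # Denominator = rows after splitting (the approach used in this section)
    percent = round(n / sum(n) * 100, 1),
    # Denominator = number of populations (shown only for comparison)
    percent_of_populations = round(n / nrow(toy_pop) * 100, 1)
  )

stage_counts
```

Here the "Unknown or not specified" share is 25% of the split rows but 33.3% of the populations, so the choice of denominator matters when many populations have more than one value.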
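# Articles with "Yes" for validity_animal_feeding, by study motivation (one row sampled per article)
feeding <- EIPAAB_database %>%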
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_animal_feeding, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation)
feeding_all <- feeding %>%
dplyr::group_by(validity_animal_feeding) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall", percent = round(n/sum(n) * 100, 1))
feeding_all <- rbind(feeding_all, feeding) %>%
dplyr::filter(validity_animal_feeding == "Yes")
feeding_all %>%
gt()
validity_animal_feeding | n | study_motivation | percent |
---|---|---|---|
Yes | 716 | Overall | 79.5 |
Yes | 430 | Environmental | 84.3 |
Yes | 159 | Medical | 68.2 |
Yes | 127 | Basic research | 80.4 |
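# Articles with "Yes" for validity_water_quality, by study motivation (one row sampled per article)
water <- EIPAAB_database %>%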
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_water_quality, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation)
water_all <- water %>%
dplyr::group_by(validity_water_quality) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall", percent = round(n/sum(n) * 100, 1))
water_all <- rbind(water_all, water) %>%
dplyr::filter(validity_water_quality == "Yes")
water_all %>%
gt()
validity_water_quality | n | study_motivation | percent |
---|---|---|---|
Yes | 806 | Overall | 89.5 |
Yes | 473 | Environmental | 92.7 |
Yes | 206 | Medical | 88.4 |
Yes | 127 | Basic research | 80.4 |
# Articles with "Yes" for validity_light_cycle, by study motivation (one row sampled per article)
light <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_light_cycle, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation)
light_all <- light %>%
dplyr::group_by(validity_light_cycle) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall", percent = round(n/sum(n) * 100, 1))
light_all <- rbind(light_all, light) %>%
dplyr::filter(validity_light_cycle == "Yes")
light_all %>%
gt()
validity_light_cycle | n | study_motivation | percent |
---|---|---|---|
Yes | 756 | Overall | 83.9 |
Yes | 429 | Environmental | 84.1 |
Yes | 200 | Medical | 85.8 |
Yes | 127 | Basic research | 80.4 |
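# Minimum exposure duration: share of "Not stated", by study motivation (all database rows are used)
min_duration <- EIPAAB_database %>%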
dplyr::group_by(compound_min_duration_exposure, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation)
min_duration_all <- min_duration %>%
dplyr::group_by(compound_min_duration_exposure) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall", percent = round(n/sum(n) * 100, 1))
min_duration_all <- rbind(min_duration_all, min_duration) %>%
dplyr::filter(compound_min_duration_exposure == "Not stated") %>%
dplyr::mutate(percent_reported = 100 - percent)
min_duration_all %>%
gt()
compound_min_duration_exposure | n | study_motivation | percent | percent_reported |
---|---|---|---|---|
Not stated | 102 | Overall | 5.9 | 94.1 |
Not stated | 38 | Environmental | 4.4 | 95.6 |
Not stated | 52 | Medical | 10.0 | 90.0 |
Not stated | 12 | Basic research | 3.3 | 96.7 |
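# Maximum exposure duration: share of "Not stated", by study motivation (all database rows are used)
max_duration <- EIPAAB_database %>%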
dplyr::group_by(compound_max_duration_exposure, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation)
max_duration_all <- max_duration %>%
dplyr::group_by(compound_max_duration_exposure) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall", percent = round(n/sum(n) * 100, 1))
max_duration_all <- rbind(max_duration_all, max_duration) %>%
dplyr::filter(compound_max_duration_exposure == "Not stated") %>%
dplyr::mutate(percent_reported = 100 - percent)
max_duration_all %>%
gt()
compound_max_duration_exposure | n | study_motivation | percent | percent_reported |
---|---|---|---|---|
Not stated | 96 | Overall | 5.5 | 94.5 |
Not stated | 32 | Environmental | 3.7 | 96.3 |
Not stated | 53 | Medical | 10.2 | 89.8 |
Not stated | 11 | Basic research | 3.0 | 97.0 |
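# Water verification: share of "Measured", by study motivation (database rows with NA are dropped first)
water_verification <- EIPAAB_database %>%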
dplyr::filter(!is.na(validity_compound_water_verification)) %>%
dplyr::group_by(validity_compound_water_verification, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation)
water_verification_all <- water_verification %>%
dplyr::group_by(validity_compound_water_verification) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall", percent = round(n/sum(n) * 100, 1))
water_verification_all <- rbind(water_verification_all, water_verification) %>%
dplyr::filter(validity_compound_water_verification == "Measured")
water_verification_all %>%
gt()
validity_compound_water_verification | n | study_motivation | percent |
---|---|---|---|
Measured | 313 | Overall | 20.6 |
Measured | 295 | Environmental | 35.8 |
Measured | 10 | Medical | 2.5 |
Measured | 8 | Basic research | 2.7 |
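# Tissue verification: share of "Yes", by study motivation
# (all database rows are used; the NA filter below is commented out, so NA rows stay in the denominator)
tissue_verification <- EIPAAB_database %>%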
# dplyr::filter(!is.na(validity_compound_animal_verification)) %>%
dplyr::group_by(validity_compound_animal_verification, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation)
tissue_verification_all <- tissue_verification %>%
dplyr::group_by(validity_compound_animal_verification) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall", percent = round(n/sum(n) * 100, 1))
tissue_verification_all <- rbind(tissue_verification_all, tissue_verification) %>%
dplyr::filter(validity_compound_animal_verification == "Yes")
tissue_verification_all %>%
gt()
validity_compound_animal_verification | n | study_motivation | percent |
---|---|---|---|
Yes | 154 | Overall | 8.9 |
Yes | 115 | Environmental | 13.4 |
Yes | 22 | Medical | 4.2 |
Yes | 17 | Basic research | 4.7 |
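The two verification summaries above treat missing values differently: the water verification code drops `NA` rows before counting, while in the tissue verification code that filter is commented out, so `NA` rows remain in the denominator. The toy sketch below shows how this changes a percentage. The helper `percent_yes()` and the data frame `toy_rows` are hypothetical and only illustrate the effect; they are not part of the original script.

```r
library(dplyr)

# Toy data for illustration only: one of four rows is missing
toy_rows <- tibble::tibble(verified = c("Yes", "No", "No", NA))

# Hypothetical helper: percentage of "Yes", with or without NA rows in the denominator
percent_yes <- function(data, drop_na) {
  if (drop_na) {
    data <- dplyr::filter(data, !is.na(verified))
  }
  round(sum(data$verified == "Yes", na.rm = TRUE) / nrow(data) * 100, 1)
}

with_na_dropped <- percent_yes(toy_rows, drop_na = TRUE)   # 1 of 3 rows
with_na_kept    <- percent_yes(toy_rows, drop_na = FALSE)  # 1 of 4 rows

c(with_na_dropped = with_na_dropped, with_na_kept = with_na_kept)
```

Keeping this difference in mind helps when comparing the "percent" columns of the two tables, because they are not calculated from the same number of rows.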
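# Articles with "Yes" for validity_randomization, by study motivation (one row sampled per article)
randomization <- EIPAAB_database %>%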
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_randomization, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation)
randomization_all <- randomization %>%
dplyr::group_by(validity_randomization) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall", percent = round(n/sum(n) * 100, 1))
randomization_all <- rbind(randomization_all, randomization) %>%
dplyr::filter(validity_randomization == "Yes") %>%
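# Note: only the "Yes" rows are kept, so 100 - percent is the share of articles without a "Yes";
# the column name percent_disclosed is kept to match the output table below
dplyr::mutate(percent_disclosed = 100 - percent)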
randomization_all %>%
gt()
validity_randomization | n | study_motivation | percent | percent_disclosed |
---|---|---|---|---|
Yes | 362 | Overall | 40.2 | 59.8 |
Yes | 229 | Environmental | 44.9 | 55.1 |
Yes | 75 | Medical | 32.2 | 67.8 |
Yes | 58 | Basic research | 36.7 | 63.3 |
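# Articles with "Yes" for validity_behav_blinding, by study motivation (one row sampled per article)
blinding <- EIPAAB_database %>%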
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_behav_blinding, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation)
blinding_all <- blinding %>%
dplyr::group_by(validity_behav_blinding) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall", percent = round(n/sum(n) * 100, 1))
blinding_all <- rbind(blinding_all, blinding) %>%
dplyr::filter(validity_behav_blinding == "Yes")
blinding_all %>%
gt()
validity_behav_blinding | n | study_motivation | percent |
---|---|---|---|
Yes | 153 | Overall | 17.0 |
Yes | 75 | Environmental | 14.7 |
Yes | 44 | Medical | 18.9 |
Yes | 34 | Basic research | 21.5 |
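# Behavioural scoring method: share of "not specified", by study motivation
# (one row sampled per article; multiple methods in one cell are split on ";")
behav_scoring <- EIPAAB_database %>%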
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
tidyr::separate_rows(validity_behav_scoring_method, sep = ";") %>%
dplyr::group_by(validity_behav_scoring_method, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation)
behav_scoring_all <- behav_scoring %>%
dplyr::group_by(validity_behav_scoring_method) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall", percent = round(n/sum(n) * 100, 1))
behav_scoring_all <- rbind(behav_scoring_all, behav_scoring) %>%
dplyr::filter(validity_behav_scoring_method == "not specified") %>%
dplyr::mutate(percent_specified = 100 - percent)
behav_scoring_all %>%
gt()
validity_behav_scoring_method | n | study_motivation | percent | percent_specified |
---|---|---|---|---|
not specified | 221 | Overall | 22.7 | 77.3 |
not specified | 130 | Environmental | 24.0 | 76.0 |
not specified | 56 | Medical | 21.5 | 78.5 |
not specified | 35 | Basic research | 20.6 | 79.4 |
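# Conflict of interest: share of articles with no statement, by study motivation (one row sampled per article)
conflict <- EIPAAB_database %>%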
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_conflict_statement, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation * 100, 1)) %>%
dplyr::select(-total_motivation)
conflict_all <- conflict %>%
dplyr::group_by(validity_conflict_statement) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall", percent = round(n/sum(n) * 100, 1))
conflict_all <- rbind(conflict_all, conflict) %>%
dplyr::filter(validity_conflict_statement == "No statement is made in the paper") %>%
dplyr::mutate(percent_specified = 100 - percent)
conflict_all %>%
gt()
validity_conflict_statement | n | study_motivation | percent | percent_specified |
---|---|---|---|---|
No statement is made in the paper | 407 | Overall | 45.2 | 54.8 |
No statement is made in the paper | 254 | Environmental | 49.8 | 50.2 |
No statement is made in the paper | 65 | Medical | 27.9 | 72.1 |
No statement is made in the paper | 88 | Basic research | 55.7 | 44.3 |
# Print the session information; pander() makes the output easier to read
sessionInfo() %>%
pander()
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale: LC_COLLATE=English_Australia.utf8, LC_CTYPE=English_Australia.utf8, LC_MONETARY=English_Australia.utf8, LC_NUMERIC=C and LC_TIME=English_Australia.utf8
attached base packages: stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: readxl(v.1.4.3), gt(v.0.11.0), here(v.1.0.1), pander(v.0.6.5), highcharter(v.0.9.4), ggdist(v.3.3.2), gridExtra(v.2.3), ape(v.5.8), treeio(v.1.22.0), ggtree(v.3.6.2), RColorBrewer(v.1.1-3), ggrepel(v.0.9.3), igraph(v.1.6.0), ggraph(v.2.1.0), lubridate(v.1.9.2), forcats(v.1.0.0), stringr(v.1.5.1), dplyr(v.1.1.1), purrr(v.1.0.1), readr(v.2.1.4), tidyr(v.1.3.0), tibble(v.3.2.1), ggplot2(v.3.5.1) and tidyverse(v.2.0.0)
loaded via a namespace (and not attached): nlme(v.3.1-162), fs(v.1.6.4), xts(v.0.14.0), rprojroot(v.2.0.4), tools(v.4.2.3), backports(v.1.4.1), bslib(v.0.7.0), utf8(v.1.2.3), R6(v.2.5.1), lazyeval(v.0.2.2), colorspace(v.2.1-0), withr(v.3.0.0), tidyselect(v.1.2.1), curl(v.5.2.1), compiler(v.4.2.3), cli(v.3.6.1), pacman(v.0.5.1), formatR(v.1.14), xml2(v.1.3.3), labeling(v.0.4.3), sass(v.0.4.9), scales(v.1.3.0), digest(v.0.6.31), yulab.utils(v.0.1.4), rmarkdown(v.2.21), pkgconfig(v.2.0.3), htmltools(v.0.5.8.1), highr(v.0.11), fastmap(v.1.1.1), htmlwidgets(v.1.6.2), rlang(v.1.1.0), TTR(v.0.24.4), rstudioapi(v.0.14), quantmod(v.0.4.26), gridGraphics(v.0.5-1), jquerylib(v.0.1.4), farver(v.2.1.1), generics(v.0.1.3), zoo(v.1.8-12), jsonlite(v.1.8.4), distributional(v.0.4.0), magrittr(v.2.0.3), rlist(v.0.4.6.2), ggplotify(v.0.1.2), patchwork(v.1.2.0), Rcpp(v.1.0.10), munsell(v.0.5.1), fansi(v.1.0.4), viridis(v.0.6.5), lifecycle(v.1.0.4), stringi(v.1.7.12), yaml(v.2.3.7), MASS(v.7.3-58.2), grid(v.4.2.3), parallel(v.4.2.3), lattice(v.0.20-45), graphlayouts(v.1.0.2), hms(v.1.1.3), knitr(v.1.42), pillar(v.1.9.0), codetools(v.0.2-19), glue(v.1.6.2), evaluate(v.0.24.0), ggfun(v.0.1.5), data.table(v.1.14.8), vctrs(v.0.6.1), tzdb(v.0.3.0), tweenr(v.2.0.3), cellranger(v.1.1.0), gtable(v.0.3.5), polyclip(v.1.10-6), assertthat(v.0.2.1), cachem(v.1.0.7), xfun(v.0.38), ggforce(v.0.4.1), broom(v.1.0.4), tidygraph(v.1.3.0), tidytree(v.0.4.6), viridisLite(v.0.4.2), aplot(v.0.2.3), memoise(v.2.0.1) and timechange(v.0.2.0)
10.8.2.8 Sociality plots
10.8.2.8.1 Fig S6-8
Save the figure