Skip to Content

Match scientific names with the GBIF Backbone Taxonomy

How to use inborutils function gbif_species_name_match() to match a list of scientific names against the GBIF backbone taxonomy.

Introduction

This tutorial will explain how you can match a list of scientific names against the GBIF backbone taxonomy.

It is important that you have the most recent version of inborutils installed and available:

remotes::install_github("inbo/inborutils")   # install inborutils
library(tidyverse)    # To do datascience
library(rgbif)        # To lookup names in the GBIF backbone taxonomy
library(inborutils)   # To wrap GBIF API data
library(knitr)

Read data file containing the scientific names

Read file containing the scientific names you want to check against the GBIF taxonomic backbone:

species_df <- read_csv("https://raw.githubusercontent.com/inbo/inbo-pyutils/master/gbif/gbif_name_match/sample.csv", trim_ws = TRUE, col_types = cols())

Take a look at the data:

kable(species_df)
name kingdom euConcernStatus
Alopochen aegyptiaca Animalia under consideration
Cotoneaster ganghobaensis Plantae NA
Cotoneaster hylmoei Plantae NA
Cotoneaster x suecicus Plantae NA
Euthamia graminifolia Plantae under preparation

Request taxonomic information

Given a data.frame, you can match the column containing the scientific name against GBIF Backbone Taxonomy, using the gbif_species_name_match function from the inborutils package. You need to pass a data.frame, df and a column name, name:

species_df_matched <- gbif_species_name_match(df = species_df, name = "name")
## [1] "All column names present"

As the name argument has "name" as default value, the code above is equivalent to:

species_df_matched <- gbif_species_name_match(species_df)

or using pipe %>%:

species_df_matched <- species_df_matched %>% gbif_species_name_match()

By default gbif_species_name_match returns the following GBIF fields: usageKey, scientificName, rank, order, matchType, phylum, kingdom, genus, class, confidence, synonym, status, family.

Take a look at the updated data:

kable(species_df_matched)
name kingdom euConcernStatus usageKey scientificName rank order matchType phylum kingdom1 genus class confidence synonym status family
Alopochen aegyptiaca Animalia under consideration 2498252 Alopochen aegyptiaca (Linnaeus, 1766) SPECIES Anseriformes EXACT Chordata Animalia Alopochen Aves 98 FALSE ACCEPTED Anatidae
Cotoneaster ganghobaensis Plantae NA 3025989 Cotoneaster ganghobaensis J.Fryer & B.Hylmö SPECIES Rosales EXACT Tracheophyta Plantae Cotoneaster Magnoliopsida 98 FALSE ACCEPTED Rosaceae
Cotoneaster hylmoei Plantae NA 3025758 Cotoneaster hylmoei K.E.Flinck & J.Fryer SPECIES Rosales EXACT Tracheophyta Plantae Cotoneaster Magnoliopsida 98 FALSE ACCEPTED Rosaceae
Cotoneaster x suecicus Plantae NA 3026040 Cotoneaster suecicus G.Klotz SPECIES Rosales EXACT Tracheophyta Plantae Cotoneaster Magnoliopsida 98 FALSE ACCEPTED Rosaceae
Euthamia graminifolia Plantae under preparation 3092782 Euthamia graminifolia (L.) Nutt. SPECIES Asterales EXACT Tracheophyta Plantae Euthamia Magnoliopsida 98 FALSE ACCEPTED Asteraceae

Notice that GBIF fields whose name is already used as column name are automatically renamed by adding suffix 1. In our case, input data.frame species_df contains already a column called kingdom. The GBIF kingdom values are returned in column kingdom1:

species_df_matched %>% select(kingdom, kingdom1)
## # A tibble: 5 x 2
##   kingdom  kingdom1
##   <chr>    <chr>   
## 1 Animalia Animalia
## 2 Plantae  Plantae 
## 3 Plantae  Plantae 
## 4 Plantae  Plantae 
## 5 Plantae  Plantae

You can also specify which GBIF fields you would like to have:

species_df %>% 
    gbif_species_name_match(
      gbif_terms = c(
        'scientificName', 
        'family',
        'order',
        'rank',
        'matchType',
        'confidence',
        'status')) %>%
    kable()
## [1] "All column names present"
name kingdom euConcernStatus scientificName family order rank matchType confidence status
Alopochen aegyptiaca Animalia under consideration Alopochen aegyptiaca (Linnaeus, 1766) Anatidae Anseriformes SPECIES EXACT 98 ACCEPTED
Cotoneaster ganghobaensis Plantae NA Cotoneaster ganghobaensis J.Fryer & B.Hylmö Rosaceae Rosales SPECIES EXACT 98 ACCEPTED
Cotoneaster hylmoei Plantae NA Cotoneaster hylmoei K.E.Flinck & J.Fryer Rosaceae Rosales SPECIES EXACT 98 ACCEPTED
Cotoneaster x suecicus Plantae NA Cotoneaster suecicus G.Klotz Rosaceae Rosales SPECIES EXACT 98 ACCEPTED
Euthamia graminifolia Plantae under preparation Euthamia graminifolia (L.) Nutt. Asteraceae Asterales SPECIES EXACT 98 ACCEPTED

The function inborutils::gbif_species_name_match is a wrapper around rgbif::name_backbone, so you can pass any argument of name_backbone. For example, you can set strict = TRUE to fuzzy match only the given names, but never a taxon in the upper classification:

species_df %>% 
    gbif_species_name_match(strict = TRUE) %>%
    kable()
## [1] "All column names present"
name kingdom euConcernStatus usageKey scientificName rank order matchType phylum kingdom1 genus class confidence synonym status family
Alopochen aegyptiaca Animalia under consideration 2498252 Alopochen aegyptiaca (Linnaeus, 1766) SPECIES Anseriformes EXACT Chordata Animalia Alopochen Aves 99 FALSE ACCEPTED Anatidae
Cotoneaster ganghobaensis Plantae NA 3025989 Cotoneaster ganghobaensis J.Fryer & B.Hylmö SPECIES Rosales EXACT Tracheophyta Plantae Cotoneaster Magnoliopsida 99 FALSE ACCEPTED Rosaceae
Cotoneaster hylmoei Plantae NA 3025758 Cotoneaster hylmoei K.E.Flinck & J.Fryer SPECIES Rosales EXACT Tracheophyta Plantae Cotoneaster Magnoliopsida 99 FALSE ACCEPTED Rosaceae
Cotoneaster x suecicus Plantae NA 3026040 Cotoneaster suecicus G.Klotz SPECIES Rosales EXACT Tracheophyta Plantae Cotoneaster Magnoliopsida 99 FALSE ACCEPTED Rosaceae
Euthamia graminifolia Plantae under preparation 3092782 Euthamia graminifolia (L.) Nutt. SPECIES Asterales EXACT Tracheophyta Plantae Euthamia Magnoliopsida 98 FALSE ACCEPTED Asteraceae

These are all accepted parameters of name_backbone: ‘rank’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘strict’, ‘verbose’, ‘start’, ‘limit’, ‘curlopts’. See ?name_backbone for more details.

For Python users, there is a similar (but no longer maintained) function in inbo-pyutils.