OBIS eDNA services, expertise and data publication

Overview

Introduction to eDNA

Environmental DNA, or eDNA, refers to the genetic material that can be extracted from an environmental sample such as (sea)water, sediment, soil, or even air. Organisms shed this DNA as waste, mucus, or cells, and it can be analysed with a set of molecular tools to infer information about the organisms living in the sampled environment. The analyses range from presence/absence detection with taxon-specific genetic markers to the characterisation of whole communities or multi-species assemblages with broad-use markers. Sampling and analysing environmental DNA allows us to study the diversity of ocean life, from its smallest fraction invisible to the naked eye to its largest inhabitants that are nonetheless challenging to survey, and all the organisms that fall in between.

The cost-effective, ethical nature of eDNA sampling has the potential to revolutionise our knowledge about ecosystems and species diversity, unlock new insights for biodiversity studies and environmental monitoring, and provide reliable data to support decision-making. The potential for standardisation with well-documented protocols and supporting metadata represents an important step in generating and sharing biodiversity data that is globally accessible, comparable and reusable.

Summary of OBIS eDNA services

OBIS, the Ocean Biodiversity Information System, is actively engaged in eDNA research and the eDNA research community. Through its many projects, OBIS develops tools for the bioinformatic processing and analysis of eDNA data, works on the development of metadata standards, and can provide services to support partners in field sampling and lab work, and in making eDNA data findable, accessible, interoperable, and reusable (FAIR).

As a data publication platform, OBIS provides rigorous quality control and can publish occurrence data derived from quantitative (qPCR/ddPCR) or community-level (metabarcoding) eDNA approaches following community-approved data standards. This data can in turn be queried and accessed from OBIS to increase the reach and application of eDNA-based biodiversity data. The associated metadata not only provides the essential sampling information (longitude & latitude coordinates, depth, …) but allows publishers to link critical information on the environment sampled, the different protocols and methods used across the workflow (target gene, primer sequences, clustering or denoising approach, reference database used, …), the DOI of the published study the data comes from, as well as link back to the original sequence data deposited in INSDC databases (e.g. EMBL-EBI’s ENA or NCBI’s SRA).

Publishing eDNA data to OBIS

The guidebook on Publishing DNA-derived data through biodiversity data platforms, co-authored by OBIS and GBIF, provides a comprehensive view of why, where and how to publish DNA-derived biodiversity data. The OBIS manual further provides all the information needed to publish biodiversity data to OBIS, with a section dedicated to publishing DNA-derived data.

The data published to OBIS follows the Darwin Core (DwC) community data format, ensuring the data is interoperable across many databases, searchable, accessible and machine-readable. A detailed explanation of the terms is available in the Guidelines Document - 2.2 Data Mapping and on the dedicated DwC DNA derived data extension page. The OBIS resources page hosts a tool for finding the right DwC format. In short, DNA-derived data is published to OBIS as an Occurrence core dataset supplemented with the DNA derived data extension. The occurrence table captures information such as the occurrence ID, sampling date and time, location, ASV, read abundance, taxonomic assignment and the taxon’s corresponding WoRMS ID. The DNA derived data table in turn captures DNA-specific information such as the target gene, primers used, sequencing platform, bioinformatic parameters and the ASV sequence. It also contains the occurrence ID, which links each row back to the occurrence table.
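The link between the two tables can be illustrated with a minimal Python sketch. It assumes the standard Darwin Core term names (occurrenceID, eventDate, scientificNameID, organismQuantity) and the DNA derived data extension terms (target_gene, pcr_primer_forward, pcr_primer_reverse, seq_meth, DNA_sequence). All values, including the identifiers and sequences, are made-up placeholders, and the helper link_extension is hypothetical rather than part of any OBIS tool.

```python
# Occurrence core: one row per detected ASV in a sample (illustrative values only).
occurrences = [
    {
        "occurrenceID": "sample01_asv001",
        "eventDate": "2023-05-14",
        "decimalLatitude": -18.14,
        "decimalLongitude": 178.44,
        "scientificName": "Gadus morhua",
        "scientificNameID": "urn:lsid:marinespecies.org:taxname:126436",
        "organismQuantity": 152,
        "organismQuantityType": "DNA sequence reads",
    },
]

# DNA derived data extension: DNA-specific details, keyed by the same occurrenceID.
dna_derived_data = [
    {
        "occurrenceID": "sample01_asv001",
        "target_gene": "12S rRNA",
        "pcr_primer_forward": "ACGTACGTACGT",  # placeholder, not a real primer
        "pcr_primer_reverse": "TGCATGCATGCA",  # placeholder, not a real primer
        "seq_meth": "Illumina MiSeq",
        "DNA_sequence": "ACGTTGCAACGT",        # placeholder ASV sequence
    },
]


def link_extension(occurrence_rows, extension_rows):
    """Join extension rows to their occurrence rows via occurrenceID.

    Raises ValueError if an extension row points at an unknown occurrenceID,
    which would indicate a broken link between the two tables.
    """
    by_id = {row["occurrenceID"]: row for row in occurrence_rows}
    merged = []
    for ext in extension_rows:
        occ = by_id.get(ext["occurrenceID"])
        if occ is None:
            raise ValueError(f"No occurrence row for {ext['occurrenceID']}")
        merged.append({**occ, **ext})
    return merged


linked = link_extension(occurrences, dna_derived_data)
```

Checking that every extension row resolves to an occurrence row before publishing is a cheap way to catch mismatched identifiers early.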

A series of video tutorials is available on the OBIS YouTube Channel to help you prepare and publish your datasets. These include the series How To Use & Publish data with OBIS, which explains how to format your datasets to follow DwC criteria. OBIS also organised a webinar on genetic data, with an introduction to how OBIS is incorporating genetic data, how that data can be accessed, and a use case from the first eDNA dataset provided by OBIS-USA. The recording of the webinar can be watched here.

Finally, OBIS has an example eDNA metabarcoding dataset with scripts for data formatting available on the OBIS GitHub page.

OBIS vs GBIF: marine-specific guidelines tailored for OBIS

OBIS (the Ocean Biodiversity Information System) and GBIF (the Global Biodiversity Information Facility) are both global biodiversity data-sharing platforms and have a joint strategy and action plan to ensure cross-platform flows and services of high-quality data about marine and coastal biodiversity. Both platforms use the same guidelines, the same Darwin Core community data format, and the same Integrated Publishing Toolkit (IPT) to publish and register datasets. When publishing a dataset through either an OBIS or a GBIF node, selecting the option to register the data with GBIF or OBIS ensures that the data flows from one platform to the other.

OBIS is focused on the marine realm and marine taxa, whereas GBIF publishes data across both terrestrial and marine realms. Because of this marine focus, OBIS applies stringent quality controls to the data it publishes, which increases the reliability of the data and leads to small differences in what information is required for publishing in OBIS as opposed to GBIF. The two main differences concern the taxonomic backbone used and the checks applied to geographic coordinates.

Accessing eDNA data from OBIS

Data in general, and DNA-derived data specifically, can be queried and downloaded from OBIS through the online mapper or the R package robis. This is also covered in the OBIS manual under “How to find genetic data in OBIS”.

The mapper tool allows you to query biodiversity data on OBIS by searching for data using specific criteria such as scientific name or geographic area. eDNA-based biodiversity data can also specifically be searched for by specifying the option for DNAderiveddata under the Extensions tab. As of the day of publishing this article, the resulting data layer holds 19,815,140 records of 5,226 species and 8,694 taxa across 51 datasets.

Once this layer is added, the data can be downloaded either all together as a Darwin Core Archive, or you can access each published dataset separately and download the original Darwin Core formatted files. If the dataset originates from a published study, the DOI of that paper is provided to link to all the necessary information.

Genetic data can also be accessed from OBIS using the R package robis. Instructions for using the robis package to access DNA derived data can be found in the OBIS manual and the dedicated vignette.

Both the mapper and the R package are based on the OBIS API, which can also be used to find and download data. When using the API directly, you can filter the Occurrence records by specifying the extension DNADerivedData. You can further search through this e.g. by scientific name, date, depth, coordinates, or country.
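As a rough illustration of such an API query, the Python sketch below only builds a request URL and does not contact the server. It assumes the v3 occurrence endpoint at https://api.obis.org/v3/occurrence and the query parameters hasextensions (keep only records that have the extension), extensions (include the extension content in the response), scientificname, startdate, enddate and size. These names should be checked against the OBIS API documentation before use, and the helper build_dna_query is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the OBIS occurrence endpoint (verify against the API docs).
OBIS_OCCURRENCE_URL = "https://api.obis.org/v3/occurrence"


def build_dna_query(scientificname=None, startdate=None, enddate=None, size=100):
    """Build a URL that asks for occurrence records carrying DNADerivedData."""
    params = {
        "hasextensions": "DNADerivedData",  # filter: record must have the extension
        "extensions": "DNADerivedData",     # include the extension rows in the result
        "size": size,                       # number of records per request
    }
    # Add the optional filters only when the caller supplied them.
    if scientificname is not None:
        params["scientificname"] = scientificname
    if startdate is not None:
        params["startdate"] = startdate
    if enddate is not None:
        params["enddate"] = enddate
    return f"{OBIS_OCCURRENCE_URL}?{urlencode(params)}"


url = build_dna_query(scientificname="Gadus morhua", startdate="2020-01-01")
```

The resulting URL can be opened in a browser or passed to any HTTP client; keeping URL construction separate from the request makes the filters easy to inspect and reuse.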

Examples of eDNA datasets hosted on OBIS

There are currently 51 eDNA datasets publicly available on OBIS, amounting to 19,815,140 records of 5,226 species and 8,694 taxa.

A few examples:

eDNA services and resources developed by OBIS

A set of different tools and services were developed within the different eDNA projects led or supported by OBIS. These cover the different steps across an eDNA workflow, ranging from developing sampling protocols for citizen science, to developing bioinformatics pipelines to process raw eDNA sequences and enable the export of eDNA-derived biodiversity data ready to be uploaded to OBIS and accessed by users worldwide.

In addition to these services already developed, OBIS can guide partners across these steps in designing and carrying out eDNA studies as well as develop new tools depending on the project’s need.

Citizen Science eDNA Sampling Protocols

As part of the eDNA expeditions project, the OBIS team collaboratively developed citizen science protocols for sampling eDNA. An instruction video, sampling booklets detailing the protocol in six languages and sample information sheets and infographics can be downloaded under the eDNA expeditions Training Materials.

Bioinformatics pipeline: from raw sequences to species-annotated data in OBIS

OBIS developed a bioinformatics pipeline for processing raw metabarcoding sequences under the PacMAN project. Broadly speaking, it provides a framework that receives raw sequence data from eDNA samples, cleans, aligns and classifies the sequences, and finally outputs a DwC-compatible table. Note that the pipeline automatically looks up AphiaIDs in WoRMS to include in the DwC-compatible tables. In addition, the output contains a phyloseq object, which is compatible with the commonly used phyloseq R package for sequence data analysis. The pipeline is under active development. More details about the PacMAN pipeline can be found in its associated GitHub repository.
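The final export step can be pictured with a small Python sketch. This is not the PacMAN pipeline's code: the function asv_table_to_occurrences, its input shapes, and the sample and ASV identifiers are all hypothetical, chosen only to show how per-sample read counts and taxonomic assignments might be reshaped into DwC-style occurrence rows. The one fixed convention used is the WoRMS LSID pattern urn:lsid:marinespecies.org:taxname:<AphiaID> for scientificNameID.

```python
def asv_table_to_occurrences(read_counts, taxonomy):
    """Reshape an ASV read-count table into DwC-style occurrence rows.

    read_counts: {sample_id: {asv_id: number_of_reads}}
    taxonomy:    {asv_id: (scientific_name, aphia_id or None)}
    """
    rows = []
    for sample_id, counts in read_counts.items():
        for asv_id, reads in counts.items():
            if reads == 0:
                continue  # an ASV with no reads in a sample yields no record
            name, aphia_id = taxonomy[asv_id]
            # Build the WoRMS LSID only when an AphiaID was found.
            name_id = (
                f"urn:lsid:marinespecies.org:taxname:{aphia_id}"
                if aphia_id is not None
                else ""
            )
            rows.append(
                {
                    "occurrenceID": f"{sample_id}_{asv_id}",
                    "eventID": sample_id,
                    "scientificName": name,
                    "scientificNameID": name_id,
                    "organismQuantity": reads,
                    "organismQuantityType": "DNA sequence reads",
                    "occurrenceStatus": "present",
                    "basisOfRecord": "MaterialSample",
                }
            )
    return rows


# Illustrative input: two samples, two ASVs, one of them without an AphiaID.
example_counts = {"s1": {"asv1": 10, "asv2": 0}, "s2": {"asv1": 3, "asv2": 7}}
example_taxonomy = {"asv1": ("Gadus morhua", 126436), "asv2": ("Biota", None)}
example_rows = asv_table_to_occurrences(example_counts, example_taxonomy)
```

Each sample and ASV combination with at least one read becomes its own occurrence, and the occurrenceID built here is the value a matching DNA derived data row would repeat.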

Analytical tools

Capacity building and training

OBIS provides training on eDNA sampling and analysis methods. These can e.g. be tailored towards scientists, practitioners, site managers or policy makers.

A first training course was organised in Fiji in 2023 under the PacMAN project. The focus was on eDNA sampling, DNA extraction, qPCR for the detection of invasive species, and bioinformatic processing of metabarcoding data. All information and necessary links can be found on the PacMAN project website under the “Marine Invasive Species Early Detection: Utilising Molecular Tools” training course page.

OBIS projects developing eDNA research, tools and standards

OBIS is currently leading two large-scale research and surveillance projects: one engaging citizens globally to survey biodiversity across 25 marine UNESCO World Heritage sites (eDNA expeditions), and one developing information networks for the surveillance of marine invasive species in the Pacific (PacMAN). OBIS also participates in two Horizon Europe projects, one reviewing bioinformatic workflows (MARCO-BOLO) and one rethinking the eDNA data publishing infrastructure and metadata standards (eDNAqua-Plan).

eDNA expeditions: https://www.unesco.org/en/edna-expeditions

PacMAN: https://pacman.obis.org/

eDNAqua-Plan: https://ednaquaplan.com/

MARCO-BOLO: https://marcobolo-project.eu/