Sequence reference databases consist of entries for genomic, transcript, and protein sequences.
Biosecurity sequence reference databases are an essential component of biosecurity screening programs. Control lists name organisms with the potential to pose a severe threat to public, animal, or plant health. Sequence reference databases, by contrast, contain published nucleic acid and protein sequences regardless of whether those sequences are potentially harmful.
What is a Sequence?
A DNA sequence is simply a series of the nucleotide bases A, C, G, and T. Sequences can be relatively short segments that code for a single protein, or long sequences that make up the genome of an organism. Most sequences are benign and code for no harmful proteins, but some code for a protein of concern. A single sequence can contain both benign regions and regions of concern, which can make it very challenging to determine whether it could be used for malicious purposes.
Sequence reference databases help determine what a sequence is, based on the data they hold about each entry. Most scientific journals require that sequences included in research reports be deposited in a public sequence database. The associated data generally include the name of the species, publications related to the sequence, and other pertinent information that helps characterize the nature of the sequence.
What is the INSDC?
Three partner databases (GenBank®, ENA, and DDBJ) make up the International Nucleotide Sequence Database Collaboration (INSDC), and together their records form the International Nucleotide Sequence Database (INSD). The INSDC website links to each partner's data submission and retrieval tools. Because the three databases exchange data daily, their contents are nearly identical, and the most recent data is freely available to everyone.
What are the Partner Databases?
National Center for Biotechnology Information (NCBI): The NCBI, in conjunction with the NIH, maintains GenBank®, one part of the INSDC. A new GenBank® release is issued every two months. GenBank® data can be accessed through a variety of means, including the INSDC.
European Molecular Biology Laboratory (EMBL) - European Bioinformatics Institute (EBI): The EMBL-EBI is part of EMBL and maintains the European Nucleotide Archive (ENA). The ENA makes a comprehensive record of the world's nucleotide sequencing information available, including raw sequencing data, assembled sequences with annotation, and data from the major European sequencing centers.
DNA Data Bank of Japan (DDBJ): As part of the INSDC, the DDBJ collects nucleotide sequence data and assigns a unique accession number to each submitted sequence. It also makes its supercomputer system available for life science research. The majority (99%) of the data submitted through the DDBJ comes from Japanese researchers.
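As one illustration of programmatic access to GenBank® records, NCBI's Entrez Programming Utilities (E-utilities) expose an `efetch` endpoint that returns records by accession. The sketch below only builds the request URL; the helper name `genbank_fetch_url` and the accession `XX000001.1` are hypothetical placeholders, not part of any NCBI library.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities efetch endpoint.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_fetch_url(accession: str, rettype: str = "fasta") -> str:
    """Build an efetch URL for one nucleotide record (hypothetical helper).

    db="nuccore" selects NCBI's nucleotide database; rettype="fasta" asks for
    the bare sequence, while rettype="gb" asks for the GenBank flat file with
    its organism, reference, and annotation fields.
    """
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": rettype,
        "retmode": "text",
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


# Placeholder accession used only for illustration.
print(genbank_fetch_url("XX000001.1"))
```

The resulting URL could be passed to `urllib.request.urlopen` to download the record. NCBI's usage guidelines ask clients without an API key to stay at or below three requests per second, so any batch retrieval should be throttled accordingly.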
How are these Databases Used?
In a biosecurity screening framework, both the customer placing an order and the sequence being ordered are screened. The customer is screened to verify their identity and legitimacy. The sequence is screened against sequence reference databases to see whether it is associated with any organism on the control lists.
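The idea of checking an ordered sequence against reference entries can be illustrated with a deliberately simplified sketch. Real screening pipelines generally rely on homology search tools such as BLAST and on curated databases; this example swaps in exact k-mer overlap instead. The function names, the k-mer length `k`, the `threshold`, and the toy reference sequences are all hypothetical, and this is not how ThreatSEQ or any specific service works.

```python
from typing import Dict, List, Set


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string; characters other than A/C/G/T
    (for example N) are left unchanged."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """Set of all length-k substrings of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_sequence(query: str, references: Dict[str, str],
                    k: int = 8, threshold: float = 0.5) -> List[str]:
    """Return names of reference entries sharing at least `threshold` of the
    query's k-mers on either strand (hypothetical scoring rule)."""
    # An ordered DNA fragment could match a reference on either strand.
    strands = [kmers(query, k), kmers(reverse_complement(query), k)]
    strands = [s for s in strands if s]
    if not strands:
        return []  # query shorter than k: nothing to compare
    hits = []
    for name, ref_seq in references.items():
        ref_kmers = kmers(ref_seq, k)
        # Fraction of the query's k-mers found in this reference, best strand.
        best = max(len(s & ref_kmers) / len(s) for s in strands)
        if best >= threshold:
            hits.append(name)
    return hits


# Toy, made-up reference entries (not real organisms or real sequences).
references = {
    "control_list_entry_A": "ACGTACGTTGCAGGCTAGCTAGGATCCGATCGATTACG",
    "unrelated_entry_B": "TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT",
}
print(screen_sequence("GCAGGCTAGCTAGGATCC", references))  # ['control_list_entry_A']
```

Exact k-mer matching is easy to follow but misses diverged or deliberately altered sequences, which is one reason production screening favors alignment-based similarity searches and expert review of any hit.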
Battelle’s cloud-based, commercially available software, ThreatSEQ, adds a layer of analytics to the screening process. ThreatSEQ screens for all organisms on the U.S. Select Agents and Australia Group lists (Tier 1) and compares sequences against the full genomes of organisms drawn from the NCBI and other select agent registries worldwide. This analytical tool and service for rapid DNA sequence screening also uses additional information related to virulence, antibiotic resistance, immune evasion factors, human bioregulators, protein toxins, and other threat factors to assign a threat score to each sequence. The results are provided in a concise report that can be used to quickly determine whether follow-up is necessary.
The INSDC provides oversight for the INSD and, by extension, for GenBank®, the ENA, and the DDBJ. These international databases make it possible to screen sequences so that sequences of concern and sequences of very high concern are flagged and investigated before any material is shipped.