Why Screen Sequences 200 bp at a Time?

Screening in 200 bp segments makes it possible to identify, cost-effectively, sequences of concern that have been inserted into larger sequences.

Sequence screening is an essential component of any biosecurity screening program. Although the cost of synthesizing a sequence keeps falling, screening sequences remains costly and time-consuming. A window of 200 base pairs (bp) is currently regarded as the point where accuracy and cost are best balanced.

What is a DNA Sequence?

A sequence is simply a string of A, C, T, and G nucleotides that makes up part or all of a DNA molecule. A sequence of concern may be embedded in a larger, benign sequence in an attempt to escape detection. Sequence screening tools need to be able to identify any embedded sequences of concern, whether they were embedded on purpose or not. In conjunction with customer screening, this information can then be used to determine whether the order should be filled.

What is a Base Pair?

DNA is a double helix made of two strands. Each strand has a backbone of alternating phosphate and sugar groups, and each sugar carries one of four nitrogenous bases: A, C, T, or G. Bases on opposite strands pair with each other, and these base pairs sit between the two backbones like the rungs of a ladder. The bases can only pair in specific ways: adenine (A) with thymine (T), and cytosine (C) with guanine (G). Depending on how the base pairs are ordered in a sequence, some stretches may code for a gene while others code for nothing. Typically, genes that code for a protein are at least 200 bp long.
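The pairing rule above can be written down in a few lines of Python. This is a minimal sketch for illustration only: the function name `reverse_complement` is an assumption made here, not something taken from the guidance or from any screening product. It shows that, because A pairs only with T and C only with G, one strand fully determines the other.

```python
# Pairing rules from the text: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of `seq`, read in its own direction.

    Hypothetical helper for illustration. Raises KeyError on any
    character other than A, C, G, or T.
    """
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


print(reverse_complement("ATGC"))  # GCAT
```

Since the same double-stranded molecule can be written from either strand, a screening tool would plausibly want to check both an ordered sequence and its reverse complement.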


Image: DNA sequences and text

Why Screen 200 bp?

In 2010, the U.S. Department of Health and Human Services (HHS) published its Screening Framework Guidance for Providers of Synthetic Double-Stranded DNA. This guidance provided the framework for voluntary biosecurity screening programs. Among other things, it called for screening 200 bp of DNA at a time against a sequence reference database. A window of 200 bp was thought to be a good lower limit for identifying sequences of concern, since genes are typically at least 200 bp long. Since then, 200 bp has been the accepted standard for a sequence screening window.

It is possible to screen each ordered sequence using a window smaller than 200 bp, but doing so tends to produce more hits and, most likely, more false positives. Filtering through these positives would be both time-consuming and expensive. Instead, the current guidance to screen 200 bp in a sliding, iterative process produces results that allow those screening to identify or rule out sequences of concern with a higher level of confidence. In fact, the Harmonized Screening Protocol v2.0 promoted by the International Gene Synthesis Consortium (IGSC), an international group that represents more than 80% of the world's gene synthesis capacity, also calls for screening segments of 200 bp.
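The sliding, iterative process described above can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the method used by HHS, the IGSC, or any screening product: the names `sliding_windows` and `screen` are hypothetical, the one-base step size is an assumption, and the exact string match stands in for the similarity search against a reference database that a real tool would perform. The sequences are made-up placeholders.

```python
from typing import Iterator, List, Set, Tuple

WINDOW = 200  # window length in bp, as in the guidance discussed above


def sliding_windows(seq: str, size: int = WINDOW,
                    step: int = 1) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) for every full-length window in `seq`."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


def screen(order: str, concern_windows: Set[str],
           size: int = WINDOW) -> List[int]:
    """Return start positions whose window exactly matches a reference window.

    Exact matching is a simplification made for this sketch only.
    """
    return [start for start, window in sliding_windows(order.upper(), size)
            if window in concern_windows]


# Placeholder data: a made-up 200 bp "sequence of concern" embedded
# between two made-up benign flanks of 50 bp and 30 bp.
concern = "ATGGCTAGCT" * 20
order = "A" * 50 + concern + "C" * 30

hits = screen(order, {concern})
print(hits)  # [50]
```

The embedded segment is found at position 50 even though it sits inside a longer sequence, which is the situation a sliding window is meant to handle. In this sketch, an order shorter than the window produces no windows at all, so nothing is compared.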

As the field becomes more sophisticated and facile at joining sequences, there have been calls to screen shorter sequences, especially in the context of oligo pools. The rationale is that bad actors may soon be able to assemble sequences shorter than 200 bp into the longer sequence they want. Those involved in research and projects that make use of DNA sequences are well aware of this possibility. For this reason, discussion and consideration of a change to a different standard are ongoing.

Posted: July 09, 2020
Author: Neeraj Rao