The massively parallel sequencing technology known as next-generation sequencing (NGS) has revolutionized the biological sciences. With its ultra-high throughput, scalability, and speed, NGS enables researchers to perform a wide variety of applications and study biological systems at a level never before possible. Today’s complex genomic research questions demand a depth of information beyond the capacity of traditional DNA sequencing technologies. Next-generation sequencing has filled that gap and become an everyday research tool to address these questions.
All NGS platforms sequence millions of small DNA fragments in parallel. Bioinformatics analyses then piece these fragments together by mapping the individual reads to the human reference genome. Each of the three billion bases in the human genome is sequenced multiple times, providing the depth needed for accurate data and insight into unexpected DNA variation. NGS can be used to sequence entire genomes or be restricted to specific areas of interest, including all 22 000 coding genes (a whole exome) or small numbers of individual genes.
Next-generation methods of DNA sequencing share three general steps:
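The relationship between read count, read length and depth can be sketched in a few lines of Python. This is a minimal illustration, not a real analysis pipeline: the function names, the 150-base read length and the 30x target depth are assumptions made for the example, and the toy `pileup_depth` maps reads by exact string matching, whereas real aligners tolerate mismatches and repeats.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Mean depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_depth: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach a target mean depth."""
    return math.ceil(target_depth * genome_size / read_length)


def pileup_depth(reference: str, reads: list[str]) -> list[int]:
    """Per-base depth after placing each read at its first exact match.

    Toy stand-in for read mapping; unmatched reads are skipped.
    """
    depth = [0] * len(reference)
    for read in reads:
        pos = reference.find(read)
        if pos == -1:
            continue
        for i in range(pos, pos + len(read)):
            depth[i] += 1
    return depth


# Human genome size from the text; read length and depth are assumed values.
HUMAN_GENOME_SIZE = 3_000_000_000
n_reads = reads_needed(30, 150, HUMAN_GENOME_SIZE)
depth = pileup_depth("ACGTTGCAAGCT", ["ACGTT", "GTTGC", "TGCAA", "AAGCT"])
```

Under these assumed values, the sketch shows why whole-genome runs need hundreds of millions of reads, while a targeted panel reaches the same depth with far fewer.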
- Library preparation: libraries are created using random fragmentation of DNA, followed by ligation with custom linkers
- Amplification: the library is amplified using clonal amplification methods and PCR
- Sequencing: DNA is sequenced using one of several different approaches
First, DNA is fragmented either enzymatically or by sonication (excitation using ultrasound) to create smaller strands. Adaptors (short, double-stranded pieces of synthetic DNA) are then ligated to these fragments with the help of DNA ligase, an enzyme that joins DNA strands. The adaptors enable each fragment to bind to a complementary counterpart.
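Library preparation can be mimicked on strings as a rough sketch. The adaptor sequences below are made-up placeholders, the fragment size range is arbitrary, and the function names are hypothetical; the code models only the bookkeeping of random fragmentation followed by adaptor attachment, not the chemistry.

```python
import random

# Placeholder adaptor sequences, invented for this illustration only.
LEFT_ADAPTOR = "AAAAACCCCC"
RIGHT_ADAPTOR = "GGGGGTTTTT"


def fragment(dna: str, min_len: int, max_len: int, rng: random.Random) -> list[str]:
    """Cut dna into consecutive pieces of random length in [min_len, max_len].

    The final piece may be shorter than min_len.
    """
    fragments = []
    start = 0
    while start < len(dna):
        size = rng.randint(min_len, max_len)
        fragments.append(dna[start:start + size])
        start += size
    return fragments


def ligate_adaptors(fragments: list[str],
                    left: str = LEFT_ADAPTOR,
                    right: str = RIGHT_ADAPTOR) -> list[str]:
    """Attach one adaptor to each end of every fragment."""
    return [left + frag + right for frag in fragments]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
genome = "".join(rng.choice("ACGT") for _ in range(200))
fragments = fragment(genome, 20, 40, rng)
library = ligate_adaptors(fragments)
```

Because every library molecule now carries the same known end sequences, a single primer pair can amplify all fragments regardless of their internal sequence.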
Adaptors are synthesized so that one end is ‘sticky’ while the other is ‘blunt’ (non-cohesive), so that the blunt end joins to the blunt-ended DNA. The sticky ends could, however, base pair between adaptor molecules and lead to dimer formation. To prevent this, the chemical structure of DNA is exploited: ligation takes place only between 3′-OH and 5′-P ends. Removing the phosphate from the sticky end of the adaptor creates a 5′-OH end instead, so DNA ligase is unable to form a bridge between the two termini.
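The ligation rule described above reduces to a simple check on the chemical groups at the two ends being joined. The following is a deliberately small sketch with a hypothetical function name; it encodes only the 3′-OH plus 5′-P requirement stated in the text.

```python
def can_ligate(three_prime_group: str, five_prime_group: str) -> bool:
    """Return True if ligase can join a 3' end to a 5' end.

    Per the rule in the text, joining requires a 3'-OH on one strand
    and a 5'-phosphate ("P") on the other.
    """
    return three_prime_group == "OH" and five_prime_group == "P"


# Blunt end of an adaptor meeting a phosphorylated DNA fragment: joined.
fragment_join = can_ligate("OH", "P")

# Two dephosphorylated sticky ends (5'-OH) meeting each other: not joined,
# which is how adaptor dimers are avoided.
dimer_join = can_ligate("OH", "OH")
```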
For sequencing to be successful, the library fragments need to be spatially clustered in PCR colonies, or ‘polonies’ as they are conventionally known, each of which consists of many copies of a particular library fragment. Since these polonies are attached in a planar fashion, the features of the array can be manipulated enzymatically in parallel. This method of library construction is much faster than the previous labour-intensive procedure of colony picking and E. coli cloning used to isolate and amplify DNA for Sanger sequencing. However, this speed comes at the expense of the read length of the fragments.
Library amplification is required so that the signal received by the sequencer is strong enough to be detected accurately. Enzymatic amplification can introduce phenomena such as ‘biasing’ and ‘duplication’, leading to preferential amplification of certain library fragments. Several types of amplification process use PCR to create large numbers of DNA clusters.
In emulsion PCR, emulsion oil, beads, PCR mix and the library DNA are mixed to form an emulsion, which leads to the formation of micro wells.
For the sequencing process to be successful, each micro well should contain one bead with one strand of DNA (approximately 15% of micro wells are of this composition). The PCR then denatures the library fragment, leading to two separate strands, one of which (the reverse strand) anneals to the bead. The annealed DNA is amplified by polymerase, starting from the bead and extending towards the primer site. The original reverse strand then denatures and is released from the bead, only to re-anneal to the bead to give two separate strands. These are both amplified to give two DNA strands attached to the bead. The process is then repeated over 30-60 cycles, leading to clusters of DNA.
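One way to see why only a minority of micro wells are usable is a toy loading model. The sketch below assumes, purely for illustration, that beads and DNA strands each land in micro wells independently following a Poisson distribution; this assumption and the function names are not from the source. Under that model the fraction of wells with exactly one bead and one strand peaks at an average of one of each per well.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of observing exactly k events at the given mean rate."""
    return rate ** k * math.exp(-rate) / math.factorial(k)


def usable_fraction(bead_rate: float, strand_rate: float) -> float:
    """Fraction of wells holding exactly one bead and exactly one strand.

    Assumes independent Poisson loading of beads and strands (a modeling
    assumption made only for this sketch).
    """
    return poisson_pmf(1, bead_rate) * poisson_pmf(1, strand_rate)


# Best case under the toy model: on average one bead and one strand per well.
best_case = usable_fraction(1.0, 1.0)

# Diluting the library lowers the usable fraction but also reduces wells
# containing more than one strand.
diluted = usable_fraction(1.0, 0.3)
```

In this toy model the best case is exp(-2), roughly 13.5% of wells, which is of the same order as the approximately 15% quoted above; real emulsions need not follow the Poisson assumption.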
In bridge PCR, the surface of the flow cell is densely coated with primers that are complementary to the primers attached to the DNA library fragments. The DNA is then attached to the surface of the cell at random, where it is exposed to reagents for polymerase-based extension. On addition of nucleotides and enzymes, the free ends of the single strands of DNA attach themselves to the surface of the cell via complementary primers, creating bridged structures. Enzymes then interact with the bridges to make them double stranded, so that when denaturation occurs, two single-stranded DNA fragments are attached to the surface in close proximity. Repetition of this process leads to clonal clusters of localized identical strands. To optimize cluster density, concentrations of reagents must be monitored very closely to avoid overcrowding.
Several competing methods of next-generation sequencing have been developed by different companies:
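The repeated copy-and-denature cycle can be approximated as capped geometric growth. This is a simplified sketch under stated assumptions: the per-cycle efficiency, the capacity (standing in for the number of nearby surface primers a cluster can occupy) and the function name are all hypothetical, and the model ignores the spatial layout of the flow cell.

```python
def cluster_size(cycles: int, efficiency: float, capacity: float) -> float:
    """Estimated strands in one cluster after a number of amplification cycles.

    Each cycle, every strand is copied with probability `efficiency`
    (1.0 means perfect doubling). Growth stops at `capacity`, a stand-in
    for the limited number of primers within reach of the cluster.
    """
    size = 1.0  # a cluster starts from a single attached strand
    for _ in range(cycles):
        size = min(capacity, size * (1.0 + efficiency))
    return size


# Perfect doubling for 10 cycles, well below the cap.
small = cluster_size(10, 1.0, 10_000)

# Many cycles: growth saturates at the available primer capacity.
saturated = cluster_size(35, 1.0, 1_000)
```

The cap is the point of the sketch: past a certain number of cycles, more amplification adds no signal, and seeding too many clusters per unit area makes neighbouring clusters compete for the same primers.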
- 454 Pyrosequencing
- Ion Torrent semiconductor sequencing
- Sequencing by ligation (SOLiD)
- Reversible terminator sequencing (Illumina)
  - 3′-O-blocked reversible terminators
  - 3′-unblocked reversible terminators
NGS technology has fundamentally changed the kinds of questions scientists can ask and answer. Innovative sample preparation and data analysis options enable a broad range of applications. For example, NGS allows researchers to:
- Rapidly sequence whole genomes
- Zoom in to deeply sequence target regions
- Utilize RNA sequencing (RNA-Seq) to discover novel RNA variants and splice sites, or quantify mRNAs for gene expression analysis
- Analyze epigenetic factors such as genome-wide DNA methylation and DNA-protein interactions
- Sequence cancer samples to study rare somatic variants, tumor subclones, and more
- Study the human microbiome and discover novel pathogens