Introduction to DNA Sequencing

Learn the basics of DNA sequencing with this introductory article on Sanger, Next Generation and Long-read Sequencing.

What is DNA Sequencing?

Deoxyribonucleic acid (DNA) contains the genetic instructions that all organisms use to function. These instructions are encoded within a string of nucleotides consisting of the four nucleotide bases: adenine (A), thymine (T), guanine (G), and cytosine (C). These bases can be translated into genes and other elements that govern what goes on inside a cell. To study these genetic elements, scientists use DNA sequencing to decipher the specific order of these bases.

DNA sequencing has fueled many types of biological inquiry. DNA sequencing can:

  • Help evolutionary biologists understand how organisms or genes are related to one another and how they evolve over time. ➤ Understanding Phylogenetics
  • Help biologists study individual genes and proteins by verifying the accuracy of molecular cloning experiments. ➤ Introduction to Molecular Cloning
  • Identify the mutations behind certain diseases
  • Diagnose genetic disease by sequencing a patient’s DNA
  • Diagnose infections by detecting and/or sequencing bacterial or viral DNA in patient samples

DNA Sequencing Methods

DNA sequencing methods have evolved tremendously over the last several decades, enabling many advancements in biological research and diagnostics (Figure 1). Scientists classify DNA sequencing technologies into different generations based on their throughput and approach.

The first generation of DNA sequencing methods, developed in the late 1970s, includes Maxam-Gilbert sequencing and the popular Sanger sequencing method. Maxam-Gilbert sequencing can read up to 400 bases, and Sanger sequencing up to 1,000. Applied Biosystems automated and commercialized Sanger sequencing in 1987. The Human Genome Project generated the first sequence of the human genome using Sanger sequencing in 2003.

While first generation techniques only sequence one DNA fragment at a time, second generation sequencing methods, commercialized in the early 2000s, sequence millions to billions of DNA fragments simultaneously. These techniques are also called next generation sequencing (NGS) or massively parallel sequencing because they sequence many fragments at once. They generate reads (the sequence of bases inferred from one DNA fragment) of ~50-500 bases in length. Biologists can use these methods to assemble short reads into a larger DNA sequence or compare reads to a known genome.

➤ Assembling DNA Sequences

First generation and second generation sequencing methods are often referred to as short-read sequencing because their reads are short: typically less than 500 bases for NGS and less than 1,000 bases for Sanger sequencing. In contrast, third generation sequencing, also known as long-read sequencing and developed in the early 2010s, provides reads that are each over 10,000 bases, or 10 kilobases (kb), in length.

Figure 1. History of sequencing technologies.

When choosing a sequencing strategy, you can combine multiple methods to overcome some of the limitations of each method on its own. For example, because of Sanger sequencing’s high accuracy, you can use it to confirm specific results from NGS. Combining NGS and long-read sequencing can improve sequence assembly, particularly when there are lengthy repetitive regions and structural variations.

Sanger Sequencing

What is Sanger sequencing? 

Frederick Sanger et al. developed Sanger sequencing, also known as chain-termination sequencing, in 1977. This technique uses electrophoresis and the incorporation of chain-terminating dideoxynucleotides during in vitro DNA replication. 

How does Sanger sequencing work? 

Sanger sequencing requires a DNA primer, DNA polymerase, deoxynucleotide triphosphates (dNTPs), and chain-terminating dideoxynucleotide triphosphates (ddNTPs). The Sanger sequencing reaction contains multiple DNA fragments of the same sequence and proceeds as follows:

1. Denaturation and Annealing

Sanger sequencing begins by separating the double-stranded DNA fragments into two single-stranded DNA fragments (denaturation). Then, an oligonucleotide primer (also known as a sequencing primer) binds to the single-stranded DNA based on complementarity between the primer and the DNA sequence. 

Figure 2. Double-stranded DNA fragments unwind into two single-stranded fragments.
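
The annealing step can be pictured with a short Python sketch. It is only a toy model, assuming perfect base pairing and made-up sequences: the primer binds wherever its reverse complement appears on the single-stranded template. The function names and the example template and primer are hypothetical.

```python
# Toy model of primer annealing: a primer pairs with the template where the
# template contains the primer's reverse complement (perfect matches only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_annealing_site(template: str, primer: str) -> int:
    """Return the 0-based template index where the primer pairs, or -1."""
    return template.find(reverse_complement(primer))


template = "ATGGCCATTGTAATGGGCCGC"  # hypothetical template strand, 5'->3'
primer = "GCGGCCC"                  # hypothetical sequencing primer, 5'->3'
site = find_annealing_site(template, primer)
print(f"Primer anneals at template position {site}")
```

Real primers tolerate some mismatches and bind according to temperature and salt conditions, which this sketch ignores.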

2. Extension

Next, DNA polymerase extends the primer using a mixture of dNTPs and ddNTPs. If a dNTP is added, extension continues. If a chain-terminating nucleotide (ddNTP) is added, extension stops (Figure 3). Because the polymerase incorporates chain-terminating nucleotides at random throughout the process, the resulting DNA fragments at the end of the reaction vary in length. Each ddNTP also includes a different fluorescent marker, which helps in the visualization step.

Figure 3. The extension step of Sanger sequencing results in DNA fragments of different lengths.
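
To see why random termination produces fragments of every length, here is a minimal simulation. It is a sketch under stated assumptions, not a model of real reaction kinetics: the template sequence, the 10% chance of incorporating a ddNTP at each position, and the number of molecules are all invented for illustration.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_extension(template, n_molecules=2000, ddntp_fraction=0.1, seed=0):
    """Return the chain-terminated fragments from many copies of one template.

    The template is written in the order the polymerase reads it, so each
    new strand grows from left to right.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_molecules):
        strand = ""
        for base in template:
            strand += COMPLEMENT[base]
            # Assumption: a fixed chance that the added base is a ddNTP.
            if rng.random() < ddntp_fraction:
                fragments.append(strand)  # extension stops at this base
                break
        # Molecules that never incorporate a ddNTP carry no dye; skip them.
    return fragments


template = "ATGCCGTAAGCT"  # hypothetical template
fragments = simulate_extension(template)
print(sorted({len(f) for f in fragments}))
```

Every fragment is a prefix of the same new strand, and its final base is the labeled ddNTP that stopped extension.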

3. Separation

Amplified DNA fragments are then separated by size using capillary electrophoresis. In this process, smaller fragments travel faster than larger ones (Figure 4). This step arranges the DNA fragments in size order, with the smallest fragment migrating the fastest and reaching the detector first.

4. Visualization

The sequence is determined by visualizing the fluorescent tag incorporated at each position of the DNA sequence. As the DNA fragments move past the detector, a laser excites the fluorescent tags, and a camera records the emitted light (Figure 4).

Figure 4. Size separation of DNA fragments by capillary electrophoresis and visualization of fluorescence markers.

Because each ddNTP carries a different fluorescent marker, each of the four bases produces a distinct signal. The result is a chromatogram, which plots the fluorescence intensity as the DNA fragments pass the detector. Because fragments arrive in order of length, each fluorescence peak corresponds to one position along the DNA molecule.

Figure 5. Chromatogram of Sanger sequencing in Geneious Prime. In the chromatogram, each peak represents the fluorescence intensity at each position along the DNA, with the inferred sequence below the plot.
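
The logic of the separation and visualization steps can be sketched in a few lines of Python. This is an idealized illustration, assuming every fragment length is present and every dye is read correctly. The fragment list and the function name are hypothetical.

```python
def read_sequence(fragments):
    """Infer the sequence from chain-terminated fragments.

    Fragments are ordered by length, as in capillary electrophoresis, and the
    last base of each fragment stands in for the dye on its terminating ddNTP.
    """
    terminal_base_by_length = {}
    for fragment in fragments:
        terminal_base_by_length.setdefault(len(fragment), fragment[-1])
    return "".join(
        terminal_base_by_length[length]
        for length in sorted(terminal_base_by_length)
    )


# Hypothetical fragments in no particular order, with one duplicate.
fragments = ["TAC", "T", "TACGGT", "TA", "TACG", "TACGG", "TA"]
print(read_sequence(fragments))
```

In a real chromatogram the peaks overlap and vary in height, so software assigns a quality score to each base call rather than reading the dyes directly.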

When is Sanger sequencing used? 

Sanger sequencing is best for sequencing less than 1 kb of sequence at a time. It suits projects with a small number of targets, because sequencing a large number of samples with this method can get expensive and time-consuming. Because Sanger sequencing is highly accurate, it remains widely used. It’s also the gold standard in many applications and is used to confirm next-generation sequencing or long-read sequencing results.

Pros and cons of Sanger sequencing

Pros

  • Simpler data analysis compared to NGS
  • Affordable for a small number of samples
  • Highly accurate
  • Longer reads compared to NGS (up to ~1 kb), which can help improve accuracy and the assembly of repetitive regions

Cons

  • Requires a larger amount of input DNA
  • Low throughput
  • Unaffordable or impractical for sequencing a large number of samples

Sanger sequencing in Geneious Prime

Geneious Prime allows for powerful visualization and analysis of Sanger sequencing chromatograms, including: automated assembly of forward and reverse pairs, identifying sequence variants at a single base (single nucleotide polymorphisms, or SNPs), and detecting heterogeneity within the sample. Find tutorials and videos below to learn more about using Geneious Prime for your Sanger sequencing data.

Assembling Chromatograms Tutorial: Step-by-step guide with example exercises on editing and assembling chromatograms.

Intro to Sanger Sequencing Analysis: Video series demonstrating how to map chromatogram sequence against a reference sequence and how to call SNPs and variants using the Find Variations/SNPs tool.

Learn more about Sanger sequencing analysis features in Geneious Prime.

Next-Generation Sequencing (NGS)

What is next-generation sequencing?

In contrast to Sanger sequencing, which provides sequence data for one DNA fragment at a time, NGS is a high throughput method that generates millions to billions of 50-500 bp reads per run. These reads can be pieced together in a process called sequence assembly.

How does next-generation sequencing work? 

The NGS process can be condensed into four basic steps: (1) Nucleic acid extraction, (2) Library preparation, (3) Sequencing, and (4) Analysis. Different NGS platforms require slightly different methodologies to prepare the DNA for sequencing, which can affect the sequencing steps. There are many NGS techniques, and we go into more detail on sequencing by synthesis (e.g. Illumina) below.

1. Nucleic acid extraction

NGS begins by isolating the DNA or RNA from the sample of interest using a commercially available kit. Kit selection depends on the sample source (e.g. bacteria, tissue, or blood). If the nucleic acid extracted is RNA, use reverse transcriptase to convert it to DNA before library preparation.

Figure 6. Nucleic acid extraction.

2. Library preparation

While library preparation methods can differ based on the sequencing platform, they generally result in a pool of fragmented DNA with adapter sequences attached to both ends of each fragment.

The first step here is to fragment the DNA into pieces less than 1 kb in length. Alternatively, if the sequence of interest is within a known region, researchers can amplify that region of DNA by PCR to generate the DNA fragments.

After fragmentation, adapter sequences are added to the ends of the DNA fragments. These sequences immobilize DNA fragments to a flow cell and contain sequences for sequencing primer attachment. If you plan to sequence multiple samples together in the same flow cell, adding a different DNA barcode per sample allows you to distinguish the reads obtained from each sample. Depending on the amount of DNA extracted, you may or may not need to amplify the library to generate additional copies of the DNA.

Figure 7. Library preparation results in DNA fragmentation and the addition of adapters to the DNA.
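
Sorting pooled reads back into their samples by barcode is often called demultiplexing. The sketch below shows the idea under simplifying assumptions: the barcode sits at the very start of each read and must match exactly. Some platforms read the barcode in a separate index read instead. The barcodes, sample names, and reads are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by sample using the barcode at the start of each read.

    The barcode is trimmed from assigned reads. Reads with no known barcode
    are collected under "unassigned".
    """
    reads_by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                reads_by_sample[sample].append(read[len(barcode):])
                break
        else:
            reads_by_sample["unassigned"].append(read)
    return dict(reads_by_sample)


barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}             # hypothetical
reads = ["ACGTGGGAAA", "TGCATTTCCC", "ACGTCCC", "GGGGAAAA"]  # hypothetical
for sample, sample_reads in demultiplex(reads, barcodes).items():
    print(sample, sample_reads)
```

Production tools usually allow one or two mismatches in the barcode to tolerate sequencing errors.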

3. Sequencing

Next, the DNA library is denatured and attached to the flow cell via the adapter sequences complementary to oligonucleotides on the flow cell. The flow cells contain billions of different spots to allow simultaneous sequencing of billions of fragments. The immobilized DNA sequences are amplified via repeated PCR cycles to generate thousands of copies of each original fragment, creating what is called a cluster. These dense clusters of identical sequence create a strong fluorescent signal that can be detected during the following step: sequencing by synthesis.

Sequencing by synthesis requires a sequencing primer complementary to the adapter sequence, DNA polymerase, and reversible chain-terminating nucleotides that each carry a fluorescent dye. Each of the four bases carries a different label. Because the 3’-OH group of each nucleotide is blocked, DNA polymerase can only add one base at a time. After addition, the fluorescence is read. Then, the 3’-OH is regenerated and the fluorescent dye removed, allowing the addition of the next base. Sequencing can occur from both ends of the fragment (paired-end sequencing) or from one end of the fragment (single-end sequencing). The choice between paired-end vs. single-end sequencing has implications for sequence assembly.

Figure 8. Sequencing, based on the incorporation of fluorescently labeled nucleotides, gives a fluorescent signal after each base addition. Each base has a different fluorescent label.
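
The cycle of adding one base, imaging, and unblocking can be written as a simple loop. This is a toy sketch that assumes error-free chemistry. The dye colors are invented placeholders and do not describe any particular instrument.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical dye colors, one per base, chosen only for illustration.
DYE_FOR_BASE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}


def sequence_by_synthesis(template, cycles):
    """Return the read produced by `cycles` rounds of sequencing by synthesis.

    The template is written in the order the polymerase copies it.
    """
    read = []
    for cycle in range(min(cycles, len(template))):
        # 1. The polymerase adds exactly one blocked, labeled nucleotide.
        incorporated = COMPLEMENT[template[cycle]]
        # 2. The cluster is imaged and the dye color is recorded.
        signal = DYE_FOR_BASE[incorporated]
        # 3. The color is translated into a base call.
        read.append(BASE_FOR_DYE[signal])
        # 4. The dye is removed and the 3'-OH regenerated for the next cycle.
    return "".join(read)


print(sequence_by_synthesis("TTGCA", cycles=3))
```

The number of cycles sets the read length, which is one reason reads from this approach are short.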

4. Sequence analysis

After sequencing, the researcher can analyze the DNA sequence generated with computational tools such as Geneious Prime. The raw data undergoes preprocessing steps to improve quality and pair reads. Sequences can then be assembled based on their overlapping regions or aligned to a reference sequence. Learn more about preprocessing and various assembly methods in our guide to sequence assembly.

Figure 9. Different methods for sequence analysis include de novo assembly and mapping to a reference sequence.
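
The two approaches can be illustrated with a small Python sketch. It is deliberately naive and is not how Geneious Prime or any production assembler works: it assumes error-free reads and exact matches only. The reads, reference, and function names are hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for n in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def merge_reads(left, right, min_overlap=3):
    """De novo style: join two reads by their overlap, or return None."""
    n = overlap_length(left, right, min_overlap)
    return left + right[n:] if n else None


def map_to_reference(read, reference):
    """Reference style: list every 0-based position where the read matches."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


reference = "ATGGCCATTGTAATGGGCCGC"  # hypothetical reference sequence
print(merge_reads("ATGGCCATT", "CATTGTAATG"))
print(map_to_reference("GTAATG", reference))
```

Real tools must also handle sequencing errors, reads from either strand, and millions of reads, so they rely on indexed data structures and scoring rather than exact string search.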

When is next-generation sequencing used?

Next-generation sequencing is an ideal method for sequencing a large number of DNA fragments simultaneously. It’s also more suitable for DNA samples with a low starting quantity, such as environmental samples.

Pros and cons of next-generation sequencing

Pros

  • High accuracy compared to long-read sequencing
  • Lower cost compared to long-read sequencing

Cons

  • Difficult to assemble repetitive regions (e.g. repetitive regions may be larger than the NGS read length)
  • Requires large data storage capabilities and computational resources
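
The difficulty with repetitive regions can be shown with a toy example. In this sketch, a read shorter than the repeat matches the reference in several places, while a longer read that spans the repeat and its flanking sequence matches only once. The reference and both reads are invented for illustration, and only exact matches are considered.

```python
def exact_matches(read, reference):
    """Return every 0-based position where the read matches the reference."""
    return [
        i
        for i in range(len(reference) - len(read) + 1)
        if reference.startswith(read, i)
    ]


repeat = "ACACACACAC"
# Hypothetical reference containing the same repeat twice.
reference = "GATTC" + repeat + "TTGGA" + repeat + "CCGTA"

short_read = "ACACAC"                 # lies entirely inside the repeat
long_read = "TTC" + repeat + "TTG"    # spans the repeat plus both flanks

print(exact_matches(short_read, reference))
print(exact_matches(long_read, reference))
```

The short read cannot be placed with confidence, while the flanking bases in the long read anchor it to a single location.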

Next-generation sequencing in Geneious Prime

Software like Geneious Prime can help with all stages of the NGS data analysis, from preprocessing raw data to visualizing alignments and assemblies. Learn more about how to use Geneious Prime for NGS analysis.

De Novo Assembly Tutorial: Step-by-step guide on performing a de novo assembly of short-read NGS data.

Map to Reference: Video demonstrating the map to reference tool to map NGS data against a reference genome.

Metagenomics Analysis: Video series on assembling, filtering, and analyzing NGS metagenomic data.

Assembling Your DNA Sequences: Overview of de novo and map to reference assembly.

Learn more about NGS visualization and analysis features in Geneious Prime.

Long-read Sequencing

What is long-read sequencing? 

As the name suggests, long-read sequencing generates long reads, generally 10 kb to over 50 kb. These methods sequence a single DNA molecule directly without amplification steps. Theoretically, assembling reads from long-read sequencing is easier than assembling reads from short-read sequencing, as there are fewer fragments to join. However, long-read sequencing typically has more errors than short-read sequencing.

How does long-read sequencing work?

Long-read sequencing was first described in the late 2000s, with several companies developing their own approach.

Single-molecule real-time (SMRT) sequencing (PacBio)

SMRT sequencing begins with the addition of adapters to both ends of the DNA fragment to create a circular template (Figure 10). Primer and polymerase are then added to the DNA library. Single molecules are attached to wells called zero-mode waveguides (ZMWs), with each ZMW containing its own molecule. Polymerase incorporates nucleotides conjugated with fluorescent dyes. Since each nucleotide has a different fluorescent dye, the emitted light, measured in real-time, corresponds to a different base.

Figure 10. An overview of SMRT sequencing. Adapters are added to the DNA to form a circular template. Single DNA molecules are attached to the ZMW well for sequencing. The polymerase incorporates nucleotides containing fluorescent dyes that are read by a camera at each well. 

Nanopore sequencing (Oxford Nanopore)

Nanopore sequencing relies on the fact that each nucleotide has a different size and electrical property. In nanopore sequencing, a motor protein unwinds the DNA, allowing it to pass through a pore on a membrane (Figure 11). When a base passes through the pore, it reduces the ionic current. Since each base reduces the ionic current by a different amount, the individual bases, and thus the sequence of the DNA strand, can be determined.

Figure 11. Nanopore sequencing provides a different change in ionic current as a different base moves through the pore.
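
The principle of turning current measurements into bases can be sketched as a nearest-value lookup. This is a heavily simplified toy: the current levels are invented numbers, and it assumes one clean measurement per base. In practice several bases sit in the pore at once and basecalling software uses statistical or machine learning models.

```python
# Hypothetical mean current (picoamperes) for each base, invented for
# illustration only. These are not measured values.
CURRENT_LEVEL_PA = {"A": 80.0, "C": 95.0, "G": 65.0, "T": 110.0}


def call_base(current_pa):
    """Return the base whose reference current is closest to the measurement."""
    return min(
        CURRENT_LEVEL_PA,
        key=lambda base: abs(CURRENT_LEVEL_PA[base] - current_pa),
    )


def call_sequence(trace):
    """Call one base per current measurement in the trace."""
    return "".join(call_base(current_pa) for current_pa in trace)


trace = [81.2, 108.5, 66.0, 94.1, 79.3]  # hypothetical noisy measurements
print(call_sequence(trace))
```

When noise pushes a measurement closer to the wrong level, the wrong base is called, which gives an intuition for why raw long reads can carry more errors than short reads.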

When is long-read sequencing used? 

Long-read sequencing is suitable for sequencing genomes that do not have a high quality reference genome. Researchers also use it for variant detection, as large rearrangements, indels, and repetitive regions can be difficult to assemble with NGS. Long-read sequencing can also facilitate the detection of DNA modifications like DNA methylation, as long-read sequencing uses the original DNA sample instead of amplified DNA.

Pros and cons of long-read sequencing

Pros

  • Easier library preparation
  • Assembly becomes less ambiguous compared to assembling short reads
  • Portability. Some platforms, such as Oxford Nanopore's MinION, are about the size of a USB drive.
  • Fast sequencing runs

Cons

  • Higher error rate than NGS

Long-read sequencing in Geneious Prime

Geneious Prime can help you assemble various types of long-read sequencing reads. Find more resources in our Knowledge Base, including:

Can Geneious Prime assemble PacBio or MinION data? This article discusses how you can work with Geneious Prime to assemble different data types from PacBio or Nanopore.

Can I perform a hybrid assembly with Illumina and PacBio/MinION data? This article shows how Geneious Prime can assemble data from short-read and long-read sequencing together.

Recommended Resources

A Brief Tour of Geneious Prime

Take a look at the Geneious Prime interface with this brief tour video.

Geneious Prime Features

Geneious Prime puts industry-leading bioinformatics and molecular biology tools directly into researchers' hands.

Geneious Prime Knowledge Base

The most commonly asked questions about Geneious Prime installation, licensing, functionality and more.

Get Started with Geneious Prime

Start Your 30 Day Free Trial of Geneious Prime.

Geneious Academy

Learn the basics with this introductory guide to assembling DNA sequences with de novo and map to reference.
Practice how to trim, edit and assemble chromatograms. Find heterozygotes and incorrectly called bases.
Learn to align your chromatograms to a reference sequence, find variants and verify sequences in this video series.
Assemble, filter and analyze an NGS amplicon metagenomic data set in Geneious Prime with this practical exercise.
Get started with Geneious today