Types of Sequencing

There are multiple types of sequencing, each with unique strengths, weaknesses, costs, and purposes. This post will briefly describe a few types of sequencing platforms, their pros and cons, and when they should or shouldn’t be used.

Second-Generation Sequencing Systems:

Sequencing by Synthesis (SBS) Sequencing - One of the most widely used types of sequencing. SBS builds on the concept introduced with Sanger sequencing: reading a sequence while a complementary strand is synthesized. In SBS sequencing, DNA, or RNA that has been converted into complementary DNA (cDNA), from each sample is “barcoded” with a short stretch of identifying base pairs using PCR. Every fragment from a sample carries this barcode so that it can be traced back to that sample.

During sequencing, all of the tagged samples are loaded onto a flow cell, and the DNA/cDNA fragments attach to its surface. The strands are separated into single strands, and fluorescently tagged nucleotides are washed over them. A polymerase enzyme attaches the complementary nucleotide to the next position on each strand, and a laser shines onto the surface. The laser lights up the fluorescent tag on the nucleotide that just attached, and each of the four nucleotides (AGTC) carries a different color tag. The color identifies the nucleotide, and that data is recorded.

This happens simultaneously across millions of fragments. The cycle repeats, one nucleotide at a time, until the complementary strand is complete. Because each nucleotide pairs only with its complement, the recorded sequence of colors gives us the sequence of the original strand.

Basic principle of Illumina sequencing, as described by Untergasser, 2019. (a) Flow cell overview; (b) incorporation of nucleotides results in fluorescence release; (c) zoomed-in view of the flow cell: different nucleotides with their specific fluorescent colors (modified after Genomics 2019).
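
The barcoding and color read-out steps described above can be sketched in a few lines of Python. This is a toy model, not how any instrument's software works: the dye colors, the 4-base barcodes, the sample names, and the function names are all made up for illustration, and strand direction is ignored (the recovered sequence is a position-by-position complement rather than a reverse complement).

```python
# Toy model of sequencing by synthesis plus barcode-based sorting of reads.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hypothetical color assignment; real dye chemistry differs between instruments.
DYE_COLOR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOR_TO_BASE = {color: base for base, color in DYE_COLOR.items()}

# Hypothetical 4-base barcodes and sample names.
BARCODES = {"ACGT": "sample_1", "TGCA": "sample_2"}


def read_strand(template):
    """Simulate one cycle per base and return the sequence of the template."""
    calls = []
    for base in template:
        incorporated = COMPLEMENT[base]     # the complementary labeled base is added
        color = DYE_COLOR[incorporated]     # the imaging step records its color
        calls.append(COLOR_TO_BASE[color])  # the color is translated back to a base
    # Complement the calls to report the original strand rather than its copy.
    return "".join(COMPLEMENT[call] for call in calls)


def demultiplex(reads, barcode_length=4):
    """Group reads by their leading barcode and strip the barcode off."""
    by_sample = {}
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = BARCODES.get(barcode, "undetermined")
        by_sample.setdefault(sample, []).append(insert)
    return by_sample


# Three barcoded fragments from two samples, pooled on one flow cell.
pooled_strands = ["ACGTGGATCC", "TGCATTAGGC", "ACGTCCTAGA"]
reads = [read_strand(strand) for strand in pooled_strands]
samples = demultiplex(reads)
print(samples)
```

Each read begins with its barcode, so the pooled reads can be sorted back into their samples after the run.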

Platforms vary in how they apply this basic principle, for example by using bridge amplification (Illumina), emulsion PCR (Ion Torrent), or DNA nanoball technology (BGI Group).

Here’s a more in-depth explanation of Sequencing by Synthesis from Illumina if you’d like to learn more.

Sequencing by Synthesis Strengths:

  • Widely available and considered a very reliable and accurate form of sequencing.

  • Typically low-cost and rapid.

  • Can be used to investigate most forms of DNA/RNA and transcriptomic or genomic studies.

Sequencing by Synthesis Weaknesses:

  • Can only be used for relatively short strands or fragments of DNA. Depending on the type of research you’re doing, you may need to be able to read strands longer than the standard 150-300 base pair reads.

  • Not good for sequencing strands that have a lot of sequence repeats or for distinguishing between transcript isoforms.
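
The problem with repeats can be illustrated with a small Python sketch. The reference sequence, the reads, and the `find_positions` helper are all hypothetical, and real aligners tolerate mismatches and use indexed searches rather than exact string matching. The point is only that a read shorter than the repeat fits in several places, while a read that spans the repeat and its flanks fits in one.

```python
def find_positions(reference, read):
    """Return every start index where the read matches the reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


# Hypothetical reference: unique flanking sequence around six copies of "CAG".
reference = "TTGACC" + "CAG" * 6 + "GATTCA"

short_read = "CAGCAGCAG"                 # lies entirely inside the repeat
long_read = "GACC" + "CAG" * 6 + "GATT"  # spans the repeat and both flanks

short_hits = find_positions(reference, short_read)
long_hits = find_positions(reference, long_read)

print("short read fits at:", short_hits)
print("long read fits at:", long_hits)
```

The short read cannot be placed unambiguously, and it cannot reveal how many copies of the repeat are present.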

When to use Sequencing by Synthesis:

  • Basic transcriptomic analysis of mRNA or gene expression using well-annotated genomes of known species.

  • When you have highly fragmented DNA samples that are being assembled de novo.

  • If you do not need to perform sequencing on long strands of DNA.

Third-Generation Sequencing Systems:

While several of the third-generation sequencing systems are still under development, there are exciting advances in the ability to read longer strands and sequence without having to amplify or synthesize complementary strands. Of the third-generation systems, we will only discuss Nanopore and PacBio, as they are two of the most established and commonly used.

Third-generation sequencing differs from the second-generation sequencing described above. Rather than amplifying each fragment into many copies and reading those copies in step, third-gen systems read individual DNA molecules directly and in real time, using a chemical or physical property that identifies each nucleotide (AGTC).

Each of the four basic nucleotides has an inherent chemical signature or identifying feature that is used in third-gen sequencing systems.

PacBio Sequencing - Uses DNA polymerase (an enzyme that copies DNA strands) to copy a single DNA molecule while the process is observed in real time. The nucleotides used to make the copy are tagged with fluorescent colors, so each one is identified as the polymerase adds it, and the sequence of the DNA strand is read off as the copy is made.

Oxford Nanopore Sequencing - Pulls each DNA strand through a nano-sized channel (a nanopore). As the strand passes through, each nucleotide disrupts an electrical current flowing through the channel. Because each nucleotide (AGTC) has a unique chemical/physical structure, the disruption is unique as well and creates an identifying signal. The signal is recorded, and thus the nucleotide is identified. As the strand works its way through the channel, each nucleotide is identified as it passes the nanopore opening.
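
The idea of reading bases from a current signal can be sketched as a nearest-level lookup in Python. This is a deliberately simplified model: the current levels below are invented, and the function names are hypothetical. In practice the signal at any moment is shaped by several neighboring nucleotides at once, and base calling is done by trained statistical models rather than a lookup like this.

```python
# Hypothetical mean current levels (arbitrary units), one per nucleotide.
LEVELS = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 125.0}


def simulate_signal(strand, noise):
    """Produce one noisy current reading per nucleotide passing through the pore."""
    return [LEVELS[base] + offset for base, offset in zip(strand, noise)]


def call_base(reading):
    """Pick the nucleotide whose expected level is closest to the reading."""
    return min(LEVELS, key=lambda base: abs(LEVELS[base] - reading))


def decode(signal):
    """Call one base per reading and join the calls into a sequence."""
    return "".join(call_base(reading) for reading in signal)


strand = "GATTACA"
# Fixed offsets instead of random noise keep the example deterministic.
noise = [1.5, -2.0, 3.0, -1.0, 0.5, 2.5, -3.5]
signal = simulate_signal(strand, noise)
decoded = decode(signal)
print(decoded)
```

When the noise grows large relative to the gap between levels, readings start landing closer to the wrong level, which is one way to picture where base-calling errors come from.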

Third-generation sequencing strengths:

  • Sequences don’t have to be amplified in order to be read.

  • Long sequences can be read.

  • Epigenomic research can be performed because methylated or otherwise epigenetically modified nucleotides can be identified.

  • Nanopore sequencers are small, affordable, and portable.

Third-generation sequencing weaknesses:

  • Many consider these long-read sequencing platforms to be less accurate than second-generation platforms.

  • There are still aspects of their use and integration that are under development.

  • Small fragments can produce weak signals, which makes sequencing difficult or leads to inaccurate base calls.

When to use third-generation sequencing:

  • For studies requiring long reads.

  • Epigenetics research.
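
The guidance in this post can be condensed into a rough rule of thumb, sketched here in Python. The function name and its parameters are hypothetical, and it encodes only the criteria listed above. A real platform choice also depends on budget, sample quality, and what your sequencing provider offers.

```python
def suggest_platform(needs_long_reads=False,
                     studies_epigenetics=False,
                     has_repeats_or_isoforms=False):
    """Return a coarse suggestion based only on the criteria listed in this post."""
    if needs_long_reads or studies_epigenetics or has_repeats_or_isoforms:
        return "third-generation (Nanopore or PacBio)"
    return "second-generation (sequencing by synthesis)"


# Basic gene expression analysis against a well-annotated genome.
print(suggest_platform())

# A study that needs to identify methylated nucleotides.
print(suggest_platform(studies_epigenetics=True))
```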

Here’s a great article from Frontline Genomics for another overview of sequencing technologies.

References

  • Untergasser, Gerold & Bucher, Philipp & Dresch, Philipp. (2019). Metagenomics Profiling of Tumours Using 16S-rRNA Amplicon Based Next Generation Sequencing.
