RNA-Seq 1: Introduction

October 09 2019

RNA sequencing, or RNA-seq, uses next-generation sequencing (NGS) technology to provide a snapshot of the identities and abundances of the RNA molecules in a sample at a given time and under one or more conditions of interest. Since its inception in the mid-2000s, data gleaned from RNA-seq experiments has revolutionised our understanding of RNA, its roles in animal and plant development, the importance of differential gene expression in health and disease, and how individuals respond to drug treatments. It has also revealed that eukaryotic transcriptomes are much more complex than previously thought!

In this series, we will take you through the RNA-seq workflow, from RNA isolation through library preparation to sequencing, looking at the various sequencing methodologies and the considerations for setting up an RNA-seq experiment.

Part 1 introduces RNA-seq, its advantages in the broad sense, and the overall workflow for a typical experiment. Let’s dig in!

So What Makes RNA-Seq So Great?

Unlike PCR-based approaches to gene expression analysis, which measure a limited set of known targets, RNA-seq provides quantitative information about, in principle, all of the transcripts expressed in a cell, tissue, organism, or community (metatranscriptomics) at a given point in time and under conditions of your choice, e.g., during particular cell cycle phases or in the presence of various drugs.

You can look at many different RNA species simultaneously (e.g., mRNA, miRNA, tRNA), or you can focus on a particular type of RNA. On top of this, because RNA-seq gives nucleotide-level resolution, it can reveal novel splice variants, post-transcriptional modifications such as RNA editing, and new genetic variants, e.g., SNPs and insertions and deletions (indels).

The RNA-Seq Workflow

Figure 1: Typical RNA-Seq Workflow.

The three major sequencing technologies used for RNA-seq today are the Illumina, Pacific Biosciences and Oxford Nanopore platforms. While the underlying principles and protocols used for library preparation and sequencing vary between these platforms, the overall workflow is similar:

Total RNA Isolation

Intact, high-purity RNA is critical for successful RNA-seq. Choose an RNA isolation kit that is validated for your sample type, and always perform quality control on your isolated samples, e.g., by fluorimetry or capillary electrophoresis, to check integrity, purity and yield. Use freshly isolated samples if at all possible; otherwise, make sure that older samples have been stored correctly, at an appropriate temperature and in the presence of stabilisers where necessary. If your RNA isolation protocol doesn’t include a genomic DNA removal step, consider adding a post-isolation DNase I treatment to remove contaminating genomic DNA.

RNA Enrichment

Depending on the focus of the analysis and the expected abundance of the RNA species of interest, the total RNA may be purified further. Ribosomal RNA (rRNA) depletion removes the highly abundant rRNA that would otherwise dominate the sequencing reads, while enrichment steps select for a particular fraction (e.g., large or small RNAs) or a specific RNA species (e.g., miRNA or mRNA).

Library Preparation

This part of the workflow is usually performed using a kit compatible with the sequencing platform in use, and it involves a series of sub-steps:

  • Fragmentation. Larger RNA molecules, e.g., mRNAs, are often fragmented, either mechanically or enzymatically, into smaller pieces that are suitable for sequencing.
  • Reverse transcription. Here, similar to RT-PCR, the RNA is converted to cDNA using oligo(dT) primers, random hexamers, or a mixture of both, depending on the type of RNA in question and the level of RNA integrity.
  • Adapter ligation. The cDNA molecules are enzymatically blunted at the 5’ and/or 3’ ends (this is often referred to as end repair), and sequencing adapters are ligated.
  • Library cleanup and amplification. Libraries are enriched for correctly ligated cDNA fragments and PCR amplification is carried out to incorporate any remaining sequencing primer sequences.
  • Library quantification and quality control. cDNA libraries are usually quantified using a combination of real-time PCR and capillary electrophoresis or fluorimetric methods.
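
Quantification results are usually reported as a mass concentration, whereas sequencers are typically loaded by molarity. As a rough sketch, the conversion below assumes a double-stranded cDNA library, an average mass of about 660 g/mol per base pair, and a mean fragment length taken from capillary electrophoresis. The function name and the example values are ours, for illustration only; follow your kit and platform documentation for real loading calculations.

```python
def library_molarity_nm(conc_ng_per_ul: float,
                        mean_fragment_bp: float,
                        g_per_mol_per_bp: float = 660.0) -> float:
    """Convert a dsDNA library concentration (ng/uL) to molarity (nM).

    Assumes an average mass of ~660 g/mol per base pair (an approximation
    for double-stranded DNA) and a single representative fragment length.
    """
    if conc_ng_per_ul < 0:
        raise ValueError("concentration must not be negative")
    if mean_fragment_bp <= 0:
        raise ValueError("mean fragment length must be positive")
    # 1 ng/uL = 1e-3 g/L; dividing by g/mol gives mol/L; 1e9 converts to nM.
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_fragment_bp)


if __name__ == "__main__":
    # Hypothetical library: 2 ng/uL with a mean fragment length of 400 bp.
    print(f"{library_molarity_nm(2.0, 400):.2f} nM")
```

Because the result scales inversely with fragment length, an inaccurate size estimate shifts the calculated molarity by the same proportion, which is one reason size profiling and quantification are usually done together.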

Sequencing and Data Analysis

If the cDNA library meets the quality criteria and is of a suitable yield and concentration, it is ready for sequencing! What happens next depends on which sequencing platform you use. Following sequencing, the resulting data is processed in a bioinformatics pipeline that is either associated with the sequencing instrument or built in-house. Quality control comes first: sequencing adapters are trimmed and poor-quality reads are removed. The remaining reads are then mapped to a reference genome sequence, and downstream analyses follow: differential transcript expression, identification of novel transcripts, splice variants, mutations etc., and pathway analysis to assess whether differentially expressed genes are linked to particular cellular processes or biological pathways.
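
To make the first of those steps more concrete, here is a minimal Python sketch of adapter trimming and quality filtering on reads held in memory. It is an illustration only: real pipelines rely on dedicated trimming tools, which also handle partial and mismatched adapter matches. The adapter string, the thresholds and the function names are assumptions chosen for the example, and quality strings are assumed to use Phred+33 encoding.

```python
# Example adapter prefix; the right sequence depends on your library prep kit.
ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(seq, qual, adapter=ADAPTER):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]


def mean_phred(qual, offset=33):
    """Mean base quality of a Phred+33 encoded quality string."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def clean_reads(reads, min_mean_q=20, min_len=20):
    """Trim adapters, then keep reads that are long and accurate enough.

    `reads` is an iterable of (name, sequence, quality) tuples.
    The default thresholds are illustrative, not recommendations.
    """
    kept = []
    for name, seq, qual in reads:
        seq, qual = trim_adapter(seq, qual)
        if len(seq) >= min_len and mean_phred(qual) >= min_mean_q:
            kept.append((name, seq, qual))
    return kept


if __name__ == "__main__":
    insert = "ACGT" * 6  # 24-base toy insert
    reads = [
        ("good_with_adapter", insert + ADAPTER + "AAAA", "I" * 41),
        ("low_quality", insert, "#" * 24),
        ("too_short_after_trim", "ACGT" + ADAPTER, "I" * 17),
    ]
    for name, seq, _ in clean_reads(reads):
        print(name, seq)
```

Trimming happens before filtering so that the length and quality checks apply to the insert rather than to the adapter sequence.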

Stay Tuned!

That was it for Part 1. Stay tuned for Part 2 where we take a closer look at RNA quality considerations for RNA-seq!