
PSI-Seq Analysis

PSI-Seq uses high-throughput sequencing to identify pseudouridine sites in RNA. It relies on N-cyclohexyl-N′-(2-morpholinoethyl)carbodiimide (CMC), which selectively modifies pseudouridine; the resulting CMC adduct halts reverse transcription.

Working

  • The cDNA libraries are prepared by the ARTseq method. Briefly, samples are poly(A)-selected, treated with DNase, and fragmented. 
  • CMC is added to modify existing pseudouridine, and the 3′ ends of the RNA are ligated to linker-adapters. 
  • Next, the RNA fragments are reverse-transcribed to cDNA.
  • Upon encountering CMC-modified pseudouridine, reverse transcription is halted. 
  • cDNA strands of 20–80 nt are isolated and processed into cDNA libraries using the Ribo-Seq/ARTseq method before high-throughput sequencing.

Uses:

  • Identifies pseudouridylation sites in ncRNAs
  • Single-base resolution
  • Uses regression analysis to compare read counts at a given position between CMC-treated and mock-treated libraries
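
The treated versus mock-treated comparison in the last bullet can be sketched roughly as follows. This is a minimal illustration and not the published PSI-Seq procedure. It assumes that per-position read-stop counts for one transcript are already available as two equal-length lists, fits an ordinary least-squares line of treated counts on mock counts, and flags positions whose residual is unusually large. The function names, the cutoff and the example counts are all hypothetical.

```python
from statistics import mean, pstdev

def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values have no variance; slope is undefined")
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return a, b

def candidate_sites(mock, treated, z_cutoff=2.0):
    """Return 0-based positions where treated counts exceed the fitted
    expectation by more than z_cutoff residual standard deviations.

    Assumption: `mock` and `treated` hold read-stop counts per position.
    """
    if len(mock) != len(treated):
        raise ValueError("count lists must have the same length")
    a, b = fit_line(mock, treated)
    residuals = [t - (a + b * m) for m, t in zip(mock, treated)]
    sd = pstdev(residuals)
    if sd == 0:
        return []
    return [i for i, r in enumerate(residuals) if r / sd > z_cutoff]

# Hypothetical counts: position 4 has far more stops after CMC treatment.
mock_counts = [10, 12, 9, 11, 10, 13, 8, 10]
treated_counts = [11, 13, 10, 12, 60, 14, 9, 11]
print(candidate_sites(mock_counts, treated_counts))
```

A single strong outlier also pulls the fitted line towards itself, so a real analysis would likely need a more robust fit and replicate libraries; the sketch only shows the shape of the comparison.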

In splicing analysis, PSI instead stands for Percent Spliced-In; PSI values are commonly used to report changes in alternative pre-mRNA splicing (AS). Many PSI-detection tools are limited to specific types of AS event and were evaluated only on in silico (simulated) RNA-seq data. PSI-Sigma is a newer tool that uses a new PSI index and was benchmarked on actual (non-simulated) RNA-seq data from spliced synthetic genes, measuring precision, recall, false positive rate and correlation against three leading tools: rMATS, SUPPA2 and Whippet. PSI-Sigma outperformed these tools, especially for AS events with multiple alternative exons and for intron-retention events.

Isoform-based methods

Isoform-based methods first find full-length transcripts and estimate their relative abundances in each sample based on the sequencing reads. Statistical testing is then used to identify significant differences in the relative transcript abundances between the different experimental conditions. The performance of this approach depends on accurate transcript quantification.

Tools

Cufflinks/cuffdiff2

Cufflinks is a pipeline consisting of different programs including cufflinks itself, cuffmerge and cuffdiff2. Cufflinks first performs transcript assembly by generating overlap graphs with fragments as nodes and edges connecting the compatible fragments. 

DiffSplice

DiffSplice takes a graph-based ab initio approach; it first reconstructs the transcriptome based on the aligned reads, then quantifies the abundance of alternative paths through the graph and finally identifies the alternative splicing modules (ASMs). 

Count-based methods

Count-based methods include both exon-based and event-based approaches. In exon-based methods, read counts are assigned to features such as exons or junctions. The limitation of this approach is that it does not infer the type of splicing event occurring in a gene; it only identifies the exons or junctions that are differentially expressed between experimental conditions. In event-based methods, the splicing events themselves are quantified by calculating a percent spliced-in (PSI) value for each event, which measures the fraction of mRNAs expressed from a gene that contain a specific form of that event.
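
As a concrete illustration of the event-based definition, PSI for a single event can be estimated from the reads that support the inclusion form and the reads that support the exclusion form. The sketch below is a minimal example and not the estimator used by any particular tool. The function name and the optional length normalization (dividing each count by the number of positions that could yield such a read) are assumptions introduced here.

```python
def psi(inclusion_reads, exclusion_reads, inclusion_len=1.0, exclusion_len=1.0):
    """Estimate percent spliced-in as a fraction between 0 and 1.

    inclusion_reads / exclusion_reads: reads supporting each form of the event.
    inclusion_len / exclusion_len: hypothetical effective lengths used to
    normalize the counts; leave both at 1.0 to use raw counts.
    Returns None when no reads support either form.
    """
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    if inclusion_len <= 0 or exclusion_len <= 0:
        raise ValueError("effective lengths must be positive")
    inc = inclusion_reads / inclusion_len
    exc = exclusion_reads / exclusion_len
    total = inc + exc
    if total == 0:
        return None  # PSI is undefined without supporting reads
    return inc / total

# 30 inclusion reads and 10 exclusion reads give PSI = 30 / (30 + 10) = 0.75
print(psi(30, 10))
```

Returning None rather than 0 for an uncovered event keeps "no evidence" distinct from "fully skipped".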

Tools

DEXSeq

DEXSeq is an exon-based R/Bioconductor package developed to detect differential splicing (DS) from RNA-seq data. It uses a generalized linear model to model differential exon usage between sample groups.

edgeR

edgeR is an R/Bioconductor package that can be used to analyse differential expression at the gene, exon or transcript level. The exon count data is first fitted using a negative binomial generalized log-linear model, after which the differential exon usage is tested by comparing the log-fold-change of an exon to the log-fold-change of the entire gene.
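
The idea of comparing an exon's log-fold-change with that of its gene can be illustrated with a few lines of Python. This is only a sketch of the comparison, not edgeR's negative binomial model or its R interface. It assumes raw exon counts for one gene in two conditions, takes the gene count to be the sum of its exon counts, skips library-size normalization, and adds a pseudocount to avoid taking the logarithm of zero. All names are hypothetical.

```python
import math

def log2_fold_change(count_a, count_b, pseudocount=0.5):
    """log2(B / A) with a pseudocount added to both counts."""
    return math.log2((count_b + pseudocount) / (count_a + pseudocount))

def relative_exon_log_fc(exon_counts_a, exon_counts_b, pseudocount=0.5):
    """For each exon, return logFC(exon) minus logFC(whole gene).

    Values near 0 mean the exon changes in step with the gene; values far
    from 0 suggest differential usage of that exon.
    """
    if len(exon_counts_a) != len(exon_counts_b):
        raise ValueError("both conditions need one count per exon")
    gene_lfc = log2_fold_change(sum(exon_counts_a), sum(exon_counts_b), pseudocount)
    return [
        log2_fold_change(a, b, pseudocount) - gene_lfc
        for a, b in zip(exon_counts_a, exon_counts_b)
    ]

# Hypothetical gene with three exons; the third exon drops while the gene rises.
condition_a = [100, 100, 100]
condition_b = [200, 200, 50]
for i, value in enumerate(relative_exon_log_fc(condition_a, condition_b), start=1):
    print(f"exon {i}: relative log2FC = {value:+.2f}")
```

A statistical test on replicated samples is still required to decide whether such a difference is significant; the sketch only computes the quantity being compared.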

JunctionSeq

JunctionSeq is an R/Bioconductor package that uses a statistical strategy similar to that of DEXSeq.

limma

limma is an R/Bioconductor package that is widely used for differential gene expression analysis and has been extended to perform DS analysis using exon-level count data.

dSpliceType

dSpliceType is an event-based method designed to find DS using base-wise read coverage signal data. It extracts candidate splicing events of five different event types using the available gene annotations and the supporting junction reads.

MAJIQ

MAJIQ (Modeling Alternative Junction Inclusion Quantification) uses local splicing variations (LSVs) to quantify RNA splicing in genes. LSVs are splits in a splice graph where several edges come to or from a single exon called a reference exon. 
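
The LSV definition can be made concrete with a small sketch. The code below is not MAJIQ's implementation. It assumes a splice graph given as a list of (upstream exon, downstream exon) junction edges and reports every exon from which more than one edge leaves or into which more than one edge arrives. The function name, the "source" and "target" labels and the exon identifiers are hypothetical.

```python
from collections import defaultdict

def find_lsvs(junctions):
    """Find splits in a splice graph around a single reference exon.

    junctions: iterable of (upstream_exon, downstream_exon) pairs.
    Returns a sorted list of (kind, reference_exon, connected_exons), where
    kind is "source" if several edges leave the reference exon and "target"
    if several edges arrive at it.
    """
    outgoing = defaultdict(set)
    incoming = defaultdict(set)
    for upstream, downstream in junctions:
        outgoing[upstream].add(downstream)
        incoming[downstream].add(upstream)

    lsvs = []
    for exon, targets in outgoing.items():
        if len(targets) > 1:
            lsvs.append(("source", exon, sorted(targets)))
    for exon, sources in incoming.items():
        if len(sources) > 1:
            lsvs.append(("target", exon, sorted(sources)))
    return sorted(lsvs)

# Exon-skipping example: E2 is either included (E1-E2-E3) or skipped (E1-E3).
graph = [("E1", "E2"), ("E2", "E3"), ("E1", "E3")]
for lsv in find_lsvs(graph):
    print(lsv)
```

In this example a single exon-skipping event produces two splits, one leaving E1 and one arriving at E3, which shows how one classical event can appear as more than one LSV.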

rMATS

rMATS is an event-based method that improves on the original MATS method. It simultaneously accounts for sampling uncertainty within individuals and variability between samples by using a hierarchical framework to model the PSI of each event. The method uses a likelihood ratio test to examine whether the between-group difference in mean PSI exceeds a user-defined threshold.

SUPPA

SUPPA is an event-based method that uses transcript abundances to estimate the PSI values for each DS event. Transcript abundances are determined using the RSEM tool. In addition to the five standard types of splicing events, SUPPA also considers two other event types, alternative first exon (AF) and alternative last exon (AL). 
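
Estimating PSI from transcript abundances can be sketched as the abundance of the transcripts that contain one form of the event divided by the abundance of all transcripts that define the event. The code below is a minimal illustration under that assumption and is not SUPPA's code or command-line interface. The transcript identifiers and abundance values are hypothetical.

```python
def event_psi(abundance, inclusion_ids, exclusion_ids):
    """Estimate PSI for one event from transcript abundances (e.g. TPM).

    abundance: dict mapping transcript ID to its abundance.
    inclusion_ids: transcripts containing the inclusion form of the event.
    exclusion_ids: transcripts containing the alternative form.
    Transcripts missing from `abundance` are treated as unexpressed.
    Returns None when none of the listed transcripts is expressed.
    """
    inc = sum(abundance.get(t, 0.0) for t in inclusion_ids)
    exc = sum(abundance.get(t, 0.0) for t in exclusion_ids)
    total = inc + exc
    if total == 0:
        return None  # PSI is undefined for an unexpressed event
    return inc / total

# Hypothetical gene: tx1 and tx2 include the exon, tx3 skips it.
tpm = {"tx1": 30.0, "tx2": 10.0, "tx3": 60.0}
print(event_psi(tpm, ["tx1", "tx2"], ["tx3"]))
```

Because the estimate is built from transcript-level quantification, its accuracy depends on the accuracy of those abundances.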
