Demystifying emerging bulk RNA-Seq applications: the application and utility of bioinformatic methodology

Thind, Amarinder Singh; Monga, Isha; Thakur, Prasoon Kumar; Kumari, Pallawi; Dindhoria, Kiran; Krzak, Monika; Ranson, Marie; Ashford, Bruce

doi:10.1093/bib/bbab259

Abstract

Significant innovations in next-generation sequencing techniques and bioinformatics tools have impacted our appreciation and understanding of RNA. Practical RNA sequencing (RNA-Seq) applications have evolved in conjunction with sequence technology and bioinformatic tools advances. In most projects, bulk RNA-Seq data is used to measure gene expression patterns, isoform expression, alternative splicing and single-nucleotide polymorphisms. However, RNA-Seq holds far more hidden biological information including details of copy number alteration, microbial contamination, transposable elements, cell type (deconvolution) and the presence of neoantigens. Recent novel and advanced bioinformatic algorithms developed the capacity to retrieve this information from bulk RNA-Seq data, thus broadening its scope. The focus of this review is to comprehend the emerging bulk RNA-Seq-based analyses, emphasizing less familiar and underused applications. In doing so, we highlight the power of bulk RNA-Seq in providing biological insights.

next-generation sequencing, RNA-Seq applications, RNA-Seq genomics, bioinformatics advancements, transcriptomicsomics

Introduction

Transcriptomics provides deep insights into both coding and non-coding regions functional properties, helping understand various cellular conditions or disease phenotypes. Using these analyses, one can study differential gene expression across tissues or patient cohorts, perform isoform analysis and investigate splicing events across genes. To date, RNA sequencing (RNA-Seq) has been used in many fields including healthcare [1, 2], agriculture [3, 4], microbial diversity [5] and evolutionary biology [6], proving its usefulness in various applications [7]. As these fields have expanded, different biological challenges have pushed the development of novel computational tools that have evolved and diversified according to technological advancements, leading to many novel applications using bulk RNA-Seq. Bulk RNA-Seq measures the transcriptomics of pooled cell populations, tissue sections or biopsies.

Given the bespoke nature of many of the newer bulk RNA-Seq applications, widespread use can be impacted by lack of familiarity or propagation between research groups. For example, for copy number alteration (CNA) and indel detection, DNA analysis (as targeted, exome or genome sequencing) is extensively used in cancer research. Newly developed bioinformatic tools and integrated interpretational methods for RNA-Seq have the potential to perform and explain those analyses with better accuracy than before, in scenarios where whole genome sequencing (WGS) is not available because of limited resources. Other RNA-Seq applications with expanding utility include detection of microbial contamination in experiments [8], the understanding of transposable elements (TEs) [9, 10] or applications related to specific biological interests, e.g. isoform-guided prediction of epitopes [11] and estimation of immune cell fraction [12] from RNA-Seq data. Also, significant progress has been made to develop tools for the prediction of tumour-specific mutant peptides (cancer neoantigens), utilizing single-nucleotide variants (SNVs), indels, gene fusions or alternative splicing information. A transcriptome-directed diagnosis tool for Mendelian disease has helped overcome the limitations of conventional genomic testing [13]. Other studies recommend integrated approaches that improve confidence and yields in the mutations detected in low-purity tumours [14]. The growth in options for functional interpretation of non-messenger RNA data has also progressed [15], expanding the scope of total RNA-Seq for non-coding studies (Figure 2). In this review, we will discuss the range of evolving bulk RNA-Seq applications enabled by the latest tools and pipelines (Figure 1).

Figure 1

Open in new tab Download slide

Computational analyses based on bulk RNA-Seq. Core analyses such as transcriptome profiling, differential gene expression, and functional profiling of messenger RNA (mRNA) and microRNA (miRNA) are standard practices. However, Piwi and circular RNA-related studies are still growing from the last few years. Advanced analyses include copy number alteration, indel detections and microbial contamination investigation in RNA-Seq experiments. Other data integration and network analyses to study gene regulatory networks are quite helpful (refer to Supplementary Data).

Figure 2

Open in new tab Download slide

Overview of transcriptomics products and biomolecules interaction. Figure shows a variety of transcriptomics products and the role of various biomolecules interactions in the regulation of gene functionality.

Transcriptome assembly

Typical RNA-Seq applications on common species (such as human or mouse) utilize reference-based methods to map RNA-Seq reads onto a pre-existing reference genome. STAR [16] and HISAT2 [17] are the popular assemblers used for reference-based assembly.

A high-quality genome is available for limited species; therefore, studying rare or new species with RNA-Seq is more challenging and requires the transcriptome of these organisms to be reconstructed through the assembly process [18, 19].

Two different strategies are used for the transcriptome assembly of new species, genome-guided (when the reference genome exists) and de novo assembly. Genome-guided assemblers utilize pre-existing close-related genomes to guide the reconstruction of an alternative consensus sequence. Assemblers map RNA-Seq reads to the reference to orient sequences of new species along the chromosomes. These assemblies are somewhat biased because of their dependence on related reference, and it increases when the transcriptome of a novel organism is not close to the chosen reference [19]. De novo assembly methods, on the other hand, assemble sequences directly from the reads, which is less biased towards using the reference genome, but may be biased towards specific tissue type and conditions from which a sample of a novel species has been sequenced. However, the bias caused by the dependence on a related reference alone can be address by merging de novo and a related reference assembly in contigs [20].

Ungaro et al. [18] compares the reads mapping of the genome of various organisms, using both the transcriptome-guided assembly and de novo assembly; they choose to use a range of different reference genomes having a divergence of >15% from the species of interest. The guided assembly outperforms de novo approach alone, including when the divergence level is high. Holzer et al. [21] compared 10 de novo assembler on nine RNA-Seq datasets spanning different kingdoms of life. Evaluations of the tools is measured in a combination of 20 biological-based and reference-free metrics such as mapping rate, mismatches, reference coverage, contig F1, KC score, etc. Based on summarized metric of 20, the Trinity [22], SPAdes [23] and Trans-ABySS [24], followed by Bridger [25] and SOAPdenovo-Trans [26], generally outperformed the other tools. However, these performances vary among species. The selection of best assembly remains a challenging task, such metrics needs additional factors to be compared, such as tools abilities to reconstruct alternatively spliced transcripts/isoforms. Another example of computational difficulties in assembly is to distinguish the back-splice junctions that arise from an internal tandem duplication (generate duplicate exons within a gene) and the ones used for inferring circular RNA, because these kinds of reads are identical [27].

Evolving applications of bulk RNA-Seq

Copy Number Alteration from bulk RNA-Seq

CNA analyses are commonly used in cancer research. Amplification and deletion of the genes have signalling pathway implications and are a key determinant of molecular dysfunction in oncology. For example, PIK3CA amplification and deletion of the CDKN2A are associated with head and neck squamous cell carcinoma [28]. Many challenges lie in calling the CNA from RNA-Seq data, because of the differences in gene expression, which lead to the read depth variance of different magnitudes across genes. Because of these difficulties, most of these analyses are usually done using whole genome sequencing / whole exome sequencing (WGS/WES). However, recent advancements in bioinformatic tools and methods have assisted in the determination of CNA from bulk and single-cell RNA-Seq (scRNA-seq) data. Some of bulk RNA-Seq-based tools are CaSpER [29], CNAPE [30], CNVkit-RNA [31] and SuperFreq [32].

CNAPE is a machine learning approach that uses prior knowledge from The Cancer Genome Atlas (TCGA) data to predict the CNA in human cancer [33]. CaSpER first utilizes multiscale smoothing of expression signal by five-state Hidden Markov Model. Later smoothing of B allele frequency (BAF) signal is performed, followed by identification of the BAF shift threshold using the Gaussian mixture model. Finally, Copy Number Variants (CNV) correction is done by using the BAF shifts. CaSpER works on both bulk and scRNA-seq data. SuperFreq was initially developed for tracking somatic mutations in cancer. It was recently updated for calling absolute allele-sensitive CNA and can establish the relationship between CNA profile and mutation load in cancer. SuperFreq claims to accommodate small number of samples, paired with normal tumour comparisons. Compared with CNVkit-RNA, SuperFreq provides absolute, allele-sensitive and subclone aware CNA calls. There is no universal definition exists of what magnitude of amplification/deletion should be considered as amplified and deleted gene. For example, as per cosmic (https://cancer.sanger.ac.uk/cosmic/analyses), the gain/amplification of the gene is when the copy number is >5 (ploidy ≤2.7) or >9 (ploidy >2.7); however, the PURPLE [34] definition is 3^*sample ploidy for gain/amplification. Furthermore, the significance of CNA of genes is defined based on defined amplification/deletion and its occurrence across samples. Moreover, because of a single copy of sex chromosomes in male, the user should be careful of the threshold for chromosome Y and chromosome X.

Indel detection from RNA-Seq

Small genetic variations, either insertions or deletions, measuring from 1 to 10 000 base pairs in length, are called indels. They can be coding and non-coding, and in coding genome, indels are either frameshift or non-frameshifts. Indels are known to have various functions, such as in cancer, they disable tumour suppression, activate oncogenes and trigger an increased abundance of neoantigens [34]. Indels are less common in the genome than SNVs [35]. It is challenging to identify indels from RNA-Seq because of RNA-Seq library and mapping artefacts (often related to the misalignment of spliced reads). Furthermore, complications arise when somatic indels should be distinguished from germline indels, e.g. in cancer studies, lack of normal/healthy tissue RNA-Seq data presents challenges for the detection of indels from tumour RNA-Seq. New tools such as RNAIndel [36] can handle this three-class classification problem using supervised learning and classify coding indels into somatic, germline and artefacts. Sun et al. [37] demonstrated that RNA-Seq alignment is the most important step for accurate indel identification. In a benchmarked study of indel detection, Slosarek et al. [38] suggested the use of additional filtering using Opossum [39]. Some recent studies recommend the integration of RNA- and DNA-Seq data, which improves the sensitivity for identifying mid-sized and large indels missed by DNA-Seq [40]. For the annotation of indel and CNA tools such as Anntool [41] are in common use.

Gene fusion detection

Gene fusions result from break points combining within previous disparate coding regions and lead to unpredictable and potentially deleterious effects. Such fusions can produce greater amounts of active abnormal protein than non-fusion genes, which could lead to tumour genesis [42]. In cancer, gene fusion has been described with the combination of oncogenes, e.g. recurrent gene fusion is reported between Transmembrane protease, serine 2 (TMPRSS2) and Erythroblast Transformation Specific (ETS) transcription factor genes in prostate cancer [43].

Various tools detect gene fusions from RNA-Seq data based on either read mapping or de novo fusion transcript assembly. After benchmarking 23 different gene fusion detection methods (that leverage RNA-Seq solely as input) on simulated and real RNA-Seq data, Haas et al. [44] concluded that Arriba [45, 46], STAR-Fusion [44] and STAR-SEQR [44] are the most accurate and fastest for fusion detection. New databases and bioinformatics resources can characterize and evaluate the significance of the gene fusion [47]. ChiTaRS 2.1 [48], FusionCancer [49], the TCGA Fusion Portal [50] and FARE-CAFE [51] are the latest gene fusion databases. The ChiTaRS 2.1 database is the largest fusion database that catalogues over 29 000 fusion transcripts, primarily from humans.

Neoantigen prediction from RNA-Seq data

Different genetic events such as Indel, gene fusion, SNV can generate mutant peptides called neoantigens. These neoantigens play a vital role in tumour-specific T cell-mediated antitumour immune response and have applications in immunotherapy [52]. Not only genetic events but alternative splicing also has proven to contribute to candidate neoantigens’ generation [53], but most of the tools use indels, SNVs or gene fusions events for neoantigen prediction. Few tools such as ASNEO [54] identify personalized alternative splicing-based neoantigens from RNA-Seq. INTEGRATE-neo [55] and ScanNeo [56] pipelines have been designed to predict neoepitopes derived from gene fusion and indels, respectively.

Measuring the expression of TEs using RNA-Seq data

TEs are sequences of DNA, which switches location within a genome. These TE elements constitute ~66–69% of the human [57, 58] and 30% fruit fly (Drosophila melanogaster) genomes [59]. They are integrated into a different genomic location either using DNA (‘transposons’) or through reverse transcription (‘retrotransposons’) [60]. There are three types of retrotransposons: long terminal repeats, long intersected elements and short intersected elements (SINEs) [61]. TEs can contribute to the formation of transcription factors [62] and abnormal expression of TE associated with cancer [63], and neurodegenerative diseases [64].

Next-generation sequencing (NGS) has advantages over in vitro techniques for studying retrotransposition, e.g. NGS can detect novel SINE amplification events during development, and this is not possible with in vitro methods. Despite the challenge posed by the repetitive nature of retroelements for developing bioinformatics analyses algorithms, reasonable progress has been made recently by the bioinformatics community. Moreover, in SINE elements, the short length of SINE elements contributes additional difficulty to the identification of the SINE instances that regulate cell functions.

MMR [65] and SalmonTE [66] tools were designed to investigate transposon expression using RNA-Seq explicitly. These packages perform quantification at the subfamily level. However, newer packages SINEsFIND [67] and ERVmap [68] address the need for locus-specific quantification of TE-derived transcripts. Novel fusion events that connect TE promoter sequences to downstream coding gene sequences can also be detected using the LIONS tool [69]. Table 1 describes more TE tools designed for RNA-Seq. For detailed algorithmic strategies, we recommended following TE tools-specific reviews [70–73].

Table 1

The table lists tools and pipelines for the measurements and analysis of the transposable elements from RNA-Seq data

Pipeline	Year	Applications or functional use
GeneTEFlow [9]	2020	A Nextflow-based pipeline for analysing gene and transposable elements expression from RNA-Seq data
SQuIRE [95]	2019	Quantifying interspersed repeat expression (SQuIRE): RNA-Seq analysis pipeline that provides a quantitative and locus-specific picture of TE expression
L1EM [96]	2020	For accurate locus-specific LINE-1 RNA quantification
repEnrich [97]	2014	To study genome-wide transcriptional regulation of repetitive elements, the original study shows transcriptionally active long terminal repeat retrotransposon
TEtools [98]	2017	Facilitates big data expression analysis of transposable elements and reveals an antagonism between their activity and that of piRNA genes
TEtranscripts [99]	2015	For including transposable elements in differential expression analysis of RNA-Seq datasets

Pipeline	Year	Applications or functional use
GeneTEFlow [9]	2020	A Nextflow-based pipeline for analysing gene and transposable elements expression from RNA-Seq data
SQuIRE [95]	2019	Quantifying interspersed repeat expression (SQuIRE): RNA-Seq analysis pipeline that provides a quantitative and locus-specific picture of TE expression
L1EM [96]	2020	For accurate locus-specific LINE-1 RNA quantification
repEnrich [97]	2014	To study genome-wide transcriptional regulation of repetitive elements, the original study shows transcriptionally active long terminal repeat retrotransposon
TEtools [98]	2017	Facilitates big data expression analysis of transposable elements and reveals an antagonism between their activity and that of piRNA genes
TEtranscripts [99]	2015	For including transposable elements in differential expression analysis of RNA-Seq datasets

Open in new tab

Table 1

The table lists tools and pipelines for the measurements and analysis of the transposable elements from RNA-Seq data

Pipeline	Year	Applications or functional use
GeneTEFlow [9]	2020	A Nextflow-based pipeline for analysing gene and transposable elements expression from RNA-Seq data
SQuIRE [95]	2019	Quantifying interspersed repeat expression (SQuIRE): RNA-Seq analysis pipeline that provides a quantitative and locus-specific picture of TE expression
L1EM [96]	2020	For accurate locus-specific LINE-1 RNA quantification
repEnrich [97]	2014	To study genome-wide transcriptional regulation of repetitive elements, the original study shows transcriptionally active long terminal repeat retrotransposon
TEtools [98]	2017	Facilitates big data expression analysis of transposable elements and reveals an antagonism between their activity and that of piRNA genes
TEtranscripts [99]	2015	For including transposable elements in differential expression analysis of RNA-Seq datasets

Pipeline	Year	Applications or functional use
GeneTEFlow [9]	2020	A Nextflow-based pipeline for analysing gene and transposable elements expression from RNA-Seq data
SQuIRE [95]	2019	Quantifying interspersed repeat expression (SQuIRE): RNA-Seq analysis pipeline that provides a quantitative and locus-specific picture of TE expression
L1EM [96]	2020	For accurate locus-specific LINE-1 RNA quantification
repEnrich [97]	2014	To study genome-wide transcriptional regulation of repetitive elements, the original study shows transcriptionally active long terminal repeat retrotransposon
TEtools [98]	2017	Facilitates big data expression analysis of transposable elements and reveals an antagonism between their activity and that of piRNA genes
TEtranscripts [99]	2015	For including transposable elements in differential expression analysis of RNA-Seq datasets

Open in new tab

Detection of microbial contaminants

Advanced tools can detect the contaminants’ presence in the experiment using the same RNA-Seq data, which is sequenced for testing some biological hypothesis. Generally, these tools extract the contaminants information from unmapped reads to the reference genome, i.e. reads that does not map to the reference genome. Many experimental standards have been developed to reduce laboratory contamination, despite this unexpected microbial contamination is reported in many studies. Recent studies [74, 75] suggest that researchers should consider measuring the functional impacts of contamination as this affects the quality of the research. Since cell manipulation for a range of experimental designs is essential in modern research and eukaryotic cells are highly likely exposed to microorganisms at the time of transition. Sometimes these contaminations contribute to a false experimental interpretation because contaminants may cause their host cells to have significant morphological and physiological changes. Furthermore, identifying infectious agents is vital in donated cells in medical and clinical facilities to prevent donor–patient disease transmission. These studies explore the effect of cross-contamination caused by exogenous sources, including laboratory reagents.

Some tools designed for this purpose require full FASTQ files as an input, whereas other new tools work directly on unmapped BAM files. A few tools are explicitly designed for virus detection. VirTect [76], VIRTUS [77] and ViGen [78] detect viruses using RNA-Seq. In contrast, the Decontaminer tool [8] and contamination screen are designed for bacteria, fungal and virus contamination detection. Decontaminer also provides an online visualization architect to explore the experimental contamination in detail.

Metatranscriptomics analysis

Metatranscriptome denotes the total RNA copies of the genes in a community of organisms. At a specific time of sampling, it is considered as a unique entity. Metatranscriptome varies with time and environmental changes. It can be used to obtain the whole expression profile in a community and demonstrate the dynamics of gene expression patterns over time. RNA-Seq is on the rise to conduct metatranscriptomics and microbiome dynamics studies. Metatranscriptomics analysis remarkably studied the alteration of gut microflora and its association with diseases like inflammatory bowel disease [79], obesity [80] and diabetes [81]. The gut microbiome plays roles in directing the host immune system development, post-embryonic development, host behaviour, detoxification [82]. Peters et al. [83] showed a relationship between gut microbiota and immunotherapy responses via metatranscriptomics in melanoma patients. Recently, metatranscriptomics employed as a surveillance tool for detecting the presence of the severe acute respiratory syndrome coronavirus 2 genome through the nasopharyngeal swabs of COVID-19 disease patients [84]. Studies have also explored the microbiome’s role in host evolution [85], which are predicted to be co-evolved [86].

Numerous bioinformatics pipelines are available for metatranscriptomics to investigate complex microorganism communities. These pipelines can be classified as assembly-free and assembly-based pipelines [87]. HUMAnN2 [88], MetaTrans [89], SAMSA2 [90] use assembly free strategy. SqueezeMeta [91], IMP [92] and MOSCA [93] are assembly-based tools. MetaTrans and SAMSA2 implemented for both taxonomic and differential gene expression analysis purposes, whereas HUMAnN2 does not carry out differential gene expression analysis. Among assembly-based tools, only MOSCA performs differential expression (DE) studies; the rest two, SqueezeMeta and IMP, are limited to taxonomic and functional analysis. Galaxy-based pipeline, ASaiM-MT [94], is designed for functional and taxonomy quantitation using HUMAnN2 and MetaPhlan2, respectively. The accurate quantification of organisms or functional activity within the complex metatranscriptome of samples remains an open challenge, because it is difficult to identify the optimal sampling frequency for different times, and the compositional nature of microbiome data frequently violates assumptions of statistical methods.

Determining cell type abundance (cell type deconvolution) and expression from bulk tissues

The method of estimating the proportion of each cell type is called cell type deconvolution, which is essentially performed with cell type-specific gene expression profiles for bulk transcriptomics samples [100]. Understanding the composition of cell types in disease-relevant tissues is an important step towards discovering cellular disease targets. Flow cytometry or single-cell RNA-Seq are most commonly used for cell type deconvolution. Currently, advanced bioinformatics tools can perform the same task using bulk RNA-Seq, which also estimates cell type abundances. For example, CIBERSORTx [101] tool can infer cell type-specific gene expression profiles without physical cell isolation using machine learning models, and its utility is evaluated in several types of tumours. In Table 4, we list the cell type deconvolution tools for bulk RNA-Seq and highlight the essential features for each. These tools can be used to define the cellular heterogeneity of complex tissues such as cancer to investigate the mechanism of disease.

Variants calling from RNA-Seq

SNVs offer insights to decipher the novel biological mechanisms for complex diseases and define genotype and phenotype associations. The genomic and genetic variation provides clues to the drivers of both malignancy and metastasis [101–103]. The variant identification can be statistically challenging due to low sequencing depth and lower variant allele frequency. Diseases such as cancer further increase these challenges because of tumour subpopulation, low tumour purity or a highly disturbed genome. Typically, whole-genome/exome sequencing is predominantly used for calling genomics variants rather than transcriptome (RNA-Seq). However, the relatively lower cost of RNA-Seq is gaining attention as an alternative to Whole Genome sequencing/ Whole exome sequencing (WGS/WES). Even though variant calling from RNA-Seq has drawbacks, it can help in the investigation of discrepancies at both the transcriptomic and genomic level due to post-transcriptional modifications. For example, RNA editing, nonsense-mediated decay, could be missed in genomic data. Conversely, RNA-based variant calling can miss (a) variants in the alleles, which are not expressed, (b) splicing variants, (c) variants in the low-expressed gene and (d) is limited to expressed region, etc. In conclusion, both methods have pros and cons. Wherever applicable, integration of both is the best solution for exploring the genotypic and phenotypic relationship; for example, Coudray et al. [104] shows in the absence of RNA-Seq data, WES/WGS likely to miss key driver mutations [105].

The important variant calling pipeline steps are mapping, marking the duplicates, base recalibration, variants calling, filtering of low-quality variants and variants annotations. Identification of genomic variants from RNA-Seq data is further complicated by intrinsic complexity such as splicing, low-expressed regions and managing duplicate reads, but algorithms are evolving day-by-day. Many variants calling pipelines have been developed for underlying models, filters, input data requirements and targeted applications. These pipelines generally use different variant callers for calling germline and somatic variants. Table 2 details variant calling pipelines dedicated to RNA-Seq data.

Table 2

RNA-Seq-based variant calling pipelines. Various pipelines commonly used for variant calling from RNA-Seq data are reported in the table with the information of the aligner and variant caller used in the pipeline

Pipeline	Year	Aligner or input	Variant caller
Opossum + Platypus [39]	2017	Opossum pre-processes BAM (TopHat2 or Star 2-pass)	Platypus or GATK HaplotypeCaller
STAR 2-pass + MuTect2 [104]	2018	STAR 2-pass	MuTect2
VAP (Variant analysis Pipeline) [113]	2019	Star 2-Pass	GATK Haplotypecaller

Pipeline	Year	Aligner or input	Variant caller
Opossum + Platypus [39]	2017	Opossum pre-processes BAM (TopHat2 or Star 2-pass)	Platypus or GATK HaplotypeCaller
STAR 2-pass + MuTect2 [104]	2018	STAR 2-pass	MuTect2
VAP (Variant analysis Pipeline) [113]	2019	Star 2-Pass	GATK Haplotypecaller

Open in new tab

Table 2

RNA-Seq-based variant calling pipelines. Various pipelines commonly used for variant calling from RNA-Seq data are reported in the table with the information of the aligner and variant caller used in the pipeline

Pipeline	Year	Aligner or input	Variant caller
Opossum + Platypus [39]	2017	Opossum pre-processes BAM (TopHat2 or Star 2-pass)	Platypus or GATK HaplotypeCaller
STAR 2-pass + MuTect2 [104]	2018	STAR 2-pass	MuTect2
VAP (Variant analysis Pipeline) [113]	2019	Star 2-Pass	GATK Haplotypecaller

Pipeline	Year	Aligner or input	Variant caller
Opossum + Platypus [39]	2017	Opossum pre-processes BAM (TopHat2 or Star 2-pass)	Platypus or GATK HaplotypeCaller
STAR 2-pass + MuTect2 [104]	2018	STAR 2-pass	MuTect2
VAP (Variant analysis Pipeline) [113]	2019	Star 2-Pass	GATK Haplotypecaller

Open in new tab

Earlier tools based on non-splice aware aligners contribute little to address false positives calls due to the transcriptome complexity. To overcome these limitations, modern pipelines such as VAP work with a splice-aware aligner (e.g. STAR [16]) and use transcripts annotation data to identify reliable SNPs in RNA-Seq. Further prioritization of RNA variants can be done with tools such as RVboost. RVboost uses a boosting technique to train a model of ‘high-quality’ variants by using common HapMap variants, and RNA variants are prioritized and called on the basis of the trained model. Another machine learning tool ‘SmartRNASeqCaller’ [106] has been developed to improve true germline variants and minimize the burden of false-positive calls from RNA-Seq. ANNOVAR [107] and Ensembl Variant Effect Predictor [108] are top-rated tools for the annotation of the variants called by variant callers. However, new platforms such as open-cravat [109] provide varieties of tools for annotating variants as per user needs. Radenbaugh et al. [110] reported that inclusion of the RNA-Seq with genomic data increases the power to detect somatic mutations.

Identifying the cancer malignancy driver genes is quite challenging, because a high mutation rate in cancer often introduces random mutations everywhere in the genome. Highly mutable genes are not simply driver’s genes, as it is often assumed that genome-wide mutation rate is constant, e.g. mutation rate for lowly expressed and late replication genes is usually higher. Moreover, higher gene length contributes towards a false-positive signal for a driver gene. Whole exome sequencing (WXS)/WES approaches-based tools like MutsigCV [111] and OncodriveFML [112] deal with such biases and help to predict the positive driver signals. As per our knowledge, currently, similar tools are not developed for RNA-Seq-based data.

Expression quantitative trait loci and transcriptome-wide association study analysis

The genome-wide association studies (GWAS) link the association between the gene variations (SNPs) and the complex phenotype/disease. However, GWAS approaches are unable to find the causal SNPs or in vitro confirmation to interpret biological functionality. GWAS also ignored the association between genetic variants, DNA functional elements and complicated traits and diseases. Expression quantitative trait loci (eQTL) studies can help identify the extent of influence that a variant can have on gene expression. eQTL are the genetic variants associated with the expression levels of one or more genes located near a variant (cis-eQTLs) or in a distant location from the variant (trans-eQTLs). A typical eQTL analysis is based on statistical hypothesis testing between SNP genotypes and gene expression levels are result in P-values that represent the strength of the association between the two measurements. Since trans-eQTL identification involves many more genes to be tested, the eQTL association to the distant genes remains challenging. In the same study subjects, the importance of gene expression regulation proved by eQTL studies by measuring both genetic variation and transcriptome data. However, these designs are not cost effective. Because of the complicated extraction procedure and other clinical reasons, for some trait, usually, a limited number of datasets with matched GWAS and transcriptome data are available. Limited sample size does not provide enough power for the analysis. When transcriptome data are not available in GWAS samples, transcriptome-wide association study (TWAS) offers novel insights into complex disease mechanisms [114]. TWAS emerged as a powerful approach that integrates common features of eQTL and GWAS studies (generally utilize different population-based data) to test the gene–trait associations and prioritize potential causal genes having the effects of a mediate variant on the traits. TWAS helped in discovering this relationship with improved statistical power and reduced multiple testing burden (i.e. number of genes) compared with testing traits with genetic variations (i.e. number of SNPs) [115]. TWAS is a robust framework to discover susceptible risk genes in complex human traits [116]. TWAS determines if GWAS-significant variants are also enriched for eQTLs by detecting co-localization of expression signals at known GWAS loci [117].

For eQTL analysis, tools use various strategies such as one-at-a-time test (Krux [118], MatrixeQTL [119], GeneVar [120]), multiple regression (HEFT [121]) and one-at-a-time-regression (SNPTEST [122], R/QTL [123]). When an intricate linkage disequilibrium structure is present, the one-at-a-time strategy cannot control the false positives. Also, this approach is often underpowered to detect the full spectrum of trans-acting regulatory effects. Other tools based on Bayesian statistics are Sherlock [124], eQTLBMA [125], etc. Multivariate eQTL mapping approaches, either least absolute shrinkage and selection operator (LASSO) or Bayesian Evolutionary Stochastic Search (ESS), are more suitable for detecting polygenic effects [126]. Furthermore, many TWAS integration and analysis methods have been published in the recent past, such as PrediXcan, which constructs a weighted SNP-set-based test in GWAS using SNP weights directed from the expression mapping studies [117]. Another method, TWAS, predicts the association between the complex phenotype/disease and the predicted quantitative gene expression built using the Bayesian sparse linear mixed model [127]. To incorporate the randomness and to deal with complexity in the polygenic architecture of expression data, a non-parametric latent Dirichlet process regression (DPR) model was incorporated for TWAS [128]. Similarly, TIGAR is also DPR model based, but provides a user-friendly interface. Other tools are developed on Mendelian randomization and perform tests to predict the causal relationship between gene expression and disease trait; these are summary data-based Mendelian randomization (SMR) [129] and Generalized SMR (GSMR) [130], respectively. In conclusion, TWAS can facilitate our understanding of the molecular and causal mechanisms underlying variant–trait associations. TWAS developed a few years ago, and since then, it helped into the GWAS-prioritized SNPs, which may have a crucial role in revealing the cis-SNPs regulating the upstream target gene. Moreover, eQTL analysis is not limited to the coding part, but it is also used for various non coding RNAs (ncRNAs), such as long non-coding RNA (IncR)-eQTLs, miRNA QTLs and circular RNA (circRNA)-eQTLs, etc. eQTL mapping can be performed based on tissue types as well. However, much work is still required to elucidate the effect of trans-SNPs on the neighbouring upstream target gene of interest and the effect of rare variants.

mRNA-related analyses and application

One of the primary motives of most RNA-Seq studies is DE analysis between two or multiple conditions or comparing different phenotypes.

Differential expression analysis

DE analysis of RNA-Seq compares the transcriptomic expression of different diseases’ phenotypes or conditions and measures unknown transcripts that other old techniques such as microarray were not capable of detecting. Because of its low cost, it became a popular choice in various research applications, including the discovery of potential biomarkers and therapeutic targets in cancer [131].

Regardless of the type of the study, to access transcriptomic differences using read counts, one has to first map reads to the reference genome and quantify them. The quantification of reads depends on various transcriptomics features that one wants to measure i.e. genes, exons or gene isoforms. After reads are quantified, few steps are recommended before performing downstream DE analysis, such as filtering lowly expressed genes, normalization and/or batch effect correction. Filtering is recommended to eliminate low abundant genes that are believed to contribute to false discovery rate. Normalization (although not always required) corrects counts for various technical factors, including sequencing depth, RNA composition or gene length. The most common normalization techniques are Fragments Per Kilobase of transcript per Million mapped reads (FPKM), Reads Per Kilobase of transcript per Million mapped reads (RPKM), Counts Per Million (CPM), Trimmed Mean of M-values (TMM) and median of ratios. Depending on the design of the experiment, RNA-Seq data may also suffer from various systematic biases that can affect downstream analysis results. To remove unwanted variation in RNA-Seq data, one can apply methods like ComBat [132] or RUV-Seq [133]. Different complex statistical models are used to test which features have significant differences in their counts between the conditions. These models are often based on various data distribution assumptions, including negative binomial (NB) in DESeq2 [134], edgeR [135], or EBSeq [136], log-normal distribution in Ballgown [137], or relaxing the assumption of any distribution (non-parametric) as in NOISeq [138]. In case of high variabilities of gene expression within a condition or phenotype, a higher number of biological replicates are preferred to get significant statistical power in detecting differentially expressed features. Depending on the biological question to answer, comparison between the conditions can be performed in various forms such as pairwise, multiconditional or time series. To date, several reviews and studies have been published for the guidelines and best practices of the RNA-Seq applications [139, 140].

Large-scale time series RNA-Seq

Typically, static RNA-Seq captures a snapshot of the transcriptome at the time of sample collection, whereas the time series RNA-Seq methods are applications of RNA-Seq, which measures RNA abundances over time in biological samples. Its applications include the exploration of cellular differentiation, developmental biology processes, disease progression, etc. The general aim is to identify specific gene expression pattern over time. However, statistical analysis of time course data is challenging because of complex experimental design, large dynamic changes of the transcriptome. Compared with static RNA-Seq methods, these methods are not well benchmarked and validated [139]. Recent reports inform that no comparative study compares these tools considering metrics of factors such as the impact of pre-processing steps, various time points, different numbers of replicates, multiple conditions and variable noise levels. Table 3 explain various tools based on time course analysis. Each tool has its pros and cons, and at the individual level, no pipeline can address the complete characterization of the dynamic biological process. Tools based on static RNA-Seq have a major assumption that observed samples are collected independently at a fixed time point; hence, not appropriate for time course RNA-Seq statistic.

Table 3

Large-scale time series RNA-Seq analysis tools

Software	Year	Applications feature
MAPTest [161]	2020	Two series time course; longitudinal settings in repeatedly measured data; also enables classification between parallel differential expression or non-parallel differential expression
deepTS [162]	2020	Flexible and powerful web-based platform for identification and analysis of pairwise transcriptome data along with the visualization
AR (auto-regressive model) [163]	2019	Single series of longitudinal RNA-Seq time course data; balanced longitudinal data required with an even number of sample size; longer running time required as number of time point increases
ImpulseDE2 [164]	2018	One- or two-series time course data, longitudinally measured RNA-Seq and Chip-Seq data with more than six time points
Trendy [165]	2018	Single-series time course; longitudinally measured, use optimal segmented regression model; provides the location and direction of dynamic changes in expression
TSIS [166]	2017	Detects transcript isoform switches, graphic interface performed by Shiny app from R package
timeSeq [167]	2016	Two-series time course; method based on negative binomial mixed-effects model with major factors; distinguish non-parallel versus parallel temporal changes between conditions over time
EBSeq-HMM [168]	2015	Single-series time course; longitudinally measured, use auto-regressive Hidden Markov Model
Ngsp [169]	2015	One- or two-series time course; longitudinally measured, and advantageous for temporal dynamic patterns incorporated course experimental designs
Next maSigPro [170]	2014	One- or multi-series time course data; non-longitudinally measured, and use polynomial regression model for a different time variable. DEG is selected by the best-fitted model in a temporal manner

Software	Year	Applications feature
MAPTest [161]	2020	Two series time course; longitudinal settings in repeatedly measured data; also enables classification between parallel differential expression or non-parallel differential expression
deepTS [162]	2020	Flexible and powerful web-based platform for identification and analysis of pairwise transcriptome data along with the visualization
AR (auto-regressive model) [163]	2019	Single series of longitudinal RNA-Seq time course data; balanced longitudinal data required with an even number of sample size; longer running time required as number of time point increases
ImpulseDE2 [164]	2018	One- or two-series time course data, longitudinally measured RNA-Seq and Chip-Seq data with more than six time points
Trendy [165]	2018	Single-series time course; longitudinally measured, use optimal segmented regression model; provides the location and direction of dynamic changes in expression
TSIS [166]	2017	Detects transcript isoform switches, graphic interface performed by Shiny app from R package
timeSeq [167]	2016	Two-series time course; method based on negative binomial mixed-effects model with major factors; distinguish non-parallel versus parallel temporal changes between conditions over time
EBSeq-HMM [168]	2015	Single-series time course; longitudinally measured, use auto-regressive Hidden Markov Model
Ngsp [169]	2015	One- or two-series time course; longitudinally measured, and advantageous for temporal dynamic patterns incorporated course experimental designs
Next maSigPro [170]	2014	One- or multi-series time course data; non-longitudinally measured, and use polynomial regression model for a different time variable. DEG is selected by the best-fitted model in a temporal manner

Open in new tab

Table 3

Large-scale time series RNA-Seq analysis tools

Software	Year	Applications feature
MAPTest [161]	2020	Two series time course; longitudinal settings in repeatedly measured data; also enables classification between parallel differential expression or non-parallel differential expression
deepTS [162]	2020	Flexible and powerful web-based platform for identification and analysis of pairwise transcriptome data along with the visualization
AR (auto-regressive model) [163]	2019	Single series of longitudinal RNA-Seq time course data; balanced longitudinal data required with an even number of sample size; longer running time required as number of time point increases
ImpulseDE2 [164]	2018	One- or two-series time course data, longitudinally measured RNA-Seq and Chip-Seq data with more than six time points
Trendy [165]	2018	Single-series time course; longitudinally measured, use optimal segmented regression model; provides the location and direction of dynamic changes in expression
TSIS [166]	2017	Detects transcript isoform switches, graphic interface performed by Shiny app from R package
timeSeq [167]	2016	Two-series time course; method based on negative binomial mixed-effects model with major factors; distinguish non-parallel versus parallel temporal changes between conditions over time
EBSeq-HMM [168]	2015	Single-series time course; longitudinally measured, use auto-regressive Hidden Markov Model
Ngsp [169]	2015	One- or two-series time course; longitudinally measured, and advantageous for temporal dynamic patterns incorporated course experimental designs
Next maSigPro [170]	2014	One- or multi-series time course data; non-longitudinally measured, and use polynomial regression model for a different time variable. DEG is selected by the best-fitted model in a temporal manner

Software	Year	Applications feature
MAPTest [161]	2020	Two series time course; longitudinal settings in repeatedly measured data; also enables classification between parallel differential expression or non-parallel differential expression
deepTS [162]	2020	Flexible and powerful web-based platform for identification and analysis of pairwise transcriptome data along with the visualization
AR (auto-regressive model) [163]	2019	Single series of longitudinal RNA-Seq time course data; balanced longitudinal data required with an even number of sample size; longer running time required as number of time point increases
ImpulseDE2 [164]	2018	One- or two-series time course data, longitudinally measured RNA-Seq and Chip-Seq data with more than six time points
Trendy [165]	2018	Single-series time course; longitudinally measured, use optimal segmented regression model; provides the location and direction of dynamic changes in expression
TSIS [166]	2017	Detects transcript isoform switches, graphic interface performed by Shiny app from R package
timeSeq [167]	2016	Two-series time course; method based on negative binomial mixed-effects model with major factors; distinguish non-parallel versus parallel temporal changes between conditions over time
EBSeq-HMM [168]	2015	Single-series time course; longitudinally measured, use auto-regressive Hidden Markov Model
Ngsp [169]	2015	One- or two-series time course; longitudinally measured, and advantageous for temporal dynamic patterns incorporated course experimental designs
Next maSigPro [170]	2014	One- or multi-series time course data; non-longitudinally measured, and use polynomial regression model for a different time variable. DEG is selected by the best-fitted model in a temporal manner

Open in new tab

Table 4

Tools and pipelines for cell type abundance measurement from bulk tissue

Tool	Year	Features	Programming language	Software link
TIMER2.0 [193]	2020	Web server for comprehensive analysis of tumour-infiltrating immune cells	Web-tool R, Javascript	https://github.com/taiwenli/TIMER
CIBERSORTx [101]	2019	Impute gene expression profiles and provide an estimation of the abundances of member cell types in a mixed cell population	Web-tool Java, R	https://cibersortx.stanford.edu/
quanTIseq [194]	2019	Quantify the fractions of 10 immune cell types from bulk RNA-sequencing data	R, Shell (Bash)	https://icbi.imed.ac.at/quantiseq
Immunedeconv [12]	2019	Benchmarking of transcriptome-based cell type quantification methods for immune oncology	R	https://github.com/icbilab/immunedeconv
Linseed [195]	2019	Deconvolution of cellular mixtures based on linearity of transcriptional signatures	C++, R	https://github.com/ctlab/LinSeed
deconvSEQ [196]	2019	Deconvolution of cell mixture distribution based on a generalized linear model	R	https://github.com/rosedu1/deconvSeq
CDSeq [197]	2019	Simultaneously estimate both cell type proportions and cell type-specific expression profiles	MATLAB, R	https://github.com/kkan g7/CDSeq_R_Package
dtangle [198]	2019	Estimates cell type proportions using publicly available, often cross-platform, and reference data	R	https://github.com/gjhunt/dtangle
GEDIT [199]	2019	Estimate cell type abundances	Web-based tool Python, R	http://webtools.mcdb.ucla.edu/
EPIC [200]	2017	Simultaneously estimates the fraction of cancer and immune cell types	R	http://epic.gfellerlab.org/
WSCUnmix [201]	2017	Automated deconvolution of structured mixtures	MATLAB	https://github.com/tedro man/WSCUnmix
Infino [202]	2017	Deconvolves bulk RNA-Seq into cell type abundances and captures gene expression variability in a Bayesian model to measure deconvolution uncertainty	R, Python	https://github.com/ham merlab/infino
MCPcounter [203]	2016	Estimating the population abundance of tissue-infiltrating immune and stromal cell populations	R	https://github.com/ebec ht/MCPcounter

Tool	Year	Features	Programming language	Software link
TIMER2.0 [193]	2020	Web server for comprehensive analysis of tumour-infiltrating immune cells	Web-tool R, Javascript	https://github.com/taiwenli/TIMER
CIBERSORTx [101]	2019	Impute gene expression profiles and provide an estimation of the abundances of member cell types in a mixed cell population	Web-tool Java, R	https://cibersortx.stanford.edu/
quanTIseq [194]	2019	Quantify the fractions of 10 immune cell types from bulk RNA-sequencing data	R, Shell (Bash)	https://icbi.imed.ac.at/quantiseq
Immunedeconv [12]	2019	Benchmarking of transcriptome-based cell type quantification methods for immune oncology	R	https://github.com/icbilab/immunedeconv
Linseed [195]	2019	Deconvolution of cellular mixtures based on linearity of transcriptional signatures	C++, R	https://github.com/ctlab/LinSeed
deconvSEQ [196]	2019	Deconvolution of cell mixture distribution based on a generalized linear model	R	https://github.com/rosedu1/deconvSeq
CDSeq [197]	2019	Simultaneously estimate both cell type proportions and cell type-specific expression profiles	MATLAB, R	https://github.com/kkan g7/CDSeq_R_Package
dtangle [198]	2019	Estimates cell type proportions using publicly available, often cross-platform, and reference data	R	https://github.com/gjhunt/dtangle
GEDIT [199]	2019	Estimate cell type abundances	Web-based tool Python, R	http://webtools.mcdb.ucla.edu/
EPIC [200]	2017	Simultaneously estimates the fraction of cancer and immune cell types	R	http://epic.gfellerlab.org/
WSCUnmix [201]	2017	Automated deconvolution of structured mixtures	MATLAB	https://github.com/tedro man/WSCUnmix
Infino [202]	2017	Deconvolves bulk RNA-Seq into cell type abundances and captures gene expression variability in a Bayesian model to measure deconvolution uncertainty	R, Python	https://github.com/ham merlab/infino
MCPcounter [203]	2016	Estimating the population abundance of tissue-infiltrating immune and stromal cell populations	R	https://github.com/ebec ht/MCPcounter

Open in new tab

Table 4

Tools and pipelines for cell type abundance measurement from bulk tissue

Tool	Year	Features	Programming language	Software link
TIMER2.0 [193]	2020	Web server for comprehensive analysis of tumour-infiltrating immune cells	Web-tool R, Javascript	https://github.com/taiwenli/TIMER
CIBERSORTx [101]	2019	Impute gene expression profiles and provide an estimation of the abundances of member cell types in a mixed cell population	Web-tool Java, R	https://cibersortx.stanford.edu/
quanTIseq [194]	2019	Quantify the fractions of 10 immune cell types from bulk RNA-sequencing data	R, Shell (Bash)	https://icbi.imed.ac.at/quantiseq
Immunedeconv [12]	2019	Benchmarking of transcriptome-based cell type quantification methods for immune oncology	R	https://github.com/icbilab/immunedeconv
Linseed [195]	2019	Deconvolution of cellular mixtures based on linearity of transcriptional signatures	C++, R	https://github.com/ctlab/LinSeed
deconvSEQ [196]	2019	Deconvolution of cell mixture distribution based on a generalized linear model	R	https://github.com/rosedu1/deconvSeq
CDSeq [197]	2019	Simultaneously estimate both cell type proportions and cell type-specific expression profiles	MATLAB, R	https://github.com/kkan g7/CDSeq_R_Package
dtangle [198]	2019	Estimates cell type proportions using publicly available, often cross-platform, and reference data	R	https://github.com/gjhunt/dtangle
GEDIT [199]	2019	Estimate cell type abundances	Web-based tool Python, R	http://webtools.mcdb.ucla.edu/
EPIC [200]	2017	Simultaneously estimates the fraction of cancer and immune cell types	R	http://epic.gfellerlab.org/
WSCUnmix [201]	2017	Automated deconvolution of structured mixtures	MATLAB	https://github.com/tedro man/WSCUnmix
Infino [202]	2017	Deconvolves bulk RNA-Seq into cell type abundances and captures gene expression variability in a Bayesian model to measure deconvolution uncertainty	R, Python	https://github.com/ham merlab/infino
MCPcounter [203]	2016	Estimating the population abundance of tissue-infiltrating immune and stromal cell populations	R	https://github.com/ebec ht/MCPcounter

Tool	Year	Features	Programming language	Software link
TIMER2.0 [193]	2020	Web server for comprehensive analysis of tumour-infiltrating immune cells	Web-tool R, Javascript	https://github.com/taiwenli/TIMER
CIBERSORTx [101]	2019	Impute gene expression profiles and provide an estimation of the abundances of member cell types in a mixed cell population	Web-tool Java, R	https://cibersortx.stanford.edu/
quanTIseq [194]	2019	Quantify the fractions of 10 immune cell types from bulk RNA-sequencing data	R, Shell (Bash)	https://icbi.imed.ac.at/quantiseq
Immunedeconv [12]	2019	Benchmarking of transcriptome-based cell type quantification methods for immune oncology	R	https://github.com/icbilab/immunedeconv
Linseed [195]	2019	Deconvolution of cellular mixtures based on linearity of transcriptional signatures	C++, R	https://github.com/ctlab/LinSeed
deconvSEQ [196]	2019	Deconvolution of cell mixture distribution based on a generalized linear model	R	https://github.com/rosedu1/deconvSeq
CDSeq [197]	2019	Simultaneously estimate both cell type proportions and cell type-specific expression profiles	MATLAB, R	https://github.com/kkan g7/CDSeq_R_Package
dtangle [198]	2019	Estimates cell type proportions using publicly available, often cross-platform, and reference data	R	https://github.com/gjhunt/dtangle
GEDIT [199]	2019	Estimate cell type abundances	Web-based tool Python, R	http://webtools.mcdb.ucla.edu/
EPIC [200]	2017	Simultaneously estimates the fraction of cancer and immune cell types	R	http://epic.gfellerlab.org/
WSCUnmix [201]	2017	Automated deconvolution of structured mixtures	MATLAB	https://github.com/tedro man/WSCUnmix
Infino [202]	2017	Deconvolves bulk RNA-Seq into cell type abundances and captures gene expression variability in a Bayesian model to measure deconvolution uncertainty	R, Python	https://github.com/ham merlab/infino
MCPcounter [203]	2016	Estimating the population abundance of tissue-infiltrating immune and stromal cell populations	R	https://github.com/ebec ht/MCPcounter

Open in new tab

Allele-specific expression

The diploid genome with two paternal alleles of a gene has a relative difference in the allelic transcription due to various factors, viz differential rates of transcription, relative differences in the mRNA stability and multiple cis- trans-regulatory elements in action to control the allelic transcription abundance. These allelic expression differences can range from a slight quantitative change to many folds up/down in magnitude [141–143]. Thus, accurate estimation of the differential abundance of a gene’s allelic copies is referred to as allele-specific expression (ASE). It provides a substantial understanding of alleles’ relative expression in a diseased state termed as eQTLs [144, 145]. RNA-Seq can measure the ASE of the gene coming from parental alleles. Usually, the heterozygous SNP information in the transcribed genomic region helps differentiate the RNA-Seq reads’ origin. The paternal and maternal allele gene expression ratio is used to measure the ASE differences, which may lead to some phenotypic variations. RNA-Seq analysis platforms have grown robust and sensitive enough to provide quantitative information on differential ASE for genes [142, 143].

The standard reference-based mapping approaches suffer from the biased estimation in favour of the most abundant allelic counterpart matching with the reference genome or transcriptome [144, 146]—many attempts to circumvent this bias while scoring the alignment [145, 147, 148]. Among many approaches, one of the chief steps involved was hashing method, which refers to assigning reads to the alleles with the already known SNP [149]. WASP tools are based on this approach [150] and speculated that approach helped in navigating the fact that SNPs matching to non-reference transcripts usually match to multiple genomic locus. Thus, allele assignment of these reads is discarded in WASP. Another tool, ASElux, assigns SNPs, quantifies allelic reads directly from unmapped RNA-Seq datasets [151]. Some tools developed using statistical approaches to deal with the individual-specific random effects to assign SNPs using multi-SNP correlations for a single transcript. ASE analysis in a population tool intakes RNA-Seq datasets and infers multi-SNPs correlations for a single gene-level ASE between two conditions [152]. The other tools in this related domain using statistical approaches are GeneiASE [153], QuASAR [154] and MBASED [155]. In addition, a set of best practices, comprehensive discussion on the ASE expression methods from RNA-Seq datasets is provided here [156].

Differential isoform expression and differential splicing

Recent transcriptome analyses have shown that up to 95% of human genes are alternatively spliced to encode proteins with different functions [157]. There are two types of splicing described: (1) constitutive splicing and (2) alternative splicing. In constitutive splicing, non-coding introns are removed from the pre-mRNA, and exons are joined together to form a mature mRNA [158, 159], whereas in alternative splicing, exons can be included or excluded in different combinations to create a diverse range of mRNA transcripts from a single pre-mRNA. It serves as one of the major mechanisms to increase the diversity of the transcriptome. Alternative splicing events are commonly classified into various categories such as cassette exons, mutually exclusive exons, retained intron, alternative 5′ splice sites, alternative 3′ splice sites, alternative promoters and alternative poly-A sites [160]. In a cassette exon, an intervening exon between two other exons from the mature mRNA sequence can be either included or skipped to generate two distinct protein isoforms. In the case of intron retention, the intron removal can be suppressed, resulting in the entire intron’s retention. Alternate splice sites of 5 ‘or 3’ can extend or shorten the exons. Intron retention is generally the most difficult type to detect due to the experimental artefacts. Since incompletely spliced transcripts may retain the intron fragments, which can be mislabelled as the intron’s retention.

Identification and analysis of alternative splicing events

The advancement of RNA-Seq methods has facilitated the study of the splicing process’s genome-wide impact and regulation. New bioinformatics tools led to the identification of novel transcribed regions and splicing isoforms and provided that a significant fraction of the transcribed RNA in human cells undergo alternative splicing [171]. An increasing number of studies illustrated that the selection of wrong splice sites causes human disease [172], emphasizing the importance of studying differentially spliced transcripts. However, the large datasets produced from RNA-Seq experiments and complex information to analyse have turned this into a challenging task.

Several tools have been developed for quantification and identification of alternative splicing events such as SpliceR [173], SpliceSeq [174], splicing Viewer [175], ASTALAVISTA [176, 177], GESS [178], DiffSplice [179], DSGseq [180], GLiMMPS [181], GPSeq [182], MATS [183], rMATS [184], MISO [185], JuncBASE [186], finesplice [187], SpliceTrap [188] and ASpli2 [189]. Each tool has its advantages and disadvantages. In the context of alternative splicing, the SpliceSeq, SpliceViewer, DiffSplice and ASTALAVISTA tools also facilitate intuitive visualization of RNA-Seq results. GESS (graph-based exon-skipping scanner) performs de novo detection of skipping event sites from raw RNA-Seq reads without prior knowledge of gene annotations. GESS method builds a splice site-link graph from first hand, raw RNA-Seq reads and then implements a walking strategy on this graph by iteratively navigating subgraphs to reveal those with a pattern corresponding to an exon-skipping event. Thus, it can provide a more accurate and comprehensive picture of skipping events associated with a cell’s particular physiological condition.

Both GESS and DiffSplice can identify splicing events without prior annotation. MATS compares two samples without replicates, whereas rMATS compares two samples with replicates. Running rMATS without replicates automatically invokes the MATS algorithm. DSGseq is an exon-based method that uses the NB distribution to model sequencing reads on exons and proposes an NB-statistic to detect differentially spliced genes. It can also detect the novel alternative splicing events and highlight differentially spliced.

MISO (mixture-of-isoforms) model is a statistical model that estimates the expression of alternatively spliced exons and isoforms and assesses confidence in these estimates from RNA-Seq data. The MISO model uses Bayesian inference to compute the probability that a read originated from a particular isoform. ASpli evaluate the difference in AS events’ magnitude by comparing the exon inclusion or intron retention percentage using splice junctions, and for the differential use of introns, exons and splicing junctions, read counts are used. This integrative process enables to detect the differences in both annotated and novel AS events. Main steps involved in the ASpli workflow are parsing the genome annotation into bins and aligning the read against the bins. Finally, based on the count of the reads aligned to the bins or junction, differential bins are inferred.

To identify alternative splicing events, SpliceR utilizes the full-length transcript output from RNA-Seq assemblers. SpliceR also annotates the differentially spliced elements’ genomic coordinates, facilitating downstream sequence analysis of each splicing pattern. Fraction values are calculated for each transcript isoform to detect the transcript switching between conditions. MISO, ASTALAVISTA and DiffSplice are unable to evaluate novel features because these methods do not generate the genomic coordinates of differentially spliced regions. Recently various approaches for differential splicing and DE at isoform have been benchmarked [190] and reviewed [191, 192], highlighting basic characteristics along with the pros and cons of some of the commonly used software.

Prospective

RNA-Seq algorithms and methodologies are continuing to evolve. Advancement in statistical models vastly improved bioinformatics methods, and many new applications of the RNA-Seq-based analyses are now possible. The key statistical considerations and methods evolved at sample level, gene/transcript level, exon level and the methods diversified with respect to different statistical models and data distributional assumptions. As the novel tools are now providing various ways to answer the same biological question, the pitfall for the user is to choose the most optimal and appropriate model. The choice of parameters within the method can also impact the final results and interpretation of data. In that case, it is recommended to try various tools and settings and to support the choice of the model by benchmarking studies and in-depth reviews of the methods. However, the user should carefully design their experiment and optimize the experimental design by power analysis for RNA-Seq beforehand. If biological samples are processed in different batches of the sequencing, the use of technical replicates is also recommended.

Bias due to sequencing library can also alter the outcome, e.g. insufficient small non-coding RNA (sncRNA) detection bias due to RNA modification, isoforms detection bias due to short reads. The new sequencing method PANDORA can detect sncRNAs that once undetected by other libraries due to RNA modification that interferes with ligation and reverse transcription. Similarly, an improved read coverage and sequencing depth can overcome the disadvantage of short read length. New technologies such as long-read sequencing and scRNA-seq open new possibilities where short-read sequencing and bulk RNA-Seq have limitations. For example, long RNA-Seq can better resolve the genome assembly of the highly repetitive region of the larger genomes. Furthermore, it can detect long structural variants, full-length isoforms sequence and examine splicing better than short-read sequencing. On the other hand, scRNA-Seq helps understand tumour heterogeneity and drug-resistant clones with higher resolution. It is also very efficient in the investigation of cell lineages and cancer stem cell evolution. Other NGS and microarray technologies restricted rare cell type analyses to a large cell population only, but scRNA-Seq can analyse rare cell type more efficiently. scRNA-Seq assisted in the identification of exhausted CD8 cells in cancer therapy and establishing drug-resistant clones with higher resolution. Furthermore, with the help of time course scRNA-Seq experiments, we are opening up a wealth of biological information regarding spatio-temporal gene expression at different stages of cell growth or under disease stages. On the other hand, it is not straightforward to construct the cell lineages from the bulk RNA-Seq and microarray-based technologies. In the past, the gene regulatory networks was used to construct time series datasets from bulk and microarray platforms. However, these methods cannot identify the rare cell type and perform downstream analyses in a computationally efficient way.

This review described the evolving methodology for bulk RNA-Seq-based analysis and highlighted the knowledge gaps. Due to different biological aims and scenarios, finding consent on the best tools or pipelines is challenging. However, within the scope of this article, we provided a good summary of methods and information on available resources together with benchmarking studies that can help non-experts users orient themselves on new applications that bulk-RNA-Seq analysis can accommodate.

Key Points

Detection of genomic features from RNA-Seq data with advanced bioinformatic algorithms.
Evolving bioinformatic tools have the potential to discover more hidden information in bulk RNA-Seq data such as microbial contamination, TEs expression, CNAs, neoantigen prediction and cell type deconvolution, etc.
Exploration of advance TWAS using bulk RNA-Seq.

Funding

The National Health and Medical Research Council Project Grant APP1181179 Ideas Grant (to A.S.T., B.A. and M.R.). Czech Science Foundation (20-04099S to P.K.T.).

Amarinder Singh Thind is a postdoctoral researcher (Bioinformatics) at the University of Wollongong, Australia and IHMRI. His current research interests include the development of methodology for the integration of genomics, coding and non-coding transcriptome to discover biomarkers in cancer contexts.

Isha Monga is working as postdoctoral research scientist at the Department of Dermatology, Columbia University Medical Center, New York. Her research interests include understanding and elucidation of the molecular mechanisms of skin disorders including autoimmune disease alopecia areata. To achieve that, she is currently analysing high-throughput datasets of the single-cell sequencing, GWAS fine-mapping and 3D genomics methods from the mouse model and patients’ skin samples.

Prasoon Kumar Thakur is a Bioinformatician at the laboratory of RNA Biology at the Institute of Molecular genetics, Prague, Czech Republic and pursuing a PhD degree at Charles University, Prague, Czech Republic. The focus of his current research is RNA splicing and high-throughput data analysis.

Pallawi Kumari is a Bioinformatics researcher at IMTECH-Bioinformatics Centre, Chandigarh, India. She has expertise in NGS genomics and transcriptomics. Her research interests lie in human cancer genetics.

Kiran Dindhoria is a research fellow at MTCC, CSIR-IMTech, Chandigarh, India. She has expertise in NGS genomics and metagenomics. Her research interests lie in human microbiome studies.

Monika Krzak is a postdoctoral researcher in Bioinformatics at the Genomics of Inflammation and Immunity Group at Sanger Institute, UK. Her research interests are focused on understanding the pathogenesis of immune-mediated diseases.

Marie Ranson is a Professor of Molecular Biology at the University of Wollongong, Australia. She has longstanding research interests in biomarkers of cancer invasion and metastasis. She has made substantial contributions to the plasminogen activation system field in cancer as well as in inflammatory diseases. She has also successfully led and/or contributed IP to several drug development projects related.

Bruce Ashford is a surgeon scientist within the Medical School, University of Wollongong, Australia. His current research interests include bioinformatics, health informatics, molecular oncology and reconstructive microsurgery. The main focus of his research is to discover biomarkers in cSCC cancer using various omics methods.

References

1.

Byron

SA

,

Van Keuren-Jensen

KR

,

Engelthaler

DM

, et al.

Translating RNA sequencing into clinical diagnostics: opportunities and challenges

.

Nat Rev Genet

2016

;

17

:

257

–

71

.

Google Scholar

OpenURL Placeholder Text

WorldCat

2.

Hong

M

,

Tao

S

,

Zhang

L

, et al.

RNA sequencing: new technologies and applications in cancer research

.

J Hematol Oncol

2020

;

13

:

166

.

Google Scholar

OpenURL Placeholder Text

WorldCat

3.

Howlader

J

,

Robin

AHK

,

Natarajan

S

, et al.

Transcriptome analysis by RNA-Seq reveals genes related to plant height in two sets of parent-hybrid combinations in Easter lily (Lilium longiflorum)

.

Sci Rep

2020

;

10

:

9082

.

Google Scholar

OpenURL Placeholder Text

WorldCat

4.

Kaur

G

,

Shukla

V

,

Kumar

A

, et al.

Integrative analysis of hexaploid wheat roots identifies signature components during iron starvation

.

J Exp Bot

2019

;

70

:

6141

–

61

.

Google Scholar

OpenURL Placeholder Text

WorldCat

5.

Bang-Andreasen

T

,

Anwar

MZ

,

Lanzen

A

, et al.

Total RNA sequencing reveals multilevel microbial community changes and functional responses to wood ash application in agricultural and forest soil

.

FEMS Microbiol Ecol

2020

;

96

:

1

–

12

.

Google Scholar

OpenURL Placeholder Text

WorldCat

6.

Thind

AS

,

Vitali

V

,

Guarracino

MR

, et al.

What’s genetic variation got to do with it? Starvation-induced self-fertilization enhances survival in paramecium

.

Genome Biol Evol

2020

;

12

:

626

–

38

.

Google Scholar

OpenURL Placeholder Text

WorldCat

7.

Conesa

A

,

Madrigal

P

,

Tarazona

S

, et al.

A survey of best practices for RNA-seq data analysis

.

Genome Biol

2016

;

17

:

13

.

Google Scholar

OpenURL Placeholder Text

WorldCat

8.

Sangiovanni

M

,

Granata

I

,

Thind

AS

, et al.

From trash to treasure: detecting unexpected contamination in unmapped NGS data

.

BMC Bioinformatics

2019

;

20

:

168

.

Google Scholar

OpenURL Placeholder Text

WorldCat

9.

Liu

X

,

Bienkowska

JR

,

Zhong

W

.

GeneTEFlow: a Nextflow-based pipeline for analysing gene and transposable elements expression from RNA-Seq data

.

PLoS One

2020

;

15

:e0232994.

Google Scholar

OpenURL Placeholder Text

WorldCat

10.

Roman

AC

,

Morales-Hernandez

A

,

Fernandez-Salguero

PM

.

RNA-Seq analysis to measure the expression of SINE retroelements

.

Methods Mol Biol

2016

;

1400

:

107

–

16

.

Google Scholar

OpenURL Placeholder Text

WorldCat

11.

Trincado

JL

,

Reixachs-Sole

M

,

Pérez-Granado

J

, et al.

ISOTOPE: ISOform-guided prediction of epiTOPEs in cancer

.

2020

.

10.1007/978-1-4939-3372-3_7

.

12.

Sturm

G

,

Finotello

F

,

Petitprez

F

, et al.

Comprehensive evaluation of transcriptome-based cell-type quantification methods for immuno-oncology

.

Bioinformatics

2019

;

35

:

i436

–

45

.

Google Scholar

OpenURL Placeholder Text

WorldCat

13.

Murdock

DR

,

Dai

H

,

Burrage

LC

, et al.

Transcriptome-directed analysis for Mendelian disease diagnosis overcomes limitations of conventional genomic testing

.

J Clin Invest

2021

;

131

:

1

–

13

.

Google Scholar

OpenURL Placeholder Text

WorldCat

14.

Wilkerson

MD

,

Cabanski

CR

,

Sun

W

, et al.

Integrated RNA and DNA sequencing improves mutation detection in low purity tumors

.

Nucleic Acids Res

2014

;

42

:

e107

.

Google Scholar

OpenURL Placeholder Text

WorldCat

15.

Mora

A

.

Gene set analysis methods for the functional interpretation of non-mRNA data-genomic range and ncRNA data

.

Brief Bioinform

2020

;

21

:

1495

–

508

.

Google Scholar

OpenURL Placeholder Text

WorldCat

16.

Dobin

A

,

Davis

CA

,

Schlesinger

F

, et al.

STAR: ultrafast universal RNA-seq aligner

.

Bioinformatics

2013

;

29

:

15

–

21

.

Google Scholar

OpenURL Placeholder Text

WorldCat

17.

Kim

D

,

Paggi

JM

,

Park

C

, et al.

Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype

.

Nat Biotechnol

2019

;

37

:

907

–

15

.

Google Scholar

OpenURL Placeholder Text

WorldCat

18.

Ungaro

A

,

Pech

N

,

Martin

JF

, et al.

Challenges and advances for transcriptome assembly in non-model species

.

PLoS One

2017

;

12

:e0185020.

Google Scholar

OpenURL Placeholder Text

WorldCat

19.

Duarte

GT

,

Volkova

PY

,

Geras'kin

SA

.

A pipeline for non-model organisms for de novo transcriptome assembly, annotation, and gene ontology analysis using open tools: case study with Scots pine

.

Bio Protoc

2021

;

11

:

e3912

.

Google Scholar

OpenURL Placeholder Text

WorldCat

20.

Alhakami

H

,

Mirebrahim

H

,

Lonardi

S

.

A comparative evaluation of genome assembly reconciliation tools

.

Genome Biol

2017

;

18

:

93

.

Google Scholar

OpenURL Placeholder Text

WorldCat

21.

Hölzer

M

,

Marz

M

.

De novo transcriptome assembly: a comprehensive cross-species comparison of short-read RNA-Seq assemblers

.

Gigascience

2019

;

8

:

1

–

16

.

Google Scholar

OpenURL Placeholder Text

WorldCat

22.

Grabherr

MG

,

Haas

BJ

,

Yassour

M

, et al.

Full-length transcriptome assembly from RNA-Seq data without a reference genome

.

Nat Biotechnol

2011

;

29

:

644

–

52

.

Google Scholar

OpenURL Placeholder Text

WorldCat

23.

Bankevich

A

,

Nurk

S

,

Antipov

D

, et al.

SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing

.

J Comput Biol

2012

;

19

:

455

–

77

.

Google Scholar

OpenURL Placeholder Text

WorldCat

24.

Robertson

G

,

Schein

J

,

Chiu

R

, et al.

De novo assembly and analysis of RNA-seq data

.

Nat Methods

2010

;

7

:

909

–

12

.

Google Scholar

OpenURL Placeholder Text

WorldCat

25.

Chang

Z

,

Li

G

,

Liu

J

, et al.

Bridger: a new framework for de novo transcriptome assembly using RNA-seq data

.

Genome Biol

2015

;

16

:

30

.

Google Scholar

OpenURL Placeholder Text

WorldCat

26.

Xie

Y

,

Wu

G

,

Tang

J

, et al.

SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads

.

Bioinformatics

2014

;

30

:

1660

–

6

.

Google Scholar

OpenURL Placeholder Text

WorldCat

27.

Jakobi

T

,

Dieterich

C

.

Computational approaches for circular RNA analysis

.

Wiley Interdiscip Rev RNA

2019

;

10

:

e1528

.

Google Scholar

OpenURL Placeholder Text

WorldCat

28.

Johnson

DE

,

Burtness

B

,

Leemans

CR

.

Head and neck squamous cell carcinoma

.

Nat Rev Dis Primers

2020

;

6

:

93

.

Google Scholar

OpenURL Placeholder Text

WorldCat

29.

Serin Harmanci

A

,

Harmanci

AO

,

Zhou

X

.

CaSpER identifies and visualizes CNV events by integrative analysis of single-cell or bulk RNA-sequencing data

.

Nat Commun

2020

;

11

:

89

.

Google Scholar

OpenURL Placeholder Text

WorldCat

30.

Mu

Q

,

Wang

J

.

CNAPE: a machine learning method for copy number alteration prediction from gene expression

.

IEEE/ACM Trans Comput Biol Bioinform

2021

;

18

:

306

–

11

.

Google Scholar

OpenURL Placeholder Text

WorldCat

31.

Talevich

E

,

Shain

AH

.

CNVkit-RNA: copy number inference from RNA-sequencing data

.

bioRxiv

2018

;408534.

Google Scholar

OpenURL Placeholder Text

WorldCat

32.

Flensburg

C

,

Oshlack

A

,

Majewski

IJ

.

Detecting copy number alterations in RNA-Seq using SuperFreq

.

Bioinformatics

2021

;btab440, https://doi.org/10.1093/bioinformatics/btab440.

Google Scholar

OpenURL Placeholder Text

WorldCat

33.

Tomczak

K

,

Czerwinska

P

,

Wiznerowicz

M

.

The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge

.

Contemp Oncol (Pozn)

2015

;

19

:

A68

–

77

.

Google Scholar

OpenURL Placeholder Text

WorldCat

34.

Turajlic

S

,

Litchfield

K

,

Xu

H

, et al.

Insertion-and-deletion-derived tumour-specific neoantigens and the immunogenic phenotype: a pan-cancer analysis

.

Lancet Oncol

2017

;

18

:

1009

–

21

.

Google Scholar

OpenURL Placeholder Text

WorldCat

35.

Mills

RE

,

Luttig

CT

,

Larkins

CE

, et al.

An initial map of insertion and deletion (INDEL) variation in the human genome

.

Genome Res

2006

;

16

:

1182

–

90

.

Google Scholar

OpenURL Placeholder Text

WorldCat

36.

Hagiwara

K

,

Ding

L

,

Edmonson

MN

, et al.

RNAIndel: discovering somatic coding indels from tumor RNA-Seq data

.

Bioinformatics

2020

;

36

:

1382

–

90

.

Google Scholar

OpenURL Placeholder Text

WorldCat

37.

Sun

Z

,

Bhagwate

A

,

Prodduturi

N

, et al.

Indel detection from RNA-seq data: tool evaluation and strategies for accurate detection of actionable mutations

.

Brief Bioinform

2017

;

18

:

973

–

83

.

Google Scholar

OpenURL Placeholder Text

WorldCat

38.

Slosarek

T

,

Kraus

M

,

Schapranow

M-P

, et al.

Qualitative Comparison of Selected Indel Detection Methods for RNA-Seq Data

.

Cham

:

Springer International Publishing

,

2019

,

166

–

77

.

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

39.

Oikkonen

L

,

Lise

S

.

Making the most of RNA-seq: pre-processing sequencing data with Opossum for reliable SNP variant detection

.

Wellcome Open Res

2017

;

2

:

6

.

Google Scholar

OpenURL Placeholder Text

WorldCat

40.

Yang

R

,

Van Etten

JL

,

Dehm

SM

.

Indel detection from DNA and RNA sequencing data with transIndel

.

BMC Genomics

2018

;

19

:

270

.

Google Scholar

OpenURL Placeholder Text

WorldCat

41.

Makarov

V

,

O'Grady

T

,

Cai

G

, et al.

AnnTools: a comprehensive and versatile annotation toolkit for genomic variants

.

Bioinformatics

2012

;

28

:

724

–

5

.

Google Scholar

OpenURL Placeholder Text

WorldCat

42.

Edwards

PA

.

Fusion genes and chromosome translocations in the common epithelial cancers

.

J Pathol

2010

;

220

:

244

–

54

.

Google Scholar

OpenURL Placeholder Text

WorldCat

43.

Tomlins

SA

,

Rhodes

DR

,

Perner

S

, et al.

Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer

.

Science

2005

;

310

:

644

–

8

.

Google Scholar

OpenURL Placeholder Text

WorldCat

44.

Haas

BJ

,

Dobin

A

,

Li

B

, et al.

Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods

.

Genome Biol

2019

;

20

:

213

.

Google Scholar

OpenURL Placeholder Text

WorldCat

45.

Kim

P

,

Jang

YE

,

Lee

S

.

FusionScan: accurate prediction of fusion genes from RNA-Seq data

.

Genomics Inform

2019

;

17

:

e26

.

Google Scholar

OpenURL Placeholder Text

WorldCat

46.

Uhrig

S

,

Ellermann

J

,

Walther

T

, et al.

Accurate and efficient detection of gene fusions from RNA sequencing data

.

Genome Res

2021

;

31

:

448

–

60

.

Google Scholar

OpenURL Placeholder Text

WorldCat

47.

Latysheva

NS

,

Babu

MM

.

Discovering and understanding oncogenic gene fusions through data intensive computational approaches

.

Nucleic Acids Res

2016

;

44

:

4487

–

503

.

Google Scholar

OpenURL Placeholder Text

WorldCat

48.

Frenkel-Morgenstern

M

,

Gorohovski

A

,

Vucenovic

D

, et al.

ChiTaRS 2.1--an improved database of the chimeric transcripts and RNA-seq data with novel sense-antisense chimeric RNA transcripts

.

Nucleic Acids Res

2015

;

43

:

D68

–

75

.

Google Scholar

OpenURL Placeholder Text

WorldCat

49.

Wang

Y

,

Wu

N

,

Liu

J

, et al.

FusionCancer: a database of cancer fusion genes derived from RNA-seq data

.

Diagn Pathol

2015

;

10

:

131

.

Google Scholar

OpenURL Placeholder Text

WorldCat

50.

Yoshihara

K

,

Wang

Q

,

Torres-Garcia

W

, et al.

The landscape and therapeutic relevance of cancer-associated transcript fusions

.

Oncogene

2015

;

34

:

4845

–

54

.

Google Scholar

OpenURL Placeholder Text

WorldCat

51.

Korla

PK

,

Cheng

J

,

Huang

CH

, et al.

FARE-CAFE: a database of functional and regulatory elements of cancer-associated fusion events

.

Database (Oxford)

2015

;

2015

:bav086.

Google Scholar

OpenURL Placeholder Text

WorldCat

52.

Jiang

T

,

Shi

T

,

Zhang

H

, et al.

Tumor neoantigens: from basic research to clinical applications

.

J Hematol Oncol

2019

;

12

:

93

.

Google Scholar

OpenURL Placeholder Text

WorldCat

53.

Kahles

A

,

Lehmann

KV

,

Toussaint

NC

, et al.

Comprehensive analysis of alternative splicing across tumors from 8,705 patients

.

Cancer Cell

2018

;

34

:

211

–

24.e216

.

Google Scholar

OpenURL Placeholder Text

WorldCat

54.

Zhang

Z

,

Zhou

C

,

Tang

L

, et al.

ASNEO: identification of personalized alternative splicing based neoantigens with RNA-seq

.

Aging (Albany NY)

2020

;

12

:

14633

–

48

.

Google Scholar

OpenURL Placeholder Text

WorldCat

55.

Zhang

J

,

Mardis

ER

,

Maher

CA

.

INTEGRATE-neo: a pipeline for personalized gene fusion neoantigen discovery

.

Bioinformatics

2017

;

33

:

555

–

7

.

Google Scholar

OpenURL Placeholder Text

WorldCat

56.

Wang

TY

,

Wang

L

,

Alam

SK

, et al.

ScanNeo: identifying indel-derived neoantigens using RNA-Seq data

.

Bioinformatics

2019

;

35

:

4159

–

61

.

Google Scholar

OpenURL Placeholder Text

WorldCat

57.

de

Koning

AP

,

Gu

W

,

Castoe

TA

, et al.

Repetitive elements may comprise over two-thirds of the human genome

.

PLoS Genet

2011

;

7

:e1002384.

Google Scholar

OpenURL Placeholder Text

WorldCat

58.

Andreassen

R

.

Alu elements in the human genome

.

Tidsskr Nor Laegeforen

2004

;

124

:

2345

–

9

.

Google Scholar

OpenURL Placeholder Text

WorldCat

59.

Smulders-Srinivasan

TK

,

Szakmary

A

,

Lin

H

.

A Drosophila chromatin factor interacts with the Piwi-interacting RNA mechanism in niche cells to regulate germline stem cell self-renewal

.

Genetics

2010

;

186

:

573

–

83

.

Google Scholar

OpenURL Placeholder Text

WorldCat

60.

Bennett

EA

,

Keller

H

,

Mills

RE

, et al.

Active Alu retrotransposons in the human genome

.

Genome Res

2008

;

18

:

1875

–

83

.

Google Scholar

OpenURL Placeholder Text

WorldCat

61.

Deininger

PL

,

Batzer

MA

.

Mammalian retroelements

.

Genome Res

2002

;

12

:

1455

–

65

.

Google Scholar

OpenURL Placeholder Text

WorldCat

62.

Zhou

W

,

Liang

G

,

Molloy

PL

, et al.

DNA methylation enables transposable element-driven genome expansion

.

Proc Natl Acad Sci U S A

2020

;

117

:

19359

–

66

.

Google Scholar

OpenURL Placeholder Text

WorldCat

63.

Burns

KH

.

Transposable elements in cancer

.

Nat Rev Cancer

2017

;

17

:

415

–

24

.

Google Scholar

OpenURL Placeholder Text

WorldCat

64.

Larsen

PA

,

Hunnicutt

KE

,

Larsen

RJ

, et al.

Warning SINEs: Alu elements, evolution of the human brain, and the spectrum of neurological disease

.

Chromosome Res

2018

;

26

:

93

–

111

.

Google Scholar

OpenURL Placeholder Text

WorldCat

65.

Kahles

A

,

Behr

J

,

Rätsch

G

.

MMR: a tool for read multi-mapper resolution

.

Bioinformatics

2016

;

32

:

770

–

2

.

Google Scholar

OpenURL Placeholder Text

WorldCat

66.

Jeong

HH

,

Yalamanchili

HK

,

Guo

C

, et al.

An ultra-fast and scalable quantification pipeline for transposable elements from next generation sequencing data

.

Pac Symp Biocomput

2018

;

23

:

168

–

79

.

Google Scholar

OpenURL Placeholder Text

WorldCat

67.

Carnevali

D

,

Conti

A

,

Pellegrini

M

, et al.

Whole-genome expression analysis of mammalian-wide interspersed repeat elements in human cell lines

.

DNA Res

2017

;

24

:

59

–

69

.

Google Scholar

OpenURL Placeholder Text

WorldCat

68.

Tokuyama

M

,

Kong

Y

,

Song

E

, et al.

ERVmap analysis reveals genome-wide transcription of human endogenous retroviruses

.

Proc Natl Acad Sci U S A

2018

;

115

:

12565

–

72

.

Google Scholar

OpenURL Placeholder Text

WorldCat

69.

Babaian

A

,

Thompson

IR

,

Lever

J

, et al.

LIONS: analysis suite for detecting and quantifying transposable element initiated transcription from RNA-seq

.

Bioinformatics

2019

;

35

:

3839

–

41

.

Google Scholar

OpenURL Placeholder Text

WorldCat

70.

O'Neill

K

,

Brocks

D

,

Hammell

MG

.

Mobile genomics: tools and techniques for tackling transposons

.

Philos Trans R Soc Lond B Biol Sci

2020

;

375

:20190345.

Google Scholar

OpenURL Placeholder Text

WorldCat

71.

Lerat

E

,

Casacuberta

J

,

Chaparro

C

, et al.

On the importance to acknowledge transposable elements in epigenomic analyses

.

Genes (Basel)

2019

;

10

:

258

.

Google Scholar

OpenURL Placeholder Text

WorldCat

72.

Lanciano

S

,

Cristofari

G

.

Measuring and interpreting transposable element expression

.

Nat Rev Genet

2020

;

21

:

721

–

36

.

Google Scholar

OpenURL Placeholder Text

WorldCat

73.

Teissandier

A

,

Servant

N

,

Barillot

E

, et al.

Tools and best practices for retrotransposon analysis using high-throughput sequencing data

.

Mob DNA

2019

;

10

:

52

.

Google Scholar

OpenURL Placeholder Text

WorldCat

74.

Hadfield

J

,

Eldridge

MD

.

Multi-genome alignment for quality control and contamination screening of next-generation sequencing data

.

Front Genet

2014

;

5

:

31

.

Google Scholar

OpenURL Placeholder Text

WorldCat

75.

Strong

MJ

,

Xu

G

,

Morici

L

, et al.

Microbial contamination in next generation sequencing: implications for sequence-based analysis of clinical samples

.

PLoS Pathog

2014

;

10

:e1004437.

Google Scholar

OpenURL Placeholder Text

WorldCat

76.

Khan

A

,

Liu

Q

,

Chen

X

, et al.

VirTect: a computational method for detecting virus species from RNA-Seq and its application in head and neck squamous cell carcinoma

.

bioRxiv

2018

;272278.

Google Scholar

OpenURL Placeholder Text

WorldCat

77.

Yasumizu

Y

,

Hara

A

,

Sakaguchi

S

, et al.

VIRTUS: a pipeline for comprehensive virus analysis from conventional RNA-seq data

.

Bioinformatics

2021

;

37

:

1465

–

7

.

Google Scholar

OpenURL Placeholder Text

WorldCat

78.

Bhuvaneshwar

K

,

Song

L

,

Madhavan

S

, et al.

viGEN: an open source pipeline for the detection and quantification of viral RNA in human tumors

.

Front Microbiol

2018

;

9

:

1172

.

Google Scholar

OpenURL Placeholder Text

WorldCat

79.

Schirmer

M

,

Franzosa

EA

,

Lloyd-Price

J

, et al.

Dynamics of metatranscription in the inflammatory bowel disease gut microbiome

.

Nat Microbiol

2018

;

3

:

337

–

46

.

Google Scholar

OpenURL Placeholder Text

WorldCat

80.

Granata

I

,

Nardelli

C

,

D’Argenio

V

, et al.

Duodenal metatranscriptomics to define human and microbial functional alterations associated with severe obesity: a pilot study

.

Microorganisms

2020

;

8

:

1811

.

Google Scholar

OpenURL Placeholder Text

WorldCat

81.

Heravi

FS

,

Zakrzewski

M

,

Vickery

K

, et al.

Metatranscriptomic analysis reveals active bacterial communities in diabetic foot infections

.

Front Microbiol

2020

;

11

:

1688

.

Google Scholar

OpenURL Placeholder Text

WorldCat

82.

Youngblut, Nicholas D., Georg H. Reischer, William Walters, Nathalie Schuster, Chris Walzer, Gabrielle Stalder, Ruth E. Ley, and Andreas H. Farnleitner.

Host diet and evolutionary history explain different aspects of gut microbiome diversity among vertebrate clades

.

Nature communications

2019

;

10

:

1

–

15

.

Google Scholar

OpenURL Placeholder Text

WorldCat

83.

Peters

BA

,

Wilson

M

,

Moran

U

, et al.

Relating the gut metagenome and metatranscriptome to immunotherapy responses in melanoma patients

.

Genome Med

2019

;

11

:

61

.

Google Scholar

OpenURL Placeholder Text

WorldCat

84.

Campos

GS

,

Sardi

SI

,

Falcao

MB

, et al.

Ion torrent-based nasopharyngeal swab metatranscriptomics in COVID-19

.

J Virol Methods

2020

;

282

:113888.

Google Scholar

OpenURL Placeholder Text

WorldCat

85.

Kolodny

O

,

Callahan

BJ

,

Douglas

AE

.

The role of the microbiome in host evolution

.

Philos Trans R Soc Lond B Biol Sci

2020

;

375

:20190588.

Google Scholar

OpenURL Placeholder Text

WorldCat

86.

Groussin

M

,

Mazel

F

,

Alm

EJ

.

Co-evolution and co-speciation of host-gut bacteria systems

.

Cell Host Microbe

2020

;

28

:

12

–

22

.

Google Scholar

OpenURL Placeholder Text

WorldCat

87.

Shakya

M

,

Lo

CC

,

Chain

PSG

.

Advances and challenges in metatranscriptomic analysis

.

Front Genet

2019

;

10

:

904

.

Google Scholar

OpenURL Placeholder Text

WorldCat

88.

Franzosa

EA

,

McIver

LJ

,

Rahnavard

G

, et al.

Species-level functional profiling of metagenomes and metatranscriptomes

.

Nat Methods

2018

;

15

:

962

–

8

.

Google Scholar

OpenURL Placeholder Text

WorldCat

89.

Martinez

X

,

Pozuelo

M

,

Pascal

V

, et al.

MetaTrans: an open-source pipeline for metatranscriptomics

.

Sci Rep

2016

;

6

:

26447

.

Google Scholar

OpenURL Placeholder Text

WorldCat

90.

Westreich

ST

,

Treiber

ML

,

Mills

DA

, et al.

SAMSA2: a standalone metatranscriptome analysis pipeline

.

BMC Bioinformatics

2018

;

19

:

175

.

Google Scholar

OpenURL Placeholder Text

WorldCat

91.

Tamames

J

,

Puente-Sánchez

F

.

SqueezeMeta, a highly portable, fully automatic metagenomic analysis pipeline

.

Front Microbiol

2018

;

9

:

3349

.

Google Scholar

OpenURL Placeholder Text

WorldCat

92.

Narayanasamy

S

,

Jarosz

Y

,

Muller

EE

, et al.

IMP: a pipeline for reproducible reference-independent integrated metagenomic and metatranscriptomic analyses

.

Genome Biol

2016

;

17

:

260

.

Google Scholar

OpenURL Placeholder Text

WorldCat

93.

Sequeira

JC

,

Rocha

M

,

Alves

MM

, et al. MOSCA: an automated pipeline for integrated metagenomics and metatranscriptomics data analysis. In:

International Conference on Practical Applications of Computational Biology & Bioinformatics

. Cham:

Springer

,

2018

,

183

–

91

.

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

94.

Mehta

S

,

Crane

M

,

Leith

E

, et al.

ASaiM-MT: a validated and optimized ASaiM workflow for metatranscriptomics analysis within Galaxy framework

.

F1000Research

2021

;

10

:

103

.

Google Scholar

OpenURL Placeholder Text

WorldCat

95.

Yang

WR

,

Ardeljan

D

,

Pacyna

CN

, et al.

SQuIRE reveals locus-specific regulation of interspersed repeat expression

.

Nucleic Acids Res

2019

;

47

:

e27

–

7

.

Google Scholar

OpenURL Placeholder Text

WorldCat

96.

McKerrow

W

,

Fenyö

D

.

L1EM: a tool for accurate locus specific LINE-1 RNA quantification

.

Bioinformatics

2020

;

36

:

1167

–

73

.

Google Scholar

OpenURL Placeholder Text

WorldCat

97.

Criscione

SW

,

Zhang

Y

,

Thompson

W

, et al.

Transcriptional landscape of repetitive elements in normal and cancer human cells

.

BMC Genomics

2014

;

15

:

1

–

7

.

Google Scholar

OpenURL Placeholder Text

WorldCat

98.

Lerat

E

,

Fablet

M

,

Modolo

L

, et al.

TEtools facilitates big data expression analysis of transposable elements and reveals an antagonism between their activity and that of piRNA genes

.

Nucleic Acids Res

2017

;

45

:

e17

–

7

.

Google Scholar

OpenURL Placeholder Text

WorldCat

99.

Jin

Y

,

Tam

OH

,

Paniagua

E

, et al.

TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets

.

Bioinformatics

2015

;

31

:

3593

–

9

.

Google Scholar

OpenURL Placeholder Text

WorldCat

100.

Hackl

H

,

Charoentong

P

,

Finotello

F

, et al.

Computational genomics tools for dissecting tumour-immune cell interactions

.

Nat Rev Genet

2016

;

17

:

441

–

58

.

Google Scholar

OpenURL Placeholder Text

WorldCat

101.

Newman

AM

,

Steen

CB

,

Liu

CL

, et al.

Determining cell type abundance and expression from bulk tissues with digital cytometry

.

Nat Biotechnol

2019

;

37

:

773

–

82

.

Google Scholar

OpenURL Placeholder Text

WorldCat

102.

Stobbe

MD

,

Thun

GA

,

Diéguez-Docampo

A

, et al.

Recurrent somatic mutations reveal new insights into consequences of mutagenic processes in cancer

.

PLoS Comput Biol

2019

;

15

:e1007496.

Google Scholar

OpenURL Placeholder Text

WorldCat

103.

Ramroop

JR

,

Gerber

MM

,

Toland

AE

.

Germline variants impact somatic events during tumorigenesis

.

Trends Genet

2019

;

35

:

515

–

26

.

Google Scholar

OpenURL Placeholder Text

WorldCat

104.

Coudray

A

,

Battenhouse

AM

,

Bucher

P

, et al.

Detection and benchmarking of somatic mutations in cancer genomes using RNA-seq data

.

PeerJ

2018

;

6

:e5362.

Google Scholar

OpenURL Placeholder Text

WorldCat

105.

Jin

J

,

Wu

X

,

Yin

J

, et al.

Identification of genetic mutations in cancer: challenge and opportunity in the new era of targeted therapy

.

Front Oncol

2019

;

9

:

263

.

Google Scholar

OpenURL Placeholder Text

WorldCat

106.

Bosio

M

,

Valencia

A

,

Capella-Gutierrez

S

.

SmartRNASeqCaller: improving germline variant calling from RNAseq

.

bioRxiv

2019

;684993.

Google Scholar

OpenURL Placeholder Text

WorldCat

107.

Wang

K

,

Li

M

,

Hakonarson

H

.

ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data

.

Nucleic Acids Res

2010

;

38

:

e164

.

Google Scholar

OpenURL Placeholder Text

WorldCat

108.

McLaren

W

,

Gil

L

,

Hunt

SE

, et al.

The Ensembl variant effect predictor

.

Genome Biol

2016

;

17

:

122

.

Google Scholar

OpenURL Placeholder Text

WorldCat

109.

Pagel, K.A., Kim, R., Moad, K., Busby, B., Zheng, L., Tokheim, C., Ryan, M. and Karchin, R.

Integrated informatics analysis of cancer-related variants

.

JCO clinical cancer informatics

2020

;

4

:

310

–

317

.

Google Scholar

OpenURL Placeholder Text

WorldCat

110.

Radenbaugh

AJ

,

Ma

S

,

Ewing

A

, et al.

RADIA: RNA and DNA integrated analysis for somatic mutation detection

.

PLoS One

2014

;

9

:e111516.

Google Scholar

OpenURL Placeholder Text

WorldCat

111.

Lawrence

MS

,

Stojanov

P

,

Polak

P

, et al.

Mutational heterogeneity in cancer and the search for new cancer-associated genes

.

Nature

2013

;

499

:

214

–

8

.

Google Scholar

OpenURL Placeholder Text

WorldCat

112.

Adetunji

MO

,

Lamont

SJ

,

Abasht

B

, et al.

Variant analysis pipeline for accurate detection of genomic variants from transcriptome sequencing data

.

PLoS One

2019

;

14

:e0216838.

Google Scholar

OpenURL Placeholder Text

WorldCat

113.

Mularoni

L

,

Sabarinathan

R

,

Deu-Pons

J

, et al.

OncodriveFML: a general framework to identify coding and non-coding regions with cancer driver mutations

.

Genome Biol

2016

;

17

:

128

.

Google Scholar

OpenURL Placeholder Text

WorldCat

114.

Wainberg

M

,

Sinnott-Armstrong

N

,

Mancuso

N

, et al.

Opportunities and challenges for transcriptome-wide association studies

.

Nat Genet

2019

;

51

:

592

–

9

.

Google Scholar

OpenURL Placeholder Text

WorldCat

115.

Zhu

H

,

Zhou

X

.

Transcriptome-wide association studies: a view from Mendelian randomization

.

Quant Biol

2020

;

17

:

1

–

5

.

10.1007/s40484-020-0207-4

.

Google Scholar

OpenURL Placeholder Text

WorldCat

Crossref

116.

Veturi

Y

,

Ritchie

MD

.

How powerful are summary-based methods for identifying expression-trait associations under different genetic architectures?

Pac Symp Biocomput

2018

;

23

:

228

–

39

.

Google Scholar

OpenURL Placeholder Text

WorldCat

117.

Gamazon

ER

,

Wheeler

HE

,

Shah

KP

, et al.

A gene-based association method for mapping traits using reference transcriptome data

.

Nat Genet

2015

;

47

:

1091

–

8

.

Google Scholar

OpenURL Placeholder Text

WorldCat

118.

Qi

J

,

Asl

HF

,

Björkegren

J

, et al.

kruX: matrix-based non-parametric eQTL discovery

.

BMC Bioinformatics

2014

;

15

:

1

–

7

.

Google Scholar

OpenURL Placeholder Text

WorldCat

119.

Shabalin

AA

.

Matrix eQTL: ultra fast eQTL analysis via large matrix operations

.

Bioinformatics

2012

;

28

:

1353

–

8

.

Google Scholar

OpenURL Placeholder Text

WorldCat

120.

Yang

T-P

,

Beazley

C

,

Montgomery

SB

, et al.

Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies

.

Bioinformatics

2010

;

26

:

2474

–

6

.

Google Scholar

OpenURL Placeholder Text

WorldCat

121.

Gao

C

,

Tignor

NL

,

Salit

J

, et al.

HEFT: eQTL analysis of many thousands of expressed genes while simultaneously controlling for hidden factors

.

Bioinformatics

2014

;

30

:

369

–

76

.

Google Scholar

OpenURL Placeholder Text

WorldCat

122.

Marchini

J

,

Howie

B

,

Myers

S

, et al.

A new multipoint method for genome-wide association studies by imputation of genotypes

.

Nat Genet

2007

;

39

:

906

–

13

.

Google Scholar

OpenURL Placeholder Text

WorldCat

123.

Broman

KW

,

Wu

H

,

Sen

Ś

, et al.

R/QTL: QTL mapping in experimental crosses

.

Bioinformatics

2003

;

19

:

889

–

90

.

Google Scholar

OpenURL Placeholder Text

WorldCat

124.

He

X

,

Fuller

CK

,

Song

Y

, et al.

Sherlock: detecting gene-disease associations by matching patterns of expression QTL and GWAS

.

Am J Hum Genet

2013

;

92

:

667

–

80

.

Google Scholar

OpenURL Placeholder Text

WorldCat

125.

Flutre

T

,

Wen

X

,

Pritchard

J

, et al.

A statistical framework for joint eQTL analysis in multiple tissues

.

PLoS Genet

2013

;

9

:e1003486.

Google Scholar

OpenURL Placeholder Text

WorldCat

126.

Imprialou

M

,

Petretto

E

,

Bottolo

L

.

Expression QTLs mapping and analysis: a Bayesian perspective

.

Systems Genetics

2017

;

1488

:

189

–

215

.

Google Scholar

OpenURL Placeholder Text

WorldCat

127.

Gusev

A

,

Ko

A

,

Shi

H

, et al.

Integrative approaches for large-scale transcriptome-wide association studies

.

Nat Genet

2016

;

48

:

245

–

52

.

Google Scholar

OpenURL Placeholder Text

WorldCat

128.

Zeng

P

,

Zhou

X

.

Non-parametric genetic prediction of complex traits with latent Dirichlet process regression models

.

Nat Commun

2017

;

8

:

456

.

Google Scholar

OpenURL Placeholder Text

WorldCat

129.

Zhu

Z

,

Zhang

F

,

Hu

H

, et al.

Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets

.

Nat Genet

2016

;

48

:

481

–

7

.

Google Scholar

OpenURL Placeholder Text

WorldCat

130.

Zhu

Z

,

Zheng

Z

,

Zhang

F

, et al.

Causal associations between risk factors and common diseases inferred from GWAS summary data

.

Nat Commun

2018

;

9

:

224

.

Google Scholar

OpenURL Placeholder Text

WorldCat

131.

Wei

IH

,

Shi

Y

,

Jiang

H

, et al.

RNA-Seq accurately identifies cancer biomarker signatures to distinguish tissue of origin

.

Neoplasia

2014

;

16

:

918

–

27

.

Google Scholar

OpenURL Placeholder Text

WorldCat

132.

Zhang

Y

,

Parmigiani

G

,

Johnson

WE

.

ComBat-seq: batch effect adjustment for RNA-seq count data

.

NAR Genom Bioinform

2020

;

2

:lqaa078.

Google Scholar

OpenURL Placeholder Text

WorldCat

133.

Risso

D

,

Ngai

J

,

Speed

TP

, et al.

Normalization of RNA-seq data using factor analysis of control genes or samples

.

Nat Biotechnol

2014

;

32

:

896

–

902

.

Google Scholar

OpenURL Placeholder Text

WorldCat

134.

Love

MI

,

Huber

W

,

Anders

S

.

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

.

Genome Biol

2014

;

15

:

550

.

Google Scholar

OpenURL Placeholder Text

WorldCat

135.

Robinson

MD

,

McCarthy

DJ

,

Smyth

GK

.

edgeR: a bioconductor package for differential expression analysis of digital gene expression data

.

Bioinformatics

2010

;

26

:

139

–

40

.

Google Scholar

OpenURL Placeholder Text

WorldCat

136.

Leng

N

,

Dawson

JA

,

Thomson

JA

, et al.

EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments

.

Bioinformatics

2013

;

29

:

1035

–

43

.

Google Scholar

OpenURL Placeholder Text

WorldCat

137.

Frazee AC, Pertea G, Jaffe AE, Langmead B, Salzberg SL, Leek JT.

Ballgown bridges the gap between transcriptome assembly and expression analysis

.

Nature biotechnology

2015

;

33

:

243

–

6

.

Google Scholar

OpenURL Placeholder Text

WorldCat

138.

Tarazona

S

,

García

F

,

Ferrer

A

, et al.

NOIseq: a RNA-seq differential expression method robust for sequencing depth biases

.

EMBnet Journal

2011

;

17

:

18

–

9

.

Google Scholar

OpenURL Placeholder Text

WorldCat

139.

Spies

D

,

Renz

PF

,

Beyer

TA

, et al.

Comparative analysis of differential gene expression tools for RNA sequencing time course data

.

Brief Bioinform

2019

;

20

:

288

–

98

.

Google Scholar

OpenURL Placeholder Text

WorldCat

140.

Koch

CM

,

Chiu

SF

,

Akbarpour

M

, et al.

A Beginner’s guide to analysis of RNA sequencing data

.

Am J Respir Cell Mol Biol

2018

;

59

:

145

–

57

.

Google Scholar

OpenURL Placeholder Text

WorldCat

141.

Wittkopp

PJ

,

Haerum

BK

,

Clark

AG

.

Evolutionary changes in cis and trans gene regulation

.

Nature

2004

;

430

:

85

–

8

.

Google Scholar

OpenURL Placeholder Text

WorldCat

142.

Nagalakshmi

U

,

Wang

Z

,

Waern

K

, et al.

The transcriptional landscape of the yeast genome defined by RNA sequencing

.

Science

2008

;

320

:

1344

–

9

.

Google Scholar

OpenURL Placeholder Text

WorldCat

143.

Lister

R

,

O'Malley

RC

,

Tonti-Filippini

J

, et al.

Highly integrated single-base resolution maps of the epigenome in Arabidopsis

.

Cell

2008

;

133

:

523

–

36

.

Google Scholar

OpenURL Placeholder Text

WorldCat

144.

Degner

JF

,

Marioni

JC

,

Pai

AA

, et al.

Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data

.

Bioinformatics

2009

;

25

:

3207

–

12

.

Google Scholar

OpenURL Placeholder Text

WorldCat

145.

Lalonde

E

,

Ha

KC

,

Wang

Z

, et al.

RNA sequencing reveals the role of splicing polymorphisms in regulating human gene expression

.

Genome Res

2011

;

21

:

545

–

54

.

Google Scholar

OpenURL Placeholder Text

WorldCat

146.

Stevenson

KR

,

Coolon

JD

,

Wittkopp

PJ

.

Sources of bias in measures of allele-specific expression derived from RNA-sequence data aligned to a single reference genome

.

BMC Genomics

2013

;

14

:

536

.

Google Scholar

OpenURL Placeholder Text

WorldCat

147.

Castel

SE

,

Levy-Moonshine

A

,

Mohammadi

P

, et al.

Tools and best practices for data processing in allelic expression analysis

.

Genome Biol

2015

;

16

:

195

.

Google Scholar

OpenURL Placeholder Text

WorldCat

148.

Pickrell

JK

,

Marioni

JC

,

Pai

AA

, et al.

Understanding mechanisms underlying human gene expression variation with RNA sequencing

.

Nature

2010

;

464

:

768

–

72

.

Google Scholar

OpenURL Placeholder Text

WorldCat

149.

Coolon

JD

,

Stevenson

KR

,

McManus

CJ

, et al.

Genomic imprinting absent in Drosophila melanogaster adult females

.

Cell Rep

2012

;

2

:

69

–

75

.

Google Scholar

OpenURL Placeholder Text

WorldCat

150.

Edsgärd

D

,

Iglesias

MJ

,

Reilly

S-J

, et al.

GeneiASE: detection of condition-dependent and static allele-specific expression from RNA-seq data without haplotype information

.

Sci Rep

2016

;

6

:

21134

.

Google Scholar

OpenURL Placeholder Text

WorldCat

151.

Miao

Z

,

Alvarez

M

,

Pajukanta

P

, et al.

ASElux: an ultra-fast and accurate allelic reads counter

.

Bioinformatics

2018

;

34

:

1313

–

20

.

Google Scholar

OpenURL Placeholder Text

WorldCat

152.

Fan

J

,

Hu

J

,

Xue

C

, et al.

ASEP: gene-based detection of allele-specific expression across individuals in a population by RNA sequencing

.

PLoS Genet

2020

;

16

:e1008786.

Google Scholar

OpenURL Placeholder Text

WorldCat

153.

Edsgärd

D

,

Iglesias

MJ

,

Reilly

S-J

, et al.

GeneiASE: detection of condition-dependent and static allele-specific expression from RNA-seq data without haplotype information

.

Sci Rep

2016

;

6

:

1

–

13

.

Google Scholar

OpenURL Placeholder Text

WorldCat

154.

Harvey

CT

,

Moyerbrailean

GA

,

Davis

GO

, et al.

QuASAR: quantitative allele-specific analysis of reads

.

Bioinformatics

2015

;

31

:

1235

–

42

.

Google Scholar

OpenURL Placeholder Text

WorldCat

155.

Mayba

O

,

Gilbert

HN

,

Liu

J

, et al.

MBASED: allele-specific expression detection in cancer tissues and cell lines

.

Genome Biol

2014

;

15

:

1

–

21

.

Google Scholar

OpenURL Placeholder Text

WorldCat

156.

Costa-Silva

J

,

Domingues

D

,

Lopes

FM

.

RNA-Seq differential expression analysis: an extended review and a software tool

.

PLoS One

2017

;

12

:e0190152.

Google Scholar

OpenURL Placeholder Text

WorldCat

157.

Pan

Q

,

Shai

O

,

Lee

LJ

, et al.

Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing

.

Nat Genet

2008

;

40

:

1413

–

5

.

Google Scholar

OpenURL Placeholder Text

WorldCat

158.

van den

Hoogenhof

MM

,

Pinto

YM

,

Creemers

EE

.

RNA splicing: regulation and dysregulation in the heart

.

Circ Res

2016

;

118

:

454

–

68

.

Google Scholar

OpenURL Placeholder Text

WorldCat

159.

Wang

Y

,

Liu

J

,

Huang

BO

, et al.

Mechanism of alternative splicing and its regulation

.

Biomed Rep

2015

;

3

:

152

–

8

.

Google Scholar

OpenURL Placeholder Text

WorldCat

160.

Cheng

Z

,

Menees

TM

.

RNA splicing and debranching viewed through analysis of RNA lariats

.

Mol Genet Genomics

2011

;

286

:

395

–

410

.

Google Scholar

OpenURL Placeholder Text

WorldCat

161.

Cao

M

,

Zhou

W

,

Breidt

FJ

, et al.

Large scale maximum average power multiple inference on time-course count data with application to RNA-seq analysis

.

Biometrics

2020

;

76

:

9

–

22

.

Google Scholar

OpenURL Placeholder Text

WorldCat

162.

Qiu

Z

,

Chen

S

,

Qi

Y

, et al.

Exploring transcriptional switches from pairwise, temporal and population RNA-Seq data using deepTS

.

Brief Bioinform

2021

;

22

:bbaa137.

Google Scholar

OpenURL Placeholder Text

WorldCat

163.

Oh

S

,

Li

C

,

Baldwin

RL

, et al.

Temporal dynamics in meta longitudinal RNA-Seq data

.

Sci Rep

2019

;

9

:

1

–

1

.

Google Scholar

OpenURL Placeholder Text

WorldCat

164.

Fischer

DS

,

Theis

FJ

,

Yosef

N

.

Impulse model-based differential expression analysis of time course sequencing data

.

Nucleic Acids Res

201846

:

e119–e119

.

OpenURL Placeholder Text

WorldCat

165.

Bacher

R

,

Leng

N

,

Chu

LF

, et al.

Trendy: segmented regression analysis of expression dynamics in high-throughput ordered profiling experiments

.

BMC bioinformatics

2018

;

19

:

1

–

0

.

Google Scholar

OpenURL Placeholder Text

WorldCat

166.

Guo

W

,

Calixto

CPG

,

Brown

JWS

, et al.

TSIS: an R package to infer alternative splicing isoform switches for time-series data

.

Bioinformatics

2017

;

33

:

3308

–

10

.

Google Scholar

OpenURL Placeholder Text

WorldCat

167.

Sun

X

,

Dalpiaz

D

,

Wu

D

, et al.

Statistical inference for time course RNA-Seq data using a negative binomial mixed-effect model

.

BMC Bioinformatics

2016

;

17

:

1

–

3

.

Google Scholar

OpenURL Placeholder Text

WorldCat

168.

Leng

N

,

Li

Y

,

McIntosh

BE

, et al.

EBSeq-HMM: a Bayesian approach for identifying gene-expression changes in ordered RNA-seq experiments

.

Bioinformatics

2015

;

31

:

2614

–

22

.

Google Scholar

OpenURL Placeholder Text

WorldCat

169.

Heinonen

M

,

Guipaud

O

,

Milliat

F

, et al.

Detecting time periods of differential gene expression using Gaussian processes: an application to endothelial cells exposed to radiotherapy dose fraction

.

Bioinformatics

2015

;

31

:

728

–

35

.

Google Scholar

OpenURL Placeholder Text

WorldCat

170.

Nueda

MJ

,

Tarazona

S

,

Conesa

A

.

Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series

.

Bioinformatics

2014

;

30

:

2598

–

2602

.

Google Scholar

OpenURL Placeholder Text

WorldCat

171.

Tilgner

H

,

Knowles

DG

,

Johnson

R

, et al.

Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs

.

Genome Res

2012

;

22

:

1616

–

25

.

Google Scholar

OpenURL Placeholder Text

WorldCat

172.

Kim

E

,

Goren

A

,

Ast

G

.

Alternative splicing and disease

.

RNA Biol

2008

;

5

:

17

–

9

.

Google Scholar

OpenURL Placeholder Text

WorldCat

173.

Vitting-Seerup

K

,

Porse

BT

,

Sandelin

A

, et al.

spliceR: an R package for classification of alternative splicing and prediction of coding potential from RNA-seq data

.

BMC Bioinformatics

2014

;

15

:

81

.

Google Scholar

OpenURL Placeholder Text

WorldCat

174.

Ryan

MC

,

Cleland

J

,

Kim

R

, et al.

SpliceSeq: a resource for analysis and visualization of RNA-Seq data on alternative splicing and its functional impacts

.

Bioinformatics

2012

;

28

:

2385

–

7

.

Google Scholar

OpenURL Placeholder Text

WorldCat

175.

Liu

Q

,

Chen

C

,

Shen

E

, et al.

Detection, annotation and visualization of alternative splicing from RNA-Seq data with SplicingViewer

.

Genomics

2012

;

99

:

178

–

82

.

Google Scholar

OpenURL Placeholder Text

WorldCat

176.

Foissac

S

,

Sammeth

M

.

Analysis of alternative splicing events in custom gene datasets by AStalavista

.

Methods Mol Biol

2015

;

1269

:

379

–

92

.

Google Scholar

OpenURL Placeholder Text

WorldCat

177.

Foissac

S

,

Sammeth

M

.

ASTALAVISTA: dynamic and flexible analysis of alternative splicing events in custom gene datasets

.

Nucleic Acids Res

2007

;

35

:

W297

–

9

.

Google Scholar

OpenURL Placeholder Text

WorldCat

178.

Wang

J

,

Ye

Z

,

Huang

TH

, et al.

A survey of computational methods in transcriptome-wide alternative splicing analysis

.

Biomol Concepts

2015

;

6

:

59

–

66

.

Google Scholar

OpenURL Placeholder Text

WorldCat

179.

Hu

Y

,

Huang

Y

,

Du

Y

, et al.

DiffSplice: the genome-wide detection of differential splicing events with RNA-seq

.

Nucleic Acids Res

2013

;

41

:

e39

.

Google Scholar

OpenURL Placeholder Text

WorldCat

180.

Wang

W

,

Qin

Z

,

Feng

Z

, et al.

Identifying differentially spliced genes from two groups of RNA-seq samples

.

Gene

2013

;

518

:

164

–

70

.

Google Scholar

OpenURL Placeholder Text

WorldCat

181.

Zhao

K

,

Lu

ZX

,

Park

JW

, et al.

GLiMMPS: robust statistical model for regulatory variation of alternative splicing using RNA-seq data

.

Genome Biol

2013

;

14

:

R74

.

Google Scholar

OpenURL Placeholder Text

WorldCat

182.

Srivastava

S

,

Chen

L

.

A two-parameter generalized Poisson model to improve the analysis of RNA-seq data

.

Nucleic Acids Res

2010

;

38

:

e170

.

Google Scholar

OpenURL Placeholder Text

WorldCat

183.

Shen

S

,

Park

JW

,

Huang

J

, et al.

MATS: a Bayesian framework for flexible detection of differential alternative splicing from RNA-Seq data

.

Nucleic Acids Res

2012

;

40

:

e61

.

Google Scholar

OpenURL Placeholder Text

WorldCat

184.

Shen

S

,

Park

JW

,

Lu

ZX

, et al.

rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-Seq data

.

Proc Natl Acad Sci U S A

2014

;

111

:

E5593

–

601

.

Google Scholar

OpenURL Placeholder Text

WorldCat

185.

Katz

Y

,

Wang

ET

,

Airoldi

EM

, et al.

Analysis and design of RNA sequencing experiments for identifying isoform regulation

.

Nat Methods

2010

;

7

:

1009

–

15

.

Google Scholar

OpenURL Placeholder Text

WorldCat

186.

Brooks

AN

,

Yang

L

,

Duff

MO

, et al.

Conservation of an RNA regulatory map between Drosophila and mammals

.

Genome Res

2011

;

21

:

193

–

202

.

Google Scholar

OpenURL Placeholder Text

WorldCat

187.

Gatto

A

,

Torroja-Fungairino

C

,

Mazzarotto

F

, et al.

FineSplice, enhanced splice junction detection and quantification: a novel pipeline based on the assessment of diverse RNA-Seq alignment solutions

.

Nucleic Acids Res

2014

;

42

:

e71

.

Google Scholar

OpenURL Placeholder Text

WorldCat

188.

Wu

J

,

Akerman

M

,

Sun

S

, et al.

SpliceTrap: a method to quantify alternative splicing under single cellular conditions

.

Bioinformatics

2011

;

27

:

3010

–

6

.

Google Scholar

OpenURL Placeholder Text

WorldCat

189.

Estefania

M

,

Andres

R

,

Javier

I

, et al.

ASpli: integrative analysis of splicing landscapes through RNA-Seq assays

.

Bioinformatics

2021

.

Google Scholar

OpenURL Placeholder Text

WorldCat

190.

Merino GA, Conesa A, Fernández EA.

A benchmarking of workflows for detecting differential splicing and differential expression at isoform level in human RNA-seq studies

.

Briefings in bioinformatics

2019

;

20

:

471

–

81

.

Google Scholar

OpenURL Placeholder Text

WorldCat

191.

Thakur

PK

,

Rawal

HC

,

Obuca

M

, et al.

Bioinformatics approaches for studying alternative splicing

.

Encyclopedia Bioinform Comput Biol

2019

;

2

:

221

–

34

.

Google Scholar

OpenURL Placeholder Text

WorldCat

192.

Mehmood

A

,

Laiho

A

,

Venalainen

MS

, et al.

Systematic evaluation of differential splicing tools for RNA-seq studies

.

Brief Bioinform

2020

;

21

:

2052

–

65

.

Google Scholar

OpenURL Placeholder Text

WorldCat

193.

Li

T

,

Fu

J

,

Zeng

Z

, et al.

TIMER2.0 for analysis of tumor-infiltrating immune cells

.

Nucleic Acids Res

2020

;

48

:

W509

–

14

.

Google Scholar

OpenURL Placeholder Text

WorldCat

194.

Finotello

F

,

Mayer

C

,

Plattner

C

et al.

Molecular and pharmacological modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data

,

Genome Med

2019

;

11

:1–

20

Google Scholar

OpenURL Placeholder Text

WorldCat

195.

Zaitsev

K

,

Bambouskova

M

,

Swain

A

, et al.

Complete deconvolution of cellular mixtures based on linearity of transcriptional signatures

.

Nat Commun

2019

;

10

:

1

–

6

.

Google Scholar

OpenURL Placeholder Text

WorldCat

196.

Du

R

,

Carey

V

,

Weiss

ST

.

deconvSeq: deconvolution of cell mixture distribution in sequencing data

.

Bioinformatics

2019

;

35

:

5095

–

102

.

Google Scholar

OpenURL Placeholder Text

WorldCat

197.

Kang

K

,

Meng

Q

,

Shats

I

, et al.

CDSeq: A novel complete deconvolution method for dissecting heterogeneous samples using gene expression data

.

PLoS Comput Biol

2019

;

15

:e1007510.

Google Scholar

OpenURL Placeholder Text

WorldCat

198.

Hunt

GJ

,

Freytag

S

,

Bahlo

M

, et al.

dtangle: accurate and robust cell type deconvolution

.

Bioinformatics

2019

;

35

:

2093

–

9

.

Google Scholar

OpenURL Placeholder Text

WorldCat

199.

Nadel

BB

,

Lopez

D

,

Montoya

DJ

, et al.

The Gene Expression Deconvolution Interactive Tool (GEDIT): accurate cell type quantification from gene expression data

.

GigaScience

2021

;

10

:giab002.

Google Scholar

OpenURL Placeholder Text

WorldCat

200.

Racle

J

,

de

Jonge

K

,

Baumgaertner

P

, et al.

Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data

.

Elife

2017

;

6

:e26476.

Google Scholar

OpenURL Placeholder Text

WorldCat

201.

Roman

T

,

Xie

L

,

Schwartz

R

.

Automated deconvolution of structured mixtures from heterogeneous tumor genomic data

.

PLoS Comput Biol

2017

;

13

:e1005815.

Google Scholar

OpenURL Placeholder Text

WorldCat

202.

Zaslavsky

ME

,

Novik

J

,

Chang

E

, et al.

Infino: a Bayesian hierarchical model improves estimates of immune infiltration into tumor microenvironment

bioRxiv

.

2017

;

221671

.

203.

Becht

E

,

Giraldo

NA

,

Lacroix

L

, et al.

Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression

.

Genome Biol

2016

;

17

:

1

–

20

.

Google Scholar

OpenURL Placeholder Text

WorldCat

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Download all slides

Month:	Total Views:
July 2021	41
August 2021	247
September 2021	81
October 2021	102
November 2021	169
December 2021	211
January 2022	184
February 2022	135
March 2022	192
April 2022	133
May 2022	183
June 2022	154
July 2022	116
August 2022	157
September 2022	166
October 2022	169
November 2022	152
December 2022	117
January 2023	82
February 2023	152
March 2023	162
April 2023	151
May 2023	143
June 2023	100
July 2023	100
August 2023	130
September 2023	125
October 2023	175
November 2023	411
December 2023	418
January 2024	482
February 2024	494
March 2024	597
April 2024	344

Article Contents

Demystifying emerging bulk RNA-Seq applications: the application and utility of bioinformatic methodology

Abstract

Introduction

Transcriptome assembly

Evolving applications of bulk RNA-Seq

Copy Number Alteration from bulk RNA-Seq

Indel detection from RNA-Seq

Gene fusion detection

Neoantigen prediction from RNA-Seq data

Measuring the expression of TEs using RNA-Seq data

Detection of microbial contaminants

Metatranscriptomics analysis

Determining cell type abundance (cell type deconvolution) and expression from bulk tissues

Variants calling from RNA-Seq

Expression quantitative trait loci and transcriptome-wide association study analysis

mRNA-related analyses and application

Differential expression analysis

Large-scale time series RNA-Seq

Allele-specific expression

Differential isoform expression and differential splicing

Identification and analysis of alternative splicing events

Prospective

Funding

References

Supplementary data

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

Article Contents

Demystifying emerging bulk RNA-Seq applications: the application and utility of bioinformatic methodology

Abstract

Introduction

Transcriptome assembly

Evolving applications of bulk RNA-Seq

Copy Number Alteration from bulk RNA-Seq

Indel detection from RNA-Seq

Gene fusion detection

Neoantigen prediction from RNA-Seq data

Measuring the expression of TEs using RNA-Seq data

Detection of microbial contaminants

Metatranscriptomics analysis

Determining cell type abundance (cell type deconvolution) and expression from bulk tissues

Variants calling from RNA-Seq

Expression quantitative trait loci and transcriptome-wide association study analysis

mRNA-related analyses and application

Differential expression analysis

Large-scale time series RNA-Seq

Allele-specific expression

Differential isoform expression and differential splicing

Identification and analysis of alternative splicing events

Prospective

Funding

References

Supplementary data

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only