Abstract

Significant innovations in next-generation sequencing techniques and bioinformatics tools have impacted our appreciation and understanding of RNA. Practical RNA sequencing (RNA-Seq) applications have evolved in conjunction with sequence technology and bioinformatic tools advances. In most projects, bulk RNA-Seq data is used to measure gene expression patterns, isoform expression, alternative splicing and single-nucleotide polymorphisms. However, RNA-Seq holds far more hidden biological information including details of copy number alteration, microbial contamination, transposable elements, cell type (deconvolution) and the presence of neoantigens. Recent novel and advanced bioinformatic algorithms developed the capacity to retrieve this information from bulk RNA-Seq data, thus broadening its scope. The focus of this review is to comprehend the emerging bulk RNA-Seq-based analyses, emphasizing less familiar and underused applications. In doing so, we highlight the power of bulk RNA-Seq in providing biological insights.

Introduction

Transcriptomics provides deep insights into both coding and non-coding regions functional properties, helping understand various cellular conditions or disease phenotypes. Using these analyses, one can study differential gene expression across tissues or patient cohorts, perform isoform analysis and investigate splicing events across genes. To date, RNA sequencing (RNA-Seq) has been used in many fields including healthcare [1, 2], agriculture [3, 4], microbial diversity [5] and evolutionary biology [6], proving its usefulness in various applications [7]. As these fields have expanded, different biological challenges have pushed the development of novel computational tools that have evolved and diversified according to technological advancements, leading to many novel applications using bulk RNA-Seq. Bulk RNA-Seq measures the transcriptomics of pooled cell populations, tissue sections or biopsies.

Given the bespoke nature of many of the newer bulk RNA-Seq applications, widespread use can be impacted by lack of familiarity or propagation between research groups. For example, for copy number alteration (CNA) and indel detection, DNA analysis (as targeted, exome or genome sequencing) is extensively used in cancer research. Newly developed bioinformatic tools and integrated interpretational methods for RNA-Seq have the potential to perform and explain those analyses with better accuracy than before, in scenarios where whole genome sequencing (WGS) is not available because of limited resources. Other RNA-Seq applications with expanding utility include detection of microbial contamination in experiments [8], the understanding of transposable elements (TEs) [9, 10] or applications related to specific biological interests, e.g. isoform-guided prediction of epitopes [11] and estimation of immune cell fraction [12] from RNA-Seq data. Also, significant progress has been made to develop tools for the prediction of tumour-specific mutant peptides (cancer neoantigens), utilizing single-nucleotide variants (SNVs), indels, gene fusions or alternative splicing information. A transcriptome-directed diagnosis tool for Mendelian disease has helped overcome the limitations of conventional genomic testing [13]. Other studies recommend integrated approaches that improve confidence and yields in the mutations detected in low-purity tumours [14]. The growth in options for functional interpretation of non-messenger RNA data has also progressed [15], expanding the scope of total RNA-Seq for non-coding studies (Figure 2). In this review, we will discuss the range of evolving bulk RNA-Seq applications enabled by the latest tools and pipelines (Figure 1).

Figure 1

Computational analyses based on bulk RNA-Seq. Core analyses such as transcriptome profiling, differential gene expression, and functional profiling of messenger RNA (mRNA) and microRNA (miRNA) are standard practices. However, Piwi and circular RNA-related studies are still growing from the last few years. Advanced analyses include copy number alteration, indel detections and microbial contamination investigation in RNA-Seq experiments. Other data integration and network analyses to study gene regulatory networks are quite helpful (refer to Supplementary Data).

Figure 2

Overview of transcriptomics products and biomolecules interaction. Figure shows a variety of transcriptomics products and the role of various biomolecules interactions in the regulation of gene functionality.

Transcriptome assembly

Typical RNA-Seq applications on common species (such as human or mouse) utilize reference-based methods to map RNA-Seq reads onto a pre-existing reference genome. STAR [16] and HISAT2 [17] are the popular assemblers used for reference-based assembly.

A high-quality genome is available for limited species; therefore, studying rare or new species with RNA-Seq is more challenging and requires the transcriptome of these organisms to be reconstructed through the assembly process [18, 19].

Two different strategies are used for the transcriptome assembly of new species, genome-guided (when the reference genome exists) and de novo assembly. Genome-guided assemblers utilize pre-existing close-related genomes to guide the reconstruction of an alternative consensus sequence. Assemblers map RNA-Seq reads to the reference to orient sequences of new species along the chromosomes. These assemblies are somewhat biased because of their dependence on related reference, and it increases when the transcriptome of a novel organism is not close to the chosen reference [19]. De novo assembly methods, on the other hand, assemble sequences directly from the reads, which is less biased towards using the reference genome, but may be biased towards specific tissue type and conditions from which a sample of a novel species has been sequenced. However, the bias caused by the dependence on a related reference alone can be address by merging de novo and a related reference assembly in contigs [20].

Ungaro et al. [18] compares the reads mapping of the genome of various organisms, using both the transcriptome-guided assembly and de novo assembly; they choose to use a range of different reference genomes having a divergence of >15% from the species of interest. The guided assembly outperforms de novo approach alone, including when the divergence level is high. Holzer et al. [21] compared 10 de novo assembler on nine RNA-Seq datasets spanning different kingdoms of life. Evaluations of the tools is measured in a combination of 20 biological-based and reference-free metrics such as mapping rate, mismatches, reference coverage, contig F1, KC score, etc. Based on summarized metric of 20, the Trinity [22], SPAdes [23] and Trans-ABySS [24], followed by Bridger [25] and SOAPdenovo-Trans [26], generally outperformed the other tools. However, these performances vary among species. The selection of best assembly remains a challenging task, such metrics needs additional factors to be compared, such as tools abilities to reconstruct alternatively spliced transcripts/isoforms. Another example of computational difficulties in assembly is to distinguish the back-splice junctions that arise from an internal tandem duplication (generate duplicate exons within a gene) and the ones used for inferring circular RNA, because these kinds of reads are identical [27].

Evolving applications of bulk RNA-Seq

Copy Number Alteration from bulk RNA-Seq

CNA analyses are commonly used in cancer research. Amplification and deletion of the genes have signalling pathway implications and are a key determinant of molecular dysfunction in oncology. For example, PIK3CA amplification and deletion of the CDKN2A are associated with head and neck squamous cell carcinoma [28]. Many challenges lie in calling the CNA from RNA-Seq data, because of the differences in gene expression, which lead to the read depth variance of different magnitudes across genes. Because of these difficulties, most of these analyses are usually done using whole genome sequencing / whole exome sequencing (WGS/WES). However, recent advancements in bioinformatic tools and methods have assisted in the determination of CNA from bulk and single-cell RNA-Seq (scRNA-seq) data. Some of bulk RNA-Seq-based tools are CaSpER [29], CNAPE [30], CNVkit-RNA [31] and SuperFreq [32].

CNAPE is a machine learning approach that uses prior knowledge from The Cancer Genome Atlas (TCGA) data to predict the CNA in human cancer [33]. CaSpER first utilizes multiscale smoothing of expression signal by five-state Hidden Markov Model. Later smoothing of B allele frequency (BAF) signal is performed, followed by identification of the BAF shift threshold using the Gaussian mixture model. Finally, Copy Number Variants (CNV) correction is done by using the BAF shifts. CaSpER works on both bulk and scRNA-seq data. SuperFreq was initially developed for tracking somatic mutations in cancer. It was recently updated for calling absolute allele-sensitive CNA and can establish the relationship between CNA profile and mutation load in cancer. SuperFreq claims to accommodate small number of samples, paired with normal tumour comparisons. Compared with CNVkit-RNA, SuperFreq provides absolute, allele-sensitive and subclone aware CNA calls. There is no universal definition exists of what magnitude of amplification/deletion should be considered as amplified and deleted gene. For example, as per cosmic (https://cancer.sanger.ac.uk/cosmic/analyses), the gain/amplification of the gene is when the copy number is >5 (ploidy ≤2.7) or >9 (ploidy >2.7); however, the PURPLE [34] definition is 3*sample ploidy for gain/amplification. Furthermore, the significance of CNA of genes is defined based on defined amplification/deletion and its occurrence across samples. Moreover, because of a single copy of sex chromosomes in male, the user should be careful of the threshold for chromosome Y and chromosome X.

Indel detection from RNA-Seq

Small genetic variations, either insertions or deletions, measuring from 1 to 10 000 base pairs in length, are called indels. They can be coding and non-coding, and in coding genome, indels are either frameshift or non-frameshifts. Indels are known to have various functions, such as in cancer, they disable tumour suppression, activate oncogenes and trigger an increased abundance of neoantigens [34]. Indels are less common in the genome than SNVs [35]. It is challenging to identify indels from RNA-Seq because of RNA-Seq library and mapping artefacts (often related to the misalignment of spliced reads). Furthermore, complications arise when somatic indels should be distinguished from germline indels, e.g. in cancer studies, lack of normal/healthy tissue RNA-Seq data presents challenges for the detection of indels from tumour RNA-Seq. New tools such as RNAIndel [36] can handle this three-class classification problem using supervised learning and classify coding indels into somatic, germline and artefacts. Sun et al. [37] demonstrated that RNA-Seq alignment is the most important step for accurate indel identification. In a benchmarked study of indel detection, Slosarek et al. [38] suggested the use of additional filtering using Opossum [39]. Some recent studies recommend the integration of RNA- and DNA-Seq data, which improves the sensitivity for identifying mid-sized and large indels missed by DNA-Seq [40]. For the annotation of indel and CNA tools such as Anntool [41] are in common use.

Gene fusion detection

Gene fusions result from break points combining within previous disparate coding regions and lead to unpredictable and potentially deleterious effects. Such fusions can produce greater amounts of active abnormal protein than non-fusion genes, which could lead to tumour genesis [42]. In cancer, gene fusion has been described with the combination of oncogenes, e.g. recurrent gene fusion is reported between Transmembrane protease, serine 2 (TMPRSS2) and Erythroblast Transformation Specific (ETS) transcription factor genes in prostate cancer [43].

Various tools detect gene fusions from RNA-Seq data based on either read mapping or de novo fusion transcript assembly. After benchmarking 23 different gene fusion detection methods (that leverage RNA-Seq solely as input) on simulated and real RNA-Seq data, Haas et al. [44] concluded that Arriba [45, 46], STAR-Fusion [44] and STAR-SEQR [44] are the most accurate and fastest for fusion detection. New databases and bioinformatics resources can characterize and evaluate the significance of the gene fusion [47]. ChiTaRS 2.1 [48], FusionCancer [49], the TCGA Fusion Portal [50] and FARE-CAFE [51] are the latest gene fusion databases. The ChiTaRS 2.1 database is the largest fusion database that catalogues over 29 000 fusion transcripts, primarily from humans.

Neoantigen prediction from RNA-Seq data

Different genetic events such as Indel, gene fusion, SNV can generate mutant peptides called neoantigens. These neoantigens play a vital role in tumour-specific T cell-mediated antitumour immune response and have applications in immunotherapy [52]. Not only genetic events but alternative splicing also has proven to contribute to candidate neoantigens’ generation [53], but most of the tools use indels, SNVs or gene fusions events for neoantigen prediction. Few tools such as ASNEO [54] identify personalized alternative splicing-based neoantigens from RNA-Seq. INTEGRATE-neo [55] and ScanNeo [56] pipelines have been designed to predict neoepitopes derived from gene fusion and indels, respectively.

Measuring the expression of TEs using RNA-Seq data

TEs are sequences of DNA, which switches location within a genome. These TE elements constitute ~66–69% of the human [57, 58] and 30% fruit fly (Drosophila melanogaster) genomes [59]. They are integrated into a different genomic location either using DNA (‘transposons’) or through reverse transcription (‘retrotransposons’) [60]. There are three types of retrotransposons: long terminal repeats, long intersected elements and short intersected elements (SINEs) [61]. TEs can contribute to the formation of transcription factors [62] and abnormal expression of TE associated with cancer [63], and neurodegenerative diseases [64].

Next-generation sequencing (NGS) has advantages over in vitro techniques for studying retrotransposition, e.g. NGS can detect novel SINE amplification events during development, and this is not possible with in vitro methods. Despite the challenge posed by the repetitive nature of retroelements for developing bioinformatics analyses algorithms, reasonable progress has been made recently by the bioinformatics community. Moreover, in SINE elements, the short length of SINE elements contributes additional difficulty to the identification of the SINE instances that regulate cell functions.

MMR [65] and SalmonTE [66] tools were designed to investigate transposon expression using RNA-Seq explicitly. These packages perform quantification at the subfamily level. However, newer packages SINEsFIND [67] and ERVmap [68] address the need for locus-specific quantification of TE-derived transcripts. Novel fusion events that connect TE promoter sequences to downstream coding gene sequences can also be detected using the LIONS tool [69]. Table 1 describes more TE tools designed for RNA-Seq. For detailed algorithmic strategies, we recommended following TE tools-specific reviews [70–73].

Table 1

The table lists tools and pipelines for the measurements and analysis of the transposable elements from RNA-Seq data

PipelineYearApplications or functional use
GeneTEFlow [9]2020A Nextflow-based pipeline for analysing gene and transposable elements expression from RNA-Seq data
SQuIRE [95]2019Quantifying interspersed repeat expression (SQuIRE): RNA-Seq analysis pipeline that provides a quantitative and locus-specific picture of TE expression
L1EM [96]2020For accurate locus-specific LINE-1 RNA quantification
repEnrich [97]2014To study genome-wide transcriptional regulation of repetitive elements, the original study shows transcriptionally active long terminal repeat retrotransposon
TEtools [98]2017Facilitates big data expression analysis of transposable elements and reveals an antagonism between their activity and that of piRNA genes
TEtranscripts [99]2015For including transposable elements in differential expression analysis of RNA-Seq datasets
PipelineYearApplications or functional use
GeneTEFlow [9]2020A Nextflow-based pipeline for analysing gene and transposable elements expression from RNA-Seq data
SQuIRE [95]2019Quantifying interspersed repeat expression (SQuIRE): RNA-Seq analysis pipeline that provides a quantitative and locus-specific picture of TE expression
L1EM [96]2020For accurate locus-specific LINE-1 RNA quantification
repEnrich [97]2014To study genome-wide transcriptional regulation of repetitive elements, the original study shows transcriptionally active long terminal repeat retrotransposon
TEtools [98]2017Facilitates big data expression analysis of transposable elements and reveals an antagonism between their activity and that of piRNA genes
TEtranscripts [99]2015For including transposable elements in differential expression analysis of RNA-Seq datasets
Table 1

The table lists tools and pipelines for the measurements and analysis of the transposable elements from RNA-Seq data

PipelineYearApplications or functional use
GeneTEFlow [9]2020A Nextflow-based pipeline for analysing gene and transposable elements expression from RNA-Seq data
SQuIRE [95]2019Quantifying interspersed repeat expression (SQuIRE): RNA-Seq analysis pipeline that provides a quantitative and locus-specific picture of TE expression
L1EM [96]2020For accurate locus-specific LINE-1 RNA quantification
repEnrich [97]2014To study genome-wide transcriptional regulation of repetitive elements, the original study shows transcriptionally active long terminal repeat retrotransposon
TEtools [98]2017Facilitates big data expression analysis of transposable elements and reveals an antagonism between their activity and that of piRNA genes
TEtranscripts [99]2015For including transposable elements in differential expression analysis of RNA-Seq datasets
PipelineYearApplications or functional use
GeneTEFlow [9]2020A Nextflow-based pipeline for analysing gene and transposable elements expression from RNA-Seq data
SQuIRE [95]2019Quantifying interspersed repeat expression (SQuIRE): RNA-Seq analysis pipeline that provides a quantitative and locus-specific picture of TE expression
L1EM [96]2020For accurate locus-specific LINE-1 RNA quantification
repEnrich [97]2014To study genome-wide transcriptional regulation of repetitive elements, the original study shows transcriptionally active long terminal repeat retrotransposon
TEtools [98]2017Facilitates big data expression analysis of transposable elements and reveals an antagonism between their activity and that of piRNA genes
TEtranscripts [99]2015For including transposable elements in differential expression analysis of RNA-Seq datasets

Detection of microbial contaminants

Advanced tools can detect the contaminants’ presence in the experiment using the same RNA-Seq data, which is sequenced for testing some biological hypothesis. Generally, these tools extract the contaminants information from unmapped reads to the reference genome, i.e. reads that does not map to the reference genome. Many experimental standards have been developed to reduce laboratory contamination, despite this unexpected microbial contamination is reported in many studies. Recent studies [74, 75] suggest that researchers should consider measuring the functional impacts of contamination as this affects the quality of the research. Since cell manipulation for a range of experimental designs is essential in modern research and eukaryotic cells are highly likely exposed to microorganisms at the time of transition. Sometimes these contaminations contribute to a false experimental interpretation because contaminants may cause their host cells to have significant morphological and physiological changes. Furthermore, identifying infectious agents is vital in donated cells in medical and clinical facilities to prevent donor–patient disease transmission. These studies explore the effect of cross-contamination caused by exogenous sources, including laboratory reagents.

Some tools designed for this purpose require full FASTQ files as an input, whereas other new tools work directly on unmapped BAM files. A few tools are explicitly designed for virus detection. VirTect [76], VIRTUS [77] and ViGen [78] detect viruses using RNA-Seq. In contrast, the Decontaminer tool [8] and contamination screen are designed for bacteria, fungal and virus contamination detection. Decontaminer also provides an online visualization architect to explore the experimental contamination in detail.

Metatranscriptomics analysis

Metatranscriptome denotes the total RNA copies of the genes in a community of organisms. At a specific time of sampling, it is considered as a unique entity. Metatranscriptome varies with time and environmental changes. It can be used to obtain the whole expression profile in a community and demonstrate the dynamics of gene expression patterns over time. RNA-Seq is on the rise to conduct metatranscriptomics and microbiome dynamics studies. Metatranscriptomics analysis remarkably studied the alteration of gut microflora and its association with diseases like inflammatory bowel disease [79], obesity [80] and diabetes [81]. The gut microbiome plays roles in directing the host immune system development, post-embryonic development, host behaviour, detoxification [82]. Peters et al. [83] showed a relationship between gut microbiota and immunotherapy responses via metatranscriptomics in melanoma patients. Recently, metatranscriptomics employed as a surveillance tool for detecting the presence of the severe acute respiratory syndrome coronavirus 2 genome through the nasopharyngeal swabs of COVID-19 disease patients [84]. Studies have also explored the microbiome’s role in host evolution [85], which are predicted to be co-evolved [86].

Numerous bioinformatics pipelines are available for metatranscriptomics to investigate complex microorganism communities. These pipelines can be classified as assembly-free and assembly-based pipelines [87]. HUMAnN2 [88], MetaTrans [89], SAMSA2 [90] use assembly free strategy. SqueezeMeta [91], IMP [92] and MOSCA [93] are assembly-based tools. MetaTrans and SAMSA2 implemented for both taxonomic and differential gene expression analysis purposes, whereas HUMAnN2 does not carry out differential gene expression analysis. Among assembly-based tools, only MOSCA performs differential expression (DE) studies; the rest two, SqueezeMeta and IMP, are limited to taxonomic and functional analysis. Galaxy-based pipeline, ASaiM-MT [94], is designed for functional and taxonomy quantitation using HUMAnN2 and MetaPhlan2, respectively. The accurate quantification of organisms or functional activity within the complex metatranscriptome of samples remains an open challenge, because it is difficult to identify the optimal sampling frequency for different times, and the compositional nature of microbiome data frequently violates assumptions of statistical methods.

Determining cell type abundance (cell type deconvolution) and expression from bulk tissues

The method of estimating the proportion of each cell type is called cell type deconvolution, which is essentially performed with cell type-specific gene expression profiles for bulk transcriptomics samples [100]. Understanding the composition of cell types in disease-relevant tissues is an important step towards discovering cellular disease targets. Flow cytometry or single-cell RNA-Seq are most commonly used for cell type deconvolution. Currently, advanced bioinformatics tools can perform the same task using bulk RNA-Seq, which also estimates cell type abundances. For example, CIBERSORTx [101] tool can infer cell type-specific gene expression profiles without physical cell isolation using machine learning models, and its utility is evaluated in several types of tumours. In Table 4, we list the cell type deconvolution tools for bulk RNA-Seq and highlight the essential features for each. These tools can be used to define the cellular heterogeneity of complex tissues such as cancer to investigate the mechanism of disease.

Variants calling from RNA-Seq

SNVs offer insights to decipher the novel biological mechanisms for complex diseases and define genotype and phenotype associations. The genomic and genetic variation provides clues to the drivers of both malignancy and metastasis [101–103]. The variant identification can be statistically challenging due to low sequencing depth and lower variant allele frequency. Diseases such as cancer further increase these challenges because of tumour subpopulation, low tumour purity or a highly disturbed genome. Typically, whole-genome/exome sequencing is predominantly used for calling genomics variants rather than transcriptome (RNA-Seq). However, the relatively lower cost of RNA-Seq is gaining attention as an alternative to Whole Genome sequencing/ Whole exome sequencing (WGS/WES). Even though variant calling from RNA-Seq has drawbacks, it can help in the investigation of discrepancies at both the transcriptomic and genomic level due to post-transcriptional modifications. For example, RNA editing, nonsense-mediated decay, could be missed in genomic data. Conversely, RNA-based variant calling can miss (a) variants in the alleles, which are not expressed, (b) splicing variants, (c) variants in the low-expressed gene and (d) is limited to expressed region, etc. In conclusion, both methods have pros and cons. Wherever applicable, integration of both is the best solution for exploring the genotypic and phenotypic relationship; for example, Coudray et al. [104] shows in the absence of RNA-Seq data, WES/WGS likely to miss key driver mutations [105].

The important variant calling pipeline steps are mapping, marking the duplicates, base recalibration, variants calling, filtering of low-quality variants and variants annotations. Identification of genomic variants from RNA-Seq data is further complicated by intrinsic complexity such as splicing, low-expressed regions and managing duplicate reads, but algorithms are evolving day-by-day. Many variants calling pipelines have been developed for underlying models, filters, input data requirements and targeted applications. These pipelines generally use different variant callers for calling germline and somatic variants. Table 2 details variant calling pipelines dedicated to RNA-Seq data.

Table 2

RNA-Seq-based variant calling pipelines. Various pipelines commonly used for variant calling from RNA-Seq data are reported in the table with the information of the aligner and variant caller used in the pipeline

PipelineYearAligner or inputVariant caller
Opossum + Platypus [39]2017Opossum pre-processes BAM (TopHat2 or Star 2-pass)Platypus or GATK HaplotypeCaller
STAR 2-pass + MuTect2 [104]2018STAR 2-passMuTect2
VAP (Variant analysis Pipeline) [113]2019Star 2-PassGATK Haplotypecaller
PipelineYearAligner or inputVariant caller
Opossum + Platypus [39]2017Opossum pre-processes BAM (TopHat2 or Star 2-pass)Platypus or GATK HaplotypeCaller
STAR 2-pass + MuTect2 [104]2018STAR 2-passMuTect2
VAP (Variant analysis Pipeline) [113]2019Star 2-PassGATK Haplotypecaller
Table 2

RNA-Seq-based variant calling pipelines. Various pipelines commonly used for variant calling from RNA-Seq data are reported in the table with the information of the aligner and variant caller used in the pipeline

PipelineYearAligner or inputVariant caller
Opossum + Platypus [39]2017Opossum pre-processes BAM (TopHat2 or Star 2-pass)Platypus or GATK HaplotypeCaller
STAR 2-pass + MuTect2 [104]2018STAR 2-passMuTect2
VAP (Variant analysis Pipeline) [113]2019Star 2-PassGATK Haplotypecaller
PipelineYearAligner or inputVariant caller
Opossum + Platypus [39]2017Opossum pre-processes BAM (TopHat2 or Star 2-pass)Platypus or GATK HaplotypeCaller
STAR 2-pass + MuTect2 [104]2018STAR 2-passMuTect2
VAP (Variant analysis Pipeline) [113]2019Star 2-PassGATK Haplotypecaller

Earlier tools based on non-splice aware aligners contribute little to address false positives calls due to the transcriptome complexity. To overcome these limitations, modern pipelines such as VAP work with a splice-aware aligner (e.g. STAR [16]) and use transcripts annotation data to identify reliable SNPs in RNA-Seq. Further prioritization of RNA variants can be done with tools such as RVboost. RVboost uses a boosting technique to train a model of ‘high-quality’ variants by using common HapMap variants, and RNA variants are prioritized and called on the basis of the trained model. Another machine learning tool ‘SmartRNASeqCaller’ [106] has been developed to improve true germline variants and minimize the burden of false-positive calls from RNA-Seq. ANNOVAR [107] and Ensembl Variant Effect Predictor [108] are top-rated tools for the annotation of the variants called by variant callers. However, new platforms such as open-cravat [109] provide varieties of tools for annotating variants as per user needs. Radenbaugh et al. [110] reported that inclusion of the RNA-Seq with genomic data increases the power to detect somatic mutations.

Identifying the cancer malignancy driver genes is quite challenging, because a high mutation rate in cancer often introduces random mutations everywhere in the genome. Highly mutable genes are not simply driver’s genes, as it is often assumed that genome-wide mutation rate is constant, e.g. mutation rate for lowly expressed and late replication genes is usually higher. Moreover, higher gene length contributes towards a false-positive signal for a driver gene. Whole exome sequencing (WXS)/WES approaches-based tools like MutsigCV [111] and OncodriveFML [112] deal with such biases and help to predict the positive driver signals. As per our knowledge, currently, similar tools are not developed for RNA-Seq-based data.

Expression quantitative trait loci and transcriptome-wide association study analysis

The genome-wide association studies (GWAS) link the association between the gene variations (SNPs) and the complex phenotype/disease. However, GWAS approaches are unable to find the causal SNPs or in vitro confirmation to interpret biological functionality. GWAS also ignored the association between genetic variants, DNA functional elements and complicated traits and diseases. Expression quantitative trait loci (eQTL) studies can help identify the extent of influence that a variant can have on gene expression. eQTL are the genetic variants associated with the expression levels of one or more genes located near a variant (cis-eQTLs) or in a distant location from the variant (trans-eQTLs). A typical eQTL analysis is based on statistical hypothesis testing between SNP genotypes and gene expression levels are result in P-values that represent the strength of the association between the two measurements. Since trans-eQTL identification involves many more genes to be tested, the eQTL association to the distant genes remains challenging. In the same study subjects, the importance of gene expression regulation proved by eQTL studies by measuring both genetic variation and transcriptome data. However, these designs are not cost effective. Because of the complicated extraction procedure and other clinical reasons, for some trait, usually, a limited number of datasets with matched GWAS and transcriptome data are available. Limited sample size does not provide enough power for the analysis. When transcriptome data are not available in GWAS samples, transcriptome-wide association study (TWAS) offers novel insights into complex disease mechanisms [114]. TWAS emerged as a powerful approach that integrates common features of eQTL and GWAS studies (generally utilize different population-based data) to test the gene–trait associations and prioritize potential causal genes having the effects of a mediate variant on the traits. TWAS helped in discovering this relationship with improved statistical power and reduced multiple testing burden (i.e. number of genes) compared with testing traits with genetic variations (i.e. number of SNPs) [115]. TWAS is a robust framework to discover susceptible risk genes in complex human traits [116]. TWAS determines if GWAS-significant variants are also enriched for eQTLs by detecting co-localization of expression signals at known GWAS loci [117].

For eQTL analysis, tools use various strategies such as one-at-a-time test (Krux [118], MatrixeQTL [119], GeneVar [120]), multiple regression (HEFT [121]) and one-at-a-time-regression (SNPTEST [122], R/QTL [123]). When an intricate linkage disequilibrium structure is present, the one-at-a-time strategy cannot control the false positives. Also, this approach is often underpowered to detect the full spectrum of trans-acting regulatory effects. Other tools based on Bayesian statistics are Sherlock [124], eQTLBMA [125], etc. Multivariate eQTL mapping approaches, either least absolute shrinkage and selection operator (LASSO) or Bayesian Evolutionary Stochastic Search (ESS), are more suitable for detecting polygenic effects [126]. Furthermore, many TWAS integration and analysis methods have been published in the recent past, such as PrediXcan, which constructs a weighted SNP-set-based test in GWAS using SNP weights directed from the expression mapping studies [117]. Another method, TWAS, predicts the association between the complex phenotype/disease and the predicted quantitative gene expression built using the Bayesian sparse linear mixed model [127]. To incorporate the randomness and to deal with complexity in the polygenic architecture of expression data, a non-parametric latent Dirichlet process regression (DPR) model was incorporated for TWAS [128]. Similarly, TIGAR is also DPR model based, but provides a user-friendly interface. Other tools are developed on Mendelian randomization and perform tests to predict the causal relationship between gene expression and disease trait; these are summary data-based Mendelian randomization (SMR) [129] and Generalized SMR (GSMR) [130], respectively. In conclusion, TWAS can facilitate our understanding of the molecular and causal mechanisms underlying variant–trait associations. TWAS developed a few years ago, and since then, it helped into the GWAS-prioritized SNPs, which may have a crucial role in revealing the cis-SNPs regulating the upstream target gene. Moreover, eQTL analysis is not limited to the coding part, but it is also used for various non coding RNAs (ncRNAs), such as long non-coding RNA (IncR)-eQTLs, miRNA QTLs and circular RNA (circRNA)-eQTLs, etc. eQTL mapping can be performed based on tissue types as well. However, much work is still required to elucidate the effect of trans-SNPs on the neighbouring upstream target gene of interest and the effect of rare variants.

mRNA-related analyses and application

One of the primary motives of most RNA-Seq studies is DE analysis between two or multiple conditions or comparing different phenotypes.

Differential expression analysis

DE analysis of RNA-Seq compares the transcriptomic expression of different diseases’ phenotypes or conditions and measures unknown transcripts that other old techniques such as microarray were not capable of detecting. Because of its low cost, it became a popular choice in various research applications, including the discovery of potential biomarkers and therapeutic targets in cancer [131].

Regardless of the type of the study, to access transcriptomic differences using read counts, one has to first map reads to the reference genome and quantify them. The quantification of reads depends on various transcriptomics features that one wants to measure i.e. genes, exons or gene isoforms. After reads are quantified, few steps are recommended before performing downstream DE analysis, such as filtering lowly expressed genes, normalization and/or batch effect correction. Filtering is recommended to eliminate low abundant genes that are believed to contribute to false discovery rate. Normalization (although not always required) corrects counts for various technical factors, including sequencing depth, RNA composition or gene length. The most common normalization techniques are Fragments Per Kilobase of transcript per Million mapped reads (FPKM), Reads Per Kilobase of transcript per Million mapped reads (RPKM), Counts Per Million (CPM), Trimmed Mean of M-values (TMM) and median of ratios. Depending on the design of the experiment, RNA-Seq data may also suffer from various systematic biases that can affect downstream analysis results. To remove unwanted variation in RNA-Seq data, one can apply methods like ComBat [132] or RUV-Seq [133]. Different complex statistical models are used to test which features have significant differences in their counts between the conditions. These models are often based on various data distribution assumptions, including negative binomial (NB) in DESeq2 [134], edgeR [135], or EBSeq [136], log-normal distribution in Ballgown [137], or relaxing the assumption of any distribution (non-parametric) as in NOISeq [138]. In case of high variabilities of gene expression within a condition or phenotype, a higher number of biological replicates are preferred to get significant statistical power in detecting differentially expressed features. Depending on the biological question to answer, comparison between the conditions can be performed in various forms such as pairwise, multiconditional or time series. To date, several reviews and studies have been published for the guidelines and best practices of the RNA-Seq applications [139, 140].

Large-scale time series RNA-Seq

Typically, static RNA-Seq captures a snapshot of the transcriptome at the time of sample collection, whereas the time series RNA-Seq methods are applications of RNA-Seq, which measures RNA abundances over time in biological samples. Its applications include the exploration of cellular differentiation, developmental biology processes, disease progression, etc. The general aim is to identify specific gene expression pattern over time. However, statistical analysis of time course data is challenging because of complex experimental design, large dynamic changes of the transcriptome. Compared with static RNA-Seq methods, these methods are not well benchmarked and validated [139]. Recent reports inform that no comparative study compares these tools considering metrics of factors such as the impact of pre-processing steps, various time points, different numbers of replicates, multiple conditions and variable noise levels. Table 3 explain various tools based on time course analysis. Each tool has its pros and cons, and at the individual level, no pipeline can address the complete characterization of the dynamic biological process. Tools based on static RNA-Seq have a major assumption that observed samples are collected independently at a fixed time point; hence, not appropriate for time course RNA-Seq statistic.

Table 3

Large-scale time series RNA-Seq analysis tools

SoftwareYearApplications feature
MAPTest [161]2020Two series time course; longitudinal settings in repeatedly measured data; also enables classification between parallel differential expression or non-parallel differential expression
deepTS [162]2020Flexible and powerful web-based platform for identification and analysis of pairwise transcriptome data along with the visualization
AR (auto-regressive model) [163]2019Single series of longitudinal RNA-Seq time course data; balanced longitudinal data required with an even number of sample size; longer running time required as number of time point increases
ImpulseDE2 [164]2018One- or two-series time course data, longitudinally measured RNA-Seq and Chip-Seq data with more than six time points
Trendy [165]2018Single-series time course; longitudinally measured, use optimal segmented regression model; provides the location and direction of dynamic changes in expression
TSIS [166]2017Detects transcript isoform switches, graphic interface performed by Shiny app from R package
timeSeq [167]2016Two-series time course; method based on negative binomial mixed-effects model with major factors; distinguish non-parallel versus parallel temporal changes between conditions over time
EBSeq-HMM [168]2015Single-series time course; longitudinally measured, use auto-regressive Hidden Markov Model
Ngsp [169]2015One- or two-series time course; longitudinally measured, and advantageous for temporal dynamic patterns incorporated course experimental designs
Next maSigPro [170]2014One- or multi-series time course data; non-longitudinally measured, and use polynomial regression model for a different time variable. DEG is selected by the best-fitted model in a temporal manner
SoftwareYearApplications feature
MAPTest [161]2020Two series time course; longitudinal settings in repeatedly measured data; also enables classification between parallel differential expression or non-parallel differential expression
deepTS [162]2020Flexible and powerful web-based platform for identification and analysis of pairwise transcriptome data along with the visualization
AR (auto-regressive model) [163]2019Single series of longitudinal RNA-Seq time course data; balanced longitudinal data required with an even number of sample size; longer running time required as number of time point increases
ImpulseDE2 [164]2018One- or two-series time course data, longitudinally measured RNA-Seq and Chip-Seq data with more than six time points
Trendy [165]2018Single-series time course; longitudinally measured, use optimal segmented regression model; provides the location and direction of dynamic changes in expression
TSIS [166]2017Detects transcript isoform switches, graphic interface performed by Shiny app from R package
timeSeq [167]2016Two-series time course; method based on negative binomial mixed-effects model with major factors; distinguish non-parallel versus parallel temporal changes between conditions over time
EBSeq-HMM [168]2015Single-series time course; longitudinally measured, use auto-regressive Hidden Markov Model
Ngsp [169]2015One- or two-series time course; longitudinally measured, and advantageous for temporal dynamic patterns incorporated course experimental designs
Next maSigPro [170]2014One- or multi-series time course data; non-longitudinally measured, and use polynomial regression model for a different time variable. DEG is selected by the best-fitted model in a temporal manner
Table 3

Large-scale time series RNA-Seq analysis tools

SoftwareYearApplications feature
MAPTest [161]2020Two series time course; longitudinal settings in repeatedly measured data; also enables classification between parallel differential expression or non-parallel differential expression
deepTS [162]2020Flexible and powerful web-based platform for identification and analysis of pairwise transcriptome data along with the visualization
AR (auto-regressive model) [163]2019Single series of longitudinal RNA-Seq time course data; balanced longitudinal data required with an even number of sample size; longer running time required as number of time point increases
ImpulseDE2 [164]2018One- or two-series time course data, longitudinally measured RNA-Seq and Chip-Seq data with more than six time points
Trendy [165]2018Single-series time course; longitudinally measured, use optimal segmented regression model; provides the location and direction of dynamic changes in expression
TSIS [166]2017Detects transcript isoform switches, graphic interface performed by Shiny app from R package
timeSeq [167]2016Two-series time course; method based on negative binomial mixed-effects model with major factors; distinguish non-parallel versus parallel temporal changes between conditions over time
EBSeq-HMM [168]2015Single-series time course; longitudinally measured, use auto-regressive Hidden Markov Model
Ngsp [169]2015One- or two-series time course; longitudinally measured, and advantageous for temporal dynamic patterns incorporated course experimental designs
Next maSigPro [170]2014One- or multi-series time course data; non-longitudinally measured, and use polynomial regression model for a different time variable. DEG is selected by the best-fitted model in a temporal manner
SoftwareYearApplications feature
MAPTest [161]2020Two series time course; longitudinal settings in repeatedly measured data; also enables classification between parallel differential expression or non-parallel differential expression
deepTS [162]2020Flexible and powerful web-based platform for identification and analysis of pairwise transcriptome data along with the visualization
AR (auto-regressive model) [163]2019Single series of longitudinal RNA-Seq time course data; balanced longitudinal data required with an even number of sample size; longer running time required as number of time point increases
ImpulseDE2 [164]2018One- or two-series time course data, longitudinally measured RNA-Seq and Chip-Seq data with more than six time points
Trendy [165]2018Single-series time course; longitudinally measured, use optimal segmented regression model; provides the location and direction of dynamic changes in expression
TSIS [166]2017Detects transcript isoform switches, graphic interface performed by Shiny app from R package
timeSeq [167]2016Two-series time course; method based on negative binomial mixed-effects model with major factors; distinguish non-parallel versus parallel temporal changes between conditions over time
EBSeq-HMM [168]2015Single-series time course; longitudinally measured, use auto-regressive Hidden Markov Model
Ngsp [169]2015One- or two-series time course; longitudinally measured, and advantageous for temporal dynamic patterns incorporated course experimental designs
Next maSigPro [170]2014One- or multi-series time course data; non-longitudinally measured, and use polynomial regression model for a different time variable. DEG is selected by the best-fitted model in a temporal manner
Table 4

Tools and pipelines for cell type abundance measurement from bulk tissue

ToolYearFeaturesProgramming languageSoftware link
TIMER2.0 [193]2020Web server for comprehensive analysis of tumour-infiltrating immune cellsWeb-tool R, Javascripthttps://github.com/taiwenli/TIMER
CIBERSORTx [101]2019Impute gene expression profiles and provide an estimation of the abundances of member cell types in a mixed cell populationWeb-tool Java, Rhttps://cibersortx.stanford.edu/
quanTIseq [194]2019Quantify the fractions of 10 immune cell types from bulk RNA-sequencing dataR, Shell (Bash)https://icbi.imed.ac.at/quantiseq
Immunedeconv [12]2019Benchmarking of transcriptome-based cell type quantification methods for immune oncologyRhttps://github.com/icbilab/immunedeconv
Linseed [195]2019Deconvolution of cellular mixtures based on linearity of transcriptional signaturesC++, Rhttps://github.com/ctlab/LinSeed
deconvSEQ [196]2019Deconvolution of cell mixture distribution based on a generalized linear modelRhttps://github.com/rosedu1/deconvSeq
CDSeq [197]2019Simultaneously estimate both cell type proportions and cell type-specific expression profilesMATLAB, Rhttps://github.com/kkan g7/CDSeq_R_Package
dtangle [198]2019Estimates cell type proportions using publicly available, often cross-platform, and reference dataRhttps://github.com/gjhunt/dtangle
GEDIT [199]2019Estimate cell type abundancesWeb-based tool Python, Rhttp://webtools.mcdb.ucla.edu/
EPIC [200]2017Simultaneously estimates the fraction of cancer and immune cell typesRhttp://epic.gfellerlab.org/
WSCUnmix [201]2017Automated deconvolution of structured mixturesMATLABhttps://github.com/tedro man/WSCUnmix
Infino [202]2017Deconvolves bulk RNA-Seq into cell type abundances and captures gene expression variability in a Bayesian model to measure deconvolution uncertaintyR, Pythonhttps://github.com/ham merlab/infino
MCPcounter [203]2016Estimating the population abundance of tissue-infiltrating immune and stromal cell populationsRhttps://github.com/ebec ht/MCPcounter
ToolYearFeaturesProgramming languageSoftware link
TIMER2.0 [193]2020Web server for comprehensive analysis of tumour-infiltrating immune cellsWeb-tool R, Javascripthttps://github.com/taiwenli/TIMER
CIBERSORTx [101]2019Impute gene expression profiles and provide an estimation of the abundances of member cell types in a mixed cell populationWeb-tool Java, Rhttps://cibersortx.stanford.edu/
quanTIseq [194]2019Quantify the fractions of 10 immune cell types from bulk RNA-sequencing dataR, Shell (Bash)https://icbi.imed.ac.at/quantiseq
Immunedeconv [12]2019Benchmarking of transcriptome-based cell type quantification methods for immune oncologyRhttps://github.com/icbilab/immunedeconv
Linseed [195]2019Deconvolution of cellular mixtures based on linearity of transcriptional signaturesC++, Rhttps://github.com/ctlab/LinSeed
deconvSEQ [196]2019Deconvolution of cell mixture distribution based on a generalized linear modelRhttps://github.com/rosedu1/deconvSeq
CDSeq [197]2019Simultaneously estimate both cell type proportions and cell type-specific expression profilesMATLAB, Rhttps://github.com/kkan g7/CDSeq_R_Package
dtangle [198]2019Estimates cell type proportions using publicly available, often cross-platform, and reference dataRhttps://github.com/gjhunt/dtangle
GEDIT [199]2019Estimate cell type abundancesWeb-based tool Python, Rhttp://webtools.mcdb.ucla.edu/
EPIC [200]2017Simultaneously estimates the fraction of cancer and immune cell typesRhttp://epic.gfellerlab.org/
WSCUnmix [201]2017Automated deconvolution of structured mixturesMATLABhttps://github.com/tedro man/WSCUnmix
Infino [202]2017Deconvolves bulk RNA-Seq into cell type abundances and captures gene expression variability in a Bayesian model to measure deconvolution uncertaintyR, Pythonhttps://github.com/ham merlab/infino
MCPcounter [203]2016Estimating the population abundance of tissue-infiltrating immune and stromal cell populationsRhttps://github.com/ebec ht/MCPcounter
Table 4

Tools and pipelines for cell type abundance measurement from bulk tissue

ToolYearFeaturesProgramming languageSoftware link
TIMER2.0 [193]2020Web server for comprehensive analysis of tumour-infiltrating immune cellsWeb-tool R, Javascripthttps://github.com/taiwenli/TIMER
CIBERSORTx [101]2019Impute gene expression profiles and provide an estimation of the abundances of member cell types in a mixed cell populationWeb-tool Java, Rhttps://cibersortx.stanford.edu/
quanTIseq [194]2019Quantify the fractions of 10 immune cell types from bulk RNA-sequencing dataR, Shell (Bash)https://icbi.imed.ac.at/quantiseq
Immunedeconv [12]2019Benchmarking of transcriptome-based cell type quantification methods for immune oncologyRhttps://github.com/icbilab/immunedeconv
Linseed [195]2019Deconvolution of cellular mixtures based on linearity of transcriptional signaturesC++, Rhttps://github.com/ctlab/LinSeed
deconvSEQ [196]2019Deconvolution of cell mixture distribution based on a generalized linear modelRhttps://github.com/rosedu1/deconvSeq
CDSeq [197]2019Simultaneously estimate both cell type proportions and cell type-specific expression profilesMATLAB, Rhttps://github.com/kkan g7/CDSeq_R_Package
dtangle [198]2019Estimates cell type proportions using publicly available, often cross-platform, and reference dataRhttps://github.com/gjhunt/dtangle
GEDIT [199]2019Estimate cell type abundancesWeb-based tool Python, Rhttp://webtools.mcdb.ucla.edu/
EPIC [200]2017Simultaneously estimates the fraction of cancer and immune cell typesRhttp://epic.gfellerlab.org/
WSCUnmix [201]2017Automated deconvolution of structured mixturesMATLABhttps://github.com/tedro man/WSCUnmix
Infino [202]2017Deconvolves bulk RNA-Seq into cell type abundances and captures gene expression variability in a Bayesian model to measure deconvolution uncertaintyR, Pythonhttps://github.com/ham merlab/infino
MCPcounter [203]2016Estimating the population abundance of tissue-infiltrating immune and stromal cell populationsRhttps://github.com/ebec ht/MCPcounter
ToolYearFeaturesProgramming languageSoftware link
TIMER2.0 [193]2020Web server for comprehensive analysis of tumour-infiltrating immune cellsWeb-tool R, Javascripthttps://github.com/taiwenli/TIMER
CIBERSORTx [101]2019Impute gene expression profiles and provide an estimation of the abundances of member cell types in a mixed cell populationWeb-tool Java, Rhttps://cibersortx.stanford.edu/
quanTIseq [194]2019Quantify the fractions of 10 immune cell types from bulk RNA-sequencing dataR, Shell (Bash)https://icbi.imed.ac.at/quantiseq
Immunedeconv [12]2019Benchmarking of transcriptome-based cell type quantification methods for immune oncologyRhttps://github.com/icbilab/immunedeconv
Linseed [195]2019Deconvolution of cellular mixtures based on linearity of transcriptional signaturesC++, Rhttps://github.com/ctlab/LinSeed
deconvSEQ [196]2019Deconvolution of cell mixture distribution based on a generalized linear modelRhttps://github.com/rosedu1/deconvSeq
CDSeq [197]2019Simultaneously estimate both cell type proportions and cell type-specific expression profilesMATLAB, Rhttps://github.com/kkan g7/CDSeq_R_Package
dtangle [198]2019Estimates cell type proportions using publicly available, often cross-platform, and reference dataRhttps://github.com/gjhunt/dtangle
GEDIT [199]2019Estimate cell type abundancesWeb-based tool Python, Rhttp://webtools.mcdb.ucla.edu/
EPIC [200]2017Simultaneously estimates the fraction of cancer and immune cell typesRhttp://epic.gfellerlab.org/
WSCUnmix [201]2017Automated deconvolution of structured mixturesMATLABhttps://github.com/tedro man/WSCUnmix
Infino [202]2017Deconvolves bulk RNA-Seq into cell type abundances and captures gene expression variability in a Bayesian model to measure deconvolution uncertaintyR, Pythonhttps://github.com/ham merlab/infino
MCPcounter [203]2016Estimating the population abundance of tissue-infiltrating immune and stromal cell populationsRhttps://github.com/ebec ht/MCPcounter

Allele-specific expression

The diploid genome with two paternal alleles of a gene has a relative difference in the allelic transcription due to various factors, viz differential rates of transcription, relative differences in the mRNA stability and multiple cis- trans-regulatory elements in action to control the allelic transcription abundance. These allelic expression differences can range from a slight quantitative change to many folds up/down in magnitude [141–143]. Thus, accurate estimation of the differential abundance of a gene’s allelic copies is referred to as allele-specific expression (ASE). It provides a substantial understanding of alleles’ relative expression in a diseased state termed as eQTLs [144, 145]. RNA-Seq can measure the ASE of the gene coming from parental alleles. Usually, the heterozygous SNP information in the transcribed genomic region helps differentiate the RNA-Seq reads’ origin. The paternal and maternal allele gene expression ratio is used to measure the ASE differences, which may lead to some phenotypic variations. RNA-Seq analysis platforms have grown robust and sensitive enough to provide quantitative information on differential ASE for genes [142, 143].

The standard reference-based mapping approaches suffer from the biased estimation in favour of the most abundant allelic counterpart matching with the reference genome or transcriptome [144, 146]—many attempts to circumvent this bias while scoring the alignment [145, 147, 148]. Among many approaches, one of the chief steps involved was hashing method, which refers to assigning reads to the alleles with the already known SNP [149]. WASP tools are based on this approach [150] and speculated that approach helped in navigating the fact that SNPs matching to non-reference transcripts usually match to multiple genomic locus. Thus, allele assignment of these reads is discarded in WASP. Another tool, ASElux, assigns SNPs, quantifies allelic reads directly from unmapped RNA-Seq datasets [151]. Some tools developed using statistical approaches to deal with the individual-specific random effects to assign SNPs using multi-SNP correlations for a single transcript. ASE analysis in a population tool intakes RNA-Seq datasets and infers multi-SNPs correlations for a single gene-level ASE between two conditions [152]. The other tools in this related domain using statistical approaches are GeneiASE [153], QuASAR [154] and MBASED [155]. In addition, a set of best practices, comprehensive discussion on the ASE expression methods from RNA-Seq datasets is provided here [156].

Differential isoform expression and differential splicing

Recent transcriptome analyses have shown that up to 95% of human genes are alternatively spliced to encode proteins with different functions [157]. There are two types of splicing described: (1) constitutive splicing and (2) alternative splicing. In constitutive splicing, non-coding introns are removed from the pre-mRNA, and exons are joined together to form a mature mRNA [158, 159], whereas in alternative splicing, exons can be included or excluded in different combinations to create a diverse range of mRNA transcripts from a single pre-mRNA. It serves as one of the major mechanisms to increase the diversity of the transcriptome. Alternative splicing events are commonly classified into various categories such as cassette exons, mutually exclusive exons, retained intron, alternative 5′ splice sites, alternative 3′ splice sites, alternative promoters and alternative poly-A sites [160]. In a cassette exon, an intervening exon between two other exons from the mature mRNA sequence can be either included or skipped to generate two distinct protein isoforms. In the case of intron retention, the intron removal can be suppressed, resulting in the entire intron’s retention. Alternate splice sites of 5 ‘or 3’ can extend or shorten the exons. Intron retention is generally the most difficult type to detect due to the experimental artefacts. Since incompletely spliced transcripts may retain the intron fragments, which can be mislabelled as the intron’s retention.

Identification and analysis of alternative splicing events

The advancement of RNA-Seq methods has facilitated the study of the splicing process’s genome-wide impact and regulation. New bioinformatics tools led to the identification of novel transcribed regions and splicing isoforms and provided that a significant fraction of the transcribed RNA in human cells undergo alternative splicing [171]. An increasing number of studies illustrated that the selection of wrong splice sites causes human disease [172], emphasizing the importance of studying differentially spliced transcripts. However, the large datasets produced from RNA-Seq experiments and complex information to analyse have turned this into a challenging task.

Several tools have been developed for quantification and identification of alternative splicing events such as SpliceR [173], SpliceSeq [174], splicing Viewer [175], ASTALAVISTA [176, 177], GESS [178], DiffSplice [179], DSGseq [180], GLiMMPS [181], GPSeq [182], MATS [183], rMATS [184], MISO [185], JuncBASE [186], finesplice [187], SpliceTrap [188] and ASpli2 [189]. Each tool has its advantages and disadvantages. In the context of alternative splicing, the SpliceSeq, SpliceViewer, DiffSplice and ASTALAVISTA tools also facilitate intuitive visualization of RNA-Seq results. GESS (graph-based exon-skipping scanner) performs de novo detection of skipping event sites from raw RNA-Seq reads without prior knowledge of gene annotations. GESS method builds a splice site-link graph from first hand, raw RNA-Seq reads and then implements a walking strategy on this graph by iteratively navigating subgraphs to reveal those with a pattern corresponding to an exon-skipping event. Thus, it can provide a more accurate and comprehensive picture of skipping events associated with a cell’s particular physiological condition.

Both GESS and DiffSplice can identify splicing events without prior annotation. MATS compares two samples without replicates, whereas rMATS compares two samples with replicates. Running rMATS without replicates automatically invokes the MATS algorithm. DSGseq is an exon-based method that uses the NB distribution to model sequencing reads on exons and proposes an NB-statistic to detect differentially spliced genes. It can also detect the novel alternative splicing events and highlight differentially spliced.

MISO (mixture-of-isoforms) model is a statistical model that estimates the expression of alternatively spliced exons and isoforms and assesses confidence in these estimates from RNA-Seq data. The MISO model uses Bayesian inference to compute the probability that a read originated from a particular isoform. ASpli evaluate the difference in AS events’ magnitude by comparing the exon inclusion or intron retention percentage using splice junctions, and for the differential use of introns, exons and splicing junctions, read counts are used. This integrative process enables to detect the differences in both annotated and novel AS events. Main steps involved in the ASpli workflow are parsing the genome annotation into bins and aligning the read against the bins. Finally, based on the count of the reads aligned to the bins or junction, differential bins are inferred.

To identify alternative splicing events, SpliceR utilizes the full-length transcript output from RNA-Seq assemblers. SpliceR also annotates the differentially spliced elements’ genomic coordinates, facilitating downstream sequence analysis of each splicing pattern. Fraction values are calculated for each transcript isoform to detect the transcript switching between conditions. MISO, ASTALAVISTA and DiffSplice are unable to evaluate novel features because these methods do not generate the genomic coordinates of differentially spliced regions. Recently various approaches for differential splicing and DE at isoform have been benchmarked [190] and reviewed [191, 192], highlighting basic characteristics along with the pros and cons of some of the commonly used software.

Prospective

RNA-Seq algorithms and methodologies are continuing to evolve. Advancement in statistical models vastly improved bioinformatics methods, and many new applications of the RNA-Seq-based analyses are now possible. The key statistical considerations and methods evolved at sample level, gene/transcript level, exon level and the methods diversified with respect to different statistical models and data distributional assumptions. As the novel tools are now providing various ways to answer the same biological question, the pitfall for the user is to choose the most optimal and appropriate model. The choice of parameters within the method can also impact the final results and interpretation of data. In that case, it is recommended to try various tools and settings and to support the choice of the model by benchmarking studies and in-depth reviews of the methods. However, the user should carefully design their experiment and optimize the experimental design by power analysis for RNA-Seq beforehand. If biological samples are processed in different batches of the sequencing, the use of technical replicates is also recommended.

Bias due to sequencing library can also alter the outcome, e.g. insufficient small non-coding RNA (sncRNA) detection bias due to RNA modification, isoforms detection bias due to short reads. The new sequencing method PANDORA can detect sncRNAs that once undetected by other libraries due to RNA modification that interferes with ligation and reverse transcription. Similarly, an improved read coverage and sequencing depth can overcome the disadvantage of short read length. New technologies such as long-read sequencing and scRNA-seq open new possibilities where short-read sequencing and bulk RNA-Seq have limitations. For example, long RNA-Seq can better resolve the genome assembly of the highly repetitive region of the larger genomes. Furthermore, it can detect long structural variants, full-length isoforms sequence and examine splicing better than short-read sequencing. On the other hand, scRNA-Seq helps understand tumour heterogeneity and drug-resistant clones with higher resolution. It is also very efficient in the investigation of cell lineages and cancer stem cell evolution. Other NGS and microarray technologies restricted rare cell type analyses to a large cell population only, but scRNA-Seq can analyse rare cell type more efficiently. scRNA-Seq assisted in the identification of exhausted CD8 cells in cancer therapy and establishing drug-resistant clones with higher resolution. Furthermore, with the help of time course scRNA-Seq experiments, we are opening up a wealth of biological information regarding spatio-temporal gene expression at different stages of cell growth or under disease stages. On the other hand, it is not straightforward to construct the cell lineages from the bulk RNA-Seq and microarray-based technologies. In the past, the gene regulatory networks was used to construct time series datasets from bulk and microarray platforms. However, these methods cannot identify the rare cell type and perform downstream analyses in a computationally efficient way.

This review described the evolving methodology for bulk RNA-Seq-based analysis and highlighted the knowledge gaps. Due to different biological aims and scenarios, finding consent on the best tools or pipelines is challenging. However, within the scope of this article, we provided a good summary of methods and information on available resources together with benchmarking studies that can help non-experts users orient themselves on new applications that bulk-RNA-Seq analysis can accommodate.

Key Points
  • Detection of genomic features from RNA-Seq data with advanced bioinformatic algorithms.

  • Evolving bioinformatic tools have the potential to discover more hidden information in bulk RNA-Seq data such as microbial contamination, TEs expression, CNAs, neoantigen prediction and cell type deconvolution, etc.

  • Exploration of advance TWAS using bulk RNA-Seq.

Funding

The National Health and Medical Research Council Project Grant APP1181179 Ideas Grant (to A.S.T., B.A. and M.R.). Czech Science Foundation (20-04099S to P.K.T.).

Amarinder Singh Thind is a postdoctoral researcher (Bioinformatics) at the University of Wollongong, Australia and IHMRI. His current research interests include the development of methodology for the integration of genomics, coding and non-coding transcriptome to discover biomarkers in cancer contexts.

Isha Monga is working as postdoctoral research scientist at the Department of Dermatology, Columbia University Medical Center, New York. Her research interests include understanding and elucidation of the molecular mechanisms of skin disorders including autoimmune disease alopecia areata. To achieve that, she is currently analysing high-throughput datasets of the single-cell sequencing, GWAS fine-mapping and 3D genomics methods from the mouse model and patients’ skin samples.

Prasoon Kumar Thakur is a Bioinformatician at the laboratory of RNA Biology at the Institute of Molecular genetics, Prague, Czech Republic and pursuing a PhD degree at Charles University, Prague, Czech Republic. The focus of his current research is RNA splicing and high-throughput data analysis.

Pallawi Kumari is a Bioinformatics researcher at IMTECH-Bioinformatics Centre, Chandigarh, India. She has expertise in NGS genomics and transcriptomics. Her research interests lie in human cancer genetics.

Kiran Dindhoria is a research fellow at MTCC, CSIR-IMTech, Chandigarh, India. She has expertise in NGS genomics and metagenomics. Her research interests lie in human microbiome studies.

Monika Krzak is a postdoctoral researcher in Bioinformatics at the Genomics of Inflammation and Immunity Group at Sanger Institute, UK. Her research interests are focused on understanding the pathogenesis of immune-mediated diseases.

Marie Ranson is a Professor of Molecular Biology at the University of Wollongong, Australia. She has longstanding research interests in biomarkers of cancer invasion and metastasis. She has made substantial contributions to the plasminogen activation system field in cancer as well as in inflammatory diseases. She has also successfully led and/or contributed IP to several drug development projects related.

Bruce Ashford is a surgeon scientist within the Medical School, University of Wollongong, Australia. His current research interests include bioinformatics, health informatics, molecular oncology and reconstructive microsurgery. The main focus of his research is to discover biomarkers in cSCC cancer using various omics methods.

References

1.

Byron
SA
,
Van Keuren-Jensen
KR
,
Engelthaler
DM
, et al. 
Translating RNA sequencing into clinical diagnostics: opportunities and challenges
.
Nat Rev Genet
2016
;
17
:
257
71
.

2.

Hong
M
,
Tao
S
,
Zhang
L
, et al. 
RNA sequencing: new technologies and applications in cancer research
.
J Hematol Oncol
2020
;
13
:
166
.

3.

Howlader
J
,
Robin
AHK
,
Natarajan
S
, et al. 
Transcriptome analysis by RNA-Seq reveals genes related to plant height in two sets of parent-hybrid combinations in Easter lily (Lilium longiflorum)
.
Sci Rep
2020
;
10
:
9082
.

4.

Kaur
G
,
Shukla
V
,
Kumar
A
, et al. 
Integrative analysis of hexaploid wheat roots identifies signature components during iron starvation
.
J Exp Bot
2019
;
70
:
6141
61
.

5.

Bang-Andreasen
T
,
Anwar
MZ
,
Lanzen
A
, et al. 
Total RNA sequencing reveals multilevel microbial community changes and functional responses to wood ash application in agricultural and forest soil
.
FEMS Microbiol Ecol
2020
;
96
:
1
12
.

6.

Thind
AS
,
Vitali
V
,
Guarracino
MR
, et al. 
What’s genetic variation got to do with it? Starvation-induced self-fertilization enhances survival in paramecium
.
Genome Biol Evol
2020
;
12
:
626
38
.

7.

Conesa
A
,
Madrigal
P
,
Tarazona
S
, et al. 
A survey of best practices for RNA-seq data analysis
.
Genome Biol
2016
;
17
:
13
.

8.

Sangiovanni
M
,
Granata
I
,
Thind
AS
, et al. 
From trash to treasure: detecting unexpected contamination in unmapped NGS data
.
BMC Bioinformatics
2019
;
20
:
168
.

9.

Liu
X
,
Bienkowska
JR
,
Zhong
W
.
GeneTEFlow: a Nextflow-based pipeline for analysing gene and transposable elements expression from RNA-Seq data
.
PLoS One
2020
;
15
:e0232994.

10.

Roman
AC
,
Morales-Hernandez
A
,
Fernandez-Salguero
PM
.
RNA-Seq analysis to measure the expression of SINE retroelements
.
Methods Mol Biol
2016
;
1400
:
107
16
.

11.

Trincado
JL
,
Reixachs-Sole
M
,
Pérez-Granado
J
, et al. 
ISOTOPE: ISOform-guided prediction of epiTOPEs in cancer
.
2020
. .

12.

Sturm
G
,
Finotello
F
,
Petitprez
F
, et al. 
Comprehensive evaluation of transcriptome-based cell-type quantification methods for immuno-oncology
.
Bioinformatics
2019
;
35
:
i436
45
.

13.

Murdock
DR
,
Dai
H
,
Burrage
LC
, et al. 
Transcriptome-directed analysis for Mendelian disease diagnosis overcomes limitations of conventional genomic testing
.
J Clin Invest
2021
;
131
:
1
13
.

14.

Wilkerson
MD
,
Cabanski
CR
,
Sun
W
, et al. 
Integrated RNA and DNA sequencing improves mutation detection in low purity tumors
.
Nucleic Acids Res
2014
;
42
:
e107
.

15.

Mora
A
.
Gene set analysis methods for the functional interpretation of non-mRNA data-genomic range and ncRNA data
.
Brief Bioinform
2020
;
21
:
1495
508
.

16.

Dobin
A
,
Davis
CA
,
Schlesinger
F
, et al. 
STAR: ultrafast universal RNA-seq aligner
.
Bioinformatics
2013
;
29
:
15
21
.

17.

Kim
D
,
Paggi
JM
,
Park
C
, et al. 
Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype
.
Nat Biotechnol
2019
;
37
:
907
15
.

18.

Ungaro
A
,
Pech
N
,
Martin
JF
, et al. 
Challenges and advances for transcriptome assembly in non-model species
.
PLoS One
2017
;
12
:e0185020.

19.

Duarte
GT
,
Volkova
PY
,
Geras'kin
SA
.
A pipeline for non-model organisms for de novo transcriptome assembly, annotation, and gene ontology analysis using open tools: case study with Scots pine
.
Bio Protoc
2021
;
11
:
e3912
.

20.

Alhakami
H
,
Mirebrahim
H
,
Lonardi
S
.
A comparative evaluation of genome assembly reconciliation tools
.
Genome Biol
2017
;
18
:
93
.

21.

Hölzer
M
,
Marz
M
.
De novo transcriptome assembly: a comprehensive cross-species comparison of short-read RNA-Seq assemblers
.
Gigascience
2019
;
8
:
1
16
.

22.

Grabherr
MG
,
Haas
BJ
,
Yassour
M
, et al. 
Full-length transcriptome assembly from RNA-Seq data without a reference genome
.
Nat Biotechnol
2011
;
29
:
644
52
.

23.

Bankevich
A
,
Nurk
S
,
Antipov
D
, et al. 
SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing
.
J Comput Biol
2012
;
19
:
455
77
.

24.

Robertson
G
,
Schein
J
,
Chiu
R
, et al. 
De novo assembly and analysis of RNA-seq data
.
Nat Methods
2010
;
7
:
909
12
.

25.

Chang
Z
,
Li
G
,
Liu
J
, et al. 
Bridger: a new framework for de novo transcriptome assembly using RNA-seq data
.
Genome Biol
2015
;
16
:
30
.

26.

Xie
Y
,
Wu
G
,
Tang
J
, et al. 
SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads
.
Bioinformatics
2014
;
30
:
1660
6
.

27.

Jakobi
T
,
Dieterich
C
.
Computational approaches for circular RNA analysis
.
Wiley Interdiscip Rev RNA
2019
;
10
:
e1528
.

28.

Johnson
DE
,
Burtness
B
,
Leemans
CR
.
Head and neck squamous cell carcinoma
.
Nat Rev Dis Primers
2020
;
6
:
93
.

29.

Serin Harmanci
A
,
Harmanci
AO
,
Zhou
X
.
CaSpER identifies and visualizes CNV events by integrative analysis of single-cell or bulk RNA-sequencing data
.
Nat Commun
2020
;
11
:
89
.

30.

Mu
Q
,
Wang
J
.
CNAPE: a machine learning method for copy number alteration prediction from gene expression
.
IEEE/ACM Trans Comput Biol Bioinform
2021
;
18
:
306
11
.

31.

Talevich
E
,
Shain
AH
.
CNVkit-RNA: copy number inference from RNA-sequencing data
.
bioRxiv
2018
;408534.

32.

Flensburg
C
,
Oshlack
A
,
Majewski
IJ
.
Detecting copy number alterations in RNA-Seq using SuperFreq
.
Bioinformatics
2021
;btab440, https://doi.org/10.1093/bioinformatics/btab440.

33.

Tomczak
K
,
Czerwinska
P
,
Wiznerowicz
M
.
The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge
.
Contemp Oncol (Pozn)
2015
;
19
:
A68
77
.

34.

Turajlic
S
,
Litchfield
K
,
Xu
H
, et al. 
Insertion-and-deletion-derived tumour-specific neoantigens and the immunogenic phenotype: a pan-cancer analysis
.
Lancet Oncol
2017
;
18
:
1009
21
.

35.

Mills
RE
,
Luttig
CT
,
Larkins
CE
, et al. 
An initial map of insertion and deletion (INDEL) variation in the human genome
.
Genome Res
2006
;
16
:
1182
90
.

36.

Hagiwara
K
,
Ding
L
,
Edmonson
MN
, et al. 
RNAIndel: discovering somatic coding indels from tumor RNA-Seq data
.
Bioinformatics
2020
;
36
:
1382
90
.

37.

Sun
Z
,
Bhagwate
A
,
Prodduturi
N
, et al. 
Indel detection from RNA-seq data: tool evaluation and strategies for accurate detection of actionable mutations
.
Brief Bioinform
2017
;
18
:
973
83
.

38.

Slosarek
T
,
Kraus
M
,
Schapranow
M-P
, et al. 
Qualitative Comparison of Selected Indel Detection Methods for RNA-Seq Data
.
Cham
:
Springer International Publishing
,
2019
,
166
77
.

39.

Oikkonen
L
,
Lise
S
.
Making the most of RNA-seq: pre-processing sequencing data with Opossum for reliable SNP variant detection
.
Wellcome Open Res
2017
;
2
:
6
.

40.

Yang
R
,
Van Etten
JL
,
Dehm
SM
.
Indel detection from DNA and RNA sequencing data with transIndel
.
BMC Genomics
2018
;
19
:
270
.

41.

Makarov
V
,
O'Grady
T
,
Cai
G
, et al. 
AnnTools: a comprehensive and versatile annotation toolkit for genomic variants
.
Bioinformatics
2012
;
28
:
724
5
.

42.

Edwards
PA
.
Fusion genes and chromosome translocations in the common epithelial cancers
.
J Pathol
2010
;
220
:
244
54
.

43.

Tomlins
SA
,
Rhodes
DR
,
Perner
S
, et al. 
Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer
.
Science
2005
;
310
:
644
8
.

44.

Haas
BJ
,
Dobin
A
,
Li
B
, et al. 
Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods
.
Genome Biol
2019
;
20
:
213
.

45.

Kim
P
,
Jang
YE
,
Lee
S
.
FusionScan: accurate prediction of fusion genes from RNA-Seq data
.
Genomics Inform
2019
;
17
:
e26
.

46.

Uhrig
S
,
Ellermann
J
,
Walther
T
, et al. 
Accurate and efficient detection of gene fusions from RNA sequencing data
.
Genome Res
2021
;
31
:
448
60
.

47.

Latysheva
NS
,
Babu
MM
.
Discovering and understanding oncogenic gene fusions through data intensive computational approaches
.
Nucleic Acids Res
2016
;
44
:
4487
503
.

48.

Frenkel-Morgenstern
M
,
Gorohovski
A
,
Vucenovic
D
, et al. 
ChiTaRS 2.1--an improved database of the chimeric transcripts and RNA-seq data with novel sense-antisense chimeric RNA transcripts
.
Nucleic Acids Res
2015
;
43
:
D68
75
.

49.

Wang
Y
,
Wu
N
,
Liu
J
, et al. 
FusionCancer: a database of cancer fusion genes derived from RNA-seq data
.
Diagn Pathol
2015
;
10
:
131
.

50.

Yoshihara
K
,
Wang
Q
,
Torres-Garcia
W
, et al. 
The landscape and therapeutic relevance of cancer-associated transcript fusions
.
Oncogene
2015
;
34
:
4845
54
.

51.

Korla
PK
,
Cheng
J
,
Huang
CH
, et al. 
FARE-CAFE: a database of functional and regulatory elements of cancer-associated fusion events
.
Database (Oxford)
2015
;
2015
:bav086.

52.

Jiang
T
,
Shi
T
,
Zhang
H
, et al. 
Tumor neoantigens: from basic research to clinical applications
.
J Hematol Oncol
2019
;
12
:
93
.

53.

Kahles
A
,
Lehmann
KV
,
Toussaint
NC
, et al. 
Comprehensive analysis of alternative splicing across tumors from 8,705 patients
.
Cancer Cell
2018
;
34
:
211
24.e216
.

54.

Zhang
Z
,
Zhou
C
,
Tang
L
, et al. 
ASNEO: identification of personalized alternative splicing based neoantigens with RNA-seq
.
Aging (Albany NY)
2020
;
12
:
14633
48
.

55.

Zhang
J
,
Mardis
ER
,
Maher
CA
.
INTEGRATE-neo: a pipeline for personalized gene fusion neoantigen discovery
.
Bioinformatics
2017
;
33
:
555
7
.

56.

Wang
TY
,
Wang
L
,
Alam
SK
, et al. 
ScanNeo: identifying indel-derived neoantigens using RNA-Seq data
.
Bioinformatics
2019
;
35
:
4159
61
.

57.

de
Koning
AP
,
Gu
W
,
Castoe
TA
, et al. 
Repetitive elements may comprise over two-thirds of the human genome
.
PLoS Genet
2011
;
7
:e1002384.

58.

Andreassen
R
.
Alu elements in the human genome
.
Tidsskr Nor Laegeforen
2004
;
124
:
2345
9
.

59.

Smulders-Srinivasan
TK
,
Szakmary
A
,
Lin
H
.
A Drosophila chromatin factor interacts with the Piwi-interacting RNA mechanism in niche cells to regulate germline stem cell self-renewal
.
Genetics
2010
;
186
:
573
83
.

60.

Bennett
EA
,
Keller
H
,
Mills
RE
, et al. 
Active Alu retrotransposons in the human genome
.
Genome Res
2008
;
18
:
1875
83
.

61.

Deininger
PL
,
Batzer
MA
.
Mammalian retroelements
.
Genome Res
2002
;
12
:
1455
65
.

62.

Zhou
W
,
Liang
G
,
Molloy
PL
, et al. 
DNA methylation enables transposable element-driven genome expansion
.
Proc Natl Acad Sci U S A
2020
;
117
:
19359
66
.

63.

Burns
KH
.
Transposable elements in cancer
.
Nat Rev Cancer
2017
;
17
:
415
24
.

64.

Larsen
PA
,
Hunnicutt
KE
,
Larsen
RJ
, et al. 
Warning SINEs: Alu elements, evolution of the human brain, and the spectrum of neurological disease
.
Chromosome Res
2018
;
26
:
93
111
.

65.

Kahles
A
,
Behr
J
,
Rätsch
G
.
MMR: a tool for read multi-mapper resolution
.
Bioinformatics
2016
;
32
:
770
2
.

66.

Jeong
HH
,
Yalamanchili
HK
,
Guo
C
, et al. 
An ultra-fast and scalable quantification pipeline for transposable elements from next generation sequencing data
.
Pac Symp Biocomput
2018
;
23
:
168
79
.

67.

Carnevali
D
,
Conti
A
,
Pellegrini
M
, et al. 
Whole-genome expression analysis of mammalian-wide interspersed repeat elements in human cell lines
.
DNA Res
2017
;
24
:
59
69
.

68.

Tokuyama
M
,
Kong
Y
,
Song
E
, et al. 
ERVmap analysis reveals genome-wide transcription of human endogenous retroviruses
.
Proc Natl Acad Sci U S A
2018
;
115
:
12565
72
.

69.

Babaian
A
,
Thompson
IR
,
Lever
J
, et al. 
LIONS: analysis suite for detecting and quantifying transposable element initiated transcription from RNA-seq
.
Bioinformatics
2019
;
35
:
3839
41
.

70.

O'Neill
K
,
Brocks
D
,
Hammell
MG
.
Mobile genomics: tools and techniques for tackling transposons
.
Philos Trans R Soc Lond B Biol Sci
2020
;
375
:20190345.

71.

Lerat
E
,
Casacuberta
J
,
Chaparro
C
, et al. 
On the importance to acknowledge transposable elements in epigenomic analyses
.
Genes (Basel)
2019
;
10
:
258
.

72.

Lanciano
S
,
Cristofari
G
.
Measuring and interpreting transposable element expression
.
Nat Rev Genet
2020
;
21
:
721
36
.

73.

Teissandier
A
,
Servant
N
,
Barillot
E
, et al. 
Tools and best practices for retrotransposon analysis using high-throughput sequencing data
.
Mob DNA
2019
;
10
:
52
.

74.

Hadfield
J
,
Eldridge
MD
.
Multi-genome alignment for quality control and contamination screening of next-generation sequencing data
.
Front Genet
2014
;
5
:
31
.

75.

Strong
MJ
,
Xu
G
,
Morici
L
, et al. 
Microbial contamination in next generation sequencing: implications for sequence-based analysis of clinical samples
.
PLoS Pathog
2014
;
10
:e1004437.

76.

Khan
A
,
Liu
Q
,
Chen
X
, et al. 
VirTect: a computational method for detecting virus species from RNA-Seq and its application in head and neck squamous cell carcinoma
.
bioRxiv
2018
;272278.

77.

Yasumizu
Y
,
Hara
A
,
Sakaguchi
S
, et al. 
VIRTUS: a pipeline for comprehensive virus analysis from conventional RNA-seq data
.
Bioinformatics
2021
;
37
:
1465
7
.

78.

Bhuvaneshwar
K
,
Song
L
,
Madhavan
S
, et al. 
viGEN: an open source pipeline for the detection and quantification of viral RNA in human tumors
.
Front Microbiol
2018
;
9
:
1172
.

79.

Schirmer
M
,
Franzosa
EA
,
Lloyd-Price
J
, et al. 
Dynamics of metatranscription in the inflammatory bowel disease gut microbiome
.
Nat Microbiol
2018
;
3
:
337
46
.

80.

Granata
I
,
Nardelli
C
,
D’Argenio
V
, et al. 
Duodenal metatranscriptomics to define human and microbial functional alterations associated with severe obesity: a pilot study
.
Microorganisms
2020
;
8
:
1811
.

81.

Heravi
FS
,
Zakrzewski
M
,
Vickery
K
, et al. 
Metatranscriptomic analysis reveals active bacterial communities in diabetic foot infections
.
Front Microbiol
2020
;
11
:
1688
.

82.

Youngblut, Nicholas D., Georg H. Reischer, William Walters, Nathalie Schuster, Chris Walzer, Gabrielle Stalder, Ruth E. Ley, and Andreas H. Farnleitner.

Host diet and evolutionary history explain different aspects of gut microbiome diversity among vertebrate clades
.
Nature communications
2019
;
10
:
1
15
.

83.

Peters
BA
,
Wilson
M
,
Moran
U
, et al. 
Relating the gut metagenome and metatranscriptome to immunotherapy responses in melanoma patients
.
Genome Med
2019
;
11
:
61
.

84.

Campos
GS
,
Sardi
SI
,
Falcao
MB
, et al. 
Ion torrent-based nasopharyngeal swab metatranscriptomics in COVID-19
.
J Virol Methods
2020
;
282
:113888.

85.

Kolodny
O
,
Callahan
BJ
,
Douglas
AE
.
The role of the microbiome in host evolution
.
Philos Trans R Soc Lond B Biol Sci
2020
;
375
:20190588.

86.

Groussin
M
,
Mazel
F
,
Alm
EJ
.
Co-evolution and co-speciation of host-gut bacteria systems
.
Cell Host Microbe
2020
;
28
:
12
22
.

87.

Shakya
M
,
Lo
CC
,
Chain
PSG
.
Advances and challenges in metatranscriptomic analysis
.
Front Genet
2019
;
10
:
904
.

88.

Franzosa
EA
,
McIver
LJ
,
Rahnavard
G
, et al. 
Species-level functional profiling of metagenomes and metatranscriptomes
.
Nat Methods
2018
;
15
:
962
8
.

89.

Martinez
X
,
Pozuelo
M
,
Pascal
V
, et al. 
MetaTrans: an open-source pipeline for metatranscriptomics
.
Sci Rep
2016
;
6
:
26447
.

90.

Westreich
ST
,
Treiber
ML
,
Mills
DA
, et al. 
SAMSA2: a standalone metatranscriptome analysis pipeline
.
BMC Bioinformatics
2018
;
19
:
175
.

91.

Tamames
J
,
Puente-Sánchez
F
.
SqueezeMeta, a highly portable, fully automatic metagenomic analysis pipeline
.
Front Microbiol
2018
;
9
:
3349
.

92.

Narayanasamy
S
,
Jarosz
Y
,
Muller
EE
, et al. 
IMP: a pipeline for reproducible reference-independent integrated metagenomic and metatranscriptomic analyses
.
Genome Biol
2016
;
17
:
260
.

93.

Sequeira
JC
,
Rocha
M
,
Alves
MM
, et al.  MOSCA: an automated pipeline for integrated metagenomics and metatranscriptomics data analysis. In:
International Conference on Practical Applications of Computational Biology & Bioinformatics
. Cham:
Springer
,
2018
,
183
91
.

94.

Mehta
S
,
Crane
M
,
Leith
E
, et al. 
ASaiM-MT: a validated and optimized ASaiM workflow for metatranscriptomics analysis within Galaxy framework
.
F1000Research
2021
;
10
:
103
.

95.

Yang
WR
,
Ardeljan
D
,
Pacyna
CN
, et al. 
SQuIRE reveals locus-specific regulation of interspersed repeat expression
.
Nucleic Acids Res
2019
;
47
:
e27
7
.

96.

McKerrow
W
,
Fenyö
D
.
L1EM: a tool for accurate locus specific LINE-1 RNA quantification
.
Bioinformatics
2020
;
36
:
1167
73
.

97.

Criscione
SW
,
Zhang
Y
,
Thompson
W
, et al. 
Transcriptional landscape of repetitive elements in normal and cancer human cells
.
BMC Genomics
2014
;
15
:
1
7
.

98.

Lerat
E
,
Fablet
M
,
Modolo
L
, et al. 
TEtools facilitates big data expression analysis of transposable elements and reveals an antagonism between their activity and that of piRNA genes
.
Nucleic Acids Res
2017
;
45
:
e17
7
.

99.

Jin
Y
,
Tam
OH
,
Paniagua
E
, et al. 
TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets
.
Bioinformatics
2015
;
31
:
3593
9
.

100.

Hackl
H
,
Charoentong
P
,
Finotello
F
, et al. 
Computational genomics tools for dissecting tumour-immune cell interactions
.
Nat Rev Genet
2016
;
17
:
441
58
.

101.

Newman
AM
,
Steen
CB
,
Liu
CL
, et al. 
Determining cell type abundance and expression from bulk tissues with digital cytometry
.
Nat Biotechnol
2019
;
37
:
773
82
.

102.

Stobbe
MD
,
Thun
GA
,
Diéguez-Docampo
A
, et al. 
Recurrent somatic mutations reveal new insights into consequences of mutagenic processes in cancer
.
PLoS Comput Biol
2019
;
15
:e1007496.

103.

Ramroop
JR
,
Gerber
MM
,
Toland
AE
.
Germline variants impact somatic events during tumorigenesis
.
Trends Genet
2019
;
35
:
515
26
.

104.

Coudray
A
,
Battenhouse
AM
,
Bucher
P
, et al. 
Detection and benchmarking of somatic mutations in cancer genomes using RNA-seq data
.
PeerJ
2018
;
6
:e5362.

105.

Jin
J
,
Wu
X
,
Yin
J
, et al. 
Identification of genetic mutations in cancer: challenge and opportunity in the new era of targeted therapy
.
Front Oncol
2019
;
9
:
263
.

106.

Bosio
M
,
Valencia
A
,
Capella-Gutierrez
S
.
SmartRNASeqCaller: improving germline variant calling from RNAseq
.
bioRxiv
2019
;684993.

107.

Wang
K
,
Li
M
,
Hakonarson
H
.
ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data
.
Nucleic Acids Res
2010
;
38
:
e164
.

108.

McLaren
W
,
Gil
L
,
Hunt
SE
, et al. 
The Ensembl variant effect predictor
.
Genome Biol
2016
;
17
:
122
.

109.

Pagel, K.A., Kim, R., Moad, K., Busby, B., Zheng, L., Tokheim, C., Ryan, M. and Karchin, R.

Integrated informatics analysis of cancer-related variants
.
JCO clinical cancer informatics
2020
;
4
:
310
317
.

110.

Radenbaugh
AJ
,
Ma
S
,
Ewing
A
, et al. 
RADIA: RNA and DNA integrated analysis for somatic mutation detection
.
PLoS One
2014
;
9
:e111516.

111.

Lawrence
MS
,
Stojanov
P
,
Polak
P
, et al. 
Mutational heterogeneity in cancer and the search for new cancer-associated genes
.
Nature
2013
;
499
:
214
8
.

112.

Adetunji
MO
,
Lamont
SJ
,
Abasht
B
, et al. 
Variant analysis pipeline for accurate detection of genomic variants from transcriptome sequencing data
.
PLoS One
2019
;
14
:e0216838.

113.

Mularoni
L
,
Sabarinathan
R
,
Deu-Pons
J
, et al. 
OncodriveFML: a general framework to identify coding and non-coding regions with cancer driver mutations
.
Genome Biol
2016
;
17
:
128
.

114.

Wainberg
M
,
Sinnott-Armstrong
N
,
Mancuso
N
, et al. 
Opportunities and challenges for transcriptome-wide association studies
.
Nat Genet
2019
;
51
:
592
9
.

115.

Zhu
H
,
Zhou
X
.
Transcriptome-wide association studies: a view from Mendelian randomization
.
Quant Biol
2020
;
17
:
1
5
. .

116.

Veturi
Y
,
Ritchie
MD
.
How powerful are summary-based methods for identifying expression-trait associations under different genetic architectures?
Pac Symp Biocomput
2018
;
23
:
228
39
.

117.

Gamazon
ER
,
Wheeler
HE
,
Shah
KP
, et al. 
A gene-based association method for mapping traits using reference transcriptome data
.
Nat Genet
2015
;
47
:
1091
8
.

118.

Qi
J
,
Asl
HF
,
Björkegren
J
, et al. 
kruX: matrix-based non-parametric eQTL discovery
.
BMC Bioinformatics
2014
;
15
:
1
7
.

119.

Shabalin
AA
.
Matrix eQTL: ultra fast eQTL analysis via large matrix operations
.
Bioinformatics
2012
;
28
:
1353
8
.

120.

Yang
T-P
,
Beazley
C
,
Montgomery
SB
, et al. 
Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies
.
Bioinformatics
2010
;
26
:
2474
6
.

121.

Gao
C
,
Tignor
NL
,
Salit
J
, et al. 
HEFT: eQTL analysis of many thousands of expressed genes while simultaneously controlling for hidden factors
.
Bioinformatics
2014
;
30
:
369
76
.

122.

Marchini
J
,
Howie
B
,
Myers
S
, et al. 
A new multipoint method for genome-wide association studies by imputation of genotypes
.
Nat Genet
2007
;
39
:
906
13
.

123.

Broman
KW
,
Wu
H
,
Sen
Ś
, et al. 
R/QTL: QTL mapping in experimental crosses
.
Bioinformatics
2003
;
19
:
889
90
.

124.

He
X
,
Fuller
CK
,
Song
Y
, et al. 
Sherlock: detecting gene-disease associations by matching patterns of expression QTL and GWAS
.
Am J Hum Genet
2013
;
92
:
667
80
.

125.

Flutre
T
,
Wen
X
,
Pritchard
J
, et al. 
A statistical framework for joint eQTL analysis in multiple tissues
.
PLoS Genet
2013
;
9
:e1003486.

126.

Imprialou
M
,
Petretto
E
,
Bottolo
L
.
Expression QTLs mapping and analysis: a Bayesian perspective
.
Systems Genetics
2017
;
1488
:
189
215
.

127.

Gusev
A
,
Ko
A
,
Shi
H
, et al. 
Integrative approaches for large-scale transcriptome-wide association studies
.
Nat Genet
2016
;
48
:
245
52
.

128.

Zeng
P
,
Zhou
X
.
Non-parametric genetic prediction of complex traits with latent Dirichlet process regression models
.
Nat Commun
2017
;
8
:
456
.

129.

Zhu
Z
,
Zhang
F
,
Hu
H
, et al. 
Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets
.
Nat Genet
2016
;
48
:
481
7
.

130.

Zhu
Z
,
Zheng
Z
,
Zhang
F
, et al. 
Causal associations between risk factors and common diseases inferred from GWAS summary data
.
Nat Commun
2018
;
9
:
224
.

131.

Wei
IH
,
Shi
Y
,
Jiang
H
, et al. 
RNA-Seq accurately identifies cancer biomarker signatures to distinguish tissue of origin
.
Neoplasia
2014
;
16
:
918
27
.

132.

Zhang
Y
,
Parmigiani
G
,
Johnson
WE
.
ComBat-seq: batch effect adjustment for RNA-seq count data
.
NAR Genom Bioinform
2020
;
2
:lqaa078.

133.

Risso
D
,
Ngai
J
,
Speed
TP
, et al. 
Normalization of RNA-seq data using factor analysis of control genes or samples
.
Nat Biotechnol
2014
;
32
:
896
902
.

134.

Love
MI
,
Huber
W
,
Anders
S
.
Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
.
Genome Biol
2014
;
15
:
550
.

135.

Robinson
MD
,
McCarthy
DJ
,
Smyth
GK
.
edgeR: a bioconductor package for differential expression analysis of digital gene expression data
.
Bioinformatics
2010
;
26
:
139
40
.

136.

Leng
N
,
Dawson
JA
,
Thomson
JA
, et al. 
EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments
.
Bioinformatics
2013
;
29
:
1035
43
.

137.

Frazee AC, Pertea G, Jaffe AE, Langmead B, Salzberg SL, Leek JT.

Ballgown bridges the gap between transcriptome assembly and expression analysis
.
Nature biotechnology
2015
;
33
:
243
6
.

138.

Tarazona
S
,
García
F
,
Ferrer
A
, et al. 
NOIseq: a RNA-seq differential expression method robust for sequencing depth biases
.
EMBnet Journal
2011
;
17
:
18
9
.

139.

Spies
D
,
Renz
PF
,
Beyer
TA
, et al. 
Comparative analysis of differential gene expression tools for RNA sequencing time course data
.
Brief Bioinform
2019
;
20
:
288
98
.

140.

Koch
CM
,
Chiu
SF
,
Akbarpour
M
, et al. 
A Beginner’s guide to analysis of RNA sequencing data
.
Am J Respir Cell Mol Biol
2018
;
59
:
145
57
.

141.

Wittkopp
PJ
,
Haerum
BK
,
Clark
AG
.
Evolutionary changes in cis and trans gene regulation
.
Nature
2004
;
430
:
85
8
.

142.

Nagalakshmi
U
,
Wang
Z
,
Waern
K
, et al. 
The transcriptional landscape of the yeast genome defined by RNA sequencing
.
Science
2008
;
320
:
1344
9
.

143.

Lister
R
,
O'Malley
RC
,
Tonti-Filippini
J
, et al. 
Highly integrated single-base resolution maps of the epigenome in Arabidopsis
.
Cell
2008
;
133
:
523
36
.

144.

Degner
JF
,
Marioni
JC
,
Pai
AA
, et al. 
Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data
.
Bioinformatics
2009
;
25
:
3207
12
.

145.

Lalonde
E
,
Ha
KC
,
Wang
Z
, et al. 
RNA sequencing reveals the role of splicing polymorphisms in regulating human gene expression
.
Genome Res
2011
;
21
:
545
54
.

146.

Stevenson
KR
,
Coolon
JD
,
Wittkopp
PJ
.
Sources of bias in measures of allele-specific expression derived from RNA-sequence data aligned to a single reference genome
.
BMC Genomics
2013
;
14
:
536
.

147.

Castel
SE
,
Levy-Moonshine
A
,
Mohammadi
P
, et al. 
Tools and best practices for data processing in allelic expression analysis
.
Genome Biol
2015
;
16
:
195
.

148.

Pickrell
JK
,
Marioni
JC
,
Pai
AA
, et al. 
Understanding mechanisms underlying human gene expression variation with RNA sequencing
.
Nature
2010
;
464
:
768
72
.

149.

Coolon
JD
,
Stevenson
KR
,
McManus
CJ
, et al. 
Genomic imprinting absent in Drosophila melanogaster adult females
.
Cell Rep
2012
;
2
:
69
75
.

150.

Edsgärd
D
,
Iglesias
MJ
,
Reilly
S-J
, et al. 
GeneiASE: detection of condition-dependent and static allele-specific expression from RNA-seq data without haplotype information
.
Sci Rep
2016
;
6
:
21134
.

151.

Miao
Z
,
Alvarez
M
,
Pajukanta
P
, et al. 
ASElux: an ultra-fast and accurate allelic reads counter
.
Bioinformatics
2018
;
34
:
1313
20
.

152.

Fan
J
,
Hu
J
,
Xue
C
, et al. 
ASEP: gene-based detection of allele-specific expression across individuals in a population by RNA sequencing
.
PLoS Genet
2020
;
16
:e1008786.

153.

Edsgärd
D
,
Iglesias
MJ
,
Reilly
S-J
, et al. 
GeneiASE: detection of condition-dependent and static allele-specific expression from RNA-seq data without haplotype information
.
Sci Rep
2016
;
6
:
1
13
.

154.

Harvey
CT
,
Moyerbrailean
GA
,
Davis
GO
, et al. 
QuASAR: quantitative allele-specific analysis of reads
.
Bioinformatics
2015
;
31
:
1235
42
.

155.

Mayba
O
,
Gilbert
HN
,
Liu
J
, et al. 
MBASED: allele-specific expression detection in cancer tissues and cell lines
.
Genome Biol
2014
;
15
:
1
21
.

156.

Costa-Silva
J
,
Domingues
D
,
Lopes
FM
.
RNA-Seq differential expression analysis: an extended review and a software tool
.
PLoS One
2017
;
12
:e0190152.

157.

Pan
Q
,
Shai
O
,
Lee
LJ
, et al. 
Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing
.
Nat Genet
2008
;
40
:
1413
5
.

158.

van den
Hoogenhof
MM
,
Pinto
YM
,
Creemers
EE
.
RNA splicing: regulation and dysregulation in the heart
.
Circ Res
2016
;
118
:
454
68
.

159.

Wang
Y
,
Liu
J
,
Huang
BO
, et al. 
Mechanism of alternative splicing and its regulation
.
Biomed Rep
2015
;
3
:
152
8
.

160.

Cheng
Z
,
Menees
TM
.
RNA splicing and debranching viewed through analysis of RNA lariats
.
Mol Genet Genomics
2011
;
286
:
395
410
.

161.

Cao
M
,
Zhou
W
,
Breidt
FJ
, et al. 
Large scale maximum average power multiple inference on time-course count data with application to RNA-seq analysis
.
Biometrics
2020
;
76
:
9
22
.

162.

Qiu
Z
,
Chen
S
,
Qi
Y
, et al. 
Exploring transcriptional switches from pairwise, temporal and population RNA-Seq data using deepTS
.
Brief Bioinform
2021
;
22
:bbaa137.

163.

Oh
S
,
Li
C
,
Baldwin
RL
, et al. 
Temporal dynamics in meta longitudinal RNA-Seq data
.
Sci Rep
2019
;
9
:
1
1
.

164.

Fischer
DS
,
Theis
FJ
,
Yosef
N
.
Impulse model-based differential expression analysis of time course sequencing data
.
Nucleic Acids Res
201846
:
e119–e119
.

165.

Bacher
R
,
Leng
N
,
Chu
LF
, et al. 
Trendy: segmented regression analysis of expression dynamics in high-throughput ordered profiling experiments
.
BMC bioinformatics
2018
;
19
:
1
0
.

166.

Guo
W
,
Calixto
CPG
,
Brown
JWS
, et al. 
TSIS: an R package to infer alternative splicing isoform switches for time-series data
.
Bioinformatics
2017
;
33
:
3308
10
.

167.

Sun
X
,
Dalpiaz
D
,
Wu
D
, et al. 
Statistical inference for time course RNA-Seq data using a negative binomial mixed-effect model
.
BMC Bioinformatics
2016
;
17
:
1
3
.

168.

Leng
N
,
Li
Y
,
McIntosh
BE
, et al. 
EBSeq-HMM: a Bayesian approach for identifying gene-expression changes in ordered RNA-seq experiments
.
Bioinformatics
2015
;
31
:
2614
22
.

169.

Heinonen
M
,
Guipaud
O
,
Milliat
F
, et al. 
Detecting time periods of differential gene expression using Gaussian processes: an application to endothelial cells exposed to radiotherapy dose fraction
.
Bioinformatics
2015
;
31
:
728
35
.

170.

Nueda
MJ
,
Tarazona
S
,
Conesa
A
.
Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series
.
Bioinformatics
2014
;
30
:
2598
2602
.

171.

Tilgner
H
,
Knowles
DG
,
Johnson
R
, et al. 
Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs
.
Genome Res
2012
;
22
:
1616
25
.

172.

Kim
E
,
Goren
A
,
Ast
G
.
Alternative splicing and disease
.
RNA Biol
2008
;
5
:
17
9
.

173.

Vitting-Seerup
K
,
Porse
BT
,
Sandelin
A
, et al. 
spliceR: an R package for classification of alternative splicing and prediction of coding potential from RNA-seq data
.
BMC Bioinformatics
2014
;
15
:
81
.

174.

Ryan
MC
,
Cleland
J
,
Kim
R
, et al. 
SpliceSeq: a resource for analysis and visualization of RNA-Seq data on alternative splicing and its functional impacts
.
Bioinformatics
2012
;
28
:
2385
7
.

175.

Liu
Q
,
Chen
C
,
Shen
E
, et al. 
Detection, annotation and visualization of alternative splicing from RNA-Seq data with SplicingViewer
.
Genomics
2012
;
99
:
178
82
.

176.

Foissac
S
,
Sammeth
M
.
Analysis of alternative splicing events in custom gene datasets by AStalavista
.
Methods Mol Biol
2015
;
1269
:
379
92
.

177.

Foissac
S
,
Sammeth
M
.
ASTALAVISTA: dynamic and flexible analysis of alternative splicing events in custom gene datasets
.
Nucleic Acids Res
2007
;
35
:
W297
9
.

178.

Wang
J
,
Ye
Z
,
Huang
TH
, et al. 
A survey of computational methods in transcriptome-wide alternative splicing analysis
.
Biomol Concepts
2015
;
6
:
59
66
.

179.

Hu
Y
,
Huang
Y
,
Du
Y
, et al. 
DiffSplice: the genome-wide detection of differential splicing events with RNA-seq
.
Nucleic Acids Res
2013
;
41
:
e39
.

180.

Wang
W
,
Qin
Z
,
Feng
Z
, et al. 
Identifying differentially spliced genes from two groups of RNA-seq samples
.
Gene
2013
;
518
:
164
70
.

181.

Zhao
K
,
Lu
ZX
,
Park
JW
, et al. 
GLiMMPS: robust statistical model for regulatory variation of alternative splicing using RNA-seq data
.
Genome Biol
2013
;
14
:
R74
.

182.

Srivastava
S
,
Chen
L
.
A two-parameter generalized Poisson model to improve the analysis of RNA-seq data
.
Nucleic Acids Res
2010
;
38
:
e170
.

183.

Shen
S
,
Park
JW
,
Huang
J
, et al. 
MATS: a Bayesian framework for flexible detection of differential alternative splicing from RNA-Seq data
.
Nucleic Acids Res
2012
;
40
:
e61
.

184.

Shen
S
,
Park
JW
,
Lu
ZX
, et al. 
rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-Seq data
.
Proc Natl Acad Sci U S A
2014
;
111
:
E5593
601
.

185.

Katz
Y
,
Wang
ET
,
Airoldi
EM
, et al. 
Analysis and design of RNA sequencing experiments for identifying isoform regulation
.
Nat Methods
2010
;
7
:
1009
15
.

186.

Brooks
AN
,
Yang
L
,
Duff
MO
, et al. 
Conservation of an RNA regulatory map between Drosophila and mammals
.
Genome Res
2011
;
21
:
193
202
.

187.

Gatto
A
,
Torroja-Fungairino
C
,
Mazzarotto
F
, et al. 
FineSplice, enhanced splice junction detection and quantification: a novel pipeline based on the assessment of diverse RNA-Seq alignment solutions
.
Nucleic Acids Res
2014
;
42
:
e71
.

188.

Wu
J
,
Akerman
M
,
Sun
S
, et al. 
SpliceTrap: a method to quantify alternative splicing under single cellular conditions
.
Bioinformatics
2011
;
27
:
3010
6
.

189.

Estefania
M
,
Andres
R
,
Javier
I
, et al. 
ASpli: integrative analysis of splicing landscapes through RNA-Seq assays
.
Bioinformatics
2021
.

190.

Merino GA, Conesa A, Fernández EA.

A benchmarking of workflows for detecting differential splicing and differential expression at isoform level in human RNA-seq studies
.
Briefings in bioinformatics
2019
;
20
:
471
81
.

191.

Thakur
PK
,
Rawal
HC
,
Obuca
M
, et al. 
Bioinformatics approaches for studying alternative splicing
.
Encyclopedia Bioinform Comput Biol
2019
;
2
:
221
34
.

192.

Mehmood
A
,
Laiho
A
,
Venalainen
MS
, et al. 
Systematic evaluation of differential splicing tools for RNA-seq studies
.
Brief Bioinform
2020
;
21
:
2052
65
.

193.

Li
T
,
Fu
J
,
Zeng
Z
, et al. 
TIMER2.0 for analysis of tumor-infiltrating immune cells
.
Nucleic Acids Res
2020
;
48
:
W509
14
.

194.

Finotello
F
,
Mayer
C
,
Plattner
C
et al. 
Molecular and pharmacological modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data
,
Genome Med
2019
;
11
:1–
20

195.

Zaitsev
K
,
Bambouskova
M
,
Swain
A
, et al. 
Complete deconvolution of cellular mixtures based on linearity of transcriptional signatures
.
Nat Commun
2019
;
10
:
1
6
.

196.

Du
R
,
Carey
V
,
Weiss
ST
.
deconvSeq: deconvolution of cell mixture distribution in sequencing data
.
Bioinformatics
2019
;
35
:
5095
102
.

197.

Kang
K
,
Meng
Q
,
Shats
I
, et al. 
CDSeq: A novel complete deconvolution method for dissecting heterogeneous samples using gene expression data
.
PLoS Comput Biol
2019
;
15
:e1007510.

198.

Hunt
GJ
,
Freytag
S
,
Bahlo
M
, et al. 
dtangle: accurate and robust cell type deconvolution
.
Bioinformatics
2019
;
35
:
2093
9
.

199.

Nadel
BB
,
Lopez
D
,
Montoya
DJ
, et al. 
The Gene Expression Deconvolution Interactive Tool (GEDIT): accurate cell type quantification from gene expression data
.
GigaScience
2021
;
10
:giab002.

200.

Racle
J
,
de
Jonge
K
,
Baumgaertner
P
, et al. 
Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data
.
Elife
2017
;
6
:e26476.

201.

Roman
T
,
Xie
L
,
Schwartz
R
.
Automated deconvolution of structured mixtures from heterogeneous tumor genomic data
.
PLoS Comput Biol
2017
;
13
:e1005815.

202.

Zaslavsky
ME
,
Novik
J
,
Chang
E
, et al. 
Infino: a Bayesian hierarchical model improves estimates of immune infiltration into tumor microenvironment
bioRxiv
.
2017
;
221671
.

203.

Becht
E
,
Giraldo
NA
,
Lacroix
L
, et al. 
Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression
.
Genome Biol
2016
;
17
:
1
20
.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Supplementary data