A survey of best practices for RNA-seq data analysis.

PubWeight™: 2.37 | Rank: Top 2%

PMID: 26813401

Published in Genome Biol on January 26, 2016

Authors

Ana Conesa1,2, Pedro Madrigal3,4, Sonia Tarazona5,6, David Gomez-Cabrero7,8,9,10, Alejandra Cervera11, Andrew McPherson12, Michał Wojciech Szcześniak13, Daniel J Gaffney14, Laura L Elo15, Xuegong Zhang16,17, Ali Mortazavi18,19

Author Affiliations

1: Institute for Food and Agricultural Sciences, Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, 32603, USA. aconesa@ufl.edu.
2: Centro de Investigación Príncipe Felipe, Genomics of Gene Expression Laboratory, 46012, Valencia, Spain. aconesa@ufl.edu.
3: Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. pm12@sanger.ac.uk.
4: Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Anne McLaren Laboratory for Regenerative Medicine, Department of Surgery, University of Cambridge, Cambridge, CB2 0SZ, UK. pm12@sanger.ac.uk.
5: Centro de Investigación Príncipe Felipe, Genomics of Gene Expression Laboratory, 46012, Valencia, Spain.
6: Department of Applied Statistics, Operations Research and Quality, Universidad Politécnica de Valencia, 46020, Valencia, Spain.
7: Unit of Computational Medicine, Department of Medicine, Karolinska Institutet, Karolinska University Hospital, 171 77, Stockholm, Sweden.
8: Center for Molecular Medicine, Karolinska Institutet, 17177, Stockholm, Sweden.
9: Unit of Clinical Epidemiology, Department of Medicine, Karolinska University Hospital, L8, 17176, Stockholm, Sweden.
10: Science for Life Laboratory, 17121, Solna, Sweden.
11: Systems Biology Laboratory, Institute of Biomedicine and Genome-Scale Biology Research Program, University of Helsinki, 00014, Helsinki, Finland.
12: School of Computing Science, Simon Fraser University, Burnaby, V5A 1S6, BC, Canada.
13: Department of Bioinformatics, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University in Poznań, 61-614, Poznań, Poland.
14: Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
15: Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland.
16: Key Lab of Bioinformatics/Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing, 100084, China.
17: School of Life Sciences, Tsinghua University, Beijing, 100084, China.
18: Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, 92697-2300, USA. ali.mortazavi@uci.edu.
19: Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, 92697, USA. ali.mortazavi@uci.edu.

Articles citing this

A set of genes conserved in sequence and expression traces back the establishment of multicellularity in social amoebae. BMC Genomics (2016) 0.81

Erratum to: A survey of best practices for RNA-seq data analysis. Genome Biol (2016) 0.79

Robust detection of immune transcripts in FFPE samples using targeted RNA sequencing. Oncotarget (2016) 0.79

RNA sequencing analysis of the developing chicken retina. Sci Data (2016) 0.79

Large-Scale Profiling Reveals the Influence of Genetic Variation on Gene Expression in Human Induced Pluripotent Stem Cells. Cell Stem Cell (2017) 0.78

The Lair: a resource for exploratory analysis of published RNA-Seq data. BMC Bioinformatics (2016) 0.78

The state of play in higher eukaryote gene annotation. Nat Rev Genet (2016) 0.77

Reference standards for next-generation sequencing. Nat Rev Genet (2017) 0.77

Radiogenomic Analysis of Oncological Data: A Technical Survey. Int J Mol Sci (2017) 0.77

Multi-omics approaches to disease. Genome Biol (2017) 0.76

SCnorm: robust normalization of single-cell RNA-seq data. Nat Methods (2017) 0.76

Dietary restriction protects from age-associated DNA methylation and induces epigenetic reprogramming of lipid metabolism. Genome Biol (2017) 0.76

Assessing the Gene Content of the Megagenome: Sugar Pine (Pinus lambertiana). G3 (Bethesda) (2016) 0.76

Next generation sequencing technology and genomewide data analysis: Perspectives for retinal research. Prog Retin Eye Res (2016) 0.76

spongeScan: A web for detecting microRNA binding elements in lncRNA sequences. Nucleic Acids Res (2016) 0.76

MicroScope: ChIP-seq and RNA-seq software analysis suite for gene expression heatmaps. BMC Bioinformatics (2016) 0.76

Differential expression analysis for RNAseq using Poisson mixed models. Nucleic Acids Res (2017) 0.76

Integration of quantitated expression estimates from polyA-selected and rRNA-depleted RNA-seq libraries. BMC Bioinformatics (2017) 0.75

Transcriptomics technologies. PLoS Comput Biol (2017) 0.75

ATGC transcriptomics: a web-based application to integrate, explore and analyze de novo transcriptomic data. BMC Bioinformatics (2017) 0.75

Brain transcriptomes of harbor seals demonstrate gene expression patterns of animals undergoing a metabolic disease and a viral infection. PeerJ (2016) 0.75

phylo-node: A molecular phylogenetic toolkit using Node.js. PLoS One (2017) 0.75

The Evolutionary Relationship between Alternative Splicing and Gene Duplication. Front Genet (2017) 0.75

Reverse Transcription Errors and RNA-DNA Differences at Short Tandem Repeats. Mol Biol Evol (2016) 0.75

RNA-Seq for Gene Expression Profiling of Human Necrotizing Enterocolitis: a Pilot Study. J Korean Med Sci (2017) 0.75

Transcriptomic Analysis of Thermally Stressed Symbiodinium Reveals Differential Expression of Stress and Metabolism Genes. Front Plant Sci (2017) 0.75

Characterisation of the Whole Blood mRNA Transcriptome in Holstein-Friesian and Jersey Calves in Response to Gradual Weaning. PLoS One (2016) 0.75

Transcriptome analysis of Corynebacterium glutamicum in the process of recombinant protein expression in bioreactors. PLoS One (2017) 0.75

Antibody-independent mechanisms regulate the establishment of chronic Plasmodium infection. Nat Microbiol (2017) 0.75

Variation-preserving normalization unveils blind spots in gene expression profiling. Sci Rep (2017) 0.75

Analysis of Gene Expression in an Inbred Line of Soft-Shell Clams (Mya arenaria) Displaying Growth Heterosis: Regulation of Structural Genes and the NOD2 Pathway. Int J Genomics (2016) 0.75

OSG-GEM: Gene Expression Matrix Construction Using the Open Science Grid. Bioinform Biol Insights (2016) 0.75

Comprehensive evaluation of RNA-seq quantification methods for linearity. BMC Bioinformatics (2017) 0.75

Rewiring of the inferred protein interactome during blood development studied with the tool PPICompare. BMC Syst Biol (2017) 0.75

Large Differences in Gene Expression Responses to Drought and Heat Stress between Elite Barley Cultivar Scarlett and a Spanish Landrace. Front Plant Sci (2017) 0.75

IBTK Differently Modulates Gene Expression and RNA Splicing in HeLa and K562 Cells. Int J Mol Sci (2016) 0.75

Correspondence: Spontaneous secondary mutations confound analysis of the essential two-component system WalKR in Staphylococcus aureus. Nat Commun (2017) 0.75

PARRoT- a homology-based strategy to quantify and compare RNA-sequencing from non-model organisms. BMC Bioinformatics (2016) 0.75

Screening the Molecular Framework Underlying Local Dendritic mRNA Translation. Front Mol Neurosci (2017) 0.75

Testing Human Skin and Respiratory Sensitizers-What Is Good Enough? Int J Mol Sci (2017) 0.75

Transcriptomic changes in an animal-bacterial symbiosis under modeled microgravity conditions. Sci Rep (2017) 0.75

Resolving host-pathogen interactions by dual RNA-seq. PLoS Pathog (2017) 0.75

Ribosome RNA Profiling to Quantify Ovarian Development and Identify Sex in Fish. Sci Rep (2017) 0.75

An additional k-means clustering step improves the biological features of WGCNA gene co-expression networks. BMC Syst Biol (2017) 0.75

De novo assembly, annotation, and characterization of the whole brain transcriptome of male and female Syrian hamsters. Sci Rep (2017) 0.75

Transcriptional Responses in Root and Leaf of Prunus persica under Drought Stress Using RNA Sequencing. Front Plant Sci (2016) 0.75

Allele-specific expression in the human heart and its application to postoperative atrial fibrillation and myocardial ischemia. Genome Med (2016) 0.75

Vector Integration Sites Identification for Gene-Trap Screening in Mammalian Haploid Cells. Sci Rep (2017) 0.75

Comparative tissue transcriptomics highlights dynamic differences among tissues but conserved metabolic transcript prioritization in preparation for arousal from torpor. J Comp Physiol B (2017) 0.75

Transcriptomic differentiation underlying marine-to-freshwater transitions in the South American silversides Odontesthes argentinensis and O. bonariensis (Atheriniformes). Ecol Evol (2017) 0.75

Functional Changes in the Gut Microbiome Contribute to Transforming Growth Factor β-Deficient Colon Cancer. mSystems (2017) 0.75

Molecular dissection of transcriptional reprogramming of steviol glycosides synthesis in leaf tissue during developmental phase transitions in Stevia rebaudiana Bert. Sci Rep (2017) 0.75

A systems approach to a spatio-temporal understanding of the drought stress response in maize. Sci Rep (2017) 0.75

Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. BMC Bioinformatics (2017) 0.75

Global variation in gene expression and the value of diverse sampling. Curr Opin Syst Biol (2017) 0.75

Transcriptomic Analysis in Strawberry Fruits Reveals Active Auxin Biosynthesis and Signaling in the Ripe Receptacle. Front Plant Sci (2017) 0.75

SimBA: A methodology and tools for evaluating the performance of RNA-Seq bioinformatic pipelines. BMC Bioinformatics (2017) 0.75

Organoid and Organ-On-A-Chip Systems: New Paradigms for Modeling Neurological and Gastrointestinal Disease. Curr Stem Cell Rep (2017) 0.75

Impact of sequencing depth and read length on single cell RNA sequencing data of T cells. Sci Rep (2017) 0.75

Comparison of alternative approaches for analysing multi-level RNA-seq data. PLoS One (2017) 0.75

GeNNet: an integrated platform for unifying scientific workflows and graph databases for transcriptome data analysis. PeerJ (2017) 0.75

Maize RNA PolIV affects the expression of genes with nearby TE insertions and has a genome-wide repressive impact on transcription. BMC Plant Biol (2017) 0.75

SuperTranscripts: a data driven reference for analysis and visualisation of transcriptomes. Genome Biol (2017) 0.75

Identification of Genes Associated with Lemon Floral Transition and Flower Development during Floral Inductive Water Deficits: A Hypothetical Model. Front Plant Sci (2017) 0.75

Carbohydrate-active enzymes in Trichoderma harzianum: a bioinformatic analysis bioprospecting for key enzymes for the biofuels industry. BMC Genomics (2017) 0.75

Copper homeostasis networks in the bacterium Pseudomonas aeruginosa. J Biol Chem (2017) 0.75

Articles cited by this

(truncated to the top 100)

Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet (2000) 336.52

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol (2009) 235.12

Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (2009) 190.94

The human genome browser at UCSC. Genome Res (2002) 168.23

Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc (2009) 137.99

Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods (2008) 126.81

Statistical significance for genomewide studies. Proc Natl Acad Sci U S A (2003) 88.64

Fast gapped-read alignment with Bowtie 2. Nat Methods (2012) 83.79

TopHat: discovering splice junctions with RNA-Seq. Bioinformatics (2009) 81.13

Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol (2010) 75.21

edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics (2009) 67.17

Differential expression analysis for sequence count data. Genome Biol (2010) 64.56

Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res (2008) 54.83

Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol (2011) 53.86

Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics (2005) 46.40

Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics (2014) 44.23

Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol (2010) 39.63

miRBase: tools for microRNA genomics. Nucleic Acids Res (2007) 38.61

Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc (2012) 35.75

TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol (2013) 32.42

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol (2014) 27.48

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics (2011) 25.76

STAR: ultrafast universal RNA-seq aligner. Bioinformatics (2012) 25.21

MicroRNA targets in Drosophila. Genome Biol (2003) 23.59

Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform (2012) 23.58

HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics (2014) 23.22

Moderated statistical tests for assessing differences in tag abundance. Bioinformatics (2007) 22.87

Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics (2006) 22.61

Pfam: the protein families database. Nucleic Acids Res (2013) 22.48

A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol (2010) 22.10

Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics (2010) 19.86

Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics (2010) 17.53

Genome-wide associations of gene expression variation in humans. PLoS Genet (2005) 17.27

Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature (2010) 16.86

Transcriptome genetics using second generation sequencing in a Caucasian population. Nature (2010) 14.85

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics (2009) 14.83

Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma. Nature (2007) 14.43

Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat Biotechnol (2012) 14.01

InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res (2011) 13.45

miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res (2013) 13.41

De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc (2013) 13.33

Rfam: updates to the RNA families database. Nucleic Acids Res (2008) 11.61

The Genotype-Tissue Expression (GTEx) project. Nat Genet (2013) 10.77

Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biol (2011) 10.63

Integrative analysis of 111 reference human epigenomes. Nature (2015) 10.32

Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods (2010) 10.13

featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics (2013) 9.98

MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res (2010) 9.80

Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics (2012) 9.68

Transcriptome analysis by strand-specific sequencing of complementary DNA. Nucleic Acids Res (2009) 9.40

Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biol (2010) 9.15

Comprehensive comparative analysis of strand-specific RNA sequencing methods. Nat Methods (2010) 9.09

Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Res (2010) 9.08

Transcriptome and genome sequencing uncovers functional variation in humans. Nature (2013) 8.89

Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nat Rev Genet (2012) 8.55

Transcript length bias in RNA-seq data confounds systems biology. Biol Direct (2009) 8.53

miRWalk--database: prediction of possible miRNA binding sites by "walking" the genes of three genomes. J Biomed Inform (2011) 8.50

voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol (2014) 8.13

Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics (2011) 8.05

baySeq: empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformatics (2010) 8.01

Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods (2011) 7.52

Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nat Biotechnol (2012) 7.37

Differential expression in RNA-seq: a matter of depth. Genome Res (2011) 7.13

Savant: genome browser for high-throughput sequencing data. Bioinformatics (2010) 6.51

Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell (2015) 6.38

Detecting differential usage of exons from RNA-seq data. Genome Res (2012) 6.34

Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods (2015) 6.11

Epigenome-wide association data implicate DNA methylation as an intermediary of genetic risk in rheumatoid arthritis. Nat Biotechnol (2013) 5.89

Revealing the architecture of gene regulation: the promise of eQTL studies. Trends Genet (2008) 5.78

Swiss-Prot: juggling between evolution and stability. Brief Bioinform (2004) 5.73

Chimeric transcript discovery by paired-end transcriptome sequencing. Proc Natl Acad Sci U S A (2009) 5.30

Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science (2014) 4.91

EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics (2013) 4.79

Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data. Genome Biol (2013) 4.67

deFuse: an algorithm for gene fusion discovery in tumor RNA-Seq data. PLoS Comput Biol (2011) 4.66

Streaming fragment assignment for real-time analysis of sequencing experiments. Nat Methods (2012) 4.43

Global signatures of protein and mRNA expression levels. Mol Biosyst (2009) 4.43

A comparison of methods for differential expression analysis of RNA-seq data. BMC Bioinformatics (2013) 4.36

RSeQC: quality control of RNA-seq experiments. Bioinformatics (2012) 4.31

Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics (2012) 4.28

Counting absolute numbers of molecules using unique molecular identifiers. Nat Methods (2011) 4.18

AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res (2006) 4.11

The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol (2014) 4.06

A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat Biotechnol (2014) 3.99

The GEM mapper: fast, accurate and versatile alignment by filtration. Nat Methods (2012) 3.89

GC-content normalization for RNA-Seq data. BMC Bioinformatics (2011) 3.89

Qualimap: evaluating next-generation sequencing alignment data. Bioinformatics (2012) 3.62

Accounting for technical noise in single-cell RNA-seq experiments. Nat Methods (2013) 3.61

Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat Methods (2013) 3.59

Synthetic spike-in standards for RNA-seq experiments. Genome Res (2011) 3.58

Statistical design and analysis of RNA sequencing data. Genetics (2010) 3.56

Sequencing depth and coverage: key considerations in genomic analyses. Nat Rev Genet (2014) 3.52

Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. Stat Methods Med Res (2011) 3.44

Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex. Nat Biotechnol (2014) 3.43

Babelomics: an integrative platform for the analysis of transcriptomics, proteomics and genomic data with advanced functional profiling. Nucleic Acids Res (2010) 3.36

Quantitative single-cell RNA-seq with unique molecular identifiers. Nat Methods (2013) 3.31

Sequencing technology does not eliminate biological variability. Nat Biotechnol (2011) 3.20

Assessment of transcript reconstruction methods for RNA-seq. Nat Methods (2013) 3.11

How does multiple testing correction work? Nat Biotechnol (2009) 2.92

Systematic evaluation of spliced alignment programs for RNA-seq data. Nat Methods (2013) 2.92

Articles by these authors

Erratum to: A survey of best practices for RNA-seq data analysis. Genome Biol (2016) 0.79