Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments.

PubWeight™: 19.86 | Rank: Top 0.1% | All-Time Top 10000

View Article (PMC 2838869)

Published in BMC Bioinformatics on February 18, 2010

Authors

James H Bullard (1), Elizabeth Purdom, Kasper D Hansen, Sandrine Dudoit

Author Affiliations

1: Division of Biostatistics, University of California, Berkeley, Berkeley, CA, USA. bullard@berkeley.edu

Associated clinical trials

Genetic and Other Aspects of Podoconiosis | NCT01939431

Articles citing this article

(truncated to the top 100)

Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol (2010) 75.21

Differential expression analysis for sequence count data. Genome Biol (2010) 64.56

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics (2011) 25.76

De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc (2013) 13.33

Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biol (2011) 10.63

Cloud-scale RNA-sequencing differential expression analysis with Myrna. Genome Biol (2010) 10.05

Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Res (2010) 9.08

baySeq: empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformatics (2010) 8.01

Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res (2012) 7.52

RobiNA: a user-friendly, integrated software solution for RNA-Seq-based transcriptomics. Nucleic Acids Res (2012) 7.48

Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nat Biotechnol (2012) 7.37

Differential expression in RNA-seq: a matter of depth. Genome Res (2011) 7.13

Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM). Bioinformatics (2011) 6.20

From RNA-seq reads to differential expression results. Genome Biol (2010) 5.77

EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics (2013) 4.79

Waste not, want not: why rarefying microbiome data is inadmissible. PLoS Comput Biol (2014) 4.74

Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data. Genome Biol (2013) 4.67

Differential abundance analysis for microbial marker-gene surveys. Nat Methods (2013) 4.49

The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments. Nat Protoc (2012) 4.41

A comparison of methods for differential expression analysis of RNA-seq data. BMC Bioinformatics (2013) 4.36

Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nat Protoc (2013) 4.00

GC-content normalization for RNA-Seq data. BMC Bioinformatics (2011) 3.89

Removing technical variability in RNA-seq data using conditional quantile normalization. Biostatistics (2012) 3.74

Synthetic spike-in standards for RNA-seq experiments. Genome Res (2011) 3.58

Quantitative assessment of single-cell RNA-sequencing methods. Nat Methods (2013) 3.52

Sequencing depth and coverage: key considerations in genomic analyses. Nat Rev Genet (2014) 3.52

Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. Stat Methods Med Res (2011) 3.44

Sequencing technology does not eliminate biological variability. Nat Biotechnol (2011) 3.20

A two-parameter generalized Poisson model to improve the analysis of RNA-seq data. Nucleic Acids Res (2010) 3.16

ReCount: a multi-experiment resource of analysis-ready RNA-seq gene count datasets. BMC Bioinformatics (2011) 3.12

Microarrays, deep sequencing and the true measure of the transcriptome. BMC Biol (2011) 3.12

Evaluating gene expression in C57BL/6J and DBA/2J mouse striatum using RNA-Seq and microarrays. PLoS One (2011) 3.01

A quantitative atlas of polyadenylation in five mammals. Genome Res (2012) 2.87

Whole transcriptome sequencing reveals gene expression and splicing differences in brain regions affected by Alzheimer's disease. PLoS One (2011) 2.80

RNA sequencing reveals two major classes of gene expression levels in metazoan cells. Mol Syst Biol (2011) 2.77

Normalization of RNA-seq data using factor analysis of control genes or samples. Nat Biotechnol (2014) 2.67

Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms. Nat Biotechnol (2014) 2.65

A comprehensive comparison of RNA-Seq-based transcriptome analysis from reads to differential gene expression and cross-comparison with microarrays: a case study in Saccharomyces cerevisiae. Nucleic Acids Res (2012) 2.53

Computational analysis of bacterial RNA-Seq data. Nucleic Acids Res (2013) 2.50

A powerful and flexible approach to the analysis of RNA sequence count data. Bioinformatics (2011) 2.49

A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data. Biostatistics (2012) 2.46

Changes in transcript abundance in Chlamydomonas reinhardtii following nitrogen deprivation predict diversion of metabolism. Plant Physiol (2010) 2.44

A survey of best practices for RNA-seq data analysis. Genome Biol (2016) 2.37

Normalization, testing, and false discovery rate estimation for RNA-sequencing data. Biostatistics (2011) 2.36

cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate. Nucleic Acids Res (2012) 2.36

Adaptive evolution and the birth of CTCF binding sites in the Drosophila genome. PLoS Biol (2012) 2.29

RNA-seq: technical variability and sampling. BMC Genomics (2011) 2.19

Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing. BMC Genomics (2012) 2.19

A comparison of RNA-Seq and high-density exon array for detecting differential gene expression between closely related species. Nucleic Acids Res (2010) 2.07

A powerful and flexible statistical framework for testing hypotheses of allele-specific gene expression from RNA-seq data. Genome Res (2011) 2.00

Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling. Bioinformatics (2011) 2.00

Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics (2011) 1.93

Comparison of software packages for detecting differential expression in RNA-seq studies. Brief Bioinform (2013) 1.71

Evaluation of normalization methods in mammalian microRNA-Seq data. RNA (2012) 1.68

Genome and transcriptome analysis of the fungal pathogen Fusarium oxysporum f. sp. cubense causing banana vascular wilt disease. PLoS One (2014) 1.64

RNA-Seq of human neurons derived from iPS cells reveals candidate long non-coding RNAs involved in neurogenesis and neuropsychiatric disorders. PLoS One (2011) 1.60

A low-cost library construction protocol and data analysis pipeline for Illumina-based strand-specific multiplex RNA-seq. PLoS One (2011) 1.60

Transcriptional programs in transient embryonic zones of the cerebral cortex defined by high-resolution mRNA sequencing. Proc Natl Acad Sci U S A (2011) 1.60

Sensitive gene fusion detection using ambiguously mapping RNA-Seq read pairs. Bioinformatics (2011) 1.58

Chromatin state signatures associated with tissue-specific gene expression and enhancer activity in the embryonic limb. Genome Res (2012) 1.57

Liver transcriptomic networks reveal main biological processes associated with feed efficiency in beef cattle. BMC Genomics (2015) 1.52

Statistical Modeling of RNA-Seq Data. Stat Sci (2011) 1.49

GENE-counter: a computational pipeline for the analysis of RNA-Seq data for gene expression differences. PLoS One (2011) 1.49

De novo transcriptome sequence assembly and analysis of RNA silencing genes of Nicotiana benthamiana. PLoS One (2013) 1.48

Transcriptome analyses of the human retina identify unprecedented transcript diversity and 3.5 Mb of novel transcribed sequence via significant alternative splicing and novel genes. BMC Genomics (2013) 1.46

Sequence-dependent but not sequence-specific piRNA adhesion traps mRNAs to the germ plasm. Nature (2016) 1.46

Genomic analysis of parent-of-origin allelic expression in Arabidopsis thaliana seeds. PLoS One (2011) 1.45

svaseq: removing batch effects and other unwanted noise from sequencing data. Nucleic Acids Res (2014) 1.43

Sequencing and characterization of the guppy (Poecilia reticulata) transcriptome. BMC Genomics (2011) 1.43

From sequence to molecular pathology, and a mechanism driving the neuroendocrine phenotype in prostate cancer. J Pathol (2012) 1.42

Analysis of transcriptome complexity through RNA sequencing in normal and failing murine hearts. Circ Res (2011) 1.41

Knowledge Discovery and interactive Data Mining in Bioinformatics--State-of-the-Art, future challenges and research directions. BMC Bioinformatics (2014) 1.41

TCC: an R package for comparing tag count data with robust normalization strategies. BMC Bioinformatics (2013) 1.35

Experimental design, preprocessing, normalization and differential expression analysis of small RNA sequencing experiments. Silence (2011) 1.35

NPEBseq: nonparametric empirical bayesian-based procedure for differential expression analysis of RNA-seq data. BMC Bioinformatics (2013) 1.34

Detecting and correcting systematic variation in large-scale RNA sequencing data. Nat Biotechnol (2014) 1.33

Genome-wide annotation and quantitation of translation by ribosome profiling. Curr Protoc Mol Biol (2013) 1.30

The characteristic direction: a geometrical approach to identify differentially expressed genes. BMC Bioinformatics (2014) 1.29

Use of mRNA-seq to discriminate contributions to the transcriptome from the constituent genomes of the polyploid crop species Brassica napus. BMC Genomics (2012) 1.26

Ontogeny of the maize shoot apical meristem. Plant Cell (2012) 1.26

Physico-chemical foundations underpinning microarray and next-generation sequencing experiments. Nucleic Acids Res (2013) 1.25

Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size. Bioinformatics (2013) 1.23

Estimation of sequencing error rates in short reads. BMC Bioinformatics (2012) 1.22

RNA-seq and microarray complement each other in transcriptome profiling. BMC Genomics (2012) 1.22

Transcriptomics and molecular evolutionary rate analysis of the bladderwort (Utricularia), a carnivorous plant with a minimal genome. BMC Plant Biol (2011) 1.21

Phenotypic dissection of bone mineral density reveals skeletal site specificity and facilitates the identification of novel loci in the genetic regulation of bone mass attainment. PLoS Genet (2014) 1.21

Complementation contributes to transcriptome complexity in maize (Zea mays L.) hybrids relative to their inbred parents. Genome Res (2012) 1.20

Accounting for biases in riboprofiling data indicates a major role for proline in stalling translation. Genome Res (2014) 1.19

Identification of a pan-cancer oncogenic microRNA superfamily anchored by a central core seed motif. Nat Commun (2013) 1.19

ATTED-II in 2014: evaluation of gene coexpression in agriculturally important plants. Plant Cell Physiol (2013) 1.19

De novo assembly of bacterial transcriptomes from RNA-seq data. Genome Biol (2015) 1.18

Whole transcriptome RNA-Seq analysis of breast cancer recurrence risk using formalin-fixed paraffin-embedded tumor tissue. PLoS One (2012) 1.17

The transcriptional landscape of Chlamydia pneumoniae. Genome Biol (2011) 1.16

A comparison across non-model animals suggests an optimal sequencing depth for de novo transcriptome assembly. BMC Genomics (2013) 1.16

Length bias correction for RNA-seq data in gene set analyses. Bioinformatics (2011) 1.15

Ribosome Footprint Profiling of Translation throughout the Genome. Cell (2016) 1.14

Deep sequencing-based analysis of the anaerobic stimulon in Neisseria gonorrhoeae. BMC Genomics (2011) 1.14

Pyicos: a versatile toolkit for the analysis of high-throughput sequencing data. Bioinformatics (2011) 1.13

A comparative study of techniques for differential expression analysis on RNA-Seq data. PLoS One (2014) 1.12

Power analysis and sample size estimation for RNA-Seq differential expression. RNA (2014) 1.12

Articles cited by this article

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol (2009) 235.12

Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods (2008) 126.81

Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res (1998) 106.16

Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics (2003) 100.88

Accurate whole human genome sequencing using reversible terminator chemistry. Nature (2008) 90.20

RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res (2008) 62.07

Alternative isoform regulation in human tissue transcriptomes. Nature (2008) 52.76

The transcriptional landscape of the yeast genome defined by RNA sequencing. Science (2008) 48.99

The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol (2006) 30.90

Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res (2008) 26.36

Moderated statistical tests for assessing differences in tag abundance. Bioinformatics (2007) 22.87

High-resolution mapping of copy-number alterations with massively parallel sequencing. Nat Methods (2008) 12.56

Determination of tag density required for digital transcriptome analysis: application to an androgen-sensitive prostate cancer model. Proc Natl Acad Sci U S A (2008) 11.86

Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms. Nucleic Acids Res (2008) 10.10

Evaluation of DNA microarray results with quantitative gene expression platforms. Nat Biotechnol (2006) 8.87

Transcript length bias in RNA-seq data confounds systems biology. Biol Direct (2009) 8.53

GenomeGraphs: integrated genomic data visualization with R. BMC Bioinformatics (2009) 4.16

Novel low abundance and transient RNAs in yeast revealed by tiling microarrays and ultra high-throughput sequencing are not conserved across closely related yeast species. PLoS Genet (2008) 3.69

Identifying differential expression in multiple SAGE libraries: an overdispersed log-linear model approach. BMC Bioinformatics (2005) 3.12

Articles by these authors

Bioconductor: open software development for computational biology and bioinformatics. Genome Biol (2004) 143.19

Diversity of the human intestinal microbial flora. Science (2005) 49.64

Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res (2002) 40.03

Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell (2010) 39.09

The developmental transcriptome of Drosophila melanogaster. Nature (2010) 11.85

Cloud-scale RNA-sequencing differential expression analysis with Myrna. Genome Biol (2010) 10.05

Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Res (2010) 9.08

Diversity, topographic differentiation, and positional memory in human fibroblasts. Proc Natl Acad Sci U S A (2002) 6.98

Gene expression patterns in human liver cancers. Mol Biol Cell (2002) 6.93

A fine-scale linkage-disequilibrium measure based on length of haplotype sharing. Am J Hum Genet (2006) 6.02

Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics (2014) 5.93

Subtype and pathway specific responses to anticancer compounds in breast cancer. Proc Natl Acad Sci U S A (2011) 4.55

GenomeGraphs: integrated genomic data visualization with R. BMC Bioinformatics (2009) 4.16

GC-content normalization for RNA-Seq data. BMC Bioinformatics (2011) 3.89

Stereotyped and specific gene expression programs in human innate immune responses to bacteria. Proc Natl Acad Sci U S A (2002) 3.85

Novel low abundance and transient RNAs in yeast revealed by tiling microarrays and ultra high-throughput sequencing are not conserved across closely related yeast species. PLoS Genet (2008) 3.69

Sequencing technology does not eliminate biological variability. Nat Biotechnol (2011) 3.20

BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions. Genome Biol (2012) 3.13

Survival ensembles. Biostatistics (2005) 2.88

Stage II colon cancer prognosis prediction by tumor gene expression profiling. J Clin Oncol (2006) 2.49

Reversible switching between epigenetic states in honeybee behavioral subcastes. Nat Neurosci (2012) 2.44

Conservation of an RNA regulatory map between Drosophila and mammals. Genome Res (2010) 2.43

Temporal dissection of tumorigenesis in primary cancers. Cancer Discov (2011) 2.40

Functional normalization of 450k methylation array data improves replication in large cancer studies. Genome Biol (2014) 2.37

Multiple testing methods for ChIP-Chip high density oligonucleotide array data. J Comput Biol (2006) 2.08

Exon-level microarray analyses identify alternative splicing programs in breast cancer. Mol Cancer Res (2010) 1.84

A method to increase the power of multiple testing procedures through sample splitting. Stat Appl Genet Mol Biol (2006) 1.81

Polygenic and directional regulatory evolution across pathways in Saccharomyces. Proc Natl Acad Sci U S A (2010) 1.75

A Tetrahymena Piwi bound to mature tRNA 3' fragments activates the exonuclease Xrn2 for RNA processing in the nucleus. Mol Cell (2012) 1.65

Social environment is associated with gene regulatory variation in the rhesus macaque immune system. Proc Natl Acad Sci U S A (2012) 1.63

Augmentation procedures for control of the generalized family-wise error rate and tail probabilities for the proportion of false positives. Stat Appl Genet Mol Biol (2004) 1.55

Multiple testing. Part II. Step-down procedures for control of the family-wise error rate. Stat Appl Genet Mol Biol (2004) 1.47

Genome-wide identification of alternative splice forms down-regulated by nonsense-mediated mRNA decay in Drosophila. PLoS Genet (2009) 1.46

Asymptotic optimality of likelihood-based cross-validation. Stat Appl Genet Mol Biol (2004) 1.39

The establishment of gene silencing at single-cell resolution. Nat Genet (2009) 1.22

Colon cancer prognosis prediction by gene expression profiling. Oncogene (2005) 1.13

Unifying gene expression measures from multiple platforms using factor analysis. PLoS One (2011) 1.06

Diverse transcriptional programs associated with environmental stress and hormones in the Arabidopsis receptor-like kinase gene family. Mol Plant (2009) 1.03

Exploration of global gene expression in human liver steatosis by high-density oligonucleotide microarray. Lab Invest (2006) 1.02

Ischemic preconditioning modulates the expression of several genes, leading to the overproduction of IL-1Ra, iNOS, and Bcl-2 in a human model of liver ischemia-reperfusion. FASEB J (2005) 0.99

Gene expression profiling of nonneoplastic mucosa may predict clinical outcome of colon cancer patients. Dis Colon Rectum (2005) 0.93

Microarray analysis reveals differences in gene expression of circulating CD8(+) T cells in melanoma patients and healthy donors. Cancer Res (2004) 0.91

Laser captured hepatocytes show association of butyrylcholinesterase gene loss and fibrosis progression in hepatitis C-infected drug users. Hepatology (2012) 0.89

Supervised detection of regulatory motifs in DNA sequences. Stat Appl Genet Mol Biol (2003) 0.82

Public data and open source tools for multi-assay genomic investigation of disease. Brief Bioinform (2015) 0.80

Clustering of mRNA-Seq data based on alternative splicing patterns. Biostatistics (2016) 0.75

[Gene expression profiling in colon cancer]. Bull Acad Natl Med (2007) 0.75

Special issue on computational statistical methods for genomics and systems biology. Stat Appl Genet Mol Biol (2012) 0.75