Capturing heterogeneity in gene expression studies by surrogate variable analysis.

PubWeight™: 12.08 | Rank: Top 0.1% | All-Time Top 10000

Article: PMC 1994707

Published in PLoS Genet on August 01, 2007

Authors

Jeffrey T Leek (1), John D Storey

Author Affiliations

1: Department of Biostatistics, University of Washington, Seattle, Washington, USA.

Associated clinical trials:

Inflammation and the Host Response to Injury (Trauma) | NCT00257231

Articles citing this

(truncated to the top 100)

Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet (2010) 11.82

DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics (2012) 9.31

DNA methylation patterns associate with genetic and gene expression variation in HapMap cell lines. Genome Biol (2011) 6.48

Age-dependent DNA methylation of genes that are suppressed in stem cells is a hallmark of cancer. Genome Res (2010) 6.03

Epigenome-wide association data implicate DNA methylation as an intermediary of genetic risk in rheumatoid arthritis. Nat Biotechnol (2013) 5.89

Temporal dynamics and genetic control of transcription in the human prefrontal cortex. Nature (2011) 5.01

The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics (2012) 4.48

Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics (2012) 4.28

A general framework for multiple testing dependence. Proc Natl Acad Sci U S A (2008) 4.27

Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat Biotechnol (2015) 4.09

Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nat Protoc (2013) 4.00

Genetic control of individual differences in gene-specific methylation in human brain. Am J Hum Genet (2010) 3.91

Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biol (2014) 3.89

Accurate discovery of expression quantitative trait loci under confounding from spurious and genuine regulatory hotspots. Genetics (2008) 3.66

Trans-eQTLs reveal that independent genetic variants associated with a complex phenotype converge on intermediate genes, with a major role for the HLA. PLoS Genet (2011) 3.65

Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies. Int J Epidemiol (2012) 3.62

Sex-specific and lineage-specific alternative splicing in primates. Genome Res (2009) 3.61

An epigenetic signature in peripheral blood predicts active ovarian cancer. PLoS One (2009) 3.33

A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLoS Comput Biol (2010) 3.23

Protecting human health from air pollution: shifting from a single-pollutant to a multipollutant approach. Epidemiology (2010) 3.21

DNA methylation signatures in development and aging of the human prefrontal cortex. Am J Hum Genet (2012) 3.17

Correction for hidden confounders in the genetic analysis of gene expression. Proc Natl Acad Sci U S A (2010) 3.08

Removing batch effects in analysis of expression microarray data: an evaluation of six batch adjustment methods. PLoS One (2011) 2.89

Using control genes to correct for unwanted variation in microarray data. Biostatistics (2011) 2.70

Identification, replication, and functional fine-mapping of expression quantitative trait loci in primary human liver tissue. PLoS Genet (2011) 2.68

Normalization of RNA-seq data using factor analysis of control genes or samples. Nat Biotechnol (2014) 2.67

Common genetic variants modulate pathogen-sensing responses in human dendritic cells. Science (2014) 2.63

Reference-free cell mixture adjustments in analysis of DNA methylation data. Bioinformatics (2014) 2.61

Comprehensive literature review and statistical considerations for microarray meta-analysis. Nucleic Acids Res (2012) 2.54

Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories. Nat Biotechnol (2013) 2.53

Dissecting the regulatory architecture of gene expression QTLs. Genome Biol (2012) 2.51

A statin-dependent QTL for GATM expression is associated with statin-induced myopathy. Nature (2013) 2.46

Functional normalization of 450k methylation array data improves replication in large cancer studies. Genome Biol (2014) 2.37

Genome-wide DNA methylation analysis for diabetic nephropathy in type 1 diabetes mellitus. BMC Med Genomics (2010) 2.34

Phylogenetic dependency networks: inferring patterns of CTL escape and codon covariation in HIV-1 Gag. PLoS Comput Biol (2008) 2.32

An evaluation of analysis pipelines for DNA methylation profiling using the Illumina HumanMethylation450 BeadChip platform. Epigenetics (2013) 2.28

Cigarette smoking behaviors and time since quitting are associated with differential DNA methylation across the human genome. Hum Mol Genet (2012) 2.24

Global analyses of human immune variation reveal baseline predictors of postvaccination responses. Cell (2014) 2.23

Population genomics in a disease targeted primary cell model. Genome Res (2009) 2.20

A genome-wide gene expression signature of environmental geography in leukocytes of Moroccan Amazighs. PLoS Genet (2008) 2.19

Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat Protoc (2012) 2.14

The relationship between DNA methylation, genetic and expression inter-individual variation in untransformed human fibroblasts. Genome Biol (2014) 2.01

Supervised normalization of microarrays. Bioinformatics (2010) 2.00

Heritability and genomics of gene expression in peripheral blood. Nat Genet (2014) 2.00

Batch effect correction for genome-wide methylation data with Illumina Infinium platform. BMC Med Genomics (2011) 1.84

Enrichment of cis-regulatory gene expression SNPs and methylation quantitative trait loci among bipolar disorder susceptibility variants. Mol Psychiatry (2012) 1.84

Blood-based profiles of DNA methylation predict the underlying distribution of cell types: a validation analysis. Epigenetics (2013) 1.79

Challenges of Big Data Analysis. Natl Sci Rev (2014) 1.79

Age-associated epigenetic drift: implications, and a case of epigenetic thrift? Hum Mol Genet (2013) 1.76

Overcoming bias and systematic errors in next generation sequencing data. Genome Med (2010) 1.75

The effects of genetic variation on gene expression dynamics during development. Nature (2013) 1.75

Significance analysis and statistical dissection of variably methylated regions. Biostatistics (2011) 1.74

A cross-platform analysis of 14,177 expression quantitative trait loci derived from lymphoblastoid cell lines. Genome Res (2013) 1.72

Whole-genome association mapping of gene expression in the human prefrontal cortex. Mol Psychiatry (2010) 1.69

Partitioning the heritability of Tourette syndrome and obsessive compulsive disorder reveals differences in genetic architecture. PLoS Genet (2013) 1.68

Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study. Nat Biotechnol (2014) 1.68

Comments on the analysis of unbalanced microarray data. Bioinformatics (2009) 1.66

DNA methylation contributes to natural human variation. Genome Res (2013) 1.66

Heading down the wrong pathway: on the influence of correlation within gene sets. BMC Genomics (2010) 1.59

Pooling breast cancer datasets has a synergetic effect on classification performance and improves signature stability. BMC Genomics (2008) 1.45

CpG sites associated with cigarette smoking: analysis of epigenome-wide data from the Sister Study. Environ Health Perspect (2014) 1.44

Identifying and mitigating bias in next-generation sequencing methods for chromatin biology. Nat Rev Genet (2014) 1.44

svaseq: removing batch effects and other unwanted noise from sequencing data. Nucleic Acids Res (2014) 1.43

Genome-wide association study identifies five loci associated with susceptibility to pancreatic cancer in Chinese populations. Nat Genet (2011) 1.42

A comparative evaluation of data-merging and meta-analysis methods for reconstructing gene-gene interactions. BMC Bioinformatics (2016) 1.41

Using the R Package crlmm for Genotyping and Copy Number Estimation. J Stat Softw (2011) 1.41

Normalization of peak intensities in bottom-up MS-based proteomics using singular value decomposition. Bioinformatics (2009) 1.40

Practical impacts of genomic data "cleaning" on biological discovery using surrogate variable analysis. BMC Bioinformatics (2015) 1.39

Genomic innovations, transcriptional plasticity and gene loss underlying the evolution and divergence of two highly polyphagous and invasive Helicoverpa pest species. BMC Biol (2017) 1.39

Altered regulation and expression of genes by BET family of proteins in COPD patients. PLoS One (2017) 1.36

Joint modelling of confounding factors and prominent genetic regulators provides increased accuracy in genetical genomics studies. PLoS Comput Biol (2012) 1.36

miR-1202 is a primate-specific and brain-enriched microRNA involved in major depression and antidepressant treatment. Nat Med (2014) 1.35

Integrative modeling of eQTLs and cis-regulatory elements suggests mechanisms underlying cell type specificity of eQTLs. PLoS Genet (2013) 1.34

Review of processing and analysis methods for DNA methylation array data. Br J Cancer (2013) 1.33

Detecting and correcting systematic variation in large-scale RNA sequencing data. Nat Biotechnol (2014) 1.33

DNA methylation shows genome-wide association of NFIX, RAPGEF2 and MSRB3 with gestational age at birth. Int J Epidemiol (2012) 1.30

Deciphering normal blood gene expression variation--The NOWAC postgenome study. PLoS Genet (2010) 1.29

Extent, causes, and consequences of small RNA expression variation in human adipose tissue. PLoS Genet (2012) 1.28

Large-scale East-Asian eQTL mapping reveals novel candidate genes for LD mapping and the genomic landscape of transcriptional effects of sequence variants. PLoS One (2014) 1.27

Correction of technical bias in clinical microarray data improves concordance with known biological information. Genome Biol (2008) 1.27

Association between SNPs and gene expression in multiple regions of the human brain. Transl Psychiatry (2012) 1.27

Joint genetic analysis of gene expression data with inferred cellular phenotypes. PLoS Genet (2011) 1.26

A comparison of feature selection and classification methods in DNA methylation studies using the Illumina Infinium platform. BMC Bioinformatics (2012) 1.26

Beyond modules and hubs: the potential of gene coexpression networks for investigating molecular mechanisms of complex brain disorders. Genes Brain Behav (2013) 1.25

Accounting for population stratification in DNA methylation studies. Genet Epidemiol (2014) 1.25

Genome-wide association study implicates NDST3 in schizophrenia and bipolar disorder. Nat Commun (2013) 1.25

Normalization and missing value imputation for label-free LC-MS analysis. BMC Bioinformatics (2012) 1.22

Asymptotic conditional singular value decomposition for high-dimensional genomic data. Biometrics (2010) 1.22

Multi-dimensional correlations for gene coexpression and application to the large-scale data of Arabidopsis. Bioinformatics (2009) 1.17

Molecular signatures from omics data: from chaos to consensus. Biotechnol J (2012) 1.17

Critical reasoning on causal inference in genome-wide linkage and association studies. Trends Genet (2010) 1.15

Systems biology and gene networks in neurodevelopmental and neurodegenerative disorders. Nat Rev Genet (2015) 1.14

A-clustering: a novel method for the detection of co-regulated methylation regions, and regions associated with exposure. Bioinformatics (2013) 1.14

Genetic effects on gene expression across human tissues. Nature (2017) 1.13

Thawing Frozen Robust Multi-array Analysis (fRMA). BMC Bioinformatics (2011) 1.13

Making informed choices about microarray data analysis. PLoS Comput Biol (2010) 1.13

Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed. Biostatistics (2015) 1.12

Characteristics and predictive value of blood transcriptome signature in males with autism spectrum disorders. PLoS One (2012) 1.09

Comparative RNA-Seq and microarray analysis of gene expression changes in B-cell lymphomas of Canis familiaris. PLoS One (2013) 1.09

Large-Scale trans-eQTLs Affect Hundreds of Transcripts and Mediate Patterns of Transcriptional Co-regulation. Am J Hum Genet (2017) 1.08

Articles cited by this

Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A (1998) 192.97

Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet (2006) 115.71

Statistical significance for genomewide studies. Proc Natl Acad Sci U S A (2003) 88.64

Exploring the metabolic and genetic control of gene expression on a genomic scale. Science (1997) 60.15

Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res (2002) 40.03

Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell (2000) 36.09

Gene-expression profiles in hereditary breast cancer. N Engl J Med (2001) 29.80

Genetic analysis of genome-wide variation in human gene expression. Nature (2004) 27.28

The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science (2006) 25.99

Genetic dissection of transcriptional regulation in budding yeast. Science (2002) 25.01

Genetics of gene expression surveyed in maize, mouse and man. Nature (2003) 22.17

Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci U S A (2000) 18.42

Analysis of variance for gene expression microarray data. J Comput Biol (2000) 16.28

Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Res (2001) 12.92

Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat Genet (2003) 11.84

Significance analysis of time course microarray experiments. Proc Natl Acad Sci U S A (2005) 8.65

Genetic interactions between polymorphisms that affect gene expression in yeast. Nature (2005) 6.42

Multiple locus linkage analysis of genomewide expression in yeast. PLoS Biol (2005) 5.61

Experimental design for gene expression microarrays. Biostatistics (2001) 5.46

Fundamental patterns underlying gene expression profiles: simplicity from complexity. Proc Natl Acad Sci U S A (2000) 5.19

Integrative analysis of the cancer transcriptome. Nat Genet (2005) 4.44

A transcriptional profile of aging in the human kidney. PLoS Biol (2004) 3.69

Correlation between gene expression levels and limitations of the empirical bayes methodology for finding differentially expressed genes. Stat Appl Genet Mol Biol (2005) 3.25

A reanalysis of a published Affymetrix GeneChip control dataset. Genome Biol (2006) 3.17

Fluorescent cDNA microarray hybridization reveals complexity and heterogeneity of cellular genotoxic stress responses. Oncogene (1999) 2.29

Molecular classification of familial non-BRCA1/BRCA2 breast cancer. Proc Natl Acad Sci U S A (2003) 2.10

Assessing stability of gene selection in microarray data analysis. BMC Bioinformatics (2006) 1.96

A new approach to intensity-dependent normalization of two-channel microarrays. Biostatistics (2006) 1.86

Molecular heterogeneity of inflammatory breast cancer: a hyperproliferative phenotype. Clin Cancer Res (2006) 1.67

Some comments on instability of false discovery rate estimation. J Bioinform Comput Biol (2006) 1.51

Treating expression levels of different genes as a sample in microarray data analysis: is it worth a risk? Stat Appl Genet Mol Biol (2006) 1.45

Remarks on Parallel Analysis. Multivariate Behav Res (1992) 1.42

Estimation of false discovery proportion under general dependence. Bioinformatics (2006) 1.35

Articles by these authors

Mapping the genetic architecture of gene expression in human liver. PLoS Biol (2008) 19.44

Precision and functional specificity in mRNA decay. Proc Natl Acad Sci U S A (2002) 8.20

Genome-wide analysis of mRNA translation profiles in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A (2003) 7.45

A genomic storm in critically injured humans. J Exp Med (2011) 6.67

Genetic interactions between polymorphisms that affect gene expression in yeast. Nature (2005) 6.42

Assigning significance to peptides identified by tandem mass spectrometry using decoy databases. J Proteome Res (2007) 5.57

The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics (2012) 4.48

A general framework for multiple testing dependence. Proc Natl Acad Sci U S A (2008) 4.27

A reanalysis of a published Affymetrix GeneChip control dataset. Genome Biol (2006) 3.17

EDGE: extraction and analysis of differential gene expression. Bioinformatics (2005) 3.14

On the design and analysis of gene expression studies in human populations. Nat Genet (2007) 3.07

Posterior error probabilities and false discovery rates: two sides of the same coin. J Proteome Res (2007) 2.61

A genome-wide gene expression signature of environmental geography in leukocytes of Moroccan Amazighs. PLoS Genet (2008) 2.19

Supervised normalization of microarrays. Bioinformatics (2010) 2.00

Harnessing naturally randomized transcription to infer regulatory relationships among genes. Genome Biol (2007) 1.94

Calibrating the performance of SNP arrays for whole-genome association studies. PLoS Genet (2008) 1.86

A new approach to intensity-dependent normalization of two-channel microarrays. Biostatistics (2006) 1.86

Relaxed significance criteria for linkage analysis. Genetics (2006) 1.70

Human transcriptome array for high-throughput clinical studies. Proc Natl Acad Sci U S A (2011) 1.62

Mapping gene expression quantitative trait loci by singular value decomposition and independent component analysis. BMC Bioinformatics (2008) 1.51

Design and analysis of Bar-seq experiments. G3 (Bethesda) (2014) 1.48

Non-parametric estimation of posterior error probabilities associated with peptides identified by tandem mass spectrometry. Bioinformatics (2008) 1.47

QVALITY: non-parametric estimation of q-values and posterior error probabilities. Bioinformatics (2009) 1.38

In vivo regulation of human skeletal muscle gene expression by thyroid hormone. Genome Res (2002) 1.33

Normalization of two-channel microarrays accounting for experimental design and intensity-dependent relationships. Genome Biol (2007) 1.29

System-level analysis of genes and functions affecting survival during nutrient starvation in Saccharomyces cerevisiae. Genetics (2010) 1.16

Dissecting inflammatory complications in critically injured patients by within-patient gene expression changes: a longitudinal clinical genomics study. PLoS Med (2011) 1.11

Optimality driven nearest centroid classification from genomic data. PLoS One (2007) 0.98

Eigen-R2 for dissecting variation in high-dimensional studies. Bioinformatics (2008) 0.97

A computational statistics approach for estimating the spatial range of morphogen gradients. Development (2011) 0.93

A computationally efficient modular optimal discovery procedure. Bioinformatics (2010) 0.87

Identifying and mapping cell-type-specific chromatin programming of gene expression. Proc Natl Acad Sci U S A (2014) 0.85

Understanding newsworthiness of an emerging pandemic: international newspaper coverage of the H1N1 outbreak. Influenza Other Respir Viruses (2012) 0.83

Longitudinal transcriptional analysis of developing neointimal vascular occlusion and pulmonary hypertension in rats. Physiol Genomics (2004) 0.79

Gene set bagging for estimating the probability a statistically significant result will replicate. BMC Bioinformatics (2013) 0.77

Cause and express. Nat Biotechnol (2009) 0.75