An application of Random Forests to a genome-wide association dataset: methodological considerations & new findings.

PubWeight™: 1.96 | Rank: Top 2%

PMCID: PMC2896336

Published in BMC Genet on June 14, 2010

Authors

Benjamin A Goldstein (1), Alan E Hubbard, Adele Cutler, Lisa F Barcellos

Author Affiliations

1: Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, USA. bgoldstein@genepi.berkeley.edu

Articles citing this

Random forests for genetic association studies. Stat Appl Genet Mol Biol (2011) 1.45

SNP interaction detection with Random Forests in high-dimensional genetic data. BMC Bioinformatics (2012) 1.32

Genome-wide prediction of discrete traits using Bayesian regressions and machine learning. Genet Sel Evol (2011) 1.30

Regularized machine learning in the genetic prediction of complex traits. PLoS Genet (2014) 1.15

Plasma metabolomic profiles in different stages of CKD. Clin J Am Soc Nephrol (2012) 0.98

Risk estimation and risk prediction using machine-learning methods. Hum Genet (2012) 0.98

Random forests on Hadoop for genome-wide association studies of multivariate neuroimaging phenotypes. BMC Bioinformatics (2013) 0.97

Pathway-based identification of SNPs predictive of survival. Eur J Hum Genet (2011) 0.95

High-dimensional pharmacogenetic prediction of a continuous trait using machine learning techniques with application to warfarin dose prediction in African Americans. Bioinformatics (2011) 0.95

A Weighted Random Forests Approach to Improve Predictive Performance. Stat Anal Data Min (2013) 0.94

Evidence for CRHR1 in multiple sclerosis using supervised machine learning and meta-analysis in 12,566 individuals. Hum Mol Genet (2010) 0.93

Impact of natural genetic variation on gene expression dynamics. PLoS Genet (2013) 0.93

Genome-wide association study for backfat thickness in Canchim beef cattle using Random Forest approach. BMC Genet (2013) 0.92

Mapping of the circulating metabolome reveals α-ketoglutarate as a predictor of morbid obesity-associated non-alcoholic fatty liver disease. Int J Obes (Lond) (2014) 0.91

Exploiting SNP correlations within random forest for genome-wide association studies. PLoS One (2014) 0.89

Serum metabolomic profiling in acute alcoholic hepatitis identifies multiple dysregulated pathways. PLoS One (2014) 0.89

Integrative analysis using module-guided random forests reveals correlated genetic factors related to mouse weight. PLoS Comput Biol (2013) 0.89

Contemporary Considerations for Constructing a Genetic Risk Score: An Empirical Approach. Genet Epidemiol (2015) 0.88

An efficient algorithm coupled with synthetic minority over-sampling technique to classify imbalanced PubChem BioAssay data. Anal Chim Acta (2013) 0.88

Predicting adaptive phenotypes from multilocus genotypes in Sitka spruce (Picea sitchensis) using random forest. G3 (Bethesda) (2012) 0.86

DNA methylation loci associated with atopy and high serum IgE: a genome-wide application of recursive Random Forest feature selection. Genome Med (2015) 0.86

Towards the identification of the loci of adaptive evolution. Methods Ecol Evol (2015) 0.85

An integrated approach to reduce the impact of minor allele frequency and linkage disequilibrium on variable importance measures for genome-wide data. Bioinformatics (2012) 0.83

Oxidative Stress and Metabolic Perturbations in Wooden Breast Disorder in Chickens. PLoS One (2016) 0.82

Large-scale risk prediction applied to Genetic Analysis Workshop 17 mini-exome sequence data. BMC Proc (2011) 0.82

Performance of random forests and logic regression methods using mini-exome sequence data. BMC Proc (2011) 0.82

Integrative systems biology approaches in asthma pharmacogenomics. Pharmacogenomics (2012) 0.81

Puf3p induces translational repression of genes linked to oxidative stress. Nucleic Acids Res (2013) 0.81

TRM: a powerful two-stage machine learning approach for identifying SNP-SNP interactions. Ann Hum Genet (2011) 0.81

A unified sparse representation for sequence variant identification for complex traits. Genet Epidemiol (2014) 0.81

Modeling X Chromosome Data Using Random Forests: Conquering Sex Bias. Genet Epidemiol (2015) 0.81

Systems biology data analysis methodology in pharmacogenomics. Pharmacogenomics (2011) 0.80

Random forest fishing: a novel approach to identifying organic group of risk factors in genome-wide association studies. Eur J Hum Genet (2013) 0.80

Alteration in circulating metabolites during and after heat stress in the conscious rat: potential biomarkers of exposure and organ-specific injury. BMC Physiol (2014) 0.80

Hierarchical Naive Bayes for genetic association studies. BMC Bioinformatics (2012) 0.79

EPAS1 gene variants are associated with sprint/power athletic performance in two cohorts of European athletes. BMC Genomics (2014) 0.79

Correction for population stratification in random forest analysis. Int J Epidemiol (2012) 0.79

Identifying CpG sites associated with eczema via random forest screening of epigenome-scale DNA methylation. Clin Epigenetics (2015) 0.78

Detection of Hereditary 1,25-Hydroxyvitamin D-Resistant Rickets Caused by Uniparental Disomy of Chromosome 12 Using Genome-Wide Single Nucleotide Polymorphism Array. PLoS One (2015) 0.78

Application of multi-SNP approaches Bayesian LASSO and AUC-RF to detect main effects of inflammatory-gene variants associated with bladder cancer risk. PLoS One (2013) 0.78

Gene-Gene Interaction Among WNT Genes for Oral Cleft in Trios. Genet Epidemiol (2015) 0.78

Gradient Boosting as a SNP Filter: an Evaluation Using Simulated and Hair Morphology Data. J Data Mining Genomics Proteomics (2013) 0.78

Variants of Interleukin-7/Interleukin-7 Receptor Alpha are Associated with Both Neuromyelitis Optica and Multiple Sclerosis Among Chinese Han Population in Southeastern China. Chin Med J (Engl) (2015) 0.77

Neural activity tied to reading predicts individual differences in extended-text comprehension. Front Hum Neurosci (2013) 0.76

A forest-based feature screening approach for large-scale genome data with complex structures. BMC Genet (2015) 0.76

Building a genetic risk model for bipolar disorder from genome-wide association data with random forest algorithm. Sci Rep (2017) 0.75

Evaluation of potential novel variations and their interactions related to bipolar disorders: analysis of genome-wide association study data. Neuropsychiatr Dis Treat (2016) 0.75

Metabolic profiling reveals biochemical pathways and potential biomarkers associated with the pathogenesis of Krabbe disease. J Neurosci Res (2016) 0.75

Lipid profile of human synovial fluid following intra-articular ankle fracture. J Orthop Res (2016) 0.75

Identification of immune correlates of protection in Shigella infection by application of machine learning. J Biomed Inform (2017) 0.75

Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology. Environ Monit Assess (2017) 0.75

Articles cited by this

PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet (2007) 209.92

Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature (2007) 144.95

Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet (2007) 24.68

Risk alleles for multiple sclerosis identified by a genomewide study. N Engl J Med (2007) 17.06

Gene selection and classification of microarray data using random forest. BMC Bioinformatics (2006) 12.45

Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinformatics (2007) 8.23

How to interpret a genome-wide association study. JAMA (2008) 7.54

Meta-analysis of genome scans and replication identify CD6, IRF8 and TNFRSF1A as new multiple sclerosis susceptibility loci. Nat Genet (2009) 7.16

Screening large-scale association study data: exploiting interactions using random forests. BMC Genet (2004) 5.76

Identifying SNPs predictive of phenotype using random forests. Genet Epidemiol (2005) 5.43

Genome-wide association study identifies new multiple sclerosis susceptibility loci on chromosomes 12 and 20. Nat Genet (2009) 4.97

Identifying interacting SNPs using Monte Carlo logic regression. Genet Epidemiol (2005) 4.18

The challenge for genetic epidemiologists: how to analyze large numbers of SNPs in relation to complex diseases. BMC Genet (2006) 3.29

Performance of random forest when SNPs are in linkage disequilibrium. BMC Bioinformatics (2009) 1.81

Multiple sclerosis genetics: leaving no stone unturned. Genes Immun (2005) 1.56

Multifactor dimensionality reduction: an analysis strategy for modelling and detecting gene-gene interactions in human genetics and pharmacogenomics studies. Hum Genomics (2006) 1.37

Classification of rheumatoid arthritis status with candidate gene and genome-wide single-nucleotide polymorphisms using random forests. BMC Proc (2007) 1.24

Association analysis of 528 intra-genic SNPs in a region of chromosome 10 linked to late onset Alzheimer's disease. Am J Med Genet B Neuropsychiatr Genet (2008) 1.04

Analysis of multiple single nucleotide polymorphisms of candidate genes related to coronary heart disease susceptibility by using support vector machines. Clin Chem Lab Med (2003) 1.03

Role of interleukin-7 in degenerative and inflammatory joint diseases. Arthritis Res Ther (2008) 0.93

Analyses of single marker and pairwise effects of candidate loci for rheumatoid arthritis using logistic regression and random forests. BMC Proc (2007) 0.90

Application of two machine learning algorithms to genetic association studies in the presence of covariates. BMC Genet (2008) 0.90

Phactr2 and Parkinson's disease. Neurosci Lett (2009) 0.87

Glucocorticoids plus opioids up-regulate genes that influence neuronal function. Cell Mol Neurobiol (2007) 0.82

Articles by these authors

(truncated to the top 100)

Risk alleles for multiple sclerosis identified by a genomewide study. N Engl J Med (2007) 17.06

Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature (2011) 13.23

Random forests for classification in ecology. Ecology (2007) 10.10

Super learner. Stat Appl Genet Mol Biol (2007) 5.73

Interleukin 7 receptor alpha chain (IL7R) shows allelic and functional association with multiple sclerosis. Nat Genet (2007) 5.09

Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis. Nat Genet (2013) 4.62

Mapping multiple sclerosis susceptibility to the HLA-DR locus in African Americans. Am J Hum Genet (2003) 2.71

Longitudinal change of biomarkers in cognitive decline. Arch Neurol (2011) 2.51

A high-density screen for linkage in multiple sclerosis. Am J Hum Genet (2005) 2.50

Occupational exposure to formaldehyde, hematotoxicity, and leukemia-specific chromosome changes in cultured myeloid progenitor cells. Cancer Epidemiol Biomarkers Prev (2010) 2.34

Gene copy number regulates the production of the human chemokine CCL3-L1. Eur J Immunol (2002) 2.24

Seasonality of rotavirus disease in the tropics: a systematic review and meta-analysis. Int J Epidemiol (2008) 2.14

A second major histocompatibility complex susceptibility locus for multiple sclerosis. Ann Neurol (2007) 1.99

Pharmacogenetic analysis of lithium-induced delayed aging in Caenorhabditis elegans. J Biol Chem (2007) 1.86

Causal inference methods to study nonrandomized, preexisting development interventions. Proc Natl Acad Sci U S A (2010) 1.86

Clustering of inflammatory bowel disease with immune mediated diseases among members of a northern california-managed care organization. Am J Gastroenterol (2007) 1.78

118 SNPs of folate-related genes and risks of spina bifida and conotruncal heart defects. BMC Med Genet (2009) 1.74

Do changes in spousal employment status lead to domestic violence? Insights from a prospective study in Bangalore, India. Soc Sci Med (2009) 1.66

Genome-wide pharmacogenomic analysis of the response to interferon beta therapy in multiple sclerosis. Arch Neurol (2008) 1.59

Fine-mapping the genetic association of the major histocompatibility complex in multiple sclerosis: HLA and non-HLA effects. PLoS Genet (2013) 1.55

Drivers of water quality variability in northern coastal Ecuador. Environ Sci Technol (2009) 1.47

Insulin-like signaling determines survival during stress via posttranscriptional mechanisms in C. elegans. Cell Metab (2010) 1.46

Optimal recall period for caregiver-reported illness in risk factor and intervention studies: a multicountry study. Am J Epidemiol (2013) 1.46

Cluster-randomised controlled trials of individual and combined water, sanitation, hygiene and nutritional interventions in rural Bangladesh and Kenya: the WASH Benefits study design and rationale. BMJ Open (2013) 1.39

Temporal transcriptomic microarray analysis of "Dehalococcoides ethenogenes" strain 195 during the transition into stationary phase. Appl Environ Microbiol (2008) 1.31

Toxicogenomic profiling of chemically exposed humans in risk assessment. Mutat Res (2010) 1.30

Lower-body function, neighborhoods, and walking in an older population. Am J Prev Med (2010) 1.26

The R620W polymorphism of the protein tyrosine phosphatase PTPN22 is not associated with multiple sclerosis. Am J Hum Genet (2005) 1.26

Novel childhood ALL susceptibility locus BMI1-PIP4K2A is specifically associated with the hyperdiploid subtype. Blood (2013) 1.25

Discovery of novel biomarkers by microarray analysis of peripheral blood mononuclear cell gene expression in benzene-exposed workers. Environ Health Perspect (2005) 1.24

Aberrations in chromosomes associated with lymphoma and therapy-related leukemia in benzene-exposed workers. Environ Mol Mutagen (2007) 1.17

Lack of support for association between the KIF1B rs10492972[C] variant and multiple sclerosis. Nat Genet (2010) 1.16

Quantile-function based null distribution in resampling based multiple testing. Stat Appl Genet Mol Biol (2006) 1.14

Uncoupling the roles of HLA-DRB1 and HLA-DRB5 genes in multiple sclerosis. J Immunol (2008) 1.13

Evidence for both innate and acquired mechanisms of protection from Plasmodium falciparum in children with sickle cell trait. Blood (2012) 1.13

Genetic analysis of multiple sclerosis in Europeans. J Neuroimmunol (2003) 1.12

Prospective study of Dietary Approaches to Stop Hypertension- and Mediterranean-style dietary patterns and age-related cognitive change: the Cache County Study on Memory, Health and Aging. Am J Clin Nutr (2013) 1.11

Association of the truncating splice site mutation in BTNL2 with multiple sclerosis is secondary to HLA-DRB1*15. Hum Mol Genet (2005) 1.10

Global gene expression profiling of a population exposed to a range of benzene levels. Environ Health Perspect (2010) 1.10

DNA Macroarray profiling of Lactococcus lactis subsp. lactis IL1403 gene expression during environmental stresses. Appl Environ Microbiol (2004) 1.08

The association and linkage of the HLA-A2 class I allele with autism. Hum Immunol (2006) 1.07

Confirmation of the association of the C4B null allelle in autism. Hum Immunol (2005) 1.07

Empirical Bayes and resampling based multiple testing procedure controlling tail probability of the proportion of false positives. Stat Appl Genet Mol Biol (2005) 1.07

Inequalities in body mass index and smoking behavior in 70 countries: evidence for a social transition in chronic disease risk. Am J Epidemiol (2012) 1.05

Gel versus capillary electrophoresis genotyping for categorizing treatment outcomes in two anti-malarial trials in Uganda. Malar J (2010) 1.04

Associations between single nucleotide polymorphisms in iron-related genes and iron status in multiethnic populations. PLoS One (2012) 1.03

Chromosome-wide aneuploidy study (CWAS) in workers exposed to an established leukemogen, benzene. Carcinogenesis (2011) 1.03

Unraveling multiple MHC gene associations with systemic lupus erythematosus: model choice indicates a role for HLA alleles and non-HLA genes in Europeans. Am J Hum Genet (2012) 1.02

The transmission disequilibrium test suggests that HLA-DR4 and DR13 are linked to autism spectrum disorder. Hum Immunol (2002) 1.02

Genome-wide association study identifies genetic loci associated with iron deficiency. PLoS One (2011) 1.01

Changes in the peripheral blood transcriptome associated with occupational benzene exposure identified by cross-comparison on two microarray platforms. Genomics (2009) 1.01

Use of OctoChrome fluorescence in situ hybridization to detect specific aneuploidy among all 24 chromosomes in benzene-exposed workers. Chem Biol Interact (2005) 1.01

Multiple susceptibility loci for multiple sclerosis. Hum Mol Genet (2002) 1.01

Simulation methods to estimate design power: an overview for applied research. BMC Med Res Methodol (2011) 1.01

CIITA variation in the presence of HLA-DRB1*1501 increases risk for multiple sclerosis. Hum Mol Genet (2010) 1.01

Genetic variants in the folate pathway and risk of childhood acute lymphoblastic leukemia. Cancer Causes Control (2011) 1.00

Searching for additional disease loci in a genomic region. Adv Genet (2008) 1.00

Real-time visualization of cytoplasmic calpain activation and calcium deregulation in acute glutamate excitotoxicity. J Neurochem (2009) 1.00

Systems biology of human benzene exposure. Chem Biol Interact (2009) 0.99

Complex gene-gene interactions in multiple sclerosis: a multifactorial approach reveals associations with inflammatory genes. Neurogenetics (2006) 0.97

Genetic variants in ARID5B and CEBPE are childhood ALL susceptibility loci in Hispanics. Cancer Causes Control (2013) 0.97

Pregnancy intentions and teenage pregnancy among Latinas: a mediation analysis. Perspect Sex Reprod Health (2010) 0.97

Environmental justice implications of arsenic contamination in California's San Joaquin Valley: a cross-sectional, cluster-design examining exposure and compliance in community drinking water systems. Environ Health (2012) 0.97

The histone deacetylase inhibitor trichostatin a has genotoxic effects in human lymphoblasts in vitro. Toxicol Sci (2006) 0.97

Predictive ability and stability of adolescents' pregnancy intentions in a predominantly Latino community. Stud Fam Plann (2010) 0.97

Inverse probability weighting in sexually transmitted infection/human immunodeficiency virus prevention research: methods for evaluating social and community interventions. Sex Transm Dis (2010) 0.97

Association of polymorphisms in the apolipoprotein E region with susceptibility to and progression of multiple sclerosis. Am J Hum Genet (2002) 0.96

Nonrandom aneuploidy of chromosomes 1, 5, 6, 7, 8, 9, 11, 12, and 21 induced by the benzene metabolites hydroquinone and benzenetriol. Environ Mol Mutagen (2005) 0.95

Interrogating the complex role of chromosome 16p13.13 in multiple sclerosis susceptibility: independent genetic signals in the CIITA-CLEC16A-SOCS1 gene complex. Hum Mol Genet (2011) 0.95

Organophosphorous pesticide breakdown products in house dust and children's urine. J Expo Sci Environ Epidemiol (2012) 0.94

Depressive symptoms in low-income women in rural Mexico. Epidemiology (2007) 0.93

Evidence for CRHR1 in multiple sclerosis using supervised machine learning and meta-analysis in 12,566 individuals. Hum Mol Genet (2010) 0.93

Variation within DNA repair pathway genes and risk of multiple sclerosis. Am J Epidemiol (2010) 0.93

Eccentric exercise activates novel transcriptional regulation of hypertrophic signaling pathways not affected by hormone changes. PLoS One (2010) 0.93

The HLA locus and multiple sclerosis in Spain. Role in disease susceptibility, clinical course and response to interferon-beta. J Neuroimmunol (2002) 0.93

Genomic ancestry and somatic alterations correlate with age at diagnosis in Hispanic children with B-cell acute lymphoblastic leukemia. Am J Hematol (2014) 0.92

Genome-wide functional profiling reveals genes required for tolerance to benzene metabolites in yeast. PLoS One (2011) 0.91

Microarray analysis of gene expression in peripheral blood mononuclear cells from dioxin-exposed human subjects. Toxicology (2006) 0.90

Leukaemia-specific chromosome damage detected by comet with fluorescence in situ hybridization (comet-FISH). Mutagenesis (2007) 0.90

CYP1A1/2 haplotypes and lung cancer and assessment of confounding by population stratification. Cancer Res (2009) 0.90

Using variable importance measures from causal inference to rank risk factors of schistosomiasis infection in a rural setting in China. Epidemiol Perspect Innov (2010) 0.90

Comparison of statistical methods for estimating genetic admixture in a lung cancer study of African Americans and Latinos. Am J Epidemiol (2008) 0.89

Variation in xenobiotic transport and metabolism genes, household chemical exposures, and risk of childhood acute lymphoblastic leukemia. Cancer Causes Control (2012) 0.88

Polymorphisms in ghrelin and neuropeptide Y genes are associated with non-Hodgkin lymphoma. Cancer Epidemiol Biomarkers Prev (2005) 0.88

Association of genetic variation in cystathionine-beta-synthase and arsenic metabolism. Environ Res (2010) 0.88

Use of 'Omic' technologies to study humans exposed to benzene. Chem Biol Interact (2005) 0.88

Effect of chemical mutagens and carcinogens on gene expression profiles in human TK6 cells. PLoS One (2012) 0.88

GATA3 risk alleles are associated with ancestral components in Hispanic children with ALL. Blood (2013) 0.87

Socioeconomic status and lung cancer: unraveling the contribution of genetic admixture. Am J Public Health (2013) 0.86

Male microchimerism in peripheral blood leukocytes from women with multiple sclerosis. Chimerism (2011) 0.86

Worker exposure to volatile organic compounds in the vehicle repair industry. J Occup Environ Hyg (2007) 0.85

Genetic polymorphisms in adaptive immunity genes and childhood acute lymphoblastic leukemia. Cancer Epidemiol Biomarkers Prev (2010) 0.85

Investigation of seven proposed regions of linkage in multiple sclerosis: an American and French collaborative study. Neurogenetics (2003) 0.85

HLA-DP genetic variation, proxies for early life immune modulation and childhood acute lymphoblastic leukemia risk. Blood (2012) 0.83

Haplotypes of DNA repair and cell cycle control genes, X-ray exposure, and risk of childhood acute lymphoblastic leukemia. Cancer Causes Control (2011) 0.83

Association between celiac disease and iron deficiency in Caucasians, but not non-Caucasians. Clin Gastroenterol Hepatol (2013) 0.83

Statistical estimation of parameters in a disease transmission model: analysis of a Cryptosporidium outbreak. Stat Med (2002) 0.82

Fetal growth and body size genes and risk of childhood acute lymphoblastic leukemia. Cancer Causes Control (2012) 0.80

A comparison of methods to control type I errors in microarray studies. Stat Appl Genet Mol Biol (2007) 0.80

Post-abortion depot medroxyprogesterone acetate continuation rates: a randomized trial of cyclic estradiol. Contraception (2002) 0.79