The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures.


PMCID: PMC3244389

Published in PLoS One on December 21, 2011

Authors

Anne-Claire Haury (1), Pierre Gestraud, Jean-Philippe Vert

Author Affiliations

1: Mines ParisTech, Centre for Computational Biology, Fontainebleau, France. anne-claire.haury@mines-paristech.fr
