The spectrum kernel: a string kernel for SVM protein classification.

PubWeight™: 4.90 | Rank: Top 1%

PMID: 11928508

Published in Pac Symp Biocomput on January 01, 2002

Authors

Christina Leslie (1), Eleazar Eskin, William Stafford Noble

Author Affiliations

1: Department of Computer Science, Columbia University, New York, NY 10027, USA. cleslie.noble@cs.columbia.edu

Articles citing this

(truncated to the top 100)

Support vector machines and kernels for computational biology. PLoS Comput Biol (2008) 3.16

Predicting linear B-cell epitopes using string kernels. J Mol Recognit (2008) 2.47

Choosing negative examples for the prediction of protein-protein interactions. BMC Bioinformatics (2006) 2.36

KIRMES: kernel-based identification of regulatory modules in euchromatic sequences. Bioinformatics (2009) 2.35

Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res (2011) 2.02

Protein-ligand interaction prediction: an improved chemogenomics approach. Bioinformatics (2008) 1.98

Accurate splice site prediction using support vector machines. BMC Bioinformatics (2007) 1.76

mGene: accurate SVM-based gene finding with an application to nematode genomes. Genome Res (2009) 1.67

Enhanced regulatory sequence prediction using gapped k-mer features. PLoS Comput Biol (2014) 1.55

POIMs: positional oligomer importance matrices--understanding support vector machine-based signal detectors. Bioinformatics (2008) 1.51

Classification of sporting activities using smartphone accelerometers. Sensors (Basel) (2013) 1.48

Predicting co-complexed protein pairs from heterogeneous data. PLoS Comput Biol (2008) 1.44

DiANNA 1.1: an extension of the DiANNA web server for ternary cysteine classification. Nucleic Acids Res (2006) 1.42

Oligo kernels for datamining on biological sequences: a case study on prokaryotic translation initiation sites. BMC Bioinformatics (2004) 1.42

Discriminating between HuR and TTP binding sites using the k-spectrum kernel method. PLoS One (2017) 1.40

MutPred Splice: machine learning-based prediction of exonic variants that disrupt splicing. Genome Biol (2014) 1.32

Machine learning applications in genetics and genomics. Nat Rev Genet (2015) 1.32

Predicting flexible length linear B-cell epitopes. Comput Syst Bioinformatics Conf (2008) 1.32

A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis. BMC Bioinformatics (2008) 1.25

L2-norm multiple kernel learning and its application to biomedical data fusion. BMC Bioinformatics (2010) 1.24

Mining physical protein-protein interactions from the literature. Genome Biol (2008) 1.23

Predicting and understanding the stability of G-quadruplexes. Bioinformatics (2009) 1.23

kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets. Nucleic Acids Res (2013) 1.20

Efficient alignment-free DNA barcode analytics. BMC Bioinformatics (2009) 1.18

Learning interpretable SVMs for biological sequence classification. BMC Bioinformatics (2006) 1.17

Virtual screening of GPCRs: an in silico chemogenomics approach. BMC Bioinformatics (2008) 1.17

Integrating diverse datasets improves developmental enhancer prediction. PLoS Comput Biol (2014) 1.16

GISMO--gene identification using a support vector machine for ORF classification. Nucleic Acids Res (2006) 1.14

Using distances between Top-n-gram and residue pairs for protein remote homology detection. BMC Bioinformatics (2014) 1.11

Using amino acid physicochemical distance transformation for fast protein remote homology detection. PLoS One (2012) 1.10

Recombination spot identification Based on gapped k-mers. Sci Rep (2016) 1.09

HIV-1 coreceptor usage prediction without multiple alignments: an application of string kernels. Retrovirology (2008) 1.05

Statistical learning of peptide retention behavior in chromatographic separations: a new kernel-based approach for computational proteomics. BMC Bioinformatics (2007) 1.05

On evaluating MHC-II binding peptide prediction methods. PLoS One (2008) 1.04

Text mining for literature review and knowledge discovery in cancer risk assessment and research. PLoS One (2012) 1.01

Subcellular location prediction of proteins using support vector machines with alignment of block sequences utilizing amino acid composition. BMC Bioinformatics (2007) 1.00

Word correlation matrices for protein sequence analysis and remote homology detection. BMC Bioinformatics (2008) 0.97

Gene ontology based transfer learning for protein subcellular localization. BMC Bioinformatics (2011) 0.94

Exploiting physico-chemical properties in string kernels. BMC Bioinformatics (2010) 0.93

Amino acid classification based spectrum kernel fusion for protein subnuclear localization. BMC Bioinformatics (2010) 0.92

Complex networks govern coiled-coil oligomerization--predicting and profiling by means of a machine learning approach. Mol Cell Proteomics (2011) 0.91

DescFold: a web server for protein fold recognition. BMC Bioinformatics (2009) 0.90

MRFalign: protein homology detection through alignment of Markov random fields. PLoS Comput Biol (2014) 0.89

Machine learning for in silico virtual screening and chemical genomics: new strategies. Comb Chem High Throughput Screen (2008) 0.89

Genome-wide identification and predictive modeling of tissue-specific alternative polyadenylation. Bioinformatics (2013) 0.88

SeqGL Identifies Context-Dependent Binding Signals in Genome-Wide Regulatory Element Maps. PLoS Comput Biol (2015) 0.88

Graphlet kernels for prediction of functional residues in protein structures. J Comput Biol (2010) 0.88

Using Weighted Sparse Representation Model Combined with Discrete Cosine Transformation to Predict Protein-Protein Interactions from Protein Sequence. Biomed Res Int (2015) 0.88

Identification of novel DNA repair proteins via primary sequence, secondary structure, and homology. BMC Bioinformatics (2009) 0.87

An empirical study of different approaches for protein classification. ScientificWorldJournal (2014) 0.87

Machine learning for regulatory analysis and transcription factor target prediction in yeast. Syst Synth Biol (2007) 0.86

Classification of protein sequences by means of irredundant patterns. BMC Bioinformatics (2010) 0.85

Automatic detection of exonic splicing enhancers (ESEs) using SVMs. BMC Bioinformatics (2008) 0.84

Building multiclass classifiers for remote homology detection and fold recognition. BMC Bioinformatics (2006) 0.83

Affinity regression predicts the recognition code of nucleic acid-binding proteins. Nat Biotechnol (2015) 0.83

Physicochemical property distributions for accurate and rapid pairwise protein homology detection. BMC Bioinformatics (2010) 0.83

Feature Selection in the Tensor Product Feature Space. Proc IEEE Int Conf Data Min (2009) 0.83

Genome-wide polycomb target gene prediction in Drosophila melanogaster. Nucleic Acids Res (2012) 0.83

Effective automated feature construction and selection for classification of biological sequences. PLoS One (2014) 0.82

Protein sequences classification by means of feature extraction with substitution matrices. BMC Bioinformatics (2010) 0.81

Sequence-based classification using discriminatory motif feature selection. PLoS One (2011) 0.81

NClassG+: A classifier for non-classically secreted Gram-positive bacterial proteins. BMC Bioinformatics (2011) 0.81

ccSVM: correcting Support Vector Machines for confounding factors in biological data classification. Bioinformatics (2011) 0.80

Accelerating the Original Profile Kernel. PLoS One (2013) 0.80

Support vector machine-based mucin-type o-linked glycosylation site prediction using enhanced sequence feature encoding. AMIA Annu Symp Proc (2009) 0.80

Genome-wide chromatin remodeling identified at GC-rich long nucleosome-free regions. PLoS One (2012) 0.80

Multiple instance learning of Calmodulin binding sites. Bioinformatics (2012) 0.80

Estimating evolutionary distances between genomic sequences from spaced-word matches. Algorithms Mol Biol (2015) 0.80

Exploring sequence characteristics related to high-level production of secreted proteins in Aspergillus niger. PLoS One (2012) 0.80

Accurate Prediction of Transposon-Derived piRNAs by Integrating Various Sequential and Physicochemical Features. PLoS One (2016) 0.80

Antigenic heterogeneity of capsid protein VP1 in foot-and-mouth disease virus (FMDV) serotype Asia 1. Adv Appl Bioinform Chem (2013) 0.79

Improved prediction of drug-target interactions using regularized least squares integrating with kernel fusion technique. Anal Chim Acta (2016) 0.79

MHC2SKpan: a novel kernel based approach for pan-specific MHC class II peptide binding prediction. BMC Genomics (2013) 0.79

A coverage criterion for spaced seeds and its applications to support vector machine string kernels and k-mer distances. J Comput Biol (2014) 0.79

A histone arginine methylation localizes to nucleosomes in satellite II and III DNA sequences in the human genome. BMC Genomics (2012) 0.79

Indel-tolerant read mapping with trinucleotide frequencies using cache-oblivious kd-trees. Bioinformatics (2012) 0.79

The limits of de novo DNA motif discovery. PLoS One (2012) 0.79

Prediction of potent shRNAs with a sequential classification algorithm. Nat Biotechnol (2017) 0.79

Classifying transcription factor targets and discovering relevant biological features. Biol Direct (2008) 0.78

SVM2Motif-Reconstructing Overlapping DNA Sequence Motifs by Mimicking an SVM Predictor. PLoS One (2015) 0.78

Efficient feature selection and classification of protein sequence data in bioinformatics. ScientificWorldJournal (2014) 0.78

Prediction of carbohydrate-binding proteins from sequences using support vector machines. Adv Bioinformatics (2010) 0.77

Improving accuracy of protein-protein interaction prediction by considering the converse problem for sequence representation. BMC Bioinformatics (2011) 0.77

Machine learning assisted design of highly active peptides for drug discovery. PLoS Comput Biol (2015) 0.77

TFpredict and SABINE: sequence-based prediction of structural and functional characteristics of transcription factors. PLoS One (2013) 0.77

Predicting the coupling specificity of GPCRs to G-proteins by support vector machines. Genomics Proteomics Bioinformatics (2005) 0.77

A genetic algorithm-based weighted ensemble method for predicting transposon-derived piRNAs. BMC Bioinformatics (2016) 0.77

The distance-profile representation and its application to detection of distantly related protein families. BMC Bioinformatics (2005) 0.77

Characterization and sequence prediction of structural variations in α-helix. BMC Bioinformatics (2011) 0.77

A multiple kernel learning algorithm for drug-target interaction prediction. BMC Bioinformatics (2016) 0.76

Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics. PLoS One (2015) 0.76

Modeling DNA affinity landscape through two-round support vector regression with weighted degree kernels. BMC Syst Biol (2014) 0.76

Mapping the stabilome: a novel computational method for classifying metabolic protein stability. BMC Syst Biol (2012) 0.76

In silico regulatory analysis for exploring human disease progression. Biol Direct (2008) 0.76

Tyrosine Kinase Ligand-Receptor Pair Prediction by Using Support Vector Machine. Adv Bioinformatics (2015) 0.76

Kernel-based logistic regression model for protein sequence without vectorialization. Biostatistics (2014) 0.75

A computational method for designing diverse linear epitopes including citrullinated peptides with desired binding affinities to intravenous immunoglobulin. BMC Bioinformatics (2016) 0.75

Learning virulent proteins from integrated query networks. BMC Bioinformatics (2012) 0.75

Prediction of protein-protein interaction strength using domain features with supervised regression. ScientificWorldJournal (2014) 0.75

Probabilistic inference of biological networks via data integration. Biomed Res Int (2015) 0.75

Articles by these authors

Whole-genome patterns of common DNA variation in three human populations. Science (2005) 21.22

Variance component model to account for sample structure in genome-wide association studies. Nat Genet (2010) 15.52

Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol (2005) 14.29

A comparison of phasing algorithms for trios and unrelated individuals. Am J Hum Genet (2006) 12.45

Efficient control of population structure in model organism association mapping. Genetics (2008) 12.32

Mouse genomic variation and its effect on phenotypes and gene regulation. Nature (2011) 10.66

Quantifying similarity between motifs. Genome Biol (2007) 9.27

Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat Methods (2007) 8.94

FIMO: scanning for occurrences of a given motif. Bioinformatics (2011) 8.89

A sequence-based variation map of 8.27 million SNPs in inbred mouse strains. Nature (2007) 6.77

Haplotype reconstruction from genotype data using Imperfect Phylogeny. Bioinformatics (2004) 6.02

Searching for statistically significant regulatory modules. Bioinformatics (2003) 5.72

Assigning significance to peptides identified by tandem mass spectrometry using decoy databases. J Proteome Res (2007) 5.57

Matrix2png: a utility for visualizing matrix data. Bioinformatics (2003) 5.31

Nucleosome positioning signals in genomic DNA. Genome Res (2007) 4.99

Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nat Methods (2012) 4.89

Analysis of peptide MS/MS spectra from large-scale proteomics experiments using spectrum libraries. Anal Chem (2006) 4.16

A high-resolution association mapping panel for the dissection of complex traits in mice. Genome Res (2010) 3.81

Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res (2012) 3.80

Accurate discovery of expression quantitative trait loci under confounding from spurious and genuine regulatory hotspots. Genetics (2008) 3.66

A statistical framework for genomic data fusion. Bioinformatics (2004) 3.64

The Forkhead transcription factor Hcm1 regulates chromosome segregation genes and fills the S-phase gap in the transcriptional circuitry of the cell cycle. Genes Dev (2006) 3.58

Leveraging the HapMap correlation structure in association studies. Am J Hum Genet (2007) 3.57

Transmembrane topology and signal peptide prediction using dynamic bayesian networks. PLoS Comput Biol (2008) 3.56

Kernel methods for predicting protein-protein interactions. Bioinformatics (2005) 3.52

Exploring gene expression data with class scores. Pac Symp Biocomput (2002) 3.51

Mismatch string kernels for discriminative protein classification. Bioinformatics (2004) 3.27

Comparative analysis of proteome and transcriptome variation in mouse. PLoS Genet (2011) 3.23

Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. Am J Hum Genet (2011) 3.21

Improved linear mixed models for genome-wide association studies. Nat Methods (2012) 3.09

Learning gene functional classifications from multiple data types. J Comput Biol (2002) 2.65

Posterior error probabilities and false discovery rates: two sides of the same coin. J Proteome Res (2007) 2.61

The effect of replication on gene expression microarray experiments. Bioinformatics (2003) 2.49

Whole-genome analysis of Alu repeat elements reveals complex evolutionary history. Genome Res (2004) 2.37

Choosing negative examples for the prediction of protein-protein interactions. BMC Bioinformatics (2006) 2.36

Rapid and accurate multiple testing correction and power estimation for millions of correlated markers. PLoS Genet (2009) 2.30

Sequence and chromatin determinants of cell-type-specific transcription factor binding. Genome Res (2012) 2.24

Genetic control of obesity and gut microbiota composition in response to high-fat, high-sucrose diet in mice. Cell Metab (2013) 2.17

A new algorithm for the evaluation of shotgun peptide sequencing in proteomics: support vector machine classification of peptide MS/MS spectra and SEQUEST scores. J Proteome Res (2003) 2.16

Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships. J Comput Biol (2003) 2.04

Large-scale identification of yeast integral membrane protein interactions. Proc Natl Acad Sci U S A (2005) 2.03

A model-based approach for analysis of spatial structure in genetic data. Nat Genet (2012) 2.02

Optimal algorithms for haplotype assembly from whole-genome sequence data. Bioinformatics (2010) 1.98

Peptide charge state determination for low-resolution tandem mass spectra. Proc IEEE Comput Syst Bioinform Conf (2005) 1.97

Genome-wide analysis reveals novel genes influencing temporal lobe structure with relevance to neurodegeneration in Alzheimer's disease. Neuroimage (2010) 1.94

Efficient marginalization to compute protein posterior probabilities from shotgun mass spectrometry data. J Proteome Res (2010) 1.91

Searching genomes for noncoding RNA using FastR. IEEE/ACM Trans Comput Biol Bioinform (2006) 1.90

Use of shotgun proteomics for the identification, confirmation, and correction of C. elegans gene annotations. Genome Res (2008) 1.90

Learning to predict protein-protein interactions from protein sequences. Bioinformatics (2003) 1.84

Improvements to the percolator algorithm for Peptide identification from shotgun proteomics data sets. J Proteome Res (2009) 1.80

Genome-wide association studies in mice. Nat Rev Genet (2012) 1.75

High-resolution mapping of gene expression using association in an outbred mouse stock. PLoS Genet (2008) 1.72

Fine mapping in 94 inbred mouse strains using a high-density haplotype resource. Genetics (2010) 1.71

Mouse genome-wide association and systems genetics identify Asxl2 as a regulator of bone mineral density and osteoclastogenesis. PLoS Genet (2011) 1.68

Identification and functional validation of the novel antimalarial resistance locus PF10_0355 in Plasmodium falciparum. PLoS Genet (2011) 1.68

Protein ranking: from local to global structure in the protein similarity network. Proc Natl Acad Sci U S A (2004) 1.67

Predicting human nucleosome occupancy from primary sequence. PLoS Comput Biol (2008) 1.66

Support vector machine classification on the web. Bioinformatics (2004) 1.65

Polymorphisms and haplotypes of the regulator of G protein signaling-2 gene in normotensives and hypertensives. Hypertension (2006) 1.61

Analysis of genetic variation in Ashkenazi Jews by high density SNP genotyping. BMC Genet (2008) 1.53

High resolution models of transcription factor-DNA affinities improve in vitro and in vivo binding predictions. PLoS Comput Biol (2010) 1.52

Modeling peptide fragmentation with dynamic Bayesian networks for peptide identification. Bioinformatics (2008) 1.48

Epigenetic priors for identifying active transcription factor binding sites. Bioinformatics (2011) 1.47

Non-parametric estimation of posterior error probabilities associated with peptides identified by tandem mass spectrometry. Bioinformatics (2008) 1.47

Predicting co-complexed protein pairs from heterogeneous data. PLoS Comput Biol (2008) 1.44

Statistical calibration of the SEQUEST XCorr function. J Proteome Res (2009) 1.43

Hybrid mouse diversity panel: a panel of inbred mouse strains suitable for analysis of complex genetic traits. Mamm Genome (2012) 1.43

Semi-supervised protein classification using cluster kernels. Bioinformatics (2005) 1.42

Ranking predicted protein structures with support vector regression. Proteins (2008) 1.41

QVALITY: non-parametric estimation of q-values and posterior error probabilities. Bioinformatics (2009) 1.38

Faster SEQUEST searching for peptide identification from tandem mass spectra. J Proteome Res (2011) 1.33

Catecholamine release-inhibitory peptide catestatin (chromogranin A(352-372)): naturally occurring amino acid variant Gly364Ser causes profound changes in human autonomic activity and alters risk for hypertension. Circulation (2007) 1.32

Gene networks associated with conditional fear in mice identified using a systems genetics approach. BMC Syst Biol (2011) 1.30

The Minnesota Center for Twin and Family Research genome-wide association study. Twin Res Hum Genet (2012) 1.29

Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure. Bioinformatics (2006) 1.27

Genome-wide association mapping with longitudinal data. Genet Epidemiol (2012) 1.27

Identification of novel genes that mediate innate immunity using inbred mice. Genetics (2009) 1.22

An optimal weighted aggregated association test for identification of rare variants involved in common diseases. Genetics (2011) 1.21

Improving tandem mass spectrum identification using peptide retention time prediction across diverse chromatography conditions. Anal Chem (2007) 1.19

Riboproteomics of the hepatitis C virus internal ribosomal entry site. J Proteome Res (2004) 1.19

Interpreting meta-analyses of genome-wide association studies. PLoS Genet (2012) 1.19

Effectively identifying eQTLs from multiple tissues by combining mixed model and meta-analytic approaches. PLoS Genet (2013) 1.18

Consistent probabilistic outputs for protein function prediction. Genome Biol (2008) 1.17

On the assessment of statistical significance of three-dimensional colocalization of sets of genomic elements. Nucleic Acids Res (2012) 1.13

"Good enough solutions" and the genetics of complex diseases. Circ Res (2012) 1.13

Improved similarity scores for comparing motifs. Bioinformatics (2011) 1.10

Detecting cross-linked peptides by searching against a database of cross-linked peptide pairs. J Proteome Res (2010) 1.09

Using network component analysis to dissect regulatory networks mediated by transcription factors in yeast. PLoS Comput Biol (2009) 1.08

Mixed models can correct for population structure for genomic regions under selection. Nat Rev Genet (2013) 1.06

Crux: rapid open source protein tandem mass spectrometry analysis. J Proteome Res (2014) 1.06

The Genomedata format for storing large-scale functional genomics data. Bioinformatics (2010) 1.05

Inference and analysis of haplotypes from combined genotyping studies deposited in dbSNP. Genome Res (2005) 1.05

Imputation aware meta-analysis of genome-wide association studies. Genet Epidemiol (2010) 1.05

On using samples of known protein content to assess the statistical calibration of scores assigned to peptide-spectrum matches in shotgun proteomics. J Proteome Res (2011) 1.03

EMINIM: an adaptive and memory-efficient algorithm for genotype imputation. J Comput Biol (2010) 1.03

Exploratory analysis of genomic segmentations with Segtools. BMC Bioinformatics (2011) 1.02

Genome-wide association mapping of blood cell traits in mice. Mamm Genome (2013) 1.01

Learning kernels from biological networks by maximizing entropy. Bioinformatics (2004) 1.01