Sequence-based feature prediction and annotation of proteins.

PubWeight™: 1.28 | Rank: Top 10%

Article: PMC 2688272

Published in Genome Biol on February 02, 2009

Authors

Agnieszka S Juncker (1), Lars J Jensen, Andrea Pierleoni, Andreas Bernsel, Michael L Tress, Peer Bork, Gunnar von Heijne, Alfonso Valencia, Christos A Ouzounis, Rita Casadio, Søren Brunak

Author Affiliations

1: Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Lyngby, Denmark.

Articles citing this

The what, where, how and why of gene ontology--a primer for bioinformaticians. Brief Bioinform (2011) 1.61

Using bioinformatics to predict the functional impact of SNVs. Bioinformatics (2010) 1.49

Sma3s: a three-step modular annotator for large sequence datasets. DNA Res (2014) 1.23

FINDSITE-metal: integrating evolutionary information and machine learning for structure-based metal-binding site prediction at the proteome level. Proteins (2010) 1.00

Annotations for all by all - the BioSapiens network. Genome Biol (2009) 1.00

FFPred 2.0: improved homology-independent prediction of gene ontology terms for eukaryotic protein sequences. PLoS One (2013) 0.98

From protein sequences to 3D-structures and beyond: the example of the UniProt knowledgebase. Cell Mol Life Sci (2009) 0.95

Ligand-binding site prediction of proteins based on known fragment-fragment interactions. Bioinformatics (2010) 0.91

Structure-based function discovery of an enzyme for the hydrolysis of phosphorylated sugar lactones. Biochemistry (2012) 0.88

Predicting protein-ATP binding sites from primary sequence through fusing bi-profile sampling of multi-view features. BMC Bioinformatics (2012) 0.87

eFindSite: improved prediction of ligand binding sites in protein models using meta-threading, machine learning and auxiliary ligands. J Comput Aided Mol Des (2013) 0.86

Hierarchical ensemble methods for protein function prediction. ISRN Bioinform (2014) 0.84

A new approach to assess and predict the functional roles of proteins across all known structures. J Struct Funct Genomics (2011) 0.83

Predicting DNA-binding specificities of eukaryotic transcription factors. PLoS One (2010) 0.79

Accuracy of functional surfaces on comparatively modeled protein structures. J Struct Funct Genomics (2011) 0.78

The utility of geometrical and chemical restraint information extracted from predicted ligand-binding sites in protein structure refinement. J Struct Biol (2010) 0.78

Optimization of gene set annotations via entropy minimization over variable clusters (EMVC). Bioinformatics (2014) 0.77

Protein identification problem from a Bayesian point of view. Stat Interface (2012) 0.76

Seqenv: linking sequences to environments through text mining. PeerJ (2016) 0.75

Exploring the "dark matter" of a mammalian proteome by protein structure and function modeling. Proteome Sci (2013) 0.75

Relationship between Metabolic Fluxes and Sequence-Derived Properties of Enzymes. Int Sch Res Notices (2014) 0.75

Joint probabilistic-logical refinement of multiple protein feature predictors. BMC Bioinformatics (2014) 0.75

Developing eThread pipeline using SAGA-pilot abstraction for large-scale structural bioinformatics. Biomed Res Int (2014) 0.75

A method to predict edge strands in beta-sheets from protein sequences. Comput Struct Biotechnol J (2013) 0.75

Articles cited by this

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 665.31

Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet (2000) 336.52

Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol (2001) 66.87

Improved prediction of signal peptides: SignalP 3.0. J Mol Biol (2004) 48.40

Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng (1997) 38.38

Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol (1999) 33.07

The Pfam protein families database. Nucleic Acids Res (2007) 30.53

Patterns of amino acids near signal-sequence cleavage sites. Eur J Biochem (1983) 26.68

Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc (2007) 19.50

Scansite 2.0: Proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res (2003) 15.74

Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J Mol Biol (1999) 15.63

The HMMTOP transmembrane topology prediction server. Bioinformatics (2001) 12.57

STRING 7--recent developments in the integration and prediction of protein interactions. Nucleic Acids Res (2006) 12.16

Membrane protein structure prediction. Hydrophobicity analysis and the positive-inside rule. J Mol Biol (1992) 11.97

Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics (2000) 11.75

WoLF PSORT: protein localization predictor. Nucleic Acids Res (2007) 10.14

Enlarged representative set of protein structures. Protein Sci (1994) 7.98

Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics (2004) 7.76

The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res (2004) 7.21

Systematic discovery of in vivo phosphorylation networks. Cell (2007) 6.94

Evolution of function in protein superfamilies, from a structural perspective. J Mol Biol (2001) 6.63

Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites. Glycobiology (2004) 6.13

Mechanisms of specificity in protein phosphorylation. Nat Rev Mol Cell Biol (2007) 6.11

Improving the accuracy of transmembrane protein topology prediction using evolutionary information. Bioinformatics (2007) 5.68

Trafficking and signaling by fatty-acylated and prenylated proteins. Nat Chem Biol (2006) 4.11

History of the enzyme nomenclature system. Bioinformatics (2000) 3.90

Practical limits of function prediction. Proteins (2000) 3.81

Linear motif atlas for phosphorylation-dependent signaling. Sci Signal (2008) 3.77

Best alpha-helical transmembrane protein topology predictions are achieved using hidden Markov models and evolutionary information. Protein Sci (2004) 3.40

Reliability measures for membrane protein topology prediction algorithms. J Mol Biol (2003) 3.13

An HMM posterior decoder for sequence feature prediction that includes homology information. Bioinformatics (2005) 3.05

Inference of protein function from protein structure. Structure (2005) 2.90

ConFunc--functional annotation in the twilight zone. Bioinformatics (2008) 2.57

Prediction of human protein function according to Gene Ontology categories. Bioinformatics (2003) 2.54

Enhanced automated function prediction using distantly related sequences and contextual association by PFP. Protein Sci (2006) 2.45

Sequence conserved for subcellular localization. Protein Sci (2002) 2.36

The Protein Feature Ontology: a tool for the unification of protein feature annotations. Bioinformatics (2008) 2.33

BaCelLo: a balanced subcellular localization predictor. Bioinformatics (2006) 2.14

Prediction of human protein function from post-translational modifications and localization features. J Mol Biol (2002) 2.05

firestar--prediction of functionally important residues using structural templates and alignment reliability. Nucleic Acids Res (2007) 1.91

PredGPI: a GPI-anchor predictor. BMC Bioinformatics (2008) 1.86

Classification schemes for protein structure and function. Nat Rev Genet (2003) 1.52

Predicting biological networks from genomic data. FEBS Lett (2008) 1.51

FragAnchor: a large-scale predictor of glycosylphosphatidylinositol anchors in eukaryote protein sequences by qualitative scoring. Genomics Proteomics Bioinformatics (2007) 1.49

EUCLID: automatic classification of proteins in functional classes by their database annotations. Bioinformatics (1998) 1.49

FFPred: an integrated feature-based function prediction server for vertebrate proteomes. Nucleic Acids Res (2008) 1.44

CSS-Palm: palmitoylation site prediction with a clustering and scoring strategy (CSS). Bioinformatics (2006) 1.42

Defining a similarity threshold for a functional protein sequence pattern: the signal peptide cleavage site. Proteins (1996) 1.31

NetCGlyc 1.0: prediction of mammalian C-mannosylation sites. Glycobiology (2007) 1.30

PONGO: a web server for multiple predictions of all-alpha transmembrane proteins. Nucleic Acids Res (2006) 1.29

Structural classification and prediction of reentrant regions in alpha-helical transmembrane proteins: application to complete genomes. J Mol Biol (2006) 1.27

Meta-prediction of phosphorylation sites with weighted voting and restricted grid search parameter selection. Nucleic Acids Res (2008) 1.20

Prediction of enzyme classification from protein sequence without the use of sequence similarity. Proc Int Conf Intell Syst Mol Biol (1997) 1.13

Probabilistic annotation of protein sequences based on functional classifications. BMC Bioinformatics (2005) 1.12

Posttranslational modifications and subcellular localization signals: indicators of sequence regions without inherent 3D structure? Curr Protein Pept Sci (2007) 1.08

A combinatorial pattern discovery approach for the prediction of membrane dipping (re-entrant) loops. Bioinformatics (2006) 1.02

Systems for categorizing functions of gene products. Curr Opin Struct Biol (1998) 0.97

CORRIE: enzyme sequence annotation with confidence estimates. BMC Bioinformatics (2007) 0.87

Articles by these authors

Initial sequencing and comparative analysis of the mouse genome. Nature (2002) 96.15

A method and server for predicting damaging missense mutations. Nat Methods (2010) 78.53

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature (2007) 75.09

Human non-synonymous SNPs: server and survey. Nucleic Acids Res (2002) 50.45

Improved prediction of signal peptides: SignalP 3.0. J Mol Biol (2004) 48.40

Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature (2002) 45.19

A human gut microbial gene catalogue established by metagenomic sequencing. Nature (2010) 43.63

SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods (2011) 33.90

Comparative metagenomics of microbial communities. Science (2005) 25.88

InterPro: the integrative protein signature database. Nucleic Acids Res (2008) 25.07

Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Res (2002) 25.06

The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res (2003) 24.72

Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature (2004) 24.40

Enterotypes of the human gut microbiome. Nature (2011) 24.36

Comparative assessment of large-scale data sets of protein-protein interactions. Nature (2002) 24.25

Proteome survey reveals modularity of the yeast cell machinery. Nature (2006) 20.77

STRING 8--a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res (2008) 20.62

The genome sequence of the malaria mosquito Anopheles gambiae. Science (2002) 20.36

International network of cancer genome projects. Nature (2010) 20.35

Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc (2007) 19.50

SMART 4.0: towards genomic data integration. Nucleic Acids Res (2004) 19.37

GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res (2012) 19.19

The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res (2010) 18.73

STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res (2012) 18.26

InterPro, progress and status in 2005. Nucleic Acids Res (2005) 17.53

SMART 5: domains in the context of genomes and networks. Nucleic Acids Res (2006) 17.13

The HUPO PSI's molecular interaction format--a community standard for the representation of protein interaction data. Nat Biotechnol (2004) 16.08

IntAct: an open source molecular interaction database. Nucleic Acids Res (2004) 15.02

Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics (2006) 14.96

Toward automatic reconstruction of a highly resolved tree of life. Science (2006) 14.96

Overview of BioCreAtIvE: critical assessment of information extraction for biology. BMC Bioinformatics (2005) 14.50

InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res (2011) 13.45

Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia. Nature (2011) 13.18

New developments in the InterPro database. Nucleic Acids Res (2007) 12.49

STRING 7--recent developments in the integration and prediction of protein interactions. Nucleic Acids Res (2006) 12.16

Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res (2011) 10.82

STRING: known and predicted protein-protein associations, integrated and transferred across organisms. Nucleic Acids Res (2005) 10.44

A human phenome-interactome network of protein complexes implicated in genetic disorders. Nat Biotechnol (2007) 9.90

SMART 6: recent updates and new developments. Nucleic Acids Res (2008) 9.80

STRING: a database of predicted functional associations between proteins. Nucleic Acids Res (2003) 9.45

Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster. Science (2002) 9.43

Drug target identification using side-effect similarity. Science (2008) 9.24

SMART 7: recent updates to the protein domain annotation resource. Nucleic Acids Res (2011) 9.15

Bioinformatics in the post-sequence era. Nat Genet (2003) 8.83

mRNA degradation by miRNAs and GW182 requires both CCR4:NOT deadenylase and DCP1:DCP2 decapping complexes. Genes Dev (2006) 8.78

Quantitative phosphoproteomics reveals widespread full phosphorylation site occupancy during mitosis. Sci Signal (2010) 8.61

PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res (2006) 8.36

Literature mining for the biologist: from information retrieval to biological discovery. Nat Rev Genet (2006) 8.23

Protein disorder prediction: implications for structural proteomics. Structure (2003) 7.93

Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics (2004) 7.76

Recognition of transmembrane helices by the endoplasmic reticulum translocon. Nature (2005) 7.72

Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat Biotechnol (2011) 7.53

Ancient human genome sequence of an extinct Palaeo-Eskimo. Nature (2010) 7.51

Alternative splicing and genome complexity. Nat Genet (2001) 7.30

The genome sequence of Bifidobacterium longum reflects its adaptation to the human gastrointestinal tract. Proc Natl Acad Sci U S A (2002) 7.21

Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res (2005) 7.13

A new non-linear normalization method for reducing variability in DNA microarray experiments. Genome Biol (2002) 7.12

Systematic discovery of in vivo phosphorylation networks. Cell (2007) 6.94

Richness of human gut microbiome correlates with metabolic markers. Nature (2013) 6.93

ELM server: A new resource for investigating short functional sites in modular eukaryotic proteins. Nucleic Acids Res (2003) 6.86

Prediction of lipoprotein signal peptides in Gram-negative bacteria. Protein Sci (2003) 6.85

A side effect resource to capture phenotypic effects of drugs. Mol Syst Biol (2010) 6.75

The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature (2008) 6.69

The ecoresponsive genome of Daphnia pulex. Science (2011) 6.55

The genome of the model beetle and pest Tribolium castaneum. Nature (2008) 6.50

Exome sequencing identifies recurrent mutations of the splicing factor SF3B1 gene in chronic lymphocytic leukemia. Nat Genet (2011) 6.43

A gene network for navigating the literature. Nat Genet (2004) 6.43

Overview of the protein-protein interaction annotation extraction task of BioCreative II. Genome Biol (2008) 6.38

Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites. Glycobiology (2004) 6.13

Evaluation of BioCreAtIvE assessment of task 2. BMC Bioinformatics (2005) 6.02

Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci (2003) 5.94

Association of genes to genetically inherited diseases using data mining. Nat Genet (2002) 5.78

Functional and evolutionary insights from the genomes of three parasitoid Nasonia species. Science (2010) 5.56

Immunity-related genes and gene families in Anopheles gambiae. Science (2002) 5.47

Analysis and prediction of leucine-rich nuclear export signals. Protein Eng Des Sel (2004) 5.15

Dynamic complex formation during the yeast cell cycle. Science (2005) 5.11

Reductive genome evolution in Buchnera aphidicola. Proc Natl Acad Sci U S A (2003) 4.88

STITCH: interaction networks of chemicals and proteins. Nucleic Acids Res (2007) 4.88

Genomes in flux: the evolution of archaeal and proteobacterial gene content. Genome Res (2002) 4.85

eggNOG: automated construction and annotation of orthologous groups of genes. Nucleic Acids Res (2007) 4.84

An Aboriginal Australian genome reveals separate human dispersals into Asia. Science (2011) 4.84

Transcriptome complexity in a genome-reduced bacterium. Science (2009) 4.64

A large-scale evaluation of computational protein function prediction. Nat Methods (2013) 4.61

Molecular code for transmembrane-helix recognition by the Sec61 translocon. Nature (2007) 4.60