The InterPro Database, 2003 brings increased coverage and new features.

PubWeight™: 24.72 | Rank: Top 0.01% | All-Time Top 10000

View Article (PMC 165493)

Published in Nucleic Acids Res on January 01, 2003

Authors

Nicola J Mulder (1), Rolf Apweiler, Teresa K Attwood, Amos Bairoch, Daniel Barrell, Alex Bateman, David Binns, Margaret Biswas, Paul Bradley, Peer Bork, Phillip Bucher, Richard R Copley, Emmanuel Courcelle, Ujjwal Das, Richard Durbin, Laurent Falquet, Wolfgang Fleischmann, Sam Griffiths-Jones, Daniel Haft, Nicola Harte, Nicolas Hulo, Daniel Kahn, Alexander Kanapin, Maria Krestyaninova, Rodrigo Lopez, Ivica Letunic, David Lonsdale, Ville Silventoinen, Sandra E Orchard, Marco Pagni, David Peyruc, Chris P Ponting, Jeremy D Selengut, Florence Servant, Christian J A Sigrist, Robert Vaughan, Evgueni M Zdobnov

Author Affiliations

1: EMBL Outstation-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. mulder@ebi.ac.uk

Articles citing this

(truncated to the top 100)

Human MicroRNA targets. PLoS Biol (2004) 34.51

UniProt: the Universal Protein knowledgebase. Nucleic Acids Res (2004) 29.05

SWISS-MODEL: An automated protein homology-modeling server. Nucleic Acids Res (2003) 25.86

The Universal Protein Resource (UniProt). Nucleic Acids Res (2005) 23.66

PIRSF: family classification system at the Protein Information Resource. Nucleic Acids Res (2004) 19.62

SMART 4.0: towards genomic data integration. Nucleic Acids Res (2004) 19.37

The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res (2004) 18.75

SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res (2004) 15.21

Ensembl 2005. Nucleic Acids Res (2005) 15.13

The PANTHER database of protein families, subfamilies, functions and pathways. Nucleic Acids Res (2005) 13.37

The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. Br J Cancer (2004) 12.35

Recent improvements to the PROSITE database. Nucleic Acids Res (2004) 11.89

Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol (2004) 10.59

MIPS: analysis and annotation of proteins from whole genomes. Nucleic Acids Res (2004) 8.58

The Genomes of Oryza sativa: a history of duplications. PLoS Biol (2005) 7.67

The ProDom database of protein domain families: more emphasis on 3D. Nucleic Acids Res (2005) 7.66

MEROPS: the peptidase database. Nucleic Acids Res (2004) 7.27

The EMBL Nucleotide Sequence Database. Nucleic Acids Res (2005) 7.18

Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biol (2004) 7.17

ArrayExpress--a public repository for microarray gene expression data at the EBI. Nucleic Acids Res (2005) 6.73

The SUPERFAMILY database in 2004: additions and improvements. Nucleic Acids Res (2004) 6.72

The EMBL Nucleotide Sequence Database. Nucleic Acids Res (2004) 6.72

An evaluation of GO annotation retrieval for BioCreAtIvE and GOA. BMC Bioinformatics (2005) 6.58

GeneDB: a resource for prokaryotic and eukaryotic organisms. Nucleic Acids Res (2004) 6.30

The TetR family of transcriptional repressors. Microbiol Mol Biol Rev (2005) 6.02

The Ensembl analysis pipeline. Genome Res (2004) 5.90

GlobPlot: Exploring protein sequences for globularity and disorder. Nucleic Acids Res (2003) 5.90

ProFunc: a server for predicting protein function from 3D structure. Nucleic Acids Res (2005) 5.79

TCDB: the Transporter Classification Database for membrane transport protein analyses and information. Nucleic Acids Res (2006) 5.31

Functional annotation of the Arabidopsis genome using controlled vocabularies. Plant Physiol (2004) 5.17

Genew: the Human Gene Nomenclature Database, 2004 updates. Nucleic Acids Res (2004) 5.09

The European Bioinformatics Institute's data resources: towards systems biology. Nucleic Acids Res (2005) 4.90

BABELOMICS: a suite of web tools for functional annotation and analysis of groups of genes in high-throughput experiments. Nucleic Acids Res (2005) 4.72

A genome-wide survey of human pseudogenes. Genome Res (2003) 4.34

The SWISS-MODEL Repository of annotated three-dimensional protein structure homology models. Nucleic Acids Res (2004) 4.30

E-MSD: an integrated data resource for bioinformatics. Nucleic Acids Res (2005) 4.25

Complete reannotation of the Arabidopsis genome: methods, tools, protocols and the final release. BMC Biol (2005) 4.18

Integr8 and Genome Reviews: integrated views of complete genomes and proteomes. Nucleic Acids Res (2005) 4.15

GeneSilico protein structure prediction meta-server. Nucleic Acids Res (2003) 3.98

The Institute for Genomic Research Osa1 rice genome annotation database. Plant Physiol (2005) 3.96

SOAP-based services provided by the European Bioinformatics Institute. Nucleic Acids Res (2005) 3.93

Complete genome sequence of the hyperthermophilic archaeon Thermococcus kodakaraensis KOD1 and comparison with Pyrococcus genomes. Genome Res (2005) 3.68

The SOL Genomics Network: a comparative resource for Solanaceae biology and beyond. Plant Physiol (2005) 3.65

A first-draft human protein-interaction map. Genome Biol (2004) 3.64

Benchmarking ortholog identification methods using functional genomics data. Genome Biol (2006) 3.54

Bacterial signal transduction network in a genomic perspective. Environ Microbiol (2004) 3.45

E-MSD: an integrated data resource for bioinformatics. Nucleic Acids Res (2004) 3.43

Complete genome sequence of the prototype lactic acid bacterium Lactococcus lactis subsp. cremoris MG1363. J Bacteriol (2007) 3.43

The aMAZE LightBench: a web interface to a relational database of cellular processes. Nucleic Acids Res (2004) 3.39

Tcoffee@igs: A web server for computing, evaluating and combining multiple sequence alignments. Nucleic Acids Res (2003) 3.35

A gateway-compatible yeast one-hybrid system. Genome Res (2004) 3.23

Insights into genome plasticity and pathogenicity of the plant pathogenic bacterium Xanthomonas campestris pv. vesicatoria revealed by the complete genome sequence. J Bacteriol (2005) 3.12

Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes. BMC Genomics (2004) 3.08

PIRSF family classification system for protein functional and evolutionary analysis. Evol Bioinform Online (2007) 3.03

Speeding disease gene discovery by sequence based candidate prioritization. BMC Bioinformatics (2005) 2.99

POCUS: mining genomic sequence annotation to predict disease genes. Genome Biol (2003) 2.93

UTRdb and UTRsite: a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs. Nucleic Acids Res (2005) 2.80

ADDA: a domain database with global coverage of the protein universe. Nucleic Acids Res (2005) 2.65

Comparative genomics of bacterial zinc regulons: enhanced ion transport, pathogenesis, and rearrangement of ribosomal proteins. Proc Natl Acad Sci U S A (2003) 2.61

OntoBlast function: From sequence similarities directly to potential functional annotations by ontology terms. Nucleic Acids Res (2003) 2.57

3DCoffee@igs: a web server for combining sequences and structures into a multiple sequence alignment. Nucleic Acids Res (2004) 2.50

SilkDB: a knowledgebase for silkworm biology and genomics. Nucleic Acids Res (2005) 2.50

High-throughput fluorescent tagging of full-length Arabidopsis gene products in planta. Plant Physiol (2004) 2.48

NRPS-PKS: a knowledge-based resource for analysis of NRPS/PKS megasynthases. Nucleic Acids Res (2004) 2.47

E-MSD: improving data deposition and structure quality. Nucleic Acids Res (2006) 2.41

Analyses of expressed sequence tags from apple. Plant Physiol (2006) 2.32

Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids. PLoS One (2007) 2.28

Prediction of RNA binding sites in proteins from amino acid sequence. RNA (2006) 2.28

Recombination and insertion events involving the botulinum neurotoxin complex genes in Clostridium botulinum types A, B, E and F and Clostridium butyricum type E strains. BMC Biol (2009) 2.24

A comprehensive transcript index of the human genome generated using microarrays and computational approaches. Genome Biol (2004) 2.21

The genome sequence of Salmonella enterica serovar Choleraesuis, a highly invasive and resistant zoonotic pathogen. Nucleic Acids Res (2005) 2.21

Evolutionary conservation of regulated longevity assurance mechanisms. Genome Biol (2007) 2.19

Public web-based services from the European Bioinformatics Institute. Nucleic Acids Res (2004) 2.15

prot4EST: translating expressed sequence tags from neglected genomes. BMC Bioinformatics (2004) 2.07

ECgene: genome annotation for alternative splicing. Nucleic Acids Res (2005) 2.06

GEPAS, an experiment-oriented pipeline for the analysis of microarray gene expression data. Nucleic Acids Res (2005) 1.97

Automated prediction of protein function and detection of functional sites from structure. Proc Natl Acad Sci U S A (2004) 1.96

Comparative analysis of protein domain organization. Genome Res (2004) 1.94

Global profiling of Shewanella oneidensis MR-1: expression of hypothetical genes and improved functional annotations. Proc Natl Acad Sci U S A (2005) 1.93

NOPdb: Nucleolar Proteome Database--2008 update. Nucleic Acids Res (2008) 1.93

Automated selection of positions determining functional specificity of proteins by comparative analysis of orthologous groups in protein families. Protein Sci (2004) 1.91

New challenges in gene expression data analysis and the extended GEPAS. Nucleic Acids Res (2004) 1.89

Genome-wide detection and analysis of cell wall-bound proteins with LPxTG-like sorting motifs. J Bacteriol (2005) 1.75

Transcriptome analysis of mouse stem cells and early embryos. PLoS Biol (2003) 1.75

ProtoNet 4.0: a hierarchical classification of one million protein sequences. Nucleic Acids Res (2005) 1.74

PupaSNP Finder: a web tool for finding SNPs with putative effect at transcriptional level. Nucleic Acids Res (2004) 1.74

The WD-repeat protein superfamily in Arabidopsis: conservation and divergence in structure and function. BMC Genomics (2003) 1.73

Antisense transcripts with rice full-length cDNAs. Genome Biol (2003) 1.69

NOPdb: Nucleolar Proteome Database. Nucleic Acids Res (2006) 1.68

Computational identification and characterization of novel genes from legumes. Plant Physiol (2004) 1.67

BacMap: an interactive picture atlas of annotated bacterial genomes. Nucleic Acids Res (2005) 1.65

Genomic and proteomic analysis of thirty-nine structural proteins of shrimp white spot syndrome virus. J Virol (2004) 1.65

MyHits: a new interactive resource for protein annotation and domain identification. Nucleic Acids Res (2004) 1.64

HUGE: a database for human KIAA proteins, a 2004 update integrating HUGEppi and ROUGE. Nucleic Acids Res (2004) 1.62

AnoEST: toward A. gambiae functional genomics. Genome Res (2005) 1.58

PANDIT: an evolution-centric database of protein and associated nucleotide domains with inferred trees. Nucleic Acids Res (2006) 1.58

Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy. Genome Biol (2008) 1.54

Chlamydomonas reinhardtii at the crossroads of genomics. Eukaryot Cell (2003) 1.52

ESTHER, the database of the alpha/beta-hydrolase fold superfamily of proteins. Nucleic Acids Res (2004) 1.52

The SYSTERS Protein Family Database in 2005. Nucleic Acids Res (2005) 1.49

Articles cited by this

Initial sequencing and analysis of the human genome. Nature (2001) 212.86

The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res (2000) 67.44

Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol (2001) 66.87

SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis (1997) 57.27

Creating the gene ontology resource: design and implementation. Genome Res (2001) 55.73

The Pfam protein families database. Nucleic Acids Res (2002) 51.34

InterProScan--an integration platform for the signature-recognition methods in InterPro. Bioinformatics (2001) 28.35

Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Res (2002) 25.06

The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res (2001) 24.45

SRS: information retrieval system for molecular biology data banks. Methods Enzymol (1996) 24.30

TIGRFAMs: a protein family resource for the functional identification of proteins. Nucleic Acids Res (2001) 20.84

ProDom and ProDom-CG: tools for protein domain analysis and whole genome comparisons. Nucleic Acids Res (2000) 18.27

SCOP database in 2002: refinements accommodate structural genomics. Nucleic Acids Res (2002) 18.11

The PROSITE database, its status in 2002. Nucleic Acids Res (2002) 14.71

A neural network method for identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Int J Neural Syst (1999) 7.40

PRINTS and PRINTS-S shed light on protein ancestry. Nucleic Acids Res (2002) 4.75

iProClass: an integrated, comprehensive and annotated protein classification database. Nucleic Acids Res (2001) 4.33

Systematic identification of novel protein domain families associated with nuclear functions. Genome Res (2002) 3.50

Applications of InterPro in protein annotation and genome analysis. Brief Bioinform (2002) 3.03

The CATH extended protein-family database: providing structural annotations for genome sequences. Protein Sci (2002) 2.69

Browsing protein families via the 'Rich Family Description' format. Bioinformatics (1999) 1.85

Articles by these authors

The Sequence Alignment/Map format and SAMtools. Bioinformatics (2009) 232.39

Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (2009) 190.94

Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res (2008) 157.44

Initial sequencing and comparative analysis of the mouse genome. Nature (2002) 96.15

Accurate whole human genome sequencing using reversible terminator chemistry. Nature (2008) 90.20

A method and server for predicting damaging missense mutations. Nat Methods (2010) 78.53

The Pfam protein families database. Nucleic Acids Res (2004) 56.46

The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res (2003) 52.80

Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics (2010) 52.01

The Pfam protein families database. Nucleic Acids Res (2002) 51.34

Human non-synonymous SNPs: server and survey. Nucleic Acids Res (2002) 50.45

The diploid genome sequence of an Asian individual. Nature (2008) 46.29

Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature (2002) 45.19

Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat Genet (2008) 43.63

A human gut microbial gene catalogue established by metagenomic sequencing. Nature (2010) 43.63

miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res (2006) 39.25

Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res (2003) 38.75

The Pfam protein families database. Nucleic Acids Res (2009) 37.98

Genome sequence of the human malaria parasite Plasmodium falciparum. Nature (2002) 37.89

Pfam: clans, web tools and services. Nucleic Acids Res (2006) 34.83

The Pfam protein families database. Nucleic Acids Res (2011) 33.46

The Pfam protein families database. Nucleic Acids Res (2007) 30.53

UniProt: the Universal Protein knowledgebase. Nucleic Acids Res (2004) 29.05

Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol (2011) 28.61

Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature (2003) 26.58

The variant call format and VCFtools. Bioinformatics (2011) 25.88

Comparative metagenomics of microbial communities. Science (2005) 25.88

miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res (2010) 25.85

Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res (2005) 25.49

InterPro: the integrative protein signature database. Nucleic Acids Res (2008) 25.07

Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Res (2002) 25.06

Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature (2004) 24.40

Enterotypes of the human gut microbiome. Nature (2011) 24.36

Comparative assessment of large-scale data sets of protein-protein interactions. Nature (2002) 24.25

The Universal Protein Resource (UniProt). Nucleic Acids Res (2005) 23.66

Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature (2005) 23.04

Rfam: an RNA family database. Nucleic Acids Res (2003) 22.93

The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res (2006) 22.70

Pfam: the protein families database. Nucleic Acids Res (2013) 22.48

A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nat Biotechnol (2008) 21.72

Proteome survey reveals modularity of the yeast cell machinery. Nature (2006) 20.77

STRING 8--a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res (2008) 20.62

The genome sequence of the malaria mosquito Anopheles gambiae. Science (2002) 20.36

A uniform system for microRNA annotation. RNA (2003) 20.28

SMART 4.0: towards genomic data integration. Nucleic Acids Res (2004) 19.37

GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res (2012) 19.19

The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res (2004) 18.75

The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res (2010) 18.73

STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res (2012) 18.26

The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol (2005) 18.20

Evolution of genes and genomes on the Drosophila phylogeny. Nature (2007) 18.01

GeneWise and Genomewise. Genome Res (2004) 17.87

InterPro, progress and status in 2005. Nucleic Acids Res (2005) 17.53