The Pfam protein families database.

PubWeight™: 42.28 | Rank: Top 0.01% | All-Time Top 1000

Article: PMC 102420

Published in Nucleic Acids Res on January 01, 2000

Authors

A Bateman (1), E Birney, R Durbin, S R Eddy, K L Howe, E L Sonnhammer

Author Affiliations

1: The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. agb@sanger.ac.uk

Articles citing this

(truncated to the top 100)

Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet (2000) 336.52

The Pfam protein families database. Nucleic Acids Res (2004) 56.46

The Pfam protein families database. Nucleic Acids Res (2009) 37.98

The KEGG databases at GenomeNet. Nucleic Acids Res (2002) 33.68

An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res (2002) 25.81

The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res (2001) 24.45

Pfam: the protein families database. Nucleic Acids Res (2013) 22.48

TIGRFAMs: a protein family resource for the functional identification of proteins. Nucleic Acids Res (2001) 20.84

Database resources of the National Center for Biotechnology Information: 2002 update. Nucleic Acids Res (2002) 19.40

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2001) 19.13

CDD: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Res (2002) 18.54

SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res (2000) 17.77

The Protein Information Resource: an integrated public resource of functional annotation of proteins. Nucleic Acids Res (2002) 12.20

A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes. PLoS Comput Biol (2005) 11.82

Genome annotation assessment in Drosophila melanogaster. Genome Res (2000) 11.77

Complete genome sequence of an M1 strain of Streptococcus pyogenes. Proc Natl Acad Sci U S A (2001) 9.40

Using GeneWise in the Drosophila annotation experiment. Genome Res (2000) 7.50

The genome sequence of Bifidobacterium longum reflects its adaptation to the human gastrointestinal tract. Proc Natl Acad Sci U S A (2002) 7.21

BAliBASE (Benchmark Alignment dataBASE): enhancements for repeats, transmembrane sequences and circular permutations. Nucleic Acids Res (2001) 6.67

Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res (2002) 6.36

The TetR family of transcriptional repressors. Microbiol Mol Biol Rev (2005) 6.02

Genome sequence of Halobacterium species NRC-1. Proc Natl Acad Sci U S A (2000) 5.87

Complete genome sequence of Caulobacter crescentus. Proc Natl Acad Sci U S A (2001) 5.58

The Saccharomyces cerevisiae Set1 complex includes an Ash2 homologue and methylates histone 3 lysine 4. EMBO J (2001) 5.34

Complete genome sequence and comparative genomic analysis of an emerging human pathogen, serotype V Streptococcus agalactiae. Proc Natl Acad Sci U S A (2002) 5.28

The Brucella suis genome reveals fundamental similarities between animal and plant pathogens and symbionts. Proc Natl Acad Sci U S A (2002) 5.28

PRINTS and PRINTS-S shed light on protein ancestry. Nucleic Acids Res (2002) 4.75

Phylogenetic relationships within cation transporter families of Arabidopsis. Plant Physiol (2001) 4.74

CluSTr: a database of clusters of SWISS-PROT+TrEMBL proteins. Nucleic Acids Res (2001) 4.59

The genome of M. acetivorans reveals extensive metabolic and physiological diversity. Genome Res (2002) 4.58

Genome sequence of an industrial microorganism Streptomyces avermitilis: deducing the ability of producing secondary metabolites. Proc Natl Acad Sci U S A (2001) 4.56

RIO: analyzing proteomes by automated phylogenomics using resampled inference of orthologs. BMC Bioinformatics (2002) 4.49

iProClass: an integrated, comprehensive and annotated protein classification database. Nucleic Acids Res (2001) 4.33

Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes. Nucleic Acids Res (2001) 4.29

Complete genome sequence of the Q-fever pathogen Coxiella burnetii. Proc Natl Acad Sci U S A (2003) 4.20

Structural flexibility in the Burkholderia mallei genome. Proc Natl Acad Sci U S A (2004) 4.13

Predicting gene ontology functions from ProDom and CDD protein domains. Genome Res (2002) 4.06

A comparison of profile hidden Markov model procedures for remote homology detection. Nucleic Acids Res (2002) 3.76

Functional analysis of genes for biosynthesis of pyocyanin and phenazine-1-carboxamide from Pseudomonas aeruginosa PAO1. J Bacteriol (2001) 3.58

Systematic identification of novel protein domain families associated with nuclear functions. Genome Res (2002) 3.50

Insertional mutagenesis of genes required for seed development in Arabidopsis thaliana. Genetics (2001) 3.49

Genome sequence of Avery's virulent serotype 2 strain D39 of Streptococcus pneumoniae and comparison with that of unencapsulated laboratory strain R6. J Bacteriol (2006) 3.40

Pathway to synthesis and processing of mycolic acids in Mycobacterium tuberculosis. Clin Microbiol Rev (2005) 3.39

Clustering patterns of cytotoxic T-lymphocyte epitopes in human immunodeficiency virus type 1 (HIV-1) proteins reveal imprints of immune evasion on HIV-1 global variation. J Virol (2002) 3.38

Evolution of sensory complexity recorded in a myxobacterial genome. Proc Natl Acad Sci U S A (2006) 3.35

SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments. Nucleic Acids Res (2002) 3.31

Proteome Analysis Database: online application of InterPro and CluSTr for the functional classification of proteins in whole genomes. Nucleic Acids Res (2001) 3.26

Annotating the human proteome: the Human Proteome Survey Database (HumanPSD) and an in-depth target database for G protein-coupled receptors (GPCR-PD) from Incyte Genomics. Nucleic Acids Res (2002) 3.14

Crystal structure of yeast initiation factor 4A, a DEAD-box RNA helicase. Proc Natl Acad Sci U S A (2000) 3.01

Protein Information Resource: a community resource for expert annotation of protein data. Nucleic Acids Res (2001) 3.01

Interrogating protein interaction networks through structural biology. Proc Natl Acad Sci U S A (2002) 2.93

Proteome-scale purification of human proteins from bacteria. Proc Natl Acad Sci U S A (2002) 2.88

The complete genome sequence of the murine respiratory pathogen Mycoplasma pulmonis. Nucleic Acids Res (2001) 2.75

Genome sequence of Haloarcula marismortui: a halophilic archaeon from the Dead Sea. Genome Res (2004) 2.75

Genomic characterization of non-O1, non-O139 Vibrio cholerae reveals genes for a type III secretion system. Proc Natl Acad Sci U S A (2005) 2.65

Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes. Proc Natl Acad Sci U S A (2006) 2.61

Comparison of the genome of the oral pathogen Treponema denticola with other spirochete genomes. Proc Natl Acad Sci U S A (2004) 2.60

Nucleotide sequence and predicted functions of the entire Sinorhizobium meliloti pSymA megaplasmid. Proc Natl Acad Sci U S A (2001) 2.59

DBC2, a candidate for a tumor suppressor gene involved in breast cancer. Proc Natl Acad Sci U S A (2002) 2.58

A human aminoacyl-tRNA synthetase as a regulator of angiogenesis. Proc Natl Acad Sci U S A (2002) 2.58

Full-length RecE enhances linear-linear homologous recombination and facilitates direct cloning for bioprospecting. Nat Biotechnol (2012) 2.54

The coding of temperature in the Drosophila brain. Cell (2011) 2.53

Mutator-like elements in Arabidopsis thaliana. Structure, diversity and evolution. Genetics (2000) 2.52

ApiDB: integrated resources for the apicomplexan bioinformatics resource center. Nucleic Acids Res (2006) 2.37

DBTBS: a database of Bacillus subtilis promoters and transcription factors. Nucleic Acids Res (2001) 2.35

A conserved coatomer-related complex containing Sec13 and Seh1 dynamically associates with the vacuole in Saccharomyces cerevisiae. Mol Cell Proteomics (2011) 2.34

Bacterial genome adaptation to niches: divergence of the potential virulence genes in three Burkholderia species of different survival strategies. BMC Genomics (2005) 2.34

Noncoding RNA genes identified in AT-rich hyperthermophiles. Proc Natl Acad Sci U S A (2002) 2.33

Neuropeptides and neuropeptide receptors in the Drosophila melanogaster genome. Genome Res (2001) 2.28

Identification of 113 conserved essential genes using a high-throughput gene disruption system in Streptococcus pneumoniae. Nucleic Acids Res (2002) 2.28

Rapid protein domain assignment from amino acid sequence using predicted secondary structure. Protein Sci (2002) 2.20

Genetic variation at the O-antigen biosynthetic locus in Pseudomonas aeruginosa. J Bacteriol (2002) 2.13

An annotated catalogue of salivary gland transcripts in the adult female mosquito, Aedes aegypti. BMC Genomics (2007) 2.06

Bacterial regulatory networks are extremely flexible in evolution. Nucleic Acids Res (2006) 2.04

SHORT INTEGUMENTS1/SUSPENSOR1/CARPEL FACTORY, a Dicer homolog, is a maternal effect gene required for embryo development in Arabidopsis. Plant Physiol (2002) 2.04

A comparative genome analysis identifies distinct sorting pathways in gram-positive bacteria. Infect Immun (2004) 2.04

Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology. ISME J (2014) 2.03

Improving the quality of twilight-zone alignments. Protein Sci (2000) 2.03

The Molecular Biology Database Collection: 2007 update. Nucleic Acids Res (2006) 2.03

MODBASE, a database of annotated comparative protein structure models. Nucleic Acids Res (2002) 2.01

Development of a EST dataset and characterization of EST-SSRs in a traditional Chinese medicinal plant, Epimedium sagittatum (Sieb. Et Zucc.) Maxim. BMC Genomics (2010) 2.01

nag genes of Ralstonia (formerly Pseudomonas) sp. strain U2 encoding enzymes for gentisate catabolism. J Bacteriol (2001) 1.99

Predicting protein cellular localization using a domain projection method. Genome Res (2002) 1.98

Bacterial phylogeny structures soil resistomes across habitats. Nature (2014) 1.98

GTOP: a database of protein structures predicted from genome sequences. Nucleic Acids Res (2002) 1.97

FIGENIX: intelligent automation of genomic annotation: expertise integration in a new software platform. BMC Bioinformatics (2005) 1.97

De novo transcriptome sequencing in Anopheles funestus using Illumina RNA-seq technology. PLoS One (2010) 1.95

Protein length in eukaryotic and prokaryotic proteomes. Nucleic Acids Res (2005) 1.95

trEST, trGEN and Hits: access to databases of predicted protein sequences. Nucleic Acids Res (2001) 1.93

Arabidopsis disrupted in SQD2 encoding sulfolipid synthase is impaired in phosphate-limited growth. Proc Natl Acad Sci U S A (2002) 1.92

Evolutionary expansion and anatomical specialization of synapse proteome complexity. Nat Neurosci (2008) 1.91

FANTOM DB: database of Functional Annotation of RIKEN Mouse cDNA Clones. Nucleic Acids Res (2002) 1.91

Pyrosequencing-based comparative genome analysis of the nosocomial pathogen Enterococcus faecium and identification of a large transferable pathogenicity island. BMC Genomics (2010) 1.90

Brugia malayi excreted/secreted proteins at the host/parasite interface: stage- and gender-specific proteomic profiling. PLoS Negl Trop Dis (2009) 1.88

Comprehensive analysis of pseudogenes in prokaryotes: widespread gene decay and failure of putative horizontally transferred genes. Genome Biol (2004) 1.88

ProtoNet: hierarchical classification of the protein space. Nucleic Acids Res (2003) 1.88

ORMDL proteins are a conserved new family of endoplasmic reticulum membrane proteins. Genome Biol (2002) 1.82

The Methanosarcina barkeri genome: comparative analysis with Methanosarcina acetivorans and Methanosarcina mazei reveals extensive rearrangement within methanosarcinal genomes. J Bacteriol (2006) 1.79

tRNAHis maturation: an essential yeast protein catalyzes addition of a guanine nucleotide to the 5' end of tRNAHis. Genes Dev (2003) 1.78

The complement of protein phosphatase catalytic subunits encoded in the genome of Arabidopsis. Plant Physiol (2002) 1.76

Articles by these authors

Initial sequencing and analysis of the human genome. Nature (2001) 212.86

tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res (1997) 142.55

Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol (2001) 66.87

The distributed annotation system. BMC Bioinformatics (2001) 42.98

The Ensembl genome database project. Nucleic Acids Res (2002) 40.87

Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature (2002) 28.79

Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins (1997) 26.91

Comparative genomics of the eukaryotes. Science (2000) 26.62

Ensembl 2009. Nucleic Acids Res (2008) 25.38

The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res (2001) 24.45

Ensembl 2008. Nucleic Acids Res (2007) 20.67

Ensembl 2007. Nucleic Acids Res (2006) 20.10

Reactome: a knowledgebase of biological pathways. Nucleic Acids Res (2005) 20.05

WormBase: network access to the genome and biology of Caenorhabditis elegans. Nucleic Acids Res (2001) 18.52

Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol (2001) 16.47

Maximum discrimination hidden Markov models of sequence consensus. J Comput Biol (1995) 15.75

Ensembl 2005. Nucleic Acids Res (2005) 15.13

RNA sequence analysis using covariance models. Nucleic Acids Res (1994) 14.60

Ensembl 2002: accommodating comparative genomics. Nucleic Acids Res (2003) 12.26

Volume changes in protein evolution. J Mol Biol (1994) 12.07

A workbench for large-scale sequence homology analysis. Comput Appl Biosci (1994) 12.00

Ensembl 2004. Nucleic Acids Res (2004) 11.88

Ensembl 2006. Nucleic Acids Res (2006) 11.66

Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins. Nucleic Acids Res (1999) 11.64

ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics (2001) 11.54

Dynamite: a flexible code generating language for dynamic programming methods used in sequence comparison. Proc Int Conf Intell Syst Mol Biol (1997) 11.52

Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature (2001) 10.96

Apollo: a sequence annotation editor. Genome Biol (2002) 10.77

ACeDB and macace. Methods Cell Biol (1995) 10.64

Pfam: multiple sequence alignments and HMM-profiles of protein domains. Nucleic Acids Res (1998) 8.87

Nucleotide sequence of yellow fever virus: implications for flavivirus gene expression and evolution. Science (1985) 8.54

A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis. Gene (1995) 8.45

Using GeneWise in the Drosophila annotation experiment. Genome Res (2000) 7.50

Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics (2001) 7.07

A computational screen for methylation guide snoRNAs in yeast. Science (1999) 6.43

InterPro--an integrated documentation resource for protein families, domains and functional sites. Bioinformatics (2000) 6.42

A dynamic programming algorithm for RNA structure prediction including pseudoknots. J Mol Biol (1999) 6.19

Homologs of small nucleolar RNAs in Archaea. Science (2000) 5.26

Ensembl Genomes: extending Ensembl across the taxonomic space. Nucleic Acids Res (2009) 5.09

The DNA sequence and analysis of human chromosome 6. Nature (2003) 4.75

A survey of expressed genes in Caenorhabditis elegans. Nat Genet (1992) 4.63

Computational identification of noncoding RNAs in E. coli by comparative genomics. Curr Biol (2001) 4.58

Comparative analysis of noncoding regions of 77 orthologous mouse and human gene pairs. Genome Res (1999) 4.50

Open annotation offers a democratic solution to genome sequencing. Nature (2000) 4.48

Software for genome mapping by fingerprinting techniques. Comput Appl Biosci (1988) 3.67

NIFAS: visual analysis of domain evolution in proteins. Bioinformatics (2001) 3.60

The Genome Knowledgebase: a resource for biologists and bioinformaticists. Cold Spring Harb Symp Quant Biol (2003) 3.55

PH domain: the first anniversary. Trends Biochem Sci (1994) 3.38

Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs. Bioinformatics (2000) 3.37

A simple algorithm to infer gene duplication and speciation events on a gene tree. Bioinformatics (2001) 3.26

Dynamic programming alignment accuracy. J Comput Biol (1998) 3.22

Cancer and genomics. Nature (2001) 3.15

Changes in gene expression associated with developmental arrest and longevity in Caenorhabditis elegans. Genome Res (2001) 2.96

The language of RNA: a formal grammar that includes pseudoknots. Bioinformatics (2000) 2.71

Association of the Sindbis virus RNA methyltransferase activity with the nonstructural protein nsP1. Virology (1989) 2.64

Is there a single pathway for the folding of a polypeptide chain? Proc Natl Acad Sci U S A (1985) 2.62

Image analysis of restriction enzyme fingerprint autoradiograms. Comput Appl Biosci (1989) 2.55

The DNA sequence and biological annotation of human chromosome 1. Nature (2006) 2.42

A computational scan for U12-dependent introns in the human genome sequence. Nucleic Acids Res (2001) 2.21

Amino acid sequence motif of group I intron endonucleases is conserved in open reading frames of group II introns. Trends Biochem Sci (1994) 2.15

Sequence of the human immunoglobulin diversity (D) segment locus: a systematic analysis provides no evidence for the use of DIR segments, inverted D segments, "minor" D segments or D-D recombination. J Mol Biol (1997) 2.03

FAT: a novel domain in PIK-related kinases. Trends Biochem Sci (2000) 1.96

Monoclonal antibodies to three epitopic regions of feline leukemia virus p27 and their use in enzyme-linked immunosorbent assay of p27. J Immunol Methods (1983) 1.94

Comparative sequence analysis of the human and pufferfish Huntington's disease genes. Nat Genet (1995) 1.55

Widespread eukaryotic sequences, highly similar to bacterial DNA polymerase I, looking for functions. Curr Biol (1997) 1.45

The DNA sequence and analysis of human chromosome 13. Nature (2004) 1.33

Analysis of protein domain families in Caenorhabditis elegans. Genomics (1997) 1.30

Sequence assembly with CAFTOOLS. Genome Res (1998) 1.26

A comparison of sequence and structure protein domain families as a basis for structural genomics. Bioinformatics (1999) 1.25

DNA sequence and analysis of human chromosome 9. Nature (2004) 1.21

The imprint of somatic hypermutation on the repertoire of human germline V genes. J Mol Biol (1996) 1.21

Progress in sequencing the mouse genome. Genesis (2001) 1.20

Alfresco--a workbench for comparative genomic sequence analysis. Genome Res (2000) 1.19

An analogue approach to the travelling salesman problem using an elastic net method. Nature (1987) 1.19

An expert system for processing sequence homology data. Proc Int Conf Intell Syst Mol Biol (1994) 1.14

The DNA sequence and comparative analysis of human chromosome 10. Nature (2004) 1.14

Improved techniques for the identification of pseudogenes. Bioinformatics (2004) 1.04