The Pfam protein families database.

PubWeight™: 33.46 | Rank: Top 0.01% | All-Time Top 10000

Article: PMC 3245129

Published in Nucleic Acids Res on November 29, 2011

Authors

Marco Punta¹, Penny C Coggill, Ruth Y Eberhardt, Jaina Mistry, John Tate, Chris Boursnell, Ningze Pang, Kristoffer Forslund, Goran Ceric, Jody Clements, Andreas Heger, Liisa Holm, Erik L L Sonnhammer, Sean R Eddy, Alex Bateman, Robert D Finn

Author Affiliations

1: Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK. mp13@sanger.ac.uk

Articles citing this

(truncated to the top 100)

MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol (2013) 34.34

The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov (2012) 26.98

Pfam: the protein families database. Nucleic Acids Res (2013) 22.48

The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res (2013) 10.24

InterProScan 5: genome-scale protein function classification. Bioinformatics (2014) 8.52

antiSMASH 2.0--a versatile platform for genome mining of secondary metabolite producers. Nucleic Acids Res (2013) 8.10

Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat Biotechnol (2013) 6.40

CDD: conserved domains and protein three-dimensional structure. Nucleic Acids Res (2012) 6.39

IMG 4 version of the integrated microbial genomes comparative analysis system. Nucleic Acids Res (2013) 6.14

Rfam 11.0: 10 years of RNA families. Nucleic Acids Res (2012) 6.14

The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res (2015) 5.97

The RCSB Protein Data Bank: new resources for research and education. Nucleic Acids Res (2012) 5.73

Ribosomally synthesized and post-translationally modified peptide natural products: overview and recommendations for a universal nomenclature. Nat Prod Rep (2013) 5.38

Cyclic di-GMP: the first 25 years of a universal bacterial second messenger. Microbiol Mol Biol Rev (2013) 4.69

A large-scale evaluation of computational protein function prediction. Nat Methods (2013) 4.61

eggNOG v4.0: nested orthology inference across 3686 organisms. Nucleic Acids Res (2013) 3.77

Template-based protein structure modeling using the RaptorX web server. Nat Protoc (2012) 3.68

Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models. Hum Mutat (2012) 3.60

TIGRFAMs and Genome Properties in 2013. Nucleic Acids Res (2012) 3.45

Three-dimensional structures of membrane proteins from genomic sequencing. Cell (2012) 3.02

IMG/M 4 version of the integrated metagenome comparative analysis system. Nucleic Acids Res (2013) 2.99

The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature (2014) 2.93

X-linked TEX11 mutations, meiotic arrest, and azoospermia in infertile men. N Engl J Med (2015) 2.71

PlantTFDB 3.0: a portal for the functional and evolutionary study of plant transcription factors. Nucleic Acids Res (2013) 2.71

SIFTS: Structure Integration with Function, Taxonomy and Sequences resource. Nucleic Acids Res (2012) 2.61

MycoCosm portal: gearing up for 1000 fungal genomes. Nucleic Acids Res (2013) 2.60

The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4). Stand Genomic Sci (2015) 2.55

The landscape of kinase fusions in cancer. Nat Commun (2014) 2.48

Classification of intrinsically disordered regions and proteins. Chem Rev (2014) 2.48

Sequencing and automated whole-genome optical mapping of the genome of a domestic goat (Capra hircus). Nat Biotechnol (2012) 2.45

Predominant archaea in marine sediments degrade detrital proteins. Nature (2013) 2.41

LNCipedia: a database for annotated human lncRNA transcript sequences and structures. Nucleic Acids Res (2012) 2.40

DGIdb: mining the druggable genome. Nat Methods (2013) 2.26

The eukaryotic linear motif resource ELM: 10 years and counting. Nucleic Acids Res (2013) 2.25

A long noncoding RNA associated with susceptibility to celiac disease. Science (2016) 2.19

A Multicomponent Animal Virus Isolated from Mosquitoes. Cell Host Microbe (2016) 2.19

HAMAP in 2013, new developments in the protein family classification and annotation system. Nucleic Acids Res (2012) 2.16

The 2012 Nucleic Acids Research Database Issue and the online Molecular Biology Database Collection. Nucleic Acids Res (2011) 2.16

PDBe: Protein Data Bank in Europe. Nucleic Acids Res (2013) 2.16

The transporter classification database. Nucleic Acids Res (2013) 2.12

Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology. ISME J (2014) 2.03

PredictProtein--an open resource for online prediction of protein structural and functional features. Nucleic Acids Res (2014) 2.00

PDBsum additions. Nucleic Acids Res (2013) 1.94

Behavioural and genetic analyses of Nasonia shed light on the evolution of sex pheromones. Nature (2013) 1.91

Adding unaligned sequences into an existing alignment using MAFFT and LAST. Bioinformatics (2012) 1.90

Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies. Genome Biol (2014) 1.90

Interactome3D: adding structural details to protein networks. Nat Methods (2012) 1.89

ESTHER, the database of the α/β-hydrolase fold superfamily of proteins: tools to explore diversity of functions. Nucleic Acids Res (2012) 1.88

Computational meta'omics for microbial community studies. Mol Syst Biol (2013) 1.84

Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res (2013) 1.83

Assignment of protein sequences to existing domain and family classification systems: Pfam and the PDB. Bioinformatics (2012) 1.82

FunGene: the functional gene pipeline and repository. Front Microbiol (2013) 1.80

GEMINI: integrative exploration of genetic variation and genome annotations. PLoS Comput Biol (2013) 1.78

Metrics for the Human Proteome Project 2013-2014 and strategies for finding missing proteins. J Proteome Res (2013) 1.76

Arginine-rhamnosylation as new strategy to activate translation elongation factor P. Nat Chem Biol (2015) 1.75

EBI metagenomics--a new resource for the analysis and archiving of metagenomic data. Nucleic Acids Res (2013) 1.74

Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Res (2012) 1.74

Genomic analysis of 38 Legionella species identifies large and diverse effector repertoires. Nat Genet (2016) 1.73

Discovery of unconventional kinetochores in kinetoplastids. Cell (2014) 1.70

Web Apollo: a web-based genomic annotation editing platform. Genome Biol (2013) 1.69

Cyclic di-AMP: another second messenger enters the fray. Nat Rev Microbiol (2013) 1.69

PhylomeDB v4: zooming into the plurality of evolutionary histories of a genome. Nucleic Acids Res (2013) 1.69

Profiling the orphan enzymes. Biol Direct (2014) 1.67

Assessment of computational methods for predicting the effects of missense mutations in human cancers. BMC Genomics (2013) 1.64

Assessing the human gut microbiota in metabolic diseases. Diabetes (2013) 1.64

Large-scale determination of sequence, structure, and function relationships in cytosolic glutathione transferases across the biosphere. PLoS Biol (2014) 1.62

RegPrecise 3.0--a resource for genome-scale exploration of transcriptional regulation in bacteria. BMC Genomics (2013) 1.61

Gene expansion shapes genome architecture in the human pathogen Lichtheimia corymbifera: an evolutionary genomics analysis in the ancient terrestrial mucorales (Mucoromycotina). PLoS Genet (2014) 1.61

AntiFam: a tool to help identify spurious ORFs in protein annotation. Database (Oxford) (2012) 1.60

Genomic determinants of sporulation in Bacilli and Clostridia: towards the minimal set of sporulation-specific genes. Environ Microbiol (2012) 1.59

Core structure of the U6 small nuclear ribonucleoprotein at 1.7-Å resolution. Nat Struct Mol Biol (2014) 1.56

UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics (2014) 1.56

An expanded genomic representation of the phylum cyanobacteria. Genome Biol Evol (2014) 1.56

Sequence composition of disordered regions fine-tunes protein half-life. Nat Struct Mol Biol (2015) 1.55

SCOP2 prototype: a new approach to protein structure mining. Nucleic Acids Res (2013) 1.55

Identification of essential genes of the periodontal pathogen Porphyromonas gingivalis. BMC Genomics (2012) 1.55

The CD225 domain of IFITM3 is required for both IFITM protein association and inhibition of influenza A virus and dengue virus replication. J Virol (2013) 1.54

Optimizing de novo common wheat transcriptome assembly using short-read RNA-Seq data. BMC Genomics (2012) 1.54

Making your database available through Wikipedia: the pros and cons. Nucleic Acids Res (2011) 1.53

Benchmarking of methods for genomic taxonomy. J Clin Microbiol (2014) 1.51

First genomic insights into members of a candidate bacterial phylum responsible for wastewater bulking. PeerJ (2015) 1.51

Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes. Hum Mol Genet (2014) 1.51

The Structure-Function Linkage Database. Nucleic Acids Res (2013) 1.50

Current opportunities and challenges in microbial metagenome analysis--a bioinformatic perspective. Brief Bioinform (2012) 1.50

DECIPHER: web-based, community resource for clinical interpretation of rare variants in developmental disorders. Hum Mol Genet (2012) 1.50

Reproductive Mode and the Evolution of Genome Size and Structure in Caenorhabditis Nematodes. PLoS Genet (2015) 1.49

Improving microbial genome annotations in an integrated database context. PLoS One (2013) 1.48

Extensive evolutionary and functional diversity among mammalian AIM2-like receptors. J Exp Med (2012) 1.48

DcGO: database of domain-centric ontologies on functions, phenotypes, diseases and more. Nucleic Acids Res (2012) 1.47

Minke whale genome and aquatic adaptation in cetaceans. Nat Genet (2013) 1.46

Finding the missing honey bee genes: lessons learned from a genome upgrade. BMC Genomics (2014) 1.46

MYRF is a membrane-associated transcription factor that autoproteolytically cleaves to directly activate myelin genes. PLoS Biol (2013) 1.45

PDBTM: Protein Data Bank of transmembrane proteins after 8 years. Nucleic Acids Res (2012) 1.45

An improved canine genome and a comprehensive catalogue of coding genes and non-coding transcripts. PLoS One (2014) 1.44

Long non-coding RNAs: modulators of nuclear structure and function. Curr Opin Cell Biol (2013) 1.43

Reassessment of the Listeria monocytogenes pan-genome reveals dynamic integration hotspots and mobile genetic elements as major components of the accessory genome. BMC Genomics (2013) 1.43

Interactome map uncovers phosphatidylserine transport by oxysterol-binding proteins. Nature (2013) 1.42

Whole-genome sequencing of six dog breeds from continuous altitudes reveals adaptation to high-altitude hypoxia. Genome Res (2014) 1.42

The crystal structure and small-angle X-ray analysis of CsdL/TcdA reveal a new tRNA binding motif in the MoeB/E1 superfamily. PLoS One (2015) 1.41

Sequencing and beyond: integrating molecular 'omics' for microbial community profiling. Nat Rev Microbiol (2015) 1.41

Articles cited by this

SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol (1995) 74.88

The Pfam protein families database. Nucleic Acids Res (2009) 37.98

Pfam: clans, web tools and services. Nucleic Acids Res (2006) 34.83

The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucleic Acids Res (2008) 27.83

InterPro: the integrative protein signature database. Nucleic Acids Res (2008) 25.07

The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res (2009) 19.70

UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics (2007) 17.43

PROSITE, a protein domain database for functional characterization and annotation. Nucleic Acids Res (2009) 7.47

A gene wiki for community annotation of gene function. PLoS Biol (2008) 6.60

Rfam: Wikipedia, clans and the "decimal" release. Nucleic Acids Res (2010) 6.58

Representative proteomes: a stable, scalable and unbiased proteome set for sequence analysis and functional annotation. PLoS One (2011) 5.64

A probabilistic model of local sequence alignment that simplifies statistical significance estimation. PLoS Comput Biol (2008) 5.12

Ensembl Genomes: extending Ensembl across the taxonomic space. Nucleic Acids Res (2009) 5.09

Exhaustive enumeration of protein domain families. J Mol Biol (2003) 4.95

A FAM21-containing WASH complex regulates retromer-dependent sorting. Dev Cell (2009) 4.89

The RNA WikiProject: community annotation of RNA families. RNA (2008) 4.83

Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics (2010) 4.48

Exploration of uncharted regions of the protein universe. PLoS Biol (2009) 3.41

Anillin is a substrate of anaphase-promoting complex/cyclosome (APC/C) that controls spatial contractility of myosin during late cytokinesis. J Biol Chem (2005) 2.61

DUFs: families in search of function. Acta Crystallogr Sect F Struct Biol Cryst Commun (2010) 2.61

Identification of an apoplastic protein involved in the initial phase of salt stress response in rice root by two-dimensional electrophoresis. Plant Physiol (2008) 2.28

The crystal structure of Mtr4 reveals a novel arch domain required for rRNA processing. EMBO J (2010) 2.25

Identifying protein domains with the Pfam database. Curr Protoc Bioinformatics (2008) 2.24

Identification of an Escherichia coli operon required for formation of the O-antigen capsule. J Bacteriol (2005) 2.14

Purification, characterization, and molecular gene cloning of an antifungal protein from Ginkgo biloba seeds. Biol Chem (2007) 1.93

The crystal structure of Escherichia coli group 4 capsule protein GfcC reveals a domain organization resembling that of Wza. Biochemistry (2011) 1.93

PfamAlyzer: domain-centric homology search. Bioinformatics (2007) 1.90

Crystal structure of ginkbilobin-2 with homology to the extracellular domain of plant cysteine-rich receptor-like kinases. Proteins (2009) 1.85

Articles by these authors

Initial sequencing and comparative analysis of the mouse genome. Nature (2002) 96.15

The Pfam protein families database. Nucleic Acids Res (2004) 56.46

The Pfam protein families database. Nucleic Acids Res (2002) 51.34

miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res (2006) 39.25

Patterns of somatic mutation in human cancer genomes. Nature (2007) 38.41

The Pfam protein families database. Nucleic Acids Res (2009) 37.98

Pfam: clans, web tools and services. Nucleic Acids Res (2006) 34.83

The Pfam protein families database. Nucleic Acids Res (2007) 30.53

Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res (2005) 25.49

InterPro: the integrative protein signature database. Nucleic Acids Res (2008) 25.07

The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res (2003) 24.72

Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature (2005) 23.04

Rfam: an RNA family database. Nucleic Acids Res (2003) 22.93

Pfam: the protein families database. Nucleic Acids Res (2013) 22.48

A uniform system for microRNA annotation. RNA (2003) 20.28

Evolution of genes and genomes on the Drosophila phylogeny. Nature (2007) 18.01

InterPro, progress and status in 2005. Nucleic Acids Res (2005) 17.53

A combined transmembrane topology and signal peptide prediction method. J Mol Biol (2004) 15.77

InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res (2011) 13.45

HMMER web server: interactive sequence similarity searching. Nucleic Acids Res (2011) 13.00

Infernal 1.0: inference of RNA alignments. Bioinformatics (2009) 12.98

New developments in the InterPro database. Nucleic Acids Res (2007) 12.49

Rfam: updates to the RNA families database. Nucleic Acids Res (2008) 11.61

Mouse genomic variation and its effect on phenotypes and gene regulation. Nature (2011) 10.66

Inparanoid: a comprehensive database of eukaryotic orthologs. Nucleic Acids Res (2005) 9.90

QuickTree: building huge Neighbour-Joining trees of protein sequences. Bioinformatics (2002) 9.36

Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature (2007) 7.91

Somatic mutations of the protein kinase gene family in human lung cancer. Cancer Res (2005) 7.66

Orthology, paralogy and proposed classification for paralog subtypes. Trends Genet (2002) 7.25

iPfam: visualization of protein-protein interactions in PDB at domain and amino acid resolutions. Bioinformatics (2004) 7.02

Kalign--an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics (2005) 7.01

A hypermutation phenotype and somatic MSH6 mutations in recurrent human malignant gliomas after alkylator chemotherapy. Cancer Res (2006) 6.91

A screen of the complete protein kinase gene family identifies diverse patterns of somatic mutations in human breast cancer. Nat Genet (2005) 6.70

Rfam: Wikipedia, clans and the "decimal" release. Nucleic Acids Res (2010) 6.58

Integrating biological data--the Distributed Annotation System. BMC Bioinformatics (2008) 6.56

Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res (2002) 6.36

Query-dependent banding (QDB) for faster RNA similarity searches. PLoS Comput Biol (2007) 6.34

Rfam 11.0: 10 years of RNA families. Nucleic Acids Res (2012) 6.14

Enhanced protein domain discovery by using language modeling techniques from speech recognition. Proc Natl Acad Sci U S A (2003) 6.01

InParanoid 7: new algorithms and tools for eukaryotic orthology analysis. Nucleic Acids Res (2009) 5.90

The genome of a songbird. Nature (2010) 5.90

Genome analysis of the platypus reveals unique signatures of evolution. Nature (2008) 5.74

Representative proteomes: a stable, scalable and unbiased proteome set for sequence analysis and functional annotation. PLoS One (2011) 5.64

MEROPS: the peptidase database. Nucleic Acids Res (2009) 5.33

Advantages of combined transmembrane topology and signal peptide prediction--the Phobius web server. Nucleic Acids Res (2007) 5.29

Pack-MULE transposable elements mediate gene evolution in plants. Nature (2004) 5.13

Integrating sequence and structural biology with DAS. BMC Bioinformatics (2007) 5.12

Exhaustive enumeration of protein domain families. J Mol Biol (2003) 4.95

MEROPS: the database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res (2011) 4.90

The RNA WikiProject: community annotation of RNA families. RNA (2008) 4.83

A ChIP-seq defined genome-wide map of vitamin D receptor binding: associations with disease and evolution. Genome Res (2010) 4.77

A large-scale evaluation of computational protein function prediction. Nat Methods (2013) 4.61

RIO: analyzing proteomes by automated phylogenomics using resampled inference of orthologs. BMC Bioinformatics (2002) 4.49

Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics (2010) 4.48