The COG database: a tool for genome-scale analysis of protein functions and evolution.

PubWeight™: 49.22 | Rank: Top 0.01% | All-Time Top 1000

Article: PMC 102395

Published in Nucleic Acids Res on January 01, 2000

Authors

R L Tatusov (1), M Y Galperin, D A Natale, E V Koonin

Author Affiliations

1: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Articles citing this

(truncated to the top 100)

Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet (2000) 336.52

The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res (2001) 43.17

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2000) 34.79

OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res (2003) 33.03

TIGRFAMs: a protein family resource for the functional identification of proteins. Nucleic Acids Res (2001) 20.84

Database resources of the National Center for Biotechnology Information: 2002 update. Nucleic Acids Res (2002) 19.40

Database resources of the National Center for Biotechnology. Nucleic Acids Res (2003) 18.26

STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res (2012) 18.26

The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol (2007) 13.99

The Comprehensive Microbial Resource. Nucleic Acids Res (2001) 11.19

Database resources of the National Center for Biotechnology Information: update. Nucleic Acids Res (2004) 9.85

Complete genome sequence of an M1 strain of Streptococcus pyogenes. Proc Natl Acad Sci U S A (2001) 9.40

Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs). Genome Biol (2000) 6.50

Genome sequence and comparative analysis of the solvent-producing bacterium Clostridium acetobutylicum. J Bacteriol (2001) 6.46

Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res (2002) 6.08

Honor thy symbionts. Proc Natl Acad Sci U S A (2003) 5.72

pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics (2010) 5.70

Assessing performance of orthology detection strategies applied to eukaryotic genomes. PLoS One (2007) 5.62

BioVenn - a web application for the comparison and visualization of biological lists using area-proportional Venn diagrams. BMC Genomics (2008) 5.07

FFAS03: a server for profile--profile sequence alignments. Nucleic Acids Res (2005) 4.99

A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biol (2004) 4.94

Genome sequence of Shigella flexneri 2a: insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157. Nucleic Acids Res (2002) 4.92

Genome of the extremely radiation-resistant bacterium Deinococcus radiodurans viewed from the perspective of comparative genomics. Microbiol Mol Biol Rev (2001) 4.75

Cyclic di-GMP: the first 25 years of a universal bacterial second messenger. Microbiol Mol Biol Rev (2013) 4.69

Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proc Natl Acad Sci U S A (2003) 4.68

Extreme genome reduction in symbiotic bacteria. Nat Rev Microbiol (2011) 4.67

Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol Biol (2001) 4.59

ARED: human AU-rich element-containing mRNA database reveals an unexpectedly diverse functional repertoire of encoded proteins. Nucleic Acids Res (2001) 4.47

A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res (2002) 4.39

iProClass: an integrated, comprehensive and annotated protein classification database. Nucleic Acids Res (2001) 4.33

Proteinortho: detection of (co-)orthologs in large-scale analysis. BMC Bioinformatics (2011) 4.15

Genome sequencing reveals complex secondary metabolome in the marine actinomycete Salinispora tropica. Proc Natl Acad Sci U S A (2007) 4.04

Horizontal gene transfer in bacterial and archaeal complete genomes. Genome Res (2000) 3.95

Cross-referencing eukaryotic genomes: TIGR Orthologous Gene Alignments (TOGA). Genome Res (2002) 3.93

Structural classification of bacterial response regulators: diversity of output domains and domain combinations. J Bacteriol (2006) 3.86

Comparative genomic analyses of seventeen Streptococcus pneumoniae strains: insights into the pneumococcal supragenome. J Bacteriol (2007) 3.62

SURVEY AND SUMMARY: Holliday junction resolvases and related nucleases: identification of new families, phyletic distribution and evolutionary trajectories. Nucleic Acids Res (2000) 3.56

Genome sequence of the cyanobacterium Prochlorococcus marinus SS120, a nearly minimal oxyphototrophic genome. Proc Natl Acad Sci U S A (2003) 3.55

Benchmarking ortholog identification methods using functional genomics data. Genome Biol (2006) 3.54

Bacterial signal transduction network in a genomic perspective. Environ Microbiol (2004) 3.45

The complete genome and proteome of Mycoplasma mobile. Genome Res (2004) 3.36

Conservation of the biotin regulon and the BirA regulatory signal in Eubacteria and Archaea. Genome Res (2002) 3.26

Xenobiotics shape the physiology and gene expression of the active human gut microbiome. Cell (2013) 3.26

Proteome Analysis Database: online application of InterPro and CluSTr for the functional classification of proteins in whole genomes. Nucleic Acids Res (2001) 3.26

Protein molecular function prediction by Bayesian phylogenomics. PLoS Comput Biol (2005) 3.14

Lineage-specific gene expansions in bacterial and archaeal genomes. Genome Res (2001) 3.10

Dissecting the bacterial type VI secretion system by a genome wide in silico analysis: what can be learned from available microbial genomic resources? BMC Genomics (2009) 3.08

Protein Information Resource: a community resource for expert annotation of protein data. Nucleic Acids Res (2001) 3.01

Modular networks and cumulative impact of lateral transfer in prokaryote genome evolution. Proc Natl Acad Sci U S A (2008) 2.97

Genes linked by fusion events are generally of the same functional category: a systematic analysis of 30 microbial genomes. Proc Natl Acad Sci U S A (2001) 2.95

Identification, analysis, and utilization of conserved ortholog set markers for comparative genomics in higher plants. Plant Cell (2002) 2.93

Genomic and functional adaptation in surface ocean planktonic prokaryotes. Nature (2010) 2.93

Understanding the adaptation of Halobacterium species NRC-1 to its extreme environment through computational analysis of its genome sequence. Genome Res (2001) 2.92

Functional classification using phylogenomic inference. PLoS Comput Biol (2006) 2.88

Re-annotating the Mycoplasma pneumoniae genome sequence: adding value, function and reading frames. Nucleic Acids Res (2000) 2.79

Combining bioinformatics and phylogenetics to identify large sets of single-copy orthologous genes (COSII) for comparative, evolutionary and systematic studies: a test case in the euasterid plant clade. Genetics (2006) 2.76

Metatranscriptomics reveals unique microbial small RNAs in the ocean's water column. Nature (2009) 2.72

Pseudomonas Genome Database: facilitating user-friendly, comprehensive comparisons of microbial genomes. Nucleic Acids Res (2008) 2.71

Prediction of the archaeal exosome and its connections with the proteasome and the translation and transcription machineries by a comparative-genomic approach. Genome Res (2001) 2.65

Population genomics of post-vaccine changes in pneumococcal epidemiology. Nat Genet (2013) 2.63

Comparative genomics of bacterial zinc regulons: enhanced ion transport, pathogenesis, and rearrangement of ribosomal proteins. Proc Natl Acad Sci U S A (2003) 2.61

Systematic identification of functional orthologs based on protein network comparison. Genome Res (2006) 2.47

Expanded microbial genome coverage and improved protein family annotation in the COG database. Nucleic Acids Res (2014) 2.37

The complete genome sequence of Chromobacterium violaceum reveals remarkable and exploitable bacterial adaptability. Proc Natl Acad Sci U S A (2003) 2.37

Proteomic analysis of the spore coats of Bacillus subtilis and Bacillus anthracis. J Bacteriol (2003) 2.34

Orthology prediction at scalable resolution by phylogenetic tree analysis. BMC Bioinformatics (2007) 2.29

Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids. PLoS One (2007) 2.28

An insect herbivore microbiome with high plant biomass-degrading capacity. PLoS Genet (2010) 2.28

What properties characterize the hub proteins of the protein-protein interaction network of Saccharomyces cerevisiae? Genome Biol (2006) 2.27

Recombination and insertion events involving the botulinum neurotoxin complex genes in Clostridium botulinum types A, B, E and F and Clostridium butyricum type E strains. BMC Biol (2009) 2.24

Localizing proteins in the cell from their phylogenetic profiles. Proc Natl Acad Sci U S A (2000) 2.24

Bambus 2: scaffolding metagenomes. Bioinformatics (2011) 2.24

Comparative analysis of ribosomal proteins in complete genomes: an example of reductive evolution at the domain scale. Nucleic Acids Res (2002) 2.24

Comparative genomics and functional analysis of niche-specific adaptation in Pseudomonas putida. FEMS Microbiol Rev (2011) 2.24

PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information. Nucleic Acids Res (2001) 2.23

Metabolic interdependence of obligate intracellular bacteria and their insect hosts. Microbiol Mol Biol Rev (2004) 2.19

Comparison of the complete genome sequences of Bifidobacterium animalis subsp. lactis DSM 10140 and Bl-04. J Bacteriol (2009) 2.15

The cyanobacterial genome core and the origin of photosynthesis. Proc Natl Acad Sci U S A (2006) 2.11

Constant relative rate of protein evolution and detection of functional diversification among bacterial, archaeal and eukaryotic proteins. Genome Biol (2001) 2.10

Comprehensive classification of nucleotidyltransferase fold proteins: identification of novel families and their representatives in human. Nucleic Acids Res (2009) 2.09

Comparative and functional genomic analysis of prokaryotic nickel and cobalt uptake transporters: evidence for a novel group of ATP-binding cassette transporters. J Bacteriol (2006) 2.02

Deep-sea vent epsilon-proteobacterial genomes provide insights into emergence of pathogens. Proc Natl Acad Sci U S A (2007) 2.00

Regulation of lysine biosynthesis and transport genes in bacteria: yet another RNA riboswitch? Nucleic Acids Res (2003) 1.98

Bacterial phylogeny structures soil resistomes across habitats. Nature (2014) 1.98

Common extracellular sensory domains in transmembrane receptors for diverse signal transduction pathways in bacteria and archaea. J Bacteriol (2003) 1.97

Sodium ion cycle in bacterial pathogens: evidence from cross-genome comparisons. Microbiol Mol Biol Rev (2001) 1.95

Comparative genomic analysis of the gut bacterium Bifidobacterium longum reveals loci susceptible to deletion during pure culture growth. BMC Genomics (2008) 1.94

The rhodanese/Cdc25 phosphatase superfamily. Sequence-structure-function relations. EMBO Rep (2002) 1.94

Massive comparative genomic analysis reveals convergent evolution of specialized bacteria. Biol Direct (2009) 1.92

The deep archaeal roots of eukaryotes. Mol Biol Evol (2008) 1.90

TarO: a target optimisation system for structural biology. Nucleic Acids Res (2008) 1.89

Regulation of the vitamin B12 metabolism and transport in bacteria by a conserved RNA structural element. RNA (2003) 1.89

Birth and death of protein domains: a simple model of evolution explains power law behavior. BMC Evol Biol (2002) 1.86

Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags. Proc Natl Acad Sci U S A (2000) 1.84

Amino acid addition to Vibrio cholerae LPS establishes a link between surface remodeling in gram-positive and gram-negative bacteria. Proc Natl Acad Sci U S A (2012) 1.83

Comparative genomic analyses of nickel, cobalt and vitamin B12 utilization. BMC Genomics (2009) 1.80

Complete genome sequence of the extremely acidophilic methanotroph isolate V4, Methylacidiphilum infernorum, a representative of the bacterial phylum Verrucomicrobia. Biol Direct (2008) 1.78

Ancestral paralogs and pseudoparalogs and their role in the emergence of the eukaryotic cell. Nucleic Acids Res (2005) 1.77

Complete DNA sequence and analysis of the large virulence plasmid of Shigella flexneri. Infect Immun (2001) 1.77

Complete genome sequence and analysis of the multiresistant nosocomial pathogen Corynebacterium jeikeium K411, a lipid-requiring bacterium of the human skin flora. J Bacteriol (2005) 1.77

Articles by these authors

Initial sequencing and analysis of the human genome. Nature (2001) 212.86

A genomic perspective on protein families. Science (1997) 50.51

The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res (2001) 43.17

Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res (1998) 23.87

Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res (2001) 22.33

Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks. Proc Natl Acad Sci U S A (1994) 18.46

A superfamily of conserved domains in DNA damage-responsive cell cycle checkpoint proteins. FASEB J (1997) 15.10

Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. Science (1998) 13.64

BRCA1 protein products ... Functional motifs... Nat Genet (1996) 11.50

AAA+: A class of chaperone-like ATPases associated with the assembly, operation, and disassembly of protein complexes. Genome Res (1999) 11.30

Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science (2000) 10.82

Two related superfamilies of putative helicases involved in replication, recombination, repair and expression of DNA and RNA genomes. Nucleic Acids Res (1989) 10.03

Comparison of archaeal and bacterial genomes: computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea. Mol Microbiol (1997) 8.69

Novel domains of the prokaryotic two-component signal transduction systems. FEMS Microbiol Lett (2001) 8.45

Iterated profile searches with PSI-BLAST--a tool for discovery in protein databases. Trends Biochem Sci (1998) 8.01

Microbial culturomics: paradigm shift in the human gut microbiome study. Clin Microbiol Infect (2012) 7.97

IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics (1999) 6.91

Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs). Genome Biol (2000) 6.50

Genome sequence and comparative analysis of the solvent-producing bacterium Clostridium acetobutylicum. J Bacteriol (2001) 6.46

A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc Natl Acad Sci U S A (1996) 6.38

Comparison of the complete protein sets of worm and yeast: orthology and divergence. Science (1998) 6.28

Sources of systematic error in functional annotation of genomes: domain rearrangement, non-orthologous gene displacement and operon disruption. In Silico Biol (1998) 6.22

Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches. J Mol Biol (1999) 5.90

Using the COG database to improve gene recognition in complete genomes. Genetica (2000) 5.80

Cleavage of cohesin by the CD clan protease separin triggers anaphase in yeast. Cell (2000) 5.69

Metabolism and evolution of Haemophilus influenzae deduced from a whole-genome comparison with Escherichia coli. Curr Biol (1996) 5.50

Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context. Genome Res (2001) 5.37

Beyond complete genomes: from sequence to structure and function. Curr Opin Struct Biol (1998) 5.16

Conserved domains in DNA repair proteins and evolution of repair systems. Nucleic Acids Res (1999) 4.80

Genome of the extremely radiation-resistant bacterium Deinococcus radiodurans viewed from the perspective of comparative genomics. Microbiol Mol Biol Rev (2001) 4.75

Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol Biol (2001) 4.59

Positionally cloned human disease genes: patterns of evolutionary conservation and functional motifs. Proc Natl Acad Sci U S A (1997) 4.40

The DNA-repair protein AlkB, EGL-9, and leprecan define new families of 2-oxoglutarate- and iron-dependent dioxygenases. Genome Biol (2001) 4.39

Conserved sequence motifs in the initiator proteins for rolling circle DNA replication encoded by diverse replicons from eubacteria, eucaryotes and archaebacteria. Nucleic Acids Res (1992) 4.39

Cysteine proteases of positive strand RNA viruses and chymotrypsin-like serine proteases. A distinct protein superfamily with a common structural fold. FEBS Lett (1989) 4.29

Common origin of four diverse families of large eukaryotic DNA viruses. J Virol (2001) 4.28

N-terminal domains of putative helicases of flavi- and pestiviruses may be serine proteases. Nucleic Acids Res (1989) 4.27

Identification of paracaspases and metacaspases: two ancient families of caspase-like proteins, one of which plays a key role in MALT lymphoma. Mol Cell (2000) 4.19

The HD domain defines a new superfamily of metal-dependent phosphohydrolases. Trends Biochem Sci (1998) 4.11

SAP - a putative DNA-binding motif involved in chromosomal organization. Trends Biochem Sci (2000) 4.10

Evidence for massive gene exchange between archaeal and bacterial hyperthermophiles. Trends Genet (1998) 4.10

The complete sequence (22 kilobases) of murine coronavirus gene 1 encoding the putative proteases and RNA polymerase. Virology (1991) 4.07

Who's your neighbor? New computational approaches for functional genomics. Nat Biotechnol (2000) 4.04

Construction and analysis of bacterial artificial chromosome libraries from a marine microbial assemblage. Environ Microbiol (2000) 3.99

Viral proteins containing the purine NTP-binding sequence pattern. Nucleic Acids Res (1989) 3.96

Chromosome 2 sequence of the human malaria parasite Plasmodium falciparum. Science (1998) 3.83

Comparative genomics of the Archaea (Euryarchaeota): evolution of conserved protein families, the stable core, and the variable shell. Genome Res (1999) 3.62

SURVEY AND SUMMARY: Holliday junction resolvases and related nucleases: identification of new families, phyletic distribution and evolutionary trajectories. Nucleic Acids Res (2000) 3.56

Evolution of aminoacyl-tRNA synthetases--analysis of unique domain architectures and phylogenetic trees reveals a complex history of horizontal gene transfer events. Genome Res (1999) 3.44

Coronavirus genome: prediction of putative functional domains in the non-structural polyprotein by comparative amino acid sequence analysis. Nucleic Acids Res (1989) 3.38

Putative papain-related thiol proteases of positive-strand RNA viruses. Identification of rubi- and aphthovirus proteases and delineation of a novel conserved domain associated with proteases of rubi-, alpha- and coronaviruses. FEBS Lett (1991) 3.35

DNA polymerase beta-like nucleotidyltransferase superfamily: identification of three new families, classification and evolutionary history. Nucleic Acids Res (1999) 3.32

A new superfamily of putative NTP-binding domains encoded by genomes of small DNA and RNA viruses. FEBS Lett (1990) 3.30

Eukaryotic signalling domain homologues in archaea and bacteria. Ancient ancestry and horizontal gene transfer. J Mol Biol (1999) 3.28

Lineage-specific loss and divergence of functionally linked genes in eukaryotes. Proc Natl Acad Sci U S A (2000) 3.27

Non-orthologous gene displacement. Trends Genet (1996) 3.12

Lineage-specific gene expansions in bacterial and archaeal genomes. Genome Res (2001) 3.10

A novel superfamily of nucleoside triphosphate-binding motif containing proteins which are probably involved in duplex unwinding in DNA and RNA replication and recombination. FEBS Lett (1988) 3.09

An NTP-binding motif is the most conserved sequence in a highly diverged monophyletic group of proteins involved in positive strand RNA viral replication. J Mol Evol (1989) 3.00

A conserved NTP-motif in putative helicases. Nature (1988) 2.94

Novel families of putative protein kinases in bacteria and archaea: evolution of the "eukaryotic" protein kinase superfamily. Genome Res (1998) 2.91

Detection of new genes in a bacterial genome using Markov models for three gene classes. Nucleic Acids Res (1995) 2.89

Rickettsiae and Chlamydiae: evidence of horizontal gene transfer and gene exchange. Trends Genet (1999) 2.87

Predicting functions from protein sequences--where are the bottlenecks? Nat Genet (1998) 2.86

Toprim--a conserved catalytic domain in type IA and II topoisomerases, DnaG-type primases, OLD family nucleases and RecR proteins. Nucleic Acids Res (1998) 2.82

SEALS: a system for easy analysis of lots of sequences. Proc Int Conf Intell Syst Mol Biol (1997) 2.80

Prediction of transcription regulatory sites in Archaea by a comparative genomic approach. Nucleic Acids Res (2000) 2.80

RNA sequence of astrovirus: distinctive genomic organization and a putative retrovirus-like ribosomal frameshifting signal that directs the viral replicase synthesis. Proc Natl Acad Sci U S A (1993) 2.77

Role of CED-4 in the activation of CED-3. Nature (1997) 2.76

The U box is a modified RING finger - a common domain in ubiquitination. Curr Biol (2000) 2.73

Regulatory potential, phyletic distribution and evolution of ancient, intracellular small-molecule-binding domains. J Mol Biol (2001) 2.68

Adaptations of the helix-grip fold for ligand binding and catalysis in the START domain superfamily. Proteins (2001) 2.65

Phosphoesterase domains associated with DNA polymerases of diverse origins. Nucleic Acids Res (1998) 2.61

The NACHT family - a new group of predicted NTPases implicated in apoptosis and MHC transcription activation. Trends Biochem Sci (2000) 2.58

The genome of molluscum contagiosum virus: analysis and comparison with other poxviruses. Virology (1997) 2.58

The domains of death: evolution of the apoptosis machinery. Trends Biochem Sci (1999) 2.56

DNA-binding proteins and evolution of transcription regulation in the archaea. Nucleic Acids Res (1999) 2.53

A diverse superfamily of enzymes with ATP-dependent carboxylate-amine/thiol ligase activity. Protein Sci (1997) 2.47

Gene order is not conserved in bacterial evolution. Trends Genet (1996) 2.44

Did DNA replication evolve twice independently? Nucleic Acids Res (1999) 2.44

Hedgehog patterning activity: role of a lipophilic modification mediated by the carboxy-terminal autoprocessing domain. Cell (1996) 2.39

The bacterial replicative helicase DnaB evolved from a RecA duplication. Genome Res (2000) 2.39

The catalytic domain of the P-type ATPase has the haloacid dehalogenase fold. Trends Biochem Sci (1998) 2.39

Fold prediction and evolutionary analysis of the POZ domain: structural and evolutionary relationship with the potassium channel tetramerization domain. J Mol Biol (1999) 2.37

Stable DNA unwinding, not "breathing," accounts for single-strand-specific nuclease hypersensitivity of specific A+T-rich sequences. Proc Natl Acad Sci U S A (1988) 2.35

Superfamily of UvrA-related NTP-binding proteins. Implications for rational classification of recombination/repair systems. J Mol Biol (1990) 2.34

A novel family of predicted phosphoesterases includes Drosophila prune protein and bacterial RecJ exonuclease. Trends Biochem Sci (1998) 2.33

Genome sequence comparison and scenarios for gene rearrangements: a test case. Genomics (1995) 2.32

Searching for drug targets in microbial genomes. Curr Opin Biotechnol (1999) 2.29

Distribution of protein folds in the three superkingdoms of life. Genome Res (1999) 2.29

Computer analysis of transcription regulatory patterns in completely sequenced bacterial genomes. Nucleic Acids Res (1999) 2.28

Crystal structure of a Hedgehog autoprocessing domain: homology between Hedgehog and self-splicing proteins. Cell (1997) 2.25