Published in Science on October 24, 1997
The COG database: an updated version includes eukaryotes. BMC Bioinformatics (2003) 60.98
The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res (2000) 49.22
The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res (2001) 43.17
OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res (2003) 33.03
KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res (2007) 29.46
The use of gene clusters to infer functional coupling. Proc Natl Acad Sci U S A (1999) 16.35
Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome". Proc Natl Acad Sci U S A (2005) 14.59
The integrated microbial genomes (IMG) system in 2007: data content and analysis tool extensions. Nucleic Acids Res (2007) 13.81
BASys: a web server for automated bacterial genome annotation. Nucleic Acids Res (2005) 13.30
Ensembl 2002: accommodating comparative genomics. Nucleic Acids Res (2003) 12.26
Evolutionary parameters of the transcribed mammalian genome: an analysis of 2,820 orthologous rodent and human sequences. Proc Natl Acad Sci U S A (1998) 11.90
OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res (2006) 11.43
Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res (2011) 11.32
Metagenomic biomarker discovery and explanation. Genome Biol (2011) 11.29
Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nat Biotechnol (2013) 11.05
The National Center for Biotechnology Information's Protein Clusters Database. Nucleic Acids Res (2008) 10.64
Comparative genomics of the lactic acid bacteria. Proc Natl Acad Sci U S A (2006) 10.24
The complete genome sequence of the lactic acid bacterium Lactococcus lactis ssp. lactis IL1403. Genome Res (2001) 8.63
The integrated microbial genomes (IMG) system. Nucleic Acids Res (2006) 7.34
IMG/M: a data management and analysis system for metagenomes. Nucleic Acids Res (2007) 7.18
WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction. Nucleic Acids Res (2000) 7.10
Measuring genome evolution. Proc Natl Acad Sci U S A (1998) 6.97
MetaSim: a sequencing simulator for genomics and metagenomics. PLoS One (2008) 6.54
Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs). Genome Biol (2000) 6.50
Comparison of the complete protein sets of worm and yeast: orthology and divergence. Science (1998) 6.28
Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res (2002) 6.08
Assessing performance of orthology detection strategies applied to eukaryotic genomes. PLoS One (2007) 5.62
Selection in the evolution of gene duplications. Genome Biol (2002) 5.58
Identification of 315 genes essential for early zebrafish development. Proc Natl Acad Sci U S A (2004) 5.54
Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. Genome Res (2004) 5.26
Genome rearrangements in mammalian evolution: lessons from human and mouse genomes. Genome Res (2003) 5.22
Identification of potential interaction networks using sequence-based searches for conserved protein-protein interactions or "interologs". Genome Res (2001) 5.10
A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biol (2004) 4.94
Strategies and tools for whole-genome alignments. Genome Res (2003) 4.86
eggNOG: automated construction and annotation of orthologous groups of genes. Nucleic Acids Res (2007) 4.84
Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic Acids Res (2008) 4.78
Genome of the extremely radiation-resistant bacterium Deinococcus radiodurans viewed from the perspective of comparative genomics. Microbiol Mol Biol Rev (2001) 4.75
Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes. BMC Evol Biol (2003) 4.70
Exploring the mycobacteriophage metaproteome: phage genomics as an educational platform. PLoS Genet (2006) 4.69
Phylogenetic and functional assessment of orthologs inference projects and methods. PLoS Comput Biol (2009) 4.65
Who ate whom? Adaptive Helicobacter genomic changes that accompanied a host jump from early humans to large felines. PLoS Genet (2006) 4.65
New insights into Acinetobacter baumannii pathogenesis revealed by high-density pyrosequencing and transposon mutagenesis. Genes Dev (2007) 4.62
Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol Biol (2001) 4.59
eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations. Nucleic Acids Res (2009) 4.55
The MicrobesOnline Web site for comparative genomics. Genome Res (2005) 4.48
A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res (2002) 4.39
Community cyberinfrastructure for Advanced Microbial Ecology Research and Analysis: the CAMERA resource. Nucleic Acids Res (2010) 4.39
Genome sequence of Chlamydophila caviae (Chlamydia psittaci GPIC): examining the role of niche-specific genes in the evolution of the Chlamydiaceae. Nucleic Acids Res (2003) 4.33
A bioinformatician's guide to metagenomics. Microbiol Mol Biol Rev (2008) 4.33
AutoFACT: an automatic functional annotation and classification tool. BMC Bioinformatics (2005) 4.26
Prolinks: a database of protein functional linkages derived from coevolution. Genome Biol (2004) 4.07
Comparative genomics and evolution of proteins involved in RNA metabolism. Nucleic Acids Res (2002) 4.00
ProtoMap: automatic classification of protein sequences and hierarchy of protein families. Nucleic Acids Res (2000) 3.97
eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges. Nucleic Acids Res (2011) 3.94
Cross-referencing eukaryotic genomes: TIGR Orthologous Gene Alignments (TOGA). Genome Res (2002) 3.93
Gene loss, protein sequence divergence, gene dispensability, expression level, and interactivity are correlated in eukaryotic evolution. Genome Res (2003) 3.90
Trends between gene content and genome size in prokaryotic species with larger genomes. Proc Natl Acad Sci U S A (2004) 3.89
The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucleic Acids Res (2006) 3.84
The human phylome. Genome Biol (2007) 3.81
eggNOG v4.0: nested orthology inference across 3686 organisms. Nucleic Acids Res (2013) 3.77
The complete genome of hyperthermophile Methanopyrus kandleri AV19 and monophyly of archaeal methanogens. Proc Natl Acad Sci U S A (2002) 3.72
EDGAR: a software framework for the comparative analysis of prokaryotic genomes. BMC Bioinformatics (2009) 3.56
Comparative genome analysis of Vibrio vulnificus, a marine pathogen. Genome Res (2003) 3.55
'Conserved hypothetical' proteins: prioritization of targets for experimental study. Nucleic Acids Res (2004) 3.50
Multilocus sequence typing system for the endosymbiont Wolbachia pipientis. Appl Environ Microbiol (2006) 3.45
Exploration of uncharted regions of the protein universe. PLoS Biol (2009) 3.41
Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS Comput Biol (2005) 3.39
Network discovery pipeline elucidates conserved time-of-day-specific cis-regulatory modules. PLoS Genet (2008) 3.35
Epistasis as the primary factor in molecular evolution. Nature (2012) 3.34
Predictome: a database of putative functional links between proteins. Nucleic Acids Res (2002) 3.32
Characterization and modeling of the Haemophilus influenzae core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains. Genome Biol (2007) 3.29
Lineage-specific loss and divergence of functionally linked genes in eukaryotes. Proc Natl Acad Sci U S A (2000) 3.27
Comparative genomics of plant chromosomes. Plant Cell (2000) 3.21
Arabidopsis genes involved in acyl lipid metabolism. A 2003 census of the candidates, a study of the distribution of expressed sequence tags in organs, and a web-based database. Plant Physiol (2003) 3.21
Lineage-specific gene expansions in bacterial and archaeal genomes. Genome Res (2001) 3.10
Phylogenetic reconstruction of orthology, paralogy, and conserved synteny for dog and human. PLoS Comput Biol (2006) 3.07
No simple dependence between protein evolution rate and the number of protein-protein interactions: only the most prolific interactors tend to evolve slowly. BMC Evol Biol (2003) 3.07
PATRIC: the VBI PathoSystems Resource Integration Center. Nucleic Acids Res (2006) 3.06
Toxin-antitoxin modules may regulate synthesis of macromolecules during nutritional stress. J Bacteriol (2000) 3.05
Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins. PLoS Biol (2009) 3.03
Genes linked by fusion events are generally of the same functional category: a systematic analysis of 30 microbial genomes. Proc Natl Acad Sci U S A (2001) 2.95
Accelerated evolution associated with genome reduction in a free-living prokaryote. Genome Biol (2005) 2.95
DEG 5.0, a database of essential genes in both prokaryotes and eukaryotes. Nucleic Acids Res (2008) 2.94
Gene family evolution across 12 Drosophila genomes. PLoS Genet (2007) 2.94
Identification, analysis, and utilization of conserved ortholog set markers for comparative genomics in higher plants. Plant Cell (2002) 2.93
Graemlin: general and robust alignment of multiple large interaction networks. Genome Res (2006) 2.92
Complete genome sequencing of Anaplasma marginale reveals that the surface is skewed to two superfamilies of outer membrane proteins. Proc Natl Acad Sci U S A (2004) 2.90
Global alignment of multiple protein interaction networks with application to functional orthology detection. Proc Natl Acad Sci U S A (2008) 2.85
An improved, bias-reduced probabilistic functional gene network of baker's yeast, Saccharomyces cerevisiae. PLoS One (2007) 2.84
Toprim--a conserved catalytic domain in type IA and II topoisomerases, DnaG-type primases, OLD family nucleases and RecR proteins. Nucleic Acids Res (1998) 2.82
Molecular cloning of the maize gene crp1 reveals similarity between regulators of mitochondrial and chloroplast gene expression. EMBO J (1999) 2.77
Expanding yeast knowledge online. Yeast (1998) 2.76
Evidence of a large novel gene pool associated with prokaryotic genomic islands. PLoS Genet (2005) 2.71
On the origin of new genes in Drosophila. Genome Res (2008) 2.69
Search for a 'Tree of Life' in the thicket of the phylogenetic forest. J Biol (2009) 2.66
Large-scale assignment of orthology: back to phylogenetics? Genome Biol (2008) 2.66
The UCSC Archaeal Genome Browser. Nucleic Acids Res (2006) 2.65
Prediction of the archaeal exosome and its connections with the proteasome and the translation and transcription machineries by a comparative-genomic approach. Genome Res (2001) 2.65
Genome evolution at the genus level: comparison of three complete genomes of hyperthermophilic archaea. Genome Res (2001) 2.64
Testing the ortholog conjecture with comparative functional genomic data from mammals. PLoS Comput Biol (2011) 2.63
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 665.31
Basic local alignment search tool. J Mol Biol (1990) 659.07
Initial sequencing and analysis of the human genome. Nature (2001) 212.86
Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A (1988) 193.60
Rapid and sensitive protein similarity searches. Science (1985) 76.83
Rapid similarity searches of nucleic acid and protein data banks. Proc Natl Acad Sci U S A (1983) 53.12
The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res (2000) 49.22
The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res (2001) 43.17
GenBank. Nucleic Acids Res (2000) 36.75
Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res (1998) 23.87
Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res (2001) 22.33
GenBank. Nucleic Acids Res (1999) 21.47
Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks. Proc Natl Acad Sci U S A (1994) 18.46
On the statistical significance of nucleic acid similarities. Nucleic Acids Res (1984) 18.21
A workbench for multiple alignment construction and analysis. Proteins (1991) 16.96
A superfamily of conserved domains in DNA damage-responsive cell cycle checkpoint proteins. FASEB J (1997) 15.10
Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. Science (1998) 13.64
Weights for data related by a tree. J Mol Biol (1989) 12.63
GenBank. Nucleic Acids Res (1997) 11.73
BRCA1 protein products ... Functional motifs... Nat Genet (1996) 11.50
AAA+: A class of chaperone-like ATPases associated with the assembly, operation, and disassembly of protein complexes. Genome Res (1999) 11.30
Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science (2000) 10.82
Two related superfamilies of putative helicases involved in replication, recombination, repair and expression of DNA and RNA genomes. Nucleic Acids Res (1989) 10.03
GenBank. Nucleic Acids Res (1998) 9.36
GenBank. Nucleic Acids Res (1993) 9.06
Novel domains of the prokaryotic two-component signal transduction systems. FEMS Microbiol Lett (2001) 8.45
Iterated profile searches with PSI-BLAST--a tool for discovery in protein databases. Trends Biochem Sci (1998) 8.01
Microbial culturomics: paradigm shift in the human gut microbiome study. Clin Microbiol Infect (2012) 7.97
GenBank. Nucleic Acids Res (1996) 7.06
IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics (1999) 6.91
Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs). Genome Biol (2000) 6.50
Genome sequence and comparative analysis of the solvent-producing bacterium Clostridium acetobutylicum. J Bacteriol (2001) 6.46
A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc Natl Acad Sci U S A (1996) 6.38
Comparison of the complete protein sets of worm and yeast: orthology and divergence. Science (1998) 6.28
Sources of systematic error in functional annotation of genomes: domain rearrangement, non-orthologous gene displacement and operon disruption. In Silico Biol (1998) 6.22
Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches. J Mol Biol (1999) 5.90
Using the COG database to improve gene recognition in complete genomes. Genetica (2000) 5.80
Cleavage of cohesin by the CD clan protease separin triggers anaphase in yeast. Cell (2000) 5.69
GenBank. Nucleic Acids Res (1994) 5.63
Protein database searches for multiple alignments. Proc Natl Acad Sci U S A (1990) 5.52
Metabolism and evolution of Haemophilus influenzae deduced from a whole-genome comparison with Escherichia coli. Curr Biol (1996) 5.50
Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context. Genome Res (2001) 5.37
Conserved domains in DNA repair proteins and evolution of repair systems. Nucleic Acids Res (1999) 4.80
Genome of the extremely radiation-resistant bacterium Deinococcus radiodurans viewed from the perspective of comparative genomics. Microbiol Mol Biol Rev (2001) 4.75
Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol Biol (2001) 4.59
Positionally cloned human disease genes: patterns of evolutionary conservation and functional motifs. Proc Natl Acad Sci U S A (1997) 4.40
The DNA-repair protein AlkB, EGL-9, and leprecan define new families of 2-oxoglutarate- and iron-dependent dioxygenases. Genome Biol (2001) 4.39
Conserved sequence motifs in the initiator proteins for rolling circle DNA replication encoded by diverse replicons from eubacteria, eucaryotes and archaebacteria. Nucleic Acids Res (1992) 4.39
Cysteine proteases of positive strand RNA viruses and chymotrypsin-like serine proteases. A distinct protein superfamily with a common structural fold. FEBS Lett (1989) 4.29
Common origin of four diverse families of large eukaryotic DNA viruses. J Virol (2001) 4.28
N-terminal domains of putative helicases of flavi- and pestiviruses may be serine proteases. Nucleic Acids Res (1989) 4.27
Identification of paracaspases and metacaspases: two ancient families of caspase-like proteins, one of which plays a key role in MALT lymphoma. Mol Cell (2000) 4.19
The HD domain defines a new superfamily of metal-dependent phosphohydrolases. Trends Biochem Sci (1998) 4.11
SAP - a putative DNA-binding motif involved in chromosomal organization. Trends Biochem Sci (2000) 4.10
Evidence for massive gene exchange between archaeal and bacterial hyperthermophiles. Trends Genet (1998) 4.10
The complete sequence (22 kilobases) of murine coronavirus gene 1 encoding the putative proteases and RNA polymerase. Virology (1991) 4.07
Who's your neighbor? New computational approaches for functional genomics. Nat Biotechnol (2000) 4.04
Construction and analysis of bacterial artificial chromosome libraries from a marine microbial assemblage. Environ Microbiol (2000) 3.99
Viral proteins containing the purine NTP-binding sequence pattern. Nucleic Acids Res (1989) 3.96
Chromosome 2 sequence of the human malaria parasite Plasmodium falciparum. Science (1998) 3.83
Comparative genomics of the Archaea (Euryarchaeota): evolution of conserved protein families, the stable core, and the variable shell. Genome Res (1999) 3.62
SURVEY AND SUMMARY: Holliday junction resolvases and related nucleases: identification of new families, phyletic distribution and evolutionary trajectories. Nucleic Acids Res (2000) 3.56
Evolution of aminoacyl-tRNA synthetases--analysis of unique domain architectures and phylogenetic trees reveals a complex history of horizontal gene transfer events. Genome Res (1999) 3.44
Coronavirus genome: prediction of putative functional domains in the non-structural polyprotein by comparative amino acid sequence analysis. Nucleic Acids Res (1989) 3.38
DNA polymerase beta-like nucleotidyltransferase superfamily: identification of three new families, classification and evolutionary history. Nucleic Acids Res (1999) 3.32
A new superfamily of putative NTP-binding domains encoded by genomes of small DNA and RNA viruses. FEBS Lett (1990) 3.30
Eukaryotic signalling domain homologues in archaea and bacteria. Ancient ancestry and horizontal gene transfer. J Mol Biol (1999) 3.28
Lineage-specific loss and divergence of functionally linked genes in eukaryotes. Proc Natl Acad Sci U S A (2000) 3.27
Extracting protein alignment models from the sequence database. Nucleic Acids Res (1997) 3.17
Non-orthologous gene displacement. Trends Genet (1996) 3.12
Lineage-specific gene expansions in bacterial and archaeal genomes. Genome Res (2001) 3.10
A novel superfamily of nucleoside triphosphate-binding motif containing proteins which are probably involved in duplex unwinding in DNA and RNA replication and recombination. FEBS Lett (1988) 3.09
An NTP-binding motif is the most conserved sequence in a highly diverged monophyletic group of proteins involved in positive strand RNA viral replication. J Mol Evol (1989) 3.00
A conserved NTP-motif in putative helicases. Nature (1988) 2.94
Novel families of putative protein kinases in bacteria and archaea: evolution of the "eukaryotic" protein kinase superfamily. Genome Res (1998) 2.91
Detection of new genes in a bacterial genome using Markov models for three gene classes. Nucleic Acids Res (1995) 2.89
Rickettsiae and Chlamydiae: evidence of horizontal gene transfer and gene exchange. Trends Genet (1999) 2.87
Predicting functions from protein sequences--where are the bottlenecks? Nat Genet (1998) 2.86
Toprim--a conserved catalytic domain in type IA and II topoisomerases, DnaG-type primases, OLD family nucleases and RecR proteins. Nucleic Acids Res (1998) 2.82
SEALS: a system for easy analysis of lots of sequences. Proc Int Conf Intell Syst Mol Biol (1997) 2.80
Prediction of transcription regulatory sites in Archaea by a comparative genomic approach. Nucleic Acids Res (2000) 2.80
RNA sequence of astrovirus: distinctive genomic organization and a putative retrovirus-like ribosomal frameshifting signal that directs the viral replicase synthesis. Proc Natl Acad Sci U S A (1993) 2.77
Role of CED-4 in the activation of CED-3. Nature (1997) 2.76
The U box is a modified RING finger - a common domain in ubiquitination. Curr Biol (2000) 2.73
Regulatory potential, phyletic distribution and evolution of ancient, intracellular small-molecule-binding domains. J Mol Biol (2001) 2.68
Adaptations of the helix-grip fold for ligand binding and catalysis in the START domain superfamily. Proteins (2001) 2.65
Phosphoesterase domains associated with DNA polymerases of diverse origins. Nucleic Acids Res (1998) 2.61
The NACHT family - a new group of predicted NTPases implicated in apoptosis and MHC transcription activation. Trends Biochem Sci (2000) 2.58
The genome of molluscum contagiosum virus: analysis and comparison with other poxviruses. Virology (1997) 2.58
The domains of death: evolution of the apoptosis machinery. Trends Biochem Sci (1999) 2.56
DNA-binding proteins and evolution of transcription regulation in the archaea. Nucleic Acids Res (1999) 2.53
A diverse superfamily of enzymes with ATP-dependent carboxylate-amine/thiol ligase activity. Protein Sci (1997) 2.47
Contextual constraints on synonymous codon choice. J Mol Biol (1983) 2.47
Gene order is not conserved in bacterial evolution. Trends Genet (1996) 2.44
Did DNA replication evolve twice independently? Nucleic Acids Res (1999) 2.44