Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs).

PubWeight™: 6.50 | Rank: Top 1%

PMC 15027

Published in Genome Biol on November 06, 2000

Authors

D A Natale (1), U T Shankavaram, M Y Galperin, Y I Wolf, L Aravind, E V Koonin

Author Affiliations

1: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Rockville Pike, Bethesda, MD 20894, USA.

Articles citing this

The COG database: an updated version includes eukaryotes. BMC Bioinformatics (2003) 60.98

The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res (2001) 43.17

OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res (2003) 33.03

A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biol (2004) 4.94

Lineage-specific gene expansions in bacterial and archaeal genomes. Genome Res (2001) 3.10

The past, present and future of genome-wide re-annotation. Genome Biol (2002) 2.77

Comparative analysis of ribosomal proteins in complete genomes: an example of reductive evolution at the domain scale. Nucleic Acids Res (2002) 2.24

Evolutionary genomics of lactic acid bacteria. J Bacteriol (2006) 1.84

The first Illumina-based de novo transcriptome sequencing and analysis of safflower flowers. PLoS One (2012) 1.53

Phenotypic differentiation of gastrointestinal microbes is reflected in their encoded metabolic repertoires. Microbiome (2015) 1.46

Comparative genomics of Archaea: how much have we learned in six years, and what's next? Genome Biol (2003) 1.38

Comparative multi-omics systems analysis of Escherichia coli strains B and K-12. Genome Biol (2012) 1.37

Re-annotation of genome microbial coding-sequences: finding new genes and inaccurately annotated genes. BMC Bioinformatics (2002) 1.33

A regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous data. BMC Bioinformatics (2006) 1.15

Comparative analysis and assessment of M. tuberculosis H37Rv protein-protein interaction datasets. BMC Genomics (2011) 1.14

Evolutionary analysis by whole-genome comparisons. J Bacteriol (2002) 1.05

Identification of new members of the MAPK gene family in plants shows diverse conserved domains and novel activation loop variants. BMC Genomics (2015) 0.99

Uncovering rate variation of lateral gene transfer during bacterial genome evolution. BMC Genomics (2008) 0.96

Prediction of novel archaeal enzymes from sequence-derived features. Protein Sci (2002) 0.95

Phylogeny vs genome reshuffling: horizontal gene transfer. Indian J Microbiol (2008) 0.93

Mapping phosphoproteins in Mycoplasma genitalium and Mycoplasma pneumoniae. BMC Microbiol (2007) 0.92

Insertion sequence content reflects genome plasticity in strains of the root nodule actinobacterium Frankia. BMC Genomics (2009) 0.90

An integrative method for identifying the over-annotated protein-coding genes in microbial genomes. DNA Res (2011) 0.88

A highly conserved family of domains related to the DNA-glycosylase fold helps predict multiple novel pathways for RNA modifications. RNA Biol (2014) 0.87

De novo assembly of the Indo-Pacific humpback dolphin leucocyte transcriptome to identify putative genes involved in the aquatic adaptation and immune response. PLoS One (2013) 0.83

Transcriptome analysis by Illumina high-throughout paired-end sequencing reveals the complexity of differential gene expression during in vitro plantlet growth and flowering in Amaranthus tricolor L. PLoS One (2014) 0.80

Revealing gene transcription and translation initiation patterns in archaea, using an interactive clustering model. Extremophiles (2004) 0.79

Transcriptomic Analysis of Paeonia delavayi Wild Population Flowers to Identify Differentially Expressed Genes Involved in Purple-Red and Yellow Petal Pigmentation. PLoS One (2015) 0.79

TM0486 from the hyperthermophilic anaerobe Thermotoga maritima is a thiamin-binding protein involved in response of the cell to oxidative conditions. J Mol Biol (2010) 0.78

Re-annotation of protein-coding genes in 10 complete genomes of Neisseriaceae family by combining similarity-based and composition-based methods. DNA Res (2013) 0.78

Systematic Analysis of Intracellular-targeting Antimicrobial Peptides, Bactenecin 7, Hybrid of Pleurocidin and Dermaseptin, Proline-Arginine-rich Peptide, and Lactoferricin B, by Using Escherichia coli Proteome Microarrays. Mol Cell Proteomics (2016) 0.77

Simultaneous prediction of enzyme orthologs from chemical transformation patterns for de novo metabolic pathway reconstruction. Bioinformatics (2016) 0.76

An evidence of laccases in archaea. Indian J Microbiol (2009) 0.75

RNA-Sequencing Reveals Biological Networks during Table Grapevine ('Fujiminori') Fruit Development. PLoS One (2017) 0.75

iTRAQ protein profile analysis of developmental dynamics in soybean [Glycine max (L.) Merr.] leaves. PLoS One (2017) 0.75

Articles cited by this

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 665.31

The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol (1987) 266.90

KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res (2000) 117.00

A genomic perspective on protein families. Science (1997) 50.51

The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res (2000) 49.22

GenBank. Nucleic Acids Res (2000) 36.75

Distinguishing homologous from analogous proteins. Syst Zool (1970) 25.10

Construction of phylogenetic trees. Science (1967) 23.69

SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res (2000) 17.77

Analysis of compositionally biased regions in sequence databases. Methods Enzymol (1996) 17.11

Automated genome sequence analysis and annotation. Bioinformatics (1999) 13.92

Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. Science (1998) 13.64

Comparison of archaeal and bacterial genomes: computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea. Mol Microbiol (1997) 8.69

Genome phylogeny based on gene content. Nat Genet (1999) 8.12

Complete genome sequence of an aerobic hyper-thermophilic crenarchaeon, Aeropyrum pernix K1. DNA Res (1999) 7.39

WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction. Nucleic Acids Res (2000) 7.10

Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods. Methods Enzymol (1996) 6.99

Predicting function: from genes to genomes and back. J Mol Biol (1998) 6.60

Sources of systematic error in functional annotation of genomes: domain rearrangement, non-orthologous gene displacement and operon disruption. In Silico Biol (1998) 6.22

Using the COG database to improve gene recognition in complete genomes. Genetica (2000) 5.80

Uses for evolutionary trees. Philos Trans R Soc Lond B Biol Sci (1995) 4.81

Conserved domains in DNA repair proteins and evolution of repair systems. Nucleic Acids Res (1999) 4.80

An evolutionary treasure: unification of a broad set of amidohydrolases related to urease. Proteins (1997) 4.42

Who's your neighbor? New computational approaches for functional genomics. Nat Biotechnol (2000) 4.04

MAGPIE: automated genome interpretation. Trends Genet (1996) 4.01

Comparative genomics of the Archaea (Euryarchaeota): evolution of conserved protein families, the stable core, and the variable shell. Genome Res (1999) 3.62

Evolution of aminoacyl-tRNA synthetases--analysis of unique domain architectures and phylogenetic trees reveals a complex history of horizontal gene transfer events. Genome Res (1999) 3.44

Whole genome-based phylogenetic analysis of free-living microorganisms. Nucleic Acids Res (1999) 3.32

DNA polymerase beta-like nucleotidyltransferase superfamily: identification of three new families, classification and evolutionary history. Nucleic Acids Res (1999) 3.32

Evidence that eukaryotes and eocyte prokaryotes are immediate relatives. Science (1992) 3.14

Non-orthologous gene displacement. Trends Genet (1996) 3.12

Novel families of putative protein kinases in bacteria and archaea: evolution of the "eukaryotic" protein kinase superfamily. Genome Res (1998) 2.91

Predicting functions from protein sequences--where are the bottlenecks? Nat Genet (1998) 2.86

Pseudouridine synthases: four families of enzymes containing a putative uridine-binding motif also conserved in dUTPases and dCTP deaminases. Nucleic Acids Res (1996) 2.73

Phosphoesterase domains associated with DNA polymerases of diverse origins. Nucleic Acids Res (1998) 2.61

An evolutionary classification of the metallo-beta-lactamase fold proteins. In Silico Biol (1999) 2.58

An archaeal genomic signature. Proc Natl Acad Sci U S A (2000) 2.53

DNA-binding proteins and evolution of transcription regulation in the archaea. Nucleic Acids Res (1999) 2.53

Cloning, sequencing, and expression of a fibronectin/fibrinogen-binding protein from group A streptococci. Infect Immun (1994) 2.15

Novel predicted RNA-binding domains associated with the translation machinery. J Mol Evol (1999) 2.07

A read-ahead function in archaeal DNA polymerases detects promutagenic template-strand uracil. Proc Natl Acad Sci U S A (1999) 2.03

A heterodimeric DNA polymerase: evidence that members of Euryarchaeota possess a distinct DNA polymerase. Proc Natl Acad Sci U S A (1998) 1.70

Uracil-DNA glycosylase in the extreme thermophile Archaeoglobus fulgidus. J Biol Chem (2000) 1.40

Evolutionary anomalies among the aminoacyl-tRNA synthetases. Curr Opin Genet Dev (1998) 1.38

Dealing with database explosion: a cautionary note. Science (1997) 1.32

Thermostable uracil-DNA glycosylase from Thermotoga maritima, a member of a novel class of DNA repair enzymes. Curr Biol (1999) 1.30

Biosequence exegesis. Science (1999) 1.27

Protein fold recognition using sequence profiles and its application in structural genomics. Adv Protein Chem (2000) 1.24

In vitro DNA binding of the archaeal protein Sso7d induces negative supercoiling at temperatures typical for thermophilic growth. Nucleic Acids Res (1998) 1.21

Organelle division: Self-assembling GTPase caught in the middle. Curr Biol (2000) 1.11

Pervasiveness of gene conservation and persistence of duplicates in cellular genomes. J Mol Evol (1999) 1.07

Optimally recovering rate variation information from genomes and sequences: pattern filtering. Mol Biol Evol (1998) 1.06

Two family B DNA polymerases from Aeropyrum pernix, an aerobic hyperthermophilic crenarchaeote. J Bacteriol (1999) 1.03

Cysteine biosynthesis pathway in the archaeon Methanosarcina barkeri encoded by acquired bacterial genes? J Bacteriol (2000) 0.98

Articles by these authors

Initial sequencing and analysis of the human genome. Nature (2001) 212.86

A genomic perspective on protein families. Science (1997) 50.51

The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res (2000) 49.22

The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res (2001) 43.17

Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science (2009) 32.97

Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res (1998) 23.87

Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res (2001) 22.33

Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks. Proc Natl Acad Sci U S A (1994) 18.46

A superfamily of conserved domains in DNA damage-responsive cell cycle checkpoint proteins. FASEB J (1997) 15.10

Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. Science (1998) 13.64

De-ubiquitination and ubiquitin ligase domains of A20 downregulate NF-kappaB signalling. Nature (2004) 12.41

BRCA1 protein products ... Functional motifs... Nat Genet (1996) 11.50

AAA+: A class of chaperone-like ATPases associated with the assembly, operation, and disassembly of protein complexes. Genome Res (1999) 11.30

Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science (2000) 10.82

Two related superfamilies of putative helicases involved in replication, recombination, repair and expression of DNA and RNA genomes. Nucleic Acids Res (1989) 10.03

Impaired hydroxylation of 5-methylcytosine in myeloid cancers with mutant TET2. Nature (2010) 8.87

Comparison of archaeal and bacterial genomes: computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea. Mol Microbiol (1997) 8.69

Novel domains of the prokaryotic two-component signal transduction systems. FEMS Microbiol Lett (2001) 8.45

Horizontal gene transfer in prokaryotes: quantification and classification. Annu Rev Microbiol (2001) 8.14

Iterated profile searches with PSI-BLAST--a tool for discovery in protein databases. Trends Biochem Sci (1998) 8.01

Genome sequence of the radioresistant bacterium Deinococcus radiodurans R1. Science (1999) 8.01

Microbial culturomics: paradigm shift in the human gut microbiome study. Clin Microbiol Infect (2012) 7.97

Role of Rpn11 metalloprotease in deubiquitination and degradation by the 26S proteasome. Science (2002) 7.49

IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics (1999) 6.91

Classification and evolution of P-loop GTPases and related ATPases. J Mol Biol (2002) 6.85

Genome sequence and comparative analysis of the solvent-producing bacterium Clostridium acetobutylicum. J Bacteriol (2001) 6.46

A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc Natl Acad Sci U S A (1996) 6.38

Comparison of the complete protein sets of worm and yeast: orthology and divergence. Science (1998) 6.28

Sources of systematic error in functional annotation of genomes: domain rearrangement, non-orthologous gene displacement and operon disruption. In Silico Biol (1998) 6.22

Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches. J Mol Biol (1999) 5.90

Using the COG database to improve gene recognition in complete genomes. Genetica (2000) 5.80

Cleavage of cohesin by the CD clan protease separin triggers anaphase in yeast. Cell (2000) 5.69

Complete genome sequence of the apicomplexan, Cryptosporidium parvum. Science (2004) 5.64

Metabolism and evolution of Haemophilus influenzae deduced from a whole-genome comparison with Escherichia coli. Curr Biol (1996) 5.50

Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context. Genome Res (2001) 5.37

Beyond complete genomes: from sequence to structure and function. Curr Opin Struct Biol (1998) 5.16

Role of predicted metalloprotease motif of Jab1/Csn5 in cleavage of Nedd8 from Cul1. Science (2002) 4.95

Conserved domains in DNA repair proteins and evolution of repair systems. Nucleic Acids Res (1999) 4.80

Genome of the extremely radiation-resistant bacterium Deinococcus radiodurans viewed from the perspective of comparative genomics. Microbiol Mol Biol Rev (2001) 4.75

Evolutionary history and higher order classification of AAA+ ATPases. J Struct Biol (2004) 4.68

Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol Biol (2001) 4.59

Evolutionary genomics of nucleo-cytoplasmic large DNA viruses. Virus Res (2006) 4.49

Structure and evolution of transcriptional regulatory networks. Curr Opin Struct Biol (2004) 4.48

Positionally cloned human disease genes: patterns of evolutionary conservation and functional motifs. Proc Natl Acad Sci U S A (1997) 4.40

The DNA-repair protein AlkB, EGL-9, and leprecan define new families of 2-oxoglutarate- and iron-dependent dioxygenases. Genome Biol (2001) 4.39

Conserved sequence motifs in the initiator proteins for rolling circle DNA replication encoded by diverse replicons from eubacteria, eucaryotes and archaebacteria. Nucleic Acids Res (1992) 4.39

A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res (2002) 4.39

Cysteine proteases of positive strand RNA viruses and chymotrypsin-like serine proteases. A distinct protein superfamily with a common structural fold. FEBS Lett (1989) 4.29

Common origin of four diverse families of large eukaryotic DNA viruses. J Virol (2001) 4.28

The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res (2002) 4.28

N-terminal domains of putative helicases of flavi- and pestiviruses may be serine proteases. Nucleic Acids Res (1989) 4.27

Identification of paracaspases and metacaspases: two ancient families of caspase-like proteins, one of which plays a key role in MALT lymphoma. Mol Cell (2000) 4.19

Prediction of novel families of enzymes involved in oxidative and other complex modifications of bases in nucleic acids. Cell Cycle (2009) 4.14

The HD domain defines a new superfamily of metal-dependent phosphohydrolases. Trends Biochem Sci (1998) 4.11

SAP - a putative DNA-binding motif involved in chromosomal organization. Trends Biochem Sci (2000) 4.10

Evidence for massive gene exchange between archaeal and bacterial hyperthermophiles. Trends Genet (1998) 4.10

The complete sequence (22 kilobases) of murine coronavirus gene 1 encoding the putative proteases and RNA polymerase. Virology (1991) 4.07

Who's your neighbor? New computational approaches for functional genomics. Nat Biotechnol (2000) 4.04

Comparative genomics and evolution of proteins involved in RNA metabolism. Nucleic Acids Res (2002) 4.00

Construction and analysis of bacterial artificial chromosome libraries from a marine microbial assemblage. Environ Microbiol (2000) 3.99

Viral proteins containing the purine NTP-binding sequence pattern. Nucleic Acids Res (1989) 3.96

Unification of Cas protein families and a simple scenario for the origin and evolution of CRISPR-Cas systems. Biol Direct (2011) 3.92

Chromosome 2 sequence of the human malaria parasite Plasmodium falciparum. Science (1998) 3.83

The complete genome of hyperthermophile Methanopyrus kandleri AV19 and monophyly of archaeal methanogens. Proc Natl Acad Sci U S A (2002) 3.72

Comparative genomics of the Archaea (Euryarchaeota): evolution of conserved protein families, the stable core, and the variable shell. Genome Res (1999) 3.62

SURVEY AND SUMMARY: Holliday junction resolvases and related nucleases: identification of new families, phyletic distribution and evolutionary trajectories. Nucleic Acids Res (2000) 3.56

Evolution of aminoacyl-tRNA synthetases--analysis of unique domain architectures and phylogenetic trees reveals a complex history of horizontal gene transfer events. Genome Res (1999) 3.44

Coronavirus genome: prediction of putative functional domains in the non-structural polyprotein by comparative amino acid sequence analysis. Nucleic Acids Res (1989) 3.38

Discovery of the principal specific transcription factors of Apicomplexa and their implication for the evolution of the AP2-integrase DNA binding domains. Nucleic Acids Res (2005) 3.36

Putative papain-related thiol proteases of positive-strand RNA viruses. Identification of rubi- and aphthovirus proteases and delineation of a novel conserved domain associated with proteases of rubi-, alpha- and coronaviruses. FEBS Lett (1991) 3.35

DNA polymerase beta-like nucleotidyltransferase superfamily: identification of three new families, classification and evolutionary history. Nucleic Acids Res (1999) 3.32

A new superfamily of putative NTP-binding domains encoded by genomes of small DNA and RNA viruses. FEBS Lett (1990) 3.30

Eukaryotic signalling domain homologues in archaea and bacteria. Ancient ancestry and horizontal gene transfer. J Mol Biol (1999) 3.28