Published in Science on October 24, 1997
The COG database: an updated version includes eukaryotes. BMC Bioinformatics (2003) 60.98
The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res (2000) 49.22
OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res (2003) 33.03
An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res (2002) 25.81
The probability of duplicate gene preservation by subfunctionalization. Genetics (2000) 10.27
TreeFam: a curated database of phylogenetic trees of animal gene families. Nucleic Acids Res (2006) 8.83
Comparison of the complete protein sets of worm and yeast: orthology and divergence. Science (1998) 6.28
Assessing performance of orthology detection strategies applied to eukaryotic genomes. PLoS One (2007) 5.62
A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biol (2004) 4.94
Structure, function and evolution of glutathione transferases: implications for classification of non-mammalian members of an ancient enzyme superfamily. Biochem J (2001) 4.75
The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol (2004) 3.59
SVM-Prot: Web-based support vector machine software for functional classification of a protein from its primary sequence. Nucleic Acids Res (2003) 3.34
Mutator-like elements in Arabidopsis thaliana. Structure, diversity and evolution. Genetics (2000) 2.52
Cloning and functional analysis of cDNAs with open reading frames for 300 previously undefined genes expressed in CD34+ hematopoietic stem/progenitor cells. Genome Res (2000) 2.21
Structural assignments to the Mycoplasma genitalium proteins show extensive gene duplications and domain rearrangements. Proc Natl Acad Sci U S A (1998) 2.08
PGP4, an ATP binding cassette P-glycoprotein, catalyzes auxin transport in Arabidopsis thaliana roots. Plant Cell (2005) 2.08
The ADF homology (ADF-H) domain: a highly exploited actin-binding module. Mol Biol Cell (1998) 1.81
Identification of genes expressed in human CD34(+) hematopoietic stem/progenitor cells by expressed sequence tags and efficient full-length cDNA cloning. Proc Natl Acad Sci U S A (1998) 1.78
YOGY: a web-based, integrated database to retrieve protein orthologs and associated Gene Ontology terms. Nucleic Acids Res (2006) 1.73
Streamlining and large ancestral genomes in Archaea inferred with a phylogenetic birth-and-death model. Mol Biol Evol (2009) 1.54
Directed evolution of an aspartate aminotransferase with new substrate specificities. Proc Natl Acad Sci U S A (1998) 1.54
The Arabidopsis thaliana ABC transporter AtMRP5 controls root development and stomata movement. EMBO J (2001) 1.53
Exploration of novel motifs derived from mouse cDNA sequences. Genome Res (2002) 1.51
Isovariant dynamics expand and buffer the responses of complex systems: the diverse plant actin gene family. Plant Cell (1999) 1.44
Plasmodium interspersed repeats: the major multigene superfamily of malaria parasites. Nucleic Acids Res (2004) 1.34
Plant ABC Transporters. Arabidopsis Book (2011) 1.32
L-tartaric acid synthesis from vitamin C in higher plants. Proc Natl Acad Sci U S A (2006) 1.31
Isolation of novel human and mouse genes of the recA/RAD51 recombination-repair gene family. Nucleic Acids Res (1998) 1.27
The solitary long terminal repeats of ERV-9 endogenous retrovirus are conserved during primate evolution and possess enhancer activities in embryonic and hematopoietic cells. J Virol (2002) 1.23
Accurate prediction of protein functional class from sequence in the Mycobacterium tuberculosis and Escherichia coli genomes using data mining. Yeast (2000) 1.20
Long-range function of an intergenic retrotransposon. Proc Natl Acad Sci U S A (2010) 1.20
The EF-hand domain: a globally cooperative structural unit. Protein Sci (2002) 1.17
The origin of alternation of generations in land plants: a focus on matrotrophy and hexose transport. Philos Trans R Soc Lond B Biol Sci (2000) 1.12
Functional prediction: identification of protein orthologs and paralogs. Protein Sci (2000) 1.10
Insights into the molecular evolution of the PDZ/LIM family and identification of a novel conserved protein motif. PLoS One (2007) 1.09
Evolutionary analysis by whole-genome comparisons. J Bacteriol (2002) 1.05
Fractured genes: a novel genomic arrangement involving new split inteins and a new homing endonuclease family. Nucleic Acids Res (2009) 1.05
OrthoVenn: a web server for genome wide comparison and annotation of orthologous clusters across multiple species. Nucleic Acids Res (2015) 1.04
Specific inhibition of the transcription factor Ci by a cobalt(III) Schiff base-DNA conjugate. Mol Pharm (2012) 1.03
Gene family evolution: an in-depth theoretical and simulation analysis of non-linear birth-death-innovation models. BMC Evol Biol (2004) 0.99
An ATP-binding cassette transporter GhWBC1 from elongating cotton fibers. Plant Physiol (2003) 0.95
Genomic sequence, structural organization, molecular evolution, and aberrant rearrangement of promyelocytic leukemia zinc finger gene. Proc Natl Acad Sci U S A (1999) 0.95
The origin of a novel gene through overprinting in Escherichia coli. BMC Evol Biol (2008) 0.94
Evolutionary conservation and expression of human RNA-binding proteins and their role in human genetic disease. Adv Exp Med Biol (2014) 0.92
Mammalian keratin associated proteins (KRTAPs) subgenomes: disentangling hair diversity and adaptation to terrestrial and aquatic environments. BMC Genomics (2014) 0.91
Changing T cell specificity by retroviral T cell receptor display. Proc Natl Acad Sci U S A (2000) 0.91
Identifying single copy orthologs in Metazoa. PLoS Comput Biol (2011) 0.89
New enzymes from environmental cassette arrays: functional attributes of a phosphotransferase and an RNA-methyltransferase. Protein Sci (2004) 0.89
Postnatal development- and age-related changes in DNA-methylation patterns in the human genome. Nucleic Acids Res (2012) 0.89
Gene trapping with firefly luciferase in Arabidopsis. Tagging of stress-responsive genes. Plant Physiol (2004) 0.89
Retracted: Influence of ATP-Binding Cassette Transporters in Root Exudation of Phytoalexins, Signals, and in Disease Resistance. Front Plant Sci (2012) 0.87
The ERV-9 LTR enhancer is not blocked by the HS5 insulator and synthesizes through the HS5 site non-coding, long RNAs that regulate LTR enhancer function. Nucleic Acids Res (2003) 0.85
A single intact ATPase site of the ABC transporter BtuCD drives 5% transport activity yet supports full in vivo vitamin B12 utilization. Proc Natl Acad Sci U S A (2013) 0.85
Evaluation of monocot and eudicot divergence using the sugarcane transcriptome. Plant Physiol (2004) 0.84
The solution structure of the Mg2+ form of soybean calmodulin isoform 4 reveals unique features of plant calmodulins in resting cells. Protein Sci (2010) 0.84
Directed evolution of ampicillin-resistant activity from a functionally unrelated DNA fragment: A laboratory model of molecular evolution. Proc Natl Acad Sci U S A (2001) 0.84
Genomics, mutations and the Internet: the naming and use of parts. J Inherit Metab Dis (1999) 0.82
Genes coding for tryptophan-rich proteins are transcribed throughout the asexual cycle of Plasmodium falciparum. Parasitol Res (2005) 0.82
Reciprocal domain evolution within a transactivator in a restricted sequence space. Proc Natl Acad Sci U S A (2000) 0.82
Comparative analysis of complete genomes reveals gene loss, acquisition and acceleration of evolutionary rates in Metazoa, suggests a prevalence of evolution via gene acquisition and indicates that the evolutionary rates in animals tend to be conserved. Nucleic Acids Res (2004) 0.81
Prediction and analysis of canonical EF hand loop and qualitative estimation of Ca²⁺ binding affinity. PLoS One (2014) 0.79
NMR structure determination of proteins supplemented by quantum chemical calculations: detailed structure of the Ca2+ sites in the EGF34 fragment of protein S. J Biomol NMR (2005) 0.79
The rarity of gene shuffling in conserved genes. Genome Biol (2005) 0.79
Protein multifunctionality: principles and mechanisms. Transl Oncogenomics (2008) 0.78
Genome-wide identification and expression characterization of ABCC-MRP transporters in hexaploid wheat. Front Plant Sci (2015) 0.78
Occurrence of protein structure elements in conserved sequence regions. BMC Struct Biol (2007) 0.78
Expression analysis of sugarcane shaggy-like kinase (SuSK) gene identified through cDNA subtractive hybridization in sugarcane (Saccharum officinarum L.). Protoplasma (2010) 0.78
The evolutionary relationship of the domain architectures in the RhoGEF-containing proteins. Genomics Proteomics Bioinformatics (2005) 0.77
Linking the potato genome to the conserved ortholog set (COS) markers. BMC Genet (2013) 0.76
Long non-coding RNAs transcribed by ERV-9 LTR retrotransposon act in cis to modulate long-range LTR enhancer function. Nucleic Acids Res (2017) 0.76
Biochemical Characterization of a Mycobacteriophage Derived DnaB Ortholog Reveals New Insight into the Evolutionary Origin of DnaB Helicases. PLoS One (2015) 0.75
Advances in genetic hearing loss: CIB2 gene. Eur Arch Otorhinolaryngol (2016) 0.75
RStrucFam: a web server to associate structure and cognate RNA for RNA-binding proteins from sequence information. BMC Bioinformatics (2016) 0.75
INVHOGEN: a database of homologous invertebrate genes. Nucleic Acids Res (2006) 0.75
Hypermethylated LTR retrotransposon exhibits enhancer activity. Epigenetics (2017) 0.75
Comparative Analysis of the Flavobacterium columnare Genomovar I and II Genomes. Front Microbiol (2017) 0.75
ATP binding and hydrolysis disrupts the high-affinity interaction between the heme ABC transporter HmuUV and its cognate substrate binding protein. J Biol Chem (2017) 0.75
Initial sequencing and analysis of the human genome. Nature (2001) 212.86
SMART, a simple modular architecture research tool: identification of signaling domains. Proc Natl Acad Sci U S A (1998) 36.83
The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res (2001) 24.45
Prediction of deleterious human alleles. Hum Mol Genet (2001) 21.00
SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res (2000) 17.77
A common language for physical mapping of the human genome. Science (1989) 17.36
Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science (2001) 17.11
A physical map of the human genome. Nature (2001) 12.39
HGBASE: a database of SNPs and other variations in and around human genes. Nucleic Acids Res (2000) 11.75
BRCA1 protein products ... Functional motifs... Nat Genet (1996) 11.50
SMART: identification and annotation of domains from signalling and extracellular protein sequences. Nucleic Acids Res (1999) 11.33
An immunoglobulin heavy chain variable region gene is generated from three segments of DNA: VH, D and JH. Cell (1980) 10.35
Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci (1998) 9.94
Two mRNAs can be produced from a single immunoglobulin mu gene by alternative RNA processing pathways. Cell (1980) 9.86
Functional genomic analysis of cell division in C. elegans using RNAi of genes on chromosome III. Nature (2000) 9.82
Increased coverage of protein families with the blocks database servers. Nucleic Acids Res (2000) 9.18
DNA diagnostics--molecular techniques and automation. Science (1988) 8.85
Genome phylogeny based on gene content. Nat Genet (1999) 8.12
A new approach to decoding life: systems biology. Annu Rev Genomics Hum Genet (2001) 7.79
A ligase-mediated gene detection technique. Science (1988) 7.61
SOMAP: a novel interactive approach to multiple protein sequences alignment. Comput Appl Biosci (1991) 7.13
Measuring genome evolution. Proc Natl Acad Sci U S A (1998) 6.97
Alagille syndrome is caused by mutations in human Jagged1, which encodes a ligand for Notch1. Nat Genet (1997) 6.66
InterPro--an integrated documentation resource for protein families, domains and functional sites. Bioinformatics (2000) 6.42
Quantitative phylogenetic assessment of microbial communities in diverse environments. Science (2007) 6.35
Two mRNAs with different 3' ends encode membrane-bound and secreted forms of immunoglobulin mu chain. Cell (1980) 6.32
Automated construction and graphical presentation of protein blocks from unaligned sequences. Gene (1995) 6.11
A single VH gene segment encodes the immune response to phosphorylcholine: somatic mutation is correlated with the class of the antibody. Cell (1981) 5.87
Genome sequence of Halobacterium species NRC-1. Proc Natl Acad Sci U S A (2000) 5.87
The organization, expression, and evolution of antibody genes and other multigene families. Annu Rev Genet (1975) 5.87
Consensus-degenerate hybrid oligonucleotide primers for amplification of distantly related sequences. Nucleic Acids Res (1998) 5.70
STRING: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic Acids Res (2000) 5.68
Metabolism and evolution of Haemophilus influenzae deduced from a whole-genome comparison with Escherichia coli. Curr Biol (1996) 5.50
A pseudogene homologous to mouse transplantation antigens: transplantation antigens are encoded by eight exons that correlate with protein domains. Cell (1981) 5.43
Towards a structural basis of human non-synonymous single nucleotide polymorphisms. Trends Genet (2000) 5.41
Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res (2000) 5.14
A molecular map of the immune response region from the major histocompatibility complex of the mouse. Nature (1982) 5.12
EST comparison indicates 38% of human mRNAs contain possible alternative splice forms. FEBS Lett (2000) 4.96
Three cDNA clones encoding mouse transplantation antigens: homology to immunoglobulin genes. Cell (1981) 4.95
Targeted screening for induced mutations. Nat Biotechnol (2000) 4.89
Pairwise end sequencing: a unified approach to genomic mapping and sequencing. Genomics (1995) 4.83
Clusters of genes encoding mouse transplantation antigens. Cell (1982) 4.71
Targeting induced local lesions IN genomes (TILLING) for plant functional genomics. Plant Physiol (2000) 4.66
Antibody diversity: somatic hypermutation of rearranged VH genes. Cell (1981) 4.65
A new strategy for genome sequencing. Nature (1996) 4.61
Transgenic mice that express a myelin basic protein-specific T cell receptor develop spontaneous autoimmunity. Cell (1993) 4.59
eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations. Nucleic Acids Res (2009) 4.55
Differential genome display. Trends Genet (1997) 4.48
Restricted use of T cell receptor V genes in murine autoimmune encephalomyelitis raises possibilities for antibody therapy. Cell (1988) 4.47
Genes of the major histocompatibility complex of the mouse. Annu Rev Immunol (1983) 4.44
Positionally cloned human disease genes: patterns of evolutionary conservation and functional motifs. Proc Natl Acad Sci U S A (1997) 4.40
A novel class of RanGTP binding proteins. J Cell Biol (1997) 4.36
REF, an evolutionary conserved family of hnRNP-like proteins, interacts with TAP/Mex67p and participates in mRNA nuclear export. RNA (2000) 4.10
HGVbase: a human sequence variation database emphasizing data quality and a broad spectrum of data sources. Nucleic Acids Res (2002) 4.08
An immunoglobulin heavy-chain gene is formed by at least two recombinational events. Nature (1980) 3.99
Striking sequence similarity over almost 100 kilobases of human and mouse T-cell receptor DNA. Nat Genet (1994) 3.88
IgG antibodies to phosphorylcholine exhibit more diversity than their IgM counterparts. Nature (1981) 3.86
Prostate-localized and androgen-regulated expression of the membrane-bound serine protease TMPRSS2. Cancer Res (1999) 3.79
SNP frequencies in human genes: an excess of rare alleles and differing modes of selection. Trends Genet (2000) 3.78
Comparison of ARM and HEAT protein repeats. J Mol Biol (2001) 3.78
Blocks+: a non-redundant database of protein alignment blocks derived from multiple compilations. Bioinformatics (1999) 3.70
Silkmoth chorion proteins: sequence analysis of the products of a multigene family. Proc Natl Acad Sci U S A (1978) 3.60
Structural studies of the scrapie prion protein using mass spectrometry and amino acid sequencing. Biochemistry (1993) 3.54
Introduced T cell receptor variable region gene segments recombine in pre-B cells: evidence that B and T cells use a common recombinase. Cell (1986) 3.51
Large-scale and automated DNA sequence determination. Science (1991) 3.45
Cell-type-specific cDNA probes and the murine I region: the localization and orientation of Ad alpha. Proc Natl Acad Sci U S A (1984) 3.43
The human T cell antigen receptor is encoded by variable, diversity, and joining gene segments that rearrange to generate a complete V gene. Cell (1984) 3.41
FingerPRINTScan: intelligent searching of the PRINTS motif database. Bioinformatics (1999) 3.36
Death receptor 5, a new member of the TNFR family, and DR4 induce FADD-dependent apoptosis and activate the NF-kappaB pathway. Immunity (1997) 3.33
Eukaryotic signalling domain homologues in archaea and bacteria. Ancient ancestry and horizontal gene transfer. J Mol Biol (1999) 3.28
Automated extraction of information in molecular biology. FEBS Lett (2000) 3.28
HEAT repeats in the Huntington's disease protein. Nat Genet (1995) 3.24
T cell antigen receptors and the immunoglobulin supergene family. Cell (1985) 3.23
Cell-cell interaction in prostate gene regulation and cytodifferentiation. Proc Natl Acad Sci U S A (1997) 3.22
Immunoglobulin heavy chain gene organization in mice: analysis of a myeloma genomic clone containing variable and alpha constant regions. Proc Natl Acad Sci U S A (1979) 3.19
The structure, rearrangement and expression of D beta gene segments of the murine T-cell antigen receptor. Nature (1984) 3.18
DNA sequence of a gene encoding a BALB/c mouse Ld transplantation antigen. Science (1982) 3.17
Non-orthologous gene displacement. Trends Genet (1996) 3.12
Characterization of a novel protein-binding module--the WW domain. FEBS Lett (1995) 3.11
Mechanism of antibody diversity: germ line basis for variability. Science (1970) 3.05
Protein annotation: detective work for function prediction. Trends Genet (1998) 3.04
Characterization of the mammalian YAP (Yes-associated protein) gene and its role in defining a novel protein module, the WW domain. J Biol Chem (1995) 3.01
Recent enhancements to the Blocks Database servers. Nucleic Acids Res (1997) 3.01
Predominant use of a V alpha gene segment in mouse T-cell receptors for cytochrome c. Nature (1987) 2.99
Evolution of domain families. Adv Protein Chem (2000) 2.98
Prediction of potential GPI-modification sites in proprotein sequences. J Mol Biol (1999) 2.95
Understanding the adaptation of Halobacterium species NRC-1 to its extreme environment through computational analysis of its genome sequence. Genome Res (2001) 2.92
Complete genomic sequence and analysis of 117 kb of human DNA containing the gene BRCA1. Genome Res (1996) 2.91
Automated DNA diagnostics using an ELISA-based oligonucleotide ligation assay. Proc Natl Acad Sci U S A (1990) 2.88
Genes of the major histocompatibility complex in mouse and man. Science (1983) 2.88
Yeast chromosome III: new gene functions. EMBO J (1994) 2.85
Rearrangement of genetic information may produce immunoglobulin diversity. Nature (1979) 2.84
SPRY domains in ryanodine receptors (Ca(2+)-release channels). Trends Biochem Sci (1997) 2.82
Re-annotating the Mycoplasma pneumoniae genome sequence: adding value, function and reading frames. Nucleic Acids Res (2000) 2.79
Mechanism of antibody synthesis: size differences between mouse kappa chains. Science (1967) 2.77
Zinc-dependent structure of a single-finger domain of yeast ADR1. Science (1988) 2.76
DNA sequence determination by hybridization: a strategy for efficient large-scale sequencing. Science (1993) 2.76
Gene context conservation of a higher order than operons. Trends Biochem Sci (2000) 2.66