Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements.

PubWeight™: 22.33 | Rank: Top 0.1% | All-Time Top 10000

Article: PMC 55814

Published in Nucleic Acids Res on July 15, 2001

Authors

A A Schäffer¹, L Aravind, T L Madden, S Shavirin, J L Spouge, Y I Wolf, E V Koonin, S F Altschul

Author Affiliations

1: National Center for Biotechnology Information, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA. schaffer@helix.nih.gov

Articles citing this

(truncated to the top 100)

MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res (2004) 168.89

BLAST+: architecture and applications. BMC Bioinformatics (2009) 36.53

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2005) 22.98

Database resources of the National Center for Biotechnology Information: 2002 update. Nucleic Acids Res (2002) 19.40

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2006) 18.85

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2006) 18.84

Database resources of the National Center for Biotechnology. Nucleic Acids Res (2003) 18.26

Accelerated Profile HMM Searches. PLoS Comput Biol (2011) 15.22

FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol Biol Evol (2009) 15.15

Human CtIP promotes DNA end resection. Nature (2007) 10.63

The obesity-associated FTO gene encodes a 2-oxoglutarate-dependent nucleic acid demethylase. Science (2007) 9.86

Database resources of the National Center for Biotechnology Information: update. Nucleic Acids Res (2004) 9.85

CDD: NCBI's conserved domain database. Nucleic Acids Res (2014) 8.25

Protein database searches using compositionally adjusted substitution matrices. FEBS J (2005) 8.14

The SUPERFAMILY database in 2004: additions and improvements. Nucleic Acids Res (2004) 6.72

Emergence of fatal PRRSV variants: unparalleled outbreaks of atypical PRRS in China and molecular dissection of the unique hallmark. PLoS One (2007) 5.60

Type VI secretion system translocates a phage tail spike-like protein into target cells where it cross-links actin. Proc Natl Acad Sci U S A (2007) 5.48

Ctp1 is a cell-cycle-regulated protein that functions with Mre11 complex to control double-strand break repair by homologous recombination. Mol Cell (2007) 5.31

A probabilistic model of local sequence alignment that simplifies statistical significance estimation. PLoS Comput Biol (2008) 5.12

A novel method for accurate operon predictions in all sequenced prokaryotes. Nucleic Acids Res (2005) 4.68

Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics (2010) 4.48

Prediction of novel families of enzymes involved in oxidative and other complex modifications of bases in nucleic acids. Cell Cycle (2009) 4.14

Comparative genomics and evolution of proteins involved in RNA metabolism. Nucleic Acids Res (2002) 4.00

A comparison of profile hidden Markov model procedures for remote homology detection. Nucleic Acids Res (2002) 3.76

Identification of 22 candidate structured RNAs in bacteria using the CMfinder comparative genomics pipeline. Nucleic Acids Res (2007) 3.59

RAG1 core and V(D)J recombination signal sequences were derived from Transib transposons. PLoS Biol (2005) 3.50

Discovery of the principal specific transcription factors of Apicomplexa and their implication for the evolution of the AP2-integrase DNA binding domains. Nucleic Acids Res (2005) 3.36

The complete genome and proteome of Mycoplasma mobile. Genome Res (2004) 3.36

Comprehensive comparative-genomic analysis of type 2 toxin-antitoxin systems and related mobile stress response systems in prokaryotes. Biol Direct (2009) 3.17

Evolutionary history, structural features and biochemical diversity of the NlpC/P60 superfamily of enzymes. Genome Biol (2003) 3.02

Composition-based statistics and translated nucleotide searches: improving the TBLASTN module of BLAST. BMC Biol (2006) 2.90

Domain enhanced lookup time accelerated BLAST. Biol Direct (2012) 2.87

BLAST: a more efficient report with usability improvements. Nucleic Acids Res (2013) 2.77

The universal distribution of evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent ages. Proc Natl Acad Sci U S A (2009) 2.50

Origin and evolution of the archaeo-eukaryotic primase superfamily and related palm-domain proteins: structural insights and new members. Nucleic Acids Res (2005) 2.49

GOing Bayesian: model-based gene set analysis of genome-scale data. Nucleic Acids Res (2010) 2.46

Comparative protein structure modeling by iterative alignment, model building and model assessment. Nucleic Acids Res (2003) 2.44

Selectively receptor-blind measles viruses: Identification of residues necessary for SLAM- or CD46-induced fusion and their localization on a new hemagglutinin structural model. J Virol (2004) 2.42

The compositional adjustment of amino acid substitution matrices. Proc Natl Acad Sci U S A (2003) 2.42

Genome-scale metabolic network analysis of the opportunistic pathogen Pseudomonas aeruginosa PAO1. J Bacteriol (2008) 2.25

Annotation in three dimensions. PINTS: Patterns in Non-homologous Tertiary Structures. Nucleic Acids Res (2003) 2.14

Prediction of functional sites by analysis of sequence and structure conservation. Protein Sci (2004) 2.11

An antisense RNA controls synthesis of an SOS-induced toxin evolved from an antitoxin. Mol Microbiol (2007) 2.11

Transposon mutagenesis of the mouse germline. Genetics (2003) 2.00

Homology modeling using parametric alignment ensemble generation with consensus and energy-based model selection. Nucleic Acids Res (2006) 1.96

Abundance of type I toxin-antitoxin systems in bacteria: searches for new candidates and discovery of novel families. Nucleic Acids Res (2010) 1.93

The life-cycle of operons. PLoS Genet (2006) 1.91

Brugia malayi excreted/secreted proteins at the host/parasite interface: stage- and gender-specific proteomic profiling. PLoS Negl Trop Dis (2009) 1.88

Finding weak similarities between proteins by sequence profile comparison. Nucleic Acids Res (2003) 1.88

Phylostratigraphic tracking of cancer genes suggests a link to the emergence of multicellularity in metazoa. BMC Biol (2010) 1.87

Predicting transmembrane beta-barrels in proteomes. Nucleic Acids Res (2004) 1.83

Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res (2013) 1.83

The JCVI standard operating procedure for annotating prokaryotic metagenomic shotgun sequencing data. Stand Genomic Sci (2010) 1.82

The prokaryotic antecedents of the ubiquitin-signaling system and the early evolution of ubiquitin-like beta-grasp domains. Genome Biol (2006) 1.82

Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches. Nucleic Acids Res (2006) 1.81

Improving gene annotation of complete viral genomes. Nucleic Acids Res (2003) 1.80

Evolutionary connection between the catalytic subunits of DNA-dependent RNA polymerases and eukaryotic RNA-dependent RNA polymerases and the origin of RNA polymerases. BMC Struct Biol (2003) 1.80

Planctomycetes and eukaryotes: a case of analogy not homology. Bioessays (2011) 1.76

PlanTAPDB, a phylogeny-based resource of plant transcription-associated proteins. Plant Physiol (2007) 1.74

Threshold Average Precision (TAP-k): a measure of retrieval designed for bioinformatics. Bioinformatics (2010) 1.73

ORFeus: Detection of distant homology using sequence profiles and predicted secondary structure. Nucleic Acids Res (2003) 1.71

Protection of telomeres by a conserved Stn1-Ten1 complex. Proc Natl Acad Sci U S A (2007) 1.71

Homologous over-extension: a challenge for iterative similarity searches. Nucleic Acids Res (2010) 1.70

Three monophyletic superfamilies account for the majority of the known glycosyltransferases. Protein Sci (2003) 1.68

PSI-BLAST pseudocounts and the minimum description length principle. Nucleic Acids Res (2008) 1.68

Protein ranking: from local to global structure in the protein similarity network. Proc Natl Acad Sci U S A (2004) 1.67

Plant-mPLoc: a top-down strategy to augment the power for predicting plant protein subcellular localization. PLoS One (2010) 1.67

Large-scale transcriptome analysis in chickpea (Cicer arietinum L.), an orphan legume crop of the semi-arid tropics of Asia and Africa. Plant Biotechnol J (2011) 1.66

A comprehensive benchmark study of multiple sequence alignment methods: current challenges and future perspectives. PLoS One (2011) 1.65

MyHits: a new interactive resource for protein annotation and domain identification. Nucleic Acids Res (2004) 1.64

Spatiotemporal controlled delivery of nanoparticles to injured vasculature. Proc Natl Acad Sci U S A (2010) 1.64

Operon formation is driven by co-regulation and not by horizontal gene transfer. Genome Res (2005) 1.64

Mechanism of the Class I KDPG aldolase. Bioorg Med Chem (2006) 1.62

The natural history of the WRKY-GCM1 zinc fingers and the relationship between transcription factors and transposons. Nucleic Acids Res (2006) 1.62

Novel domains and orthologues of eukaryotic transcription elongation factors. Nucleic Acids Res (2002) 1.60

iLoc-Euk: a multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins. PLoS One (2011) 1.59

The missing step of the L-galactose pathway of ascorbate biosynthesis in plants, an L-galactose guanyltransferase, increases leaf ascorbate content. Proc Natl Acad Sci U S A (2007) 1.59

Characterization of an anti-apoptotic glycoprotein encoded by Kaposi's sarcoma-associated herpesvirus which resembles a spliced variant of human survivin. EMBO J (2002) 1.57

The limits of protein sequence comparison? Curr Opin Struct Biol (2005) 1.57

Application of comparative genomics in the identification and analysis of novel families of membrane-associated receptors in bacteria. BMC Genomics (2003) 1.56

High-throughput computational and experimental techniques in structural genomics. Genome Res (2004) 1.55

Two novel type III-secreted proteins of Xanthomonas campestris pv. vesicatoria are encoded within the hrp pathogenicity island. J Bacteriol (2002) 1.53

Predation by Bdellovibrio bacteriovorus HD100 requires type IV pili. J Bacteriol (2007) 1.51

Reconstruction of ancestral protein sequences and its applications. BMC Evol Biol (2004) 1.48

Mapping metabolic and transcript temporal switches during germination in rice highlights specific transcription factors and the role of RNA instability in the germination process. Plant Physiol (2008) 1.45

Characterization of the archaeal thermophile Sulfolobus turreted icosahedral virus validates an evolutionary link among double-stranded DNA viruses from all domains of life. J Virol (2006) 1.44

A new method for predicting the subcellular localization of eukaryotic proteins with both single and multiple sites: Euk-mPLoc 2.0. PLoS One (2010) 1.44

iDNA-Prot: identification of DNA binding proteins using random forest with grey model. PLoS One (2011) 1.43

MemBrain: improving the accuracy of predicting transmembrane helices. PLoS One (2008) 1.42

Repression of the LEAFY COTYLEDON 1/B3 regulatory network in plant embryo development by VP1/ABSCISIC ACID INSENSITIVE 3-LIKE B3 genes. Plant Physiol (2006) 1.41

Helicobacter pylori versus the host: remodeling of the bacterial outer membrane is required for survival in the gastric mucosa. PLoS Pathog (2011) 1.40

Nebulon: a system for the inference of functional relationships of gene products from the rearrangement of predicted operons. Nucleic Acids Res (2005) 1.39

Insight into DNA and protein transport in double-stranded DNA viruses: the structure of bacteriophage N4. J Mol Biol (2008) 1.39

Two-component response regulators of Vibrio fischeri: identification, mutagenesis, and characterization. J Bacteriol (2007) 1.38

An insight into the sialome of Anopheles funestus reveals an emerging pattern in anopheline salivary protein families. Insect Biochem Mol Biol (2006) 1.37

Determination of the structures of symmetric protein oligomers from NMR chemical shifts and residual dipolar couplings. J Am Chem Soc (2011) 1.35

Loss of the anaphase-promoting complex in quiescent cells causes unscheduled hepatocyte proliferation. Genes Dev (2004) 1.34

Extensive domain shuffling in transcription regulators of DNA viruses and implications for the origin of fungal APSES transcription factors. Genome Biol (2002) 1.34

Amino acid variant in the kinase binding domain of dual-specific A kinase-anchoring protein 2: a disease susceptibility polymorphism. Proc Natl Acad Sci U S A (2003) 1.33

Small but versatile: the extraordinary functional and structural diversity of the beta-grasp fold. Biol Direct (2007) 1.31

Articles cited by this

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 665.31

Basic local alignment search tool. J Mol Biol (1990) 659.07

Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A (1988) 193.60

The Protein Data Bank. Nucleic Acids Res (2000) 187.10

Identification of common molecular subsequences. J Mol Biol (1981) 130.53

Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A (1992) 61.33

Optimal alignments in linear space. Comput Appl Biosci (1988) 38.10

Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins (1991) 32.50

Improved sensitivity of profile searches through the use of sequence weights and gap excision. Comput Appl Biosci (1994) 31.96

Measuring the accuracy of diagnostic systems. Science (1988) 27.46

Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc Natl Acad Sci U S A (1990) 24.42

Position-based sequence weights. J Mol Biol (1994) 24.41

Amino acid substitution matrices from an information theoretic perspective. J Mol Biol (1991) 23.38

An improved algorithm for matching biological sequences. J Mol Biol (1982) 21.95

Hidden Markov models for detecting remote protein homologies. Bioinformatics (1998) 21.29

Issues in searching molecular sequence databases. Nat Genet (1994) 19.28

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2001) 19.13

Local alignment statistics. Methods Enzymol (1996) 17.76

Weighting in sequence space: a comparison of methods in terms of generalized sequences. Proc Natl Acad Sci U S A (1993) 17.71

Maximum discrimination hidden Markov models of sequence consensus. J Comput Biol (1995) 15.75

Optimal sequence alignments. Proc Natl Acad Sci U S A (1983) 14.64

A weighting system and algorithm for aligning many phylogenetically related sequences. Comput Appl Biosci (1995) 13.48

Weights for data related by a tree. J Mol Biol (1989) 12.63

Optimal sequence alignment using affine gap costs. Bull Math Biol (1986) 12.15

Volume changes in protein evolution. J Mol Biol (1994) 12.07

The statistical distribution of nucleic acid similarities. Nucleic Acids Res (1985) 11.99

Weighting aligned protein or nucleic acid sequences to correct for unequal representation. J Mol Biol (1990) 11.50

The significance of protein sequence similarities. Comput Appl Biosci (1988) 11.26

Distribution of glutamine and asparagine residues and their near neighbors in peptides and proteins. Proc Natl Acad Sci U S A (1991) 10.60

A protein alignment scoring system sensitive at all evolutionary distances. J Mol Evol (1993) 10.53

Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J Mol Biol (1998) 9.09

Large-scale comparison of protein sequence alignment algorithms with structure alignments. Proteins (2000) 8.17

Iterated profile searches with PSI-BLAST--a tool for discovery in protein databases. Trends Biochem Sci (1998) 8.01

IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics (1999) 6.91

Comparison of the complete protein sets of worm and yeast: orthology and divergence. Science (1998) 6.28

Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci (2000) 6.23

Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches. J Mol Biol (1999) 5.90

Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching. Comput Chem (1996) 5.16

Comparison of methods for searching protein sequence databases. Protein Sci (1995) 4.29

Empirical statistical estimates for sequence similarity searches. J Mol Biol (1998) 4.14

The estimation of statistical parameters for local alignment score distributions. Nucleic Acids Res (2001) 3.61

Accurate formula for P-values of gapped local sequence and profile alignments. J Mol Biol (2000) 3.51

Generalized affine gap costs for protein sequence alignment. Proteins (1998) 3.42

Crystal structure of the BTB domain from PLZF. Proc Natl Acad Sci U S A (1998) 2.84

Rapid and accurate estimates of statistical significance for sequence data base searches. Proc Natl Acad Sci U S A (1994) 2.72

An evolutionary classification of the metallo-beta-lactamase fold proteins. In Silico Biol (1999) 2.58

Fold prediction and evolutionary analysis of the POZ domain: structural and evolutionary relationship with the potassium channel tetramerization domain. J Mol Biol (1999) 2.37

A novel family of predicted phosphoesterases includes Drosophila prune protein and bacterial RecJ exonuclease. Trends Biochem Sci (1998) 2.33

Benchmarking PSI-BLAST in genome annotation. J Mol Biol (1999) 1.87

Maximum-likelihood estimation of the statistical distribution of Smith-Waterman local sequence similarity scores. Bull Math Biol (1992) 1.67

Iterative sequence/secondary structure search for protein homologs: comparison with amino acid sequence alignments and application to fold recognition in genome databases. Bioinformatics (2000) 1.17

Articles by these authors

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 665.31

Initial sequencing and analysis of the human genome. Nature (2001) 212.86

A genomic perspective on protein families. Science (1997) 50.51

The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res (2000) 49.22

The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res (2001) 43.17

Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science (1993) 36.84

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2000) 34.79

Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science (2009) 32.97

BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences. FEMS Microbiol Lett (1999) 25.40

Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc Natl Acad Sci U S A (1990) 24.42

Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res (1998) 23.87

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2001) 19.13

Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks. Proc Natl Acad Sci U S A (1994) 18.46

A tool for multiple sequence alignment. Proc Natl Acad Sci U S A (1989) 17.09

A workbench for multiple alignment construction and analysis. Proteins (1991) 16.96

A superfamily of conserved domains in DNA damage-responsive cell cycle checkpoint proteins. FASEB J (1997) 15.10

Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. Science (1998) 13.64

De-ubiquitination and ubiquitin ligase domains of A20 downregulate NF-kappaB signalling. Nature (2004) 12.41

Optimal sequence alignment using affine gap costs. Bull Math Biol (1986) 12.15

Applications and statistics for multiple high-scoring segments in molecular sequences. Proc Natl Acad Sci U S A (1993) 12.10

BRCA1 protein products ... Functional motifs... Nat Genet (1996) 11.50

AAA+: A class of chaperone-like ATPases associated with the assembly, operation, and disassembly of protein complexes. Genome Res (1999) 11.30

Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science (2000) 10.82

SAGEmap: a public gene expression resource. Genome Res (2000) 10.41

Two related superfamilies of putative helicases involved in replication, recombination, repair and expression of DNA and RNA genomes. Nucleic Acids Res (1989) 10.03

Locally optimal subalignments using nonlinear similarity functions. Bull Math Biol (1986) 9.10

Impaired hydroxylation of 5-methylcytosine in myeloid cancers with mutant TET2. Nature (2010) 8.87

A public database for gene expression in human cancers. Cancer Res (1999) 8.81

Novel domains of the prokaryotic two-component signal transduction systems. FEMS Microbiol Lett (2001) 8.45

Horizontal gene transfer in prokaryotes: quantification and classification. Annu Rev Microbiol (2001) 8.14

Iterated profile searches with PSI-BLAST--a tool for discovery in protein databases. Trends Biochem Sci (1998) 8.01

Genome sequence of the radioresistant bacterium Deinococcus radiodurans R1. Science (1999) 8.01

Microbial culturomics: paradigm shift in the human gut microbiome study. Clin Microbiol Infect (2012) 7.97

Role of Rpn11 metalloprotease in deubiquitination and degradation by the 26S proteasome. Science (2002) 7.49

IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics (1999) 6.91

Classification and evolution of P-loop GTPases and related ATPases. J Mol Biol (2002) 6.85

Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs). Genome Biol (2000) 6.50

Genome sequence and comparative analysis of the solvent-producing bacterium Clostridium acetobutylicum. J Bacteriol (2001) 6.46

A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc Natl Acad Sci U S A (1996) 6.38

Comparison of the complete protein sets of worm and yeast: orthology and divergence. Science (1998) 6.28

Sources of systematic error in functional annotation of genomes: domain rearrangement, non-orthologous gene displacement and operon disruption. In Silico Biol (1998) 6.22

Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches. J Mol Biol (1999) 5.90

Using the COG database to improve gene recognition in complete genomes. Genetica (2000) 5.80

Cleavage of cohesin by the CD clan protease separin triggers anaphase in yeast. Cell (2000) 5.69

Complete genome sequence of the apicomplexan, Cryptosporidium parvum. Science (2004) 5.64

Factors underlying spontaneous inactivation and susceptibility to neutralization of human immunodeficiency virus. Virology (1992) 5.64

Metabolism and evolution of Haemophilus influenzae deduced from a whole-genome comparison with Escherichia coli. Curr Biol (1996) 5.50

Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context. Genome Res (2001) 5.37

Role of predicted metalloprotease motif of Jab1/Csn5 in cleavage of Nedd8 from Cul1. Science (2002) 4.95

Prediction of immunodominant helper T cell antigenic sites from the primary sequence. J Immunol (1987) 4.95

Conserved domains in DNA repair proteins and evolution of repair systems. Nucleic Acids Res (1999) 4.80

Genome of the extremely radiation-resistant bacterium Deinococcus radiodurans viewed from the perspective of comparative genomics. Microbiol Mol Biol Rev (2001) 4.75

Evolutionary history and higher order classification of AAA+ ATPases. J Struct Biol (2004) 4.68

Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol Biol (2001) 4.59

Evolutionary genomics of nucleo-cytoplasmic large DNA viruses. Virus Res (2006) 4.49

Structure and evolution of transcriptional regulatory networks. Curr Opin Struct Biol (2004) 4.48

Positionally cloned human disease genes: patterns of evolutionary conservation and functional motifs. Proc Natl Acad Sci U S A (1997) 4.40

The DNA-repair protein AlkB, EGL-9, and leprecan define new families of 2-oxoglutarate- and iron-dependent dioxygenases. Genome Biol (2001) 4.39

Conserved sequence motifs in the initiator proteins for rolling circle DNA replication encoded by diverse replicons from eubacteria, eucaryotes and archaebacteria. Nucleic Acids Res (1992) 4.39

A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res (2002) 4.39

Cysteine proteases of positive strand RNA viruses and chymotrypsin-like serine proteases. A distinct protein superfamily with a common structural fold. FEBS Lett (1989) 4.29

Common origin of four diverse families of large eukaryotic DNA viruses. J Virol (2001) 4.28

The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res (2002) 4.28

N-terminal domains of putative helicases of flavi- and pestiviruses may be serine proteases. Nucleic Acids Res (1989) 4.27

Identification of paracaspases and metacaspases: two ancient families of caspase-like proteins, one of which plays a key role in MALT lymphoma. Mol Cell (2000) 4.19

Prediction of novel families of enzymes involved in oxidative and other complex modifications of bases in nucleic acids. Cell Cycle (2009) 4.14

The HD domain defines a new superfamily of metal-dependent phosphohydrolases. Trends Biochem Sci (1998) 4.11

PowerBLAST: a new network BLAST application for interactive or automated sequence analysis and annotation. Genome Res (1997) 4.10

SAP - a putative DNA-binding motif involved in chromosomal organization. Trends Biochem Sci (2000) 4.10

Evidence for massive gene exchange between archaeal and bacterial hyperthermophiles. Trends Genet (1998) 4.10

The complete sequence (22 kilobases) of murine coronavirus gene 1 encoding the putative proteases and RNA polymerase. Virology (1991) 4.07

Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins. J Mol Biol (1987) 4.07

Who's your neighbor? New computational approaches for functional genomics. Nat Biotechnol (2000) 4.04

Comparative genomics and evolution of proteins involved in RNA metabolism. Nucleic Acids Res (2002) 4.00

Construction and analysis of bacterial artificial chromosome libraries from a marine microbial assemblage. Environ Microbiol (2000) 3.99

Viral proteins containing the purine NTP-binding sequence pattern. Nucleic Acids Res (1989) 3.96