Protein sequence similarity searches using patterns as seeds.

PubWeight™: 23.87 | Rank: Top 0.01% | All-Time Top 10000

View Article (PMC 147803)

Published in Nucleic Acids Res on September 01, 1998

Authors

Z Zhang (1), A A Schäffer, W Miller, T L Madden, D J Lipman, E V Koonin, S F Altschul

Author Affiliations

1: Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, USA.

Articles citing this

GenBank. Nucleic Acids Res (2000) 36.75

BLAST+: architecture and applications. BMC Bioinformatics (2009) 36.53

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2000) 34.79

GenBank. Nucleic Acids Res (2007) 25.54

GenBank. Nucleic Acids Res (2005) 19.25

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2001) 19.13

GenBank. Nucleic Acids Res (2002) 17.24

GenBank. Nucleic Acids Res (2007) 16.92

BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res (2004) 15.43

GenBank. Nucleic Acids Res (2008) 13.29

GenBank: update. Nucleic Acids Res (2004) 12.28

GenBank. Nucleic Acids Res (2006) 12.21

GenBank. Nucleic Acids Res (2009) 11.11

GenBank. Nucleic Acids Res (2012) 10.89

GenBank. Nucleic Acids Res (2003) 9.60

GenBank. Nucleic Acids Res (2011) 8.85

GenBank. Nucleic Acids Res (2010) 8.63

GenBank. Nucleic Acids Res (2014) 3.71

SURVEY AND SUMMARY: Holliday junction resolvases and related nucleases: identification of new families, phyletic distribution and evolutionary trajectories. Nucleic Acids Res (2000) 3.56

GenBank. Nucleic Acids Res (2013) 3.47

Mutations of the protocadherin gene PCDH15 cause Usher syndrome type 1F. Am J Hum Genet (2001) 3.42

The Arabidopsis SLEEPY1 gene encodes a putative F-box subunit of an SCF E3 ubiquitin ligase. Plant Cell (2003) 3.09

ELM: the status of the 2010 eukaryotic linear motif resource. Nucleic Acids Res (2009) 2.59

Molecular determinants for targeting heterochromatin protein 1-mediated gene silencing: direct chromoshadow domain-KAP-1 corepressor interaction is essential. Mol Cell Biol (2000) 2.35

Genome of bacteriophage P1. J Bacteriol (2004) 2.15

toxB gene on pO157 of enterohemorrhagic Escherichia coli O157:H7 is required for full epithelial cell adherence phenotype. Infect Immun (2001) 2.04

Genome-wide analysis of ethylene-responsive element binding factor-associated amphiphilic repression motif-containing transcriptional regulators in Arabidopsis. Plant Physiol (2010) 1.97

Evolutionary connection between the catalytic subunits of DNA-dependent RNA polymerases and eukaryotic RNA-dependent RNA polymerases and the origin of RNA polymerases. BMC Struct Biol (2003) 1.80

GenBank. Nucleic Acids Res (2015) 1.67

Interaction between PAK and Nck: a template for Nck targets and role of PAK autophosphorylation. Mol Cell Biol (2000) 1.66

Autoproteolysis in nucleoporin biogenesis. Proc Natl Acad Sci U S A (1999) 1.53

Nucleotide binding by the widespread high-affinity cyclic di-GMP receptor MshEN domain. Nat Commun (2016) 1.49

PASS2: an automated database of protein alignments organised as structural superfamilies. BMC Bioinformatics (2004) 1.37

A conserved Lsm-interaction motif in Prp24 required for efficient U4/U6 di-snRNP formation. RNA (2002) 1.34

Identification of three critical acidic residues of poly(ADP-ribose) glycohydrolase involved in catalysis: determining the PARG catalytic domain. Biochem J (2005) 1.28

Heterogeneous but conserved natural killer receptor gene complexes in four major orders of mammals. Proc Natl Acad Sci U S A (2006) 1.21

A highly conserved domain of the maize activator transposase is involved in dimerization. Plant Cell (2000) 1.20

The Fusarium verticillioides FUM gene cluster encodes a Zn(II)2Cys6 protein that affects FUM gene expression and fumonisin production. Eukaryot Cell (2007) 1.11

ElaD, a Deubiquitinating protease expressed by E. coli. PLoS One (2007) 1.11

Evolution and phylogeny of insect endogenous retroviruses. BMC Evol Biol (2001) 1.07

SCANMOT: searching for similar sequences using a simultaneous scan of multiple sequence motifs. Nucleic Acids Res (2005) 1.03

P48 major surface antigen of Mycoplasma agalactiae is homologous to a malp product of Mycoplasma fermentans and belongs to a selected family of bacterial lipoproteins. Infect Immun (1999) 1.02

Activation and inhibition of the receptor histidine kinase AgrC occurs through opposite helical transduction motions. Mol Cell (2014) 1.01

Tardigrade workbench: comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades. BMC Genomics (2009) 0.98

Trichomonas vaginalis vast BspA-like gene family: evidence for functional diversity from structural organisation and transcriptomics. BMC Genomics (2010) 0.97

Family classification without domain chaining. Bioinformatics (2009) 0.96

Genomic organization and evolutionary analysis of Ly49 genes encoding the rodent natural killer cell receptors: rapid evolution by repeated gene duplication. Immunogenetics (2004) 0.96

Small-molecule ligand docking into comparative models with Rosetta. Nat Protoc (2013) 0.96

CAMPR3: a database on sequences, structures and signatures of antimicrobial peptides. Nucleic Acids Res (2015) 0.91

Evolutionary diversification of plant shikimate kinase gene duplicates. PLoS Genet (2008) 0.90

Gbeta gamma-independent constitutive association of Galpha s with SHP-1 and angiotensin II receptor AT2 is essential in AT2-mediated ITIM-independent activation of SHP-1. Proc Natl Acad Sci U S A (2002) 0.89

Temporal regulation of gene expression of the Thermus thermophilus bacteriophage P23-45. J Mol Biol (2010) 0.88

Virome genomics: a tool for defining the human virome. Curr Opin Microbiol (2013) 0.86

DSP: a protein shape string and its profile prediction server. Nucleic Acids Res (2012) 0.86

PSimScan: algorithm and utility for fast protein similarity search. PLoS One (2013) 0.86

Variation in mitochondrial minichromosome composition between blood-sucking lice of the genus Haematopinus that infest horses and pigs. Parasit Vectors (2014) 0.86

A novel CHHC Zn-finger domain found in spliceosomal proteins and tRNA modifying enzymes. Bioinformatics (2008) 0.86

GenDiS: Genomic Distribution of protein structural domain Superfamilies. Nucleic Acids Res (2005) 0.85

Comparative homology agreement search: an effective combination of homology-search methods. Proc Natl Acad Sci U S A (2004) 0.85

GenBank. Nucleic Acids Res (2016) 0.84

Homodimerization of RBPMS2 through a new RRM-interaction motif is necessary to control smooth muscle plasticity. Nucleic Acids Res (2014) 0.83

Dimerization of the bacterial RsrI N6-adenine DNA methyltransferase. Nucleic Acids Res (2006) 0.83

A geometric interpretation for local alignment-free sequence comparison. J Comput Biol (2013) 0.83

Three-dimensional structure of the catalytic domain of the yeast beta-(1,3)-glucan transferase Gas1: a molecular modeling investigation. J Mol Model (2005) 0.82

The low incidence of diversity-generating retroelements in sequenced genomes. Mob Genet Elements (2012) 0.82

Phosphoglycerate mutases function as reverse regulated isoenzymes in Synechococcus elongatus PCC 7942. PLoS One (2013) 0.81

Estimation of protein function using template-based alignment of enzyme active sites. BMC Bioinformatics (2014) 0.81

Retrieving backbone string neighbors provides insights into structural modeling of membrane proteins. Mol Cell Proteomics (2012) 0.81

Expanding the Halohydrin Dehalogenase Enzyme Family: Identification of Novel Enzymes by Database Mining. Appl Environ Microbiol (2014) 0.81

PepPat, a pattern-based oligopeptide homology search method and the identification of a novel tachykinin-like peptide. Mamm Genome (2003) 0.80

Function and X-ray crystal structure of Escherichia coli YfdE. PLoS One (2013) 0.79

Comparison of Current BLAST Software on Nucleotide Sequences. IPDPS (2005) 0.78

iMOTdb--a comprehensive collection of spatially interacting motifs in proteins. Nucleic Acids Res (2006) 0.78

Two Tetrahymena G-DNA-binding proteins, TGP1 and TGP3, share novel motifs and may play a role in micronuclear division. Nucleic Acids Res (2000) 0.76

A meta-learning approach for B-cell conformational epitope prediction. BMC Bioinformatics (2014) 0.76

Ancient phylogenetic beginnings of immunoglobulin hypermutation. J Mol Evol (2006) 0.75

Improved performance of sequence search algorithms in remote homology detection. F1000Res (2013) 0.75

Tetramerization and interdomain flexibility of the replication initiation controller YabA enables simultaneous binding to multiple partners. Nucleic Acids Res (2015) 0.75

Identification of a new class of adenosine deaminase from Helicobacter pylori with homologs among diverse taxa. J Bacteriol (2013) 0.75

Functional Annotation of a Presumed Nitronate Monoxygenase Reveals a New Class of NADH:Quinone Reductases. J Biol Chem (2016) 0.75

Altered renin-angiotensin system gene expression causes renal hypoplasia in the rats with nitrofen-induced diaphragmatic hernia. Pediatr Surg Int (2006) 0.75

Articles cited by this

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 665.31

Basic local alignment search tool. J Mol Biol (1990) 659.07

Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A (1988) 193.60

A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol (1970) 155.96

Identification of common molecular subsequences. J Mol Biol (1981) 130.53

Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A (1992) 61.33

Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science (1996) 41.35

Optimal alignments in linear space. Comput Appl Biosci (1988) 38.10

Cytochrome c and dATP-dependent formation of Apaf-1/caspase-9 complex initiates an apoptotic protease cascade. Cell (1997) 24.57

An improved algorithm for matching biological sequences. J Mol Biol (1982) 21.95

2.2 Mb of contiguous nucleotide sequence from chromosome III of C. elegans. Nature (1994) 21.30

Issues in searching molecular sequence databases. Nat Genet (1994) 19.28

Local alignment statistics. Methods Enzymol (1996) 17.76

Optimal sequence alignments. Proc Natl Acad Sci U S A (1983) 14.64

The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature (1997) 14.38

Automatic generation of primary sequence patterns from sets of related protein sequences. Proc Natl Acad Sci U S A (1990) 12.96

Apaf-1, a human protein homologous to C. elegans CED-4, participates in cytochrome c-dependent activation of caspase-3. Cell (1997) 12.83

Optimal sequence alignment using affine gap costs. Bull Math Biol (1986) 12.15

The statistical distribution of nucleic acid similarities. Nucleic Acids Res (1985) 11.99

The significance of protein sequence similarities. Comput Appl Biosci (1988) 11.26

Distribution of glutamine and asparagine residues and their near neighbors in peptides and proteins. Proc Natl Acad Sci U S A (1991) 10.60

GenBank. Nucleic Acids Res (1998) 9.36

Matching sequences under deletion-insertion constraints. Proc Natl Acad Sci U S A (1972) 9.05

Comparison of archaeal and bacterial genomes: computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea. Mol Microbiol (1997) 8.69

The PROSITE database, its status in 1997. Nucleic Acids Res (1997) 8.12

An atypical topoisomerase II from Archaea with implications for meiotic recombination. Nature (1997) 7.06

Positionally cloned human disease genes: patterns of evolutionary conservation and functional motifs. Proc Natl Acad Sci U S A (1997) 4.40

Empirical statistical estimates for sequence similarity searches. J Mol Biol (1998) 4.14

Methods for calculating the probabilities of finding patterns in sequences. Comput Appl Biosci (1989) 3.64

Generalized affine gap costs for protein sequence alignment. Proteins (1998) 3.42

Cloning and nucleotide base sequence analysis of a spectinomycin adenyltransferase AAD(9) determinant from Enterococcus faecalis. Antimicrob Agents Chemother (1991) 3.13

Role of CED-4 in the activation of CED-3. Nature (1997) 2.76

Prediction of the coding sequences of unidentified human genes. IV. The coding sequences of 40 new genes (KIAA0121-KIAA0160) deduced by analysis of cDNA clones from human cell line KG-1. DNA Res (1995) 2.61

CCA-adding enzymes and poly(A) polymerases are all members of the same nucleotidyltransferase superfamily: characterization of the CCA-adding enzyme from the archaeal hyperthermophile Sulfolobus shibatae. RNA (1996) 2.59

Searching for patterns in protein and nucleic acid sequences. Methods Enzymol (1990) 2.39

Alignments without low-scoring regions. J Comput Biol (1998) 1.96

Caenorhabditis elegans CED-4 stimulates CED-3 processing and CED-3-induced apoptosis. Curr Biol (1997) 1.95

A system for pattern matching applications on biosequences. Comput Appl Biosci (1993) 1.79

Identification of the primase active site of the herpes simplex virus type 1 helicase-primase. J Biol Chem (1995) 1.71

A promoter associated with the neisserial repeat can be used to transcribe the uvrB gene from Neisseria gonorrhoeae. J Bacteriol (1995) 1.68

Approximate matching of regular expressions. Bull Math Biol (1989) 1.68

A simple tool to search for sequence motifs that are conserved in BLAST outputs. Comput Appl Biosci (1994) 1.51

Construction and analysis of a profile library characterizing groups of structurally known proteins. Protein Sci (1996) 1.47

Nonconserved segment of the MutL protein from Escherichia coli K-12 and Salmonella typhimurium. Nucleic Acids Res (1992) 1.30

Articles by these authors

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 665.31

Basic local alignment search tool. J Mol Biol (1990) 659.07

Initial sequencing and analysis of the human genome. Nature (2001) 212.86

Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A (1988) 193.60

Rapid and sensitive protein similarity searches. Science (1985) 76.83

Rapid similarity searches of nucleic acid and protein data banks. Proc Natl Acad Sci U S A (1983) 53.12

A genomic perspective on protein families. Science (1997) 50.51

The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res (2000) 49.22

A greedy algorithm for aligning DNA sequences. J Comput Biol (2000) 47.89

The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res (2001) 43.17

Optimal alignments in linear space. Comput Appl Biosci (1988) 38.10

Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science (1993) 36.84

GenBank. Nucleic Acids Res (2000) 36.75

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2000) 34.79

BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences. FEMS Microbiol Lett (1999) 25.40

Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc Natl Acad Sci U S A (1990) 24.42

The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res (2007) 23.13

A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res (1998) 22.69

Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res (2001) 22.33

GenBank. Nucleic Acids Res (1999) 21.47

Aligning two sequences within a specified diagonal band. Comput Appl Biosci (1992) 19.31

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2001) 19.13

Faster sequential genetic linkage computations. Am J Hum Genet (1993) 18.83

Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks. Proc Natl Acad Sci U S A (1994) 18.46

On the statistical significance of nucleic acid similarities. Nucleic Acids Res (1984) 18.21

PipMaker--a web server for aligning two genomic DNA sequences. Genome Res (2000) 17.46

A tool for multiple sequence alignment. Proc Natl Acad Sci U S A (1989) 17.09

A workbench for multiple alignment construction and analysis. Proteins (1991) 16.96

Complete genome sequence of Salmonella enterica serovar Typhimurium LT2. Nature (2001) 16.89

A superfamily of conserved domains in DNA damage-responsive cell cycle checkpoint proteins. FASEB J (1997) 15.10

Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. Science (1998) 13.64

Comparative analyses of multi-species sequences from targeted genomic regions. Nature (2003) 13.31

Weights for data related by a tree. J Mol Biol (1989) 12.63

Optimal sequence alignment using affine gap costs. Bull Math Biol (1986) 12.15

Applications and statistics for multiple high-scoring segments in molecular sequences. Proc Natl Acad Sci U S A (1993) 12.10

GenBank. Nucleic Acids Res (1997) 11.73

BRCA1 protein products ... Functional motifs... Nat Genet (1996) 11.50

AAA+: A class of chaperone-like ATPases associated with the assembly, operation, and disassembly of protein complexes. Genome Res (1999) 11.30

Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science (2000) 10.82

SAGEmap: a public gene expression resource. Genome Res (2000) 10.41

Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science (2000) 10.14

Two related superfamilies of putative helicases involved in replication, recombination, repair and expression of DNA and RNA genomes. Nucleic Acids Res (1989) 10.03

GenBank. Nucleic Acids Res (1998) 9.36

Locally optimal subalignments using nonlinear similarity functions. Bull Math Biol (1986) 9.10

GenBank. Nucleic Acids Res (1993) 9.06

A public database for gene expression in human cancers. Cancer Res (1999) 8.81

Long human-mouse sequence alignments reveal novel regulatory elements: a reason to sequence the mouse genome. Genome Res (1997) 8.49

Novel domains of the prokaryotic two-component signal transduction systems. FEMS Microbiol Lett (2001) 8.45

Scoring pairwise genomic sequence alignments. Pac Symp Biocomput (2002) 8.42

Iterated profile searches with PSI-BLAST--a tool for discovery in protein databases. Trends Biochem Sci (1998) 8.01

Microbial culturomics: paradigm shift in the human gut microbiome study. Clin Microbiol Infect (2012) 7.97

Comparison of DNA sequences with protein sequences. Genomics (1997) 7.76

GenBank. Nucleic Acids Res (1996) 7.06

IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics (1999) 6.91

Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs). Genome Biol (2000) 6.50

Genome sequence and comparative analysis of the solvent-producing bacterium Clostridium acetobutylicum. J Bacteriol (2001) 6.46

A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc Natl Acad Sci U S A (1996) 6.38

Mapping sequenced E.coli genes by computer: software, strategies and examples. Nucleic Acids Res (1991) 6.29

Comparison of the complete protein sets of worm and yeast: orthology and divergence. Science (1998) 6.28

Sources of systematic error in functional annotation of genomes: domain rearrangement, non-orthologous gene displacement and operon disruption. In Silico Biol (1998) 6.22

Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches. J Mol Biol (1999) 5.90

Using the COG database to improve gene recognition in complete genomes. Genetica (2000) 5.80

Cleavage of cohesin by the CD clan protease separin triggers anaphase in yeast. Cell (2000) 5.69

GenBank. Nucleic Acids Res (1994) 5.63

Protein database searches for multiple alignments. Proc Natl Acad Sci U S A (1990) 5.52

Metabolism and evolution of Haemophilus influenzae deduced from a whole-genome comparison with Escherichia coli. Curr Biol (1996) 5.50

Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context. Genome Res (2001) 5.37

Large-scale comparative sequence analysis of the human and murine Bruton's tyrosine kinase loci reveals conserved regulatory domains. Genome Res (1997) 4.82

Conserved domains in DNA repair proteins and evolution of repair systems. Nucleic Acids Res (1999) 4.80

Genome of the extremely radiation-resistant bacterium Deinococcus radiodurans viewed from the perspective of comparative genomics. Microbiol Mol Biol Rev (2001) 4.75

Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol Biol (2001) 4.59

Positionally cloned human disease genes: patterns of evolutionary conservation and functional motifs. Proc Natl Acad Sci U S A (1997) 4.40

The DNA-repair protein AlkB, EGL-9, and leprecan define new families of 2-oxoglutarate- and iron-dependent dioxygenases. Genome Biol (2001) 4.39

Conserved sequence motifs in the initiator proteins for rolling circle DNA replication encoded by diverse replicons from eubacteria, eucaryotes and archaebacteria. Nucleic Acids Res (1992) 4.39

Cysteine proteases of positive strand RNA viruses and chymotrypsin-like serine proteases. A distinct protein superfamily with a common structural fold. FEBS Lett (1989) 4.29

Common origin of four diverse families of large eukaryotic DNA viruses. J Virol (2001) 4.28

N-terminal domains of putative helicases of flavi- and pestiviruses may be serine proteases. Nucleic Acids Res (1989) 4.27

Identification of paracaspases and metacaspases: two ancient families of caspase-like proteins, one of which plays a key role in MALT lymphoma. Mol Cell (2000) 4.19

The HD domain defines a new superfamily of metal-dependent phosphohydrolases. Trends Biochem Sci (1998) 4.11

PowerBLAST: a new network BLAST application for interactive or automated sequence analysis and annotation. Genome Res (1997) 4.10

SAP - a putative DNA-binding motif involved in chromosomal organization. Trends Biochem Sci (2000) 4.10

Evidence for massive gene exchange between archaeal and bacterial hyperthermophiles. Trends Genet (1998) 4.10

Alignment of Escherichia coli K12 DNA sequences to a genomic restriction map. Nucleic Acids Res (1990) 4.07

The complete sequence (22 kilobases) of murine coronavirus gene 1 encoding the putative proteases and RNA polymerase. Virology (1991) 4.07

A controlled trial of a formalin-inactivated hepatitis A vaccine in healthy children. N Engl J Med (1992) 4.05

Who's your neighbor? New computational approaches for functional genomics. Nat Biotechnol (2000) 4.04

Construction and analysis of bacterial artificial chromosome libraries from a marine microbial assemblage. Environ Microbiol (2000) 3.99

Mutations in TNFRSF13B encoding TACI are associated with common variable immunodeficiency in humans. Nat Genet (2005) 3.97