The limits of protein sequence comparison?

PubWeight™: 1.57 | Rank: Top 4%

View Article (PMC 2845305)

Published in Curr Opin Struct Biol on June 01, 2005

Authors

William R Pearson (1), Michael L Sierk

Author Affiliations

1: Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA. wrp@virginia.edu

Articles citing this

A probabilistic model of local sequence alignment that simplifies statistical significance estimation. PLoS Comput Biol (2008) 5.12

Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res (2013) 1.83

Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches. Nucleic Acids Res (2006) 1.81

SCaFoS: a tool for selection, concatenation and fusion of sequences for phylogenomics. BMC Evol Biol (2007) 1.76

Threshold Average Precision (TAP-k): a measure of retrieval designed for bioinformatics. Bioinformatics (2010) 1.73

Homologous over-extension: a challenge for iterative similarity searches. Nucleic Acids Res (2010) 1.70

Accurate statistical model of comparison between multiple sequence alignments. Nucleic Acids Res (2008) 1.28

Novel protein folds and their nonsequential structural analogs. Protein Sci (2008) 1.19

The structure of an archaeal pilus. J Mol Biol (2008) 1.06

A comprehensive system for evaluation of remote sequence similarity detection. BMC Bioinformatics (2007) 1.05

Evolutionarily conserved substrate substructures for automated annotation of enzyme superfamilies. PLoS Comput Biol (2008) 0.98

Viral proteins originated de novo by overprinting can be identified by codon usage: application to the "gene nursery" of Deltaretroviruses. PLoS Comput Biol (2013) 0.96

Powerful sequence similarity search methods and in-depth manual analyses can identify remote homologs in many apparently "orphan" viral proteins. J Virol (2013) 0.93

Adjusting scoring matrices to correct overextended alignments. Bioinformatics (2013) 0.91

A combinatorial approach to detect coevolved amino acid networks in protein families of variable divergence. PLoS Comput Biol (2009) 0.88

Testing statistical significance scores of sequence comparison methods with structure similarity. BMC Bioinformatics (2006) 0.85

Globally, unrelated protein sequences appear random. Bioinformatics (2009) 0.82

Functional tissue units and their primary tissue motifs in multi-scale physiology. J Biomed Semantics (2013) 0.81

Biophysical constraints on the evolution of tissue structure and function. J Physiol (2014) 0.80

CLAP: a web-server for automatic classification of proteins with special reference to multi-domain proteins. BMC Bioinformatics (2014) 0.78

Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification. J Virol (2017) 0.76

Comparative Transcriptomic Approaches Exploring Contamination Stress Tolerance in Salix sp. Reveal the Importance for a Metaorganismal de Novo Assembly Approach for Nonmodel Plants. Plant Physiol (2016) 0.75

ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes. Bioinformatics (2010) 0.75

Molecular Phylogenetics and the Perennial Problem of Homology. J Mol Evol (2016) 0.75

A comparison of different functions for predicted protein model quality assessment. J Comput Aided Mol Des (2016) 0.75

Articles cited by this

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 665.31

Basic local alignment search tool. J Mol Biol (1990) 659.07

Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A (1988) 193.60

Identification of common molecular subsequences. J Mol Biol (1981) 130.53

Rapid and sensitive protein similarity searches. Science (1985) 76.83

The Pfam protein families database. Nucleic Acids Res (2004) 56.46

Profile hidden Markov models. Bioinformatics (1998) 56.04

Rapid similarity searches of nucleic acid and protein data banks. Proc Natl Acad Sci U S A (1983) 53.12

Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol (1994) 31.57

CATH--a hierarchic classification of protein domain structures. Structure (1997) 29.95

Profile analysis: detection of distantly related proteins. Proc Natl Acad Sci U S A (1987) 29.26

Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng (1998) 28.09

Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc Natl Acad Sci U S A (1990) 24.42

Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res (2001) 22.33

Surprising similarities in structure comparison. Curr Opin Struct Biol (1996) 22.27

GenBank. Nucleic Acids Res (2005) 19.25

Mapping the protein universe. Science (1996) 13.72

Embedding strategies for effective use of information from multiple sequence alignments. Protein Sci (1997) 11.25

Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. Genomics (1991) 9.65

An improved method of testing for evolutionary homology. J Mol Biol (1966) 9.42

Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J Mol Biol (1998) 9.09

Simian sarcoma virus onc gene, v-sis, is derived from the gene (or genes) encoding a platelet-derived growth factor. Science (1983) 9.06

COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance. J Mol Biol (2003) 8.35

Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc Natl Acad Sci U S A (1998) 7.18

IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics (1999) 6.91

The SUPERFAMILY database in 2004: additions and improvements. Nucleic Acids Res (2004) 6.72

A comparison of scoring functions for protein sequence profile alignment. Bioinformatics (2004) 6.44

Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci (2000) 6.23

COACH: profile-profile alignment of protein families using hidden Markov models. Bioinformatics (2004) 6.22

Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches. J Mol Biol (1999) 5.90

Searching databases of conserved sequence regions by aligning protein multiple-alignments. Nucleic Acids Res (1996) 5.68

Within the twilight zone: a sensitive profile-profile comparison tool based on information theory. J Mol Biol (2002) 4.99

Hidden Markov models for sequence analysis: extension and analysis of the basic method. Comput Appl Biosci (1996) 4.58

Comparison of methods for searching protein sequence databases. Protein Sci (1995) 4.29

Empirical statistical estimates for sequence similarity searches. J Mol Biol (1998) 4.14

Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures. J Mol Biol (2005) 4.02

Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors. Bioinformatics (2000) 3.80

A comparison of profile hidden Markov model procedures for remote homology detection. Nucleic Acids Res (2002) 3.76

Sensitivity and selectivity in protein structure comparison. Protein Sci (2004) 3.69

A unified statistical framework for sequence comparison and structure comparison. Proc Natl Acad Sci U S A (1998) 3.68

Using video-oriented instructions to speed up sequence comparison. Comput Appl Biosci (1997) 2.96

Profile-profile methods provide improved fold-recognition: a study of different profile-profile alignment methods. Proteins (2004) 2.52

SCOP: a structural classification of proteins database. Nucleic Acids Res (1997) 2.44

Protein sequence databases. Curr Opin Chem Biol (2004) 2.43

The compositional adjustment of amino acid substitution matrices. Proc Natl Acad Sci U S A (2003) 2.42

Assessment of homology-based predictions in CASP5. Proteins (2003) 2.29

On the evolution of protein folds: are similar motifs in different protein folds the result of convergence, insertion, or relics of an ancient peptide world? J Struct Biol (2001) 1.90

Scoring profile-to-profile sequence alignments. Protein Sci (2004) 1.71

Maximum-likelihood estimation of the statistical distribution of Smith-Waterman local sequence similarity scores. Bull Math Biol (1992) 1.67

Profile-profile comparisons by COMPASS predict intricate homologies between protein families. Protein Sci (2003) 1.26

Using evolutionary information for the query and target improves fold recognition. Proteins (2004) 1.07

Comparative modeling in CASP5: progress is evident, but alignment errors remain a significant hindrance. Proteins (2003) 1.03

Detection of homologous proteins by an intermediate sequence search. Protein Sci (2004) 0.93