Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships.

PubWeight™: 7.18 | Rank: Top 1%

PubMed Central: PMC 27587

Published in Proc Natl Acad Sci U S A on May 26, 1998

Authors

S E Brenner¹, C Chothia, T J Hubbard

Author Affiliations

1: MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, United Kingdom. brenner@hyper.stanford.edu

Articles citing this

(truncated to the top 100)

SCOP: a structural classification of proteins database. Nucleic Acids Res (2000) 14.14

The ASTRAL compendium for protein structure and sequence analysis. Nucleic Acids Res (2000) 11.38

Comparative protein structure modeling using Modeller. Curr Protoc Bioinformatics (2006) 8.72

The CATH database: an extended protein family resource for structural and functional genomics. Nucleic Acids Res (2003) 8.36

Protein database searches using compositionally adjusted substitution matrices. FEBS J (2005) 8.14

The Genomes of Oryza sativa: a history of duplications. PLoS Biol (2005) 7.67

Exploring genomic dark matter: a critical assessment of the performance of homology search methods on noncoding RNA. Genome Res (2006) 6.59

Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci (2000) 6.23

A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases. BMC Bioinformatics (2004) 5.90

DIAN: a novel algorithm for genome ontological classification. Genome Res (2001) 5.55

Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. Genome Res (2004) 5.26

ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches. Nucleic Acids Res (2001) 4.71

Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics (2010) 4.48

Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures. J Mol Biol (2005) 4.02

Benchmarking tools for the alignment of functional noncoding DNA. BMC Bioinformatics (2004) 4.00

A comparison of profile hidden Markov model procedures for remote homology detection. Nucleic Acids Res (2002) 3.76

Sensitivity and selectivity in protein structure comparison. Protein Sci (2004) 3.69

Benchmarking ortholog identification methods using functional genomics data. Genome Biol (2006) 3.54

Pcons: a neural-network-based consensus predictor that improves fold recognition. Protein Sci (2001) 2.90

Alignment of protein sequences by their profiles. Protein Sci (2004) 2.80

The CATH extended protein-family database: providing structural annotations for genome sequences. Protein Sci (2002) 2.69

Gene expression intensity shapes evolutionary rates of the proteins encoded by the vertebrate genome. Genetics (2004) 2.63

Sequence conserved for subcellular localization. Protein Sci (2002) 2.36

PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information. Nucleic Acids Res (2001) 2.23

Characterization of a genetic element carrying the macrolide efflux gene mef(A) in Streptococcus pneumoniae. Antimicrob Agents Chemother (2000) 2.21

High-level expression, functional reconstitution, and quaternary structure of a prokaryotic ClC-type chloride channel. J Gen Physiol (1999) 2.12

Structural assignments to the Mycoplasma genitalium proteins show extensive gene duplications and domain rearrangements. Proc Natl Acad Sci U S A (1998) 2.08

All are not equal: a benchmark of different homology modeling programs. Protein Sci (2005) 1.99

Finding weak similarities between proteins by sequence profile comparison. Nucleic Acids Res (2003) 1.88

Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches. Nucleic Acids Res (2006) 1.81

LiveBench-1: continuous benchmarking of protein structure prediction servers. Protein Sci (2001) 1.80

UniqueProt: Creating representative protein sequence sets. Nucleic Acids Res (2003) 1.78

A study of quality measures for protein threading models. BMC Bioinformatics (2001) 1.74

Threshold Average Precision (TAP-k): a measure of retrieval designed for bioinformatics. Bioinformatics (2010) 1.73

Homologous over-extension: a challenge for iterative similarity searches. Nucleic Acids Res (2010) 1.70

PSI-BLAST pseudocounts and the minimum description length principle. Nucleic Acids Res (2008) 1.68

Expectations from structural genomics. Protein Sci (2000) 1.65

Multiple nonidentical reductive-dehalogenase-homologous genes are common in Dehalococcoides. Appl Environ Microbiol (2004) 1.65

Protein-protein interactions more conserved within species than across species. PLoS Comput Biol (2006) 1.61

The growth-regulatory protein HCRP1/hVps37A is a subunit of mammalian ESCRT-I and mediates receptor down-regulation. Mol Biol Cell (2004) 1.58

The limits of protein sequence comparison? Curr Opin Struct Biol (2005) 1.57

LEAping to conclusions: a computational reanalysis of late embryogenesis abundant proteins and their possible roles. BMC Bioinformatics (2003) 1.52

Eukaryotic CTR copper uptake transporters require two faces of the third transmembrane domain for helix packing, oligomerization, and function. J Biol Chem (2004) 1.44

Genomic scale sub-family assignment of protein domains. Nucleic Acids Res (2006) 1.42

A limited universe of membrane protein families and folds. Protein Sci (2006) 1.41

Towards a comprehensive structural coverage of completed genomes: a structural genomics viewpoint. BMC Bioinformatics (2007) 1.34

Improved profile HMM performance by assessment of critical algorithmic features in SAM and HMMER. BMC Bioinformatics (2005) 1.33

Comparison of human solute carriers. Protein Sci (2010) 1.29

Gene3D: structural assignment for whole genes and genomes using the CATH domain structure database. Genome Res (2002) 1.29

BALSA: Bayesian algorithm for local sequence alignment. Nucleic Acids Res (2002) 1.27

Practical lessons from protein structure prediction. Nucleic Acids Res (2005) 1.25

The identification of complete domains within protein sequences using accurate E-values for semi-global alignment. Nucleic Acids Res (2007) 1.24

iAlign: a method for the structural comparison of protein-protein interfaces. Bioinformatics (2010) 1.17

Genome analysis: Assigning protein coding regions to three-dimensional structures. Protein Sci (1999) 1.16

A comparison of position-specific score matrices based on sequence and structure alignments. Protein Sci (2002) 1.13

Genome comparison using Gene Ontology (GO) with statistical testing. BMC Bioinformatics (2006) 1.10

Improved detection of homologous membrane proteins by inclusion of information from topology predictions. Protein Sci (2002) 1.10

Sequence variations within protein families are linearly related to structural variations. J Mol Biol (2002) 1.06

Application of protein structure alignments to iterated hidden Markov model protocols for structure prediction. BMC Bioinformatics (2006) 1.05

Statistical limits to the identification of ion channel domains by sequence similarity. J Gen Physiol (2006) 1.02

A semi-quantitative, synteny-based method to improve functional predictions for hypothetical and poorly annotated bacterial and archaeal genes. PLoS Comput Biol (2011) 1.02

Analysis of protein sequence/structure similarity relationships. Biophys J (2002) 1.01

FINDSITE-metal: integrating evolutionary information and machine learning for structure-based metal-binding site prediction at the proteome level. Proteins (2010) 1.00

Protein backbone structure determination using only residual dipolar couplings from one ordering medium. J Biomol NMR (2001) 1.00

Genomix: a method for combining gene-finders' predictions, which uses evolutionary conservation of sequence and intron-exon structure. Bioinformatics (2007) 0.99

Structural similarity to link sequence space: new potential superfamilies and implications for structural genomics. Protein Sci (2002) 0.98

Evolutionary relationships among G protein-coupled receptors using a clustered database approach. AAPS PharmSci (2001) 0.98

Protein family comparison using statistical models and predicted structural information. BMC Bioinformatics (2004) 0.98

Comparative Protein Structure Modeling Using MODELLER. Curr Protoc Bioinformatics (2016) 0.95

Optimizing amino acid substitution matrices with a local alignment kernel. BMC Bioinformatics (2006) 0.94

Detecting remotely related proteins by their interactions and sequence similarity. Proc Natl Acad Sci U S A (2005) 0.93

Adjusting scoring matrices to correct overextended alignments. Bioinformatics (2013) 0.91

Profile hidden Markov models for the detection of viruses within metagenomic sequence data. PLoS One (2014) 0.89

Protein sequence alignment with family-specific amino acid similarity matrices. BMC Res Notes (2011) 0.89

Improving pairwise sequence alignment accuracy using near-optimal protein sequence alignments. BMC Bioinformatics (2010) 0.88

Enhanced fold recognition using efficient short fragment clustering. J Mol Biochem (2012) 0.87

Template-based protein structure modeling. Methods Mol Biol (2010) 0.86

PSimScan: algorithm and utility for fast protein similarity search. PLoS One (2013) 0.86

Statistical distributions of optimal global alignment scores of random protein sequences. BMC Bioinformatics (2005) 0.85

Identification of genes differentially expressed in extraradical mycelium and ectomycorrhizal roots during Paxillus involutus-Betula pendula ectomycorrhizal symbiosis. Appl Environ Microbiol (2005) 0.85

Testing statistical significance scores of sequence comparison methods with structure similarity. BMC Bioinformatics (2006) 0.85

Comparison of the small molecule metabolic enzymes of Escherichia coli and Saccharomyces cerevisiae. Genome Res (2002) 0.85

Bioinformatics resources for cancer research with an emphasis on gene function and structure prediction tools. Cancer Inform (2007) 0.83

Homology induction: the use of machine learning to improve sequence similarity searches. BMC Bioinformatics (2002) 0.83

SCANPS: a web server for iterative protein sequence database searching by dynamic programing, with display in a hierarchical SCOP browser. Nucleic Acids Res (2008) 0.82

A new procedure for determining the genetic basis of a physiological process in a non-model species, illustrated by cold induced angiogenesis in the carp. BMC Genomics (2009) 0.82

Use of residue pairs in protein sequence-sequence and sequence-structure alignments. Protein Sci (2000) 0.82

Globally, unrelated protein sequences appear random. Bioinformatics (2009) 0.82

A procedure for identifying homologous alternative splicing events. BMC Bioinformatics (2007) 0.81

Identification of genetic bases of vibrio fluvialis species-specific biochemical pathways and potential virulence factors by comparative genomic analysis. Appl Environ Microbiol (2014) 0.81

Panning for genes--A visual strategy for identifying novel gene orthologs and paralogs. Genome Res (1999) 0.80

HHsvm: fast and accurate classification of profile-profile matches identified by HHsearch. Bioinformatics (2009) 0.80

A computational strategy for protein function assignment which addresses the multidomain problem. Comp Funct Genomics (2002) 0.80

A high level interface to SCOP and ASTRAL implemented in python. BMC Bioinformatics (2006) 0.79

Using HHsearch to tackle proteins of unknown function: A pilot study with PH domains. Traffic (2016) 0.79

Validating annotations for uncharacterized proteins in Shewanella oneidensis. OMICS (2008) 0.79

A comparative genome-wide study of ncRNAs in trypanosomatids. BMC Genomics (2010) 0.78

The distance-profile representation and its application to detection of distantly related protein families. BMC Bioinformatics (2005) 0.77

Analysis of the human kinome using methods including fold recognition reveals two novel kinases. PLoS One (2008) 0.76

Automatic generation and evaluation of sparse protein signatures for families of protein structural domains. Protein Sci (2005) 0.75

Articles cited by this

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 665.31

Basic local alignment search tool. J Mol Biol (1990) 659.07

Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A (1988) 193.60

Identification of common molecular subsequences. J Mol Biol (1981) 130.53

SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol (1995) 74.88

Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A (1992) 61.33

RASMOL: biomolecular graphics for all. Trends Biochem Sci (1995) 33.76

Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins (1991) 32.50

CATH--a hierarchic classification of protein domain structures. Structure (1997) 29.95

Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem (1993) 27.24

Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc Natl Acad Sci U S A (1990) 24.42

Issues in searching molecular sequence databases. Nat Genet (1994) 19.28

On the statistical significance of nucleic acid similarities. Nucleic Acids Res (1984) 18.21

Local alignment statistics. Methods Enzymol (1996) 17.76

Analysis of compositionally biased regions in sequence databases. Methods Enzymol (1996) 17.11

Applications and statistics for multiple high-scoring segments in molecular sequences. Proc Natl Acad Sci U S A (1993) 12.10

Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. Genomics (1991) 9.65

An improved method of testing for evolutionary homology. J Mol Biol (1966) 9.42

Effective protein sequence comparison. Methods Enzymol (1996) 6.38

The SWISS-PROT protein sequence data bank and its new supplement TREMBL. Nucleic Acids Res (1996) 6.14

The PROSITE database, its status in 1995. Nucleic Acids Res (1996) 5.53

Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching. Comput Chem (1996) 5.16

Performance evaluation of amino acid substitution matrices. Proteins (1993) 4.46

Comparison of methods for searching protein sequence databases. Protein Sci (1995) 4.29

Understanding protein structure: using scop for fold interpretation. Methods Enzymol (1996) 4.22

An assessment of amino acid exchange matrices in aligning protein sequences: the twilight zone revisited. J Mol Biol (1995) 3.74

Evaluation and improvements in the automatic alignment of protein sequences. Protein Eng (1989) 3.57

Population statistics of protein structures: lessons from structural classifications. Curr Opin Struct Biol (1997) 1.98

A structural basis for sequence comparisons. An evaluation of scoring methodologies. J Mol Biol (1993) 1.83

Crystal structure of the catalytic domain of a thermophilic endocellulase. Biochemistry (1993) 1.61

A structural explanation for the twilight zone of protein sequence homology. Structure (1996) 1.45

Alignment of the amino acid sequences of distantly related proteins using variable gap penalties. Protein Eng (1989) 1.34

PIR-International Protein Sequence Database. Methods Enzymol (1996) 1.05

Molecular packing and intermolecular contacts of sickling deer type III hemoglobin. J Mol Biol (1979) 1.00

Articles by these authors

SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol (1995) 74.88

Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature (2002) 28.79

Determinants of a protein fold. Unique features of the globin amino acid sequences. J Mol Biol (1987) 18.58

The relation between the divergence of sequence and structure in proteins. EMBO J (1986) 16.66

Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J Mol Biol (2001) 15.97

SCOP: a structural classification of proteins database. Nucleic Acids Res (2000) 14.14

The atomic structure of protein-protein recognition sites. J Mol Biol (1999) 12.63

Volume changes in protein evolution. J Mol Biol (1994) 12.07

Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J Mol Biol (1998) 9.09

Proteins. One thousand families for the molecular biologist. Nature (1992) 7.83

Principles of protein-protein recognition. Nature (1975) 6.50

Structural patterns in globular proteins. Nature (1976) 5.95

The nature of the accessible and buried surfaces in proteins. J Mol Biol (1976) 5.92

How different amino acid sequences determine similar protein structures: the structure and evolutionary dynamics of the globins. J Mol Biol (1980) 5.85

Mining the draft human genome. Nature (2001) 5.81

Hydrophobic bonding and accessible surface area in proteins. Nature (1974) 5.69

Structural invariants in protein folding. Nature (1975) 5.60

Interior and surface of monomeric proteins. J Mol Biol (1987) 5.42

Structural mechanisms for domain movements in proteins. Biochemistry (1994) 5.36

Surface, subunit interfaces and interior of oligomeric proteins. J Mol Biol (1988) 4.80

The structure of protein-protein recognition sites. J Biol Chem (1990) 4.55

Helix to helix packing in proteins. J Mol Biol (1981) 4.24

Understanding protein structure: using scop for fold interpretation. Methods Enzymol (1996) 4.22

Principles that determine the structure of proteins. Annu Rev Biochem (1984) 4.15

Haemoglobin: the structural changes related to ligand binding and its allosteric mechanism. J Mol Biol (1979) 3.79

The solution structure of the S1 RNA binding domain: a member of an ancient nucleic acid-binding fold. Cell (1997) 3.65

Many of the immunoglobulin superfamily domains in cell adhesion molecules and surface receptors belong to a new structural set which is close to that containing variable domains. J Mol Biol (1994) 3.57

Volume changes on protein folding. Structure (1994) 3.54

The packing density in proteins: standard radii and volumes. J Mol Biol (1999) 3.48

Standard conformations for the canonical structures of immunoglobulins. J Mol Biol (1997) 3.42

Conformation of twisted beta-pleated sheets in proteins. J Mol Biol (1973) 3.15

Intermediate sequences increase the detection of homology between sequences. J Mol Biol (1997) 2.98

The accessible surface area and stability of oligomeric proteins. Nature (1987) 2.77

Cadherin superfamily proteins in Caenorhabditis elegans and Drosophila melanogaster. J Mol Biol (2001) 2.64

Structure of proteins: packing of alpha-helices and pleated sheets. Proc Natl Acad Sci U S A (1977) 2.60

RSDB: representative protein sequence databases have high information content. Bioinformatics (2000) 2.55

Evolution of proteins formed by beta-sheets. II. The core of the immunoglobulin domains. J Mol Biol (1982) 2.47

beta-Trefoil fold. Patterns of structure and sequence in the Kunitz inhibitors interleukins-1 beta and 1 alpha and fibroblast growth factors. J Mol Biol (1992) 2.45

SCOP: a structural classification of proteins database. Nucleic Acids Res (1997) 2.44

SCOP: a Structural Classification of Proteins database. Nucleic Acids Res (1999) 2.20

The evolution and structural anatomy of the small molecule metabolic pathways in Escherichia coli. J Mol Biol (2001) 2.15

Advances in structural genomics. Curr Opin Struct Biol (1999) 2.09

Structural assignments to the Mycoplasma genitalium proteins show extensive gene duplications and domain rearrangements. Proc Natl Acad Sci U S A (1998) 2.08

Fast assignment of protein structures to sequences using the intermediate sequence library PDB-ISL. Bioinformatics (2000) 2.04

Domain association in immunoglobulin molecules. The packing of variable domains. J Mol Biol (1985) 2.03

Population statistics of protein structures: lessons from structural classifications. Curr Opin Struct Biol (1997) 1.98

Conformations of the third hypervariable region in the VH domain of immunoglobulins. J Mol Biol (1998) 1.96

Packing at the protein-water interface. Proc Natl Acad Sci U S A (1996) 1.91

Structural principles of alpha/beta barrel proteins: the packing of the interior of the sheet. Proteins (1989) 1.78

Evolution of proteins formed by beta-sheets. I. Plastocyanin and azurin. J Mol Biol (1982) 1.76

The structure of a PKD domain from polycystin-1: implications for polycystic kidney disease. EMBO J (1999) 1.75

Determination of protein function, evolution and interactions by structural genomics. Curr Opin Struct Biol (2001) 1.75

Numerical criteria for the evaluation of ab initio predictions of protein structure. Proteins (1997) 1.68

The predicted structure of immunoglobulin D1.3 and its comparison with the crystal structure. Science (1986) 1.68

Principles determining the structure of beta-sheet barrels in proteins. I. A theoretical analysis. J Mol Biol (1994) 1.66

Domain closure in adenylate kinase. Joints on either side of two helices close like neighboring fingers. J Mol Biol (1993) 1.61

Role of hydrophobicity in the binding of coenzymes. Appendix. Translational and rotational contribution to the free energy of dissociation. Biochemistry (1978) 1.61

Domain closure in lactoferrin. Two hinges produce a see-saw motion between alternative close-packed interfaces. J Mol Biol (1993) 1.55

Orthogonal packing of beta-pleated sheets in proteins. Biochemistry (1982) 1.55

Framework residue 71 is a major determinant of the position and conformation of the second hypervariable region in the VH domains of immunoglobulins. J Mol Biol (1990) 1.53

Molecular structure of a new family of ribonucleases. Nature (1982) 1.52

Outline structure of the human L1 cell adhesion molecule and the sites where mutations cause neurological disorders. EMBO J (1996) 1.51

Analysis of protein loop closure. Two types of hinges produce one motion in lactate dehydrogenase. J Mol Biol (1991) 1.50

Helix movements and the reconstruction of the haem pocket during the evolution of the cytochrome c family. J Mol Biol (1985) 1.49

Transmission of conformational change in insulin. Nature (1983) 1.48

Domain closure in mitochondrial aspartate aminotransferase. J Mol Biol (1992) 1.46

Comparative analysis of the polycystic kidney disease 1 (PKD1) gene reveals an integral membrane glycoprotein with multiple evolutionary conserved domains. Hum Mol Genet (1997) 1.45

Effect of strength and proprioception training on eversion to inversion strength ratios in subjects with unilateral functional ankle instability. Br J Sports Med (2003) 1.41

Elbow motion in the immunoglobulins involves a molecular ball-and-socket joint. Nature (1988) 1.39

Alignment of the amino acid sequences of distantly related proteins using variable gap penalties. Protein Eng (1989) 1.34

Structure and stability of an immunoglobulin superfamily domain from twitchin, a muscle protein of the nematode Caenorhabditis elegans. J Mol Biol (1996) 1.33

Principles determining the structure of beta-sheet barrels in proteins. II. The observed structures. J Mol Biol (1994) 1.33

SCOP, Structural Classification of Proteins database: applications to evaluation of the effectiveness of sequence alignment methods and statistics of protein structural data. Acta Crystallogr D Biol Crystallogr (1998) 1.29

Assessment of novel fold targets in CASP4: predictions of three-dimensional structures, secondary structures, and interresidue contacts. Proteins (2001) 1.26

The structural repertoire of the human V kappa domain. EMBO J (1995) 1.26

Gene duplications in H. influenzae. Nature (1995) 1.24

Role of subunit interfaces in the allosteric mechanism of hemoglobin. Proc Natl Acad Sci U S A (1976) 1.23

Mechanisms of domain closure in proteins. J Mol Biol (1984) 1.22

Serpin tertiary structure transformation. J Mol Biol (1991) 1.21

Conservation of folding and stability within a protein family: the tyrosine corner as an evolutionary cul-de-sac. J Mol Biol (2000) 1.20

Protein evolution. How far can sequences diverge? Nature (1997) 1.20

Domains in proteins: definitions, location, and structural principles. Methods Enzymol (1985) 1.20

Packing of alpha-helices onto beta-pleated sheets and the anatomy of alpha/beta proteins. J Mol Biol (1980) 1.17

Stability and specificity of protein-protein interactions: the case of the trypsin-trypsin inhibitor complexes. J Mol Biol (1976) 1.15

Structural determinants of the conformations of medium-sized loops in proteins. Proteins (1989) 1.15

Antibody structure, prediction and redesign. Biophys Chem (1997) 1.10

Immunoglobulin superfamily proteins in Caenorhabditis elegans. J Mol Biol (2000) 1.09

Small-molecule metabolism: an enzyme mosaic. Trends Biotechnol (2001) 1.08

Increased hospitalizations for asthma among children in the Washington, D.C. area during 1961-1981. Ann Allergy (1984) 1.07

Coiling of beta-pleated sheets. J Mol Biol (1983) 1.03

Heat-shock proteins during growth and sporulation of Bacillus subtilis. FEBS Lett (1985) 1.01

Canonical structures for the hypervariable regions of T cell alphabeta receptors. J Mol Biol (2000) 1.00

Members of the immunoglobulin superfamily in bacteria. Protein Sci (1996) 0.94

Perspectives: signal transduction. Proteins in motion. Science (1999) 0.93

Solvent accessibility, protein surfaces, and protein folding. Biophys J (1980) 0.92

Conformations of acetylcholine. Nature (1968) 0.92

Haemoglobin: the surface buried between the alpha 1 beta 1 and alpha 2 beta 2 dimers in the deoxy and oxy structures. J Mol Biol (1985) 0.92