Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods.

PubWeight™: 9.09 | Rank: Top 0.1%

PMID 9837738

Published in J Mol Biol on December 11, 1998

Authors

J Park (1), K Karplus, C Barrett, R Hughey, D Haussler, T Hubbard, C Chothia

Author Affiliations

1: MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, UK.

Articles citing this

(truncated to the top 100)

Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res (2001) 22.33

CDD: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Res (2002) 18.54

SCOP: a structural classification of proteins database. Nucleic Acids Res (2000) 14.14

The ASTRAL compendium for protein structure and sequence analysis. Nucleic Acids Res (2000) 11.38

Comparative protein structure modeling using Modeller. Curr Protoc Bioinformatics (2006) 8.72

The SUPERFAMILY database in 2004: additions and improvements. Nucleic Acids Res (2004) 6.72

Exploring genomic dark matter: a critical assessment of the performance of homology search methods on noncoding RNA. Genome Res (2006) 6.59

Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci (2000) 6.23

The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis. Nucleic Acids Res (2005) 5.59

A probabilistic model of local sequence alignment that simplifies statistical significance estimation. PLoS Comput Biol (2008) 5.12

Assigning genomic sequences to CATH. Nucleic Acids Res (2000) 4.22

SCOOP: a simple method for identification of novel protein superfamily relationships. Bioinformatics (2007) 4.19

Molecular and phylogenetic analyses of the complete MADS-box transcription factor family in Arabidopsis: new openings to the MADS world. Plant Cell (2003) 3.80

A comparison of profile hidden Markov model procedures for remote homology detection. Nucleic Acids Res (2002) 3.76

Sensitivity and selectivity in protein structure comparison. Protein Sci (2004) 3.69

Protein subfamily assignment using the Conserved Domain Database. BMC Res Notes (2008) 3.62

Precise determination of the diversity of a combinatorial antibody library gives insight into the human immunoglobulin repertoire. Proc Natl Acad Sci U S A (2009) 3.56

Systematic identification of novel protein domain families associated with nuclear functions. Genome Res (2002) 3.50

Pcons: a neural-network-based consensus predictor that improves fold recognition. Protein Sci (2001) 2.90

Alignment of protein sequences by their profiles. Protein Sci (2004) 2.80

Multipass membrane protein structure prediction using Rosetta. Proteins (2006) 2.79

Nature of the protein universe. Proc Natl Acad Sci U S A (2009) 2.73

The CATH extended protein-family database: providing structural annotations for genome sequences. Protein Sci (2002) 2.69

Sequence conserved for subcellular localization. Protein Sci (2002) 2.36

PhyloFacts: an online structural phylogenomic encyclopedia for protein functional and structural classification. Genome Biol (2006) 2.27

PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information. Nucleic Acids Res (2001) 2.23

Structural assignments to the Mycoplasma genitalium proteins show extensive gene duplications and domain rearrangements. Proc Natl Acad Sci U S A (1998) 2.08

Improving the quality of twilight-zone alignments. Protein Sci (2000) 2.03

All are not equal: a benchmark of different homology modeling programs. Protein Sci (2005) 1.99

GTOP: a database of protein structures predicted from genome sequences. Nucleic Acids Res (2002) 1.97

Homology modeling using parametric alignment ensemble generation with consensus and energy-based model selection. Nucleic Acids Res (2006) 1.96

Automated protein subfamily identification and classification. PLoS Comput Biol (2007) 1.89

Finding weak similarities between proteins by sequence profile comparison. Nucleic Acids Res (2003) 1.88

From endonucleases to transcription factors: evolution of the AP2 DNA binding domain in plants. Plant Cell (2004) 1.87

Enhanced protein domain discovery using taxonomy. BMC Bioinformatics (2004) 1.81

LiveBench-1: continuous benchmarking of protein structure prediction servers. Protein Sci (2001) 1.80

Bioinformatics analysis of the locus for enterocyte effacement provides novel insights into type-III secretion. BMC Microbiol (2005) 1.78

A study of quality measures for protein threading models. BMC Bioinformatics (2001) 1.74

Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Res (2012) 1.74

Functional insights from the distribution and role of homopeptide repeat-containing proteins. Genome Res (2005) 1.70

Protein ranking: from local to global structure in the protein similarity network. Proc Natl Acad Sci U S A (2004) 1.67

A rapid classification protocol for the CATH Domain Database to support structural genomics. Nucleic Acids Res (2001) 1.61

The limits of protein sequence comparison? Curr Opin Struct Biol (2005) 1.57

Systematic learning of gene functional classes from DNA array expression data by using multilayer perceptrons. Genome Res (2002) 1.55

A gold standard set of mechanistically diverse enzyme superfamilies. Genome Biol (2006) 1.52

FlowerPower: clustering proteins into domain architecture classes for phylogenomic inference of protein function. BMC Evol Biol (2007) 1.48

Genomic scale sub-family assignment of protein domains. Nucleic Acids Res (2006) 1.42

PASS2: an automated database of protein alignments organised as structural superfamilies. BMC Bioinformatics (2004) 1.37

Towards a comprehensive structural coverage of completed genomes: a structural genomics viewpoint. BMC Bioinformatics (2007) 1.34

Evaluation of PSI-BLAST alignment accuracy in comparison to structural alignments. Protein Sci (2000) 1.33

Improved profile HMM performance by assessment of critical algorithmic features in SAM and HMMER. BMC Bioinformatics (2005) 1.33

Gene3D: structural assignment for whole genes and genomes using the CATH domain structure database. Genome Res (2002) 1.29

The identification of complete domains within protein sequences using accurate E-values for semi-global alignment. Nucleic Acids Res (2007) 1.24

Structural characterization of the human proteome. Genome Res (2002) 1.24

Assessing strategies for improved superfamily recognition. Protein Sci (2005) 1.21

Evolution of proteins and proteomes: a phylogenetics approach. Evol Bioinform Online (2007) 1.16

Purification, molecular cloning, and sequence analysis of sucrose-6F-phosphate phosphohydrolase from plants. Proc Natl Acad Sci U S A (2000) 1.15

GISMO--gene identification using a support vector machine for ORF classification. Nucleic Acids Res (2006) 1.14

A comparison of position-specific score matrices based on sequence and structure alignments. Protein Sci (2002) 1.13

Phylogenetic profiles reveal evolutionary relationships within the "twilight zone" of sequence similarity. Proc Natl Acad Sci U S A (2008) 1.11

Persistently conserved positions in structurally similar, sequence dissimilar proteins: roles in preserving protein fold and function. Protein Sci (2002) 1.10

Improved detection of homologous membrane proteins by inclusion of information from topology predictions. Protein Sci (2002) 1.10

GeneSpeed: protein domain organization of the transcriptome. Nucleic Acids Res (2006) 1.07

SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition. BMC Bioinformatics (2007) 1.06

Application of protein structure alignments to iterated hidden Markov model protocols for structure prediction. BMC Bioinformatics (2006) 1.05

Protein structural similarity search by Ramachandran codes. BMC Bioinformatics (2007) 1.04

Divergent evolution within protein superfolds inferred from profile-based phylogenetics. J Mol Biol (2005) 1.04

The evolution of enzyme specificity in Fasciola spp. J Mol Evol (2003) 0.99

Oligomerization of hantavirus nucleocapsid protein: analysis of the N-terminal coiled-coil domain. J Virol (2006) 0.99

PSI-BLAST-ISS: an intermediate sequence search tool for estimation of the position-specific alignment reliability. BMC Bioinformatics (2005) 0.99

EVEREST: automatic identification and classification of protein domains in all protein sequences. BMC Bioinformatics (2006) 0.99

Evolutionary relationships among G protein-coupled receptors using a clustered database approach. AAPS PharmSci (2001) 0.98

Word correlation matrices for protein sequence analysis and remote homology detection. BMC Bioinformatics (2008) 0.97

Coupling of Ci-VSP modules requires a combination of structure and electrostatics within the linker. Biophys J (2012) 0.97

A comparative assessment and analysis of 20 representative sequence alignment methods for protein structure prediction. Sci Rep (2013) 0.97

The TyrA family of aromatic-pathway dehydrogenases in phylogenetic context. BMC Biol (2005) 0.96

Comparative Protein Structure Modeling Using MODELLER. Curr Protoc Bioinformatics (2016) 0.95

In silico identification of specialized secretory-organelle proteins in apicomplexan parasites and in vivo validation in Toxoplasma gondii. PLoS One (2008) 0.95

Classification of transmembrane protein families in the Caenorhabditis elegans genome and identification of human orthologs. Genome Res (2000) 0.95

Protein ranking by semi-supervised network propagation. BMC Bioinformatics (2006) 0.95

Conservation of structure and function among tyrosine recombinases: homology-based modeling of the lambda integrase core-binding domain. Nucleic Acids Res (2003) 0.94

DAhunter: a web-based server that identifies homologous proteins by comparing domain architecture. Nucleic Acids Res (2008) 0.94

Thermodynamic propensities of amino acids in the native state ensemble: implications for fold recognition. Protein Sci (2001) 0.93

Detection of homologous proteins by an intermediate sequence search. Protein Sci (2004) 0.93

Improving model construction of profile HMMs for remote homology detection through structural alignment. BMC Bioinformatics (2007) 0.90

MRFalign: protein homology detection through alignment of Markov random fields. PLoS Comput Biol (2014) 0.89

Profile hidden Markov models for the detection of viruses within metagenomic sequence data. PLoS One (2014) 0.89

webPRC: the Profile Comparer for alignment-based searching of public domain databases. Nucleic Acids Res (2009) 0.88

Fold recognition without folds. Protein Sci (2002) 0.88

Homology modeling a fast tool for drug discovery: current perspectives. Indian J Pharm Sci (2012) 0.87

Detecting DNA-binding helix-turn-helix structural motifs using sequence and structure information. Nucleic Acids Res (2005) 0.87

The absence of TIR-type resistance gene analogues in the sugar beet (Beta vulgaris L.) genome. J Mol Evol (2004) 0.87

Prediction of MHC class I binding peptides by a query learning algorithm based on hidden Markov models. J Biol Phys (2002) 0.86

Mitochondrial dysfunction in lyssavirus-induced apoptosis. J Virol (2008) 0.86

Improving classification in protein structure databases using text mining. BMC Bioinformatics (2009) 0.85

Testing statistical significance scores of sequence comparison methods with structure similarity. BMC Bioinformatics (2006) 0.85

Exploiting protein structure data to explore the evolution of protein function and biological complexity. Philos Trans R Soc Lond B Biol Sci (2006) 0.85

SIB-BLAST: a web server for improved delineation of true and false positives in PSI-BLAST searches. Nucleic Acids Res (2009) 0.84

pfsearchV3: a code acceleration and heuristic to search PROSITE profiles. Bioinformatics (2013) 0.84

Detection of protein fold similarity based on correlation of amino acid properties. Proc Natl Acad Sci U S A (1999) 0.84

Articles by these authors

Initial sequencing and analysis of the human genome. Nature (2001) 212.86

SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol (1995) 74.88

The UCSC Genome Browser Database. Nucleic Acids Res (2003) 32.84

Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol (1994) 31.57

Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature (2002) 28.79

The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res (2007) 23.13

Hidden Markov models for detecting remote protein homologies. Bioinformatics (1998) 21.29

Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology. Comput Appl Biosci (1996) 19.74

Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci U S A (2000) 19.39

Determinants of a protein fold. Unique features of the globin amino acid sequences. J Mol Biol (1987) 18.58

The relation between the divergence of sequence and structure in proteins. EMBO J (1986) 16.66

Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J Mol Biol (2001) 15.97

SCOP: a structural classification of proteins database. Nucleic Acids Res (2000) 14.14

Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics (2000) 13.33

Comparative analyses of multi-species sequences from targeted genomic regions. Nature (2003) 13.31

A flexible motif search technique based on generalized profiles. Comput Chem (1996) 13.22

The UCSC genome browser database: update 2007. Nucleic Acids Res (2006) 13.04

The atomic structure of protein-protein recognition sites. J Mol Biol (1999) 12.63

A physical map of the human genome. Nature (2001) 12.39

Ensembl 2002: accommodating comparative genomics. Nucleic Acids Res (2003) 12.26

Volume changes in protein evolution. J Mol Biol (1994) 12.07

Ensembl 2004. Nucleic Acids Res (2004) 11.88

Improved splice site detection in Genie. J Comput Biol (1997) 11.57

A generalized hidden Markov model for the recognition of human genes in DNA. Proc Int Conf Intell Syst Mol Biol (1996) 11.45

The UCSC Genome Browser Database: update 2006. Nucleic Acids Res (2006) 11.05

Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature (2001) 10.96

Using Dirichlet mixture priors to derive hidden Markov models for protein families. Proc Int Conf Intell Syst Mol Biol (1993) 10.73

The UCSC Genome Browser Database: update 2009. Nucleic Acids Res (2008) 10.31

Assembly of the working draft of the human genome with GigAssembler. Genome Res (2001) 8.23

Proteins. One thousand families for the molecular biologist. Nature (1992) 7.83

The vertebrate genome annotation (Vega) database. Nucleic Acids Res (2007) 7.53

Genie--gene finding in Drosophila melanogaster. Genome Res (2000) 7.47

The DNA sequence and comparative analysis of human chromosome 20. Nature (2002) 7.40

Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc Natl Acad Sci U S A (1998) 7.18

The Vertebrate Genome Annotation (Vega) database. Nucleic Acids Res (2005) 7.06

Principles of protein-protein recognition. Nature (1975) 6.50

Stochastic context-free grammars for tRNA modeling. Nucleic Acids Res (1994) 6.29

Rapid discrimination among individual DNA hairpin molecules at single-nucleotide resolution using an ion channel. Nat Biotechnol (2001) 6.07

Integrating database homology in a probabilistic gene structure model. Pac Symp Biocomput (1997) 5.96

Structural patterns in globular proteins. Nature (1976) 5.95

The nature of the accessible and buried surfaces in proteins. J Mol Biol (1976) 5.92

How different amino acid sequences determine similar protein structures: the structure and evolutionary dynamics of the globins. J Mol Biol (1980) 5.85

Hydrophobic bonding and accessible surface area in proteins. Nature (1974) 5.69

Structural invariants in protein folding. Nature (1975) 5.60

A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Res (1994) 5.50

Interior and surface of monomeric proteins. J Mol Biol (1987) 5.42

Structural mechanisms for domain movements in proteins. Biochemistry (1994) 5.36

Surface, subunit interfaces and interior of oligomeric proteins. J Mol Biol (1988) 4.80

The DNA sequence and analysis of human chromosome 6. Nature (2003) 4.75

Characterization of Escherichia coli mutants tolerant to bacteriocin JF246: two new classes of tolerant mutants. J Bacteriol (1973) 4.57

The structure of protein-protein recognition sites. J Biol Chem (1990) 4.55

Open annotation offers a democratic solution to genome sequencing. Nature (2000) 4.48

RNA modeling using Gibbs sampling and stochastic context free grammars. Proc Int Conf Intell Syst Mol Biol (1994) 4.32

Transcriptome and genome conservation of alternative splicing events in humans and mice. Pac Symp Biocomput (2004) 4.25

Helix to helix packing in proteins. J Mol Biol (1981) 4.24

Understanding protein structure: using scop for fold interpretation. Methods Enzymol (1996) 4.22

Principles that determine the structure of proteins. Annu Rev Biochem (1984) 4.15

Haemoglobin: the structural changes related to ligand binding and its allosteric mechanism. J Mol Biol (1979) 3.79

Many of the immunoglobulin superfamily domains in cell adhesion molecules and surface receptors belong to a new structural set which is close to that containing variable domains. J Mol Biol (1994) 3.57

Volume changes on protein folding. Structure (1994) 3.54

Using neural networks for prediction of the subcellular location of proteins. Nucleic Acids Res (1998) 3.54

The packing density in proteins: standard radii and volumes. J Mol Biol (1999) 3.48

Standard conformations for the canonical structures of immunoglobulins. J Mol Biol (1997) 3.42

Genome-wide bioinformatic and molecular analysis of introns in Saccharomyces cerevisiae. RNA (1999) 3.40

Conformation of twisted beta-pleated sheets in proteins. J Mol Biol (1973) 3.15

Intermediate sequences increase the detection of homology between sequences. J Mol Biol (1997) 2.98

The share of human genomic DNA under selection estimated from human-mouse genomic alignments. Cold Spring Harb Symp Quant Biol (2003) 2.83

The accessible surface area and stability of oligomeric proteins. Nature (1987) 2.77

Critical assessment of methods of protein structure prediction (CASP): round III. Proteins (1999) 2.77

Optimally parsing a sequence into different classes based on multiple types of evidence. Proc Int Conf Intell Syst Mol Biol (1994) 2.73

Analysis and assessment of ab initio three-dimensional prediction, secondary structure, and contacts prediction. Proteins (1999) 2.71

Further clinical clarification of the muscle dysfunction in cervical headache. Cephalalgia (1999) 2.65

Cadherin superfamily proteins in Caenorhabditis elegans and Drosophila melanogaster. J Mol Biol (2001) 2.64

Structure of proteins: packing of alpha-helices and pleated sheets. Proc Natl Acad Sci U S A (1977) 2.60

RSDB: representative protein sequence databases have high information content. Bioinformatics (2000) 2.55

Evolution of proteins formed by beta-sheets. II. The core of the immunoglobulin domains. J Mol Biol (1982) 2.47

beta-Trefoil fold. Patterns of structure and sequence in the Kunitz inhibitors interleukins-1 beta and 1 alpha and fibroblast growth factors. J Mol Biol (1992) 2.45

SCOP: a structural classification of proteins database. Nucleic Acids Res (1997) 2.44

The DNA sequence and biological annotation of human chromosome 1. Nature (2006) 2.42

A discriminative framework for detecting remote protein homologies. J Comput Biol (2000) 2.30

Using the Fisher kernel method to detect remote protein homologies. Proc Int Conf Intell Syst Mol Biol (1999) 2.28

Optimizing reduced-space sequence analysis. Bioinformatics (2000) 2.21

SCOP: a Structural Classification of Proteins database. Nucleic Acids Res (1999) 2.20

Predicting protein structure using hidden Markov models. Proteins (1997) 2.20

Telerheumatology: an idea whose time has come. Intern Med J (2012) 2.18

Critical assessment of methods of protein structure prediction (CASP): round IV. Proteins (2001) 2.16

The evolution and structural anatomy of the small molecule metabolic pathways in Escherichia coli. J Mol Biol (2001) 2.15