A comparison of profile hidden Markov model procedures for remote homology detection.

PubWeight™: 3.76 | Rank: Top 1%

Article: PMC 140544

Published in Nucleic Acids Res on October 01, 2002

Authors

Martin Madera¹, Julian Gough

Author Affiliations

1: MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK. mm238@mrc-lmb.cam.ac.uk

Articles citing this

Accelerated Profile HMM Searches. PLoS Comput Biol (2011) 15.22

The SUPERFAMILY database in 2004: additions and improvements. Nucleic Acids Res (2004) 6.72

Exploring genomic dark matter: a critical assessment of the performance of homology search methods on noncoding RNA. Genome Res (2006) 6.59

A probabilistic model of local sequence alignment that simplifies statistical significance estimation. PLoS Comput Biol (2008) 5.12

Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics (2010) 4.48

The SUPERFAMILY database in 2007: families and functions. Nucleic Acids Res (2006) 4.27

Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models. Hum Mutat (2012) 3.60

topoSNP: a topographic database of non-synonymous single nucleotide polymorphisms with and without known disease association. Nucleic Acids Res (2004) 3.50

SUPERFAMILY 1.75 including a domain-centric gene ontology method. Nucleic Acids Res (2010) 3.34

Profile Comparer: a program for scoring and aligning profile hidden Markov models. Bioinformatics (2008) 2.94

Evolution of transcription factors and the gene regulatory network in Escherichia coli. Nucleic Acids Res (2003) 2.87

Alignment of protein sequences by their profiles. Protein Sci (2004) 2.80

MADS-box gene family in rice: genome-wide identification, organization and expression profiling during reproductive development and stress. BMC Genomics (2007) 2.19

DILIMOT: discovery of linear motifs in proteins. Nucleic Acids Res (2006) 2.17

The limits of protein sequence comparison? Curr Opin Struct Biol (2005) 1.57

A gold standard set of mechanistically diverse enzyme superfamilies. Genome Biol (2006) 1.52

Genomic scale sub-family assignment of protein domains. Nucleic Acids Res (2006) 1.42

Improved profile HMM performance by assessment of critical algorithmic features in SAM and HMMER. BMC Bioinformatics (2005) 1.33

The identification of complete domains within protein sequences using accurate E-values for semi-global alignment. Nucleic Acids Res (2007) 1.24

Assessing strategies for improved superfamily recognition. Protein Sci (2005) 1.21

Evolutionary diversity of vertebrate small heat shock proteins. J Mol Evol (2004) 1.20

eShadow: a tool for comparing closely related sequences. Genome Res (2004) 1.18

Identification, phylogeny, and transcript profiling of ERF family genes during development and abiotic stress treatments in tomato. Mol Genet Genomics (2010) 1.17

Curli functional amyloid systems are phylogenetically widespread and display large diversity in operon and protein structure. PLoS One (2012) 1.14

GeneSpeed: protein domain organization of the transcriptome. Nucleic Acids Res (2006) 1.07

Divergent evolution within protein superfolds inferred from profile-based phylogenetics. J Mol Biol (2005) 1.04

Organization and evolution of two SIDER retroposon subfamilies and their impact on the Leishmania genome. BMC Genomics (2009) 1.04

Statistical limits to the identification of ion channel domains by sequence similarity. J Gen Physiol (2006) 1.02

The solute carrier families have a remarkably long evolutionary history with the majority of the human families present before divergence of Bilaterian species. Mol Biol Evol (2010) 0.97

A comparative assessment and analysis of 20 representative sequence alignment methods for protein structure prediction. Sci Rep (2013) 0.97

In silico identification of specialized secretory-organelle proteins in apicomplexan parasites and in vivo validation in Toxoplasma gondii. PLoS One (2008) 0.95

Loss of genetic redundancy in reductive genome evolution. PLoS Comput Biol (2011) 0.94

Improving protein secondary structure prediction using a simple k-mer model. Bioinformatics (2010) 0.93

Evolutionary insight into the functional amyloids of the pseudomonads. PLoS One (2013) 0.92

TnpPred: A Web Service for the Robust Prediction of Prokaryotic Transposases. Comp Funct Genomics (2012) 0.91

Improving model construction of profile HMMs for remote homology detection through structural alignment. BMC Bioinformatics (2007) 0.90

The effectiveness of position- and composition-specific gap costs for protein similarity searches. Bioinformatics (2008) 0.90

Profile hidden Markov models for the detection of viruses within metagenomic sequence data. PLoS One (2014) 0.89

webPRC: the Profile Comparer for alignment-based searching of public domain databases. Nucleic Acids Res (2009) 0.88

Expression dynamics of metabolic and regulatory components across stages of panicle and seed development in indica rice. Funct Integr Genomics (2012) 0.85

AlignHUSH: alignment of HMMs using structure and hydrophobicity information. BMC Bioinformatics (2011) 0.83

Recursive protein modeling: a divide and conquer strategy for Protein Structure Prediction and its case study in CASP9. J Bioinform Comput Biol (2012) 0.82

PRODOC: a resource for the comparison of tethered protein domain architectures with in-built information on remotely related domain families. Nucleic Acids Res (2005) 0.82

Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: I. Method development. J Comput Aided Mol Des (2009) 0.82

The evolution of human cells in terms of protein innovation. Mol Biol Evol (2014) 0.82

Computational protein design: validation and possible relevance as a tool for homology searching and fold recognition. PLoS One (2010) 0.81

HHsvm: fast and accurate classification of profile-profile matches identified by HHsearch. Bioinformatics (2009) 0.80

A Profile Hidden Markov Model to investigate the distribution and frequency of LanB-encoding lantibiotic modification genes in the human oral and gut microbiome. PeerJ (2017) 0.77

Using phylogeny to improve genome-wide distant homology recognition. PLoS Comput Biol (2006) 0.76

A sequence sub-sampling algorithm increases the power to detect distant homologues. Nucleic Acids Res (2005) 0.75

Accelerating Information Retrieval from Profile Hidden Markov Model Databases. PLoS One (2016) 0.75

EVI1 promotes cell proliferation in HBx-induced hepatocarcinogenesis as a critical transcription factor regulating lncRNAs. Oncotarget (2016) 0.75

Analysis of triglyceride synthesis unveils a green algal soluble diacylglycerol acyltransferase and provides clues to potential enzymatic components of the chloroplast pathway. BMC Genomics (2017) 0.75

Resurrecting the Dead (Molecules). Comput Struct Biotechnol J (2017) 0.75

Articles cited by this

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 665.31

CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res (1994) 392.47

The Protein Data Bank. Nucleic Acids Res (2000) 187.10

SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol (1995) 74.88

Profile hidden Markov models. Bioinformatics (1998) 56.04

The Pfam protein families database. Nucleic Acids Res (2000) 42.28

Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol (1994) 31.57

Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res (2001) 22.33

Hidden Markov models for detecting remote protein homologies. Bioinformatics (1998) 21.29

Determinants of a protein fold. Unique features of the globin amino acid sequences. J Mol Biol (1987) 18.58

Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J Mol Biol (2001) 15.97

Hidden Markov models. Curr Opin Struct Biol (1996) 11.56

The ASTRAL compendium for protein structure and sequence analysis. Nucleic Acids Res (2000) 11.38

Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J Mol Biol (1998) 9.09

A new approach to protein fold recognition. Nature (1992) 8.17

Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc Natl Acad Sci U S A (1998) 7.18

Hidden Markov models for sequence analysis: extension and analysis of the basic method. Comput Appl Biosci (1996) 4.58

Removing near-neighbour redundancy from large protein sequence collections. Bioinformatics (1998) 3.47

Identification of related proteins on family, superfamily and fold level. J Mol Biol (2000) 3.24

Phylogenetic information improves homology detection. Proteins (2001) 1.01

Articles by these authors

InterPro: the integrative protein signature database. Nucleic Acids Res (2008) 25.07

InterPro, progress and status in 2005. Nucleic Acids Res (2005) 17.53

InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res (2011) 13.45

New developments in the InterPro database. Nucleic Acids Res (2007) 12.49

The SUPERFAMILY database in 2004: additions and improvements. Nucleic Acids Res (2004) 6.72

A promoter-level mammalian expression atlas. Nature (2014) 6.25

An atlas of combinatorial transcriptional regulation in mouse and man. Cell (2010) 6.24

The transcriptional network that controls growth arrest and differentiation in a human myeloid leukemia cell line. Nat Genet (2009) 6.02

A large-scale evaluation of computational protein function prediction. Nat Methods (2013) 4.61

The SUPERFAMILY database in 2007: families and functions. Nucleic Acids Res (2006) 4.27

Evolution of the protein repertoire. Science (2003) 4.24

Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models. Hum Mutat (2012) 3.60

SUPERFAMILY 1.75 including a domain-centric gene ontology method. Nucleic Acids Res (2010) 3.34

SUPERFAMILY--sophisticated comparative genomics, data mining, visualization and phylogeny. Nucleic Acids Res (2008) 3.07

Mouse proteome analysis. Genome Res (2003) 2.69

Classification of intrinsically disordered regions and proteins. Chem Rev (2014) 2.48

Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs. PLoS Genet (2006) 2.48

Development and evaluation of an automated annotation pipeline and cDNA annotation system. Genome Res (2003) 2.15

Predicting the functional consequences of cancer-associated amino acid substitutions. Bioinformatics (2013) 2.12

Supra-domains: evolutionary units larger than single protein domains. J Mol Biol (2004) 1.82

Distinguishing protein-coding from non-coding RNAs through support vector machines. PLoS Genet (2006) 1.81

A database of bacterial lipoproteins (DOLOP) with functional assignments to predicted lipoproteins. J Bacteriol (2006) 1.76

D²P²: database of disordered protein predictions. Nucleic Acids Res (2012) 1.75

A phylogenomic profile of globins. BMC Evol Biol (2006) 1.64

Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains. Nucleic Acids Res (2012) 1.63

DcGO: database of domain-centric ontologies on functions, phenotypes, diseases and more. Nucleic Acids Res (2012) 1.47

TreeVector: scalable, interactive, phylogenetic trees for the web. PLoS One (2010) 1.45

Functional map and domain structure of MET, the product of the c-met protooncogene and receptor for hepatocyte growth factor/scatter factor. Proc Natl Acad Sci U S A (2003) 1.42

The evolution and structure prediction of coiled coils across all genomes. J Mol Biol (2010) 1.30

Genomic and structural aspects of protein evolution. Biochem J (2009) 1.25

Three globin lineages belonging to two structural classes in genomes from the three kingdoms of life. Proc Natl Acad Sci U S A (2005) 1.22

A daily-updated tree of (sequenced) life as a reference for genome research. Sci Rep (2013) 1.22

Classification, expression pattern, and E3 ligase activity assay of rice U-box-containing proteins. Mol Plant (2008) 1.07

Are viruses a source of new protein folds for organisms? - Virosphere structure space and evolution. Bioessays (2011) 1.01

Ranking non-synonymous single nucleotide polymorphisms based on disease concepts. Hum Genomics (2014) 0.98

A domain-centric solution to functional genomics via dcGO Predictor. BMC Bioinformatics (2013) 0.95

Improving protein secondary structure prediction using a simple k-mer model. Bioinformatics (2010) 0.93

Evolutionarily consistent families in SCOP: sequence, structure and function. BMC Struct Biol (2012) 0.92

Expression and in silico structural analysis of a rice (Oryza sativa) hemoglobin 5. Plant Physiol Biochem (2008) 0.90

A disease-drug-phenotype matrix inferred by walking on a functional domain network. Mol Biosyst (2013) 0.88

Proteins with class alpha/beta fold have high-level participation in fusion events. J Mol Biol (2002) 0.86

Sequences and topology: intrinsic disorder in the evolving universe of protein structure. Curr Opin Struct Biol (2011) 0.85

Comparison of the small molecule metabolic enzymes of Escherichia coli and Saccharomyces cerevisiae. Genome Res (2002) 0.85

Evolution of eukaryotic genome architecture: Insights from the study of a rapidly evolving metazoan, Oikopleura dioica: Non-adaptive forces such as elevated mutation rates may influence the evolution of genome architecture. Bioessays (2011) 0.81

Sequences and topology: disorder, modularity, and post/pre translation modification. Curr Opin Struct Biol (2013) 0.76

A proteome quality index. Environ Microbiol (2014) 0.75

DGEclust: differential expression analysis of clustered count data. Genome Biol (2015) 0.75