Gene prediction in metagenomic fragments: a large scale machine learning approach.

PubWeight™: 1.64 | Rank: Top 3%

Article: PMC 2409338

Published in BMC Bioinformatics on April 28, 2008

Authors

Katharina J Hoff (1), Maike Tech, Thomas Lingner, Rolf Daniel, Burkhard Morgenstern, Peter Meinicke

Author Affiliations

1: Abteilung Bioinformatik, Georg-August-Universität Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany. katharina@gobics.de

Articles citing this

Unlocking short read sequencing for metagenomics. PLoS One (2010) 4.88

Ab initio gene identification in metagenomic sequences. Nucleic Acids Res (2010) 4.47

A primer on metagenomics. PLoS Comput Biol (2010) 4.40

FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res (2010) 4.16

Analysis and comparison of very large metagenomes with fast clustering and functional annotation. BMC Bioinformatics (2009) 1.90

The effect of sequencing errors on metagenomic gene prediction. BMC Genomics (2009) 1.88

Current opportunities and challenges in microbial metagenome analysis--a bioinformatic perspective. Brief Bioinform (2012) 1.50

Orphelia: predicting genes in metagenomic sequencing reads. Nucleic Acids Res (2009) 1.43

Metagenomics: Facts and Artifacts, and Computational Challenges. J Comput Sci Technol (2009) 1.35

Combining gene prediction methods to improve metagenomic gene annotation. BMC Bioinformatics (2011) 1.06

Functional viral metagenomics and the next generation of molecular tools. Trends Microbiol (2009) 1.06

UFO: a web server for ultra-fast functional profiling of whole genome protein sequences. BMC Genomics (2009) 1.05

CoMet--a web server for comparative functional profiling of metagenomes. Nucleic Acids Res (2011) 1.03

Signal processing for metagenomics: extracting information from the soup. Curr Genomics (2009) 1.02

UProC: tools for ultra-fast protein domain classification. Bioinformatics (2014) 0.98

Short-read reading-frame predictors are not created equal: sequence error causes loss of signal. BMC Bioinformatics (2012) 0.84

Giardia lamblia transcriptome analysis using TSS-Seq and RNA-Seq. PLoS One (2013) 0.81

Gene prediction in metagenomic fragments based on the SVM algorithm. BMC Bioinformatics (2013) 0.80

AlexSys: a knowledge-based expert system for multiple sequence alignment construction and analysis. Nucleic Acids Res (2010) 0.79

MGC: a metagenomic gene caller. BMC Bioinformatics (2013) 0.75

Host-Microbiome Interaction and Cancer: Potential Application in Precision Medicine. Front Physiol (2016) 0.75

Articles cited by this

DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A (1977) 790.54

Basic local alignment search tool. J Mol Biol (1990) 659.07

Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol Rev (1995) 62.96

Improved microbial gene identification with GLIMMER. Nucleic Acids Res (1999) 51.34

Environmental genome shotgun sequencing of the Sargasso Sea. Science (2004) 45.23

Comparative metagenomics of microbial communities. Science (2005) 25.88

GenBank. Nucleic Acids Res (2007) 25.54

GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res (1998) 25.21

A sequencing method based on real-time pyrophosphate. Science (1998) 21.21

Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature (2004) 20.20

The uncultured microbial majority. Annu Rev Microbiol (2003) 17.33

Metagenomics: application of genomics to uncultured microorganisms. Microbiol Mol Biol Rev (2004) 10.67

EcoGene: a genome sequence database for Escherichia coli K-12. Nucleic Acids Res (2000) 9.67

Exploring prokaryotic diversity in the genomic era. Genome Biol (2002) 9.42

Heuristic approach to deriving models for gene finding. Nucleic Acids Res (1999) 8.39

Using pyrosequencing to shed light on deep mine microbial ecology. BMC Genomics (2006) 7.65

MetaGene: prokaryotic gene finding from environmental genome shotgun sequences. Nucleic Acids Res (2006) 6.64

Metagenomics: genomic analysis of microbial communities. Annu Rev Genet (2004) 6.27

A probabilistic method for identifying start codons in bacterial genomes. Bioinformatics (2001) 5.40

Microbial diversity and function in soil: from genes to ecosystems. Curr Opin Microbiol (2002) 5.09

Large-scale prokaryotic gene prediction and comparison to genome annotation. Bioinformatics (2005) 4.25

Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS Comput Biol (2005) 3.39

The metagenomics of soil. Nat Rev Microbiol (2005) 3.24

TICO: a tool for improving predictions of prokaryotic translation initiation sites. Bioinformatics (2005) 3.03

GS-Finder: a program to find bacterial gene start sites with a self-training method. Int J Biochem Cell Biol (2004) 3.00

Finding novel genes in bacterial communities isolated from the environment. Bioinformatics (2006) 2.55

An unsupervised classification scheme for improving predictions of prokaryotic TIS. BMC Bioinformatics (2006) 1.87

Dragon Promoter Finder: recognition of vertebrate RNA polymerase II promoters. Bioinformatics (2002) 1.85

Prospecting for novel biocatalysts in a soil metagenome. Appl Environ Microbiol (2003) 1.62

The soil metagenome--a rich resource for the discovery of novel natural products. Curr Opin Biotechnol (2004) 1.37

Starts of bacterial genes: estimating the reliability of computer predictions. Gene (1999) 1.11

Prospecting for biocatalysts and drugs in the genomes of non-cultured microorganisms. Curr Opin Biotechnol (2004) 1.07

TICO: a tool for postprocessing the predictions of prokaryotic translation initiation sites. Nucleic Acids Res (2006) 0.89

Articles by these authors

The genome of the model beetle and pest Tribolium castaneum. Nature (2008) 6.50

Phylogenomics revives traditional views on deep animal relationships. Curr Biol (2009) 5.89

AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res (2004) 5.53

Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics (2006) 5.23

Comparative analysis of the complete genome sequence of the plant growth-promoting bacterium Bacillus amyloliquefaciens FZB42. Nat Biotechnol (2007) 4.66

AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res (2005) 4.13

AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res (2006) 4.11

Genome sequence analyses of two isolates from the recent Escherichia coli outbreak in Germany reveal the emergence of a new pathotype: Entero-Aggregative-Haemorrhagic Escherichia coli (EAHEC). Arch Microbiol (2011) 3.55

Score-based prediction of genomic islands in prokaryotic genomes using hidden Markov models. BMC Bioinformatics (2006) 3.15

TICO: a tool for improving predictions of prokaryotic translation initiation sites. Bioinformatics (2005) 3.03

Fast and sensitive multiple alignment of large genomic sequences. BMC Bioinformatics (2003) 2.94

Fast and sensitive alignment of large genomic sequences. Proc IEEE Comput Soc Bioinform Conf (2002) 2.84

Pyrosequencing-based assessment of bacterial community structure along different management types in German forest and grassland soils. PLoS One (2011) 2.69

Metabolic priming by a secreted fungal effector. Nature (2011) 2.56

AUGUSTUS at EGASP: using EST, protein and genomic alignments for improved gene prediction in the human genome. Genome Biol (2006) 2.39

A jumping profile Hidden Markov Model and applications to recombination sites in HIV and HCV genomes. BMC Bioinformatics (2006) 2.25

Metagenomic analyses: past and future trends. Appl Environ Microbiol (2010) 2.21

Horizon-specific bacterial community composition of German grassland soils, as revealed by pyrosequencing-based analysis of 16S rRNA genes. Appl Environ Microbiol (2010) 2.18

BCI Competition 2003--Data set IIb: support vector machines for the P300 speller paradigm. IEEE Trans Biomed Eng (2004) 2.14

DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment. Algorithms Mol Biol (2008) 2.07

Rhizobium sp. strain NGR234 possesses a remarkable number of secretion systems. Appl Environ Microbiol (2009) 2.07

The complete genome sequence of the algal symbiont Dinoroseobacter shibae: a hitchhiker's guide to life in the sea. ISME J (2009) 2.00

The CHAOS/DIALIGN WWW server for multiple alignment of genomic sequences. Nucleic Acids Res (2004) 1.88

DIALIGN-T: an improved algorithm for segment-based multiple sequence alignment. BMC Bioinformatics (2005) 1.87

An unsupervised classification scheme for improving predictions of prokaryotic TIS. BMC Bioinformatics (2006) 1.87

Is autoinducer-2 a universal signal for interspecies communication: a comparative genomic and phylogenetic analysis of the synthesis and signal transduction pathways. BMC Evol Biol (2004) 1.86

Advances in recovery of novel biocatalysts from metagenomes. J Mol Microbiol Biotechnol (2008) 1.78

Phylogenetic diversity and metabolic potential revealed in a glacier ice metagenome. Appl Environ Microbiol (2009) 1.76

The role of recombination in the emergence of a complex and dynamic HIV epidemic. Retrovirology (2010) 1.69

jpHMM: improving the reliability of recombination prediction in HIV-1. Nucleic Acids Res (2009) 1.61

Mixture models for analysis of the taxonomic composition of metagenomes. Bioinformatics (2011) 1.59

jpHMM at GOBICS: a web server to detect genomic recombinations in HIV-1. Nucleic Acids Res (2006) 1.58

An ancient pathway combining carbon dioxide fixation with the generation and utilization of a sodium ion gradient for ATP synthesis. PLoS One (2012) 1.57

ICEPmu1, an integrative conjugative element (ICE) of Pasteurella multocida: analysis of the regions that comprise 12 antimicrobial resistance genes. J Antimicrob Chemother (2011) 1.54

Host imprints on bacterial genomes--rapid, divergent evolution in individual patients. PLoS Pathog (2010) 1.54

Comparative genomics and transcriptomics of Propionibacterium acnes. PLoS One (2011) 1.52

Treephyler: fast taxonomic profiling of metagenomes. Bioinformatics (2010) 1.50

Orphelia: predicting genes in metagenomic sequencing reads. Nucleic Acids Res (2009) 1.43

Reassessment of the Listeria monocytogenes pan-genome reveals dynamic integration hotspots and mobile genetic elements as major components of the accessory genome. BMC Genomics (2013) 1.43

The genome of the ammonia-oxidizing Candidatus Nitrososphaera gargensis: insights into metabolic versatility and environmental adaptations. Environ Microbiol (2012) 1.42

Oligo kernels for datamining on biological sequences: a case study on prokaryotic translation initiation sites. BMC Bioinformatics (2004) 1.42

First Insights into the Genome of the Gram-Negative, Endospore-Forming Organism Sporomusa ovata Strain H1 DSM 2662. Genome Announc (2013) 1.40

Interannual variation in land-use intensity enhances grassland multidiversity. Proc Natl Acad Sci U S A (2013) 1.40

Achievements and new knowledge unraveled by metagenomic approaches. Appl Microbiol Biotechnol (2009) 1.39

Sequence of the hyperplastic genome of the naturally competent Thermus scotoductus SA-01. BMC Genomics (2011) 1.33

Phaeobacter gallaeciensis genomes from globally opposite locations reveal high similarity of adaptation to surface life. ISME J (2012) 1.31

OrthoSelect: a protocol for selecting orthologous groups in phylogenomics. BMC Bioinformatics (2009) 1.29

Complete genome sequence of the type strain Cupriavidus necator N-1. J Bacteriol (2011) 1.27

Phylogenetic analysis of a microbialite-forming microbial mat from a hypersaline lake of the Kiritimati atoll, Central Pacific. PLoS One (2013) 1.27

AGenDA: gene prediction by comparative sequence analysis. In Silico Biol (2002) 1.26

General relationships between abiotic soil properties and soil biota across spatial scales and different land-use types. PLoS One (2012) 1.25

Remote homology detection based on oligomer distances. Bioinformatics (2006) 1.24

Comparative analysis of plasmids in the genus Listeria. PLoS One (2010) 1.22

The Janthinobacterium sp. HH01 genome encodes a homologue of the V. cholerae CqsA and L. pneumophila LqsA autoinducer synthases. PLoS One (2013) 1.20

Sequencing, annotation, and comparative genome analysis of the gerbil-adapted Helicobacter pylori strain B8. BMC Genomics (2010) 1.18

Involvement of two latex-clearing proteins during rubber degradation and insights into the subsequent degradation pathway revealed by the genome sequence of Gordonia polyisoprenivorans strain VH2. Appl Environ Microbiol (2012) 1.15

Rapid identification of genes encoding DNA polymerases by function-based screening of metagenomic libraries derived from glacial ice. Appl Environ Microbiol (2009) 1.14

Construction and screening of metagenomic libraries derived from enrichment cultures: generation of a gene bank for genes conferring alcohol oxidoreductase activity on Escherichia coli. Appl Environ Microbiol (2003) 1.13

Complete Genome Sequence of Mannheimia haemolytica Strain 42548 from a Case of Bovine Respiratory Disease. Genome Announc (2013) 1.13

ICEPmu1, an integrative conjugative element (ICE) of Pasteurella multocida: structure and transfer. J Antimicrob Chemother (2011) 1.13

Characterization and optimization of Bacillus subtilis ATCC 6051 as an expression host. J Biotechnol (2012) 1.12

Identification and characterization of coenzyme B12-dependent glycerol dehydratase- and diol dehydratase-encoding genes from metagenomic DNA libraries derived from enrichment cultures. Appl Environ Microbiol (2003) 1.09

Identification of novel plant peroxisomal targeting signals by a combination of machine learning methods and in vivo subcellular targeting analyses. Plant Cell (2011) 1.09

Impact of a phytoplankton bloom on the diversity of the active bacterial community in the southern North Sea as revealed by metatranscriptomic approaches. FEMS Microbiol Ecol (2013) 1.08

AGenDA: homology-based gene prediction. Bioinformatics (2003) 1.08

Poles apart: Arctic and Antarctic Octadecabacter strains share high genome plasticity and a new type of xanthorhodopsin. PLoS One (2013) 1.08

Genome sequence of Brevibacillus laterosporus LMG 15441, a pathogen of invertebrates. J Bacteriol (2011) 1.07

Prospecting for biocatalysts and drugs in the genomes of non-cultured microorganisms. Curr Opin Biotechnol (2004) 1.07

MolabIS--an integrated information system for storing and managing molecular genetics data. BMC Bioinformatics (2011) 1.05

Genome sequence of Paenibacillus alvei DSM 29, a secondary invader during European foulbrood outbreaks. J Bacteriol (2012) 1.05

Enrichment of chitinolytic microorganisms: isolation and characterization of a chitinase exhibiting antifungal activity against phytopathogenic fungi from a novel Streptomyces strain. Appl Microbiol Biotechnol (2004) 1.03

CoMet--a web server for comparative functional profiling of metagenomes. Nucleic Acids Res (2011) 1.03

Insights into the genome of the enteric bacterium Escherichia blattae: cobalamin (B12) biosynthesis, B12-dependent reactions, and inactivation of the gene region encoding B12-dependent glycerol dehydratase by a new mu-like prophage. J Mol Microbiol Biotechnol (2004) 1.02

Microbial diversity and biochemical potential encoded by thermal spring metagenomes derived from the Kamchatka Peninsula. Archaea (2013) 1.01

AGenDA: gene prediction by cross-species sequence comparison. Nucleic Acids Res (2004) 1.00

Complete genome sequence and metabolic potential of the quinaldine-degrading bacterium Arthrobacter sp. Rue61a. BMC Genomics (2012) 1.00

New mode of energy metabolism in the seventh order of methanogens as revealed by comparative genome analysis of “Candidatus Methanoplasma termitum”. Appl Environ Microbiol (2015) 1.00

A novel metagenomic short-chain dehydrogenase/reductase attenuates Pseudomonas aeruginosa biofilm formation and virulence on Caenorhabditis elegans. PLoS One (2011) 1.00

jpHMM: recombination analysis in viruses with circular genomes such as the hepatitis B virus. Nucleic Acids Res (2012) 0.99

Physiological homogeneity among the endosymbionts of Riftia pachyptila and Tevnia jerichonana revealed by proteogenomics. ISME J (2011) 0.99

Verticillium transcription activator of adhesion Vta2 suppresses microsclerotia formation and is required for systemic infection of plant roots. New Phytol (2014) 0.98

Comparative genome analysis and genome-guided physiological analysis of Roseobacter litoralis. BMC Genomics (2011) 0.98

Word correlation matrices for protein sequence analysis and remote homology detection. BMC Bioinformatics (2008) 0.97

MarVis: a tool for clustering and visualization of metabolic biomarkers. BMC Bioinformatics (2009) 0.97

Metagenomes of complex microbial consortia derived from different soils as sources for novel genes conferring formation of carbonyls from short-chain polyols on Escherichia coli. J Mol Microbiol Biotechnol (2003) 0.97

Divide-and-conquer multiple alignment with segment-based constraints. Bioinformatics (2003) 0.96

The COP9 signalosome mediates transcriptional and metabolic response to hormones, oxidative stress protection and cell wall rearrangement during fungal development. Mol Microbiol (2010) 0.96

Isolation and characterization of metalloproteases with a novel domain structure by construction and screening of metagenomic libraries. Appl Environ Microbiol (2009) 0.95

The purine-utilizing bacterium Clostridium acidurici 9a: a genome-guided metabolic reconsideration. PLoS One (2012) 0.95

High abundance of heterotrophic prokaryotes in hydrothermal springs of the Azores as revealed by a network of 16S rRNA gene-based methods. Extremophiles (2013) 0.95

Genome-guided analysis of physiological and morphological traits of the fermentative acetate oxidizer Thermacetogenium phaeum. BMC Genomics (2012) 0.94

Complete genome sequence of the broad-host-range strain Sinorhizobium fredii USDA257. J Bacteriol (2012) 0.94