MetaSim: a sequencing simulator for genomics and metagenomics.

PubWeight™: 6.54 | Rank: Top 1%

Article: PMC 2556396

Published in PLoS One on October 08, 2008

Authors

Daniel C Richter (1), Felix Ott, Alexander F Auch, Ramona Schmid, Daniel H Huson

Author Affiliations

1: ZBIT - Center for Bioinformatics Tübingen, University of Tübingen, Tübingen, Germany. drichter@informatik.uni-tuebingen.de

Articles citing this

(truncated to the top 100)

Assemblathon 1: a competitive assessment of de novo short read assembly methods. Genome Res (2011) 8.38

BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons. BMC Genomics (2011) 7.27

Analysis of the bread wheat genome using whole-genome shotgun sequencing. Nature (2012) 5.67

ART: a next-generation sequencing read simulator. Bioinformatics (2011) 5.13

Modelling and simulating generic RNA-Seq experiments with the flux simulator. Nucleic Acids Res (2012) 4.81

A primer on metagenomics. PLoS Comput Biol (2010) 4.40

FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res (2010) 4.16

Characteristics of 454 pyrosequencing data--enabling realistic simulation with flowsim. Bioinformatics (2010) 4.08

Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. J Comput Biol (2011) 2.85

Copy number variation detection in whole-genome sequencing data using the Bayesian information criterion. Proc Natl Acad Sci U S A (2011) 2.72

PhylOTU: a high-throughput procedure quantifies microbial community diversity and resolves novel taxa from metagenomic data. PLoS Comput Biol (2011) 2.54

LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Res (2012) 2.46

Bambus 2: scaffolding metagenomes. Bioinformatics (2011) 2.24

GemSIM: general, error-model based simulator of next-generation sequencing data. BMC Genomics (2012) 2.21

Grinder: a versatile amplicon and shotgun sequence simulator. Nucleic Acids Res (2012) 2.10

Analysis and comparison of very large metagenomes with fast clustering and functional annotation. BMC Bioinformatics (2009) 1.90

The effect of sequencing errors on metagenomic gene prediction. BMC Genomics (2009) 1.88

Classification of DNA sequences using Bloom filters. Bioinformatics (2010) 1.87

Computational meta'omics for microbial community studies. Mol Syst Biol (2013) 1.84

Feature-by-feature--evaluating de novo sequence assembly. PLoS One (2012) 1.83

Taxonomic classification of metagenomic shotgun sequences with CARMA3. Nucleic Acids Res (2011) 1.81

Inference of isoforms from short sequence reads. J Comput Biol (2011) 1.72

Scalable metagenomic taxonomy classification using a reference genome database. Bioinformatics (2013) 1.64

MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. Microbiome (2014) 1.63

Accurate genome relative abundance estimation based on shotgun metagenomic reads. PLoS One (2011) 1.62

Classifying short genomic fragments from novel lineages using composition and homology. BMC Bioinformatics (2011) 1.60

Assessment of metagenomic assembly using simulated next generation sequencing data. PLoS One (2012) 1.57

Orphelia: predicting genes in metagenomic sequencing reads. Nucleic Acids Res (2009) 1.43

KISSPLICE: de-novo calling alternative splicing events from RNA-seq data. BMC Bioinformatics (2012) 1.42

Evaluating the fidelity of de novo short read metagenomic assembly using simulated data. PLoS One (2011) 1.39

Achievements and new knowledge unraveled by metagenomic approaches. Appl Microbiol Biotechnol (2009) 1.39

Evaluating characteristics of de novo assembly software on 454 transcriptome data: a simulation approach. PLoS One (2012) 1.37

Why barcode? High-throughput multiplex sequencing of mitochondrial genomes for molecular systematics. Nucleic Acids Res (2010) 1.34

MetaPathways: a modular pipeline for constructing pathway/genome databases from environmental sequence information. BMC Bioinformatics (2013) 1.28

Metatranscriptomic analyses of chlorophototrophs of a hot-spring microbial mat. ISME J (2011) 1.28

Alignment-free sequence comparison based on next-generation sequencing reads. J Comput Biol (2013) 1.26

A comparison of methods for clustering 16S rRNA sequences into OTUs. PLoS One (2013) 1.15

Functional assignment of metagenomic data: challenges and applications. Brief Bioinform (2012) 1.13

Metagenomics: tools and insights for analyzing next-generation sequencing data derived from biodiversity studies. Bioinform Biol Insights (2015) 1.13

Sigma: strain-level inference of genomes from metagenomic analysis for biosurveillance. Bioinformatics (2014) 1.12

On optimal pooling designs to identify rare variants through massive resequencing. Genet Epidemiol (2011) 1.12

Comparison of metagenomic samples using sequence signatures. BMC Genomics (2012) 1.10

Species identification and profiling of complex microbial communities using shotgun Illumina sequencing of 16S rRNA amplicon sequences. PLoS One (2013) 1.09

Evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities. BMC Bioinformatics (2015) 1.09

Short clones or long clones? A simulation study on the use of paired reads in metagenomics. BMC Bioinformatics (2010) 1.07

Comparison of different assembly and annotation tools on analysis of simulated viral metagenomic communities in the gut. BMC Genomics (2014) 1.07

DiScRIBinATE: a rapid method for accurate taxonomic classification of metagenomic sequences. BMC Bioinformatics (2010) 1.06

An efficient simulator of 454 data using configurable statistical models. BMC Res Notes (2011) 1.06

Evaluation of short read metagenomic assembly. BMC Genomics (2011) 1.03

Fast and accurate taxonomic assignments of metagenomic sequences using MetaBin. PLoS One (2012) 1.02

Signal processing for metagenomics: extracting information from the soup. Curr Genomics (2009) 1.02

Genometa--a fast and accurate classifier for short metagenomic shotgun reads. PLoS One (2012) 1.02

Separating metagenomic short reads into genomes via clustering. Algorithms Mol Biol (2012) 1.02

Bioinformatic approaches for functional annotation and pathway inference in metagenomics data. Brief Bioinform (2012) 1.02

Novel analysis of oceanic surface water metagenomes suggests importance of polyphosphate metabolism in oligotrophic environments. PLoS One (2011) 1.01

Full-length haplotype reconstruction to infer the structure of heterogeneous virus populations. Nucleic Acids Res (2014) 1.00

Reconstructing the genomic content of microbiome taxa through shotgun metagenomic deconvolution. PLoS Comput Biol (2013) 1.00

LOCAS--a low coverage assembly tool for resequencing projects. PLoS One (2011) 0.99

WGSQuikr: fast whole-genome shotgun metagenomic classification. PLoS One (2014) 0.97

Estimating DNA coverage and abundance in metagenomes using a gamma approximation. Bioinformatics (2009) 0.97

A non-negative matrix factorization framework for identifying modular patterns in metagenomic profile data. J Math Biol (2011) 0.97

Metagenomic survey for viruses in Western Arctic caribou, Alaska, through iterative assembly of taxonomic units. PLoS One (2014) 0.96

INDUS - a composition-based approach for rapid and accurate taxonomic classification of metagenomic sequences. BMC Genomics (2011) 0.95

Metabolic pathways for the whole community. BMC Genomics (2014) 0.94

Biases and errors on allele frequency estimation and disease association tests of next-generation sequencing of pooled samples. Genet Epidemiol (2012) 0.93

Rapid phylogenetic and functional classification of short genomic fragments with signature peptides. BMC Res Notes (2012) 0.93

Joint analysis of multiple metagenomic samples. PLoS Comput Biol (2012) 0.93

NeSSM: a Next-generation Sequencing Simulator for Metagenomics. PLoS One (2013) 0.93

Best practices for evaluating single nucleotide variant calling methods for microbial genomics. Front Genet (2015) 0.93

Protein signature-based estimation of metagenomic abundances including all domains of life and viruses. Bioinformatics (2013) 0.92

Flexible taxonomic assignment of ambiguous sequencing reads. BMC Bioinformatics (2011) 0.91

SInC: an accurate and fast error-model based simulator for SNPs, Indels and CNVs coupled with a read generator for short-read sequence data. BMC Bioinformatics (2014) 0.90

Exploiting topic modeling to boost metagenomic reads binning. BMC Bioinformatics (2015) 0.89

ProViDE: A software tool for accurate estimation of viral diversity in metagenomic samples. Bioinformation (2011) 0.88

Consistency of metagenomic assignment programs in simulated and real data. BMC Bioinformatics (2014) 0.86

A de Bruijn graph approach to the quantification of closely-related genomes in a microbial community. J Comput Biol (2012) 0.86

XS: a FASTQ read simulator. BMC Res Notes (2014) 0.86

Assembly of non-unique insertion content using next-generation sequencing. BMC Bioinformatics (2011) 0.86

Genomic and metabolic comparison with Dickeya dadantii 3937 reveals the emerging Dickeya solani potato pathogen to display distinctive metabolic activities and T5SS/T6SS-related toxin repertoire. BMC Genomics (2014) 0.86

FASTQSim: platform-independent data characterization and in silico read generation for NGS datasets. BMC Res Notes (2014) 0.85

Selection in coastal Synechococcus (cyanobacteria) populations evaluated from environmental metagenomes. PLoS One (2011) 0.85

MBBC: an efficient approach for metagenomic binning based on clustering. BMC Bioinformatics (2015) 0.85

Evaluation of viral genome assembly and diversity estimation in deep metagenomes. BMC Genomics (2014) 0.84

Oral spirochetes implicated in dental diseases are widespread in normal human subjects and carry extremely diverse integron gene cassettes. Appl Environ Microbiol (2012) 0.84

An extended genovo metagenomic assembler by incorporating paired-end information. PeerJ (2013) 0.84

Short-read reading-frame predictors are not created equal: sequence error causes loss of signal. BMC Bioinformatics (2012) 0.84

Exploration and retrieval of whole-metagenome sequencing samples. Bioinformatics (2014) 0.84

Alignment-free supervised classification of metagenomes by recursive SVM. BMC Genomics (2013) 0.84

Analysing complex Triticeae genomes - concepts and strategies. Plant Methods (2013) 0.84

Accurate genome relative abundance estimation for closely related species in a metagenomic sample. BMC Bioinformatics (2014) 0.83

Stitching gene fragments with a network matching algorithm improves gene assembly for metagenomics. Bioinformatics (2012) 0.83

A statistical framework for accurate taxonomic assignment of metagenomic sequencing reads. PLoS One (2012) 0.83

TIPP: taxonomic identification and phylogenetic profiling. Bioinformatics (2014) 0.82

A two-phase binning algorithm using l-mer frequency on groups of non-overlapping reads. Algorithms Mol Biol (2015) 0.82

SPA: a short peptide assembler for metagenomic data. Nucleic Acids Res (2013) 0.82

A new strategy for better genome assembly from very short reads. BMC Bioinformatics (2011) 0.82

Simulating a population genomics data set using FlowSim. BMC Res Notes (2014) 0.82

A better sequence-read simulator program for metagenomics. BMC Bioinformatics (2014) 0.82

On the power and the systematic biases of the detection of chromosomal inversions by paired-end genome sequencing. PLoS One (2013) 0.81

metaBEETL: high-throughput analysis of heterogeneous microbial populations from shotgun DNA sequences. BMC Bioinformatics (2013) 0.81

Articles cited by this

Genome sequencing in microfabricated high-density picolitre reactors. Nature (2005) 150.21

KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res (2000) 117.00

A genomic perspective on protein families. Science (1997) 50.51

Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol (1985) 47.78

Environmental genome shotgun sequencing of the Sargasso Sea. Science (2004) 45.23

An obesity-associated gut microbiome with increased capacity for energy harvest. Nature (2006) 44.35

Whole-genome re-sequencing. Curr Opin Genet Dev (2006) 35.24

The Pfam protein families database. Nucleic Acids Res (2007) 30.53

Metagenomic analysis of the human distal gut microbiome. Science (2006) 29.76

The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res (2005) 29.61

Comparative metagenomics of microbial communities. Science (2005) 25.88

MEGAN analysis of metagenomic data. Genome Res (2007) 25.29

The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol (2007) 23.58

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2007) 22.53

Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature (2004) 20.20

GenBank. Nucleic Acids Res (2005) 19.25

Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput Appl Biosci (1997) 17.32

The TIGRFAMs database of protein families. Nucleic Acids Res (2003) 13.59

STRING: known and predicted protein-protein associations, integrated and transferred across organisms. Nucleic Acids Res (2005) 10.44

Accurate phylogenetic classification of variable-length DNA fragments. Nat Methods (2006) 9.93

Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA. Science (2005) 8.14

Use of simulated data sets to evaluate the fidelity of metagenomic processing methods. Nat Methods (2007) 7.73

Genomes OnLine Database (GOLD): a monitor of genome projects world-wide. Nucleic Acids Res (2001) 7.11

Dinucleotide relative abundance extremes: a genomic signature. Trends Genet (1995) 6.32

An application of statistics to comparative metagenomics. BMC Bioinformatics (2006) 6.29

Phylogenetic classification of short environmental DNA fragments. Nucleic Acids Res (2008) 5.12

TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences. BMC Bioinformatics (2004) 4.67

Genomic signature: characterization and classification of species assessed by chaos game representation of sequences. Mol Biol Evol (1999) 3.33

Automation for genomics, part one: preparation for sequencing. Genome Res (2000) 2.21

Automation for genomics, part two: sequencers, microarrays, and future trends. Genome Res (2000) 1.96

A dataset generator for whole genome shotgun sequencing. Proc Int Conf Intell Syst Mol Biol (1999) 1.32

GenFrag 2.1: new features for more robust fragment assembly benchmarks. Comput Appl Biosci (1994) 1.29

Interpreting the unculturable majority. Nat Methods (2007) 0.95

Articles by these authors

MEGAN analysis of metagenomic data. Genome Res (2007) 25.29

Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana. Science (2007) 9.85

A comparison of whole-genome shotgun-derived mouse chromosome 16 and the human genome. Science (2002) 9.59

Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA. Science (2005) 8.14

Genome sequence-based species delimitation with confidence intervals and improved distance functions. BMC Bioinformatics (2013) 6.62

Whole-genome sequencing of multiple Arabidopsis thaliana populations. Nat Genet (2011) 6.59

Digital DNA-DNA hybridization for microbial species delineation by means of genome-to-genome sequence comparison. Stand Genomic Sci (2010) 6.39

Whole-genome shotgun assembly and comparison of human genome assemblies. Proc Natl Acad Sci U S A (2004) 6.08

Simultaneous assessment of soil microbial community structure and function through analysis of the meta-transcriptome. PLoS One (2008) 4.54

Specificity prediction of adenylation domains in nonribosomal peptide synthetases (NRPS) using transductive support vector machines (TSVMs). Nucleic Acids Res (2005) 4.15

OSLay: optimal syntenic layout of unfinished assemblies. Bioinformatics (2007) 4.10

Standard operating procedure for calculating genome-to-genome distances based on high-scoring segment pairs. Stand Genomic Sci (2010) 3.58

Reference-guided assembly of four diverse Arabidopsis thaliana genomes. Proc Natl Acad Sci U S A (2011) 2.67

The mitochondrial genome sequence of the Tasmanian tiger (Thylacinus cynocephalus). Genome Res (2009) 2.30

Whole-genome prokaryotic phylogeny. Bioinformatics (2004) 2.26

Orchestration of the floral transition and floral development in Arabidopsis by the bifunctional transcription factor APETALA2. Plant Cell (2010) 2.10

Phylogenetic analysis of condensation domains in NRPS sheds light on their functional evolution. BMC Evol Biol (2007) 1.97

Visual and statistical comparison of metagenomes. Bioinformatics (2009) 1.94

Transposable elements and small RNAs contribute to gene expression divergence between Arabidopsis thaliana and Arabidopsis lyrata. Proc Natl Acad Sci U S A (2011) 1.82

Temperature-dependent regulation of flowering by antagonistic FLM variants. Nature (2013) 1.66

Is it worth paying more for emergency hormonal contraception? The cost-effectiveness of ulipristal acetate versus levonorgestrel 1.5 mg. J Fam Plann Reprod Health Care (2010) 1.55

Evolution of Arabidopsis thaliana microRNAs from random sequences. RNA (2008) 1.50

Functional analysis of metagenomes and metatranscriptomes using SEED and KEGG. BMC Bioinformatics (2011) 1.48

Methods for comparative metagenomics. BMC Bioinformatics (2009) 1.45

Fast-forward genetics identifies plant CPL phosphatases as regulators of miRNA processing factor HYL1. Cell (2012) 1.42

Comparative analysis of four Campylobacterales. Nat Rev Microbiol (2004) 1.41

Identification of plant microRNA homologs. Bioinformatics (2005) 1.35

Constructing splits graphs. IEEE/ACM Trans Comput Biol Bioinform (2006) 1.33

Characterization of SOC1's central role in flowering by the identification of its upstream and downstream regulators. Plant Physiol (2012) 1.32

Drawing explicit phylogenetic networks and their integration into SplitsTree. BMC Evol Biol (2008) 1.30

CREST--classification resources for environmental sequence tags. PLoS One (2012) 1.28

COPYCAT: cophylogenetic analysis tool. Bioinformatics (2007) 1.25

Comparison of multiple metagenomes using phylogenetic networks based on ecological indices. ISME J (2010) 1.23

Prediction of regulatory interactions from genome sequences using a biophysical model for the Arabidopsis LEAFY transcription factor. Plant Cell (2011) 1.20

Genome-wide binding-site analysis of REVOLUTA reveals a link between leaf patterning and light-mediated growth responses. Plant J (2012) 1.16

A Clustering Optimization Strategy for Molecular Taxonomy Applied to Planktonic Foraminifera SSU rDNA. Evol Bioinform Online (2010) 1.12

Short clones or long clones? A simulation study on the use of paired reads in metagenomics. BMC Bioinformatics (2010) 1.07

Analysis of 16S rRNA environmental sequences using MEGAN. BMC Genomics (2011) 1.03

Fast computation of minimum hybridization networks. Bioinformatics (2011) 1.02

LOCAS--a low coverage assembly tool for resequencing projects. PLoS One (2011) 0.99

Comparative analysis of non-autonomous effects of tasiRNAs and miRNAs in Arabidopsis thaliana. Nucleic Acids Res (2010) 0.96

A comparison of RNA-seq and exon arrays for whole genome transcription profiling of the L5 spinal nerve transection model of neuropathic pain in the rat. Mol Pain (2014) 0.95

Tanglegrams for rooted phylogenetic trees and networks. Bioinformatics (2011) 0.94

Natural variation in biogenesis efficiency of individual Arabidopsis thaliana microRNAs. Curr Biol (2011) 0.92

AxPcoords & parallel AxParafit: statistical co-phylogenetic analyses on thousands of taxa. BMC Bioinformatics (2007) 0.91

Genome-wide identification of KANADI1 target genes. PLoS One (2013) 0.88

Analysis of the intestinal microbiota using SOLiD 16S rRNA gene sequencing and SOLiD shotgun sequencing. BMC Genomics (2013) 0.88

Identifying a species tree subject to random lateral gene transfer. J Theor Biol (2013) 0.84

Leucine-rich repeat kinase 2 modulates retinoic acid-induced neuronal differentiation of murine embryonic stem cells. PLoS One (2011) 0.83

Improved layout of phylogenetic networks. IEEE/ACM Trans Comput Biol Bioinform (2008) 0.82

Filtered Z-closure supernetworks for extracting and visualizing recurrent signal from incongruent gene trees. Syst Biol (2008) 0.82

Phenocopy--a strategy to qualify chemical compounds during hit-to-lead and/or lead optimization. PLoS One (2010) 0.79

VisRD--visual recombination detection. Bioinformatics (2004) 0.78

CrossLink: visualization and exploration of sequence relationships between (micro) RNAs. Nucleic Acids Res (2006) 0.75