Pebble and rock band: heuristic resolution of repeats and scaffolding in the Velvet short-read de novo assembler.

PubWeight™: 4.60 | Rank: Top 1%

Article: PMC 2793427

Published in PLoS One on December 22, 2009

Authors

Daniel R Zerbino (1), Gayle K McEwen, Elliott H Margulies, Ewan Birney

Author Affiliations

1: European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. zerbino@ebi.ac.uk

Articles citing this

Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics (2012) 9.68

Assembly algorithms for next-generation sequencing data. Genomics (2010) 8.56

Assembly of large genomes using second-generation sequencing. Genome Res (2010) 5.94

De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat Genet (2012) 5.61

Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nat Rev Genet (2011) 5.58

The long-term stability of the human gut microbiota. Science (2013) 5.15

Using the Velvet de novo assembler for short-read sequencing technologies. Curr Protoc Bioinformatics (2010) 4.88

Population genomics of early events in the ecological differentiation of bacteria. Science (2012) 4.67

MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads. Nucleic Acids Res (2012) 3.52

Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. J Comput Biol (2011) 2.85

A practical comparison of de novo genome assembly software tools for next-generation sequencing technologies. PLoS One (2011) 2.55

Meta-IDBA: a de Novo assembler for metagenomic data. Bioinformatics (2011) 2.46

Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly. Bioinformatics (2012) 2.46

Comparing de novo genome assembly: the long and short of it. PLoS One (2011) 2.37

Bambus 2: scaffolding metagenomes. Bioinformatics (2011) 2.24

Assembling genomes using short-read sequencing technology. Genome Biol (2010) 1.95

Draft genome sequence and genetic transformation of the oleaginous alga Nannochloropis gaditana. Nat Commun (2012) 1.83

DNA phosphorothioation is widespread and quantized in bacterial genomes. Proc Natl Acad Sci U S A (2011) 1.69

ESRRA-C11orf20 is a recurrent gene fusion in serous ovarian carcinoma. PLoS Biol (2011) 1.53

Genome-wide analysis of HPV integration in human cancers reveals recurrent, focal genomic instability. Genome Res (2013) 1.47

Genome characterisation of the genus Francisella reveals insight into similar evolutionary paths in pathogens of mammals and fish. BMC Genomics (2012) 1.45

Meraculous: de novo genome assembly with short paired-end reads. PLoS One (2011) 1.37

Draft genome sequence of Pantoea ananatis B1-9, a nonpathogenic plant growth-promoting bacterium. J Bacteriol (2012) 1.36

Next-generation sequencing and large genome assemblies. Pharmacogenomics (2012) 1.27

Sequencing platform and library preparation choices impact viral metagenomes. BMC Genomics (2013) 1.24

DDBJ read annotation pipeline: a cloud computing-based pipeline for high-throughput analysis of next-generation sequencing data. DNA Res (2013) 1.24

Comprehensive variation discovery in single human genomes. Nat Genet (2014) 1.18

Assessing the benefits of using mate-pairs to resolve repeats in de novo short-read prokaryotic assemblies. BMC Bioinformatics (2011) 1.13

Effects of GC bias in next-generation-sequencing data on de novo genome assembly. PLoS One (2013) 1.13

Beginner's guide to comparative bacterial genome analysis using next-generation sequence data. Microb Inform Exp (2013) 1.08

Parallelized short read assembly of large genomes using de Bruijn graphs. BMC Bioinformatics (2011) 1.08

Next-generation sequence assembly: four stages of data processing and computational challenges. PLoS Comput Biol (2013) 1.07

De novo assembly of the perennial ryegrass transcriptome using an RNA-Seq strategy. PLoS One (2014) 1.05

Transcriptome-scale homoeolog-specific transcript assemblies of bread wheat. BMC Genomics (2012) 1.05

Whole genome sequencing of peach (Prunus persica L.) for SNP identification and selection. BMC Genomics (2011) 1.04

Bridger: a new framework for de novo transcriptome assembly using RNA-seq data. Genome Biol (2015) 1.03

Simultaneous genome sequencing of symbionts and their hosts. Symbiosis (2012) 0.99

Genome scale evolution of myxoma virus reveals host-pathogen adaptation and rapid geographic spread. J Virol (2013) 0.96

Performance comparison of second- and third-generation sequencers using a bacterial genome with two chromosomes. BMC Genomics (2014) 0.95

Transcriptome of the Lymantria dispar (gypsy moth) larval midgut in response to infection by Bacillus thuringiensis. PLoS One (2013) 0.95

Arabidopsis MSH1 mutation alters the epigenome and produces heritable changes in plant growth. Nat Commun (2015) 0.94

Whole-genome sequencing in bacteriology: state of the art. Infect Drug Resist (2013) 0.94

RNA-Seq reveals complex genetic response to Deepwater Horizon oil release in Fundulus grandis. BMC Genomics (2012) 0.93

Effects of short read quality and quantity on a de novo vertebrate transcriptome assembly. Comp Biochem Physiol C Toxicol Pharmacol (2011) 0.89

Deep sequencing of mixed total DNA without barcodes allows efficient assembly of highly plastic ascidian mitochondrial genomes. Genome Biol Evol (2013) 0.89

A genome-wide association study identifies genomic regions for virulence in the non-model organism Heterobasidion annosum s.s. PLoS One (2013) 0.86

Role of Fig1, a component of the low-affinity calcium uptake system, in growth and sexual development of filamentous fungi. Eukaryot Cell (2012) 0.85

Comparative analysis of the complete genome sequence of the California MSW strain of myxoma virus reveals potential host adaptations. J Virol (2013) 0.85

MetaVelvet-SL: an extension of the Velvet assembler to a de novo metagenomic assembler utilizing supervised learning. DNA Res (2014) 0.84

An extended genovo metagenomic assembler by incorporating paired-end information. PeerJ (2013) 0.84

Redundans: an assembly pipeline for highly heterozygous genomes. Nucleic Acids Res (2016) 0.83

Draft genome sequences of the Pseudomonas fluorescens biocontrol strains Wayne1R and Wood1R. J Bacteriol (2012) 0.83

Genomic and transcriptomic analyses of the facultative methanotroph Methylocystis sp. strain SB2 grown on methane or ethanol. Appl Environ Microbiol (2014) 0.81

Whole genome analysis of a community-associated methicillin-resistant Staphylococcus aureus ST59 isolate from a case of human sepsis and severe pneumonia in China. PLoS One (2014) 0.81

Sequence comparative analysis using networks: software for evaluating de novo transcript assembly from next-generation sequencing. Mol Biol Evol (2013) 0.80

Comprehensive analysis of transcriptome response to salinity stress in the halophytic turf grass Sporobolus virginicus. Front Plant Sci (2015) 0.80

Draft genome sequences of the biocontrol bacterium Mitsuaria sp. strain H24L5A. J Bacteriol (2012) 0.80

Draft genome sequence of the biocontrol bacterium Chromobacterium sp. strain C-61. J Bacteriol (2011) 0.79

Draft Genome Sequence of Cryptococcus flavescens Strain OH182.9_3C, a Biocontrol Agent against Fusarium Head Blight of Wheat. Genome Announc (2013) 0.79

Group II Intron-Mediated Trans-Splicing in the Gene-Rich Mitochondrial Genome of an Enigmatic Eukaryote, Diphylleia rotans. Genome Biol Evol (2016) 0.78

An investigation of causes of false positive single nucleotide polymorphisms using simulated reads from a small eukaryote genome. BMC Bioinformatics (2015) 0.78

A Non-Synonymous HMGA2 Variant Decreases Height in Shetland Ponies and Other Small Horses. PLoS One (2015) 0.78

An improved protocol for sequencing of repetitive genomic regions and structural variations using mutagenesis and next generation sequencing. PLoS One (2012) 0.78

Complete genome sequence of Rickettsia heilongjiangensis, an emerging tick-transmitted human pathogen. J Bacteriol (2011) 0.78

Genomic and phenotypic characterization of myxoma virus from Great Britain reveals multiple evolutionary pathways distinct from those in Australia. PLoS Pathog (2017) 0.78

A base composition analysis of natural patterns for the preprocessing of metagenome sequences. BMC Bioinformatics (2013) 0.76

Advances in genome studies: The PAG 2010 conference. Funct Integr Genomics (2010) 0.76

High throughput sequencing approaches to mutation discovery in the mouse. Mamm Genome (2012) 0.76

WiseScaffolder: an algorithm for the semi-automatic scaffolding of Next Generation Sequencing data. BMC Bioinformatics (2015) 0.76

Conserved gene order and expanded inverted repeats characterize plastid genomes of Thalassiosirales. PLoS One (2014) 0.76

Transcriptome profiling of sulfate deprivation responses in two agarophytes Gracilaria changii and Gracilaria salicornia (Rhodophyta). Sci Rep (2017) 0.75

Arapan-S: a fast and highly accurate whole-genome assembly software for viruses and small genomes. BMC Res Notes (2012) 0.75

A polychromatic 'greenbeard' locus determines patterns of cooperation in a social amoeba. Nat Commun (2017) 0.75

CRCDA--Comprehensive resources for cancer NGS data analysis. Database (Oxford) (2015) 0.75

Hybrid de novo tandem repeat detection using short and long reads. BMC Med Genomics (2015) 0.75

Genome assembly from synthetic long read clouds. Bioinformatics (2016) 0.75

OMACC: an Optical-Map-Assisted Contig Connector for improving de novo genome assembly. BMC Syst Biol (2013) 0.75

Global analyses of Ceratocystis cacaofunesta mitochondria: from genome to proteome. BMC Genomics (2013) 0.75

Isolation and characterization of 22 EST-SSR markers for the genus Thujopsis (Cupressaceae). Appl Plant Sci (2015) 0.75

Articles cited by this

Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res (2008) 151.16

Genome sequencing in microfabricated high-density picolitre reactors. Nature (2005) 150.21

Genome-wide mapping of in vivo protein-DNA interactions. Science (2007) 64.92

The complete genome of an individual by massively parallel DNA sequencing. Nature (2008) 52.81

ABySS: a parallel assembler for short read sequence data. Genome Res (2009) 43.20

Human-mouse alignments with BLASTZ. Genome Res (2003) 35.49

Whole-genome re-sequencing. Curr Opin Genet Dev (2006) 35.24

Whole-genome sequencing and variant discovery in C. elegans. Nat Methods (2008) 31.92

An Eulerian path approach to DNA fragment assembly. Proc Natl Acad Sci U S A (2001) 31.51

Mapping and sequencing of structural variation from eight human genomes. Nature (2008) 30.28

Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics (2005) 24.54

ARACHNE: a whole-genome shotgun assembler. Genome Res (2002) 22.72

ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Res (2008) 20.61

Assembling millions of short DNA sequences using SSAKE. Bioinformatics (2006) 18.71

SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. Genome Res (2007) 16.20

Short read fragment assembly of bacterial genomes. Genome Res (2007) 15.40

De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer. Genome Res (2008) 14.90

Extending assembly of short DNA sequences to handle error. Bioinformatics (2007) 14.46

The fragment assembly string graph. Bioinformatics (2005) 11.84

De novo fragment assembly with short mate-paired reads: Does the read length matter? Genome Res (2008) 7.66

An analysis of the feasibility of short read sequencing. Nucleic Acids Res (2005) 6.10

Fragment assembly with double-barreled data. Bioinformatics (2001) 6.04

Hierarchical scaffolding with Bambus. Genome Res (2004) 4.95

Crystallizing short-read assemblies around seeds. BMC Bioinformatics (2009) 2.89

De novo assembly of the Pseudomonas syringae pv. syringae B728a genome using Illumina/Solexa short sequence reads. FEMS Microbiol Lett (2008) 2.50

Articles by these authors

Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res (2008) 151.16

Initial sequencing and comparative analysis of the mouse genome. Nature (2002) 96.15

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature (2007) 75.09

The Bioperl toolkit: Perl modules for the life sciences. Genome Res (2002) 58.63

The Pfam protein families database. Nucleic Acids Res (2002) 51.34

Patterns of somatic mutation in human cancer genomes. Nature (2007) 38.41

Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics (2005) 24.54

Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature (2004) 24.40

A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nat Biotechnol (2008) 21.72

The genome sequence of the malaria mosquito Anopheles gambiae. Science (2002) 20.36

International network of cancer genome projects. Nature (2010) 20.35

A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature (2009) 18.39

EnsMart: a generic system for fast and flexible access to biological data. Genome Res (2004) 17.64

Evolutionary and biomedical insights from the rhesus macaque genome. Science (2007) 16.21

High-resolution mapping and characterization of open chromatin across the genome. Cell (2008) 15.93

Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res (2008) 15.69

The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res (2009) 14.90

Ensembl 2011. Nucleic Acids Res (2010) 14.68

The International Protein Index: an integrated database for proteomics experiments. Proteomics (2004) 14.67

Ensembl 2012. Nucleic Acids Res (2011) 14.55

Reactome: a knowledge base of biologic pathways and processes. Genome Biol (2007) 13.36

EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res (2008) 12.72

Ensembl 2014. Nucleic Acids Res (2013) 12.62

Prepublication data sharing. Nature (2009) 12.24

Ensembl 2013. Nucleic Acids Res (2012) 11.70

Optimized design and assessment of whole genome tiling arrays. Bioinformatics (2007) 11.38

Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res (2010) 11.23

Ensembl's 10th year. Nucleic Acids Res (2009) 10.82

Mouse genomic variation and its effect on phenotypes and gene regulation. Nature (2011) 10.66

Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol (2004) 10.59

Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics (2012) 9.68

Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster. Science (2002) 9.43

Genome sequence of Aedes aegypti, a major arbovirus vector. Science (2007) 9.19

The BioPAX community standard for pathway data sharing. Nat Biotechnol (2010) 9.19

Accurate and comprehensive sequencing of personal genomes. Genome Res (2011) 8.99

A high-resolution map of human evolutionary constraint using 29 mammals. Nature (2011) 8.67

The Reactome pathway knowledgebase. Nucleic Acids Res (2013) 8.56

Enredo and Pecan: genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Res (2008) 7.35

The Ensembl core software libraries. Genome Res (2004) 7.30

The HGNC Database in 2008: a resource for the human genome. Nucleic Acids Res (2007) 7.29

EGASP: the human ENCODE Genome Annotation Assessment Project. Genome Biol (2006) 7.06

Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res (2007) 7.05

Integrating biological data--the Distributed Annotation System. BMC Bioinformatics (2008) 6.56

The European Nucleotide Archive. Nucleic Acids Res (2010) 6.48

Challenges and standards in integrating surveys of structural variation. Nat Genet (2007) 6.05

Heritable individual-specific and allele-specific chromatin signatures in humans. Science (2010) 5.94

Systematic comparison of three genomic enrichment methods for massively parallel DNA sequencing. Genome Res (2010) 5.76

Genome analysis of the platypus reveals unique signatures of evolution. Nature (2008) 5.74

Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res (2005) 5.71

The landscape of histone modifications across 1% of the human genome in five human cell lines. Genome Res (2007) 5.67

Efficient storage of high throughput DNA sequencing data using reference-based compression. Genome Res (2011) 5.60

Immunity-related genes and gene families in Anopheles gambiae. Science (2002) 5.47

Petabyte-scale innovations at the European Nucleotide Archive. Nucleic Acids Res (2008) 5.21

The genomic basis of adaptive evolution in threespine sticklebacks. Nature (2012) 5.20

Genome-wide nucleotide-level mammalian ancestor reconstruction. Genome Res (2008) 5.12

Improvements to services at the European Nucleotide Archive. Nucleic Acids Res (2009) 5.00

A physical map of the mouse genome. Nature (2002) 4.97

An integrated resource for genome-wide identification and analysis of human tissue-specific differentially methylated regions (tDMRs). Genome Res (2008) 4.84

Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res (2012) 4.80

High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. Genome Res (2010) 4.69

A database and API for variation, dense genotyping and resequencing data. BMC Bioinformatics (2010) 4.68

Sense from sequence reads: methods for alignment and assembly. Nat Methods (2009) 4.44

Open chromatin defined by DNaseI and FAIRE identifies regulatory elements that shape cell-type identity. Genome Res (2011) 4.43

Locus Reference Genomic sequences: an improved basis for describing human DNA variants. Genome Med (2010) 4.19

Local DNA topography correlates with functional noncoding regions of the human genome. Science (2009) 4.18

The implications of alternative splicing in the ENCODE protein complement. Proc Natl Acad Sci U S A (2007) 3.93

Priorities for nucleotide trace, sequence and annotation data capture at the Ensembl Trace Archive and the EMBL Nucleotide Sequence Database. Nucleic Acids Res (2007) 3.84

Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res (2012) 3.80

Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat Protoc (2009) 3.77

VectorBase: a data resource for invertebrate vector genomics. Nucleic Acids Res (2008) 3.73

Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol (2012) 3.61

Comparative genomics: genome-wide analysis in metazoan eukaryotes. Nat Rev Genet (2003) 3.45

Defining functional DNA elements in the human genome. Proc Natl Acad Sci U S A (2014) 3.35

The European Bioinformatics Institute's data resources. Nucleic Acids Res (2003) 3.34

An expressed fgf4 retrogene is associated with breed-defining chondrodysplasia in domestic dogs. Science (2009) 3.29

Ensembl variation resources. BMC Genomics (2010) 3.17

TranscriptSNPView: a genome-wide catalog of mouse coding variation. Nat Genet (2006) 3.10

SNP and haplotype mapping for genetic analysis in the rat. Nat Genet (2008) 2.96

VectorBase: a home for invertebrate vectors of human pathogens. Nucleic Acids Res (2006) 2.94

Ensembl Genomes: an integrative resource for genome-scale data from non-vertebrate species. Nucleic Acids Res (2011) 2.87

Modeling gene expression using chromatin features in various cellular contexts. Genome Biol (2012) 2.76

Understanding transcriptional regulation by integrative analysis of transcription factor binding data. Genome Res (2012) 2.66

Exon capture analysis of G protein-coupled receptors identifies activating mutations in GRM3 in melanoma. Nat Genet (2011) 2.60

Comparison of human chromosome 21 conserved nongenic sequences (CNGs) with the mouse and dog genomes shows that their selective constraint is independent of their genic environment. Genome Res (2004) 2.58

Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature (2013) 2.56

The EBI RDF platform: linked open data for the life sciences. Bioinformatics (2014) 2.55

Arabidopsis reactome: a foundation knowledgebase for plant systems biology. Plant Cell (2008) 2.50

Sequence progressive alignment, a framework for practical large-scale probabilistic consistency alignment. Bioinformatics (2008) 2.46

Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE consortium. Nucleic Acids Res (2012) 2.32

Sequencing studies in human genetics: design and interpretation. Nat Rev Genet (2013) 2.27

Isaac: ultra-fast whole-genome secondary analysis on Illumina sequencing platforms. Bioinformatics (2013) 2.25

Global epigenomic analysis of primary human pancreatic islets provides insights into type 2 diabetes susceptibility loci. Cell Metab (2010) 2.21

The Anopheles gambiae genome: an update. Trends Parasitol (2004) 2.10

A transcriptomic atlas of mouse neocortical layers. Neuron (2011) 2.08

A transcription factor collective defines cardiac cell fate and reflects lineage history. Cell (2012) 2.02

Trawler: de novo regulatory motif discovery pipeline for chromatin immunoprecipitation. Nat Methods (2007) 2.01

Analysis of variation at transcription factor binding sites in Drosophila and humans. Genome Biol (2012) 1.97

What everybody should know about the rat genome and its online resources. Nat Genet (2008) 1.94