Velvet: algorithms for de novo short read assembly using de Bruijn graphs.

PubWeight™: 151.16 | Rank: Top 0.01% | All-Time Top 100

Article: PMC 2336801

Published in Genome Res on March 18, 2008

Authors

Daniel R Zerbino (1), Ewan Birney

Author Affiliations

1: EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.

Associated clinical trials:

OR PathTrac (Tracking Intra-operative Bacterial Transmission) | NCT03605498

Articles citing this

(truncated to the top 100)

Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res (2008) 157.44

TopHat: discovering splice junctions with RNA-Seq. Bioinformatics (2009) 81.13

SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol (2012) 62.36

Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol (2011) 53.86

The diploid genome sequence of an Asian individual. Nature (2008) 46.29

De novo assembly of human genomes with massively parallel short read sequencing. Genome Res (2009) 45.91

Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics (2014) 44.23

ABySS: a parallel assembler for short read sequence data. Genome Res (2009) 43.20

Sequencing technologies - the next generation. Nat Rev Genet (2009) 40.57

A comprehensive catalogue of somatic mutations from a human cancer genome. Nature (2009) 24.27

High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci U S A (2010) 22.97

SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience (2012) 20.89

BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat Methods (2009) 18.41

Computation for ChIP-seq and RNA-seq studies. Nat Methods (2009) 16.11

Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics (2009) 15.08

QUAST: quality assessment tool for genome assemblies. Bioinformatics (2013) 13.07

Quake: quality-aware detection and correction of sequencing errors. Genome Biol (2010) 12.52

The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res (2009) 12.09

GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res (2012) 11.33

Mutational processes molding the genomes of 21 breast cancers. Cell (2012) 11.22

Aggressive assembly of pyrosequencing reads with mates. Bioinformatics (2008) 11.01

Toward almost closed genomes with GapFiller. Genome Biol (2012) 10.92

Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes. Nat Methods (2009) 10.41

Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics (2012) 9.68

SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data. BMC Bioinformatics (2010) 9.47

BIGSdb: Scalable analysis of bacterial genome variation at the population level. BMC Bioinformatics (2010) 9.30

Dindel: accurate indel calls from short-read data. Genome Res (2010) 8.62

Assembly algorithms for next-generation sequencing data. Genomics (2010) 8.56

Sequencing of natural strains of Arabidopsis thaliana with short reads. Genome Res (2008) 8.44

Assemblathon 1: a competitive assessment of de novo short read assembly methods. Genome Res (2011) 8.38

Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet (2011) 8.34

Multilocus sequence typing of total-genome-sequenced bacteria. J Clin Microbiol (2012) 8.15

The first Korean genome sequence and analysis: full genome sequencing for a socio-ethnic group. Genome Res (2009) 7.87

Annotating genomes with massive-scale RNA sequencing. Genome Biol (2008) 7.73

De novo fragment assembly with short mate-paired reads: Does the read length matter? Genome Res (2008) 7.66

Enredo and Pecan: genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Res (2008) 7.35

Genome structural variation discovery and genotyping. Nat Rev Genet (2011) 7.34

A physical, genetic and functional sequence assembly of the barley genome. Nature (2012) 7.25

Assembling single-cell genomes and mini-metagenomes from chimeric MDA products. J Comput Biol (2013) 6.90

Sequence-specific error profile of Illumina sequencers. Nucleic Acids Res (2011) 6.88

MetaGeneAnnotator: detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes. DNA Res (2008) 6.82

ALLPATHS 2: small genomes assembled accurately and with high continuity from short paired reads. Genome Biol (2009) 6.76

Visualizing genomes: techniques and challenges. Nat Methods (2010) 6.66

Whole-genome sequencing of multiple Arabidopsis thaliana populations. Nat Genet (2011) 6.59

Transcriptome sequencing across a prostate cancer cohort identifies PCAT-1, an unannotated lincRNA implicated in disease progression. Nat Biotechnol (2011) 6.54

The European Nucleotide Archive. Nucleic Acids Res (2010) 6.48

Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat Biotechnol (2013) 6.40

Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. J Comput Biol (2010) 6.38

Gut metagenome in European women with normal, impaired and diabetic glucose control. Nature (2013) 6.07

Efficient de novo assembly of large genomes using compressed data structures. Genome Res (2011) 6.05

Assembly of large genomes using second-generation sequencing. Genome Res (2010) 5.94

Next-generation transcriptome assembly. Nat Rev Genet (2011) 5.89

Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps. Genome Biol (2010) 5.79

From RNA-seq reads to differential expression results. Genome Biol (2010) 5.77

Evidence for several waves of global transmission in the seventh cholera pandemic. Nature (2011) 5.62

De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat Genet (2012) 5.61

Efficient de novo assembly of single-cell bacterial genomes from short-read data sets. Nat Biotechnol (2011) 5.60

Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nat Rev Genet (2011) 5.58

The MaSuRCA genome assembler. Bioinformatics (2013) 5.07

Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology. Nucleic Acids Res (2008) 5.00

Improvements to services at the European Nucleotide Archive. Nucleic Acids Res (2009) 5.00

Stacks: building and genotyping Loci de novo from short-read sequences. G3 (Bethesda) (2011) 4.99

Using the Velvet de novo assembler for short-read sequencing technologies. Curr Protoc Bioinformatics (2010) 4.88

SOPRA: Scaffolding algorithm for paired reads via statistical optimization. BMC Bioinformatics (2010) 4.76

Whole genome amplification and de novo assembly of single bacterial cells. PLoS One (2009) 4.65

CISA: contig integrator for sequence assembly of bacterial genomes. PLoS One (2013) 4.62

Pebble and rock band: heuristic resolution of repeats and scaffolding in the velvet short-read de novo assembler. PLoS One (2009) 4.60

Genome assembly reborn: recent computational challenges. Brief Bioinform (2009) 4.53

Next-generation DNA sequencing of paired-end tags (PET) for transcriptome and genome analyses. Genome Res (2009) 4.47

Sense from sequence reads: methods for alignment and assembly. Nat Methods (2009) 4.44

A primer on metagenomics. PLoS Comput Biol (2010) 4.40

Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data. Bioinformatics (2011) 4.31

Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Res (2014) 4.29

Transforming clinical microbiology with bacterial genome sequencing. Nat Rev Genet (2012) 4.29

Optimization of de novo transcriptome assembly from next-generation sequencing data. Genome Res (2010) 4.18

Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS One (2012) 4.18

Efficient construction of an assembly string graph using the FM-index. Bioinformatics (2010) 4.13

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience (2013) 4.11

Emergence and global spread of epidemic healthcare-associated Clostridium difficile. Nat Genet (2012) 4.10

De novo assembly of chickpea transcriptome using short reads for gene discovery and marker identification. DNA Res (2011) 4.06

High-throughput bacterial genome sequencing: an embarrassment of choice, a world of opportunity. Nat Rev Microbiol (2012) 3.94

A pilot study of rapid benchtop sequencing of Staphylococcus aureus and Clostridium difficile for outbreak detection and surveillance. BMJ Open (2012) 3.93

Virus discovery by deep sequencing and assembly of virus-derived small silencing RNAs. Proc Natl Acad Sci U S A (2010) 3.92

Finished bacterial genomes from shotgun sequence data. Genome Res (2012) 3.86

Targeted restoration of the intestinal microbiota with a simple, defined bacteriotherapy resolves relapsing Clostridium difficile disease in mice. PLoS Pathog (2012) 3.81

Metagenomic analysis of a permafrost microbial community reveals a rapid response to thaw. Nature (2011) 3.80

Comparative genome and phenotypic analysis of Clostridium difficile 027 strains provides insight into the evolution of a hypervirulent bacterium. Genome Biol (2009) 3.66

CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing. BMC Bioinformatics (2011) 3.64

Next generation sequence assembly with AMOS. Curr Protoc Bioinformatics (2011) 3.60

PathSeq: software to identify or discover microbes by deep sequencing of human tissue. Nat Biotechnol (2011) 3.57

MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads. Nucleic Acids Res (2012) 3.52

Error correction of high-throughput sequencing datasets with non-uniform coverage. Bioinformatics (2011) 3.51

Comparing de novo assemblers for 454 transcriptome data. BMC Genomics (2010) 3.49

Assembly complexity of prokaryotic genomes using short reads. BMC Bioinformatics (2010) 3.45

Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly. Nat Biotechnol (2012) 3.43

De novo assembly using low-coverage short read sequence data from the rice pathogen Pseudomonas syringae pv. oryzae. Genome Res (2008) 3.41

Sequence-based discovery of Bradyrhizobium enterica in cord colitis syndrome. N Engl J Med (2013) 3.39

How to apply de Bruijn graphs to genome assembly. Nat Biotechnol (2011) 3.36

Full-genome deep sequencing and phylogenetic analysis of novel human betacoronavirus. Emerg Infect Dis (2013) 3.35

Propionibacterium acnes strain populations in the human skin microbiome associated with acne. J Invest Dermatol (2013) 3.32

Articles cited by this

Initial sequencing and analysis of the human genome. Nature (2001) 212.86

Genome sequencing in microfabricated high-density picolitre reactors. Nature (2005) 150.21

The sequence of the human genome. Science (2001) 101.55

Initial sequencing and comparative analysis of the mouse genome. Nature (2002) 96.15

Genome-wide mapping of in vivo protein-DNA interactions. Science (2007) 64.92

A whole-genome assembly of Drosophila. Science (2000) 38.48

Whole-genome re-sequencing. Curr Opin Genet Dev (2006) 35.24

An Eulerian path approach to DNA fragment assembly. Proc Natl Acad Sci U S A (2001) 31.51

Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics (1988) 27.63

Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics (2005) 24.54

ARACHNE: a whole-genome shotgun assembler. Genome Res (2002) 22.72

Assembling millions of short DNA sequences using SSAKE. Bioinformatics (2006) 18.71

SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. Genome Res (2007) 16.20

The phusion assembler. Genome Res (2003) 15.25

Extending assembly of short DNA sequences to handle error. Bioinformatics (2007) 14.46

A new algorithm for DNA sequence assembly. J Comput Biol (1995) 12.39

PCAP: a whole-genome assembly program. Genome Res (2003) 12.36

The fragment assembly string graph. Bioinformatics (2005) 11.84

Emerging technologies in DNA sequencing. Genome Res (2005) 10.64

Ancestral reconstruction of segmental duplications reveals punctuated cores of human genome evolution. Nat Genet (2007) 10.38

The Atlas genome assembly system. Genome Res (2004) 9.78

Fragment assembly with short reads. Bioinformatics (2004) 9.47

Whole-genome sequencing and assembly with high-throughput, short-read technologies. PLoS One (2007) 8.70

Mapping DNA-protein interactions in large genomes by sequence tag analysis of genomic enrichment. Nat Methods (2004) 7.93

A parallel graph decomposition algorithm for DNA sequencing with nanopores. Bioinformatics (2004) 5.89

Articles by these authors

(truncated to the top 100)

Initial sequencing and comparative analysis of the mouse genome. Nature (2002) 96.15

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature (2007) 75.09

The Bioperl toolkit: Perl modules for the life sciences. Genome Res (2002) 58.63

The Pfam protein families database. Nucleic Acids Res (2002) 51.34

Patterns of somatic mutation in human cancer genomes. Nature (2007) 38.41

Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics (2005) 24.54

Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature (2004) 24.40

A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nat Biotechnol (2008) 21.72

The genome sequence of the malaria mosquito Anopheles gambiae. Science (2002) 20.36

International network of cancer genome projects. Nature (2010) 20.35

A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature (2009) 18.39

EnsMart: a generic system for fast and flexible access to biological data. Genome Res (2004) 17.64

Evolutionary and biomedical insights from the rhesus macaque genome. Science (2007) 16.21

Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res (2008) 15.69

The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res (2009) 14.90

Ensembl 2011. Nucleic Acids Res (2010) 14.68

The International Protein Index: an integrated database for proteomics experiments. Proteomics (2004) 14.67

Ensembl 2012. Nucleic Acids Res (2011) 14.55

Reactome: a knowledge base of biologic pathways and processes. Genome Biol (2007) 13.36

EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res (2008) 12.72

Ensembl 2014. Nucleic Acids Res (2013) 12.62

Prepublication data sharing. Nature (2009) 12.24

Ensembl 2013. Nucleic Acids Res (2012) 11.70

Optimized design and assessment of whole genome tiling arrays. Bioinformatics (2007) 11.38

Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res (2010) 11.23

Ensembl's 10th year. Nucleic Acids Res (2009) 10.82

Mouse genomic variation and its effect on phenotypes and gene regulation. Nature (2011) 10.66

Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics (2012) 9.68

Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster. Science (2002) 9.43

Genome sequence of Aedes aegypti, a major arbovirus vector. Science (2007) 9.19

The BioPAX community standard for pathway data sharing. Nat Biotechnol (2010) 9.19

A high-resolution map of human evolutionary constraint using 29 mammals. Nature (2011) 8.67

The Reactome pathway knowledgebase. Nucleic Acids Res (2013) 8.56

Enredo and Pecan: genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Res (2008) 7.35

The Ensembl core software libraries. Genome Res (2004) 7.30

The HGNC Database in 2008: a resource for the human genome. Nucleic Acids Res (2007) 7.29

EGASP: the human ENCODE Genome Annotation Assessment Project. Genome Biol (2006) 7.06

Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res (2007) 7.05

Integrating biological data--the Distributed Annotation System. BMC Bioinformatics (2008) 6.56

The European Nucleotide Archive. Nucleic Acids Res (2010) 6.48

Challenges and standards in integrating surveys of structural variation. Nat Genet (2007) 6.05

Heritable individual-specific and allele-specific chromatin signatures in humans. Science (2010) 5.94

Genome analysis of the platypus reveals unique signatures of evolution. Nature (2008) 5.74

The landscape of histone modifications across 1% of the human genome in five human cell lines. Genome Res (2007) 5.67

Efficient storage of high throughput DNA sequencing data using reference-based compression. Genome Res (2011) 5.60

Immunity-related genes and gene families in Anopheles gambiae. Science (2002) 5.47

Petabyte-scale innovations at the European Nucleotide Archive. Nucleic Acids Res (2008) 5.21

The genomic basis of adaptive evolution in threespine sticklebacks. Nature (2012) 5.20

Genome-wide nucleotide-level mammalian ancestor reconstruction. Genome Res (2008) 5.12

Improvements to services at the European Nucleotide Archive. Nucleic Acids Res (2009) 5.00

A physical map of the mouse genome. Nature (2002) 4.97

An integrated resource for genome-wide identification and analysis of human tissue-specific differentially methylated regions (tDMRs). Genome Res (2008) 4.84

Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res (2012) 4.80

High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. Genome Res (2010) 4.69

A database and API for variation, dense genotyping and resequencing data. BMC Bioinformatics (2010) 4.68

Pebble and rock band: heuristic resolution of repeats and scaffolding in the velvet short-read de novo assembler. PLoS One (2009) 4.60

Sense from sequence reads: methods for alignment and assembly. Nat Methods (2009) 4.44

Open chromatin defined by DNaseI and FAIRE identifies regulatory elements that shape cell-type identity. Genome Res (2011) 4.43

Locus Reference Genomic sequences: an improved basis for describing human DNA variants. Genome Med (2010) 4.19

The implications of alternative splicing in the ENCODE protein complement. Proc Natl Acad Sci U S A (2007) 3.93

Priorities for nucleotide trace, sequence and annotation data capture at the Ensembl Trace Archive and the EMBL Nucleotide Sequence Database. Nucleic Acids Res (2007) 3.84

Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res (2012) 3.80

Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat Protoc (2009) 3.77

VectorBase: a data resource for invertebrate vector genomics. Nucleic Acids Res (2008) 3.73

Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol (2012) 3.61

Comparative genomics: genome-wide analysis in metazoan eukaryotes. Nat Rev Genet (2003) 3.45

Defining functional DNA elements in the human genome. Proc Natl Acad Sci U S A (2014) 3.35

The European Bioinformatics Institute's data resources. Nucleic Acids Res (2003) 3.34

Ensembl variation resources. BMC Genomics (2010) 3.17

TranscriptSNPView: a genome-wide catalog of mouse coding variation. Nat Genet (2006) 3.10

SNP and haplotype mapping for genetic analysis in the rat. Nat Genet (2008) 2.96

VectorBase: a home for invertebrate vectors of human pathogens. Nucleic Acids Res (2006) 2.94

Ensembl Genomes: an integrative resource for genome-scale data from non-vertebrate species. Nucleic Acids Res (2011) 2.87

Modeling gene expression using chromatin features in various cellular contexts. Genome Biol (2012) 2.76

Understanding transcriptional regulation by integrative analysis of transcription factor binding data. Genome Res (2012) 2.66

Comparison of human chromosome 21 conserved nongenic sequences (CNGs) with the mouse and dog genomes shows that their selective constraint is independent of their genic environment. Genome Res (2004) 2.58

Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature (2013) 2.56

The EBI RDF platform: linked open data for the life sciences. Bioinformatics (2014) 2.55

Arabidopsis reactome: a foundation knowledgebase for plant systems biology. Plant Cell (2008) 2.50

Sequence progressive alignment, a framework for practical large-scale probabilistic consistency alignment. Bioinformatics (2008) 2.46

Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE consortium. Nucleic Acids Res (2012) 2.32

The Anopheles gambiae genome: an update. Trends Parasitol (2004) 2.10

A transcription factor collective defines cardiac cell fate and reflects lineage history. Cell (2012) 2.02

Trawler: de novo regulatory motif discovery pipeline for chromatin immunoprecipitation. Nat Methods (2007) 2.01

Analysis of variation at transcription factor binding sites in Drosophila and humans. Genome Biol (2012) 1.97

What everybody should know about the rat genome and its online resources. Nat Genet (2008) 1.94

Major submissions tool developments at the European Nucleotide Archive. Nucleic Acids Res (2011) 1.94

A survey of homozygous deletions in human cancer genomes. Proc Natl Acad Sci U S A (2005) 1.94

Genome browsing with Ensembl: a practical overview. Brief Funct Genomic Proteomic (2007) 1.93

Genomic information infrastructure after the deluge. Genome Biol (2010) 1.89

The future of DNA sequence archiving. Gigascience (2012) 1.85

Sockeye: a 3D environment for comparative genomics. Genome Res (2004) 1.80

EMMA--mouse mutant resources for the international scientific community. Nucleic Acids Res (2009) 1.75

RNAcentral: A vision for an international database of RNA sequences. RNA (2011) 1.73

Genome annotation techniques: new approaches and challenges. Drug Discov Today (2002) 1.65

The discovery, positioning and verification of a set of transcription-associated motifs in vertebrates. Genome Biol (2005) 1.58

Cell-type specific and combinatorial usage of diverse transcription factors revealed by genome-wide binding studies in multiple human cells. Genome Res (2011) 1.53

Identification of novel peptide hormones in the human proteome by hidden Markov model screening. Genome Res (2007) 1.50

The genome sequence of the spontaneously hypertensive rat: Analysis and functional significance. Genome Res (2010) 1.45

Update of the Anopheles gambiae PEST genome assembly. Genome Biol (2007) 1.44