ARACHNE: a whole-genome shotgun assembler.

PubWeight™: 22.72 | Rank: Top 0.01% | All-Time Top 10000

Article: PMC 155255

Published in Genome Res on January 01, 2002

Authors

Serafim Batzoglou (1), David B Jaffe, Ken Stanley, Jonathan Butler, Sante Gnerre, Evan Mauceli, Bonnie Berger, Jill P Mesirov, Eric S Lander

Author Affiliations

1: Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.

Articles citing this

(truncated to the top 100)

Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res (2008) 151.16

Versatile and open software for comparing large genomes. Genome Biol (2004) 49.45

De novo assembly of human genomes with massively parallel short read sequencing. Genome Res (2009) 45.91

ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Res (2008) 20.61

SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. Genome Res (2007) 16.20

Short read fragment assembly of bacterial genomes. Genome Res (2007) 15.40

The phusion assembler. Genome Res (2003) 15.25

Quake: quality-aware detection and correction of sequencing errors. Genome Biol (2010) 12.52

PCAP: a whole-genome assembly program. Genome Res (2003) 12.36

Whole-genome sequence assembly for mammalian genomes: Arachne 2. Genome Res (2003) 12.30

Diet-induced obesity is linked to marked but reversible alterations in the mouse distal gut microbiome. Cell Host Microbe (2008) 11.97

The Atlas genome assembly system. Genome Res (2004) 9.78

Whole-genome sequencing and assembly with high-throughput, short-read technologies. PLoS One (2007) 8.70

Assembly algorithms for next-generation sequencing data. Genomics (2010) 8.56

Assemblathon 1: a competitive assessment of de novo short read assembly methods. Genome Res (2011) 8.38

Finishing a whole-genome shotgun: release 3 of the Drosophila melanogaster euchromatic genome sequence. Genome Biol (2002) 8.07

Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Comput Biol (2009) 8.06

Genome-wide identification of polycomb-associated RNAs by RIP-seq. Mol Cell (2010) 6.88

Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. J Comput Biol (2010) 6.38

Assembly of large genomes using second-generation sequencing. Genome Res (2010) 5.94

Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps. Genome Biol (2010) 5.79

On the sequencing of the human genome. Proc Natl Acad Sci U S A (2002) 5.59

The MaSuRCA genome assembler. Bioinformatics (2013) 5.07

Hierarchical scaffolding with Bambus. Genome Res (2004) 4.95

Genome assembly forensics: finding the elusive mis-assembly. Genome Biol (2008) 4.91

Hawkeye: an interactive visual analytics tool for genome assemblies. Genome Biol (2007) 4.80

Initial sequence and comparative analysis of the cat genome. Genome Res (2007) 4.67

Pebble and rock band: heuristic resolution of repeats and scaffolding in the velvet short-read de novo assembler. PLoS One (2009) 4.60

De novo repeat classification and fragment assembly. Genome Res (2004) 4.58

Genome assembly reborn: recent computational challenges. Brief Bioinform (2009) 4.53

A primer on metagenomics. PLoS Comput Biol (2010) 4.40

RePS: a sequence assembler that masks exact repeats identified from the shotgun data. Genome Res (2002) 4.35

A bioinformatician's guide to metagenomics. Microbiol Mol Biol Rev (2008) 4.33

Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS One (2012) 4.18

Assessing the gene space in draft genomes. Nucleic Acids Res (2008) 4.03

Lessons from the genome sequence of Neurospora crassa: tracing the path from genomic blueprint to multicellular organism. Microbiol Mol Biol Rev (2004) 3.67

Assembly complexity of prokaryotic genomes using short reads. BMC Bioinformatics (2010) 3.45

The complete genome and proteome of Mycoplasma mobile. Genome Res (2004) 3.36

A hybrid approach for the automated finishing of bacterial genomes. Nat Biotechnol (2012) 3.29

Strand-specific RNA sequencing reveals extensive regulated long antisense transcripts that are conserved across yeast species. Genome Biol (2010) 3.09

Exploiting sparseness in de novo genome assembly. BMC Bioinformatics (2012) 2.88

Sequence and genetic map of Meloidogyne hapla: A compact nematode genome for plant parasitism. Proc Natl Acad Sci U S A (2008) 2.85

Correcting errors in shotgun sequences. Nucleic Acids Res (2003) 2.75

Assembly of polymorphic genomes: algorithms and application to Ciona savignyi. Genome Res (2005) 2.69

Sepsid even-skipped enhancers are functionally conserved in Drosophila despite lack of sequence conservation. PLoS Genet (2008) 2.67

Structure and architecture of the maize genome. Plant Physiol (2005) 2.52

Paired de Bruijn graphs: a novel approach for incorporating mate pair information into genome assemblers. J Comput Biol (2011) 2.47

Lymphopenia in the BB rat model of type 1 diabetes is due to a mutation in a novel immune-associated nucleotide (Ian)-related gene. Genome Res (2002) 2.45

Diploid genome reconstruction of Ciona intestinalis and comparative analysis with Ciona savignyi. Genome Res (2007) 2.37

Comparing de novo genome assembly: the long and short of it. PLoS One (2011) 2.37

A systematically improved high quality genome and transcriptome of the human blood fluke Schistosoma mansoni. PLoS Negl Trop Dis (2012) 2.32

Insights into evolution of multicellular fungi from the assembled chromosomes of the mushroom Coprinopsis cinerea (Coprinus cinereus). Proc Natl Acad Sci U S A (2010) 2.30

Chromatin profiling by directly sequencing small quantities of immunoprecipitated DNA. Nat Methods (2009) 2.26

Genomic analysis of the basal lineage fungus Rhizopus oryzae reveals a whole-genome duplication. PLoS Genet (2009) 2.15

Next-generation sequencing strategies enable routine detection of balanced chromosome rearrangements for clinical diagnostics and genetic research. Am J Hum Genet (2011) 2.12

Gene-boosted assembly of a novel bacterial genome from very short reads. PLoS Comput Biol (2008) 2.04

Genetic basis of virulence attenuation revealed by comparative genomic analysis of Mycobacterium tuberculosis strain H37Ra versus H37Rv. PLoS One (2008) 2.01

Uneven chromosome contraction and expansion in the maize genome. Genome Res (2006) 1.96

Assembling genomes using short-read sequencing technology. Genome Biol (2010) 1.95

Analysis of segmental duplications and genome assembly in the mouse. Genome Res (2004) 1.89

De novo assembly of highly diverse viral populations. BMC Genomics (2012) 1.87

A haplome alignment and reference sequence of the highly polymorphic Ciona savignyi genome. Genome Biol (2007) 1.82

Automated correction of genome sequence errors. Nucleic Acids Res (2004) 1.80

Complete genome sequence of the extremely acidophilic methanotroph isolate V4, Methylacidiphilum infernorum, a representative of the bacterial phylum Verrucomicrobia. Biol Direct (2008) 1.78

Comparative genomic characterization of Francisella tularensis strains belonging to low and high virulence subspecies. PLoS Pathog (2009) 1.77

Genome variation in Cryptococcus gattii, an emerging pathogen of immunocompetent hosts. MBio (2011) 1.76

Hawkeye and AMOS: visualizing and assessing the quality of genome assemblies. Brief Bioinform (2011) 1.70

ECHO: a reference-free short-read error correction algorithm. Genome Res (2011) 1.68

Genomic analysis of smooth tubercle bacilli provides insights into ancestry and pathoadaptation of Mycobacterium tuberculosis. Nat Genet (2013) 1.67

Analysis of high-throughput sequencing and annotation strategies for phage genomes. PLoS One (2010) 1.64

An algorithm for assembly of ordered restriction maps from single DNA molecules. Proc Natl Acad Sci U S A (2006) 1.63

Combinatorial analysis and algorithms for quasispecies reconstruction using next-generation sequencing. BMC Bioinformatics (2011) 1.62

Binning sequences using very sparse labels within a metagenome. BMC Bioinformatics (2008) 1.60

Computational solutions for omics data. Nat Rev Genet (2013) 1.58

A consistency-based consensus algorithm for de novo and reference-guided sequence assembly of short reads. Bioinformatics (2009) 1.55

Detection and correction of false segmental duplications caused by genome mis-assembly. Genome Biol (2010) 1.54

GABenchToB: a genome assembly benchmark tuned on bacteria and benchtop sequencers. PLoS One (2014) 1.49

Comparative genomics of the apicomplexan parasites Toxoplasma gondii and Neospora caninum: Coccidia differing in host range and transmission strategy. PLoS Pathog (2012) 1.44

Genomic characterization of Campylobacter jejuni strain M1. PLoS One (2010) 1.44

An improved canine genome and a comprehensive catalogue of coding genes and non-coding transcripts. PLoS One (2014) 1.44

Comparative genomic analysis of carbon and nitrogen assimilation mechanisms in three indigenous bioleaching bacteria: predictions and validations. BMC Genomics (2008) 1.43

The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences. BMC Genomics (2010) 1.40

Evaluating the fidelity of de novo short read metagenomic assembly using simulated data. PLoS One (2011) 1.39

Genome structure and metabolic features in the red seaweed Chondrus crispus shed light on evolution of the Archaeplastida. Proc Natl Acad Sci U S A (2013) 1.36

The Reference Genome of the Halophytic Plant Eutrema salsugineum. Front Plant Sci (2013) 1.34

Low-pass shotgun sequencing of the barley genome facilitates rapid identification of genes, conserved non-coding sequences and novel repeats. BMC Genomics (2008) 1.32

Error and error mitigation in low-coverage genome assemblies. PLoS One (2011) 1.31

Assisted assembly: how to improve a de novo genome assembly by using related species. Genome Biol (2009) 1.28

More on the sequencing of the human genome. Proc Natl Acad Sci U S A (2003) 1.27

Chromosome complement of the fungal plant pathogen Fusarium graminearum based on genetic and physical mapping and cytological observations. Genetics (2005) 1.26

Genome sequence of the Fleming strain of Micrococcus luteus, a simple free-living actinobacterium. J Bacteriol (2009) 1.25

Genetic variation and the de novo assembly of human genomes. Nat Rev Genet (2015) 1.24

Denoising DNA deep sequencing data-high-throughput sequencing errors and their correction. Brief Bioinform (2015) 1.22

Transcriptome of Pneumocystis carinii during fulminate infection: carbohydrate metabolism and the concept of a compatible parasite. PLoS One (2007) 1.20

Comprehensive variation discovery in single human genomes. Nat Genet (2014) 1.18

Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants. PLoS Comput Biol (2009) 1.18

What's driving false discovery rates? J Proteome Res (2007) 1.14

Application of a superword array in genome assembly. Nucleic Acids Res (2006) 1.14

Assessing the benefits of using mate-pairs to resolve repeats in de novo short-read prokaryotic assemblies. BMC Bioinformatics (2011) 1.13

Functional assignment of metagenomic data: challenges and applications. Brief Bioinform (2012) 1.13

Articles cited by this

DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A (1977) 790.54

Initial sequencing and analysis of the human genome. Nature (2001) 212.86

Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A (1988) 193.60

A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol (1970) 155.96

Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res (1998) 96.63

The genome sequence of Drosophila melanogaster. Science (2000) 74.32

Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature (2000) 70.33

Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science (1995) 68.34

Genome sequence of the nematode C. elegans: a platform for investigating biology. Science (1998) 61.48

CAP3: A DNA sequence assembly program. Genome Res (1999) 50.04

Life with 6000 genes. Science (1996) 41.51

A whole-genome assembly of Drosophila. Science (2000) 38.48

The DNA sequence of human chromosome 22. Nature (1999) 30.88

Nucleotide sequence of bacteriophage lambda DNA. J Mol Biol (1982) 19.80

Toward simplifying and accurately formulating fragment assembly. J Comput Biol (1995) 12.50

The DNA sequence of human chromosome 21. Nature (2000) 10.66

SEQAID: a DNA sequence assembling program based on a mathematical model. Nucleic Acids Res (1984) 6.91

A contig assembly program based on sensitive detection of fragment overlaps. Genomics (1992) 6.53

Automated DNA sequencing of the human HPRT locus. Genomics (1990) 5.92

AMASS: a structured pattern matching approach to shotgun sequence assembly. J Comput Biol (1999) 2.49

Articles by these authors

Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A (2005) 167.46

Initial sequencing and comparative analysis of the mouse genome. Nature (2002) 96.15

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature (2007) 75.09

Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature (2007) 65.18

Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol (2011) 53.86

PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet (2003) 53.59

The structure of haplotype blocks in the human genome. Science (2002) 50.88

A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell (2006) 48.80

Integrative genomics viewer. Nat Biotechnol (2011) 42.83

Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell (2010) 39.09

Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature (2009) 35.48

The landscape of somatic copy-number alteration across human cancers. Nature (2010) 31.88

The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature (2012) 31.78

Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature (2005) 31.60

Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature (2008) 30.29

Somatic mutations affect key pathways in lung adenocarcinoma. Nature (2008) 30.02

Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science (2009) 29.83

Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature (2003) 29.16

GenePattern 2.0. Nat Genet (2006) 29.07

Transcriptional regulatory code of a eukaryotic genome. Nature (2004) 27.21

Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat Biotechnol (2009) 27.17

The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science (2006) 25.99

Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform (2012) 23.58

Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature (2005) 23.04

High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci U S A (2010) 22.97

Detecting recent positive selection in the human genome from haplotype structure. Nature (2002) 22.00

A molecular signature of metastasis in primary solid tumors. Nat Genet (2002) 21.36

Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc Natl Acad Sci U S A (2009) 20.66

ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Res (2008) 20.61

International network of cancer genome projects. Nature (2010) 20.35

Genomic maps and comparative analysis of histone modifications in human and mouse. Cell (2005) 18.96

Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. Proc Natl Acad Sci U S A (2007) 18.83

A lentiviral RNAi library for human and mouse genes applied to an arrayed viral high-content screen. Cell (2006) 18.81

Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat Biotechnol (2010) 18.44

The mammalian epigenome. Cell (2007) 18.13

Evolution of genes and genomes on the Drosophila phylogeny. Nature (2007) 18.01

Initial genome sequencing and analysis of multiple myeloma. Nature (2011) 17.28

Genome-wide detection and characterization of positive selection in human populations. Nature (2007) 17.27

Risk alleles for multiple sclerosis identified by a genomewide study. N Engl J Med (2007) 17.06

The mutational landscape of head and neck squamous cell carcinoma. Science (2011) 16.88

Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nat Genet (2003) 16.51

Characterizing the cancer genome in lung adenocarcinoma. Nature (2007) 16.48

Dissecting direct reprogramming through integrative genomic analysis. Nature (2008) 16.47

Assessing the impact of population stratification on genetic association studies. Nat Genet (2004) 16.28

Gene expression correlates of clinical prostate cancer behavior. Cancer Cell (2002) 16.27

Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol (2013) 16.13

Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature (2002) 15.36

Genetic mapping in human disease. Science (2008) 15.12

The genomic complexity of primary human prostate cancer. Nature (2011) 14.06

Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat Med (2002) 14.01

The NIH Roadmap Epigenomics Mapping Consortium. Nat Biotechnol (2010) 13.99

MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat Genet (2001) 13.79

A landscape of driver mutations in melanoma. Cell (2012) 12.61

High-resolution mapping of copy-number alterations with massively parallel sequencing. Nat Methods (2008) 12.56

Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science (2010) 12.39

Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature (2004) 12.32

Whole-genome sequence assembly for mammalian genomes: Arachne 2. Genome Res (2003) 12.30

A large intergenic noncoding RNA induced by p53 mediates global gene repression in the p53 response. Cell (2010) 12.27

Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing. Cell (2012) 11.69

Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature (2009) 11.46

The genome sequence of the filamentous fungus Neurospora crassa. Nature (2003) 11.39

lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature (2011) 11.31

The mystery of missing heritability: Genetic interactions create phantom heritability. Proc Natl Acad Sci U S A (2012) 11.23

Genomewide analysis of PRC1 and PRC2 occupancy identifies two classes of bivalent domains. PLoS Genet (2008) 11.17

SF3B1 and other novel cancer genes in chronic lymphocytic leukemia. N Engl J Med (2011) 11.07