Detection and correction of false segmental duplications caused by genome mis-assembly.

PubWeight™: 1.54 | Rank: Top 4%

Article: PMC 2864568

Published in Genome Biol on March 10, 2010

Authors

David R Kelley (1), Steven L Salzberg

Author Affiliations

1: Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA. dakelley@umiacs.umd.edu

Articles citing this

GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res (2012) 11.33

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience (2013) 4.11

Next-generation sequencing and large genome assemblies. Pharmacogenomics (2012) 1.27

A multi-population consensus genetic map reveals inconsistent marker order among maps likely attributed to structural variations in the apple genome. PLoS One (2012) 1.07

Genomic organization and molecular phylogenies of the beta (β) keratin multigene family in the chicken (Gallus gallus) and zebra finch (Taeniopygia guttata): implications for feather evolution. BMC Evol Biol (2010) 1.03

Why assembling plant genome sequences is so challenging. Biology (Basel) (2012) 1.02

RepARK--de novo creation of repeat libraries from whole-genome NGS reads. Nucleic Acids Res (2014) 1.01

Sequencing, assembling, and correcting draft genomes using recombinant populations. G3 (Bethesda) (2014) 0.95

Genome and transcriptome sequencing identifies breeding targets in the orphan crop tef (Eragrostis tef). BMC Genomics (2014) 0.89

BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes. Plant Biotechnol J (2016) 0.86

Identification of Low-Confidence Regions in the Pig Reference Genome (Sscrofa10.2). Front Genet (2015) 0.83

Genome re-sequencing of semi-wild soybean reveals a complex Soja population structure and deep introgression. PLoS One (2014) 0.83

Assembly errors cause false tandem duplicate regions in the chicken (Gallus gallus) genome sequence. Chromosoma (2013) 0.81

Deep landscape update of dispersed and tandem repeats in the genome model of the red jungle fowl, Gallus gallus, using a series of de novo investigating tools. BMC Genomics (2016) 0.78

Slip-sliding away: serial changes and homoplasy in repeat number in the Drosophila yakuba homolog of human cancer susceptibility gene BRCA2. PLoS One (2010) 0.76

A survey of innovation through duplication in the reduced genomes of twelve parasites. PLoS One (2014) 0.76

Comparative analyses across cattle genders and breeds reveal the pitfalls caused by false positive and lineage-differential copy number variations. Sci Rep (2016) 0.75

An efficient approach to BAC based assembly of complex genomes. Plant Methods (2016) 0.75

Revealing misassembled segments in the bovine reference genome by high resolution linkage disequilibrium scan. BMC Genomics (2016) 0.75

Characterization of genome-wide segmental duplications reveals a common genomic feature of association with immunity among domestic animals. BMC Genomics (2017) 0.75

Comparative analyses of the major royal jelly protein gene cluster in three Apis species with long amplicon sequencing. DNA Res (2017) 0.75

Articles cited by this

Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res (2008) 151.16

A second generation human haplotype map of over 3.1 million SNPs. Nature (2007) 85.39

dbSNP: the NCBI database of genetic variation. Nucleic Acids Res (2001) 76.97

The genome sequence of Drosophila melanogaster. Science (2000) 74.32

Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science (1995) 68.34

Versatile and open software for comparing large genomes. Genome Biol (2004) 49.45

The diploid genome sequence of an Asian individual. Nature (2008) 46.29

The diploid genome sequence of an individual human. PLoS Biol (2007) 44.80

Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science (1996) 41.35

A whole-genome assembly of Drosophila. Science (2000) 38.48

Initial sequence of the chimpanzee genome and comparison with the human genome. Nature (2005) 25.67

Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature (2005) 23.04

ARACHNE: a whole-genome shotgun assembler. Genome Res (2002) 22.72

Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature (2004) 21.40

Recent segmental duplications in the human genome. Science (2002) 21.30

Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res (2006) 20.92

ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Res (2008) 20.61

The genome sequence of the malaria mosquito Anopheles gambiae. Science (2002) 20.36

An SNP map of the human genome generated by reduced representation shotgun sequencing. Nature (2000) 19.19

The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature (2007) 18.04

Segmental duplications and copy-number variation in the human genome. Am J Hum Genet (2005) 13.33

PCAP: a whole-genome assembly program. Genome Res (2003) 12.36

Whole-genome sequence assembly for mammalian genomes: Arachne 2. Genome Res (2003) 12.30

The genome sequence of taurine cattle: a window to ruminant biology and evolution. Science (2009) 8.23

Genome-wide survey of SNP variation uncovers the genetic structure of cattle breeds. Science (2009) 7.64

The diploid genome sequence of Candida albicans. Proc Natl Acad Sci U S A (2004) 6.34

A whole-genome assembly of the domestic cow, Bos taurus. Genome Biol (2009) 5.93

A genome-wide comparison of recent chimpanzee and human segmental duplications. Nature (2005) 5.51

A genetic variation map for chicken with 2.8 million single-nucleotide polymorphisms. Nature (2004) 5.24

Genome assembly forensics: finding the elusive mis-assembly. Genome Biol (2008) 4.91

Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence. Genome Biol (2003) 4.32

Beware of mis-assembled genomes. Bioinformatics (2005) 4.14

Gene regulatory network growth by duplication. Nat Genet (2004) 3.79

A burst of segmental duplications in the genome of the African great ape ancestor. Nature (2009) 3.63

A review on SNP and other types of molecular markers and their use in animal genetics. Genet Sel Evol (2002) 3.22

HapCUT: an efficient and accurate algorithm for the haplotype assembly problem. Bioinformatics (2008) 2.96

Assembly of polymorphic genomes: algorithms and application to Ciona savignyi. Genome Res (2005) 2.69

A regulatory SNP causes a human genetic disease by creating a new transcriptional promoter. Science (2006) 2.58

The genomic architecture of segmental duplications and associated copy number variants in dogs. Genome Res (2009) 2.44

Diploid genome reconstruction of Ciona intestinalis and comparative analysis with Ciona savignyi. Genome Res (2007) 2.37

Low nucleotide diversity in chimpanzees and bonobos. Genetics (2003) 2.16

Gene duplication: a drive for phenotypic diversity and cause of human disease. Annu Rev Genomics Hum Genet (2007) 2.16

Comparing the human and chimpanzee genomes: searching for needles in a haystack. Genome Res (2005) 1.95

Analysis of segmental duplications and genome assembly in the mouse. Genome Res (2004) 1.89

Assembly reconciliation. Bioinformatics (2007) 1.84

Bos taurus genome assembly. BMC Genomics (2009) 1.83

Recent segmental and gene duplications in the mouse genome. Genome Biol (2003) 1.74

A preliminary comparative analysis of primate segmental duplications shows elevated substitution rates and a great-ape expansion of intrachromosomal duplications. Genome Res (2006) 1.69

Consensus generation and variant detection by Celera Assembler. Bioinformatics (2008) 1.68

A consistency-based consensus algorithm for de novo and reference-guided sequence assembly of short reads. Bioinformatics (2009) 1.55

Evidence for a complex demographic history of chimpanzees. Mol Biol Evol (2004) 1.41

Efficiently detecting polymorphisms during the fragment assembly process. Bioinformatics (2002) 1.30

Detecting heterozygosity in shotgun genome assemblies: Lessons from obligately outcrossing nematodes. Genome Res (2009) 1.27

A machine-learning approach to combined evidence validation of genome assemblies. Bioinformatics (2008) 0.91

Articles by these authors

(truncated to the top 100)

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol (2009) 235.12

Fast gapped-read alignment with Bowtie 2. Nat Methods (2012) 83.79

TopHat: discovering splice junctions with RNA-Seq. Bioinformatics (2009) 81.13

Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol (2010) 75.21

Versatile and open software for comparing large genomes. Genome Biol (2004) 49.45

Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics (2007) 47.63

Genome sequence of the human malaria parasite Plasmodium falciparum. Nature (2002) 37.89

Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc (2012) 35.75

TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol (2013) 32.42

The genome sequence of the malaria mosquito Anopheles gambiae. Science (2002) 20.36

Evolution of genes and genomes on the Drosophila phylogeny. Nature (2007) 18.01

Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res (2002) 17.31

FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics (2011) 13.71

Quake: quality-aware detection and correction of sequencing errors. Genome Biol (2010) 12.52

Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution. Nature (2005) 11.99

The genome of the African trypanosome Trypanosoma brucei. Science (2005) 11.48

Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res (2003) 11.03

The genome sequence of Bacillus anthracis Ames and comparison to closely related bacteria. Nature (2003) 10.38

Searching for SNPs with cloud computing. Genome Biol (2009) 10.12

Comparative genome sequencing for discovery of novel polymorphisms in Bacillus anthracis. Science (2002) 9.83

A comparison of whole-genome shotgun-derived mouse chromosome 16 and the human genome. Science (2002) 9.59

Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster. Science (2002) 9.43

Genome sequence of Aedes aegypti, a major arbovirus vector. Science (2007) 9.19

Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii. Nature (2002) 8.92

Genomic sequence of the pathogenic and allergenic filamentous fungus Aspergillus fumigatus. Nature (2005) 8.55

Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nat Methods (2009) 8.15

Cloud computing and the DNA data race. Nat Biotechnol (2010) 7.81

Minimus: a fast, lightweight genome assembler. BMC Bioinformatics (2007) 7.65

The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease. Science (2005) 7.61

How to map billions of short reads onto genomes. Nat Biotechnol (2009) 6.59

TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. Genome Biol (2011) 6.23

Comparative genomics of the neglected human malaria parasite Plasmodium vivax. Nature (2008) 5.96

The genome of the blood fluke Schistosoma mansoni. Nature (2009) 5.94

Assembly of large genomes using second-generation sequencing. Genome Res (2010) 5.94

A whole-genome assembly of the domestic cow, Bos taurus. Genome Biol (2009) 5.93

The genome of woodland strawberry (Fragaria vesca). Nat Genet (2010) 5.86

Comparative genome assembly. Brief Bioinform (2004) 5.81

Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nat Rev Genet (2011) 5.58

The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature (2008) 5.54

Whole-genome analysis of human influenza A virus reveals multiple persistent lineages and reassortment among recent H3N2 viruses. PLoS Biol (2005) 5.48

Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote. PLoS Biol (2006) 5.44

Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol (2014) 5.40

Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis. PLoS Biol (2010) 5.39

Comparative genomics of trypanosomatid parasitic protozoa. Science (2005) 5.37

Bioinformatics challenges of new sequencing technology. Trends Genet (2008) 5.34

Draft genome of the filarial nematode parasite Brugia malayi. Science (2007) 5.28

The Brucella suis genome reveals fundamental similarities between animal and plant pathogens and symbionts. Proc Natl Acad Sci U S A (2002) 5.28

The MaSuRCA genome assembler. Bioinformatics (2013) 5.07

Hierarchical scaffolding with Bambus. Genome Res (2004) 4.95

Full-length messenger RNA sequences greatly improve genome annotation. Genome Biol (2002) 4.90

Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis. Science (2007) 4.89

Hawkeye: an interactive visual analytics tool for genome assemblies. Genome Biol (2007) 4.80

The genome of the basidiomycetous yeast and human pathogen Cryptococcus neoformans. Science (2005) 4.74

Rapid, accurate, computational discovery of Rho-independent transcription terminators illuminates their relationship to DNA uptake. Genome Biol (2007) 4.27

Beware of mis-assembled genomes. Bioinformatics (2005) 4.14

DAGchainer: a tool for mining segmental genome duplications and synteny. Bioinformatics (2004) 3.74

Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol (2008) 3.73

Using MUMmer to identify similar regions in large sequence sets. Curr Protoc Bioinformatics (2003) 3.50

Mugsy: fast multiple alignment of closely related whole genomes. Bioinformatics (2010) 3.38

Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura. Genome Biol (2004) 3.36

Genome sequence of Theileria parva, a bovine pathogen that transforms lymphocytes. Science (2005) 2.71

Bacillus anthracis comparative genome analysis in support of the Amerithrax investigation. Proc Natl Acad Sci U S A (2011) 2.62

Computational gene prediction using multiple sources of evidence. Genome Res (2004) 2.58

The value of complete microbial genome sequencing (you get what you pay for). J Bacteriol (2002) 2.58

Sequence of Plasmodium falciparum chromosomes 2, 10, 11 and 14. Nature (2002) 2.49

JIGSAW: integration of multiple sources of evidence for gene prediction. Bioinformatics (2005) 2.37

Genomic insights into methanotrophy: the complete genome sequence of Methylococcus capsulatus (Bath). PLoS Biol (2004) 2.36

GAGE-B: an evaluation of genome assemblers for bacterial organisms. Bioinformatics (2013) 2.24

Physiogenomic resources for rat models of heart, lung and blood disorders. Nat Genet (2006) 2.05

JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions. Genome Biol (2006) 2.00

Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies. Genome Biol (2014) 1.90

Two new complete genome sequences offer insight into host and tissue specificity of plant pathogenic Xanthomonas spp. J Bacteriol (2011) 1.80

Automated correction of genome sequence errors. Nucleic Acids Res (2004) 1.80

Comprehensive DNA signature discovery and validation. PLoS Comput Biol (2007) 1.75

COMBREX: a project to accelerate the functional annotation of prokaryotic genomes. Nucleic Acids Res (2010) 1.72

Between a chicken and a grape: estimating the number of human genes. Genome Biol (2010) 1.72

Hawkeye and AMOS: visualizing and assessing the quality of genome assemblies. Brief Bioinform (2011) 1.70

GlimmerM, Exonomy and Unveil: three ab initio eukaryotic genefinders. Nucleic Acids Res (2003) 1.63

The complete genome sequence of Bacillus anthracis Ames "Ancestor". J Bacteriol (2008) 1.61

The age of the Arabidopsis thaliana genome duplication. Plant Mol Biol (2003) 1.52

Computational discovery of internal micro-exons. Genome Res (2003) 1.52

Sequence, annotation, and analysis of synteny between rice chromosome 3 and diverged grass species. Genome Res (2005) 1.48

OperonDB: a comprehensive database of predicted operons in microbial genomes. Nucleic Acids Res (2008) 1.47

Sequencing and assembly of the 22-Gb loblolly pine genome. Genetics (2014) 1.46

What are decision trees? Nat Biotechnol (2008) 1.39

2009 Swine-origin influenza A (H1N1) resembles previous influenza isolates. PLoS One (2009) 1.32

Unique features of the loblolly pine (Pinus taeda L.) megagenome revealed through sequence annotation. Genetics (2014) 1.32

Genome sequence of the dioxin-mineralizing bacterium Sphingomonas wittichii RW1. J Bacteriol (2010) 1.29

Computational gene finding in plants. Plant Mol Biol (2002) 1.29

Clustering metagenomic sequences with interpolated Markov models. BMC Bioinformatics (2010) 1.28

The COMBREX project: design, methodology, and initial results. PLoS Biol (2013) 1.24

Insignia: a DNA signature search web server for diagnostic assay development. Nucleic Acids Res (2009) 1.23

Acquisition and evolution of plant pathogenesis-associated gene clusters and candidate determinants of tissue-specificity in Xanthomonas. PLoS One (2008) 1.22

A new rhesus macaque assembly and annotation for next-generation sequencing analyses. Biol Direct (2014) 1.20

Gene prediction with Glimmer for metagenomic sequences augmented by classification and clustering. Nucleic Acids Res (2011) 1.17

Probing the pan-genome of Listeria monocytogenes: new insights into intraspecific niche expansion and genomic diversification. BMC Genomics (2010) 1.14

A computational survey of candidate exonic splicing enhancer motifs in the model plant Arabidopsis thaliana. BMC Bioinformatics (2007) 1.13

Do-it-yourself genetic testing. Genome Biol (2010) 1.06

Efficient decoding algorithms for generalized hidden Markov model gene finders. BMC Bioinformatics (2005) 1.05

Contamination in the draft of the human genome masquerades as lateral gene transfer. DNA Seq (2002) 1.00