Substantial biases in ultra-short read data sets from high-throughput DNA sequencing.

PubWeight™: 26.36 | Rank: Top 0.01% | All-Time Top 10000

Article: PMC 2532726

Published in Nucleic Acids Res on July 26, 2008

Authors

Juliane C Dohm (1), Claudio Lottaz, Tatiana Borodina, Heinz Himmelbauer

Author Affiliations

1: Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, 14195 Berlin, Germany.

Articles citing this

(truncated to the top 100)

A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet (2011) 59.36

Sequencing technologies - the next generation. Nat Rev Genet (2009) 40.57

Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat Biotechnol (2009) 27.17

Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics (2010) 19.86

A draft sequence of the Neandertal genome. Science (2010) 19.55

High-resolution mapping of copy-number alterations with massively parallel sequencing. Nat Methods (2008) 12.56

Quake: quality-aware detection and correction of sequencing errors. Genome Biol (2010) 12.52

RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics (2009) 10.74

Principles and challenges of genomewide DNA methylation analysis. Nat Rev Genet (2010) 10.71

Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes. Nat Methods (2009) 10.41

Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat Biotechnol (2012) 10.31

Searching for SNPs with cloud computing. Genome Biol (2009) 10.12

Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms. Nucleic Acids Res (2008) 10.10

Evaluation of next generation sequencing platforms for population targeted sequencing studies. Genome Biol (2009) 9.59

SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data. BMC Bioinformatics (2010) 9.47

Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biol (2011) 9.18

ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res (2012) 9.13

Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Res (2010) 9.08

RNA sequencing: advances, challenges and opportunities. Nat Rev Genet (2010) 8.96

Assembly algorithms for next-generation sequencing data. Genomics (2010) 8.56

Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood. Proc Natl Acad Sci U S A (2008) 8.38

Computational methods for discovering structural variation with next-generation sequencing. Nat Methods (2009) 7.20

Performance comparison of exome DNA sequencing technologies. Nat Biotechnol (2011) 7.11

Sequence-specific error profile of Illumina sequencers. Nucleic Acids Res (2011) 6.88

RazerS--fast read mapping with sensitivity control. Genome Res (2009) 6.53

Detection and quantification of rare mutations with massively parallel sequencing. Proc Natl Acad Sci U S A (2011) 6.29

Modeling non-uniformity in short-read rates in RNA-Seq data. Genome Biol (2010) 6.10

Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nat Genet (2011) 5.97

Assembly of large genomes using second-generation sequencing. Genome Res (2010) 5.94

Chromatin organization marks exon-intron structure. Nat Struct Mol Biol (2009) 5.90

High-throughput genotyping by whole-genome resequencing. Genome Res (2009) 5.74

Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis. PLoS Biol (2010) 5.39

Probabilistic base calling of Solexa sequencing data. BMC Bioinformatics (2008) 5.00

Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems. Genome Biol (2011) 4.98

Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res (2012) 4.94

Modelling and simulating generic RNA-Seq experiments with the flux simulator. Nucleic Acids Res (2012) 4.81

A SNP discovery method to assess variant allele probability from next-generation resequencing data. Genome Res (2009) 4.78

BS Seeker: precise mapping for bisulfite sequencing. BMC Bioinformatics (2010) 4.66

BIPES, a cost-effective high-throughput method for assessing microbial diversity. ISME J (2010) 4.52

Hybrid selection of discrete genomic intervals on custom-designed microarrays for massively parallel sequencing. Nat Protoc (2009) 4.45

Characterizing and measuring bias in sequence data. Genome Biol (2013) 4.39

Accuracy and quality assessment of 454 GS-FLX Titanium pyrosequencing. BMC Genomics (2011) 4.39

Fast mapping of short sequences with mismatches, insertions and deletions using index structures. PLoS Comput Biol (2009) 4.24

PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data. Genome Biol (2009) 4.18

GC-content normalization for RNA-Seq data. BMC Bioinformatics (2011) 3.89

Removing technical variability in RNA-seq data using conditional quantile normalization. Biostatistics (2012) 3.74

A survey of tools for variant analysis of next-generation genome sequencing data. Brief Bioinform (2013) 3.60

Identifying ChIP-seq enrichment using MACS. Nat Protoc (2012) 3.60

Synthetic spike-in standards for RNA-seq experiments. Genome Res (2011) 3.58

TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLoS One (2014) 3.45

Prediction of alternative isoforms from exon expression levels in RNA-Seq experiments. Nucleic Acids Res (2010) 3.36

Error correction of next-generation sequencing data and reliable estimation of HIV quasispecies. Nucleic Acids Res (2010) 3.31

Mapping the Burkholderia cenocepacia niche response via high-throughput sequencing. Proc Natl Acad Sci U S A (2009) 3.16

Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities. Genome Biol (2011) 3.09

G+C content dominates intrinsic nucleosome occupancy. BMC Bioinformatics (2009) 3.05

Genome-wide mutational diversity in an evolving population of Escherichia coli. Cold Spring Harb Symp Quant Biol (2009) 3.03

Identification and correction of systematic error in high-throughput sequence data. BMC Bioinformatics (2011) 2.98

Detecting heteroplasmy from high-throughput sequencing of complete human mitochondrial DNA genomes. Am J Hum Genet (2010) 2.98

A statistical method for the detection of variants from next-generation resequencing of DNA pools. Bioinformatics (2010) 2.94

Identifying differentially expressed transcripts from RNA-seq data with biological variation. Bioinformatics (2012) 2.90

The challenges of sequencing by synthesis. Nat Biotechnol (2009) 2.89

Accurate detection and genotyping of SNPs utilizing population sequencing data. Genome Res (2010) 2.84

Virtual terminator nucleotides for next-generation DNA sequencing. Nat Methods (2009) 2.83

ConDeTri--a content dependent read trimmer for Illumina data. PLoS One (2011) 2.78

Detecting copy number variation with mated short reads. Genome Res (2010) 2.75

PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals. PLoS One (2011) 2.75

RNA sequencing shows no dosage compensation of the active X-chromosome. Nat Genet (2010) 2.71

Model-based quality assessment and base-calling for second-generation sequencing data. Biometrics (2010) 2.70

Mutational and fitness landscapes of an RNA virus revealed through population sequencing. Nature (2013) 2.68

Impact of chromatin structures on DNA processing for genomic analyses. PLoS One (2009) 2.63

Digital RNA sequencing minimizes sequence-dependent bias and amplification noise with optimized single-molecule barcodes. Proc Natl Acad Sci U S A (2012) 2.59

rQuant.web: a tool for RNA-Seq-based transcript quantitation. Nucleic Acids Res (2010) 2.56

Systematic bias in high-throughput sequencing data and its correction by BEADS. Nucleic Acids Res (2011) 2.48

Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet (2016) 2.48

ZINBA integrates local covariates with DNA-seq data to identify broad and narrow regions of enrichment, even within amplified genomic regions. Genome Biol (2011) 2.47

High nucleosome occupancy is encoded at human regulatory sequences. PLoS One (2010) 2.43

cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate. Nucleic Acids Res (2012) 2.36

Efficient frequency-based de novo short-read clustering for error trimming in next-generation sequencing. Genome Res (2009) 2.36

Uncovering the complexity of transcriptomes with RNA-Seq. J Biomed Biotechnol (2010) 2.32

Balanced codon usage optimizes eukaryotic translational efficiency. PLoS Genet (2012) 2.32

Next-generation sequencing for cancer diagnostics: a practical perspective. Clin Biochem Rev (2011) 2.31

Systematic evaluation of factors influencing ChIP-seq fidelity. Nat Methods (2012) 2.31

PCR biases distort bacterial and archaeal community structure in pyrosequencing datasets. PLoS One (2012) 2.31

Noninvasive prenatal diagnosis of fetal trisomy 18 and trisomy 13 by maternal plasma DNA sequencing. PLoS One (2011) 2.29

Chromatin profiling by directly sequencing small quantities of immunoprecipitated DNA. Nat Methods (2009) 2.26

GemSIM: general, error-model based simulator of next-generation sequencing data. BMC Genomics (2012) 2.21

High quality draft sequences for prokaryotic genomes using a mix of new sequencing technologies. BMC Genomics (2008) 2.21

Reciprocal intronic and exonic histone modification regions in humans. Nat Struct Mol Biol (2010) 2.20

Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes. BMC Genomics (2012) 2.20

Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing. BMC Genomics (2012) 2.19

Barcoding bias in high-throughput multiplex sequencing of miRNA. Genome Res (2011) 2.12

Sensitivity of noninvasive prenatal detection of fetal aneuploidy from maternal plasma using shotgun sequencing is limited only by counting statistics. PLoS One (2010) 2.11

Ultrasensitive detection of rare mutations using next-generation targeted resequencing. Nucleic Acids Res (2011) 2.11

Sparse linear modeling of next-generation mRNA sequencing (RNA-Seq) data for isoform discovery and abundance estimation. Proc Natl Acad Sci U S A (2011) 2.05

Rapid genomic characterization of the genus Vitis. PLoS One (2010) 2.04

Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform. Nucleic Acids Res (2015) 2.04

A powerful and flexible statistical framework for testing hypotheses of allele-specific gene expression from RNA-seq data. Genome Res (2011) 2.00

A metagenomic analysis of pandemic influenza A (2009 H1N1) infection in patients from North America. PLoS One (2010) 1.99

A myriad of miRNA variants in control and Huntington's disease brain regions detected by massively parallel sequencing. Nucleic Acids Res (2010) 1.99

Quantitative miRNA expression analysis: comparing microarrays with next-generation sequencing. RNA (2009) 1.96

Articles cited by this

DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A (1977) 790.54

Genome sequencing in microfabricated high-density picolitre reactors. Nature (2005) 150.21

High-resolution profiling of histone methylations in the human genome. Cell (2007) 85.74

Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods (2007) 45.04

Whole-genome sequencing and variant discovery in C. elegans. Nat Methods (2008) 31.92

Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature (2008) 26.78

Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biol (2007) 23.64

SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. Genome Res (2007) 16.20

Specific enzymatic amplification of DNA in vitro: the polymerase chain reaction. Cold Spring Harb Symp Quant Biol (1986) 11.66

Sampling the Arabidopsis transcriptome with massively parallel pyrosequencing. Plant Physiol (2007) 9.45

Gene discovery and annotation using LCM-454 transcriptome sequencing. Genome Res (2006) 7.10

454 sequencing put to the test using the complex genome of barley. BMC Genomics (2006) 6.12

An analysis of the feasibility of short read sequencing. Nucleic Acids Res (2005) 6.10

Polony multiplex analysis of gene expression (PMAGE) in mouse hypertrophic cardiomyopathy. Science (2007) 5.19

Who ate whom? Adaptive Helicobacter genomic changes that accompanied a host jump from early humans to large felines. PLoS Genet (2006) 4.65

Sequencing Medicago truncatula expressed sequenced tags using 454 Life Sciences technology. BMC Genomics (2006) 4.54

Multiplex sequencing of paired-end ditags (MS-PET): a strategy for the ultra-high-throughput analysis of transcriptomes and genomes. Nucleic Acids Res (2006) 4.51

Sequence biases in large scale gene expression profiling data. Nucleic Acids Res (2006) 3.83

PCR bias in amplification of androgen receptor alleles, a trinucleotide repeat marker used in clonality studies. Nucleic Acids Res (1995) 3.16

Identification and prevention of a GC content bias in SAGE libraries. Nucleic Acids Res (2001) 2.50

Articles by these authors

Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature (2004) 24.40

International network of cancer genome projects. Nature (2010) 20.35

SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. Genome Res (2007) 16.20

Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia. Nature (2011) 13.18

Transcriptome analysis by strand-specific sequencing of complementary DNA. Nucleic Acids Res (2009) 9.40

The Collaborative Cross, a community resource for the genetic analysis of complex traits. Nat Genet (2004) 9.37

Exome sequencing identifies recurrent mutations of the splicing factor SF3B1 gene in chronic lymphocytic leukemia. Nat Genet (2011) 6.43

Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems. Genome Biol (2011) 4.98

The nature and identification of quantitative trait loci: a community's view. Nat Rev Genet (2003) 3.96

An endoribonuclease-prepared siRNA screen in human cells identifies genes essential for cell division. Nature (2004) 3.69

Identification of a mutation in the extracellular domain of the Epidermal Growth Factor Receptor conferring cetuximab resistance in colorectal cancer. Nat Med (2012) 3.44

Comparison of gene expression profiles between human and mouse monocyte subsets. Blood (2009) 3.15

T cells become licensed in the lung to enter the central nervous system. Nature (2012) 2.45

Characterizing the mouse ES cell transcriptome with Illumina sequencing. Genomics (2008) 2.09

The role of a pseudo-response regulator gene in life cycle adaptation and domestication of beet. Curr Biol (2012) 1.94

Genetic analysis of the mouse brain proteome. Nat Genet (2002) 1.73

Strand-specific deep sequencing of the transcriptome. Genome Res (2010) 1.58

Architecture and evolution of a minute plant genome. Nature (2013) 1.57

The BTB and CNC homology 1 (BACH1) target genes are involved in the oxidative stress response and in control of the cell cycle. J Biol Chem (2011) 1.36

Gene-expression profiling identifies distinct subclasses of core binding factor acute myeloid leukemia. Blood (2007) 1.28

MEPD: a resource for medaka gene expression patterns. Bioinformatics (2005) 1.22

Maturation of mammalian H/ACA box snoRNAs: PAPD5-dependent adenylation and PARN-dependent trimming. RNA (2012) 1.22

Commonly altered genomic regions in acute myeloid leukemia are enriched for somatic mutations involved in chromatin remodeling and splicing. Blood (2012) 1.20

Haplotype divergence in Beta vulgaris and microsynteny with sequenced plant genomes. Plant J (2008) 1.20

The FunGenES database: a genomics resource for mouse embryonic stem cell differentiation. PLoS One (2009) 1.15

Integrated and sequence-ordered BAC- and YAC-based physical maps for the rat genome. Genome Res (2004) 1.11

Matrin 3 binds and stabilizes mRNA. PLoS One (2011) 1.11

The genomic sequence and comparative analysis of the rat major histocompatibility complex. Genome Res (2004) 1.11

ojoplano-mediated basal constriction is essential for optic cup morphogenesis. Development (2009) 1.07

Distinct organization of the candidate tumor suppressor gene RFP2 in human and mouse: multiple mRNA isoforms in both species- and human-specific antisense transcript RFP2OS. Gene (2003) 1.05

p53 Gene repair with zinc finger nucleases optimised by yeast 1-hybrid and validated by Solexa sequencing. PLoS One (2011) 1.04

Palaeohexaploid ancestry for Caryophyllales inferred from extensive gene-based physical and genetic mapping of the sugar beet genome (Beta vulgaris). Plant J (2012) 0.99

Comparative transcriptomics of early dipteran development. BMC Genomics (2013) 0.99

Microarray and deep sequencing cross-platform analysis of the mirRNome and isomiR variation in response to epidermal growth factor. BMC Genomics (2013) 0.97

LMX1B is essential for the maintenance of differentiated podocytes in adult kidneys. J Am Soc Nephrol (2013) 0.97

Role of medium- and short-chain L-3-hydroxyacyl-CoA dehydrogenase in the regulation of body weight and thermogenesis. Endocrinology (2011) 0.95

Neuronal functions, feeding behavior, and energy balance in Slc2a3+/- mice. Am J Physiol Endocrinol Metab (2008) 0.95

Expression of late cell cycle genes and an increased proliferative capacity characterize very early relapse of childhood acute lymphoblastic leukemia. Clin Cancer Res (2006) 0.93

Disruption and pseudoautosomal localization of the major histocompatibility complex in monotremes. Genome Biol (2007) 0.92

Conventional knockout of Tbc1d1 in mice impairs insulin- and AICAR-stimulated glucose uptake in skeletal muscle. Endocrinology (2013) 0.92

Lactate-modulated induction of THBS-1 activates transforming growth factor (TGF)-beta2 and migration of glioma cells in vitro. PLoS One (2013) 0.91

Mobilization and evolutionary history of miniature inverted-repeat transposable elements (MITEs) in Beta vulgaris L. Chromosome Res (2007) 0.88

Construction and characterization of a sugar beet (Beta vulgaris) fosmid library. Genome (2008) 0.88

Altered tissue distribution of 2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine-DNA adducts in mice transgenic for human sulfotransferases 1A1 and 1A2. Carcinogenesis (2011) 0.87

The DNA sequence of medaka chromosome LG22. Genomics (2006) 0.87

ReseqChip: automated integration of multiple local context probe data from the MitoChip array in mitochondrial DNA sequence assembly. BMC Bioinformatics (2009) 0.86

A proteomic method for the analysis of changes in protein concentrations in response to systemic perturbations using metabolic incorporation of stable isotopes and mass spectrometry. Proteomics (2005) 0.86

Epigenetic profiling of heterochromatic satellite DNA. Chromosoma (2011) 0.85

Comparative genomics of medaka and fugu. Comp Biochem Physiol Part D Genomics Proteomics (2006) 0.85

Proteomic shifts in embryonic stem cells with gene dose modifications suggest the presence of balancer proteins in protein regulatory networks. PLoS One (2007) 0.85

WDR55 is a nucleolar modulator of ribosomal RNA synthesis, cell cycle progression, and teleost organ development. PLoS Genet (2008) 0.85

Integrative normalization and comparative analysis for metabolic fingerprinting by comprehensive two-dimensional gas chromatography-time-of-flight mass spectrometry. Anal Chem (2009) 0.84

Neutrality, compensation, and negative selection during evolution of B-cell development transcriptomes. Mol Biol Evol (2007) 0.84

Physical mapping of the major histocompatibility complex class II and class III regions of the rat. Immunogenetics (2002) 0.83

Pronounced alterations of cellular metabolism and structure due to hyper- or hypo-osmosis. J Proteome Res (2008) 0.83

New members of the neurexin superfamily: multiple rodent homologues of the human CASPR5 gene. Mamm Genome (2006) 0.83

A first generation physical map of the medaka genome in BACs essential for positional cloning and clone-by-clone based genomic sequencing. Mech Dev (2004) 0.82

Differential expression patterns of non-symbiotic hemoglobins in sugar beet (Beta vulgaris ssp. vulgaris). Plant Cell Physiol (2014) 0.82

Multiple platform assessment of the EGF dependent transcriptome by microarray and deep tag sequencing analysis. BMC Genomics (2011) 0.82

Formation of hepatic DNA adducts by methyleugenol in mouse models: drastic decrease by Sult1a1 knockout and strong increase by transgenic human SULT1A1/2. Carcinogenesis (2013) 0.82

Controlled enzyme-catalyzed degradation of polymeric capsules templated on CaCO₃: influence of the number of LbL layers, conditions of degradation, and disassembly of multicompartments. J Control Release (2012) 0.80

Comparison of PCR-based mutation detection methods and application for identification of mouse Sult1a1 mutant embryonic stem cell clones using pooled templates. Hum Mutat (2005) 0.80

Mouse splice mutant generation from ENU-treated ES cells--a gene-driven approach. Genomics (2005) 0.80

Highly diverse chromoviruses of Beta vulgaris are classified by chromodomains and chromosomal integration. Mob DNA (2013) 0.79

A mouse translocation associated with Caspr5-2 disruption and perinatal lethality. Mamm Genome (2008) 0.79

Cloning of mouse ojoplano, a reticular cytoplasmic protein expressed during embryonic development. Gene Expr Patterns (2009) 0.78

Evolutionary reshuffling in the Errantivirus lineage Elbe within the Beta vulgaris genome. Plant J (2012) 0.78

Survey of sugar beet (Beta vulgaris L.) hAT transposons and MITE-like hATpin derivatives. Plant Mol Biol (2012) 0.78

Expression analysis of proline rich 15 (Prr15) in mouse and human gastrointestinal tumors. Mol Carcinog (2011) 0.77

High-throughput identification of genetic markers using representational oligonucleotide microarray analysis. Theor Appl Genet (2010) 0.77

Identification of mediator complex 26 (Crsp7) gametologs on platypus X1 and Y5 sex chromosomes: a candidate testis-determining gene in monotremes? Chromosome Res (2012) 0.77

The CHH motif in sugar beet satellite DNA: a modulator for cytosine methylation. Plant J (2014) 0.77

Fish genomes flying. Symposium on Medaka Genomics. EMBO Rep (2003) 0.77

Characterization of trimethylpsoralen as a mutagen for mouse embryonic stem cells. Mutat Res (2003) 0.77

Determination of sulfotransferase forms involved in the metabolic activation of the genotoxicant 1-hydroxymethylpyrene using bacterially expressed enzymes and genetically modified mouse models. Chem Res Toxicol (2014) 0.77

Induction and selection of Sox17-expressing endoderm cells generated from murine embryonic stem cells. Cells Tissues Organs (2011) 0.77

Profiling of extensively diversified plant LINEs reveals distinct plant-specific subclades. Plant J (2014) 0.77

Higher-order genome organization in platypus and chicken sperm and repositioning of sex chromosomes during mammalian evolution. Chromosoma (2008) 0.76

Cytosine methylation of an ancient satellite family in the wild beet Beta procumbens. Cytogenet Genome Res (2014) 0.76

Current status of medaka genetics and genomics. The Medaka Genome Initiative (MGI). Methods Cell Biol (2004) 0.75

The effect of high intensity ultrasound on the loading of Au nanoparticles into titanium dioxide. Ultrason Sonochem (2010) 0.75

Global transcriptomic analysis of murine embryonic stem cell-derived brachyury(+) (T) cells. Genes Cells (2010) 0.75