Mapping short DNA sequencing reads and calling variants using mapping quality scores.

PubWeight™: 157.44 | Rank: Top 0.01% | All-Time Top 100

PMCID: PMC2577856

Published in Genome Res on August 19, 2008

Authors

Heng Li (1), Jue Ruan, Richard Durbin

Author Affiliations

1: The Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom.

Articles citing this

(truncated to the top 100)

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol (2009) 235.12

The Sequence Alignment/Map format and SAMtools. Bioinformatics (2009) 232.39

Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (2009) 190.94

The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res (2010) 97.51

Accurate whole human genome sequencing using reversible terminator chemistry. Nature (2008) 90.20

TopHat: discovering splice junctions with RNA-Seq. Bioinformatics (2009) 81.13

A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet (2011) 59.36

RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet (2009) 58.77

Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics (2010) 52.01

ABySS: a parallel assembler for short read sequence data. Genome Res (2009) 43.20

Sequencing technologies - the next generation. Nat Rev Genet (2009) 40.57

DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature (2008) 38.13

Targeted capture and massively parallel sequencing of 12 human exomes. Nature (2009) 33.96

Recurring mutations found by sequencing an acute myeloid leukemia genome. N Engl J Med (2009) 33.09

Exome sequencing identifies the cause of a mendelian disorder. Nat Genet (2009) 32.06

MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol (2010) 26.41

A comprehensive catalogue of somatic mutations from a human cancer genome. Nature (2009) 24.27

CloudBurst: highly sensitive read mapping with MapReduce. Bioinformatics (2009) 20.45

Genome remodelling in a basal-like breast cancer metastasis and xenograft. Nature (2010) 19.68

Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet (2011) 18.88

BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat Methods (2009) 18.41

Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature (2009) 18.08

A survey of sequence alignment algorithms for next-generation sequencing. Brief Bioinform (2010) 18.05

Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics (2010) 17.53

Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature (2010) 16.86

Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell (2011) 16.72

VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics (2009) 16.04

SNP detection for massively parallel whole-genome resequencing. Genome Res (2009) 15.96

Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc Natl Acad Sci U S A (2009) 15.09

Transcriptome genetics using second generation sequencing in a Caucasian population. Nature (2010) 14.85

Complex landscapes of somatic rearrangement in human breast cancer genomes. Nature (2009) 13.45

ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet (2009) 13.12

Quake: quality-aware detection and correction of sequencing errors. Genome Biol (2010) 12.52

The patterns and dynamics of genomic instability in metastatic pancreatic cancer. Nature (2010) 12.43

The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res (2009) 12.09

Personalized copy number and segmental duplication maps using next-generation sequencing. Nat Genet (2009) 11.73

Genome evolution and adaptation in a long-term experiment with Escherichia coli. Nature (2009) 11.36

PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat Biotechnol (2009) 11.28

Somatic mutations altering EZH2 (Tyr641) in follicular and diffuse large B-cell lymphomas of germinal-center origin. Nat Genet (2010) 10.97

Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes. Nat Methods (2009) 10.41

MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res (2010) 9.80

Evaluation of next generation sequencing platforms for population targeted sequencing studies. Genome Biol (2009) 9.59

Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res (2010) 9.55

Complete Khoisan and Bantu genomes from southern Africa. Nature (2010) 9.06

Target-enrichment strategies for next-generation sequencing. Nat Methods (2010) 8.78

ShortRead: a bioconductor package for input, quality assessment and exploration of high-throughput sequence data. Bioinformatics (2009) 8.69

Dindel: accurate indel calls from short-read data. Genome Res (2010) 8.62

Assembly algorithms for next-generation sequencing data. Genomics (2010) 8.56

Sequencing of natural strains of Arabidopsis thaliana with short reads. Genome Res (2008) 8.44

Metagenomic microbial community profiling using unique clade-specific marker genes. Nat Methods (2012) 8.37

Single-molecule sequencing of an individual human genome. Nat Biotechnol (2009) 8.35

Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet (2011) 8.34

Reference Maps of human ES and iPS cell variation enable high-throughput characterization of pluripotent cell lines. Cell (2011) 8.21

Software for computing and annotating genomic ranges. PLoS Comput Biol (2013) 8.20

A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics (2011) 8.19

The first Korean genome sequence and analysis: full genome sequencing for a socio-ethnic group. Genome Res (2009) 7.87

Annotating genomes with massive-scale RNA sequencing. Genome Biol (2008) 7.73

BFAST: an alignment tool for large scale genome resequencing. PLoS One (2009) 7.48

Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Res (2009) 7.42

CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res (2011) 6.97

Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat Genet (2012) 6.97

Sequence-specific error profile of Illumina sequencers. Nucleic Acids Res (2011) 6.88

A map of open chromatin in human pancreatic islets. Nat Genet (2010) 6.75

Discovery and genotyping of genome structural polymorphism by sequencing on a population scale. Nat Genet (2011) 6.67

How to map billions of short reads onto genomes. Nat Biotechnol (2009) 6.59

RazerS--fast read mapping with sensitivity control. Genome Res (2009) 6.53

Savant: genome browser for high-throughput sequencing data. Bioinformatics (2010) 6.51

DNA methylation patterns associate with genetic and gene expression variation in HapMap cell lines. Genome Biol (2011) 6.48

Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. Genome Res (2009) 6.42

Massively parallel exon capture and library-free resequencing across 16 genomes. Nat Methods (2009) 6.36

Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS Comput Biol (2012) 6.27

Whole-exome sequencing identifies recessive WDR62 mutations in severe brain malformations. Nature (2010) 6.26

Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinformatics (2012) 6.16

Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing. Proc Natl Acad Sci U S A (2009) 6.01

Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nat Genet (2011) 5.97

Quantitative comparison of genome-wide DNA methylation mapping technologies. Nat Biotechnol (2010) 5.96

The transcriptional and epigenomic foundations of ground state pluripotency. Cell (2012) 5.87

Charting a dynamic DNA methylation landscape of the human genome. Nature (2013) 5.80

From RNA-seq reads to differential expression results. Genome Biol (2010) 5.77

High-throughput genotyping by whole-genome resequencing. Genome Res (2009) 5.74

Frequent mutations of chromatin remodeling genes in transitional cell carcinoma of the bladder. Nat Genet (2011) 5.73

Human Y chromosome base-substitution mutation rate measured by direct sequencing in a deep-rooting pedigree. Curr Biol (2009) 5.70

BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics (2009) 5.67

Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics (2009) 5.62

SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics (2011) 5.62

De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat Genet (2012) 5.61

Detection of inherited mutations for breast and ovarian cancer using genomic capture and massively parallel sequencing. Proc Natl Acad Sci U S A (2010) 5.60

Deep resequencing of GWAS loci identifies independent rare variants associated with inflammatory bowel disease. Nat Genet (2011) 5.58

Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nat Rev Genet (2011) 5.58

CREST maps somatic structural variation in cancer genomes with base-pair resolution. Nat Methods (2011) 5.34

Low-coverage sequencing: implications for design of complex trait association studies. Genome Res (2011) 5.34

Integrative genome analyses identify key somatic driver mutations of small-cell lung cancer. Nat Genet (2012) 5.28

Design of a high density SNP genotyping assay in the pig using SNPs identified and characterized by next generation sequencing technology. PLoS One (2009) 5.26

Quantitative phenotyping via deep barcode sequencing. Genome Res (2009) 5.13

A scalable, fully automated process for construction of sequence-ready human exome targeted capture libraries. Genome Biol (2011) 5.11

Whole exome sequencing and homozygosity mapping identify mutation in the cell polarity protein GPSM2 as the cause of nonsyndromic hearing loss DFNB82. Am J Hum Genet (2010) 4.95

Human T cell epitopes of Mycobacterium tuberculosis are evolutionarily hyperconserved. Nat Genet (2010) 4.94

Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res (2012) 4.94

Tn-seq: high-throughput parallel sequencing for fitness and genetic interaction studies in microorganisms. Nat Methods (2009) 4.91

Unlocking short read sequencing for metagenomics. PLoS One (2010) 4.88

Articles cited by this

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 665.31

Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res (2008) 151.16

Genome sequencing in microfabricated high-density picolitre reactors. Nature (2005) 150.21

Identification of common molecular subsequences. J Mol Biol (1981) 130.53

BLAT--the BLAST-like alignment tool. Genome Res (2002) 126.78

Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res (1998) 106.16

Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res (1998) 96.63

High-resolution profiling of histone methylations in the human genome. Cell (2007) 85.74

Genome-wide mapping of in vivo protein-DNA interactions. Science (2007) 64.92

CAP3: A DNA sequence assembly program. Genome Res (1999) 50.04

SSAHA: a fast search method for large DNA databases. Genome Res (2001) 48.64

Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods (2007) 45.04

Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat Genet (2008) 43.63

PatternHunter: faster and more sensitive homology search. Bioinformatics (2002) 35.65

Human-mouse alignments with BLASTZ. Genome Res (2003) 35.49

Whole-genome re-sequencing. Curr Opin Genet Dev (2006) 35.24

Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics (2002) 34.50

Whole-genome sequencing and variant discovery in C. elegans. Nat Methods (2008) 31.92

A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nat Biotechnol (2008) 21.72

GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics (2005) 21.04

An SNP map of the human genome generated by reduced representation shotgun sequencing. Nature (2000) 19.19

A general approach to single-nucleotide polymorphism discovery. Nat Genet (1999) 13.39

SNPdetector: a software tool for sensitive and accurate SNP detection. PLoS Comput Biol (2005) 10.04

Automating sequence-based detection and genotyping of SNPs from diploid samples. Nat Genet (2006) 9.04

Comparison of genome degradation in Paratyphi A and Typhi, human-restricted serovars of Salmonella enterica that cause typhoid. Nat Genet (2004) 8.60

novoSNP, a novel computational tool for sequence variation discovery. Genome Res (2005) 7.42

Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics (2001) 7.12

Multidrug-resistant Salmonella enterica serovar paratyphi A harbors IncHI1 plasmids similar to those found in serovar typhi. J Bacteriol (2007) 5.88

Base qualities help sequencing software. Genome Res (1998) 5.70

Articles by these authors

The Sequence Alignment/Map format and SAMtools. Bioinformatics (2009) 232.39

Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (2009) 190.94

Accurate whole human genome sequencing using reversible terminator chemistry. Nature (2008) 90.20

The Pfam protein families database. Nucleic Acids Res (2004) 56.46

Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics (2010) 52.01

The Pfam protein families database. Nucleic Acids Res (2002) 51.34

The diploid genome sequence of an Asian individual. Nature (2008) 46.29

De novo assembly of human genomes with massively parallel short read sequencing. Genome Res (2009) 45.91

Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat Genet (2008) 43.63

Pfam: clans, web tools and services. Nucleic Acids Res (2006) 34.83

Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature (2003) 26.58

The variant call format and VCFtools. Bioinformatics (2011) 25.88

The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res (2003) 24.72

Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature (2004) 24.40

A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nat Biotechnol (2008) 21.72

The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol (2005) 18.20

GeneWise and Genomewise. Genome Res (2004) 17.87

InterPro, progress and status in 2005. Nucleic Acids Res (2005) 17.53

The sequence and de novo assembly of the giant panda genome. Nature (2009) 15.76

A large genome center's improvements to the Illumina sequencing system. Nat Methods (2008) 15.56

The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res (2009) 14.90

Ensembl 2011. Nucleic Acids Res (2010) 14.68

Ensembl 2012. Nucleic Acids Res (2011) 14.55

The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics. PLoS Biol (2003) 13.32

EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res (2008) 12.72

Prepublication data sharing. Nature (2009) 12.24

Population genomics of domestic and wild yeasts. Nature (2009) 11.79

Ensembl's 10th year. Nucleic Acids Res (2009) 10.82

Mouse genomic variation and its effect on phenotypes and gene regulation. Nature (2011) 10.66

An overview of Ensembl. Genome Res (2004) 10.35

Phenotypic profiling of the human genome by time-lapse microscopy reveals cell division genes. Nature (2010) 9.95

QuickTree: building huge Neighbour-Joining trees of protein sequences. Bioinformatics (2002) 9.36

TreeFam: a curated database of phylogenetic trees of animal gene families. Nucleic Acids Res (2006) 8.83

Dindel: accurate indel calls from short-read data. Genome Res (2010) 8.62

Assemblathon 1: a competitive assessment of de novo short read assembly methods. Genome Res (2011) 8.38

The genome of the cucumber, Cucumis sativus L. Nat Genet (2009) 8.19

Inference of human population history from individual whole-genome sequences. Nature (2011) 8.05

The DNA sequence of the human X chromosome. Nature (2005) 6.97

Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat Genet (2012) 6.97

WormBase: better software, richer content. Nucleic Acids Res (2006) 6.78

TreeFam: 2008 Update. Nucleic Acids Res (2007) 6.63

BAC TransgeneOmics: a high-throughput method for exploration of protein function in mammals. Nat Methods (2008) 6.51

Efficient de novo assembly of large genomes using compressed data structures. Genome Res (2011) 6.05

Enhanced protein domain discovery by using language modeling techniques from speech recognition. Proc Natl Acad Sci U S A (2003) 6.01

The architecture of gene regulatory variation across multiple human tissues: the MuTHER study. PLoS Genet (2011) 5.95

Patterns of cis regulatory variation in diverse human populations. PLoS Genet (2012) 5.28

WormBase: a cross-species database for comparative genomics. Nucleic Acids Res (2003) 5.28

A genetic variation map for chicken with 2.8 million single-nucleotide polymorphisms. Nature (2004) 5.24

WormBase: a comprehensive resource for nematode research. Nucleic Acids Res (2009) 5.20

Systematic analysis of human protein complexes identifies chromosome segregation proteins. Science (2010) 4.69

WormBase: a multi-species resource for nematode biology and genomics. Nucleic Acids Res (2004) 4.14

Efficient construction of an assembly string graph using the FM-index. Bioinformatics (2010) 4.13

Insights into hominid evolution from the gorilla genome sequence. Nature (2012) 4.12

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience (2013) 4.11

Mapping trait loci by use of inferred ancestral recombination graphs. Am J Hum Genet (2006) 4.01

WormBase: new content and better access. Nucleic Acids Res (2006) 4.01

The UK10K project identifies rare variants in health and disease. Nature (2015) 3.89

WormBase 2012: more genomes, more data, new website. Nucleic Acids Res (2011) 3.87

WormBase: a comprehensive data resource for Caenorhabditis biology and genomics. Nucleic Acids Res (2005) 3.82

Rapid growth of a hepatocellular carcinoma and the driving mutations revealed by cell-population genetic analysis of whole-genome data. Proc Natl Acad Sci U S A (2011) 3.80

WormBase 2007. Nucleic Acids Res (2007) 3.69

Revising the human mutation rate: implications for understanding human evolution. Nat Rev Genet (2012) 3.56

A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLoS Comput Biol (2010) 3.23

Comparative ab initio prediction of gene structures using pair HMMs. Bioinformatics (2002) 2.99

SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples. Genome Res (2010) 2.97

InterPro: an integrated documentation resource for protein families, domains and functional sites. Brief Bioinform (2002) 2.66

Trait variation in yeast is defined by population history. PLoS Genet (2011) 2.50

SilkDB: a knowledgebase for silkworm biology and genomics. Nucleic Acids Res (2005) 2.50

Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res (2011) 2.42

GAZE: a generic framework for the integration of gene-prediction data by dynamic programming. Genome Res (2002) 2.20

Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat Protoc (2012) 2.14

A high-definition view of functional genetic variation from natural yeast genomes. Mol Biol Evol (2014) 2.10

Enhanced protein domain discovery using taxonomy. BMC Bioinformatics (2004) 1.81

High-resolution mapping of complex traits with a four-parent advanced intercross yeast population. Genetics (2013) 1.72

High levels of RNA-editing site conservation amongst 15 laboratory mouse strains. Genome Biol (2012) 1.68

The anatomy of successful computational biology software. Nat Biotechnol (2013) 1.68

Clustering of phosphorylation site recognition motifs can be exploited to predict the targets of cyclin-dependent kinase. Genome Biol (2007) 1.63

A probabilistic model of 3' end formation in Caenorhabditis elegans. Nucleic Acids Res (2004) 1.60

ChickVD: a sequence variation database for the chicken genome. Nucleic Acids Res (2005) 1.45

Copy number variant detection in inbred strains from short read sequence data. Bioinformatics (2009) 1.45

Gene structure conservation aids similarity based gene prediction. Nucleic Acids Res (2004) 1.43

Deep short-read sequencing of chromosome 17 from the mouse strains A/J and CAST/Ei identifies significant germline variation and candidate genes that regulate liver triglyceride levels. Genome Biol (2009) 1.43

Gene expression changes with age in skin, adipose tissue, blood and brain. Genome Biol (2013) 1.34

Regulatory evolution in proteins by turnover and lineage-specific changes of cyclin-dependent kinase consensus sites. Proc Natl Acad Sci U S A (2007) 1.33

A systematic comparative and structural analysis of protein phosphorylation sites based on the mtcPTM database. Genome Biol (2007) 1.29

Extent, causes, and consequences of small RNA expression variation in human adipose tissue. PLoS Genet (2012) 1.28

Joint genetic analysis of gene expression data with inferred cellular phenotypes. PLoS Genet (2011) 1.26

Extending reference assembly models. Genome Biol (2015) 1.06

Identity-by-descent-based phasing and imputation in founder populations using graphical models. Genet Epidemiol (2011) 1.05

Genomix: a method for combining gene-finders' predictions, which uses evolutionary conservation of sequence and intron-exon structure. Bioinformatics (2007) 0.99

Vertebrate gene finding from multiple-species alignments using a two-level strategy. Genome Biol (2006) 0.99

Deciphering neo-sex and B chromosome evolution by the draft genome of Drosophila albomicans. BMC Genomics (2012) 0.98

Rainbow: an integrated tool for efficient clustering and assembling RAD-seq reads. Bioinformatics (2012) 0.96

WormBase: Annotating many nematode genomes. Worm (2012) 0.91

Pseudo-Sanger sequencing: massively parallel production of long and near error-free reads using NGS technology. BMC Genomics (2013) 0.79

A conserved sequence motif in 3' untranslated regions of ribosomal protein mRNAs in nematodes. RNA (2006) 0.78

Reference-free comparative genomics of 174 chloroplasts. PLoS One (2012) 0.78