Fast and accurate short read alignment with Burrows-Wheeler transform.

PubWeight™: 190.94 | Rank: Top 0.01% | All-Time Top 100

PMCID: PMC2705234

Published in Bioinformatics on May 18, 2009

Authors

Heng Li¹, Richard Durbin

Author Affiliations

1: Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK.

Associated clinical trials

Mitochondrial DNA and Nuclear SNPs to Predict Severity of COVID-19 Infection (mtDNA-COVID) | NCT04750330

In Situ Clonal Heterogeneity in Prostatic Diagnostic Biopsies | NCT04873427

Articles citing this

(truncated to the top 100)

The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res (2010) 97.51

Fast gapped-read alignment with Bowtie 2. Nat Methods (2012) 83.79

A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet (2011) 59.36

Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics (2010) 52.01

Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics (2014) 44.23

A comprehensive catalogue of somatic mutations from a human cancer genome. Nature (2009) 24.27

A hexanucleotide repeat expansion in C9ORF72 is the cause of chromosome 9p21-linked ALS-FTD. Neuron (2011) 18.73

A survey of sequence alignment algorithms for next-generation sequencing. Brief Bioinform (2010) 18.05

Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics (2010) 17.53

Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol (2013) 16.13

Clinical whole-exome sequencing for the diagnosis of mendelian disorders. N Engl J Med (2013) 15.85

A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One (2011) 15.19

Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc Natl Acad Sci U S A (2009) 15.09

Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature (2012) 14.76

De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature (2012) 13.61

Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia. Nature (2011) 13.18

The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res (2009) 12.09

Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat Genet (2011) 11.94

Rate of de novo mutations and the importance of father's age to disease risk. Nature (2012) 11.92

TREM2 variants in Alzheimer's disease. N Engl J Med (2012) 11.35

Mutational processes molding the genomes of 21 breast cancers. Cell (2012) 11.22

Driver mutations in histone H3.3 and chromatin remodelling genes in paediatric glioblastoma. Nature (2012) 10.99

Toward almost closed genomes with GapFiller. Genome Biol (2012) 10.92

Integrative analysis of 111 reference human epigenomes. Nature (2015) 10.32

Advances in understanding cancer genomes through second-generation sequencing. Nat Rev Genet (2010) 10.15

A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics (2012) 9.97

MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res (2010) 9.80

Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma. Nature (2011) 9.71

De novo gene disruptions in children on the autistic spectrum. Neuron (2012) 9.69

De novo assembly and analysis of RNA-seq data. Nat Methods (2010) 9.69

Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res (2010) 9.55

Uncovering the roles of rare variants in common disease through whole-genome sequencing. Nat Rev Genet (2010) 9.53

Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat Rev Genet (2011) 9.26

Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biol (2011) 9.18

Accurate and comprehensive sequencing of personal genomes. Genome Res (2011) 8.99

Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature (2010) 8.99

An integrated pipeline for de novo assembly of microbial genomes. PLoS One (2012) 8.97

Target-enrichment strategies for next-generation sequencing. Nat Methods (2010) 8.78

Dindel: accurate indel calls from short-read data. Genome Res (2010) 8.62

Circular RNAs are a large class of animal RNAs with regulatory potency. Nature (2013) 8.54

Integrative analysis of the melanoma transcriptome. Genome Res (2010) 8.46

Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet (2011) 8.34

A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics (2011) 8.19

Inference of human population history from individual whole-genome sequences. Nature (2011) 8.05

AdapterRemoval: easy cleaning of next-generation sequencing reads. BMC Res Notes (2012) 7.94

A high-coverage genome sequence from an archaic Denisovan individual. Science (2012) 7.89

Melanoma genome sequencing reveals frequent PREX2 mutations. Nature (2012) 7.77

Cancer exome analysis reveals a T-cell-dependent mechanism of cancer immunoediting. Nature (2012) 7.76

BFAST: an alignment tool for large scale genome resequencing. PLoS One (2009) 7.48

Non-invasive analysis of acquired resistance to cancer therapy by sequencing of plasma DNA. Nature (2013) 7.42

Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nat Biotechnol (2012) 7.37

BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons. BMC Genomics (2011) 7.27

A physical, genetic and functional sequence assembly of the barley genome. Nature (2012) 7.25

Punctuated evolution of prostate cancer genomes. Cell (2013) 7.23

Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma. Nat Genet (2012) 7.00

Whole-genome sequencing of multiple Arabidopsis thaliana populations. Nat Genet (2011) 6.59

Exome sequencing identifies recurrent mutations of the splicing factor SF3B1 gene in chronic lymphocytic leukemia. Nat Genet (2011) 6.43

Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS Comput Biol (2012) 6.27

Whole-exome sequencing identifies recessive WDR62 mutations in severe brain malformations. Nature (2010) 6.26

Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM). Bioinformatics (2011) 6.20

Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinformatics (2012) 6.16

Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biol (2010) 6.07

Efficient de novo assembly of large genomes using compressed data structures. Genome Res (2011) 6.05

A polygenic burden of rare disruptive mutations in schizophrenia. Nature (2014) 5.99

Cpf1 Is a Single RNA-Guided Endonuclease of a Class 2 CRISPR-Cas System. Cell (2015) 5.97

The contribution of de novo coding mutations to autism spectrum disorder. Nature (2014) 5.94

Differential oestrogen receptor binding is associated with clinical outcome in breast cancer. Nature (2012) 5.84

miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Res (2011) 5.83

Performance comparison of whole-genome sequencing platforms. Nat Biotechnol (2011) 5.79

From RNA-seq reads to differential expression results. Genome Biol (2010) 5.77

Frequent mutations of chromatin remodeling genes in transitional cell carcinoma of the bladder. Nat Genet (2011) 5.73

Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics (2009) 5.62

Efficient storage of high throughput DNA sequencing data using reference-based compression. Genome Res (2011) 5.60

CREST maps somatic structural variation in cancer genomes with base-pair resolution. Nat Methods (2011) 5.34

Haplotype-resolved genome sequencing of a Gujarati Indian individual. Nat Biotechnol (2010) 5.32

High-throughput detection of actionable genomic alterations in clinical tumor samples by targeted, massively parallel sequencing. Cancer Discov (2011) 5.30

The oyster genome reveals stress adaptation and complexity of shell formation. Nature (2012) 5.30

Integrative genome analyses identify key somatic driver mutations of small-cell lung cancer. Nat Genet (2012) 5.28

Medulloblastoma exome sequencing uncovers subtype-specific somatic mutations. Nature (2012) 5.23

Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature (2014) 5.22

Recurrent R-spondin fusions in colon cancer. Nature (2012) 5.10

An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat Med (2014) 4.98

Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems. Genome Biol (2011) 4.98

A public genome-scale lentiviral expression library of human ORFs. Nat Methods (2011) 4.98

Resistance mechanisms for the Bruton's tyrosine kinase inhibitor ibrutinib. N Engl J Med (2014) 4.98

Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res (2012) 4.94

Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One (2014) 4.94

Whole-genome sequencing of liver cancers identifies etiological influences on mutation patterns and recurrent mutations in chromatin regulators. Nat Genet (2012) 4.84

Exome sequencing of gastric adenocarcinoma identifies recurrent somatic mutations in cell adhesion and chromatin remodeling genes. Nat Genet (2012) 4.82

Modularly assembled designer TAL effector nucleases for targeted gene knockout and gene replacement in eukaryotes. Nucleic Acids Res (2011) 4.75

Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR-guide RNA library. Nat Biotechnol (2013) 4.72

The genetic landscape of high-risk neuroblastoma. Nat Genet (2013) 4.71

Population genetics of Vibrio cholerae from Nepal in 2010: evidence on the origin of the Haitian outbreak. MBio (2011) 4.70

Comprehensive genomic analysis identifies SOX2 as a frequently amplified gene in small-cell lung cancer. Nat Genet (2012) 4.70

Exome sequencing supports a de novo mutational paradigm for schizophrenia. Nat Genet (2011) 4.67

Integrated molecular analysis of clear-cell renal cell carcinoma. Nat Genet (2013) 4.61

Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLoS One (2012) 4.52

De novo germline and postzygotic mutations in AKT3, PIK3R2 and PIK3CA cause a spectrum of related megalencephaly syndromes. Nat Genet (2012) 4.51

The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Res (2013) 4.45

Sense from sequence reads: methods for alignment and assembly. Nat Methods (2009) 4.44

Articles by these authors

The Sequence Alignment/Map format and SAMtools. Bioinformatics (2009) 232.39

Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res (2008) 157.44

Accurate whole human genome sequencing using reversible terminator chemistry. Nature (2008) 90.20

The Pfam protein families database. Nucleic Acids Res (2004) 56.46

Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics (2010) 52.01

The Pfam protein families database. Nucleic Acids Res (2002) 51.34

The diploid genome sequence of an Asian individual. Nature (2008) 46.29

Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat Genet (2008) 43.63

Pfam: clans, web tools and services. Nucleic Acids Res (2006) 34.83

Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature (2003) 26.58

The variant call format and VCFtools. Bioinformatics (2011) 25.88

The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res (2003) 24.72

Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature (2004) 24.40

A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nat Biotechnol (2008) 21.72

The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol (2005) 18.20

GeneWise and Genomewise. Genome Res (2004) 17.87

InterPro, progress and status in 2005. Nucleic Acids Res (2005) 17.53

A large genome center's improvements to the Illumina sequencing system. Nat Methods (2008) 15.56

The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res (2009) 14.90

Ensembl 2011. Nucleic Acids Res (2010) 14.68

Ensembl 2012. Nucleic Acids Res (2011) 14.55

The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics. PLoS Biol (2003) 13.32

EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res (2008) 12.72

Prepublication data sharing. Nature (2009) 12.24

Population genomics of domestic and wild yeasts. Nature (2009) 11.79

Ensembl's 10th year. Nucleic Acids Res (2009) 10.82

Mouse genomic variation and its effect on phenotypes and gene regulation. Nature (2011) 10.66

An overview of Ensembl. Genome Res (2004) 10.35

Phenotypic profiling of the human genome by time-lapse microscopy reveals cell division genes. Nature (2010) 9.95

QuickTree: building huge Neighbour-Joining trees of protein sequences. Bioinformatics (2002) 9.36

TreeFam: a curated database of phylogenetic trees of animal gene families. Nucleic Acids Res (2006) 8.83

Dindel: accurate indel calls from short-read data. Genome Res (2010) 8.62

Assemblathon 1: a competitive assessment of de novo short read assembly methods. Genome Res (2011) 8.38

Inference of human population history from individual whole-genome sequences. Nature (2011) 8.05

The DNA sequence of the human X chromosome. Nature (2005) 6.97

Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat Genet (2012) 6.97

WormBase: better software, richer content. Nucleic Acids Res (2006) 6.78

TreeFam: 2008 Update. Nucleic Acids Res (2007) 6.63

BAC TransgeneOmics: a high-throughput method for exploration of protein function in mammals. Nat Methods (2008) 6.51

Efficient de novo assembly of large genomes using compressed data structures. Genome Res (2011) 6.05

Enhanced protein domain discovery by using language modeling techniques from speech recognition. Proc Natl Acad Sci U S A (2003) 6.01

The architecture of gene regulatory variation across multiple human tissues: the MuTHER study. PLoS Genet (2011) 5.95

Patterns of cis regulatory variation in diverse human populations. PLoS Genet (2012) 5.28

WormBase: a cross-species database for comparative genomics. Nucleic Acids Res (2003) 5.28

WormBase: a comprehensive resource for nematode research. Nucleic Acids Res (2009) 5.20

Systematic analysis of human protein complexes identifies chromosome segregation proteins. Science (2010) 4.69

WormBase: a multi-species resource for nematode biology and genomics. Nucleic Acids Res (2004) 4.14

Efficient construction of an assembly string graph using the FM-index. Bioinformatics (2010) 4.13

Insights into hominid evolution from the gorilla genome sequence. Nature (2012) 4.12

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience (2013) 4.11

Mapping trait loci by use of inferred ancestral recombination graphs. Am J Hum Genet (2006) 4.01

WormBase: new content and better access. Nucleic Acids Res (2006) 4.01

The UK10K project identifies rare variants in health and disease. Nature (2015) 3.89

WormBase 2012: more genomes, more data, new website. Nucleic Acids Res (2011) 3.87

WormBase: a comprehensive data resource for Caenorhabditis biology and genomics. Nucleic Acids Res (2005) 3.82

WormBase 2007. Nucleic Acids Res (2007) 3.69

Revising the human mutation rate: implications for understanding human evolution. Nat Rev Genet (2012) 3.56

A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLoS Comput Biol (2010) 3.23

Comparative ab initio prediction of gene structures using pair HMMs. Bioinformatics (2002) 2.99

SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples. Genome Res (2010) 2.97

InterPro: an integrated documentation resource for protein families, domains and functional sites. Brief Bioinform (2002) 2.66

Trait variation in yeast is defined by population history. PLoS Genet (2011) 2.50

Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res (2011) 2.42

GAZE: a generic framework for the integration of gene-prediction data by dynamic programming. Genome Res (2002) 2.20

Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat Protoc (2012) 2.14

A high-definition view of functional genetic variation from natural yeast genomes. Mol Biol Evol (2014) 2.10

Enhanced protein domain discovery using taxonomy. BMC Bioinformatics (2004) 1.81

High-resolution mapping of complex traits with a four-parent advanced intercross yeast population. Genetics (2013) 1.72

High levels of RNA-editing site conservation amongst 15 laboratory mouse strains. Genome Biol (2012) 1.68

The anatomy of successful computational biology software. Nat Biotechnol (2013) 1.68

Clustering of phosphorylation site recognition motifs can be exploited to predict the targets of cyclin-dependent kinase. Genome Biol (2007) 1.63

A probabilistic model of 3' end formation in Caenorhabditis elegans. Nucleic Acids Res (2004) 1.60

Copy number variant detection in inbred strains from short read sequence data. Bioinformatics (2009) 1.45

Gene structure conservation aids similarity based gene prediction. Nucleic Acids Res (2004) 1.43

Deep short-read sequencing of chromosome 17 from the mouse strains A/J and CAST/Ei identifies significant germline variation and candidate genes that regulate liver triglyceride levels. Genome Biol (2009) 1.43

Gene expression changes with age in skin, adipose tissue, blood and brain. Genome Biol (2013) 1.34

Regulatory evolution in proteins by turnover and lineage-specific changes of cyclin-dependent kinase consensus sites. Proc Natl Acad Sci U S A (2007) 1.33

A systematic comparative and structural analysis of protein phosphorylation sites based on the mtcPTM database. Genome Biol (2007) 1.29

Extent, causes, and consequences of small RNA expression variation in human adipose tissue. PLoS Genet (2012) 1.28

Joint genetic analysis of gene expression data with inferred cellular phenotypes. PLoS Genet (2011) 1.26

Extending reference assembly models. Genome Biol (2015) 1.06

Identity-by-descent-based phasing and imputation in founder populations using graphical models. Genet Epidemiol (2011) 1.05

Genomix: a method for combining gene-finders' predictions, which uses evolutionary conservation of sequence and intron-exon structure. Bioinformatics (2007) 0.99

Vertebrate gene finding from multiple-species alignments using a two-level strategy. Genome Biol (2006) 0.99

WormBase: Annotating many nematode genomes. Worm (2012) 0.91

A genome-wide survey of genetic variation in gorillas using reduced representation sequencing. PLoS One (2013) 0.78

A conserved sequence motif in 3' untranslated regions of ribosomal protein mRNAs in nematodes. RNA (2006) 0.78

Erratum: Whole-genome sequence-based analysis of thyroid function. Nat Commun (2015) 0.77

A table-driven, full-sensitivity similarity search algorithm. J Comput Biol (2003) 0.77

Inferring selection on amino acid preference in protein domains. Mol Biol Evol (2008) 0.77

[X]uniqMAP: unique gene sequence regions in the human and mouse genomes. BMC Genomics (2006) 0.75

Correction: Quantitative genetics of CTCF binding reveal local sequence effects and different modes of X-chromosome association. PLoS Genet (2015) 0.75

Corrigendum: Common genetic variation drives molecular heterogeneity in human iPSCs. Nature (2017) 0.75

Erratum: A rare variant in APOC3 is associated with plasma triglyceride and VLDL levels in Europeans. Nat Commun (2015) 0.75