The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

PubWeight™: 97.51 | Rank: Top 0.01% | All-Time Top 1000

Article: PMC 2928508

Published in Genome Res on July 19, 2010

Authors

Aaron McKenna (1), Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, Andrew Kernytsky, Kiran Garimella, David Altshuler, Stacey Gabriel, Mark Daly, Mark A DePristo

Author Affiliations

1: Program in Medical and Population Genetics, The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA.

Articles citing this

(truncated to the top 100)

Fast gapped-read alignment with Bowtie 2. Nat Methods (2012) 83.79

A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet (2011) 59.36

The variant call format and VCFtools. Bioinformatics (2011) 25.88

A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) (2012) 20.08

A hexanucleotide repeat expansion in C9ORF72 is the cause of chromosome 9p21-linked ALS-FTD. Neuron (2011) 18.73

Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature (2012) 13.71

Rate of de novo mutations and the importance of father's age to disease risk. Nature (2012) 11.92

TREM2 variants in Alzheimer's disease. N Engl J Med (2012) 11.35

Advances in understanding cancer genomes through second-generation sequencing. Nat Rev Genet (2010) 10.15

De novo gene disruptions in children on the autistic spectrum. Neuron (2012) 9.69

Genetic Analysis Workshop 17 mini-exome simulation. BMC Proc (2011) 9.16

From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics (2013) 8.79

A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell (2014) 8.79

Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet (2011) 8.34

Inference of human population history from individual whole-genome sequences. Nature (2011) 8.05

A high-coverage genome sequence from an archaic Denisovan individual. Science (2012) 7.89

CRISPR/Cas9-mediated gene editing in human tripronuclear zygotes. Protein Cell (2015) 7.80

Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nat Biotechnol (2012) 7.37

Performance comparison of exome DNA sequencing technologies. Nat Biotechnol (2011) 7.11

Genomic landscape of non-small cell lung cancer in smokers and never-smokers. Cell (2012) 6.09

Truncations of titin causing dilated cardiomyopathy. N Engl J Med (2012) 6.07

The contribution of de novo coding mutations to autism spectrum disorder. Nature (2014) 5.94

Performance comparison of whole-genome sequencing platforms. Nat Biotechnol (2011) 5.79

Frequent mutations of chromatin remodeling genes in transitional cell carcinoma of the bladder. Nat Genet (2011) 5.73

De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat Genet (2012) 5.61

Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet (2013) 5.58

Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am J Hum Genet (2012) 5.48

Low-coverage sequencing: implications for design of complex trait association studies. Genome Res (2011) 5.34

Haplotype-resolved genome sequencing of a Gujarati Indian individual. Nat Biotechnol (2010) 5.32

High-throughput detection of actionable genomic alterations in clinical tumor samples by targeted, massively parallel sequencing. Cancer Discov (2011) 5.30

Rapid whole-genome sequencing for genetic disease diagnosis in neonatal intensive care units. Sci Transl Med (2012) 5.16

Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience (2015) 5.06

Resistance mechanisms for the Bruton's tyrosine kinase inhibitor ibrutinib. N Engl J Med (2014) 4.98

Melanoma whole-exome sequencing identifies (V600E)B-RAF amplification-mediated acquired B-RAF inhibitor resistance. Nat Commun (2012) 4.94

Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One (2014) 4.94

Exome sequencing of gastric adenocarcinoma identifies recurrent somatic mutations in cell adhesion and chromatin remodeling genes. Nat Genet (2012) 4.82

Population genetics of Vibrio cholerae from Nepal in 2010: evidence on the origin of the Haitian outbreak. MBio (2011) 4.70

Haplotype phasing: existing methods and new developments. Nat Rev Genet (2011) 4.66

Mutational analysis reveals the origin and therapy-driven evolution of recurrent glioma. Science (2013) 4.42

Characterizing and measuring bias in sequence data. Genome Biol (2013) 4.39

Acquired resistance and clonal evolution in melanoma during BRAF inhibitor therapy. Cancer Discov (2013) 4.39

Exome sequencing and the genetic basis of complex traits. Nat Genet (2012) 4.11

Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat Biotechnol (2014) 4.07

Clonal evolution of preleukemic hematopoietic stem cells precedes human acute myeloid leukemia. Sci Transl Med (2012) 4.05

Molecular diagnosis of infantile mitochondrial disease with targeted next-generation sequencing. Sci Transl Med (2012) 4.02

Recurrent mutations in the U2AF1 splicing factor in myelodysplastic syndromes. Nat Genet (2011) 4.02

Whole genomes redefine the mutational landscape of pancreatic cancer. Nature (2015) 4.01

A framework for the interpretation of de novo mutation in human disease. Nat Genet (2014) 4.00

Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Med (2013) 3.90

Inactivating Variants in ANGPTL4 and Risk of Coronary Artery Disease. N Engl J Med (2016) 3.88

Whole-genome and whole-exome sequencing of bladder cancer identifies frequent alterations in genes involved in sister chromatid cohesion and segregation. Nat Genet (2013) 3.87

The genomic and transcriptomic landscape of a HeLa cell line. G3 (Bethesda) (2013) 3.82

Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature (2014) 3.80

Age-related mutations associated with clonal hematopoietic expansion and malignancies. Nat Med (2014) 3.75

Mutations in GNAL cause primary torsion dystonia. Nat Genet (2012) 3.66

Gain-of-function human STAT1 mutations impair IL-17 immunity and underlie chronic mucocutaneous candidiasis. J Exp Med (2011) 3.63

BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics (2011) 3.59

A rare penetrant mutation in CFH confers high risk of age-related macular degeneration. Nat Genet (2011) 3.55

Exome sequencing in amyotrophic lateral sclerosis identifies risk genes and pathways. Science (2015) 3.46

Using VAAST to identify an X-linked disorder resulting in lethality in male infants due to N-terminal acetyltransferase deficiency. Am J Hum Genet (2011) 3.43

Genomic epidemiology of the Escherichia coli O104:H4 outbreaks in Europe, 2011. Proc Natl Acad Sci U S A (2012) 3.42

Effects of the absence of apolipoprotein E on lipoproteins, neurocognitive function, and retinal function. JAMA Neurol (2014) 3.39

The next-generation sequencing revolution and its impact on genomics. Cell (2013) 3.35

Stacks: an analysis tool set for population genomics. Mol Ecol (2013) 3.29

Next-generation sequencing data interpretation: enhancing reproducibility and accessibility. Nat Rev Genet (2012) 3.29

Effectiveness of exome and genome sequencing guided by acuity of illness for diagnosis of neurodevelopmental disorders. Sci Transl Med (2014) 3.28

Sparse whole-genome sequencing identifies two loci for major depressive disorder. Nature (2015) 3.25

JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data. Bioinformatics (2012) 3.21

Phased whole-genome genetic risk in a family quartet using a major allele reference sequence. PLoS Genet (2011) 3.20

Genomic sequencing of meningiomas identifies oncogenic SMO and AKT1 mutations. Nat Genet (2013) 3.19

Exome sequencing of hepatitis B virus-associated hepatocellular carcinoma. Nat Genet (2012) 3.14

Organoid cultures derived from patients with advanced prostate cancer. Cell (2014) 3.12

Identification of genomic alterations in oesophageal squamous cell cancer. Nature (2014) 3.10

Whole-genome characterization of chemoresistant ovarian cancer. Nature (2015) 3.06

Genetic programs in human and mouse early embryos revealed by single-cell RNA sequencing. Nature (2013) 3.06

The genetic landscape of mutations in Burkitt lymphoma. Nat Genet (2012) 3.03

Bayesian inference of ancient human demography from individual genome sequences. Nat Genet (2011) 3.03

Interpreting noncoding genetic variation in complex traits and human disease. Nat Biotechnol (2012) 3.01

Regional Isolation Drives Bacterial Diversification within Cystic Fibrosis Lungs. Cell Host Microbe (2015) 2.98

SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples. Genome Res (2010) 2.97

RNA-SeQC: RNA-seq metrics for quality control and process optimization. Bioinformatics (2012) 2.94

Pacific biosciences sequencing technology for genotyping and variation discovery in human data. BMC Genomics (2012) 2.92

ContEst: estimating cross-contamination of human samples in next-generation sequencing data. Bioinformatics (2011) 2.90

De novo gain-of-function KCNT1 channel mutations cause malignant migrating partial seizures of infancy. Nat Genet (2012) 2.86

Assessment of 2q23.1 microdeletion syndrome implicates MBD5 as a single causal locus of intellectual disability, epilepsy, and autism spectrum disorder. Am J Hum Genet (2011) 2.85

Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nat Genet (2014) 2.84

The draft genome of sweet orange (Citrus sinensis). Nat Genet (2012) 2.81

Role of TP53 mutations in the origin and evolution of therapy-related acute myeloid leukaemia. Nature (2014) 2.79

Seventy-five genetic loci influencing the human red blood cell. Nature (2012) 2.77

Using whole-exome sequencing to identify inherited causes of autism. Neuron (2013) 2.74

Whole-exome sequencing-based discovery of STIM1 deficiency in a child with fatal classic Kaposi sarcoma. J Exp Med (2010) 2.70

Calmodulin mutations associated with recurrent cardiac arrest in infants. Circulation (2013) 2.66

The haplotype-resolved genome and epigenome of the aneuploid HeLa cancer cell line. Nature (2013) 2.65

Oncogenic and drug-sensitive NTRK1 rearrangements in lung cancer. Nat Med (2013) 2.65

CONTRA: copy number analysis for targeted resequencing. Bioinformatics (2012) 2.64

Comprehensive genomic analysis of rhabdomyosarcoma reveals a landscape of alterations affecting a common genetic axis in fusion-positive and fusion-negative tumors. Cancer Discov (2014) 2.64

Mutations in SWI/SNF chromatin remodeling complex gene ARID1B cause Coffin-Siris syndrome. Nat Genet (2012) 2.63

Landscape of genomic alterations in cervical carcinomas. Nature (2013) 2.61

Genome-wide genetic changes during modern breeding of maize. Nat Genet (2012) 2.60

Epistasis dominates the genetic architecture of Drosophila quantitative traits. Proc Natl Acad Sci U S A (2012) 2.59

Articles cited by this

The Sequence Alignment/Map format and SAMtools. Bioinformatics (2009) 232.39

Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (2009) 190.94

The human genome browser at UCSC. Genome Res (2002) 168.23

Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res (2008) 157.44

Genome sequencing in microfabricated high-density picolitre reactors. Nature (2005) 150.21

Accurate whole human genome sequencing using reversible terminator chemistry. Nature (2008) 90.20

dbSNP: the NCBI database of genetic variation. Nucleic Acids Res (2001) 76.97

The International HapMap Project. Nature (2003) 73.65

SOAP: short oligonucleotide alignment program. Bioinformatics (2008) 68.13

The complete genome of an individual by massively parallel DNA sequencing. Nature (2008) 52.81

SSAHA: a fast search method for large DNA databases. Genome Res (2001) 48.64

The diploid genome sequence of an Asian individual. Nature (2008) 46.29

Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat Genet (2008) 43.63

Next-generation DNA sequencing. Nat Biotechnol (2008) 34.95

Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat Biotechnol (2009) 27.17

Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science (2009) 21.24

BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat Methods (2009) 18.41

Computation for ChIP-seq and RNA-seq studies. Nat Methods (2009) 16.11

VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics (2009) 16.04

A high-resolution HLA and SNP haplotype map for disease association studies in the extended human MHC. Nat Genet (2006) 15.63

Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding. Genome Res (2009) 15.15

Complete MHC haplotype sequencing for common disease gene mapping. Genome Res (2004) 12.09

ShortRead: a Bioconductor package for input, quality assessment and exploration of high-throughput sequence data. Bioinformatics (2009) 8.69

PIQA: pipeline for Illumina G1 genome analyzer data quality assessment. Bioinformatics (2009) 6.17

Bayesian statistics in genetics: a guide for the uninitiated. Trends Genet (1999) 5.55

Articles by these authors

A map of human genome variation from population-scale sequencing. Nature (2010) 121.13

Initial sequencing and comparative analysis of the mouse genome. Nature (2002) 96.15

A second generation human haplotype map of over 3.1 million SNPs. Nature (2007) 85.39

EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy. Science (2004) 61.56

An integrated map of genetic variation from 1,092 human genomes. Nature (2012) 59.82

A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet (2011) 59.36

PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet (2003) 53.59

Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science (2007) 51.70

The structure of haplotype blocks in the human genome. Science (2002) 50.88

Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell (2010) 39.09

Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet (2008) 35.06

The landscape of somatic copy-number alteration across human cancers. Nature (2010) 31.88

Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature (2008) 30.29

Somatic mutations affect key pathways in lung adenocarcinoma. Nature (2008) 30.02

Biological, clinical and population relevance of 95 loci for blood lipids. Nature (2010) 28.21

Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat Biotechnol (2009) 27.17

The variant call format and VCFtools. Bioinformatics (2011) 25.88

Efficiency and power in genetic association studies. Nat Genet (2005) 25.56

Six new loci associated with body mass index highlight a neuronal influence on body weight regulation. Nat Genet (2008) 22.35

Detecting recent positive selection in the human genome from haplotype structure. Nature (2002) 22.00

Common variants at 30 loci contribute to polygenic dyslipidemia. Nat Genet (2008) 20.66

Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N Engl J Med (2013) 19.87

Association between microdeletion and microduplication at 16p11.2 and autism. N Engl J Med (2008) 19.71

Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat Genet (2008) 19.55

New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nat Genet (2010) 17.89

Activating mutation in the tyrosine kinase JAK2 in polycythemia vera, essential thrombocythemia, and myeloid metaplasia with myelofibrosis. Cancer Cell (2005) 17.41

Initial genome sequencing and analysis of multiple myeloma. Nature (2011) 17.28

Genome-wide detection and characterization of positive selection in human populations. Nature (2007) 17.27

Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science (2012) 17.12

Twelve type 2 diabetes susceptibility loci identified through large-scale association analysis. Nat Genet (2010) 16.96

The mutational landscape of head and neck squamous cell carcinoma. Science (2011) 16.88

Assessing the impact of population stratification on genetic association studies. Nat Genet (2004) 16.28

Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol (2013) 16.13

Replicating genotype-phenotype associations. Nature (2007) 16.11

Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nat Genet (2008) 15.89

Calibrating a coalescent simulation of human genome sequence variation. Genome Res (2005) 15.04

A mutation in Orai1 causes immune deficiency by abrogating CRAC channel function. Nature (2006) 14.78

Evaluating and improving power in whole-genome association studies using fixed marker sets. Nat Genet (2006) 14.76

Multiple regions within 8q24 independently affect risk for prostate cancer. Nat Genet (2007) 14.37

The genomic complexity of primary human prostate cancer. Nature (2011) 14.06

New models of collaboration in genome-wide association studies: the Genetic Association Information Network. Nat Genet (2007) 13.76

Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature (2012) 13.71

Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat Genet (2011) 13.25

Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature (2011) 13.25

High-throughput oncogene mutation profiling in human cancer. Nat Genet (2007) 12.68

A landscape of driver mutations in melanoma. Cell (2012) 12.61

Genome-wide association study identifies eight loci associated with blood pressure. Nat Genet (2009) 12.44

A systematic survey of loss-of-function variants in human protein-coding genes. Science (2012) 12.25

TRAF1-C5 as a risk locus for rheumatoid arthritis--a genomewide study. N Engl J Med (2007) 12.24

Genome-wide association of early-onset myocardial infarction with single nucleotide polymorphisms and copy number variants. Nat Genet (2009) 12.19

Plasma HDL cholesterol and risk of myocardial infarction: a mendelian randomisation study. Lancet (2012) 12.10

Genome-wide association analysis of metabolic traits in a birth cohort from a founder population. Nat Genet (2008) 12.07

Methods for high-density admixture mapping of disease genes. Am J Hum Genet (2004) 12.02

The somatic genomic landscape of glioblastoma. Cell (2013) 11.73

Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing. Cell (2012) 11.69

Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet Epidemiol (2008) 11.28

Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat Genet (2012) 11.09

SF3B1 and other novel cancer genes in chronic lymphocytic leukemia. N Engl J Med (2011) 11.07

Absolute quantification of somatic DNA alterations in human cancer. Nat Biotechnol (2012) 10.87

A high-density admixture map for disease gene discovery in African Americans. Am J Hum Genet (2004) 10.87

Variants in MTNR1B influence fasting glucose levels. Nat Genet (2008) 10.85

Admixture mapping identifies 8q24 as a prostate cancer risk locus in African-American men. Proc Natl Acad Sci U S A (2006) 10.32

Advances in understanding cancer genomes through second-generation sequencing. Nat Rev Genet (2010) 10.15

Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer. Nat Genet (2012) 9.93

Deletion polymorphism upstream of IRGM associated with altered IRGM expression and Crohn's disease. Nat Genet (2008) 9.52

Demonstrating stratification in a European American population. Nat Genet (2005) 9.49

Testing for an unusual distribution of rare variants. PLoS Genet (2011) 9.28

Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell (2013) 9.24

Gene expression in fixed tissues and outcome in hepatocellular carcinoma. N Engl J Med (2008) 9.20

Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature (2012) 8.91

Variation in genome-wide mutation rates within and between human families. Nat Genet (2011) 8.84

Two independent alleles at 6q23 associated with risk of rheumatoid arthritis. Nat Genet (2007) 8.74