Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing.

PubWeight™: 3.90 | Rank: Top 1%

Article: PMC 3706896

Published in Genome Med on March 27, 2013

Authors

Jason O'Rawe (1), Tao Jiang (2), Guangqing Sun (2), Yiyang Wu (1), Wei Wang (3), Jingchu Hu (2), Paul Bodily (4), Lifeng Tian (5), Hakon Hakonarson (5), W Evan Johnson (6), Zhi Wei (3), Kai Wang (7), Gholson J Lyon (8)

Author Affiliations

1: Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, One Bungtown Rd, Cold Spring Harbor, 11724, USA; Stony Brook University, 100 Nicolls Rd, Stony Brook, 11794, USA.
2: BGI-Shenzhen, Shenzhen 518000, China.
3: New Jersey Institute of Technology, Martin Luther King Jr. Blvd, Newark, 07103, USA.
4: Brigham Young University, N University Ave, Provo, 84606, USA.
5: Children's Hospital of Philadelphia, Civic Center Blvd, Philadelphia, 19104, USA.
6: Boston University School of Medicine, E Concord St, Boston, 02118, USA.
7: University of Southern California, 1501 San Pablo Street, Los Angeles, 90089, USA; Utah Foundation for Biomedical Research, E 3300 S, Salt Lake City, 84106, USA.
8: Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, One Bungtown Rd, Cold Spring Harbor, 11724, USA; Stony Brook University, 100 Nicolls Rd, Stony Brook, 11794, USA; Utah Foundation for Biomedical Research, E 3300 S, Salt Lake City, 84106, USA.

Articles citing this

(truncated to the top 100)

Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat Biotechnol (2014) 4.07

Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nat Genet (2014) 2.84

Detection of clinically relevant genetic variants in autism spectrum disorder by whole-genome sequencing. Am J Hum Genet (2013) 2.47

Toward better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics (2014) 2.37

A decision support framework for genomically informed investigational cancer therapy. J Natl Cancer Inst (2015) 2.13

An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge. Genome Biol (2014) 1.95

Medical implications of technical accuracy in genome sequencing. Genome Med (2016) 1.86

A spectral approach integrating functional genomic annotations for coding and noncoding variants. Nat Genet (2016) 1.75

Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection. Nat Methods (2015) 1.67

The role of replicates for error mitigation in next-generation sequencing. Nat Rev Genet (2013) 1.66

Choice of transcripts and software has a large effect on variant annotation. Genome Med (2014) 1.58

Comparison of somatic mutation calling methods in amplicon and whole exome sequence data. BMC Genomics (2014) 1.53

Accurate de novo and transmitted indel detection in exome-capture data using microassembly. Nat Methods (2014) 1.50

A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing. Nat Commun (2015) 1.47

Comparing a few SNP calling algorithms using low-coverage sequencing data. BMC Bioinformatics (2013) 1.36

Reducing INDEL calling errors in whole genome and exome sequencing data. Genome Med (2014) 1.30

The new sequencer on the block: comparison of Life Technology's Proton sequencer to an Illumina HiSeq for whole-exome sequencing. Hum Genet (2013) 1.27

Analytical validation of whole exome and whole genome sequencing for clinical applications. BMC Med Genomics (2014) 1.27

A 26-hour system of highly sensitive whole genome sequencing for emergency management of genetic diseases. Genome Med (2015) 1.26

Global optimization of somatic variant identification in cancer genomes with a global community challenge. Nat Genet (2014) 1.21

Validation and assessment of variant calling pipelines for next-generation sequencing. Hum Genomics (2014) 1.16

Germline variation in cancer-susceptibility genes in a healthy, ancestrally diverse cohort: implications for individual genome sequencing. PLoS One (2014) 1.14

Systematic comparison of variant calling pipelines using gold standard personal exome variants. Sci Rep (2015) 1.14

An analytical framework for optimizing variant discovery from personal genomes. Nat Commun (2015) 1.13

Review of current methods, applications, and data management for the bioinformatics analysis of whole exome sequencing. Cancer Inform (2014) 1.13

HEALER: homomorphic computation of ExAct Logistic rEgRession for secure rare disease variants analysis in GWAS. Bioinformatics (2015) 1.12

Navigating the rapids: the development of regulated next-generation sequencing-based clinical trial assays and companion diagnostics. Front Oncol (2014) 1.12

Next-generation diagnostics and disease-gene discovery with the Exomiser. Nat Protoc (2015) 1.07

Detection of structural DNA variation from next generation sequencing data: a review of informatic approaches. Cancer Genet (2013) 1.06

Effective filtering strategies to improve data quality from population-based whole exome sequencing studies. BMC Bioinformatics (2014) 1.06

Toward better benchmarking: challenge-based methods assessment in cancer genomics. Genome Biol (2014) 1.05

New insights into the performance of human whole-exome capture platforms. Nucleic Acids Res (2015) 1.04

A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Res (2016) 1.02

Transposable element detection from whole genome sequence data. Mob DNA (2015) 0.99

BAYSIC: a Bayesian method for combining sets of genome variants with improved specificity and sensitivity. BMC Bioinformatics (2014) 0.99

Translating personalized medicine using new genetic technologies in clinical practice: the ethical issues. Per Med (2014) 0.98

Performance comparison of SNP detection tools with illumina exome sequencing data--an assessment using both family pedigree information and sample-matched SNP array data. Nucleic Acids Res (2014) 0.97

Verification and validation of bioinformatics software without a gold standard: a case study of BWA and Bowtie. BMC Bioinformatics (2014) 0.96

A community-based resource for automatic exome variant-calling and annotation in Mendelian disorders. BMC Genomics (2014) 0.93

Clinical exome performance for reporting secondary genetic findings. Clin Chem (2014) 0.93

Comparison and evaluation of two exome capture kits and sequencing platforms for variant calling. BMC Genomics (2015) 0.92

Advances in computational approaches for prioritizing driver mutations and significantly mutated genes in cancer genomes. Brief Bioinform (2015) 0.91

Next-generation sequencing in clinical oncology: next steps towards clinical validation. Cancers (Basel) (2014) 0.90

Estimating exome genotyping accuracy by comparing to data from large scale sequencing projects. Genome Med (2013) 0.90

Exome Sequencing Identifies a Novel LMNA Splice-Site Mutation and Multigenic Heterozygosity of Potential Modifiers in a Family with Sick Sinus Syndrome, Dilated Cardiomyopathy, and Sudden Cardiac Death. PLoS One (2016) 0.90

Whole-Exome Sequencing in the Differential Diagnosis of Primary Adrenal Insufficiency in Children. Front Endocrinol (Lausanne) (2015) 0.90

High-depth sequencing of over 750 genes supports linear progression of primary tumors and metastases in most patients with liver-limited metastatic colorectal cancer. Genome Biol (2015) 0.89

Systems genomics evaluation of the SH-SY5Y neuroblastoma cell line as a model for Parkinson's disease. BMC Genomics (2014) 0.88

Detailed comparison of two popular variant calling packages for exome and targeted exon studies. PeerJ (2014) 0.88

Population genetics identifies challenges in analyzing rare variants. Genet Epidemiol (2015) 0.87

RIG: Recalibration and interrelation of genomic sequence data with the GATK. G3 (Bethesda) (2015) 0.87

Medical genomics: The intricate path from genetic variant identification to clinical interpretation. Appl Transl Genom (2014) 0.87

Novel bioinformatic developments for exome sequencing. Hum Genet (2016) 0.86

Guanine holes are prominent targets for mutation in cancer and inherited disease. PLoS Genet (2013) 0.86

Identification and validation of loss of function variants in clinical contexts. Mol Genet Genomic Med (2013) 0.85

Prevalence of Titin Truncating Variants in General Population. PLoS One (2015) 0.85

Genotyping by sequencing approaches to characterise crop genomes: choosing the right tool for the right application. Plant Biotechnol J (2016) 0.85

MSEA: detection and quantification of mutation hotspots through mutation set enrichment analysis. Genome Biol (2014) 0.85

Towards a European consensus for reporting incidental findings during clinical NGS testing. Eur J Hum Genet (2015) 0.84

DRAW+SneakPeek: analysis workflow and quality metric management for DNA-seq experiments. Bioinformatics (2013) 0.83

Reliably Detecting Clinically Important Variants Requires Both Combined Variant Calls and Optimized Filtering Strategies. PLoS One (2015) 0.83

Amplicon sequencing of colorectal cancer: variant calling in frozen and formalin-fixed samples. PLoS One (2015) 0.82

SeqMule: automated pipeline for analysis of human exome/genome sequencing data. Sci Rep (2015) 0.82

Personalized targeted therapy for esophageal squamous cell carcinoma. World J Gastroenterol (2015) 0.82

Targeted Sequencing Reveals Large-Scale Sequence Polymorphism in Maize Candidate Genes for Biomass Production and Composition. PLoS One (2015) 0.82

Whole-exome sequencing of over 4100 men of African ancestry and prostate cancer risk. Hum Mol Genet (2015) 0.81

TruePrime is a novel method for whole-genome amplification from single cells based on TthPrimPol. Nat Commun (2016) 0.81

Comparison of custom capture for targeted next-generation DNA sequencing. J Mol Diagn (2015) 0.81

Genomic variation among populations of threatened coral: Acropora cervicornis. BMC Genomics (2016) 0.81

Species-wide genome sequence and nucleotide polymorphisms from the model allopolyploid plant Brassica napus. Sci Data (2015) 0.81

The missing indels: an estimate of indel variation in a human genome and analysis of factors that impede detection. Nucleic Acids Res (2015) 0.81

Case-only exome sequencing and complex disease susceptibility gene discovery: study design considerations. J Med Genet (2014) 0.80

Efficient Genome-Wide Sequencing and Low-Coverage Pedigree Analysis from Noninvasively Collected Samples. Genetics (2016) 0.80

Identification of rare variants in Alzheimer's disease. Front Genet (2014) 0.80

How to test bioinformatics software? Biophys Rev (2015) 0.80

Reducing false-positive incidental findings with ensemble genotyping and logistic regression based variant filtering methods. Hum Mutat (2014) 0.79

Quality control metrics improve repeatability and reproducibility of single-nucleotide variants derived from whole-genome sequencing. Pharmacogenomics J (2014) 0.79

Crowdsourced direct-to-consumer genomic analysis of a family quartet. BMC Genomics (2015) 0.78

Genomic data sharing for translational research and diagnostics. Genome Med (2014) 0.78

Lessons learned from gene identification studies in Mendelian epilepsy disorders. Eur J Hum Genet (2015) 0.78

Consensus Genotyper for Exome Sequencing (CGES): improving the quality of exome variant genotypes. Bioinformatics (2014) 0.78

Diagnostic use of Massively Parallel Sequencing in Neuromuscular Diseases: Towards an Integrated Diagnosis. J Neuromuscul Dis (2015) 0.78

Collaborative science in the next-generation sequencing era: a viewpoint on how to combine exome sequencing data across sites to identify novel disease susceptibility genes. Brief Bioinform (2015) 0.78

Reference standards for next-generation sequencing. Nat Rev Genet (2017) 0.77

"Genotype-first" approaches on a curious case of idiopathic progressive cognitive decline. BMC Med Genomics (2014) 0.77

ExScalibur: A High-Performance Cloud-Enabled Suite for Whole Exome Germline and Somatic Mutation Identification. PLoS One (2015) 0.77

Integrating precision medicine in the study and clinical treatment of a severely mentally ill person. PeerJ (2013) 0.77

Using VAAST to Identify Disease-Associated Variants in Next-Generation Sequencing Data. Curr Protoc Hum Genet (2014) 0.77

The National Clinical Trials Network: Conducting Successful Clinical Trials of New Therapies for Rare Cancers. Semin Oncol (2015) 0.77

Advantages of Array-Based Technologies for Pre-Emptive Pharmacogenomics Testing. Microarrays (Basel) (2016) 0.76

Comparison among three variant callers and assessment of the accuracy of imputation from SNP array data to whole-genome sequence level in chicken. BMC Genomics (2015) 0.76

Suitability of Different Mapping Algorithms for Genome-wide Polymorphism Scans with Pool-Seq Data. G3 (Bethesda) (2016) 0.76

From Wet-Lab to Variations: Concordance and Speed of Bioinformatics Pipelines for Whole Genome and Whole Exome Sequencing. Hum Mutat (2016) 0.76

The distribution and mutagenesis of short coding INDELs from 1,128 whole exomes. BMC Genomics (2015) 0.76

Genome-wide variant analysis of simplex autism families with an integrative clinical-bioinformatics pipeline. Cold Spring Harb Mol Case Stud (2015) 0.76

SCN8A mutation in a child presenting with seizures and developmental delays. Cold Spring Harb Mol Case Stud (2016) 0.76

A simple data-adaptive probabilistic variant calling model. Algorithms Mol Biol (2015) 0.75

DNAseq Workflow in a Diagnostic Context and an Example of a User Friendly Implementation. Biomed Res Int (2015) 0.75

Heterozygous genome assembly via binary classification of homologous sequence. BMC Bioinformatics (2015) 0.75

Sequence analysis of pooled bacterial samples enables identification of strain variation in group A streptococcus. Sci Rep (2017) 0.75

Articles cited by this

The Sequence Alignment/Map format and SAMtools. Bioinformatics (2009) 232.39

Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (2009) 190.94

The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res (2010) 97.51

SOAP: short oligonucleotide alignment program. Bioinformatics (2008) 68.13

An integrated map of genetic variation from 1,092 human genomes. Nature (2012) 59.82

A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet (2011) 59.36

De novo assembly of human genomes with massively parallel short read sequencing. Genome Res (2009) 45.91

ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res (2010) 43.51

SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics (2009) 39.47

Targeted capture and massively parallel sequencing of 12 human exomes. Nature (2009) 33.96

Exome sequencing identifies the cause of a mendelian disorder. Nat Genet (2009) 32.06

Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet (2007) 24.68

Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science (2009) 21.24

Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science (2010) 18.45

Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science (2012) 17.12

SNP detection for massively parallel whole-genome resequencing. Genome Res (2009) 15.96

Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature (2012) 14.76

Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature (2012) 13.71

De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature (2012) 13.61

Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nat Genet (2010) 12.63

Mapping copy number variation by population-scale genome sequencing. Nature (2011) 12.55

An initial map of insertion and deletion (INDEL) variation in the human genome. Genome Res (2006) 11.60

Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat Genet (2012) 11.29

De novo gene disruptions in children on the autistic spectrum. Neuron (2012) 9.69

Variation in genome-wide mutation rates within and between human families. Nat Genet (2011) 8.84

Genotype imputation with thousands of genomes. G3 (Bethesda) (2011) 8.77

Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet (2011) 8.34

An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science (2012) 7.94

Performance comparison of whole-genome sequencing platforms. Nat Biotechnol (2011) 5.79

Low-coverage sequencing: implications for design of complex trait association studies. Genome Res (2011) 5.34

Pairwise end sequencing: a unified approach to genomic mapping and sequencing. Genomics (1995) 4.83

A fast, powerful method for detecting identity by descent. Am J Hum Genet (2011) 4.26

SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors. Bioinformatics (2010) 4.02

Accurate whole-genome sequencing and haplotyping from 10 to 20 human cells. Nature (2012) 3.74

A public resource facilitating clinical use of genomes. Proc Natl Acad Sci U S A (2012) 3.72

De novo gene mutations highlight patterns of genetic and neural complexity in schizophrenia. Nat Genet (2012) 3.49

Using VAAST to identify an X-linked disorder resulting in lethality in male infants due to N-terminal acetyltransferase deficiency. Am J Hum Genet (2011) 3.43

Phased whole-genome genetic risk in a family quartet using a major allele reference sequence. PLoS Genet (2011) 3.20

Natural genetic variation caused by small insertions and deletions in the human genome. Genome Res (2011) 3.00

Microindel detection in short-read sequence data. Bioinformatics (2010) 2.73

Phasing of many thousands of genotyped samples. Am J Hum Genet (2012) 2.47

SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data. Nucleic Acids Res (2011) 2.27

Small insertions and deletions (INDELs) in human genomes. Hum Mol Genet (2010) 2.25

Computational techniques for human genome resequencing using mated gapped reads. J Comput Biol (2011) 2.15

Optimized filtering reduces the error rate in detecting genomic variants by short-read sequencing. Nat Biotechnol (2011) 2.10

Whole-exome sequencing and homozygosity analysis implicate depolarization-regulated neuronal genes in autism. PLoS Genet (2012) 1.86

The GNUMAP algorithm: unbiased probabilistic mapping of oligonucleotides from next-generation sequencing. Bioinformatics (2009) 1.83

Detecting and annotating genetic variations using the HugeSeq pipeline. Nat Biotechnol (2012) 1.72

Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score. Bioinformatics (2012) 1.64

Exome sequencing and unrelated findings in the context of complex disease research: ethical and clinical implications. Discov Med (2011) 1.52

State of the art de novo assembly of human genomes from massively parallel sequencing data. Hum Genomics (2010) 1.51

Analysis of insertion-deletion from deep-sequencing data: software evaluation for optimal detection. Brief Bioinform (2012) 1.35

The advent of personal genome sequencing. Genet Med (2011) 1.27

Identifying disease mutations in genomic medicine settings: current challenges and how to accelerate progress. Genome Med (2012) 1.18

The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process. Nucleic Acids Res (2011) 1.18

Limitations of the human reference genome for personalized genomics. PLoS One (2012) 1.04

Human genetic individuality. Annu Rev Genomics Hum Genet (2012) 0.94