QualComp: a new lossy compressor for quality scores based on rate distortion theory.

PubWeight™: 0.93

Article: PMC 3698011

Published in BMC Bioinformatics on June 08, 2013

Authors

Idoia Ochoa (1), Himanshu Asnani, Dinesh Bharadia, Mainak Chowdhury, Tsachy Weissman, Golan Yona

Author Affiliations

1: Department of Electrical Engineering, Stanford University, Stanford, CA, USA. iochoa@stanford.edu

Articles cited by this

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol (2009) 235.12

The Sequence Alignment/Map format and SAMtools. Bioinformatics (2009) 232.39

Initial sequencing and analysis of the human genome. Nature (2001) 212.86

Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (2009) 190.94

Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res (2008) 157.44

The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res (2010) 97.51

A human gut microbial gene catalogue established by metagenomic sequencing. Nature (2010) 43.63

The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res (2009) 12.09

SNPdetector: a software tool for sensitive and accurate SNP detection. PLoS Comput Biol (2005) 10.04

Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res (2010) 9.55

SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data. BMC Bioinformatics (2010) 9.47

The sequence read archive. Nucleic Acids Res (2010) 7.97

RobiNA: a user-friendly, integrated software solution for RNA-Seq-based transcriptomics. Nucleic Acids Res (2012) 7.48

Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science (2011) 6.93

Efficient storage of high throughput DNA sequencing data using reference-based compression. Genome Res (2011) 5.60

Human genomes as email attachments. Bioinformatics (2008) 3.59

DNACompress: fast and effective DNA sequence compression. Bioinformatics (2002) 2.82

Compressing genomic sequence fragments using SlimGene. J Comput Biol (2011) 2.10

The future of DNA sequence archiving. Gigascience (2012) 1.85

G-SQZ: compact encoding of genomic sequence and quality data. Bioinformatics (2010) 1.79

Compression of next-generation sequencing reads aided by highly efficient de novo assembly. Nucleic Acids Res (2012) 1.76

Compression of DNA sequence reads in FASTQ format. Bioinformatics (2011) 1.70

A novel compression tool for efficient storage of genome resequencing data. Nucleic Acids Res (2011) 1.61

GReEn: a tool for efficient compression of genome resequencing data. Nucleic Acids Res (2011) 1.58

SCALCE: boosting sequence compression algorithms using locally consistent encoding. Bioinformatics (2012) 1.57

Transformations for the compression of FASTQ quality scores of next-generation sequencing data. Bioinformatics (2011) 1.24

On the representability of complete genomes by multiple competing finite-context (Markov) models. PLoS One (2011) 1.04

Compressing DNA sequence databases with coil. BMC Bioinformatics (2008) 1.02

Articles by these authors

BIOZON: a system for unification, management and analysis of heterogeneous biological data. BMC Bioinformatics (2006) 2.02

BIOZON: a hub of heterogeneous biological data. Nucleic Acids Res (2006) 1.39

Automatic prediction of protein domains from sequence information using a hybrid learning system. Bioinformatics (2004) 1.33

The human genome contracts again. Bioinformatics (2013) 0.99

Protein family comparison using statistical models and predicted structural information. BMC Bioinformatics (2004) 0.98

EST2Prot: mapping EST sequences to proteins. BMC Genomics (2006) 0.94

Novel subdomains of the mouse olfactory bulb defined by molecular heterogeneity in the nascent external plexiform and glomerular layers. BMC Dev Biol (2007) 0.84

Hubs of knowledge: using the functional link structure in Biozon to mine for biologically significant entities. BMC Bioinformatics (2006) 0.84

Automation of gene assignments to metabolic pathways using high-throughput expression data. BMC Bioinformatics (2005) 0.82

Enzyme function prediction with interpretable models. Methods Mol Biol (2009) 0.81

Correcting BLAST e-values for low-complexity segments. J Comput Biol (2005) 0.81

Effect of lossy compression of quality scores on variant calling. Brief Bioinform (2016) 0.78

Aligned genomic data compression via improved modeling. J Bioinform Comput Biol (2014) 0.78

Protein domain prediction. Methods Mol Biol (2008) 0.77

The distance-profile representation and its application to detection of distantly related protein families. BMC Bioinformatics (2005) 0.77

Prediction of protein-protein interactions: a study of the co-evolution model. Methods Mol Biol (2009) 0.75

Expectation-maximization algorithms for fuzzy assignment of genes to cellular pathways. Comput Syst Bioinformatics Conf (2006) 0.75

Immature myeloid cells derived from mouse placentas and malignant tumors demonstrate similar proangiogenic transcriptional signatures. Fertil Steril (2012) 0.75

QVZ: lossy compression of quality values. Bioinformatics (2017) 0.75