Data structures and compression algorithms for genomic sequence data.

PubWeight™: 1.96 | Rank: Top 2%

Article: PMC 2705231

Published in Bioinformatics on May 15, 2009

Authors

Marty C Brandon (1), Douglas C Wallace, Pierre Baldi

Author Affiliations

1: Department of Computer Science, UCI, Irvine, CA 92697, USA.

Articles citing this

Compressing genomic sequence fragments using SlimGene. J Comput Biol (2011) 2.10

Compressive genomics. Nat Biotechnol (2012) 1.66

A novel compression tool for efficient storage of genome resequencing data. Nucleic Acids Res (2011) 1.61

Indexes of large genome collections on a PC. PLoS One (2014) 1.61

Computational solutions for omics data. Nat Rev Genet (2013) 1.58

GReEn: a tool for efficient compression of genome resequencing data. Nucleic Acids Res (2011) 1.58

NGC: lossless and lossy compression of aligned high-throughput sequencing data. Nucleic Acids Res (2012) 1.33

DNA barcode goes two-dimensions: DNA QR code web server. PLoS One (2012) 1.12

ERGC: an efficient referential genome compression algorithm. Bioinformatics (2015) 1.02

Compressive genomics for protein databases. Bioinformatics (2013) 0.94

Compression and fast retrieval of SNP data. Bioinformatics (2014) 0.88

An extended IUPAC nomenclature code for polymorphic nucleic acids. Bioinformatics (2010) 0.86

Data-dependent bucketing improves reference-free compression of sequencing reads. Bioinformatics (2015) 0.85

Reference-based compression of short-read sequences using path encoding. Bioinformatics (2015) 0.83

SOLiDzipper: A High Speed Encoding Method for the Next-Generation Sequencing Data. Evol Bioinform Online (2011) 0.81

Sequence Factorization with Multiple References. PLoS One (2015) 0.75

RecountDB: a database of mapped and count corrected transcribed sequences. Nucleic Acids Res (2011) 0.75

An Adaptive Difference Distribution-based Coding with Hierarchical Tree Structure for DNA Sequence Compression. Proc Data Compress Conf (2013) 0.75

Bitpacking techniques for indexing genomes: I. Hash tables. Algorithms Mol Biol (2016) 0.75

HapZipper: sharing HapMap populations just got easier. Nucleic Acids Res (2012) 0.75

On-Demand Indexing for Referential Compression of DNA Sequences. PLoS One (2015) 0.75

Articles cited by this

A second generation human haplotype map of over 3.1 million SNPs. Nature (2007) 85.39

The International HapMap Project. Nature (2003) 73.65

Genome-wide mapping of in vivo protein-DNA interactions. Science (2007) 64.92

Sequence and organization of the human mitochondrial genome. Nature (1981) 57.39

The complete genome of an individual by massively parallel DNA sequencing. Nature (2008) 52.81

The diploid genome sequence of an Asian individual. Nature (2008) 46.29

The diploid genome sequence of an individual human. PLoS Biol (2007) 44.80

Fine-scale structural variation of the human genome. Nat Genet (2005) 24.31

Whole-genome patterns of common DNA variation in three human populations. Science (2005) 21.22

Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat Genet (1999) 18.22

DNA sequencing. A plan to capture human diversity in 1000 genomes. Science (2008) 13.17

Natural selection shaped regional mtDNA variation in humans. Proc Natl Acad Sci U S A (2002) 6.95

Gene sequencing. The race for the $1000 genome. Science (2006) 6.76

DNA transposons and the evolution of eukaryotic genomes. Annu Rev Genet (2007) 4.99

An enhanced MITOMAP with a global mtDNA mutational phylogeny. Nucleic Acids Res (2006) 4.51

Human genomes as email attachments. Bioinformatics (2008) 3.59

MITOMAP: a human mitochondrial genome database--2004 update. Nucleic Acids Res (2005) 3.35

DNACompress: fast and effective DNA sequence compression. Bioinformatics (2002) 2.82

Frequency of a 9-bp deletion in the mitochondrial DNA among Asian populations. Hum Biol (1992) 1.67

Molecular instability in the COII-tRNA(Lys) intergenic region of the human mitochondrial genome: multiple origins of the 9-bp deletion and heteroplasmy for expanded repeats. Philos Trans R Soc Lond B Biol Sci (1998) 1.30

Genomics: understanding human diversity. Nature (2005) 1.29

The YH database: the first Asian diploid genome database. Nucleic Acids Res (2009) 1.22

Lossless compression of chemical fingerprints using integer entropy codes improves storage and retrieval. J Chem Inf Model (2007) 1.14

MITOMASTER: a bioinformatics tool for the analysis of mitochondrial DNA sequences. Hum Mutat (2009) 1.10

Compression of nucleotide databases for fast searching. Comput Appl Biosci (1997) 0.85

Articles by these authors

Extension of murine life span by overexpression of catalase targeted to mitochondria. Science (2005) 10.11

Natural selection shaped regional mtDNA variation in humans. Proc Natl Acad Sci U S A (2002) 6.95

Effects of purifying and adaptive selection on regional variation in human mtDNA. Science (2004) 5.87

The ADP/ATP translocator is not essential for the mitochondrial permeability transition pore. Nature (2004) 5.85

Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins (2002) 5.29

mtDNA mutations increase tumorigenicity in prostate cancer. Proc Natl Acad Sci U S A (2005) 4.66

An enhanced MITOMAP with a global mtDNA mutational phylogeny. Nucleic Acids Res (2006) 4.51

A mouse model of mitochondrial disease reveals germline selection against severe mtDNA mutations. Science (2008) 3.95

Global landscape of recent inferred Darwinian selection for Homo sapiens. Proc Natl Acad Sci U S A (2005) 3.77

A back migration from Asia to sub-Saharan Africa is supported by high-resolution analysis of human Y-chromosome haplotypes. Am J Hum Genet (2002) 3.71

A tandem affinity tag for two-step purification under fully denaturing conditions: application in ubiquitin profiling and protein complex identification combined with in vivo cross-linking. Mol Cell Proteomics (2006) 3.62

Profiling the humoral immune response to infection by using proteome microarrays: high-throughput vaccine and diagnostic antigen discovery. Proc Natl Acad Sci U S A (2005) 3.59

MITOMAP: a human mitochondrial genome database--2004 update. Nucleic Acids Res (2005) 3.35

Mutations in DNMT1 cause hereditary sensory neuropathy with dementia and hearing loss. Nat Genet (2011) 3.16

Prediction of protein stability changes for single-site mutations using support vector machines. Proteins (2006) 3.00

A prospective analysis of the Ab response to Plasmodium falciparum before and after a malaria season by protein microarray. Proc Natl Acad Sci U S A (2010) 2.90

Alzheimer's brains harbor somatic mtDNA control-region mutations that suppress mitochondrial transcription and replication. Proc Natl Acad Sci U S A (2004) 2.77

Prediction of coordination number and relative solvent accessibility in proteins. Proteins (2002) 2.66

Profiling humoral immune responses to P. falciparum infection with protein microarrays. Proteomics (2008) 2.43

Global gene expression profiling in Escherichia coli K12. The effects of oxygen availability and FNR. J Biol Chem (2003) 2.32

Coordination of the transcriptome and metabolome by the circadian clock. Proc Natl Acad Sci U S A (2012) 2.22

ChemDB: a public database of small molecules and related chemoinformatics resources. Bioinformatics (2005) 2.16

A genome-wide proteome array reveals a limited set of immunogens in natural infections of humans and white-footed mice with Borrelia burgdorferi. Infect Immun (2008) 2.15

A Burkholderia pseudomallei protein microarray reveals serodiagnostic and cross-reactive antigens. Proc Natl Acad Sci U S A (2009) 2.10

The basal proton conductance of mitochondria depends on adenine nucleotide translocase content. Biochem J (2005) 2.05

Improved residue contact prediction using support vector machines and a large feature set. BMC Bioinformatics (2007) 2.03

Evidence for adaptive selection acting on the tRNA and rRNA genes of human mitochondrial DNA. Hum Mutat (2006) 1.98

Life extension through neurofibromin mitochondrial regulation and antioxidant therapy for neurofibromatosis-1 in Drosophila melanogaster. Nat Genet (2007) 1.98

Elevated male European and female African contributions to the genomes of African American individuals. Hum Genet (2006) 1.97

The mitochondrial theory of aging and its relationship to reactive oxygen species damage and somatic mtDNA mutations. Proc Natl Acad Sci U S A (2005) 1.88

Graph kernels for chemical informatics. Neural Netw (2005) 1.86

Bounds and algorithms for fast exact searches of chemical fingerprints in linear and sublinear time. J Chem Inf Model (2007) 1.86

Bayesian surprise attracts human attention. Vision Res (2008) 1.86

Immunodominant Francisella tularensis antigens identified using proteome microarray. Proteomics (2007) 1.85

ChemDB update--full-text search and virtual chemical space. Bioinformatics (2007) 1.83

PEPITO: improved discontinuous B-cell epitope prediction using multiple distance thresholds and half sphere exposure. Bioinformatics (2008) 1.78

A machine learning information retrieval approach to protein fold recognition. Bioinformatics (2006) 1.75

Succinate dehydrogenase is a direct target of sirtuin 3 deacetylase activity. PLoS One (2011) 1.73

MotifMap: a human genome-wide map of candidate regulatory motif sites. Bioinformatics (2008) 1.73

Global gene expression profiling in Escherichia coli K12: effects of oxygen availability and ArcA. J Biol Chem (2005) 1.73

Functional estrogen receptors in the mitochondria of breast cancer cells. Mol Biol Cell (2006) 1.72

Global gene expression profiling in Escherichia coli K12. The effects of leucine-responsive regulatory protein. J Biol Chem (2002) 1.69

From protein microarrays to diagnostic antigen discovery: a study of the pathogen Francisella tularensis. Bioinformatics (2007) 1.68

The role of mtDNA background in disease expression: a new primary LHON mutation associated with Western Eurasian haplogroup J. Hum Genet (2002) 1.65

The dual origin and Siberian affinities of Native American Y chromosomes. Am J Hum Genet (2001) 1.65

Identification of humoral immune responses in protein microarrays using DNA microarray data analysis techniques. Bioinformatics (2006) 1.65

Development of a novel cross-linking strategy for fast and accurate identification of cross-linked peptides of protein complexes. Mol Cell Proteomics (2010) 1.61

Adenine nucleotide translocase 1 deficiency results in dilated cardiomyopathy with defects in myocardial mechanics, histopathological alterations, and activation of apoptosis. JACC Cardiovasc Imaging (2011) 1.60

Ancient mtDNA genetic variants modulate mtDNA transcription and replication. PLoS Genet (2009) 1.58

Cyber-T web server: differential analysis of high-throughput data. Nucleic Acids Res (2012) 1.58

Control region mtDNA variants: longevity, climatic adaptation, and a forensic conundrum. Proc Natl Acad Sci U S A (2003) 1.57

Mitochondrial DNA diversity in indigenous populations of the southern extent of Siberia, and the origins of Native American haplogroups. Ann Hum Genet (2005) 1.54

COBEpro: a novel system for predicting continuous B-cell epitopes. Protein Eng Des Sel (2008) 1.54

Sterile protective immunity to malaria is associated with a panel of novel P. falciparum antigens. Mol Cell Proteomics (2011) 1.53

A mitochondrial etiology of Alzheimer and Parkinson disease. Biochim Biophys Acta (2011) 1.52

A novel NDUFA1 mutation leads to a progressive mitochondrial complex I-specific neurodegenerative disease. Mol Genet Metab (2009) 1.47

Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity. Bioinformatics (2005) 1.47

Deep architectures for protein contact map prediction. Bioinformatics (2012) 1.44

Differential analysis of DNA microarray gene expression data. Mol Microbiol (2003) 1.43

Structure-based inhibitor design of AccD5, an essential acyl-CoA carboxylase carboxyltransferase domain of Mycobacterium tuberculosis. Proc Natl Acad Sci U S A (2006) 1.42

Large-scale prediction of disulphide bridges using kernel methods, two-dimensional recursive neural networks, and weighted graph matching. Proteins (2006) 1.41

Mitochondrial DNA-like sequences in the nucleus (NUMTs): insights into our African origins and the mechanism of foreign DNA integration. Hum Mutat (2004) 1.39

Assessment of predictions submitted for the CASP7 domain prediction category. Proteins (2007) 1.39

Retroviruses and yeast retrotransposons use overlapping sets of host genes. Genome Res (2005) 1.38

Mitochondrial variants in schizophrenia, bipolar disorder, and major depressive disorder. PLoS One (2009) 1.36

The neuron-specific chromatin regulatory subunit BAF53b is necessary for synaptic plasticity and memory. Nat Neurosci (2013) 1.36

Mouse mtDNA mutant model of Leber hereditary optic neuropathy. Proc Natl Acad Sci U S A (2012) 1.35

VCP associated inclusion body myopathy and Paget disease of bone knock-in mouse model exhibits tissue pathology typical of human disease. PLoS One (2010) 1.34

Leptin engages a hypothalamic neurocircuitry to permit survival in the absence of insulin. Cell Metab (2013) 1.32

Adaptive selection of mitochondrial complex I subunits during primate radiation. Gene (2006) 1.31

Data structures and compression algorithms for high-throughput sequencing technologies. BMC Bioinformatics (2010) 1.31

Landscape of the mitochondrial Hsp90 metabolome in tumours. Nat Commun (2013) 1.29

Mitochondrial DNA haplogroups influence AIDS progression. AIDS (2008) 1.29

Analysis of mitochondrial DNA diversity in the Aleuts of the Commander Islands and its implications for the genetic history of Beringia. Am J Hum Genet (2002) 1.29

Traces of early Eurasians in the Mansi of northwest Siberia revealed by mitochondrial DNA analysis. Am J Hum Genet (2002) 1.27

Mitochondrial DNA haplogroups associated with age-related macular degeneration. Invest Ophthalmol Vis Sci (2009) 1.26

Distribution patterns of over-represented k-mers in non-coding yeast DNA. Bioinformatics (2002) 1.25

Mitochondrial DNA haplogroups influence lipoatrophy after highly active antiretroviral therapy. J Acquir Immune Defic Syndr (2009) 1.24

The molecular mechanisms of OPA1-mediated optic atrophy in Drosophila model and prospects for antioxidant treatment. PLoS Genet (2008) 1.24

The stability and complexity of antibody responses to the major surface antigen of Plasmodium falciparum are associated with age in a malaria endemic area. Mol Cell Proteomics (2011) 1.24

Valosin containing protein associated inclusion body myopathy: abnormal vacuolization, autophagy and cell fusion in myoblasts. Neuromuscul Disord (2009) 1.23

Why are complementary DNA strands symmetric? Bioinformatics (2002) 1.23

A mitochondrial etiology of neurodegenerative diseases: evidence from Parkinson's disease. Ann N Y Acad Sci (2008) 1.22

Differences of sperm motility in mitochondrial DNA haplogroup U sublineages. Gene (2005) 1.21

Mitochondrial DNA variant associated with Leber hereditary optic neuropathy and high-altitude Tibetans. Proc Natl Acad Sci U S A (2012) 1.21

A CROC stronger than ROC: measuring, visualizing and optimizing early retrieval. Bioinformatics (2010) 1.20

Systemic mitochondrial dysfunction and the etiology of Alzheimer's disease and Down syndrome dementia. J Alzheimers Dis (2010) 1.19

Three-stage prediction of protein beta-sheets by neural networks, alignments and graph algorithms. Bioinformatics (2005) 1.16

Mitochondrial cardiomyopathies: how to identify candidate pathogenic mutations by mitochondrial DNA sequencing, MITOMASTER and phylogeny. Eur J Hum Genet (2010) 1.15

Association of mitochondrial SOD deficiency with salt-sensitive hypertension and accelerated renal senescence. J Appl Physiol (1985) (2006) 1.15

SOLpro: accurate sequence-based prediction of protein solubility. Bioinformatics (2009) 1.14

CircadiOmics: integrating circadian genomics, transcriptomics, proteomics and metabolomics. Nat Methods (2012) 1.14

LineUp: statistical detection of chromosomal homology with application to plant comparative genomics. Genome Res (2003) 1.14

ARL2 and BART enter mitochondria and bind the adenine nucleotide transporter. Mol Biol Cell (2002) 1.11

Discovery of power-laws in chemical space. J Chem Inf Model (2008) 1.11

Genome-wide identification of Bcl11b gene targets reveals role in brain-derived neurotrophic factor signaling. PLoS One (2011) 1.11

Circadian acetylome reveals regulation of mitochondrial metabolic pathways. Proc Natl Acad Sci U S A (2013) 1.11

Mapping the structural topology of the yeast 19S proteasomal regulatory particle using chemical cross-linking and probabilistic modeling. Mol Cell Proteomics (2012) 1.10