The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes.

PubWeight™: 14.90 | Rank: Top 0.1% | All-Time Top 10000

Article: PMC 2704439

Published in Genome Res on June 04, 2009

Authors

Kim D Pruitt (1), Jennifer Harrow, Rachel A Harte, Craig Wallin, Mark Diekhans, Donna R Maglott, Steve Searle, Catherine M Farrell, Jane E Loveland, Barbara J Ruef, Elizabeth Hart, Marie-Marthe Suner, Melissa J Landrum, Bronwen Aken, Sarah Ayling, Robert Baertsch, Julio Fernandez-Banet, Joshua L Cherry, Val Curwen, Michael Dicuccio, Manolis Kellis, Jennifer Lee, Michael F Lin, Michael Schuster, Andrew Shkeda, Clara Amid, Garth Brown, Oksana Dukhanina, Adam Frankish, Jennifer Hart, Bonnie L Maidak, Jonathan Mudge, Michael R Murphy, Terence Murphy, Jeena Rajan, Bhanu Rajput, Lillian D Riddick, Catherine Snow, Charles Steward, David Webb, Janet A Weber, Laurens Wilming, Wenyu Wu, Ewan Birney, David Haussler, Tim Hubbard, James Ostell, Richard Durbin, David Lipman

Author Affiliations

1: National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland 20894, USA. Pruitt@ncbi.nlm.nih.gov

Associated clinical trials:

Exome and Genome Analysis to Elucidate Genetic Etiologies and Population Characteristics in the Plain Community | NCT02927158

Articles citing this

(truncated to the top 100)

Exome sequencing identifies the cause of a mendelian disorder. Nat Genet (2009) 32.06

International network of cancer genome projects. Nature (2010) 20.35

The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res (2009) 19.70

GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res (2012) 19.19

The UCSC Genome Browser database: update 2010. Nucleic Acids Res (2009) 16.58

The UCSC Genome Browser database: update 2011. Nucleic Acids Res (2010) 16.24

Ensembl 2011. Nucleic Acids Res (2010) 14.68

Ensembl 2012. Nucleic Acids Res (2011) 14.55

Systematic localization of common disease-associated variation in regulatory DNA. Science (2012) 14.47

Exome sequencing as a tool for Mendelian disease gene discovery. Nat Rev Genet (2011) 14.29

NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res (2011) 14.04

Ensembl's 10th year. Nucleic Acids Res (2009) 10.82

De novo gene disruptions in children on the autistic spectrum. Neuron (2012) 9.69

A copy number variation morbidity map of developmental delay. Nat Genet (2011) 9.58

The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res (2012) 9.02

A high-resolution map of human evolutionary constraint using 29 mammals. Nature (2011) 8.67

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2011) 8.62

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2012) 8.41

RefSeq: an update on mammalian reference sequences. Nucleic Acids Res (2013) 7.29

dbNSFP: a lightweight database of human nonsynonymous SNPs and their functional predictions. Hum Mutat (2011) 6.97

The UCSC Genome Browser database: extensions and updates 2011. Nucleic Acids Res (2011) 6.88

The UCSC Genome Browser database: 2014 update. Nucleic Acids Res (2013) 6.54

genenames.org: the HGNC resources in 2011. Nucleic Acids Res (2010) 5.77

Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet (2013) 5.58

dbNSFP v2.0: a database of human non-synonymous SNVs and their functional predictions and annotations. Hum Mutat (2013) 5.11

The genetic landscape of high-risk neuroblastoma. Nat Genet (2013) 4.71

PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics (2011) 4.50

Genenames.org: the HGNC resources in 2013. Nucleic Acids Res (2012) 3.69

Ensembl 2016. Nucleic Acids Res (2015) 3.61

Exome sequencing: the sweet spot before whole genomes. Hum Mol Genet (2010) 3.50

Rare-variant association analysis: study designs and statistical tests. Am J Hum Genet (2014) 3.42

Massively parallel sequencing and rare disease. Hum Mol Genet (2010) 3.28

The IKMC web portal: a central point of entry to data and resources from the International Knockout Mouse Consortium. Nucleic Acids Res (2010) 3.23

Noisy splicing drives mRNA isoform diversity in human cells. PLoS Genet (2010) 3.19

Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities. Genome Biol (2011) 3.09

Pervasive sequence patents cover the entire human genome. Genome Med (2013) 3.07

The genetic landscape of mutations in Burkitt lymphoma. Nat Genet (2012) 3.03

A survey of the genetics of stomach, liver, and adipose gene expression from a morbidly obese cohort. Genome Res (2011) 3.00

Comparison of solution-based exome capture methods for next generation sequencing. Genome Biol (2011) 2.78

The genomic landscape of Neanderthal ancestry in present-day humans. Nature (2014) 2.59

The European Bioinformatics Institute's data resources. Nucleic Acids Res (2009) 2.55

The landscape of kinase fusions in cancer. Nat Commun (2014) 2.48

Loss-of-function variants in the genomes of healthy humans. Hum Mol Genet (2010) 2.45

De novo mutations in ATP1A3 cause alternating hemiplegia of childhood. Nat Genet (2012) 2.34

Sequencing studies in human genetics: design and interpretation. Nat Rev Genet (2013) 2.27

The completion of the Mammalian Gene Collection (MGC). Genome Res (2009) 2.21

Current status and new features of the Consensus Coding Sequence database. Nucleic Acids Res (2013) 2.19

Locus Reference Genomic: reference sequences for the reporting of clinically relevant sequence variants. Nucleic Acids Res (2013) 2.18

The UCSC Genome Browser database: 2016 update. Nucleic Acids Res (2015) 2.18

Three periods of regulatory innovation during vertebrate evolution. Science (2011) 2.09

Epigenetic and genetic features of 24 colon cancer cell lines. Oncogenesis (2013) 2.02

Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling. Bioinformatics (2011) 2.00

Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes. Genome Res (2011) 1.91

The GENCODE exome: sequencing the complete human exome. Eur J Hum Genet (2011) 1.88

A comparative analysis of exome capture. Genome Biol (2011) 1.78

Systematic analysis of somatic mutations in phosphorylation signaling predicts novel cancer drivers. Mol Syst Biol (2013) 1.78

Tracking and coordinating an international curation effort for the CCDS Project. Database (Oxford) (2012) 1.78

neXtProt: a knowledge platform for human proteins. Nucleic Acids Res (2011) 1.77

The Genomic HyperBrowser: inferential genomics at the sequence level. Genome Biol (2010) 1.74

Between a chicken and a grape: estimating the number of human genes. Genome Biol (2010) 1.72

Whole-exome sequencing identifies compound heterozygous mutations in WDR62 in siblings with recurrent polymicrogyria. Am J Med Genet A (2011) 1.71

Mutation patterns in cancer genomes. Proc Natl Acad Sci U S A (2009) 1.69

Massively parallel sequencing of the mouse exome to accurately identify rare, induced mutations: an immediate source for thousands of new mouse models. Open Biol (2012) 1.65

Consistent annotation of gene expression arrays. BMC Genomics (2010) 1.62

The UCSC Genome Browser. Curr Protoc Bioinformatics (2009) 1.59

Choice of transcripts and software has a large effect on variant annotation. Genome Med (2014) 1.58

Modeling read counts for CNV detection in exome sequencing data. Stat Appl Genet Mol Biol (2011) 1.55

Solving the Problem: Genome Annotation Standards before the Data Deluge. Stand Genomic Sci (2011) 1.54

Efficient targeted resequencing of human germline and cancer genomes by oligonucleotide-selective sequencing. Nat Biotechnol (2011) 1.52

Performance comparison of four exome capture systems for deep sequencing. BMC Genomics (2014) 1.51

Transcriptome analyses of the human retina identify unprecedented transcript diversity and 3.5 Mb of novel transcribed sequence via significant alternative splicing and novel genes. BMC Genomics (2013) 1.46

SNPnexus: a web server for functional annotation of novel and publicly known genetic variants (2012 update). Nucleic Acids Res (2012) 1.45

CRAVAT: cancer-related analysis of variants toolkit. Bioinformatics (2013) 1.43

DNA replication timing and selection shape the landscape of nucleotide variation in cancer genomes. Nat Commun (2012) 1.40

Conservation of trans-acting circuitry during mammalian regulatory evolution. Nature (2014) 1.38

Quantitative RNA-Seq analysis in non-model species: assessing transcriptome assemblies as a scaffold and the utility of evolutionary divergent genomic reference species. BMC Genomics (2012) 1.38

DBTSS provides a tissue specific dynamic view of Transcription Start Sites. Nucleic Acids Res (2009) 1.36

A flexible approach for highly multiplexed candidate gene targeted resequencing. PLoS One (2011) 1.35

Long runs of homozygosity are enriched for deleterious variation. Am J Hum Genet (2013) 1.32

Consequences of the discontinuation of the International Protein Index (IPI) database and its substitution by the UniProtKB "complete proteome" sets. Proteomics (2011) 1.31

The Ensembl gene annotation system. Database (Oxford) (2016) 1.31

Patterns of coding variation in the complete exomes of three Neandertals. Proc Natl Acad Sci U S A (2014) 1.29

Transparency tools in gene patenting for informing policy and practice. Nat Biotechnol (2013) 1.26

dbNSFP v3.0: A One-Stop Database of Functional Predictions and Annotations for Human Nonsynonymous and Splice-Site SNVs. Hum Mutat (2016) 1.25

An analysis of exome sequencing for diagnostic testing of the genes associated with muscle disease and spastic paraplegia. Hum Mutat (2012) 1.25

T1DBase: update 2011, organization and presentation of large-scale data sets for type 1 diabetes research. Nucleic Acids Res (2010) 1.23

SIMPLEX: cloud-enabled pipeline for the comprehensive analysis of exome sequencing data. PLoS One (2012) 1.22

APPRIS: annotation of principal and alternative splice isoforms. Nucleic Acids Res (2012) 1.21

Trans genomic capture and sequencing of primate exomes reveals new targets of positive selection. Genome Res (2011) 1.20

CHASM and SNVBox: toolkit for detecting biologically important single nucleotide mutations in cancer. Bioinformatics (2011) 1.18

The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process. Nucleic Acids Res (2011) 1.18

Comparison of predicted and actual consequences of missense mutations. Proc Natl Acad Sci U S A (2015) 1.17

H-InvDB in 2009: extended database and data mining resources for human genes and transcripts. Nucleic Acids Res (2009) 1.16

Linear motifs confer functional diversity onto splice variants. Nucleic Acids Res (2012) 1.13

The Vertebrate Genome Annotation browser 10 years on. Nucleic Acids Res (2013) 1.10

Saturation of the human phenome. Curr Genomics (2010) 1.09

In silico identification of plant miRNAs in mammalian breast milk exosomes--a small step forward? PLoS One (2014) 1.09

Novel variation and de novo mutation rates in population-wide de novo assembled Danish trios. Nat Commun (2015) 1.09

N-terminal proteomics and ribosome profiling provide a comprehensive view of the alternative translation initiation landscape in mice and men. Mol Cell Proteomics (2014) 1.07

The UCSC Genome Browser. Curr Protoc Hum Genet (2011) 1.04

Articles cited by this

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature (2007) 75.09

NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res (2006) 48.10

Human-mouse alignments with BLASTZ. Genome Res (2003) 35.49

Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature (2003) 29.16

NCBI Reference Sequences: current status, policy and new initiatives. Nucleic Acids Res (2008) 26.04

Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics (2005) 24.54

The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res (2007) 23.13

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2007) 22.53

Structural variation in the human genome. Nat Rev Genet (2006) 21.40

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2008) 21.36

Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res (2006) 20.92

GENCODE: producing a reference annotation for ENCODE. Genome Biol (2006) 15.08

Segmental duplications: organization and impact within the current human genome project assembly. Genome Res (2001) 11.77

An overview of Ensembl. Genome Res (2004) 10.35

The Mouse Genome Database (MGD): from genes to mice--a community resource for mouse biology. Nucleic Acids Res (2005) 8.19

Distinguishing protein-coding and noncoding genes in the human genome. Proc Natl Acad Sci U S A (2007) 8.00

The HGNC Database in 2008: a resource for the human genome. Nucleic Acids Res (2007) 7.29

The Vertebrate Genome Annotation (Vega) database. Nucleic Acids Res (2005) 7.06

The Ensembl analysis pipeline. Genome Res (2004) 5.90

VARSPLIC: alternatively-spliced protein sequences derived from SWISS-PROT and TrEMBL. Bioinformatics (2000) 5.03

Mechanistic links between nonsense-mediated mRNA decay and pre-mRNA splicing in mammalian cells. Curr Opin Cell Biol (2005) 3.99

Identification of a novel SNF2/SWI2 protein family member, SRCAP, which interacts with CREB-binding protein. J Biol Chem (1999) 1.97

The chromatin remodeling protein, SRCAP, is critical for deposition of the histone variant H2A.Z at promoters. J Biol Chem (2007) 1.96

Retrocopy contributions to the evolution of the human genome. BMC Genomics (2008) 1.95

UCSC genome browser tutorial. Genomics (2008) 1.73

Articles by these authors

The Sequence Alignment/Map format and SAMtools. Bioinformatics (2009) 232.39

Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (2009) 190.94

The human genome browser at UCSC. Genome Res (2002) 168.23

Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res (2008) 157.44

Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res (2008) 151.16

Initial sequencing and comparative analysis of the mouse genome. Nature (2002) 96.15

Accurate whole human genome sequencing using reversible terminator chemistry. Nature (2008) 90.20

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature (2007) 75.09

The Bioperl toolkit: Perl modules for the life sciences. Genome Res (2002) 58.63

The Pfam protein families database. Nucleic Acids Res (2004) 56.46

Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics (2010) 52.01

The Pfam protein families database. Nucleic Acids Res (2002) 51.34

NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res (2006) 48.10

The diploid genome sequence of an Asian individual. Nature (2008) 46.29

Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res (2005) 44.08

Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat Genet (2008) 43.63

Patterns of somatic mutation in human cancer genomes. Nature (2007) 38.41

NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res (2005) 37.39

Human-mouse alignments with BLASTZ. Genome Res (2003) 35.49

Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature (2009) 35.48

Pfam: clans, web tools and services. Nucleic Acids Res (2006) 34.83

Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature (2005) 31.60

Transcriptional regulatory code of a eukaryotic genome. Nature (2004) 27.21

Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature (2003) 26.58

NCBI Reference Sequences: current status, policy and new initiatives. Nucleic Acids Res (2008) 26.04

The variant call format and VCFtools. Bioinformatics (2011) 25.88

GenBank. Nucleic Acids Res (2007) 25.54

The UCSC Table Browser data retrieval tool. Nucleic Acids Res (2004) 25.12

The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res (2003) 24.72

Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics (2005) 24.54

Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res (2004) 24.52

Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature (2009) 24.41

Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature (2004) 24.40

Mapping and analysis of chromatin state dynamics in nine human cell types. Nature (2011) 24.37

Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature (2005) 23.04

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2005) 22.98

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2007) 22.53

A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nat Biotechnol (2008) 21.72

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2008) 21.36

The genome sequence of the malaria mosquito Anopheles gambiae. Science (2002) 20.36

International network of cancer genome projects. Nature (2010) 20.35

Landscape of transcription in human cells. Nature (2012) 20.18

GenBank. Nucleic Acids Res (2005) 19.25

GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res (2012) 19.19

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2006) 18.85

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2006) 18.84

A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature (2009) 18.39

Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila. Cell (2007) 18.35

The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol (2005) 18.20

Evolution of genes and genomes on the Drosophila phylogeny. Nature (2007) 18.01

The NCBI dbGaP database of genotypes and phenotypes. Nat Genet (2007) 17.93

GeneWise and Genomewise. Genome Res (2004) 17.87

EnsMart: a generic system for fast and flexible access to biological data. Genome Res (2004) 17.64

InterPro, progress and status in 2005. Nucleic Acids Res (2005) 17.53

GenBank. Nucleic Acids Res (2002) 17.24

Ultraconserved elements in the human genome. Science (2004) 17.14

HLH-2004: Diagnostic and therapeutic guidelines for hemophagocytic lymphohistiocytosis. Pediatr Blood Cancer (2007) 16.96

GenBank. Nucleic Acids Res (2007) 16.92

The UCSC Genome Browser database: update 2010. Nucleic Acids Res (2009) 16.58

Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A (2003) 16.58

The UCSC Genome Browser database: update 2011. Nucleic Acids Res (2010) 16.24