GENCODE: the reference human genome annotation for The ENCODE Project.

PubWeight™: 19.19 | Rank: Top 0.1% | All-Time Top 10000

Article: PMC 3431492

Published in Genome Res on September 01, 2012

Authors

Jennifer Harrow (1), Adam Frankish, Jose M Gonzalez, Electra Tapanari, Mark Diekhans, Felix Kokocinski, Bronwen L Aken, Daniel Barrell, Amonida Zadissa, Stephen Searle, If Barnes, Alexandra Bignell, Veronika Boychenko, Toby Hunt, Mike Kay, Gaurab Mukherjee, Jeena Rajan, Gloria Despacio-Reyes, Gary Saunders, Charles Steward, Rachel Harte, Michael Lin, Cédric Howald, Andrea Tanzer, Thomas Derrien, Jacqueline Chrast, Nathalie Walters, Suganthi Balasubramanian, Baikang Pei, Michael Tress, Jose Manuel Rodriguez, Iakes Ezkurdia, Jeltje van Baren, Michael Brent, David Haussler, Manolis Kellis, Alfonso Valencia, Alexandre Reymond, Mark Gerstein, Roderic Guigó, Tim J Hubbard

Author Affiliations

1: Wellcome Trust Sanger Institute, Wellcome Trust Campus, Hinxton, Cambridge CB10 1SA, United Kingdom. jla1@sanger.ac.uk

Articles citing this

(truncated to the top 100)

An integrated encyclopedia of DNA elements in the human genome. Nature (2012) 64.73

STAR: ultrafast universal RNA-seq aligner. Bioinformatics (2012) 25.21

Landscape of transcription in human cells. Nature (2012) 20.18

The accessible chromatin landscape of the human genome. Nature (2012) 16.86

The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res (2012) 15.41

Annotation of functional variation in personal genomes using RegulomeDB. Genome Res (2012) 11.47

Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science (2015) 9.92

The long-range interaction landscape of gene promoters. Nature (2012) 9.20

Transcriptome and genome sequencing uncovers functional variation in humans. Nature (2013) 8.89

Ensembl 2015. Nucleic Acids Res (2014) 8.30

Guidelines for investigating causality of sequence variants in human disease. Nature (2014) 7.30

The UCSC Genome Browser database: 2014 update. Nucleic Acids Res (2013) 6.54

ENCODE data in the UCSC Genome Browser: year 5 update. Nucleic Acids Res (2012) 6.29

Linking disease associations with regulatory information in the human genome. Genome Res (2012) 5.47

dbNSFP v2.0: a database of human non-synonymous SNVs and their functional predictions and annotations. Hum Mutat (2013) 5.11

The UCSC Genome Browser database: 2015 update. Nucleic Acids Res (2014) 4.87

starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res (2013) 4.33

Predicting effective microRNA target sites in mammalian mRNAs. Elife (2015) 4.30

The GENCODE pseudogene resource. Genome Biol (2012) 4.18

A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat Biotechnol (2014) 3.99

Ensembl 2016. Nucleic Acids Res (2015) 3.61

Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements. Genome Res (2012) 3.61

The landscape of long noncoding RNAs in the human transcriptome. Nat Genet (2015) 3.58

The RCSB Protein Data Bank: views of structural biology for basic and applied research and education. Nucleic Acids Res (2014) 3.57

Genenames.org: the HGNC resources in 2015. Nucleic Acids Res (2014) 3.54

Human genomics. The human transcriptome across tissues and individuals. Science (2015) 3.53

RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science (2015) 3.49

Assessment of transcript reconstruction methods for RNA-seq. Nat Methods (2013) 3.11

Transcriptional regulation and its misregulation in disease. Cell (2013) 2.92

A benchmark for RNA-seq quantification pipelines. Genome Biol (2016) 2.78

Understanding transcriptional regulation by integrative analysis of transcription factor binding data. Genome Res (2012) 2.66

Resetting transcription factor control circuitry toward ground-state pluripotency in human. Cell (2014) 2.60

Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat Genet (2014) 2.58

The landscape of kinase fusions in cancer. Nat Commun (2014) 2.48

Diversity and dynamics of the Drosophila transcriptome. Nature (2014) 2.47

Pervasive transcription of the human genome produces thousands of previously unidentified long intergenic noncoding RNAs. PLoS Genet (2013) 2.46

Functional annotation of noncoding sequence variants. Nat Methods (2014) 2.41

A single-molecule long-read survey of the human transcriptome. Nat Biotechnol (2013) 2.24

From single-cell to cell-pool transcriptomes: stochasticity in gene expression and RNA splicing. Genome Res (2013) 2.23

Transdifferentiation of human fibroblasts to endothelial cells: role of innate immunity. Circulation (2014) 2.19

Current status and new features of the Consensus Coding Sequence database. Nucleic Acids Res (2013) 2.19

Locus Reference Genomic: reference sequences for the reporting of clinically relevant sequence variants. Nucleic Acids Res (2013) 2.18

The UCSC Genome Browser database: 2016 update. Nucleic Acids Res (2015) 2.18

Combining RT-PCR-seq and RNA-seq to catalog all genic elements encoded in the human genome. Genome Res (2012) 2.13

Genetic Control of Chromatin States in Humans Involves Local and Distal Chromosomal Interactions. Cell (2015) 2.11

Transposable elements are major contributors to the origin, diversification, and regulation of vertebrate long noncoding RNAs. PLoS Genet (2013) 2.07

Gene-gene and gene-environment interactions detected by transcriptome sequence analysis in twins. Nat Genet (2014) 2.05

Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes. Cell Rep (2014) 2.04

Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments. Nucleic Acids Res (2013) 1.96

DIANA-LncBase: experimentally verified and computationally predicted microRNA targets on long non-coding RNAs. Nucleic Acids Res (2012) 1.94

Genome-wide mapping and characterization of Notch-regulated long noncoding RNAs in acute leukemia. Cell (2014) 1.88

A gene-based association method for mapping traits using reference transcriptome data. Nat Genet (2015) 1.82

Data integration in the era of omics: current and future challenges. BMC Syst Biol (2014) 1.82

Unique features of long non-coding RNA biogenesis and function. Nat Rev Genet (2015) 1.81

The Ensembl Variant Effect Predictor. Genome Biol (2016) 1.80

Rare loss-of-function variants in SETD1A are associated with schizophrenia and developmental disorders. Nat Neurosci (2016) 1.78

Transcriptome analysis of human tissues and cell lines reveals one dominant transcript per gene. Genome Biol (2013) 1.75

Web Apollo: a web-based genomic annotation editing platform. Genome Biol (2013) 1.69

lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs. Nucleic Acids Res (2014) 1.69

Discovery of four recessive developmental disorders using probabilistic genotype and phenotype matching among 4,125 families. Nat Genet (2015) 1.67

Extensive localization of long noncoding RNAs to the cytosol and mono- and polyribosomal complexes. Genome Biol (2014) 1.66

Enhancer-promoter interactions are encoded by complex genomic signatures on looping chromatin. Nat Genet (2016) 1.64

Computational approaches to identify functional genetic variants in cancer genomes. Nat Methods (2013) 1.64

Long Noncoding RNAs in Cancer Pathways. Cancer Cell (2016) 1.63

Comparison of the transcriptional landscapes between human and mouse tissues. Proc Natl Acad Sci U S A (2014) 1.60

RNA Sequence Analysis of Human Huntington Disease Brain Reveals an Extensive Increase in Inflammatory and Developmental Gene Expression. PLoS One (2015) 1.59

Choice of transcripts and software has a large effect on variant annotation. Genome Med (2014) 1.58

Tracking Distinct RNA Populations Using Efficient and Reversible Covalent Chemistry. Mol Cell (2015) 1.56

Long non-coding RNAs as a source of new peptides. Elife (2014) 1.56

The retrovirus HERVH is a long noncoding RNA required for human embryonic stem cell identity. Nat Struct Mol Biol (2014) 1.55

Proteogenomics: concepts, applications and computational strategies. Nat Methods (2014) 1.54

Down-regulation of long non-coding RNA GAS5 is associated with the prognosis of hepatocellular carcinoma. Int J Clin Exp Pathol (2014) 1.54

Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes. Hum Mol Genet (2014) 1.51

Long noncoding RNA associated-competing endogenous RNAs in gastric cancer. Sci Rep (2014) 1.50

Correlation of circular RNA abundance with proliferation--exemplified with colorectal and ovarian cancer, idiopathic lung fibrosis, and normal human tissues. Sci Rep (2015) 1.50

An integrative transcriptomic atlas of organogenesis in human embryos. Elife (2016) 1.50

Analysis of long non-coding RNA expression profiles in gastric cancer. World J Gastroenterol (2013) 1.50

Modulation of long noncoding RNAs by risk SNPs underlying genetic predispositions to prostate cancer. Nat Genet (2016) 1.48

What does our genome encode? Genome Res (2012) 1.47

The Landscape of long noncoding RNA classification. Trends Genet (2015) 1.47

Spatial genomic heterogeneity within localized, multifocal prostate cancer. Nat Genet (2015) 1.47

Identification of the missing pluripotency mediator downstream of leukaemia inhibitory factor. EMBO J (2013) 1.46

Promoter-like epigenetic signatures in exons displaying cell type-specific splicing. Genome Biol (2015) 1.46

DECKO: Single-oligo, dual-CRISPR deletion of genomic elements including long non-coding RNAs. BMC Genomics (2015) 1.45

Recurrent somatic mutations in regulatory regions of human cancer genomes. Nat Genet (2015) 1.45

Differentiated human stem cells resemble fetal, not adult, β cells. Proc Natl Acad Sci U S A (2014) 1.44

The genetic architecture of type 2 diabetes. Nature (2016) 1.43

Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data. Nat Genet (2017) 1.42

Chromatin signatures at transcriptional start sites separate two equally populated yet distinct classes of intergenic long noncoding RNAs. Genome Biol (2013) 1.41

A pan-cancer analysis of transcriptome changes associated with somatic mutations in U2AF1 reveals commonly altered splicing events. PLoS One (2014) 1.40

Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci U S A (2014) 1.40

Coordinated international action to accelerate genome-to-phenome with FAANG, the Functional Annotation of Animal Genomes project. Genome Biol (2015) 1.40

Genome-wide association between YAP/TAZ/TEAD and AP-1 at enhancers drives oncogenic growth. Nat Cell Biol (2015) 1.39

Finding the lost treasures in exome sequencing data. Trends Genet (2013) 1.36

FunSeq2: a framework for prioritizing noncoding regulatory variants in cancer. Genome Biol (2014) 1.36

Finding the active genes in deep RNA-seq gene expression studies. BMC Genomics (2013) 1.35

EZH2-mediated epigenetic suppression of long noncoding RNA SPRY4-IT1 promotes NSCLC cell proliferation and metastasis by affecting the epithelial-mesenchymal transition. Cell Death Dis (2014) 1.34

Mouse regulatory DNA landscapes reveal global principles of cis-regulatory evolution. Science (2014) 1.34

RNAcentral: an international database of ncRNA sequences. Nucleic Acids Res (2014) 1.32

The Ensembl gene annotation system. Database (Oxford) (2016) 1.31

Articles cited by this

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 665.31

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol (2009) 235.12

The Protein Data Bank. Nucleic Acids Res (2000) 187.10

A map of human genome variation from population-scale sequencing. Nature (2010) 121.13

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature (2007) 75.09

Finishing the euchromatic sequence of the human genome. Nature (2004) 41.40

Sequencing technologies - the next generation. Nat Rev Genet (2009) 40.57

The Pfam protein families database. Nucleic Acids Res (2009) 37.98

Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature (2009) 35.48

Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics (2005) 24.54

Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res (2004) 24.52

International network of cancer genome projects. Nature (2010) 20.35

Landscape of transcription in human cells. Nature (2012) 20.18

Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc (2007) 19.50

RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science (2007) 18.59

Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res (2011) 18.50

Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat Biotechnol (2010) 18.44

Global identification of human transcribed sequences with genome tiling arrays. Science (2004) 17.85

EST_GENOME: a program to align spliced DNA sequences to unspliced genomic DNA. Comput Appl Biosci (1997) 17.51

TANDEM: matching proteins with tandem mass spectra. Bioinformatics (2004) 17.41

Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev (2011) 16.77

A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol (2011) 16.53

A combined transmembrane topology and signal peptide prediction method. J Mol Biol (2004) 15.77

Antisense transcription in the mammalian transcriptome. Science (2005) 15.69

The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res (2012) 15.41

GENCODE: producing a reference annotation for ENCODE. Genome Biol (2006) 15.08

The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res (2009) 14.90

Ensembl 2011. Nucleic Acids Res (2010) 14.68

Ensembl 2012. Nucleic Acids Res (2011) 14.55

NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res (2011) 14.04

A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature (2010) 13.82

Molecular mechanisms of long noncoding RNAs. Mol Cell (2011) 11.44

Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell (2011) 10.56

Evidence for the widespread coupling of alternative splicing and nonsense-mediated mRNA decay in humans. Proc Natl Acad Sci U S A (2002) 10.28

A high-resolution map of human evolutionary constraint using 29 mammals. Nature (2011) 8.67

Open source system for analyzing, validating, and storing protein identification data. J Proteome Res (2004) 8.11

The PeptideAtlas project. Nucleic Acids Res (2006) 7.87

The vertebrate genome annotation (Vega) database. Nucleic Acids Res (2007) 7.53

AceView: a comprehensive cDNA-supported gene and transcripts annotation. Genome Biol (2006) 7.52

Kalign--an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics (2005) 7.01

Rfam: Wikipedia, clans and the "decimal" release. Nucleic Acids Res (2010) 6.58

An algorithm for progressive multiple alignment of sequences with insertions. Proc Natl Acad Sci U S A (2005) 6.26

RNA regulation: a new genetics? Nat Rev Genet (2004) 5.97

genenames.org: the HGNC resources in 2011. Nucleic Acids Res (2010) 5.77

Improving the accuracy of transmembrane protein topology prediction using evolutionary information. Bioinformatics (2007) 5.68

Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biol (2009) 5.45

The International Nucleotide Sequence Database Collaboration. Nucleic Acids Res (2010) 4.90

PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics (2011) 4.50

lncRNAdb: a reference database for long noncoding RNAs. Nucleic Acids Res (2010) 4.49

The GENCODE pseudogene resource. Genome Biol (2012) 4.18

Targeted RNA sequencing reveals the deep complexity of the human transcriptome. Nat Biotechnol (2011) 4.18

A window into third-generation sequencing. Hum Mol Genet (2010) 4.07

The otter annotation system. Genome Res (2004) 3.88

Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution. Genome Res (2007) 3.82

The reality of pervasive transcription. PLoS Biol (2011) 3.41

Best alpha-helical transmembrane protein topology predictions are achieved using hidden Markov models and evolutionary information. Protein Sci (2004) 3.40

Detecting amino acid sites under positive selection and purifying selection. Genetics (2005) 3.35

PseudoPipe: an automated pseudogene identification pipeline. Bioinformatics (2006) 2.85

Genome-wide computational identification and manual annotation of human long noncoding RNA genes. RNA (2010) 2.85

Identification of a microRNA that activates gene expression by repressing nonsense-mediated RNA decay. Mol Cell (2011) 2.73

miRBase: microRNA sequences and annotation. Curr Protoc Bioinformatics (2010) 2.13

Combining RT-PCR-seq and RNA-seq to catalog all genic elements encoded in the human genome. Genome Res (2012) 2.13

Identification and analysis of unitary pseudogenes: historic and contemporary gene losses in humans and other primates. Genome Biol (2010) 2.12

firestar--prediction of functionally important residues using structural templates and alignment reliability. Nucleic Acids Res (2007) 1.91

Conservation of alternative polyadenylation patterns in mammalian genes. BMC Genomics (2006) 1.75

Integrated graphical analysis of protein sequence features predicted from sequence composition. Proteins (2001) 1.65

CAGE (cap analysis of gene expression): a protocol for the detection of promoter and transcriptional networks. Methods Mol Biol (2012) 1.50

firestar--advances in the prediction of functionally important residues. Nucleic Acids Res (2011) 1.43

A transcribed pseudogene of MYLK promotes cell proliferation. FASEB J (2011) 1.37

Determination and validation of principal gene products. Bioinformatics (2007) 1.25

The former annotated human pseudogene dihydrofolate reductase-like 1 (DHFRL1) is expressed and functional. Proc Natl Acad Sci U S A (2011) 1.12

mRNA stability and control of cell proliferation. Biochem Soc Trans (2011) 1.10

Zebrafish as a genomics model for human neurological and polygenic disorders. Dev Neurobiol (2012) 1.04

Using semantic web rules to reason on an ontology of pseudogenes. Bioinformatics (2010) 1.02

AnnoTrack--a tracking system for genome annotation. BMC Genomics (2010) 0.98

Articles by these authors

The human genome browser at UCSC. Genome Res (2002) 168.23

Initial sequencing and comparative analysis of the mouse genome. Nature (2002) 96.15

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature (2007) 75.09

RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet (2009) 58.77

The transcriptional landscape of the yeast genome defined by RNA sequencing. Science (2008) 48.99

Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res (2005) 44.08

Functional profiling of the Saccharomyces cerevisiae genome. Nature (2002) 36.10

Human-mouse alignments with BLASTZ. Genome Res (2003) 35.49

Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature (2009) 35.48

Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature (2005) 31.60

Paired-end mapping reveals extensive structural variation in the human genome. Science (2007) 30.46

Transcriptional regulatory code of a eukaryotic genome. Nature (2004) 27.21

The UCSC Table Browser data retrieval tool. Nucleic Acids Res (2004) 25.12

The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res (2003) 24.72

Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res (2004) 24.52

Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature (2009) 24.41

Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature (2004) 24.40

Mapping and analysis of chromatin state dynamics in nine human cell types. Nature (2011) 24.37

Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature (2006) 24.29

Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature (2005) 23.04

International network of cancer genome projects. Nature (2010) 20.35

Landscape of transcription in human cells. Nature (2012) 20.18

The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res (2004) 18.75

Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila. Cell (2007) 18.35

Evolution of genes and genomes on the Drosophila phylogeny. Nature (2007) 18.01

Global identification of human transcribed sequences with genome tiling arrays. Science (2004) 17.85

Ultraconserved elements in the human genome. Science (2004) 17.14

The UCSC Genome Browser database: update 2010. Nucleic Acids Res (2009) 16.58

Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A (2003) 16.58

The UCSC Genome Browser database: update 2011. Nucleic Acids Res (2010) 16.24

Evolutionary and biomedical insights from the rhesus macaque genome. Science (2007) 16.21

A map of the interactome network of the metazoan C. elegans. Science (2004) 15.60

The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res (2012) 15.41

GENCODE: producing a reference annotation for ENCODE. Genome Biol (2006) 15.08

IntAct: an open source molecular interaction database. Nucleic Acids Res (2004) 15.02

The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res (2009) 14.90

Ensembl 2011. Nucleic Acids Res (2010) 14.68

The UCSC Known Genes. Bioinformatics (2006) 14.67

Ensembl 2012. Nucleic Acids Res (2011) 14.55

Overview of BioCreAtIvE: critical assessment of information extraction for biology. BMC Bioinformatics (2005) 14.50

The NIH Roadmap Epigenomics Mapping Consortium. Nat Biotechnol (2010) 13.99

Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia. Nature (2011) 13.18

Long noncoding RNAs with enhancer-like function in human cells. Cell (2010) 13.00

The Gene Ontology Annotation (GOA) project: implementation of GO in SWISS-PROT, TrEMBL, and InterPro. Genome Res (2003) 12.81

Ensembl 2014. Nucleic Acids Res (2013) 12.62

Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science (2010) 12.39

Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell (2012) 12.32

A systematic survey of loss-of-function variants in human protein-coding genes. Science (2012) 12.25