A beginner's guide to eukaryotic genome annotation.

PubWeight™: 2.67 | Rank: Top 1%

PMID: 22510764

Published in Nat Rev Genet on April 18, 2012

Authors

Mark Yandell (1), Daniel Ence

Author Affiliations

1: Department of Human Genetics, Eccles Institute of Human Genetics, School of Medicine, University of Utah, Salt Lake City, Utah 84112-5330, USA. Email: myandell@genetics.utah.edu

Articles citing this

(truncated to the top 100)

CISA: contig integrator for sequence assembly of bacterial genomes. PLoS One (2013) 4.62

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience (2013) 4.11

WebAUGUSTUS--a web service for training AUGUSTUS and predicting genes in eukaryotes. Nucleic Acids Res (2013) 1.95

The genome of Chenopodium quinoa. Nature (2017) 1.58

Evidence for extensive horizontal gene transfer from the draft genome of a tardigrade. Proc Natl Acad Sci U S A (2015) 1.51

MAKER-P: a tool kit for the rapid creation, management, and quality control of plant genome annotations. Plant Physiol (2013) 1.48

Resistance gene enrichment sequencing (RenSeq) enables reannotation of the NB-LRR gene family from sequenced plant genomes and rapid mapping of resistance loci in segregating populations. Plant J (2013) 1.41

The Ensembl gene annotation system. Database (Oxford) (2016) 1.31

The banana genome hub. Database (Oxford) (2013) 1.29

An introduction to the analysis of shotgun metagenomic data. Front Plant Sci (2014) 1.27

Mechanisms and dynamics of orphan gene emergence in insect genomes. Genome Biol Evol (2013) 1.13

A field guide to whole-genome sequencing, assembly and annotation. Evol Appl (2014) 1.12

Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens. PLoS One (2013) 1.05

Genome Annotation and Curation Using MAKER and MAKER-P. Curr Protoc Bioinformatics (2014) 1.05

A simple method to control over-alignment in the MAFFT multiple sequence alignment program. Bioinformatics (2016) 1.05

The duplicated genes database: identification and functional annotation of co-localised duplicated genes across genomes. PLoS One (2012) 1.05

Transposable element islands facilitate adaptation to novel environments in an invasive species. Nat Commun (2014) 1.04

The genome sequence and effector complement of the flax rust pathogen Melampsora lini. Front Plant Sci (2014) 1.03

RepARK--de novo creation of repeat libraries from whole-genome NGS reads. Nucleic Acids Res (2014) 1.01

The use of RelocaTE and unassembled short reads to produce high-resolution snapshots of transposable element generated diversity in rice. G3 (Bethesda) (2013) 0.97

Evolution of the eukaryotic dynactin complex, the activator of cytoplasmic dynein. BMC Evol Biol (2012) 0.94

Patterns of sequencing coverage bias revealed by ultra-deep sequencing of vertebrate mitochondria. BMC Genomics (2014) 0.91

Plastid proteome prediction for diatoms and other algae with secondary plastids of the red lineage. Plant J (2015) 0.88

The possibility of de novo assembly of the genome and population genomics of the mangrove rivulus, Kryptolebias marmoratus. Integr Comp Biol (2012) 0.88

Comparative genomics of flatworms (platyhelminthes) reveals shared genomic features of ecto- and endoparastic neodermata. Genome Biol Evol (2014) 0.88

Automated update, revision, and quality control of the maize genome annotations using MAKER-P improves the B73 RefGen_v3 gene models and identifies new genes. Plant Physiol (2014) 0.87

RNA-Seq optimization with eQTL gold standards. BMC Genomics (2013) 0.87

AnaLysis of Expression on human chromosome 21, ALE-HSA21: a pilot integrated web resource. Database (Oxford) (2014) 0.86

Machine learning and genome annotation: a match meant to be? Genome Biol (2013) 0.86

Computation as the mechanistic bridge between precision medicine and systems therapeutics. Clin Pharmacol Ther (2012) 0.85

Improved annotation of 3' untranslated regions and complex loci by combination of strand-specific direct RNA sequencing, RNA-Seq and ESTs. PLoS One (2014) 0.85

Old world monkeys and new age science: the evolution of nonhuman primate systems virology. ILAR J (2013) 0.85

Gene prediction and annotation in Penstemon (Plantaginaceae): A workflow for marker development from extremely low-coverage genome sequencing. Appl Plant Sci (2014) 0.84

Small-scale gene duplications played a major role in the recent evolution of wheat chromosome 3B. Genome Biol (2015) 0.84

Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.). DNA Res (2014) 0.84

Cephalopod genomics: A plan of strategies and organization. Stand Genomic Sci (2012) 0.83

WebScipio: Reconstructing alternative splice variants of eukaryotic proteins. Nucleic Acids Res (2013) 0.83

An improved genome assembly uncovers prolific tandem repeats in Atlantic cod. BMC Genomics (2017) 0.82

The Road to Metagenomics: From Microbiology to DNA Sequencing Technologies and Bioinformatics. Front Genet (2015) 0.82

Divergence of protein-coding capacity and regulation in the Bacillus cereus sensu lato group. BMC Bioinformatics (2014) 0.82

Integrated modeling of protein-coding genes in the Manduca sexta genome using RNA-Seq data from the biochemical model insect. Insect Biochem Mol Biol (2015) 0.82

When plants produce not enough or at all: metabolic engineering of flavonoids in microbial hosts. Front Plant Sci (2015) 0.82

Transcriptome analysis of northern elephant seal (Mirounga angustirostris) muscle tissue provides a novel molecular resource and physiological insights. BMC Genomics (2015) 0.81

From plant genomes to protein families: computational tools. Comput Struct Biotechnol J (2013) 0.80

Trends in IT Innovation to Build a Next Generation Bioinformatics Solution to Manage and Analyse Biological Big Data Produced by NGS Technologies. Biomed Res Int (2015) 0.80

FRAMA: from RNA-seq data to annotated mRNA assemblies. BMC Genomics (2016) 0.80

Tree shrew database (TreeshrewDB): a genomic knowledge base for the Chinese tree shrew. Sci Rep (2014) 0.79

The diverse applications of RNA-seq for functional genomic studies in Aspergillus fumigatus. Ann N Y Acad Sci (2012) 0.79

Rock, paper, scissors: harnessing complementarity in ortholog detection methods improves comparative genomic inference. G3 (Bethesda) (2015) 0.79

Reconstructing the Backbone of the Saccharomycotina Yeast Phylogeny Using Genome-Scale Data. G3 (Bethesda) (2016) 0.78

Xenopus tropicalis Genome Re-Scaffolding and Re-Annotation Reach the Resolution Required for In Vivo ChIA-PET Analysis. PLoS One (2015) 0.78

Hologenome analysis of two marine sponges with different microbiomes. BMC Genomics (2016) 0.78

Mouse genome annotation by the RefSeq project. Mamm Genome (2015) 0.78

Extension of human lncRNA transcripts by RACE coupled with long-read high-throughput sequencing (RACE-Seq). Nat Commun (2016) 0.78

MEGANTE: a web-based system for integrated plant genome annotation. Plant Cell Physiol (2013) 0.78

Identification and Functional Analysis of the Mycophenolic Acid Gene Cluster of Penicillium roqueforti. PLoS One (2016) 0.78

A robust (re-)annotation approach to generate unbiased mapping references for RNA-seq-based analyses of differential expression across closely related species. BMC Genomics (2016) 0.78

Proteomics informed by transcriptomics for characterising active transposable elements and genome annotation in Aedes aegypti. BMC Genomics (2017) 0.78

The state of play in higher eukaryote gene annotation. Nat Rev Genet (2016) 0.77

Cloning and Functional Characterization of Two BTB Genes in the Predatory Mite Metaseiulus occidentalis. PLoS One (2015) 0.77

The Evolution of Orphan Regions in Genomes of a Fungal Pathogen of Wheat. MBio (2016) 0.77

Death of a dogma: eukaryotic mRNAs can code for more than one protein. Nucleic Acids Res (2015) 0.77

Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment. BMC Bioinformatics (2014) 0.77

Development of an undergraduate bioinformatics degree program at a liberal arts college. Yale J Biol Med (2012) 0.77

Comparative transcriptome profiling approach to glean virulence and immunomodulation-related genes of Fasciola hepatica. BMC Genomics (2015) 0.76

Transcriptomics of diapause in an isogenic self-fertilizing vertebrate. BMC Genomics (2015) 0.76

BEACON: automated tool for Bacterial GEnome Annotation ComparisON. BMC Genomics (2015) 0.76

GeneValidator: identify problems with protein-coding gene predictions. Bioinformatics (2016) 0.76

Functional Characterization of New Polyketide Synthase Genes Involved in Ochratoxin A Biosynthesis in Aspergillus Ochraceus fc-1. Toxins (Basel) (2015) 0.76

xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud. Plant Cell (2016) 0.76

Challenges, Solutions, and Quality Metrics of Personal Genome Assembly in Advancing Precision Medicine. Pharmaceutics (2016) 0.76

Bidirectional-genetics platform, a dual-purpose mutagenesis strategy for filamentous fungi. Eukaryot Cell (2013) 0.76

Phylogenomic Insights into Mouse Evolution Using a Pseudoreference Approach. Genome Biol Evol (2017) 0.76

Increased taxon sampling reveals thousands of hidden orthologs in flatworms. Genome Res (2017) 0.75

Draft Assembly of Elite Inbred Line PH207 Provides Insights into Genomic and Transcriptome Diversity in Maize. Plant Cell (2016) 0.75

Dual use of peptide mass spectra: Protein atlas and genome annotation. Curr Plant Biol (2015) 0.75

Harnessing cross-species alignment to discover SNPs and generate a draft genome sequence of a bighorn sheep (Ovis canadensis). BMC Genomics (2015) 0.75

Functional Annotations of Paralogs: A Blessing and a Curse. Life (Basel) (2016) 0.75

Resequencing and annotation of the Nostoc punctiforme ATTC 29133 genome: facilitating biofuel and high-value chemical production. AMB Express (2017) 0.75

The Genome of a Southern Hemisphere Seagrass Species (Zostera muelleri). Plant Physiol (2016) 0.75

Using intron position conservation for homology-based gene prediction. Nucleic Acids Res (2016) 0.75

Students' perspective on genomics: from sample to sequence using the case study of blueberry. Front Genet (2013) 0.75

Computational Identification of Novel Genes: Current and Future Perspectives. Bioinform Biol Insights (2016) 0.75

The genomic mosaicism of hybrid speciation. Sci Adv (2017) 0.75

YeATSAM analysis of the walnut and chickpea transcriptome reveals key genes undetected by current annotation tools. F1000Res (2016) 0.75

Tiling Assembly: a new tool for reference annotation-independent transcript assembly and novel gene identification by RNA-sequencing. DNA Res (2015) 0.75

Tissue resolved, gene structure refined equine transcriptome. BMC Genomics (2017) 0.75

ncRNA orthologies in the vertebrate lineage. Database (Oxford) (2016) 0.75

Improved annotation with de novo transcriptome assembly in four social amoeba species. BMC Genomics (2017) 0.75

Contrasting Patterns in the Evolution of Vertebrate MLX Interacting Protein (MLXIP) and MLX Interacting Protein-Like (MLXIPL) Genes. PLoS One (2016) 0.75

GASS: genome structural annotation for Eukaryotes based on species similarity. BMC Genomics (2015) 0.75

Genome annotation of a Saccharomyces sp. lager brewer's yeast. Genom Data (2016) 0.75

Fact or fiction: updates on how protein-coding genes might emerge de novo from previously non-coding DNA. F1000Res (2017) 0.75

Misannotation Awareness: A Tale of Two Gene-Groups. Front Plant Sci (2016) 0.75

A high-quality annotated transcriptome of swine peripheral blood. BMC Genomics (2017) 0.75

OrthoFiller: utilising data from multiple species to improve the completeness of genome annotations. BMC Genomics (2017) 0.75

Transcriptome sequencing based annotation and homologous evidence based scaffolding of Anguilla japonica draft genome. BMC Genomics (2016) 0.75

Fold-specific sequence scoring improves protein sequence matching. BMC Bioinformatics (2016) 0.75

Coding sequence density estimation via topological pressure. J Math Biol (2014) 0.75

Proteomics technique opens new frontiers in mobilome research. Mob Genet Elements (2017) 0.75

Articles cited by this

(truncated to the top 100)

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 665.31

Basic local alignment search tool. J Mol Biol (1990) 659.07

Initial sequencing and analysis of the human genome. Nature (2001) 212.86

Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A (1988) 193.60

tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res (1997) 142.55

Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods (2008) 126.81

BLAT--the BLAST-like alignment tool. Genome Res (2002) 126.78

The sequence of the human genome. Science (2001) 101.55

RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res (2007) 85.81

TopHat: discovering splice junctions with RNA-Seq. Bioinformatics (2009) 81.13

Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol (2010) 75.21

The genome sequence of Drosophila melanogaster. Science (2000) 74.32

Prediction of complete gene structures in human genomic DNA. J Mol Biol (1997) 58.76

The Bioperl toolkit: Perl modules for the life sciences. Genome Res (2002) 58.63

Prediction of mammalian microRNA targets. Cell (2003) 53.70

Artemis: sequence visualization and annotation. Bioinformatics (2000) 46.68

De novo assembly of human genomes with massively parallel short read sequencing. Genome Res (2009) 45.91

ABySS: a parallel assembler for short read sequence data. Genome Res (2009) 43.20

Integrative genomics viewer. Nat Biotechnol (2011) 42.83

The generic genome browser: a building block for a model organism system database. Genome Res (2002) 42.64

Finishing the euchromatic sequence of the human genome. Nature (2004) 41.40

The Pfam protein families database. Nucleic Acids Res (2009) 37.98

BLAST+: architecture and applications. BMC Bioinformatics (2009) 36.53

Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc (2012) 35.75

Real-time quantification of microRNAs by stem-loop RT-PCR. Nucleic Acids Res (2005) 29.17

GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res (1998) 25.21

Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics (2005) 24.54

Rfam: an RNA family database. Nucleic Acids Res (2003) 22.93

A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res (1998) 22.69

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2008) 21.36

The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res (2005) 19.81

Ab initio gene finding in Drosophila genomic DNA. Genome Res (2000) 19.23

Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat Biotechnol (2010) 18.44

The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol (2005) 18.20

Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics (2010) 17.53

GenBank. Nucleic Acids Res (2008) 13.29

Infernal 1.0: inference of RNA alignments. Bioinformatics (2009) 12.98

CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics (2007) 12.21

Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics (2000) 11.75

Ongoing and future developments at the Universal Protein Resource. Nucleic Acids Res (2010) 11.32

Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res (2011) 11.32

Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res (2003) 11.03

A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. BMC Bioinformatics (2002) 10.84

Apollo: a sequence annotation editor. Genome Biol (2002) 10.77

Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes. Nature (2008) 10.71

MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res (2010) 9.80

Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics (2003) 8.92

Annotation of the Drosophila melanogaster euchromatic genome: a systematic review. Genome Biol (2002) 8.59

The impact of retrotransposons on human genome evolution. Nat Rev Genet (2009) 8.08

Finishing a whole-genome shotgun: release 3 of the Drosophila melanogaster euchromatic genome sequence. Genome Biol (2002) 8.07

De novo identification of repeat families in large genomes. Bioinformatics (2005) 7.95

Integrating genomic homology into gene structure prediction. Bioinformatics (2001) 7.92

Annotating genomes with massive-scale RNA sequencing. Genome Biol (2008) 7.73

Genomics. Genome project standards in a new era of sequencing. Science (2009) 7.72

Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods (2011) 7.52

Genie--gene finding in Drosophila melanogaster. Genome Res (2000) 7.47

EGASP: the human ENCODE Genome Annotation Assessment Project. Genome Biol (2006) 7.06

Computational inference of homologous gene structures in the human genome. Genome Res (2001) 6.96

JBrowse: a next-generation genome browser. Genome Res (2009) 6.77

GeneID in Drosophila. Genome Res (2000) 6.61

MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res (2007) 6.48

Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res (2002) 6.36

WindowMasker: window-based masker for sequenced genomes. Bioinformatics (2005) 5.91

Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps. Genome Biol (2010) 5.79

Swiss-Prot: juggling between evolution and stability. Brief Bioinform (2004) 5.73

Spidey: a tool for mRNA-to-genomic alignments. Genome Res (2001) 5.61

Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nat Rev Genet (2011) 5.58

MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics (2011) 5.35

Prediction of gene structure. J Mol Biol (1992) 5.31

An integrated computational pipeline and database to support whole-genome sequence annotation. Genome Biol (2002) 5.24

Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics (2006) 5.23

Splign: algorithms for computing spliced alignments with identification of paralogs. Biol Direct (2008) 5.03

ABACAS: algorithm-based automatic contiguation of assembled sequences. Bioinformatics (2009) 4.98

Ab initio gene identification in metagenomic sequences. Nucleic Acids Res (2010) 4.47

Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res (2008) 4.38

Direct RNA sequencing. Nature (2009) 4.37

Creating a honey bee consensus gene set. Genome Biol (2007) 4.17

VectorBase: a data resource for invertebrate vector genomics. Nucleic Acids Res (2008) 3.73

Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol (2008) 3.73

Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation. Nucleic Acids Res (2006) 3.62

A universal classification of eukaryotic transposable elements implemented in Repbase. Nat Rev Genet (2008) 3.17

Optimal spliced alignment of homologous cDNA to a genomic DNA template. Bioinformatics (2000) 3.14

Plant DNA flow cytometry and estimation of nuclear genome size. Ann Bot (2005) 3.13

InterPro and InterProScan: tools for protein sequence classification and comparison. Methods Mol Biol (2007) 3.05

Gramene database in 2010: updates and extensions. Nucleic Acids Res (2010) 2.70

PlantGDB: a resource for comparative plant genomics. Nucleic Acids Res (2007) 2.61

Long noncoding RNA in genome regulation: prospects and mechanisms. RNA Biol (2010) 2.53

Comparative analysis of processed pseudogenes in the mouse and human genomes. Trends Genet (2004) 2.51

JIGSAW: integration of multiple sources of evidence for gene prediction. Bioinformatics (2005) 2.37

Integrated pseudogene annotation for human chromosome 22: evidence for transcription. J Mol Biol (2005) 2.32

GAZE: a generic framework for the integration of gene-prediction data by dynamic programming. Genome Res (2002) 2.20

The genome sequence of the leaf-cutter ant Atta cephalotes reveals insights into its obligate symbiotic lifestyle. PLoS Genet (2011) 2.16

Hymenoptera Genome Database: integrated community resources for insect species of the order Hymenoptera. Nucleic Acids Res (2010) 2.15

Genome annotation past, present, and future: how to define an ORF at each locus. Genome Res (2005) 2.11

CONTRAST: a discriminative, phylogeny-free approach to multiple informant de novo gene prediction. Genome Biol (2007) 2.01

r2cat: synteny plots and comparative assembly. Bioinformatics (2009) 1.95

Draft genome of the globally widespread and invasive Argentine ant (Linepithema humile). Proc Natl Acad Sci U S A (2011) 1.94

Molecular characterization of the Drosophila genome. Genetics (1969) 1.72

EGASP: collaboration through competition to find human genes. Nat Methods (2005) 1.72

The genome of the leaf-cutting ant Acromyrmex echinatior suggests key adaptations to advanced social life and fungus farming. Genome Res (2011) 1.70