Performance and scalability of discriminative metrics for comparative gene identification in 12 Drosophila genomes.

PubWeight™: 2.70 | Rank: Top 1%

Full text: PMC2291194

Published in PLoS Comput Biol on April 18, 2008

Authors

Michael F Lin¹, Ameya N Deoras, Matthew D Rasmussen, Manolis Kellis

Author Affiliations

1: Broad Institute of MIT and Harvard University, Cambridge, Massachusetts, United States of America.

Articles citing this

Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc Natl Acad Sci U S A (2009) 20.66

Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat Biotechnol (2010) 18.44

Genome regulation by long noncoding RNAs. Annu Rev Biochem (2012) 11.70

Modular regulatory principles of large non-coding RNAs. Nature (2012) 7.42

Extensive and coordinated transcription of noncoding RNAs within cell-cycle promoters. Nat Genet (2011) 5.62

PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics (2011) 4.50

CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model. Nucleic Acids Res (2013) 2.88

Binding site turnover produces pervasive quantitative changes in transcription factor binding between closely related Drosophila species. PLoS Biol (2010) 2.32

Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes. Cell Rep (2014) 2.04

Regulation of mammalian cell differentiation by long non-coding RNAs. EMBO Rep (2012) 1.82

Evidence of abundant stop codon readthrough in Drosophila and other metazoa. Genome Res (2011) 1.60

Cooperativity and rapid evolution of cobound transcription factors in closely related mammals. Cell (2013) 1.52

RNAcode: robust discrimination of coding and noncoding regions in comparative sequence data. RNA (2011) 1.42

Functional Translational Readthrough: A Systems Biology Perspective. PLoS Genet (2016) 1.39

Potential roles of microRNAs in regulating long intergenic noncoding RNAs. BMC Med Genomics (2013) 1.19

Extensive divergence of transcription factor binding in Drosophila embryos with highly conserved gene expression. PLoS Genet (2013) 1.17

A genome-wide survey of maternal and embryonic transcripts during Xenopus tropicalis development. BMC Genomics (2013) 0.98

Comparison of RefSeq protein-coding regions in human and vertebrate genomes. BMC Genomics (2013) 0.84

A Brief Review: The Z-curve Theory and its Application in Genome Analysis. Curr Genomics (2014) 0.83

Long non-coding RNAs in haematological malignancies. Int J Mol Sci (2013) 0.83

Type I Interferon Regulates the Expression of Long Non-Coding RNAs. Front Immunol (2014) 0.83

Recognition of Protein-coding Genes Based on Z-curve Algorithms. Curr Genomics (2014) 0.78

Statistical assessment of discriminative features for protein-coding and non coding cross-species conserved sequence elements. BMC Bioinformatics (2009) 0.77

A rebuttal to the comments on the genome order index and the Z-curve. Biol Direct (2011) 0.76

Assessing Recent Selection and Functionality at Long Noncoding RNA Loci in the Mouse Genome. Genome Biol Evol (2015) 0.76

Coding sequence density estimation via topological pressure. J Math Biol (2014) 0.75

Evolutionary Dynamics of Regulatory Changes Underlying Gene Expression Divergence among Saccharomyces Species. Genome Biol Evol (2017) 0.75

Articles cited by this

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 665.31

Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet (2000) 336.52

The human genome browser at UCSC. Genome Res (2002) 168.23

A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol (2003) 102.57

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature (2007) 75.09

The genome sequence of Drosophila melanogaster. Science (2000) 74.32

Improved microbial gene identification with GLIMMER. Nucleic Acids Res (1999) 51.34

PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci (1997) 45.07

Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol (1986) 36.06

Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature (2003) 29.16

Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res (2004) 24.52

The genome sequence of the malaria mosquito Anopheles gambiae. Science (2002) 20.36

Evolution of genes and genomes on the Drosophila phylogeny. Nature (2007) 18.01

Insights into social insects from the genome of the honeybee Apis mellifera. Nature (2006) 13.67

Comparative analyses of multi-species sequences from targeted genomic regions. Nature (2003) 13.31

Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol (2000) 13.01

CRITICA: coding region identification tool invoking comparative analysis. Mol Biol Evol (1999) 12.64

Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Nature (2007) 11.66

Identification and characterization of multi-species conserved sequences. Genome Res (2003) 10.18

Statistical methods for detecting molecular adaptation. Trends Ecol Evol (2000) 10.10

Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster. Science (2002) 9.43

Whole-genome sequencing and assembly with high-throughput, short-read technologies. PLoS One (2007) 8.70

Annotation of the Drosophila melanogaster euchromatic genome: a systematic review. Genome Biol (2002) 8.59

Comparative genome sequencing of Drosophila pseudoobscura: chromosomal, gene, and cis-element evolution. Genome Res (2005) 8.38

Human and mouse gene structure: comparative analysis and application to exon prediction. Genome Res (2000) 8.28

Distinguishing protein-coding and noncoding genes in the human genome. Proc Natl Acad Sci U S A (2007) 8.00

Integrating genomic homology into gene structure prediction. Bioinformatics (2001) 7.92

An analysis of the feasibility of short read sequencing. Nucleic Acids Res (2005) 6.10

MAVID: constrained ancestral alignment of multiple sequences. Genome Res (2004) 5.83

Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes. Genome Res (2007) 4.73

A model of the statistical power of comparative genome sequence analysis. PLoS Biol (2005) 4.54

An initial strategy for the systematic identification of functional elements in the human genome by low-redundancy comparative sequencing. Proc Natl Acad Sci U S A (2005) 4.38

Assessment of protein coding measures. Nucleic Acids Res (1992) 4.32

SLAM: cross-species gene finding and alignment with a generalized pair hidden Markov model. Genome Res (2003) 4.17

Comparative gene prediction in human and mouse. Genome Res (2003) 4.13

Computational prediction of eukaryotic protein-coding genes. Nat Rev Genet (2002) 3.97

Gene expression and molecular evolution. Curr Opin Genet Dev (2001) 3.27

Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome. Genome Biol (2002) 3.09

Evolution of long-range fractal correlations and 1/f noise in DNA base sequences. Phys Rev Lett (1992) 3.05

Small peptide regulators of actin-based cell morphogenesis encoded by a polycistronic mRNA. Nat Cell Biol (2007) 3.04

Comparative ab initio prediction of gene structures using pair HMMs. Bioinformatics (2002) 2.99

Prediction of probable genes by Fourier analysis of genomic sequences. Comput Appl Biosci (1997) 2.96

The K(A)/K(S) ratio test for assessing the protein-coding potential of genomic regions: an empirical and simulation study. Genome Res (2002) 2.84

Peptides encoded by short ORFs control development and define a new eukaryotic gene family. PLoS Biol (2007) 2.54

Using multiple alignments to improve gene prediction. J Comput Biol (2006) 2.48

Methods in comparative genomics: genome correspondence, gene identification and regulatory motif discovery. J Comput Biol (2004) 2.20

Genome annotation past, present, and future: how to define an ORF at each locus. Genome Res (2005) 2.11

CONTRAST: a discriminative, phylogeny-free approach to multiple informant de novo gene prediction. Genome Biol (2007) 2.01

The abundance of short proteins in the mammalian proteome. PLoS Genet (2006) 1.96

Discrimination of non-protein-coding transcripts from protein-coding mRNA. RNA Biol (2006) 1.82

Distinguishing protein-coding from non-coding RNAs through support vector machines. PLoS Genet (2006) 1.81

Recognition of protein coding genes in the yeast genome at better than 95% accuracy based on the Z curve. Nucleic Acids Res (2000) 1.70

Drosophila biology in the genomic age. Genetics (2007) 1.66

Analysis of small human proteins reveals the translation of upstream open reading frames of mRNAs. Genome Res (2004) 1.63

Global discriminative learning for higher-accuracy computational gene prediction. PLoS Comput Biol (2007) 1.60

Accurate gene-tree reconstruction by learning gene- and species-specific substitution rates across multiple complete genomes. Genome Res (2007) 1.57

Comparison of various algorithms for recognizing short coding sequences of human genes. Bioinformatics (2004) 1.39

The role of population size in molecular evolution. Theor Popul Biol (1999) 1.33

Tip of another iceberg: Drosophila serpins. Trends Cell Biol (2005) 1.29

Conrad: gene prediction using conditional random fields. Genome Res (2007) 1.26

Computational identification of protein coding potential of conserved sequence tags through cross-species evolutionary analysis. Nucleic Acids Res (2003) 1.21

Nested genes in the human genome. Genomics (2005) 1.18

A screen for immunity genes evolving under positive selection in Drosophila. J Evol Biol (2007) 1.18

Human-mouse gene identification by comparative evidence integration and evolutionary analysis. Genome Res (2003) 1.14

Computation and analysis of genomic multi-sequence alignments. Annu Rev Genomics Hum Genet (2007) 1.11

Associations between human disease genes and overlapping gene groups and multiple amino acid runs. Proc Natl Acad Sci U S A (2002) 1.09

In search of the small ones: improved prediction of short exons in vertebrates, plants, fungi and protists. Bioinformatics (2007) 1.00

Should we expect substitution rate to depend on population size? Genetics (1998) 0.99

Articles by these authors

Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature (2009) 35.48

Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature (2005) 31.60

Transcriptional regulatory code of a eukaryotic genome. Nature (2004) 27.21

Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature (2009) 24.41

Mapping and analysis of chromatin state dynamics in nine human cell types. Nature (2011) 24.37

Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature (2005) 23.04

GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res (2012) 19.19

Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila. Cell (2007) 18.35

Evolution of genes and genomes on the Drosophila phylogeny. Nature (2007) 18.01

The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res (2009) 14.90

The NIH Roadmap Epigenomics Mapping Consortium. Nat Biotechnol (2010) 13.99

Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science (2010) 12.39

Unlocking the secrets of the genome. Nature (2009) 11.80

Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Nature (2007) 11.66

Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature (2004) 11.03

HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res (2011) 11.03

Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat Biotechnol (2010) 9.84

Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. Genome Res (2007) 9.61

RNA polymerase stalling at developmental control genes in the Drosophila melanogaster embryo. Nat Genet (2007) 9.56

ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res (2012) 9.13

A high-resolution map of human evolutionary constraint using 29 mammals. Nature (2011) 8.67

Distinguishing protein-coding and noncoding genes in the human genome. Proc Natl Acad Sci U S A (2007) 8.00

An endogenous small interfering RNA pathway in Drosophila. Nature (2008) 7.88

ChromHMM: automating chromatin-state discovery and characterization. Nat Methods (2012) 7.66

Wisdom of crowds for robust gene network inference. Nat Methods (2012) 6.91

Comprehensive analysis of the chromatin landscape in Drosophila melanogaster. Nature (2010) 6.87

FTO Obesity Variant Circuitry and Adipocyte Browning in Humans. N Engl J Med (2015) 6.48

Comparative functional genomics of the fission yeasts. Science (2011) 6.00

Evolution of pathogenicity and sexual reproduction in eight Candida genomes. Nature (2009) 5.90

Genome analysis of the platypus reveals unique signatures of evolution. Nature (2008) 5.74

Extensive and coordinated transcription of noncoding RNAs within cell-cycle promoters. Nat Genet (2011) 5.62

Systematic discovery of regulatory motifs in conserved regions of the human genome, including thousands of CTCF insulator sites. Proc Natl Acad Sci U S A (2007) 4.91

Reliable prediction of regulator targets using 12 Drosophila genomes. Genome Res (2007) 4.89

A cis-regulatory map of the Drosophila genome. Nature (2011) 4.80

Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes. Genome Res (2007) 4.73

Whole-genome ChIP-chip analysis of Dorsal, Twist, and Snail suggests integration of diverse patterning processes in the Drosophila embryo. Genes Dev (2007) 4.59

PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics (2011) 4.50

Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res (2007) 4.28

Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res (2012) 3.80

Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat Biotechnol (2012) 3.52

A comprehensive map of insulator elements for the Drosophila genome. PLoS Genet (2010) 3.25

Combinatorial patterning of chromatin regulators uncovered by genome-wide location analysis in human cells. Cell (2011) 3.20

The tissue-specific lncRNA Fendrr is an essential regulator of heart and body wall development in the mouse. Dev Cell (2013) 3.14

Interpreting noncoding genetic variation in complex traits and human disease. Nat Biotechnol (2012) 3.01

Extensive variation in chromatin states across humans. Science (2013) 2.83

Evidence of abundant purifying selection in humans for recently acquired regulatory functions. Science (2012) 2.81

Position specific variation in the rate of evolution in transcription factor binding sites. BMC Evol Biol (2003) 2.67

Multiple knockout mouse models reveal lincRNAs are required for life and brain development. Elife (2013) 2.66

Common genetic variants modulate pathogen-sensing responses in human dendritic cells. Science (2014) 2.63

An epigenetic signature for monoallelic olfactory receptor expression. Cell (2011) 2.56

The Tasmanian devil transcriptome reveals Schwann cell origins of a clonally transmissible cancer. Science (2010) 2.53

A single Hox locus in Drosophila produces functional microRNAs from opposite DNA strands. Genes Dev (2008) 2.45

Three periods of regulatory innovation during vertebrate evolution. Science (2011) 2.09

Dynamics of the epigenetic landscape during erythroid differentiation after GATA1 restoration. Genome Res (2011) 2.07

Network deconvolution as a general method to distinguish direct dependencies in networks. Nat Biotechnol (2013) 2.05

Constitutive nuclear lamina-genome interactions are highly conserved and associated with A/T-rich sequence. Genome Res (2012) 2.04

Common variants at 9p21 and 8q22 are associated with increased susceptibility to optic nerve degeneration in glaucoma. PLoS Genet (2012) 2.00

Analysis of variation at transcription factor binding sites in Drosophila and humans. Genome Biol (2012) 1.97

Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments. Nucleic Acids Res (2013) 1.96

Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes. Genome Res (2011) 1.91

Long noncoding RNAs regulate adipogenesis. Proc Natl Acad Sci U S A (2013) 1.90

Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay. Genome Res (2013) 1.78

Genomic evidence for ameiotic evolution in the bdelloid rotifer Adineta vaga. Nature (2013) 1.76

Evolutionary dynamics and tissue specificity of human long noncoding RNAs in six mammals. Genome Res (2014) 1.69

RFECS: a random-forest based algorithm for enhancer identification from chromatin state. PLoS Comput Biol (2013) 1.66

Efficient algorithms for the reconciliation problem with gene duplication, horizontal transfer and loss. Bioinformatics (2012) 1.62

Evidence of abundant stop codon readthrough in Drosophila and other metazoa. Genome Res (2011) 1.60

Accurate gene-tree reconstruction by learning gene- and species-specific substitution rates across multiple complete genomes. Genome Res (2007) 1.57

Conservation of small RNA pathways in platypus. Genome Res (2008) 1.51

A Bayesian approach for fast and accurate gene tree reconstruction. Mol Biol Evol (2010) 1.49

Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks. Genome Res (2012) 1.47

Unified modeling of gene duplication, loss, and coalescence using a locus tree. Genome Res (2012) 1.44

Phylogenetically and spatially conserved word pairs associated with gene-expression changes in yeasts. Genome Biol (2003) 1.40

Interplay between chromatin state, regulator binding, and regulatory motifs in six human cell types. Genome Res (2013) 1.33

New families of human regulatory RNA structures identified by comparative analysis of vertebrate genomes. Genome Res (2011) 1.32

Error and error mitigation in low-coverage genome assemblies. PLoS One (2011) 1.31

Sharing and Specificity of Co-expression Networks across 35 Human Tissues. PLoS Comput Biol (2015) 1.27

RNA folding with soft constraints: reconciliation of probing data and thermodynamic secondary structure prediction. Nucleic Acids Res (2012) 1.27

Linking DNA methyltransferases to epigenetic marks and nucleosome structure genome-wide in human tumor cells. Cell Rep (2012) 1.27

SubMAP: aligning metabolic pathways with subnetwork mappings. J Comput Biol (2011) 1.24

Disruption of a large intergenic noncoding RNA in subjects with neurodevelopmental disabilities. Am J Hum Genet (2012) 1.07

The evolutionary dynamics of the Saccharomyces cerevisiae protein interaction network after duplication. Proc Natl Acad Sci U S A (2008) 1.05

Spatial expression of transcription factors in Drosophila embryonic organ development. Genome Biol (2013) 1.05

Evolutionary principles of modular gene regulation in yeasts. Elife (2013) 1.05

TreeFix: statistically informed gene tree error correction using species trees. Syst Biol (2012) 1.03

Most parsimonious reconciliation in the presence of gene duplication, loss, and deep coalescence using labeled coalescent trees. Genome Res (2013) 1.01

Computational analysis of noncoding RNAs. Wiley Interdiscip Rev RNA (2012) 1.01

Reconciliation revisited: handling multiple optima when reconciling with duplication, transfer, and loss. J Comput Biol (2013) 0.97

Discovery of human sORF-encoded polypeptides (SEPs) in cell lines and tissue. J Proteome Res (2014) 0.95

Arboretum: reconstruction and analysis of the evolutionary history of condition-specific transcriptional modules. Genome Res (2013) 0.94

Integrating and mining the chromatin landscape of cell-type specificity using self-organizing maps. Genome Res (2013) 0.90

Evolution at the subgene level: domain rearrangements in the Drosophila phylogeny. Mol Biol Evol (2011) 0.88