Prediction of complete gene structures in human genomic DNA.

PubWeight™: 58.76 | Rank: Top 0.01% | All-Time Top 1000

PMID: 9149143

Published in J Mol Biol on April 25, 1997

Authors

C Burge (1), S Karlin

Author Affiliations

1: Department of Mathematics, Stanford University, CA 94305, USA.

Articles citing this

(truncated to the top 100)

The human genome browser at UCSC. Genome Res (2002) 168.23

The Bioperl toolkit: Perl modules for the life sciences. Genome Res (2002) 58.63

The Ensembl genome database project. Nucleic Acids Res (2002) 40.87

The UCSC Genome Browser Database. Nucleic Acids Res (2003) 32.84

GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res (1998) 25.21

A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res (1998) 22.69

Ab initio gene finding in Drosophila genomic DNA. Genome Res (2000) 19.23

ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature (2009) 18.38

GeneWise and Genomewise. Genome Res (2004) 17.87

PipMaker--a web server for aligning two genomic DNA sequences. Genome Res (2000) 17.46

The FlyBase database of the Drosophila genome projects and community literature. Nucleic Acids Res (2003) 15.27

GENCODE: producing a reference annotation for ENCODE. Genome Biol (2006) 15.08

Taverna: a tool for building and running workflows of services. Nucleic Acids Res (2006) 14.90

Ensembl 2002: accommodating comparative genomics. Nucleic Acids Res (2003) 12.26

The Ensembl automatic gene annotation system. Genome Res (2004) 12.24

REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res (2001) 11.88

Genome annotation assessment in Drosophila melanogaster. Genome Res (2000) 11.77

Apollo: a sequence annotation editor. Genome Biol (2002) 10.77

Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol (2004) 10.59

Identification and characterization of multi-species conserved sequences. Genome Res (2003) 10.18

Gene finding in novel genomes. BMC Bioinformatics (2004) 8.64

Annotation of the Drosophila melanogaster euchromatic genome: a systematic review. Genome Biol (2002) 8.59

Comparative genome sequencing of Drosophila pseudoobscura: chromosomal, gene, and cis-element evolution. Genome Res (2005) 8.38

Human and mouse gene structure: comparative analysis and application to exon prediction. Genome Res (2000) 8.28

SGP-1: prediction and validation of homologous genes based on sequence alignments. Genome Res (2001) 8.15

Genome sequence and analysis of the tuber crop potato. Nature (2011) 7.77

The Genomes of Oryza sativa: a history of duplications. PLoS Biol (2005) 7.67

Using GeneWise in the Drosophila annotation experiment. Genome Res (2000) 7.50

Genie--gene finding in Drosophila melanogaster. Genome Res (2000) 7.47

Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs. Genome Res (2007) 7.34

RefSeq: an update on mammalian reference sequences. Nucleic Acids Res (2013) 7.29

Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics (2001) 7.07

The Vertebrate Genome Annotation (Vega) database. Nucleic Acids Res (2005) 7.06

EGASP: the human ENCODE Genome Annotation Assessment Project. Genome Biol (2006) 7.06

Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res (2007) 7.05

Computational inference of homologous gene structures in the human genome. Genome Res (2001) 6.96

Combining diverse evidence for gene recognition in completely sequenced bacterial genomes. Nucleic Acids Res (1998) 6.95

Hd1, a major photoperiod sensitivity quantitative trait locus in rice, is closely related to the Arabidopsis flowering time gene CONSTANS. Plant Cell (2000) 6.93

Identification of a gene causing human cytochrome c oxidase deficiency by integrative genomics. Proc Natl Acad Sci U S A (2003) 6.87

The transcriptional activity of human Chromosome 22. Genes Dev (2003) 6.82

The Arabidopsis abscisic acid response gene ABI5 encodes a basic leucine zipper transcription factor. Plant Cell (2000) 6.18

A millennial myosin census. Mol Biol Cell (2001) 6.13

An exploration of the sequence of a 2.9-Mb region of the genome of Drosophila melanogaster: the Adh region. Genetics (1999) 5.87

MAVID: constrained ancestral alignment of multiple sequences. Genome Res (2004) 5.83

Distinguishing regulatory DNA from neutral sites. Genome Res (2003) 5.63

DIAN: a novel algorithm for genome ontological classification. Genome Res (2001) 5.55

Discovery of tissue-specific exons using comprehensive human exon microarrays. Genome Biol (2007) 5.53

Gene structure prediction and alternative splicing analysis using genomically aligned ESTs. Genome Res (2001) 5.38

An integrated computational pipeline and database to support whole-genome sequence annotation. Genome Biol (2002) 5.24

Sodium taurocholate cotransporting polypeptide is a functional receptor for human hepatitis B and D virus. Elife (2012) 5.18

Additivity in protein-DNA interactions: how good an approximation is it? Nucleic Acids Res (2002) 5.04

Bone dysplasia sclerosteosis results from loss of the SOST gene product, a novel cystine knot-containing protein. Am J Hum Genet (2001) 4.97

Genome structure of the legume, Lotus japonicus. DNA Res (2008) 4.93

Full-length messenger RNA sequences greatly improve genome annotation. Genome Biol (2002) 4.90

Comparing sequenced segments of the tomato and Arabidopsis genomes: large-scale duplication followed by selective gene loss creates a network of synteny. Proc Natl Acad Sci U S A (2000) 4.90

The UCSC Genome Browser database: 2015 update. Nucleic Acids Res (2014) 4.87

Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes. Genome Res (2007) 4.73

An assessment of gene prediction accuracy in large DNA sequences. Genome Res (2000) 4.71

A computational analysis of sequence features involved in recognition of short introns. Proc Natl Acad Sci U S A (2001) 4.68

Initial sequence and comparative analysis of the cat genome. Genome Res (2007) 4.67

Computational definition of sequence motifs governing constitutive exon splicing. Genes Dev (2004) 4.59

Multiplex sequencing of paired-end ditags (MS-PET): a strategy for the ultra-high-throughput analysis of transcriptomes and genomes. Nucleic Acids Res (2006) 4.51

Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res (2008) 4.38

An intermediate grade of finished genomic sequence suitable for comparative analyses. Genome Res (2004) 4.38

Variation analysis and gene annotation of eight MHC haplotypes: the MHC Haplotype Project. Immunogenetics (2008) 4.25

Complete reannotation of the Arabidopsis genome: methods, tools, protocols and the final release. BMC Biol (2005) 4.18

SLAM: cross-species gene finding and alignment with a generalized pair hidden Markov model. Genome Res (2003) 4.17

AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res (2005) 4.13

Comparative gene prediction in human and mouse. Genome Res (2003) 4.13

AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res (2006) 4.11

Rat Genome Database (RGD): mapping disease onto the genome. Nucleic Acids Res (2002) 4.04

Evaluation of gene-finding programs on mammalian sequences. Genome Res (2001) 4.02

The Institute for Genomic Research Osa1 rice genome annotation database. Plant Physiol (2005) 3.96

Cross-species sequence comparisons: a review of methods and available resources. Genome Res (2003) 3.90

The complete sequence of 340 kb of DNA around the rice Adh1-adh2 region reveals interrupted colinearity with maize chromosome 4. Plant Cell (2000) 3.84

Rolling-circle transposons in eukaryotes. Proc Natl Acad Sci U S A (2001) 3.81

Biased chromatin signatures around polyadenylation sites and exons. Mol Cell (2009) 3.74

Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc Natl Acad Sci U S A (2003) 3.73

Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol (2008) 3.73

GeneSplicer: a new computational method for splice site prediction. Nucleic Acids Res (2001) 3.71

Transmembrane topology and signal peptide prediction using dynamic Bayesian networks. PLoS Comput Biol (2008) 3.56

Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res (2005) 3.49

Biopipe: a flexible framework for protocol-based bioinformatics analysis. Genome Res (2003) 3.45

Mutations of the protocadherin gene PCDH15 cause Usher syndrome type 1F. Am J Hum Genet (2001) 3.42

Inferring loss-of-heterozygosity from unpaired tumors using high-density oligonucleotide SNP arrays. PLoS Comput Biol (2006) 3.39

Ehd1, a B-type response regulator in rice, confers short-day promotion of flowering and controls FT-like gene expression independently of Hd1. Genes Dev (2004) 3.35

KinasePhos: a web tool for identifying protein kinase-specific phosphorylation sites. Nucleic Acids Res (2005) 3.28

Germline mutations in RAD51D confer susceptibility to ovarian cancer. Nat Genet (2011) 3.25

A probabilistic disease-gene finder for personal genomes. Genome Res (2011) 3.23

PKHD1, the polycystic kidney and hepatic disease 1 gene, encodes a novel large protein containing multiple immunoglobulin-like plexin-transcription-factor domains and parallel beta-helix 1 repeats. Am J Hum Genet (2002) 3.21

Leveraging the mouse genome for gene prediction in human: from whole-genome shotgun reads to a global synteny map. Genome Res (2003) 3.18

Score-based prediction of genomic islands in prokaryotic genomes using hidden Markov models. BMC Bioinformatics (2006) 3.15

Current methods of gene prediction, their strengths and weaknesses. Nucleic Acids Res (2002) 3.12

Genetic and physiological data implicating the new human gene G72 and the gene for D-amino acid oxidase in schizophrenia. Proc Natl Acad Sci U S A (2002) 3.11

Comparative genomics of Brassica oleracea and Arabidopsis thaliana reveal gene loss, fragmentation, and dispersal after polyploidy. Plant Cell (2006) 3.10

Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome. Genome Biol (2002) 3.09

Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities. Genome Biol (2011) 3.09

An integrated gene annotation and transcriptional profiling approach towards the full gene content of the Drosophila genome. Genome Biol (2003) 3.08

Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes. BMC Genomics (2004) 3.08

Annotation of the Arabidopsis genome. Plant Physiol (2003) 3.06

Articles by these authors

(truncated to the top 100)

Finding the genes in genomic DNA. Curr Opin Struct Biol (1998) 7.31

Linkage and selection: two locus symmetric viability model. Theor Popul Biol (1970) 5.90

Over- and under-representation of short oligonucleotides in DNA sequences. Proc Natl Acad Sci U S A (1992) 5.55

General two-locus selection models: some objectives, results and interpretations. Theor Popul Biol (1975) 5.16

Polymorphisms for genetic and ecological systems with weak coupling. Theor Popul Biol (1972) 4.47

Towards a theory of the evolution of modifier genes. Theor Popul Biol (1974) 3.79

Application of method of small parameters to multi-niche population genetic models. Theor Popul Biol (1972) 3.72

Rates and probabilities of fixation for two locus random mating finite populations without selection. Genetics (1968) 3.64

Methods and algorithms for statistical analysis of protein sequences. Proc Natl Acad Sci U S A (1992) 3.63

Numerical studies on two-loci selection models with general viabilities. Theor Popul Biol (1975) 3.20

New approaches for computer analysis of nucleic acid sequences. Proc Natl Acad Sci U S A (1983) 2.85

Association of charge clusters with functional domains of cellular transcription factors. Proc Natl Acad Sci U S A (1989) 2.58

On mutation selection balance for two-locus haploid and diploid populations. Theor Popul Biol (1971) 2.57

Random temporal variation in selection intensities: case of large population size. Theor Popul Biol (1974) 2.46

Genome signature comparisons among prokaryote, plasmid, and mitochondrial DNA. Proc Natl Acad Sci U S A (1999) 2.38

Pervasive CpG suppression in animal mitochondrial genomes. Proc Natl Acad Sci U S A (1994) 2.35

Genome-scale compositional comparisons in eukaryotes. Genome Res (2001) 2.32

Why are human G-protein-coupled receptors predominantly intronless? Trends Genet (1999) 2.31

Further analysis of negative assortative mating. Genetics (1968) 2.27

Human cytomegalovirus origin of DNA replication (oriLyt) resides within a highly complex repetitive region. Proc Natl Acad Sci U S A (1992) 2.09

Identification of significant sequence patterns in proteins. Methods Enzymol (1990) 1.99

Statistical methods for assessing linkage disequilibrium at the HLA-A, B, C loci. Ann Hum Genet (1981) 1.93

Analysis of biochemical genetic data on Jewish populations: II. Results and interpretations of heterogeneity indices and distance measures with respect to standards. Am J Hum Genet (1979) 1.91

Strand compositional asymmetry in bacterial and large viral genomes. Proc Natl Acad Sci U S A (1998) 1.84

Detecting alien genes in bacterial genomes. Ann N Y Acad Sci (1999) 1.80

Theoretical models of genetic map functions. Theor Popul Biol (1984) 1.76

Similarities and dissimilarities of phage genomes. Proc Natl Acad Sci U S A (1996) 1.67

Evolutionary comparisons of RecA-like proteins across all major kingdoms of living organisms. J Mol Evol (1997) 1.67

Sibling and parent--offspring correlation estimation with variable family size. Proc Natl Acad Sci U S A (1981) 1.63

Very long charge runs in systemic lupus erythematosus-associated autoantigens. Proc Natl Acad Sci U S A (1991) 1.58

Clusters of charged residues in protein three-dimensional structures. Proc Natl Acad Sci U S A (1996) 1.50

Measures of residue density in protein structures. Proc Natl Acad Sci U S A (1999) 1.46

Analysis of genetic data on Jewish populations. I. Historical background, demographic features, and genetic markers. Am J Hum Genet (1979) 1.43

Analysis of models with homozygote x heterozygote matings. Genetics (1968) 1.42

Conservation among HSP60 sequences in relation to structure, function, and evolution. Protein Sci (2000) 1.41

Structured exploratory data analysis (SEDA) for determining mode of inheritance of quantitative traits. I. Simulation studies on the effect of background distributions. Am J Hum Genet (1981) 1.40

Linkage and selection: new equilibrium properties of the two-locus symmetric viability model. Proc Natl Acad Sci U S A (1969) 1.40

Highly expressed and alien genes of the Synechocystis genome. Nucleic Acids Res (2001) 1.33

An efficient algorithm for identifying matches with errors in multiple long molecular sequences. J Mol Biol (1991) 1.32

Central equilibria in multilocus systems. I. Generalized nonepistatic selection regimes. Genetics (1979) 1.32

A symmetric-iterated multiple alignment of protein sequences. J Mol Biol (1998) 1.31

Too many leucine zippers? Nature (1989) 1.27

Addendum to a paper of W. Ewens. Theor Popul Biol (1972) 1.24

The evolutionary development of modifier genes. Proc Natl Acad Sci U S A (1972) 1.24

Comparisons of positive assortative mating and sexual selection models. Theor Popul Biol (1978) 1.22

Assortative mating based on phenotype. I. Two alleles with dominance. Genetics (1969) 1.22

Assortative mating based on phenotype. II. Two autosomal alleles without dominance. Genetics (1969) 1.15

Genetic analysis of the Stanford LRC family study data. I. Structured exploratory data analysis of height and weight measurements. Am J Epidemiol (1981) 1.14

Evolutionary aspects and sensitivity studies of some major gene models. J Theor Biol (1978) 1.13

Geometry of interplanar residue contacts in protein structures. Proc Natl Acad Sci U S A (1994) 1.12

Index measures for assessing the mode of inheritance of continuously distributed traits: I, theory and justifications. Theor Popul Biol (1979) 1.09

Significant potential secondary structures in the Epstein-Barr virus genome. Proc Natl Acad Sci U S A (1986) 1.09

Models of multifactorial inheritance: II. The covariance structure for a scalar phenotype under selective assortative mating and sex-dependent symmetric parental-transmission. Theor Popul Biol (1979) 1.08

The evolution of dominance: a direct approach through the theory of linkage and selection. Theor Popul Biol (1971) 1.06

Association arrays for comparing familial total cholesterol, high density lipoprotein cholesterol, and triglyceride similarity in the Israeli population by country of origin. Am J Epidemiol (1982) 1.04

Association arrays for the study of familial height, weight, lipid, and lipoprotein similarity in three West Coast populations. Am J Epidemiol (1982) 1.03

Structured exploratory data analysis (SEDA) for determining mode of inheritance of quantitative traits. II. Simulation studies on the effect of ascertaining families through high-valued probands. Am J Hum Genet (1981) 1.03

Path analysis in genetic epidemiology: a critique. Am J Hum Genet (1983) 1.02

Genetic analysis of the Stanford LRC family study data. II. Structured exploratory data analysis of lipids and lipoproteins. Am J Epidemiol (1981) 1.02

Significant dispersed recurrent DNA sequences in the Escherichia coli genome. Several new groups. J Mol Biol (1993) 1.01

The use of multiple alphabets in kappa-gene immunoglobulin DNA sequence comparisons. EMBO J (1985) 1.00

Distinctive charge configurations in proteins of the Epstein-Barr virus and possible functions. Proc Natl Acad Sci U S A (1988) 0.99

Multiple-alphabet amino acid sequence comparisons of the immunoglobulin kappa-chain constant domain. Proc Natl Acad Sci U S A (1985) 0.98

A phenotypic symmetric selection model for three loci, two alleles: the case of tight linkage. Theor Popul Biol (1976) 0.98

Random temporal variation in selection intensities acting on infinite diploid populations: diffusion method analysis. Theor Popul Biol (1975) 0.98

Theoretical studies on sex ratio evolution. Monogr Popul Biol (1986) 0.98

Representation of nonepistatic selection models and analysis of multilocus Hardy-Weinberg equilibrium configurations. J Math Biol (1979) 0.97

Gene frequency patterns in the Levene subdivided population model. Theor Popul Biol (1977) 0.96

How are close residues of protein structures distributed in primary sequence? Proc Natl Acad Sci U S A (1995) 0.96

Comparative statistics for DNA and protein sequences: single sequence analysis. Proc Natl Acad Sci U S A (1985) 0.94

Models of multifactorial inheritance: I. Multivariate formulations and basic convergence results. Theor Popul Biol (1979) 0.93

Sequence anomalies in the Cag7 gene of the Helicobacter pylori pathogenicity island. Proc Natl Acad Sci U S A (1999) 0.90

A class of indices to assess major-gene versus polygenic inheritance of distributive variables. Prog Clin Biol Res (1979) 0.89

Evidence for selective evolution in codon usage in conserved amino acid segments of human alphaherpesvirus proteins. J Mol Evol (1991) 0.87

Central equilibria in multilocus systems. II. Bisexual generalized nonepistatic selection models. Genetics (1979) 0.87

Comparative statistics for DNA and protein sequences: multiple sequence analysis. Proc Natl Acad Sci U S A (1985) 0.86

Total aganglionic colon in an adult: first reported case. Ann Surg (1966) 0.86

Permutation methods for the structured exploratory data analysis (SEDA) of familial trait values. Am J Hum Genet (1984) 0.85

A new significant recurrent dyad pairing in Haemophilus influenzae. Trends Biochem Sci (1996) 0.85

The detection of particular genotypes in finite populations. I. Natural selection effects. Theor Popul Biol (1981) 0.84

On the optimal sex-ratio: a stability analysis based on a characterization for one-locus multiallele viability models. J Math Biol (1984) 0.84

On the optimal sex ratio. Proc Natl Acad Sci U S A (1983) 0.84

A criterion for stability--instability at fixation states involving an eigenvalue one with applications in population genetics. Theor Popul Biol (1982) 0.83

Some population genetic models combining artificial and natural selection pressures: II. two-locus theory. Theor Popul Biol (1975) 0.82

Structured exploratory data analysis (SEDA) of finger ridge-count inheritance: I. Major gene index, midparental correlation, and offspring-between-parents function in 125 south Indian families. Am J Phys Anthropol (1983) 0.82

Models of multifactorial inheritance: IV. Asymmetric transmission for a scalar phenotype. Theor Popul Biol (1979) 0.81

Comparative DNA sequence features in two long Escherichia coli contigs. Nucleic Acids Res (1993) 0.80

A comparative analysis of distinctive features of yeast protein sequences. Yeast (1993) 0.80

Structured exploratory data analysis (SEDA) of finger ridge-count inheritance: II. Association arrays in parent-offspring and sib-sib pairs. Am J Phys Anthropol (1983) 0.79

Misconceptions in "Trials of Structured Exploratory Data Analysis". Am J Hum Genet (1983) 0.79

The number of stable equilibria for the classical one-locus multiallele selection model. J Math Biol (1980) 0.79

On the evolution of altruism by kin selection. Proc Natl Acad Sci U S A (1984) 0.77

Applications of statistical criteria in protein sequence analysis: case study of yeast RNA polymerase II subunits. Comput Chem (1994) 0.77

Some population genetic models combining artificial and natural selection pressures. Proc Natl Acad Sci U S A (1974) 0.75

Analysis of biochemical genetic data on Jewish populations. III. The application of individual phenotype measurements for population comparisons. Am J Hum Genet (1982) 0.75

Preferential mating in symmetric multilocus systems. Proc Natl Acad Sci U S A (1981) 0.75

A study of familial resemblance for two cognitive psychometric tests by permutation analyses. Behav Genet (1985) 0.75

Models of multifactorial inheritance. VI. Formulas and properties of the vector phenotype equilibrium covariance matrix. Theor Popul Biol (1980) 0.75

Giant duplication of the jejunum. Case report. Am Surg (1968) 0.75

Permutation methods for the structured exploratory data analysis (SEDA) of total cholesterol measured in five Israeli populations. Am J Epidemiol (1985) 0.75