Comparative analyses of multi-species sequences from targeted genomic regions.

PubWeight™: 13.31 | Rank: Top 0.1% | All-Time Top 10000

PMID 12917688

Published in Nature on August 14, 2003

Authors

J W Thomas (1), J W Touchman, R W Blakesley, G G Bouffard, S M Beckstrom-Sternberg, E H Margulies, M Blanchette, A C Siepel, P J Thomas, J C McDowell, B Maskeri, N F Hansen, M S Schwartz, R J Weber, W J Kent, D Karolchik, T C Bruen, R Bevan, D J Cutler, S Schwartz, L Elnitski, J R Idol, A B Prasad, S-Q Lee-Lin, V V B Maduro, T J Summers, M E Portnoy, N L Dietrich, N Akhter, K Ayele, B Benjamin, K Cariaga, C P Brinkley, S Y Brooks, S Granite, X Guan, J Gupta, P Haghighi, S-L Ho, M C Huang, E Karlins, P L Laric, R Legaspi, M J Lim, Q L Maduro, C A Masiello, S D Mastrian, J C McCloskey, R Pearson, S Stantripop, E E Tiongson, J T Tran, C Tsurgeon, J L Vogt, M A Walker, K D Wetherby, L S Wiggins, A C Young, L-H Zhang, K Osoegawa, B Zhu, B Zhao, C L Shu, P J De Jong, C E Lawrence, A F Smit, A Chakravarti, D Haussler, P Green, W Miller, E D Green

Author Affiliations

1: Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.

Articles citing this

(truncated to the top 100)

Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res (2005) 44.08

Relaxed phylogenetics and dating with confidence. PLoS Biol (2006) 37.68

The UCSC Table Browser data retrieval tool. Nucleic Acids Res (2004) 25.12

Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res (2004) 24.52

Distribution and intensity of constraint in mammalian genomic sequence. Genome Res (2005) 18.85

Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Nature (2007) 11.66

Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol (2004) 10.59

Identification and characterization of multi-species conserved sequences. Genome Res (2003) 10.18

Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc Natl Acad Sci U S A (2004) 9.42

A high-resolution map of human evolutionary constraint using 29 mammals. Nature (2011) 8.67

Fast and reliable prediction of noncoding RNAs. Proc Natl Acad Sci U S A (2005) 8.15

Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution. Proc Natl Acad Sci U S A (2004) 7.42

Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences. Genome Res (2005) 7.09

Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res (2007) 7.05

MAVID: constrained ancestral alignment of multiple sequences. Genome Res (2004) 5.83

Reliable prediction of regulator targets using 12 Drosophila genomes. Genome Res (2007) 4.89

Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes. Genome Res (2007) 4.73

A model of the statistical power of comparative genome sequence analysis. PLoS Biol (2005) 4.54

ECR Browser: a tool for visualizing and accessing data from comparisons of multiple vertebrate genomes. Nucleic Acids Res (2004) 4.53

Using genomic data to unravel the root of the placental mammal phylogeny. Genome Res (2007) 4.52

Characterization of evolutionary rates and constraints in three Mammalian genomes. Genome Res (2004) 4.45

An intermediate grade of finished genomic sequence suitable for comparative analyses. Genome Res (2004) 4.38

An initial strategy for the systematic identification of functional elements in the human genome by low-redundancy comparative sequencing. Proc Natl Acad Sci U S A (2005) 4.38

Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. Genome Res (2006) 4.32

Disruption of an AP-2alpha binding site in an IRF6 enhancer is associated with cleft lip. Nat Genet (2008) 4.04

Benchmarking tools for the alignment of functional noncoding DNA. BMC Bioinformatics (2004) 4.00

The human phylome. Genome Biol (2007) 3.81

Evolution of the mammalian transcription factor binding repertoire via transposable elements. Genome Res (2008) 3.76

Regulatory potential scores from genome-wide three-way alignments of human, mouse, and rat. Genome Res (2004) 3.53

Identifying gene regulatory elements by genome-wide recovery of DNase hypersensitive sites. Proc Natl Acad Sci U S A (2004) 3.41

Computational prediction of transcription-factor binding site locations. Genome Biol (2003) 3.39

Defining functional DNA elements in the human genome. Proc Natl Acad Sci U S A (2014) 3.35

Conservation of core gene expression in vertebrate tissues. J Biol (2009) 2.84

Breakpoint graphs and ancestral genome reconstructions. Genome Res (2009) 2.78

A novel method for multiple alignment of sequences with repeated and shuffled elements. Genome Res (2004) 2.73

Performance and scalability of discriminative metrics for comparative gene identification in 12 Drosophila genomes. PLoS Comput Biol (2008) 2.70

Automated whole-genome multiple alignment of rat, mouse, and human. Genome Res (2004) 2.64

Defining the mammalian CArGome. Genome Res (2005) 2.59

Comparison of human chromosome 21 conserved nongenic sequences (CNGs) with the mouse and dog genomes shows that their selective constraint is independent of their genic environment. Genome Res (2004) 2.58

What fraction of the human genome is functional? Genome Res (2011) 2.46

Pigs in sequence space: a 0.66X coverage pig genome survey based on shotgun sequencing. BMC Genomics (2005) 2.34

Confirming the phylogeny of mammals by use of large comparative sequence data sets. Mol Biol Evol (2008) 2.33

Noncoding sequences conserved in a limited number of mammals in the SIM2 interval are frequently functional. Genome Res (2004) 2.32

Absence of the TAP2 human recombination hotspot in chimpanzees. PLoS Biol (2004) 2.22

Synonymous mutations in CFTR exon 12 affect splicing and are not neutral in evolution. Proc Natl Acad Sci U S A (2005) 2.12

Phylogenomics of nonavian reptiles and the structure of the ancestral amniote genome. Proc Natl Acad Sci U S A (2007) 2.09

Discovery of human inversion polymorphisms by comparative analysis of human and chimpanzee DNA sequence assemblies. PLoS Genet (2005) 2.05

Reconstructing large regions of an ancestral mammalian genome in silico. Genome Res (2004) 2.01

Comparative genomics. PLoS Biol (2003) 1.97

Pegasoferae, an unexpected mammalian clade revealed by tracking ancient retroposon insertions. Proc Natl Acad Sci U S A (2006) 1.95

The scale of mutational variation in the murid genome. Genome Res (2005) 1.95

Early history of mammals is elucidated with the ENCODE multiple species sequencing data. PLoS Genet (2007) 1.92

Comparative sequencing provides insights about the structure and conservation of marsupial and monotreme genomes. Proc Natl Acad Sci U S A (2005) 1.88

A high utility integrated map of the pig genome. Genome Biol (2007) 1.88

Blueprint for a high-performance biomaterial: full-length spider dragline silk genes. PLoS One (2007) 1.81

Analysis of multiple genomic sequence alignments: a web resource, online tools, and lessons learned from analysis of mammalian SCL loci. Genome Res (2004) 1.80

Transposon-free regions in mammalian genomes. Genome Res (2005) 1.75

Swine Genome Sequencing Consortium (SGSC): a strategic roadmap for sequencing the pig genome. Comp Funct Genomics (2005) 1.75

Heterotachy in mammalian promoter evolution. PLoS Genet (2006) 1.71

Subtree power analysis and species selection for comparative genomics. Proc Natl Acad Sci U S A (2005) 1.69

Annotation of cis-regulatory elements by identification, subclassification, and functional assessment of multispecies conserved sequences. Proc Natl Acad Sci U S A (2005) 1.68

An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat Genet (2013) 1.68

Variable molecular clocks in hominoids. Proc Natl Acad Sci U S A (2006) 1.66

Human, mouse, and rat genome large-scale rearrangements: stability versus speciation. Genome Res (2004) 1.63

The complete genome sequence of Roseobacter denitrificans reveals a mixotrophic rather than photosynthetic metabolism. J Bacteriol (2006) 1.61

Gene and alternative splicing annotation with AIR. Genome Res (2005) 1.55

Photocleavable fluorescent nucleotides for DNA sequencing on a chip constructed by site-specific coupling chemistry. Proc Natl Acad Sci U S A (2004) 1.55

Arabidopsis intragenomic conserved noncoding sequence. Proc Natl Acad Sci U S A (2007) 1.52

Comparative analysis of protein coding sequences from human, mouse and the domesticated pig. BMC Biol (2005) 1.51

Evolutionary constraints in conserved nongenic sequences of mammals. Genome Res (2005) 1.50

A phylogenomic study of human, dog, and mouse. PLoS Comput Biol (2006) 1.49

Unusual DNA structures associated with germline genetic activity in Caenorhabditis elegans. Genetics (2006) 1.48

Recurrent duplication-driven transposition of DNA during hominoid evolution. Proc Natl Acad Sci U S A (2006) 1.47

Four-color DNA sequencing by synthesis on a chip using photocleavable fluorescent nucleotides. Proc Natl Acad Sci U S A (2005) 1.47

Genomic selective constraints in murid noncoding DNA. PLoS Genet (2006) 1.46

Adaptive evolution of conserved noncoding elements in mammals. PLoS Genet (2007) 1.45

Transposable elements donate lineage-specific regulatory sequences to host genomes. Cytogenet Genome Res (2005) 1.44

Highly syntenic regions in the genomes of soybean, Medicago truncatula, and Arabidopsis thaliana. BMC Plant Biol (2005) 1.43

Sequencing and comparative analysis of a conserved syntenic segment in the Solanaceae. Genetics (2008) 1.43

Allele-specific KRT1 expression is a complex trait. PLoS Genet (2006) 1.41

Pooled genomic indexing of rhesus macaque. Genome Res (2005) 1.41

Evolutionary history of mammalian transposons determined by genome-wide defragmentation. PLoS Comput Biol (2007) 1.38

A snapshot of CNVs in the pig genome. PLoS One (2008) 1.36

Large-scale sequencing of the CD33-related Siglec gene cluster in five mammalian species reveals rapid evolution by multiple mechanisms. Proc Natl Acad Sci U S A (2004) 1.36

Multiple groups of endogenous betaretroviruses in mice, rats, and other mammals. J Virol (2004) 1.33

Modulefinder: a tool for computational discovery of cis regulatory modules. Pac Symp Biocomput (2005) 1.33

Combining microarray-based genomic selection (MGS) with the Illumina Genome Analyzer platform to sequence diploid target regions. Ann Hum Genet (2009) 1.32

The power of phylogenetic comparison in revealing protein function. Proc Natl Acad Sci U S A (2005) 1.31

Marek's disease is a natural model for lymphomas overexpressing Hodgkin's disease antigen (CD30). Proc Natl Acad Sci U S A (2004) 1.30

Statistical evidence for conserved, local secondary structure in the coding regions of eukaryotic mRNAs and pre-mRNAs. Nucleic Acids Res (2005) 1.30

Microinversions in mammalian evolution. Proc Natl Acad Sci U S A (2006) 1.29

Position and distance specificity are important determinants of cis-regulatory motifs in addition to evolutionary conservation. Nucleic Acids Res (2007) 1.28

Progressive proximal expansion of the primate X chromosome centromere. Proc Natl Acad Sci U S A (2005) 1.27

DoOP: Databases of Orthologous Promoters, collections of clusters of orthologous upstream sequences from chordates and plants. Nucleic Acids Res (2005) 1.27

miREvo: an integrative microRNA evolutionary analysis platform for next-generation sequencing experiments. BMC Bioinformatics (2012) 1.27

Functional evolution of cis-regulatory modules at a homeotic gene in Drosophila. PLoS Genet (2009) 1.25

A dual origin of the Xist gene from a protein-coding gene and a set of transposable elements. PLoS One (2008) 1.23

Identification of evolutionary hotspots in the rodent genomes. Genome Res (2004) 1.21

Trans genomic capture and sequencing of primate exomes reveals new targets of positive selection. Genome Res (2011) 1.20

A common variant associated with dyslexia reduces expression of the KIAA0319 gene. PLoS Genet (2009) 1.20

Articles by these authors

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 665.31

Basic local alignment search tool. J Mol Biol (1990) 659.07

Initial sequencing and analysis of the human genome. Nature (2001) 212.86

Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res (1998) 106.16

Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res (1998) 96.63

Consed: a graphical tool for sequence finishing. Genome Res (1998) 59.36

MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics (1987) 54.39

A greedy algorithm for aligning DNA sequences. J Comput Biol (2000) 47.89

Optimal alignments in linear space. Comput Appl Biosci (1988) 38.10

Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A (1987) 35.83

Identification of the cystic fibrosis gene: genetic analysis. Science (1989) 33.61

The UCSC Genome Browser Database. Nucleic Acids Res (2003) 32.84

Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol (1994) 31.57

Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res (1998) 23.87

The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res (2007) 23.13

A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res (1998) 22.69

Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology. Comput Appl Biosci (1996) 19.74

Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci U S A (2000) 19.39

Aligning two sequences within a specified diagonal band. Comput Appl Biosci (1992) 19.31

PipMaker--a web server for aligning two genomic DNA sequences. Genome Res (2000) 17.46

Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nat Genet (1999) 17.09

Complete genome sequence of Salmonella enterica serovar Typhimurium LT2. Nature (2001) 16.89

Global water resources: vulnerability from climate change and population growth. Science (2000) 16.61

Structure of a cannabinoid receptor and functional expression of the cloned cDNA. Nature (1990) 13.60

A genetic linkage map of the human genome. Cell (1987) 13.37

Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics (2000) 13.33

The UCSC genome browser database: update 2007. Nucleic Acids Res (2006) 13.04

A physical map of the human genome. Nature (2001) 12.39

Global threats to human water security and river biodiversity. Nature (2010) 11.84

Improved splice site detection in Genie. J Comput Biol (1997) 11.57

A generalized hidden Markov model for the recognition of human genes in DNA. Proc Int Conf Intell Syst Mol Biol (1996) 11.45

The UCSC Genome Browser Database: update 2006. Nucleic Acids Res (2006) 11.05

Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature (2001) 10.96

Using Dirichlet mixture priors to derive hidden Markov models for protein families. Proc Int Conf Intell Syst Mol Biol (1993) 10.73

The UCSC Genome Browser Database: update 2009. Nucleic Acids Res (2008) 10.31

A molecular basis for cardiac arrhythmia: HERG mutations cause long QT syndrome. Cell (1995) 10.18

Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science (2000) 10.14

BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics (2010) 9.11

Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J Mol Biol (1998) 9.09

A new method for studying gut transit times using radioopaque markers. Gut (1969) 8.97

Long human-mouse sequence alignments reveal novel regulatory elements: a reason to sequence the mouse genome. Genome Res (1997) 8.49

Scoring pairwise genomic sequence alignments. Pac Symp Biocomput (2002) 8.42

New goals for the U.S. Human Genome Project: 1998-2003. Science (1998) 8.30

Assembly of the working draft of the human genome with GigAssembler. Genome Res (2001) 8.23

Human-mouse genome comparisons to locate regulatory sites. Nat Genet (2000) 8.08

Automated finishing with autofinish. Genome Res (2001) 7.97

Nonuniform recombination within the human beta-globin gene cluster. Am J Hum Genet (1984) 7.97

DNA duplication associated with Charcot-Marie-Tooth disease type 1A. Cell (1991) 7.93

Comparison of DNA sequences with protein sequences. Genomics (1997) 7.76

Genie--gene finding in Drosophila melanogaster. Genome Res (2000) 7.47

A DNA polymorphism discovery resource for research on human genetic variation. Genome Res (1998) 7.44

Genes galore: a summary of methods for accessing results from large-scale partial sequencing of anonymous Arabidopsis cDNA clones. Plant Physiol (1994) 7.42

High throughput fingerprint analysis of large-insert clones. Genome Res (1997) 6.91

Mapping sequenced E.coli genes by computer: software, strategies and examples. Nucleic Acids Res (1991) 6.29

Stochastic context-free grammars for tRNA modeling. Nucleic Acids Res (1994) 6.29

Identification of a family of muscarinic acetylcholine receptor genes. Science (1987) 6.21

Analysis of expressed sequence tags indicates 35,000 human genes. Nat Genet (2000) 6.11

Rapid discrimination among individual DNA hairpin molecules at single-nucleotide resolution using an ion channel. Nat Biotechnol (2001) 6.07

Telomere elongation in immortal human cells without detectable telomerase activity. EMBO J (1995) 6.03

Integrating database homology in a probabilistic gene structure model. Pac Symp Biocomput (1997) 5.96

Gibbs motif sampling: detection of bacterial outer membrane protein repeats. Protein Sci (1995) 5.96

A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Res (1994) 5.50

The C. elegans genome sequencing project: a beginning. Nature (1992) 5.36

High-throughput variation detection and genotyping using microarrays. Genome Res (2001) 5.24

arrow encodes an LDL-receptor-related protein essential for Wingless signalling. Nature (2000) 5.13

Socioeconomic status and psychiatric disorders: the causation-selection issue. Science (1992) 4.95

A new human prostate carcinoma cell line, 22Rv1. In Vitro Cell Dev Biol Anim (1999) 4.93

Large-scale comparative sequence analysis of the human and murine Bruton's tyrosine kinase loci reveals conserved regulatory domains. Genome Res (1997) 4.82

A systematic, high-resolution linkage of the cytogenetic and physical maps of the human genome. Nat Genet (2000) 4.78

OSP: a computer program for choosing PCR and DNA sequencing primers. PCR Methods Appl (1991) 4.72

A survey of expressed genes in Caenorhabditis elegans. Nat Genet (1992) 4.63

Vero cell toxins in Escherichia coli and related bacteria: transfer by phage and conjugation and toxic action in laboratory animals, chickens and pigs. J Gen Microbiol (1983) 4.52

Broken heart: a statistical study of increased mortality among widowers. Br Med J (1969) 4.49

Pendred syndrome is caused by mutations in a putative sulphate transporter gene (PDS). Nat Genet (1997) 4.43

RNA modeling using Gibbs sampling and stochastic context free grammars. Proc Int Conf Intell Syst Mol Biol (1994) 4.32

Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes. Nucleic Acids Res (2001) 4.29

Transcriptome and genome conservation of alternative splicing events in humans and mice. Pac Symp Biocomput (2004) 4.25