LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA.

PubWeight™: 23.03 | Rank: Top 0.01% | All-Time Top 10000

Article: PMC 430158

Published in Genome Res on March 12, 2003

Authors

Michael Brudno¹, Chuong B Do, Gregory M Cooper, Michael F Kim, Eugene Davydov, NISC Comparative Sequencing Program, Eric D Green, Arend Sidow, Serafim Batzoglou

Author Affiliations

1: Department of Computer Science, Stanford University, Stanford, California 94305-9010, USA.

Articles citing this

(truncated to the top 100)

MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res (2004) 168.89

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature (2007) 75.09

Versatile and open software for comparing large genomes. Genome Biol (2004) 49.45

Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res (2005) 44.08

Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res (2004) 28.63

Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res (2004) 24.52

progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One (2010) 24.31

Distribution and intensity of constraint in mammalian genomic sequence. Genome Res (2005) 18.85

SeqAn an efficient, generic C++ library for sequence analysis. BMC Bioinformatics (2008) 17.31

VISTA: computational tools for comparative genomics. Nucleic Acids Res (2004) 13.52

Ensembl 2006. Nucleic Acids Res (2006) 11.66

Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol (2004) 10.59

Identification and characterization of multi-species conserved sequences. Genome Res (2003) 10.18

Comparative genome sequencing of Drosophila pseudoobscura: chromosomal, gene, and cis-element evolution. Genome Res (2005) 8.38

Enredo and Pecan: genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Res (2008) 7.35

Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res (2007) 7.05

The DNA sequence of the human X chromosome. Nature (2005) 6.97

Identification and classification of conserved RNA secondary structures in the human genome. PLoS Comput Biol (2006) 6.73

Query-dependent banding (QDB) for faster RNA similarity searches. PLoS Comput Biol (2007) 6.34

Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinformatics (2012) 6.16

Whole-genome shotgun assembly and comparison of human genome assemblies. Proc Natl Acad Sci U S A (2004) 6.08

Fast statistical alignment. PLoS Comput Biol (2009) 5.92

MAVID: constrained ancestral alignment of multiple sequences. Genome Res (2004) 5.83

Genome analysis of the platypus reveals unique signatures of evolution. Nature (2008) 5.74

High-throughput sequence alignment using Graphics Processing Units. BMC Bioinformatics (2007) 5.56

Genome-wide nucleotide-level mammalian ancestor reconstruction. Genome Res (2008) 5.12

MultiPipMaker and supporting tools: Alignments and analysis of multiple genomic DNA sequences. Nucleic Acids Res (2003) 4.97

oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes. Nucleic Acids Res (2005) 4.52

Characterization of evolutionary rates and constraints in three Mammalian genomes. Genome Res (2004) 4.45

Large-scale turnover of functional transcription factor binding sites in Drosophila. PLoS Comput Biol (2006) 4.01

Benchmarking tools for the alignment of functional noncoding DNA. BMC Bioinformatics (2004) 4.00

Close sequence comparisons are sufficient to identify human cis-regulatory elements. Genome Res (2006) 3.92

Comprehensive analysis of transcriptional promoter structure and function in 1% of the human genome. Genome Res (2005) 3.85

Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution. Genome Res (2007) 3.82

PhyME: a probabilistic algorithm for finding motifs in sets of orthologous sequences. BMC Bioinformatics (2004) 3.63

The Forkhead transcription factor Hcm1 regulates chromosome segregation genes and fills the S-phase gap in the transcriptional circuitry of the cell cycle. Genes Dev (2006) 3.58

The ENCODE Project at UC Santa Cruz. Nucleic Acids Res (2006) 3.57

Quantitative estimates of sequence divergence for comparative analyses of mammalian genomes. Genome Res (2003) 3.52

transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences. BMC Bioinformatics (2005) 3.41

Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura. Genome Biol (2004) 3.36

PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny. PLoS Comput Biol (2005) 3.35

Learning a prior on regulatory potential from eQTL data. PLoS Genet (2009) 3.31

A recalibrated molecular clock and independent origins for the cholera pandemic clones. PLoS One (2008) 3.16

Uncertainty in homology inferences: assessing and improving genomic sequence alignment. Genome Res (2007) 3.16

Widespread discordance of gene trees with species tree in Drosophila: evidence for incomplete lineage sorting. PLoS Genet (2006) 3.01

Fast and sensitive multiple alignment of large genomic sequences. BMC Bioinformatics (2003) 2.94

Graemlin: general and robust alignment of multiple large interaction networks. Genome Res (2006) 2.92

Mulan: multiple-sequence local alignment and visualization for studying function and evolution. Genome Res (2004) 2.91

Finding and comparing syntenic regions among Arabidopsis and the outgroups papaya, poplar, and grape: CoGe with rosids. Plant Physiol (2008) 2.91

Comparative ICE genomics: insights into the evolution of the SXT/R391 family of ICEs. PLoS Genet (2009) 2.91

Conservation of core gene expression in vertebrate tissues. J Biol (2009) 2.84

Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Res (2004) 2.84

A survey of DNA motif finding algorithms. BMC Bioinformatics (2007) 2.79

TOUCAN 2: the all-inclusive open source workbench for regulatory sequence analysis. Nucleic Acids Res (2005) 2.68

Sepsid even-skipped enhancers are functionally conserved in Drosophila despite lack of sequence conservation. PLoS Genet (2008) 2.67

Phylo: a citizen science approach for improving multiple sequence alignment. PLoS One (2012) 2.64

Automated whole-genome multiple alignment of rat, mouse, and human. Genome Res (2004) 2.64

Population genomics of the wild yeast Saccharomyces paradoxus: Quantifying the life cycle. Proc Natl Acad Sci U S A (2008) 2.56

eHive: an artificial intelligence workflow system for genomic analysis. BMC Bioinformatics (2010) 2.52

Implications of chimaeric non-co-linear transcripts. Nature (2009) 2.52

The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes. Genome Biol (2014) 2.44

Multiple whole-genome alignments without a reference organism. Genome Res (2009) 2.31

Genome sequence of Blochmannia pennsylvanicus indicates parallel evolutionary trends among bacterial mutualists of insects. Genome Res (2005) 2.28

Extreme genomic variation in a natural population. Proc Natl Acad Sci U S A (2007) 2.24

Conservation of regulatory elements between two species of Drosophila. BMC Bioinformatics (2003) 2.06

The evolution of the DLK1-DIO3 imprinted domain in mammals. PLoS Biol (2008) 2.05

Frequent gain and loss of functional transcription factor binding sites. PLoS Comput Biol (2007) 2.04

Cactus: Algorithms for genome multiple sequence alignment. Genome Res (2011) 2.03

Assessing the performance of different high-density tiling microarray strategies for mapping transcribed regions of the human genome. Genome Res (2006) 1.95

A correlation with exon expression approach to identify cis-regulatory elements for tissue-specific alternative splicing. Nucleic Acids Res (2007) 1.94

M-GCAT: interactively and efficiently constructing large-scale multiple genome comparison frameworks in closely related species. BMC Bioinformatics (2006) 1.92

The CHAOS/DIALIGN WWW server for multiple alignment of genomic sequences. Nucleic Acids Res (2004) 1.88

Computational identification of transcriptional regulatory elements in DNA sequence. Nucleic Acids Res (2006) 1.88

MSARI: multiple sequence alignments for statistical detection of RNA secondary structure. Proc Natl Acad Sci U S A (2004) 1.85

Genome-wide prediction of transcription factor binding sites using an integrated model. Genome Biol (2010) 1.84

Cross-species comparison significantly improves genome-wide prediction of cis-regulatory modules in Drosophila. BMC Bioinformatics (2004) 1.82

A haplome alignment and reference sequence of the highly polymorphic Ciona savignyi genome. Genome Biol (2007) 1.82

Runx1-mediated hematopoietic stem-cell emergence is controlled by a Gata/Ets/SCL-regulated enhancer. Blood (2007) 1.82

Computational analysis and identification of an emergent human adenovirus pathogen implicated in a respiratory fatality. Virology (2010) 1.82

Analysis of multiple genomic sequence alignments: a web resource, online tools, and lessons learned from analysis of mammalian SCL loci. Genome Res (2004) 1.80

Sockeye: a 3D environment for comparative genomics. Genome Res (2004) 1.80

The genome of Eucalyptus grandis. Nature (2014) 1.76

Pegasys: software for executing and integrating analyses of biological sequences. BMC Bioinformatics (2004) 1.74

Cross-species comparison of Drosophila male accessory gland protein genes. Genetics (2005) 1.71

Heterotachy in mammalian promoter evolution. PLoS Genet (2006) 1.71

The PAZAR database of gene regulatory information coupled to the ORCA toolkit for the study of regulatory sequences. Nucleic Acids Res (2008) 1.66

Genome-wide promoter extraction and analysis in human, mouse, and rat. Genome Biol (2005) 1.64

A wide extent of inter-strain diversity in virulent and vaccine strains of alphaherpesviruses. PLoS Pathog (2011) 1.63

Molecular basis of the copulatory plug polymorphism in Caenorhabditis elegans. Nature (2008) 1.61

CONREAL: conserved regulatory elements anchored alignment algorithm for identification of transcription factor binding sites by phylogenetic footprinting. Genome Res (2003) 1.57

Enhancer identification through comparative genomics. Semin Cell Dev Biol (2007) 1.56

Dynamic evolution of oryza genomes is revealed by comparative genomic analysis of a genus-wide vertical data set. Plant Cell (2008) 1.55

Phylemon 2.0: a suite of web-tools for molecular evolution, phylogenetics, phylogenomics and hypotheses testing. Nucleic Acids Res (2011) 1.53

CGAT: a comparative genome analysis tool for visualizing alignments in the analysis of complex evolutionary changes between closely related genomes. BMC Bioinformatics (2006) 1.52

nGASP--the nematode genome annotation assessment project. BMC Bioinformatics (2008) 1.52

Mobile antibiotic resistance encoding elements promote their own diversity. PLoS Genet (2009) 1.49

Building genomic profiles for uncovering segmental homology in the twilight zone. Genome Res (2004) 1.48

Noncoding regulatory sequences of Ciona exhibit strong correspondence between evolutionary constraint and functional importance. Genome Res (2004) 1.47

Large-scale discovery of promoter motifs in Drosophila melanogaster. PLoS Comput Biol (2006) 1.46

Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments. BMC Bioinformatics (2006) 1.44

Articles cited by this

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 665.31

Basic local alignment search tool. J Mol Biol (1990) 659.07

CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res (1994) 392.47

A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol (1970) 155.96

Identification of common molecular subsequences. J Mol Biol (1981) 130.53

BLAT--the BLAST-like alignment tool. Genome Res (2002) 126.78

T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol (2000) 57.88

SSAHA: a fast search method for large DNA databases. Genome Res (2001) 48.64

Alignment of whole genomes. Nucleic Acids Res (1999) 20.02

A strategy for the rapid multiple alignment of protein sequences. Confidence levels from tertiary structure comparisons. J Mol Biol (1987) 19.30

PipMaker--a web server for aligning two genomic DNA sequences. Genome Res (2000) 17.46

Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res (2002) 17.31

Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments. J Mol Biol (1996) 15.98

VISTA: visualizing global DNA sequence alignments of arbitrary length. Bioinformatics (2000) 12.50

DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics (1999) 12.22

A flexible method to align large numbers of biological sequences. J Mol Evol (1989) 11.34

Conservation, regulation, synteny, and introns in a large-scale C. briggsae-C. elegans genomic alignment. Genome Res (2000) 10.40

AVID: A global alignment program. Genome Res (2003) 10.06

Human and mouse gene structure: comparative analysis and application to exon prediction. Genome Res (2000) 8.28

On the complexity of multiple sequence alignment. J Comput Biol (1994) 8.00

Efficient multiple genome alignment. Bioinformatics (2002) 6.88

Active conservation of noncoding sequences revealed by three-way species comparisons. Genome Res (2000) 6.18

DIALIGN: finding local similarities by multiple sequence alignment. Bioinformatics (1998) 5.11

Comparison of genomic DNA sequences: solved and unsolved problems. Bioinformatics (2001) 4.91

Evolutionary HMMs: a Bayesian approach to multiple alignment. Bioinformatics (2001) 3.36

Fast and sensitive alignment of large genomic sequences. Proc IEEE Comput Soc Bioinform Conf (2002) 2.84

Transcriptional regulation of the stem cell leukemia gene (SCL)--comparative analysis of five vertebrate SCL loci. Genome Res (2002) 2.77

Inference of functional regions in proteins by quantification of evolutionary constraints. Proc Natl Acad Sci U S A (2002) 2.54

Globin gene server: a prototype E-mail database server featuring extensive multiple alignments and data compilation for electronic genetic analysis. Genomics (1994) 2.36

ReAligner: a program for refining DNA sequence multi-alignments. J Comput Biol (1997) 2.25

Genome rearrangement with gene families. Bioinformatics (1999) 2.24

Positive and negative regulatory elements of the rabbit embryonic epsilon-globin gene revealed by an improved multiple alignment program and functional analysis. DNA Seq (1993) 1.66

A new approach to clustering the amino acids. J Theor Biol (1996) 1.62

Articles by these authors

Initial sequencing and comparative analysis of the mouse genome. Nature (2002) 96.15

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature (2007) 75.09

Mapping and sequencing of structural variation from eight human genomes. Nature (2008) 30.28

Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res (2004) 24.52

Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature (2004) 24.40

Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans. Nat Genet (2008) 20.73

Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. Proc Natl Acad Sci U S A (2002) 20.48

Distribution and intensity of constraint in mammalian genomic sequence. Genome Res (2005) 18.85

Evolution of genes and genomes on the Drosophila phylogeny. Nature (2007) 18.01

Topographical and temporal diversity of the human skin microbiome. Science (2009) 15.96

A vision for the future of genomics research. Nature (2003) 14.06

Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nat Genet (2010) 12.63

Prepublication data sharing. Nature (2009) 12.24

ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res (2005) 11.90

Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. Nat Methods (2008) 11.61

SHRiMP: accurate mapping of short color-space reads. PLoS Comput Biol (2009) 11.24

Identification and characterization of multi-species conserved sequences. Genome Res (2003) 10.18

The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC). Genome Res (2004) 9.18

ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res (2012) 9.13

Whole-genome sequencing and assembly with high-throughput, short-read technologies. PLoS One (2007) 8.70

Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput Biol (2010) 8.69

A high-resolution map of human evolutionary constraint using 29 mammals. Nature (2011) 8.67

A recurrent 15q13.3 microdeletion syndrome associated with mental retardation and seizures. Nat Genet (2008) 8.44

Focused evolution of HIV-1 neutralizing antibodies revealed by structures and deep sequencing. Science (2011) 7.92

A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning. Genome Res (2008) 7.77

Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae. Nature (2005) 7.74

A diversity profile of the human skin microbiota. Genome Res (2008) 7.65

Exome sequencing identifies GRIN2A as frequently mutated in melanoma. Nat Genet (2011) 7.31

Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res (2007) 7.05

The ClinSeq Project: piloting large-scale genome sequencing for research in genomic medicine. Genome Res (2009) 6.83

Population analysis of large copy number variants and hotspots of human genetic disease. Am J Hum Genet (2009) 6.79

A recurrent 16p12.1 microdeletion supports a two-hit model for severe developmental delay. Nat Genet (2010) 6.62

Systematic comparison of three genomic enrichment methods for massively parallel DNA sequencing. Genome Res (2010) 5.76

Physicochemical constraint violation by missense substitutions mediates impairment of protein function and disease severity. Genome Res (2005) 5.62

Determinants of nucleosome organization in primary human cells. Nature (2011) 5.50

Linking disease associations with regulatory information in the human genome. Genome Res (2012) 5.47

Co-evolution of a broadly neutralizing HIV-1 antibody and founder virus. Nature (2013) 5.35

Jarid2/Jumonji coordinates control of PRC2 enzymatic activity and target gene occupancy in pluripotent cells. Cell (2009) 5.26

MultiPipMaker and supporting tools: Alignments and analysis of multiple genomic DNA sequences. Nucleic Acids Res (2003) 4.97

A common sex-dependent mutation in a RET enhancer underlies Hirschsprung disease risk. Nature (2005) 4.92

Implementing genomic medicine in the clinic: the future is here. Genet Med (2013) 4.89

Glocal alignment: finding rearrangements during alignment. Bioinformatics (2003) 4.88

Initial sequence and comparative analysis of the cat genome. Genome Res (2007) 4.67

Massively parallel functional dissection of mammalian enhancers in vivo. Nat Biotechnol (2012) 4.51

Single-nucleotide evolutionary constraint scores highlight disease-causing mutations. Nat Methods (2010) 4.51

Characterization of evolutionary rates and constraints in three Mammalian genomes. Genome Res (2004) 4.45

An intermediate grade of finished genomic sequence suitable for comparative analyses. Genome Res (2004) 4.38

An initial strategy for the systematic identification of functional elements in the human genome by low-redundancy comparative sequencing. Proc Natl Acad Sci U S A (2005) 4.38

CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics (2006) 4.31

Closing gaps in the human genome with fosmid resources generated from multiple individuals. Nat Genet (2007) 4.30

Comprehensive research synopsis and systematic meta-analyses in Parkinson's disease genetics: The PDGene database. PLoS Genet (2012) 4.27

The genome of the ctenophore Mnemiopsis leidyi and its implications for cell type evolution. Science (2013) 4.07

Disruption of an AP-2alpha binding site in an IRF6 enhancer is associated with cleft lip. Nat Genet (2008) 4.04

Developmental pathway for potent V1V2-directed HIV-neutralizing antibodies. Nature (2014) 3.73

Polymorphisms of the HNF1A gene encoding hepatocyte nuclear factor-1 alpha are associated with C-reactive protein. Am J Hum Genet (2008) 3.61

Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements. Genome Res (2012) 3.61

Glycyl tRNA synthetase mutations in Charcot-Marie-Tooth disease type 2D and distal spinal muscular atrophy type V. Am J Hum Genet (2003) 3.55

Quantitative estimates of sequence divergence for comparative analyses of mammalian genomes. Genome Res (2003) 3.52

Transcription-associated mutational asymmetry in mammalian evolution. Nat Genet (2003) 3.52

Identifying gene regulatory elements by genome-wide recovery of DNase hypersensitive sites. Proc Natl Acad Sci U S A (2004) 3.41

Defining functional DNA elements in the human genome. Proc Natl Acad Sci U S A (2014) 3.35

Research ethics. The complexities of genomic identifiability. Science (2013) 3.23

The DNA sequence of human chromosome 7. Nature (2003) 3.18

TTC21B contributes both causal and modifying alleles across the ciliopathy spectrum. Nat Genet (2011) 3.06

Fast and sensitive multiple alignment of large genomic sequences. BMC Bioinformatics (2003) 2.94