
116 research outputs found

MISHIMA - a new method for high speed multiple alignment of nucleotide sequences of bacterial genome scale data

Author: ACC Shih
AL Delcher
B Morgenstern
C Notredame
D Mikhailov
DF Feng
DG Higgins
DJ Lipman
F Corpet
GJ Barton
J Cheetham
J Stoye
JD Thompson
K Katoh
K Kryukov
K Reinert
KB Li
Kirill Kryukov
M Brudno
M Brudno
M Brudno
M Kimura
N Bray
Naruya Saitou
O Gotoh
RC Edgar
U Tonges
WR Taylor
Publication venue: BioMed Central
Publication date: 01/01/2010

BACKGROUND: Large nucleotide sequence datasets are becoming increasingly common objects of comparison. Complete bacterial genomes are reported almost every day. This creates challenges for developing new multiple sequence alignment methods. Conventional multiple alignment methods are based on pairwise alignment and/or progressive alignment techniques. These approaches have performance problems when the number of sequences is large and when dealing with genome-scale sequences. RESULTS: We present a new method of multiple sequence alignment, called MISHIMA (Method for Inferring Sequence History In terms of Multiple Alignment), that does not depend on pairwise sequence comparison. A new algorithm is used to quickly find rare oligonucleotide sequences shared by all sequences. A divide-and-conquer approach is then applied to break the sequences into fragments that can be aligned independently by an external alignment program. These partial alignments are assembled together to form a complete alignment of the original sequences. CONCLUSIONS: MISHIMA provides improved performance compared to the commonly used multiple alignment methods. As an example, six complete genome sequences of the bacterial species Helicobacter pylori (about 1.7 Mb each) were successfully aligned in about 6 hours using a single PC.
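The anchor-and-split idea in this abstract can be illustrated with a minimal Python sketch. It is not MISHIMA's actual algorithm, which the abstract does not specify. It assumes that "rare" means a k-mer occurring exactly once in every sequence, and all function names (`unique_kmer_positions`, `shared_anchors`, `colinear_chain`, `split_at_anchors`) are hypothetical.

```python
from collections import Counter

def unique_kmer_positions(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start position."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {seq[i:i + k]: i for i in range(len(seq) - k + 1)
            if counts[seq[i:i + k]] == 1}

def shared_anchors(seqs, k):
    """Return (k-mer, positions) pairs for k-mers unique in every sequence.

    Assumption: 'rare' is modelled as 'occurs exactly once per sequence'.
    """
    per_seq = [unique_kmer_positions(s, k) for s in seqs]
    common = set(per_seq[0]).intersection(*per_seq[1:])
    anchors = [(kmer, [p[kmer] for p in per_seq]) for kmer in common]
    anchors.sort(key=lambda a: a[1][0])  # order by position in the first sequence
    return anchors

def colinear_chain(anchors, k):
    """Greedily keep anchors that do not overlap and keep the same order everywhere."""
    chain, last = [], None
    for kmer, pos in anchors:
        if last is None or all(p >= q + k for p, q in zip(pos, last)):
            chain.append((kmer, pos))
            last = pos
    return chain

def split_at_anchors(seqs, chain, k):
    """Cut every sequence at the chained anchors; each group can be aligned separately."""
    groups = []
    starts = [0] * len(seqs)
    for _, pos in chain:
        groups.append([s[a:p] for s, a, p in zip(seqs, starts, pos)])
        starts = [p + k for p in pos]
    groups.append([s[a:] for s, a in zip(seqs, starts)])
    return groups

seqs = ["AAGATTACACCC", "TTGATTACAGG", "GATTACAT"]
anchors = shared_anchors(seqs, 7)
fragments = split_at_anchors(seqs, colinear_chain(anchors, 7), 7)
```

Each group in `fragments` holds the stretch of every sequence lying between two consecutive anchors, so the groups could be passed one at a time to an external aligner and the partial alignments concatenated in order.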

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

inGeno – an integrated genome and ortholog viewer for improved genome to genome comparisons

Author: ACE Darling
AL Delcher
C Buchrieser
Chunguang Liang
GD Armstrong
H Mangalam
J Hacker
JA Vazquez-Boland
JA Vazquez-Boland
K Hayashi
KM Chao
M Brudno
M Brudno
M Hoebeke
M Riley
N Bray
O Dussurget
P Glaser
R Engels
S Kurtz
S Needleman
S Schwartz
T Hayashi
T Smith
Thomas Dandekar
TJ Carver
WJ Kent
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Systematic genome comparisons are an important tool to reveal gene functions, pathogenic features, metabolic pathways and genome evolution in the era of post-genomics. Furthermore, such comparisons provide important clues for vaccine and drug development. Existing genome comparison software often lacks accurate information on orthologs, the function of similar genes identified, and genome-wide reports and lists on specific functions. All these features and further analyses are provided here in the context of a modular software tool, "inGeno", written in Java with BioJava subroutines. RESULTS: inGeno provides a user-friendly interactive visualization platform for sequence comparisons (comprehensive reciprocal protein – protein comparisons) between complete genome sequences and all associated annotations and features. The comparison data can be acquired from several different sequence analysis programs in flexible formats. Automatic dot-plot analysis includes output reduction, filtering, ortholog testing and linear regression, followed by smart clustering (local collinear blocks; LCBs) to reveal similar genome regions. Further, the system provides a genome alignment and visualization editor, collinear relationships and strain-specific islands. Specific annotations and functions are parsed, recognized, clustered, logically concatenated, visualized and summarized in reports. CONCLUSION: As shown in this study, inGeno can be applied to study and compare, in particular, prokaryotic genomes against each other (gram positive and negative as well as close and more distantly related species) and has been proven to be sensitive and accurate. This modular software is user-friendly and easily accommodates new routines to meet specific user-defined requirements.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

CGAT: a comparative genome analysis tool for visualizing alignments in the analysis of complex evolutionary changes between closely related genomes

Author: A Chinen
A Nobusato
A van Belkum
AL Delcher
AL Delcher
AL Delcher
B Gottgens
B Ma
C Josenhans
D Gusfield
D Romero
DA Nix
DA Pollard
E Gilson
F Kunst
FR Blattner
G Levinson
H Takami
I Uchiyama
I Uchiyama
Ichizo Kobayashi
Ikuo Uchiyama
J Parkhill
J Yang
JF Tomb
JH Choi
JM Claverie
K Ishikawa
KA Frazer
M Brudno
M Brudno
M Brudno
M Kawai
M Kawai
MY Leung
N Bray
N Jareborg
N Jareborg
NA Moran
NJ Saunders
P Siguier
RA Alm
S Karlin
S Schwartz
S Schwartz
SB Needleman
SF Altschul
T Hayashi
T Tsuru
TJ Carver
Toshio Higuchi
U Dobrindt
W Huang
WJ Kent
WJ Kent
WR Pearson
Z Ning
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The recent accumulation of closely related genomic sequences provides a valuable resource for the elucidation of the evolutionary histories of various organisms. However, although numerous alignment calculation and visualization tools have been developed to date, the analysis of complex genomic changes, such as large insertions, deletions, inversions, translocations and duplications, still presents certain difficulties. RESULTS: We have developed a comparative genome analysis tool, named CGAT, which allows detailed comparisons of closely related bacteria-sized genomes mainly through visualizing middle-to-large-scale changes to infer underlying mechanisms. CGAT displays precomputed pairwise genome alignments on both dotplot and alignment viewers with scrolling and zooming functions, and allows users to move along the pre-identified orthologous alignments. Users can place several types of information on this alignment, such as the presence of tandem repeats or interspersed repetitive sequences and changes in G+C contents or codon usage bias, thereby facilitating the interpretation of the observed genomic changes. In addition to displaying precomputed alignments, the viewer can dynamically calculate the alignments between specified regions; this feature is especially useful for examining the alignment boundaries, as these boundaries are often obscure and can vary between programs. Besides the alignment browser functionalities, CGAT also contains an alignment data construction module, which contains various procedures that are commonly used for pre- and post-processing for large-scale alignment calculation, such as the split-and-merge protocol for calculating long alignments, chaining adjacent alignments, and ortholog identification. Indeed, CGAT provides a general framework for the calculation of genome-scale alignments using various existing programs as alignment engines, which allows users to compare the outputs of different alignment programs. 
Earlier versions of this program have been used successfully in our research to infer the evolutionary history of apparently complex genome changes between closely related eubacteria and archaea. CONCLUSION: CGAT is a practical tool for analyzing complex genomic changes between closely related genomes using existing alignment programs and other sequence analysis tools combined with extensive manual inspection

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

High-throughput sequence alignment using Graphics Processing Units

Author: AL Delcher
AL Delcher
Amitabh Varshney
Arthur L Delcher
C Shaffer
Cole Trapnell
D Gusfield
E Ukkonen
EW Myers
I Buck
J Mellor-Crummey
JD Owens
M Brudno
M Charalambous
M Hohl
M Pop
Michael C Schatz
MJ Harris
NK Govindaraju
nVidia
P Weiner
S Kurtz
S Kurtz
SF Atschul
W Liu
W Pearson
WJ Dally
Y Juekuan
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. RESULTS: This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high-end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. CONCLUSION: MUMmerGPU is a low-cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.
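To illustrate what matching many queries against one indexed reference looks like, here is a small CPU-only Python sketch. It deliberately swaps in a naively built suffix array for the suffix tree named in the abstract, reports only exact whole-query occurrences, and does nothing on a GPU. The function names are hypothetical and are not part of MUMmer or MUMmerGPU.

```python
def build_suffix_array(ref):
    """Naive suffix array: start positions of all suffixes in sorted order.

    Fine for short illustrative strings; real tools use linear-time construction.
    """
    return sorted(range(len(ref)), key=lambda i: ref[i:])

def find_occurrences(ref, sa, query):
    """Return all start positions where query occurs exactly in ref."""
    m = len(query)
    # Lower bound: first suffix whose length-m prefix is >= query.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if ref[sa[mid]:sa[mid] + m] < query:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose length-m prefix is > query.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if ref[sa[mid]:sa[mid] + m] <= query:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])

reference = "GATTACAGATTACA"
index = build_suffix_array(reference)
queries = ["ATTA", "GATTACA", "CCC"]
# Each query lookup is independent of the others, which is the property
# a parallel implementation can exploit.
hits = {q: find_occurrences(reference, index, q) for q in queries}
```

The index is built once and shared read-only by every query, which is why the per-query lookups can in principle run concurrently.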

CiteSeerX

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital Repository at the University of Maryland

Multiple organism algorithm for finding ultraconserved elements

Author: A Sandelin
A Siepel
A Woolfe
AL Delcher
AL Delcher
B Ma
CF Cheung
D Gusfield
D Lawson
EA Glazov
EH Margulies
G Bejerano
Greg Madey
HW Mewes
JC Venter
JZ Ni
LD Stein
M Brudno
MI Abouelhoda
N Bray
Neil F Lobo
P Ferragina
RA Holt
S Kurtz
S Kurtz
S Schwartz
Scott Christley
SF Altschul
T Tran
TJP Hubbard
U Manber
WJ Kent
WJ Kent
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: Ultraconserved elements are nucleotide or protein sequences with 100% identity (no mismatches, insertions, or deletions) in the same organism or between two or more organisms. Studies indicate that these conserved regions are associated with microRNAs, mRNA processing, development and transcription regulation. The identification and characterization of these elements among genomes is necessary for the further understanding of their functionality. RESULTS: We describe an algorithm and provide freely available software which can find all of the ultraconserved sequences between genomes of multiple organisms. Our algorithm takes a combinatorial approach that finds all sequences without requiring the genomes to be aligned. The algorithm is significantly faster than BLAST and is designed to handle very large genomes efficiently. We ran our algorithm on several large comparative analyses to evaluate its effectiveness; one compared 17 vertebrate genomes, where we find 123 ultraconserved elements longer than 40 bp shared by all of the organisms, and another compared the human body louse, Pediculus humanus humanus, against itself and select insects to find thousands of non-coding, potentially functional sequences. CONCLUSION: Whole genome comparative analysis for multiple organisms is both feasible and desirable in our search for biological knowledge. We argue that bioinformatic programs should be forward thinking by assuming analysis on multiple (and possibly large) genomes in the design and implementation of algorithms. Our algorithm shows how a compromise design with a trade-off of disk space versus memory space allows for efficient computation while only requiring modest computer resources, and at the same time providing benefits not available with other software.
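The alignment-free idea of finding sequences with 100% identity across several genomes can be sketched as follows in Python. It is not the authors' algorithm: it holds everything in memory (no disk versus memory trade-off), ignores reverse complements, and only reports shared substrings of one fixed length `k`. The names `kmers` and `shared_exact_kmers` are hypothetical.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_exact_kmers(genomes, k):
    """Length-k substrings present, with no mismatches or gaps, in every genome.

    Any longer exactly conserved element must contain only such k-mers,
    so this set can serve as seeds for a later extension step.
    """
    shared = kmers(genomes[0], k)
    for genome in genomes[1:]:
        shared &= kmers(genome, k)  # keep only k-mers also seen in this genome
    return shared

genomes = ["ACGTACGGA", "TTACGGAT", "CCACGGAA"]
seeds = shared_exact_kmers(genomes, 5)
```

No alignment is computed at any point; membership in every genome's k-mer set is the only test applied.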

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Highly conserved elements discovered in vertebrates are present in non-syntenic loci of tunicates, act as enhancers and can be transcribed during development

Author: Agnes Roure
Al-Shahrour
Alfano
Aparicio
Aravind
Basch
Bejerano
Benjamini
Bianchi
Britten
Brosius
Brudno
Brudno
Brudno
Bryne
Cameron
Christensen
Chuzhanova
Corbo
Danilo Licastro
De Santa
Delsuc
Dennis
Denoeud
Dermitzakis
Djebali
Dong
Dunham
Eddy
Elia Stupka
Euan R. Brown
Ewan Birney
Ferenc Müller
Ferrier
Francesca Petrera
Frazer
Gabriele Amore
García-Bellido
Gehrig
Götz
Hannenhalli
Holland
Hubbard
Hufton
Hufton
Hughes
Ihmels
Ikuta
Ikuta
Imai
Jung
Karali
Kawakami
Kent
Kermekchiev
Kikuta
Kim
Kirchhamer
Lemaire
Lenhard
Licastro
Lowe
Manzanares
Marco De Simone
Marco Ferg
Margulies
Marion Gueroult-Bellone
Mattick
Mayor
Meyer
Miura
Miwata
Natale
Nicola Meola
Oda-Ishii
Olinski
Patrick Lemaire
Pauli
Pennacchio
Plessy
Prud’homme
Putnam
Remo Sanges
Roure
Royo
Rozen
Sandberg
Sandro Banfi
Sanges
Sobral
Stajich
Stephen
Swaraj Basu
Tassy
Tsong
Ueda
Ulitsky
Uwe Strähle
Vavouri
Vavouri
Vavouri
Vilella
Visel
Woolfe
Yavor Hadzhiev
Zuckerkandl
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2013

Co-option of cis-regulatory modules has been suggested as a mechanism for the evolution of expression sites during development. However, the extent and mechanisms involved in mobilization of cis-regulatory modules remain elusive. To trace the history of non-coding elements, which may represent candidate ancestral cis-regulatory modules affirmed during chordate evolution, we have searched for conserved elements in tunicate and vertebrate (Olfactores) genomes. We identified, for the first time, 183 non-coding sequences that are highly conserved between the two groups. Our results show that all but one element are conserved in non-syntenic regions between vertebrate and tunicate genomes, while being syntenic among vertebrates. Nevertheless, in all the groups, they are significantly associated with transcription factors showing specific functions fundamental to animal development, such as multicellular organism development and sequence-specific DNA binding. The majority of these regions map onto ultraconserved elements and we demonstrate that they can act as functional enhancers within the organism of origin, as well as in cross-transgenesis experiments, and that they are transcribed in extant species of Olfactores. We refer to the elements as 'Olfactores conserved non-coding elements'. © The Author(s) 2013. Published by Oxford University Press.

Heriot Watt Pure

Crossref

University of Birmingham Research Portal

PubMed Central

Sissa Digital Library

Archivio Istituzionale della Ricerca - Università degli Studi della Campania "Luigi Vanvitelli"

progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement

Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms. We describe a new method to align two or more genomes that have undergone rearrangements due to recombination and substantial amounts of segmental gain and loss (flux). We demonstrate that the new method can accurately align regions conserved in some, but not all, of the genomes, an important case not handled by our previous work. The method uses a novel alignment objective score called a sum-of-pairs breakpoint score, which facilitates accurate detection of rearrangement breakpoints when genomes have unequal gene content. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy which measure the quality of rearrangement breakpoint predictions and indel predictions. The new genome alignment algorithm demonstrates high accuracy in situations where genomes have undergone biologically feasible amounts of genome rearrangement, segmental gain and loss. We apply the new algorithm to a set of 23 genomes from the genera Escherichia, Shigella, and Salmonella. Analysis of whole-genome multiple alignments allows us to extend the previously defined concepts of core- and pan-genomes to include not only annotated genes, but also non-coding regions with potential regulatory roles. The 23 enterobacteria have an estimated core-genome of 2.46 Mbp conserved among all taxa and a pan-genome of 15.2 Mbp. We document substantial population-level variability among these organisms driven by segmental gain and loss.
Interestingly, much variability lies in intergenic regions, suggesting that the Enterobacteriaceae may exhibit regulatory divergence. The multiple genome alignments generated by our software provide a platform for comparative genomic and population genomic studies. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

OPUS - University of Technology Sydney

PubMed Central

Identification and Classification of Conserved RNA Secondary Structures in the Human Genome

Author: Adam Siepel
Angrand PO Apiou F, Stewart AF, Dutrillaux B, Losson R, et al.
Aparicio S Chapman J, Stupka E, Putnam N, Chia JM, et al.
Bentwich I Avniel A, Karov Y, Aharonov R, Gilad S, et al.
Berezikov E Guryev V, van de Belt J, Wienholds E, Plasterk RH, et al.
Berry MJ Banu L, Chen YY, Mandel SJ, Kieffer JD, et al.
Blanchette M Kent WJ, Riemer C, Elnitski L, Smit AF, et al.
Bompfünewerer AF Flamm C, Fried C, Fritzsch G, Hofacker IL, et al.
Brudno M Do CB, Cooper GM, Kim MF, Davydov E, et al.
Chimpanzee Sequencing and Analysis Consortium
David Haussler
Eric S Lander
Gibbs RA Weinstock GM, Metzker ML, Muzny DM, Sodergren EJ, et al.
Gill Bejerano
Gregory RI Yan KP, Amuthan G, Chendrimada T, Doratotaj B, et al.
Griffiths-Jones S Moxon S, Marshall M, Khanna A, Eddy SR, et al.
Higuchi M Maas S, Single FN, Hartner J, Rozov A, et al.
Hillier LW Miller W, Birney E, Warren W, Hardison RC, et al.
Howard MT Aggarwal G, Anderson CB, Khatri S, Flanigan KM, et al.
International Human Genome Sequencing Consortium
Jakob Skou Pedersen
Jim Kent
Kate Rosenbloom
Kent WJ Sugnet CW, Furey TS, Roskin KM, Pringle TH, et al.
Kerstin Lindblad-Toh
Kryukov GV Castellano S, Novoselov SV, Lobanov AV, Zehtab O, et al.
Lagos-Quintana M Rauhut R, Yalcin A, Meyer J, Lendeckel W, et al.
Lim LP Lau NC, Weinstein EG, Abdelhakim A, Yekta S, et al.
Matsufuji S Matsufuji T, Miyazaki Y, Murakami Y, Atkins JF, et al.
Pahl PM Hodges YK, Meltesen L, Perryman MB, Horwitz KB, et al.
Richard Durbin
Schwartz S Kent WJ, Smit A, Zhang Z, Baertsch R, et al.
Siepel A Bejerano G, Pedersen JS, Hinrichs AS, Hou M, et al.
Waterston RH Lindblad-Toh K, Birney E, Rogers J, Abril JF, et al.
Webb Miller
Xie X Lu J, Kulbokas EJ, Golub TR, Mootha V, et al.
Publication venue: Public Library of Science
Publication date: 01/01/2005

The discoveries of microRNAs and riboswitches, among others, have shown functional RNAs to be biologically more important and genomically more prevalent than previously anticipated. We have developed a general comparative genomics method based on phylogenetic stochastic context-free grammars for identifying functional RNAs encoded in the human genome and used it to survey an eight-way genome-wide alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebra-fish, and puffer-fish genomes for deeply conserved functional RNAs. At a loose threshold for acceptance, this search resulted in a set of 48,479 candidate RNA structures. This screen finds a large number of known functional RNAs, including 195 miRNAs, 62 histone 3′UTR stem loops, and various types of known genetic recoding elements. Among the highest-scoring new predictions are 169 new miRNA candidates, as well as new candidate selenocysteine insertion sites, RNA editing hairpins, RNAs involved in transcript auto regulation, and many folds that form singletons or small functional RNA families of completely unknown function. While the rate of false positives in the overall set is difficult to estimate and is likely to be substantial, the results nevertheless provide evidence for many new human functional RNAs and present specific predictions to facilitate their further characterization

Public Library of Science (PLOS)

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

Copenhagen University Research Information System

eScholarship - University of California

DNA Methylation Signature for EZH2 Functionally Classifies Sequence Variants in Three PRC2 Complex Genes.

Author: Bird LM
Brudno M
Brzezinski J
Caluseriu O
Chater-Diehl E
Chitayat D
Choufani S
Chung BHY
Clericuzio C
Cohen ASA
Cushing T
Cyrus S
Cytrynbaum C
Flinter F
Garg K
Gibson WT
Goodman S
Iascone M
Imagawa E
Kerr B
Lynch SA
Machado J
Matsumoto N
McConnell V
Mendoza-Londono R
Ming LH
Okamoto N
Scherer SW
Splitt M
Tatton-Brown K
Temple IK
Tenconi R
Testa G
Turinsky AL
Vitriolo A
Wang T
Weksberg R
White SM
Publication venue: Elsevier BV
Publication date: 01/01/2020

Weaver syndrome (WS), an overgrowth/intellectual disability syndrome (OGID), is caused by pathogenic variants in the histone methyltransferase EZH2, which encodes a core component of the Polycomb repressive complex-2 (PRC2). Using genome-wide DNA methylation (DNAm) data for 187 individuals with OGID and 969 control subjects, we show that pathogenic variants in EZH2 generate a highly specific and sensitive DNAm signature reflecting the phenotype of WS. This signature can be used to distinguish loss-of-function from gain-of-function missense variants and to detect somatic mosaicism. We also show that the signature can accurately classify sequence variants in EED and SUZ12, which encode two other core components of PRC2, and predict the presence of pathogenic variants in undiagnosed individuals with OGID. The discovery of a functionally relevant signature with utility for diagnostic classification of sequence variants in EZH2, EED, and SUZ12 supports the emerging paradigm shift for implementation of DNAm signatures into diagnostics and translational research

Southampton (e-Prints Soton)

AIR Universita degli studi di Milano

eScholarship - University of California

St George's Online Research Archive

University of Melbourne Institutional Repository

The Evolution of Gene Expression QTL in Saccharomyces cerevisiae

Author: A Whitehead
A Whitehead
AL Hughes
AL Hughes
B Heaton
B Lemos
C Neuhauser
CR Landry
CT Harbison
D Muhlrad
D Muhlrad
DM Ruderfer
DR Denver
E Petretto
EE Schadt
EJ Chesler
G Yvert
H Lan
J Ronald
J Ronald
James Ronald
JC Fay
JC Fay
JD Storey
JD Storey
JD Thompson
Joshua M. Akey
JP Townsend
L Bystrykh
M Brudno
M Kellis
M Morley
Matthew Hahn
MW Hahn
N Bing
N Hubner
P Cliften
P Khaitovich
P Khaitovich
RB Brem
RB Brem
RK Mortimer
SA Monks
SA Rifkin
SA Rifkin
SI Lee
SM Krone
SW Doniger
T Ohta
WS Wong
Y Gilad
Z Gu
Publication venue: Public Library of Science
Publication date: 01/08/2007

Understanding the evolutionary forces that influence patterns of gene expression variation will provide insights into the mechanisms of evolutionary change and the molecular basis of phenotypic diversity. To date, studies of gene expression evolution have primarily been made by analyzing how gene expression levels vary within and between species. However, the fundamental unit of heritable variation in transcript abundance is the underlying regulatory allele, and as a result it is necessary to understand gene expression evolution at the level of DNA sequence variation. Here we describe the evolutionary forces shaping patterns of genetic variation for 1206 cis-regulatory QTL identified in a cross between two divergent strains of Saccharomyces cerevisiae. We demonstrate that purifying selection against mildly deleterious alleles is the dominant force governing cis-regulatory evolution in S. cerevisiae and estimate the strength of selection. We also find that essential genes and genes with larger codon bias are subject to slightly stronger cis-regulatory constraint and that positive selection has played a role in the evolution of major trans-acting QTL

Crossref

Directory of Open Access Journals

PubMed Central