332 research outputs found

WAMI: a web server for the analysis of minisatellite maps

Author: Abouelhoda Mohamed
El-Kalioby Mohamed
Giegerich Robert
Publication venue: BioMed Central
Publication date: 01/01/2010

Abouelhoda M, El-Kalioby M, Giegerich R. WAMI: a web server for the analysis of minisatellite maps. BMC Evolutionary Biology. 2010;10(1):167. Background: Minisatellites are genomic loci composed of tandem arrays of short repetitive DNA segments. A minisatellite map is a sequence of symbols that represents the tandem repeat array such that the set of symbols is in one-to-one correspondence with the set of distinct repeats. Due to variations in repeat type and organization as well as copy number, minisatellite maps have been widely used in forensic and population studies. In either domain, researchers need to compare the maps to each other, to build phylogenetic trees, to spot structural variations, and to study duplication dynamics. Efficient algorithms for these tasks are required to carry them out reliably and in reasonable time. Results: In this paper we present WAMI, a web server for the analysis of minisatellite maps. It performs the above-mentioned computational tasks using efficient algorithms that take the model of map evolution into account. The WAMI interface is easy to use and the results of each analysis task are visualized. Conclusions: To the best of our knowledge, WAMI is the first server providing all these computational facilities to the minisatellite community. The WAMI web interface and the source code of the underlying programs are available at http://www.nubios.nileu.edu.eg/tools/wam

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Publications at Bielefeld University

CoCoNUT: an efficient system for the comparison and analysis of genomes

Author: A Darling
A Kasprzyk
B Haas
B Ma
B Mau
B Morgenstern
B Raphael
C Wawra
DR Bentley
E Mardis
E Ohlebusch
E Passarge
E Sonnhammer
Enno Ohlebusch
G Bourque
G Gremme
I Ovcharenko
J Krumsiek
J Peterson
J Thompson
L Florea
M Abouelhoda
M Blanchette
M Brudno
M Clamp
M Höhl
M Kellis
M Margulies
Mohamed I Abouelhoda
P Chain
R Staden
S Altschul
S Karlin
S Kurtz
S Ranganathan
S Schwartz
S Shibuya
Stefan Kurtz
T Treangen
T Vision
T Wu
The Arabidopsis Genome Initiative
W Kent
Publication venue: BioMed Central
Publication date: 01/01/2008

Crossref

Springer - Publisher Connector

PubMed Central

Wave Energy: a Pacific Perspective

Author: D. Gusfield
D. Okanohara
G. Manzini
J. Fischer
J. Kärkkäinen
K. Sadakane
M.I. Abouelhoda
P. Ferragina
R. Dementiev
R. Sinha
S.J. Puglisi
T. Kasai
U. Manber
V. Mäkinen
Publication venue: The Royal Society
Publication date: 01/01/2009

This is the author's peer-reviewed final manuscript, as accepted by the publisher. The published article is copyrighted by The Royal Society and can be found at: http://rsta.royalsocietypublishing.org/. This paper illustrates the status of wave energy development in Pacific Rim countries by characterizing the available resource and introducing the region's current and potential future leaders in wave energy converter development. It also describes the existing licensing and permitting process as well as potential environmental concerns. Capabilities of Pacific Ocean testing facilities are described in addition to the region's vision of the future of wave energy.

CiteSeerX

Crossref

ScholarsArchive@OSU

RMIT Research Repository

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Scheduling Jobs in Flowshops with the Introduction of Additional Machines in the Future

Author: A. Apostolico
B. Smyth
C.J. Colbourn
D. Gusfield
E. Ukkonen
E.M. McCreight
G. Manzini
J. Fischer
M.A. Bender
M.I. Abouelhoda
S. Burkhardt
S.J. Puglisi
T. Kasai
U. Manber
Publication venue: Elsevier
Publication date: 01/01/2008

This is the author's peer-reviewed final manuscript, as accepted by the publisher. The published article is copyrighted by Elsevier and can be found at: http://www.journals.elsevier.com/expert-systems-with-applications/. The problem of scheduling jobs to minimize total weighted tardiness in flowshops, with the possibility of evolving into hybrid flowshops in the future, is investigated in this paper. As this research is guided by a real problem in industry, the flowshop considered has considerable flexibility, which stimulated the development of an innovative methodology for this research. Each stage of the flowshop currently has one or several identical machines. However, the manufacturing company is planning to introduce additional machines with different capabilities in different stages in the near future. Thus, the algorithm proposed and developed for the problem is not only capable of solving the current flow line configuration but also the potential new configurations that may result in the future. A meta-heuristic search algorithm based on Tabu search is developed to solve this NP-hard, industry-guided problem. Six different initial solution finding mechanisms are proposed. A carefully planned nested split-plot design is performed to test the significance of different factors and their impact on the performance of the different algorithms. To the best of our knowledge, this research is the first of its kind that attempts to solve an industry-guided problem with the concern for future developments.

CiteSeerX

Crossref

ScholarsArchive@OSU

RMIT Research Repository

Streaming Support for Data Intensive Cloud-Based Sequence Analysis

Author: Abouelhoda Mohamed
Bruggmann Rémy
El-Kalioby Mohamed
Issa Shadi A.
Kienzler Romeo
Tonellato Peter J.
Wall Dennis
Publication venue: Hindawi Limited
Publication date: 01/01/2013

Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation

Crossref

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

Bern Open Repository and Information System (BORIS)

On the suitability of suffix arrays for Lempel-Ziv data compression

Author: D. Gusfield
D. Salomon
E. McCreight
E. Ukkonen
J. Karkainen
J. Storer
J. Ziv
K. Sadakane
M. Abouelhoda
U. Manber
Publication venue: Springer-Verlag Berlin
Publication date: 01/01/2009

Lossless compression algorithms of the Lempel-Ziv (LZ) family are widely used nowadays. Regarding time and memory requirements, LZ encoding is much more demanding than decoding. In order to speed up the encoding process, efficient data structures, like suffix trees, have been used. In this paper, we explore the use of suffix arrays to hold the dictionary of the LZ encoder, and propose an algorithm to search over it. We show that the resulting encoder attains roughly the same compression ratios as those based on suffix trees. However, the amount of memory required by the suffix array is fixed, and much lower than the variable amount of memory used by encoders based on suffix trees (which depends on the text to encode). We conclude that suffix arrays, when compared to suffix trees in terms of the trade-off among time, memory, and compression ratio, may be preferable in scenarios (e.g., embedded systems) where memory is at a premium and high speed is not critical

Repositório Científico do Instituto Politécnico de Lisboa

Crossref

Lightweight Lempel-Ziv Parsing

Author: D. Okanohara
E. Ohlebusch
G. Chen
G. Navarro
J. Barbay
J. Fischer
J. Kärkkäinen
J. Ziv
M. Crochemore
M.I. Abouelhoda
P. Ferragina
R. Cánovas
S. Kreft
S. Kuruppu
T. Gagie
T. Kasai
T. Starikovskaya
U. Manber
W.I. Chang
Publication venue
Publication date: 01/01/2013

We introduce a new approach to LZ77 factorization that uses O(n/d) words of working space and O(dn) time for any d >= 1 (for polylogarithmic alphabet sizes). We also describe carefully engineered implementations of alternative approaches to lightweight LZ77 factorization. Extensive experiments show that the new algorithm is superior in most cases, particularly at the lowest memory levels and for highly repetitive data. As a part of the algorithm, we describe new methods for computing matching statistics which may be of independent interest.
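For readers unfamiliar with the term, the sketch below illustrates what an LZ77 factorization is. It is a deliberately naive quadratic-time version and not the space-efficient algorithm the abstract describes; the function names `lz77_factorize` and `lz77_decode` and the factor encoding (a literal character, or a `(position, length)` pair pointing at an earlier occurrence) are assumptions made for illustration only.

```python
from typing import List, Tuple, Union

Factor = Union[str, Tuple[int, int]]  # literal char, or (source position, length)


def lz77_factorize(text: str) -> List[Factor]:
    """Greedy LZ77 factorization by brute force (illustrative, not efficient)."""
    factors: List[Factor] = []
    n = len(text)
    i = 0
    while i < n:
        best_len, best_pos = 0, -1
        # Try every earlier starting position j < i; matches may overlap position i.
        for j in range(i):
            k = 0
            while i + k < n and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, best_pos = k, j
        if best_len == 0:
            factors.append(text[i])  # first occurrence of this character
            i += 1
        else:
            factors.append((best_pos, best_len))
            i += best_len
    return factors


def lz77_decode(factors: List[Factor]) -> str:
    """Rebuild the text; copying one character at a time handles overlapping sources."""
    out: List[str] = []
    for f in factors:
        if isinstance(f, str):
            out.append(f)
        else:
            pos, length = f
            for k in range(length):
                out.append(out[pos + k])
    return "".join(out)


if __name__ == "__main__":
    fs = lz77_factorize("abababb")
    print(fs)
    print(lz77_decode(fs))
```

Decoding the factor list reproduces the input, which is a quick sanity check on any factorization routine of this kind.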

arXiv.org e-Print Archive

Crossref

On finding minimal absent words

Author: Armando J Pinho
C Acquisti
D Gusfield
DK Kim
E Ukkonen
EM McCreight
F Shi
G Hampikian
J Herold
J Kärkkäinen
João MOS Rodrigues
M Burrows
MI Abouelhoda
P Weiner
Paulo JSG Ferreira
S Kurtz
Sara P Garcia
T Kasai
U Manber
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The problem of finding the shortest absent words in DNA data has been recently addressed, and algorithms for its solution have been described. It has been noted that longer absent words might also be of interest, but the existing algorithms only provide generic absent words by trivially extending the shortest ones. Results: We show how absent words relate to the repetitions and structure of the data, and define a new and larger class of absent words, called minimal absent words, that still captures the essential properties of the shortest absent words introduced in recent works. The words of this new class are minimal in the sense that if their leftmost or rightmost character is removed, then the resulting word is no longer an absent word. We describe an algorithm for generating minimal absent words that, in practice, runs in approximately linear time. An implementation of this algorithm is publicly available at ftp://www.ieeta.pt/~ap/maws. Conclusion: Because the set of minimal absent words that we propose is much larger than the set of the shortest absent words, it is potentially more useful for applications that require a richer variety of absent words. Nevertheless, the number of minimal absent words is still manageable since it grows at most linearly with the string size, unlike generic absent words that grow exponentially. Both the algorithm and the concepts upon which it depends shed additional light on the structure of absent words and complement the existing studies on the topic.
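The definition of a minimal absent word given in this abstract can be made concrete with a small brute-force sketch. This is not the authors' approximately linear-time algorithm; it simply enumerates all substrings, so it is only practical for short strings. The function name `minimal_absent_words` and the optional `alphabet` parameter are assumptions introduced for illustration.

```python
from typing import Iterable, List, Optional


def minimal_absent_words(s: str, alphabet: Optional[Iterable[str]] = None) -> List[str]:
    """Return the words w that do not occur in s while both w[1:] and w[:-1] do.

    Brute force over all substrings of s: quadratic space, illustration only.
    If no alphabet is given, the set of characters of s is used.
    """
    letters = sorted(set(s) if alphabet is None else set(alphabet))
    # All substrings of s, including the empty word.
    present = {s[i:j] for i in range(len(s)) for j in range(i, len(s) + 1)}
    present.add("")
    result = set()
    for prefix in present:          # prefix plays the role of w[:-1], which occurs in s
        for c in letters:
            w = prefix + c
            # w is minimal absent if it is absent but dropping its first char is present.
            if w not in present and w[1:] in present:
                result.add(w)
    return sorted(result)


if __name__ == "__main__":
    print(minimal_absent_words("abaab"))
```

For the string `abaab` over the alphabet {a, b}, this yields `aaa`, `aaba`, `bab` and `bb`: each is missing from the string, yet removing its first or last character gives a substring that does occur.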

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

SeqAn An efficient, generic C++ library for sequence analysis

Author: A Darling
A Fabri
A Halpern
Andreas Döring
C Notredame
D Butt
D Vandevoorde
David Weese
DS Hirschberg
EW Myers
G Myers
G Navarro
J Dutheil
J Kececioglu
J Stajich
JC Venter
K Czarnecki
K Mehlhorn
Knut Reinert
M Abouelhoda
M Brudno
M Höhl
M Li
M Pocock
M Wilson
MH Austern
MH Overmars
MI Abouelhoda
N Saitou
O Gotoh
P Bieganski
P Weiner
R Giegerich
RJ Mural
S Burkhardt
S Kurtz
SB Needleman
SF Altschul
TH Cormen
Tobias Rausch
U Manber
W Vahrson
WR Pitt
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The use of novel algorithmic techniques is pivotal to many important problems in life science. For example, the sequencing of the human genome [1] would not have been possible without advanced assembly algorithms. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use. Results: To remedy this trend we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use by giving two examples. In the first example we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is very efficient and versatile to use. Conclusion: We anticipate that SeqAn greatly simplifies the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components which are fundamental for the field of sequence analysis. This leverages not only the implementation of new algorithms, but also enables a sound analysis and comparison of existing algorithms.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

PubMed Central