36 research outputs found

Kohdista: An efficient method to index and query possible Rmap alignments : Algorithms for Molecular Biology

Author: Boucher C.
Muggli M.D.
Puglisi S.J.
Publication venue: Algorithms for Molecular Biology
Publication date: 01/01/2019

Background: Genome-wide optical maps are ordered high-resolution restriction maps that give the position of occurrence of restriction cut sites corresponding to one or more restriction enzymes. These genome-wide optical maps are assembled using an overlap-layout-consensus approach using raw optical map data, which are referred to as Rmaps. Due to the high error rate of Rmap data, finding the overlap between Rmaps remains challenging. Results: We present Kohdista, which is an index-based algorithm for finding pairwise alignments between single molecule maps (Rmaps). The novelty of our approach is the formulation of the alignment problem as automaton path matching, and the application of modern index-based data structures. In particular, we combine the use of the Generalized Compressed Suffix Array (GCSA) index with the wavelet tree in order to build Kohdista. We validate Kohdista on simulated E. coli data, showing the approach successfully finds alignments between Rmaps simulated from overlapping genomic regions. Conclusion: We demonstrate Kohdista is the only method that is capable of finding a significant number of high-quality pairwise Rmap alignments for large eukaryote organisms in reasonable time. © 2019 The Author(s). Peer reviewed

Helsingin yliopiston digitaalinen arkisto

A succinct solution to Rmap alignment

Author: Boucher C.
Muggli M.D.
Puglisi S.J.
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/01/2018

Peer reviewed

Dagstuhl Research Online Publication Server

Helsingin yliopiston digitaalinen arkisto

Wave Energy: a Pacific Perspective

Author: D. Gusfield
D. Okanohara
G. Manzini
J. Fischer
J. Kärkkäinen
K. Sadakane
M.I. Abouelhoda
P. Ferragina
R. Dementiev
R. Sinha
S.J. Puglisi
T. Kasai
U. Manber
V. Mäkinen
Publication venue: The Royal Society
Publication date: 01/01/2009

This is the author's peer-reviewed final manuscript, as accepted by the publisher. The published article is copyrighted by The Royal Society and can be found at: http://rsta.royalsocietypublishing.org/. This paper illustrates the status of wave energy development in Pacific Rim countries by characterizing the available resource and introducing the region's current and potential future leaders in wave energy converter development. It also describes the existing licensing and permitting process as well as potential environmental concerns. Capabilities of Pacific Ocean testing facilities are described in addition to the region's vision of the future of wave energy.

CiteSeerX

Crossref

ScholarsArchive@OSU

RMIT Research Repository

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Computing the antiperiod(s) of a string

Author: Alamro Hayam
Badkobeh G.
Belazzougui D.
Iliopoulos C.S.
Puglisi S.J.
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/01/2019

A string S[1, n] is a power (or repetition or tandem repeat) of order k and period n/k if it can be decomposed into k consecutive identical blocks of length n/k. Powers and periods are fundamental structures in the study of strings, and algorithms to compute them efficiently have been widely studied. Recently, Fici et al. (Proc. ICALP 2016) introduced an antipower of order k to be a string composed of k distinct blocks of the same length, n/k, called the antiperiod. An arbitrary string will have antiperiod t if it is a prefix of an antipower with antiperiod t. In this paper, we describe an efficient algorithm for computing the smallest antiperiod of a string S of length n in O(n) time. We also describe an algorithm to compute all the antiperiods of S that runs in O(n log n) time. © Hayam Alamro, Golnaz Badkobeh, Djamal Belazzougui, Costas S. Iliopoulos, and Simon J. Puglisi. Peer reviewed
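The definitions in this abstract can be illustrated with a brute-force sketch. It is not the O(n) or O(n log n) algorithm the paper describes; it simply tries each candidate block length in turn. The function names are hypothetical, and the sketch assumes a simplified reading of the definition: a string has antiperiod t when its complete length-t blocks are pairwise distinct, with any trailing partial block ignored (that is, assuming the alphabet is large enough to complete it into a new block).

```python
from typing import Optional


def is_power(s: str, k: int) -> bool:
    """True if s splits into k consecutive identical blocks of length len(s)/k."""
    n = len(s)
    if k < 1 or n % k != 0:
        return False
    return s == s[: n // k] * k


def has_antiperiod(s: str, t: int) -> bool:
    """Simplified test (assumption): the complete length-t blocks of s
    are pairwise distinct; a trailing partial block is ignored."""
    full_blocks = len(s) // t
    blocks = [s[i * t:(i + 1) * t] for i in range(full_blocks)]
    return len(set(blocks)) == len(blocks)


def smallest_antiperiod(s: str) -> Optional[int]:
    """Brute force: try t = 1, 2, ... and return the first that works.
    Returns None for the empty string."""
    for t in range(1, len(s) + 1):
        if has_antiperiod(s, t):
            return t
    return None


if __name__ == "__main__":
    for word in ["abcd", "abca", "abab", "aaaa"]:
        print(word, smallest_antiperiod(word))
```

Under this reading, "abca" has smallest antiperiod 2 (blocks "ab" and "ca" differ), while "abab" needs block length 3 because both shorter block lengths repeat a block.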

Goldsmiths Research Online

Dagstuhl Research Online Publication Server

Helsingin yliopiston digitaalinen arkisto

King's Research Portal

Suffix arrays: what are they good for?

Author: Puglisi S.J.
Smyth W.F.
Turpin A.
Publication date: 01/01/2006

Recently the theoretical community has displayed a flurry of interest in suffix arrays and compressed suffix arrays. New, asymptotically optimal algorithms for construction, search, and compression of suffix arrays have been proposed. In this talk we will present our investigations into the practicalities of these latest developments. In particular, we investigate whether suffix arrays can indeed replace inverted files, as suggested in recent literature on suffix arrays.

Research Repository

Some restrictions on periodicity in strings

Author: Puglisi S.J.
Smyth W.F.
Turpin A.
Publication date: 01/01/2005

Given a string x = x[1..n], a repetition of period p in x is a substring u^r = x[i..i+rp−1], p = |u|, r ≥ 2, where neither u = x[i..i+p−1] nor x[i..i+(r+1)p−1] is a repetition. The maximum number of repetitions in any string x is well known to be Θ(n log n). A run or maximal periodicity of period p in x is a substring u^r t = x[i..i+rp+|t|−1] of x, where u^r is a repetition, t a proper prefix of u, and no repetition of period p begins at position i−1 of x or ends at position i+rp+|t|. In 2000 Kolpakov & Kucherov showed that the maximum number ρ(n) of runs in any string x is O(n), but their proof was nonconstructive and provided no specific constant of proportionality. At the same time, they presented experimental data strongly suggesting that ρ(n) < n. that the maximum any string x again encourages the belief that in fact σ(n) < n. Recently, Fan et al. (“A new periodicity lemma”, Sixteenth Annual Symp. Combin. Pattern Matching, 2005) took a first step toward proving these conjectures, by presenting results that establish limitations on the number of squares of a specified range of periods that can occur over a specified range of positions in x. In this paper, we further tighten these restrictions by showing how the existence of two squares u and v (v longer than u) at the same position i in x limits the occurrence of smaller squares with period w ∈ (|v| − |u|, |u|) in the neighborhood around i.

Research Repository

On the maximal sum of exponents of runs in a string

Author: D. Gusfield
F. Franek
J. Berstel
J. Simpson
M. Crochemore
M. Giraud
M. Lothaire
R.M. Kolpakov
S.J. Puglisi
W. Rytter
Publication venue: Springer Science and Business Media LLC
Publication date: 25/03/2010

A run is an inclusion maximal occurrence in a string (as a subinterval) of a repetition $v$ with a period $p$ such that $2p \le |v|$. The exponent of a run is defined as $|v|/p$ and is $\ge 2$. We show new bounds on the maximal sum of exponents of runs in a string of length $n$. Our upper bound of $4.1n$ is better than the best previously known proven bound of $5.6n$ by Crochemore & Ilie (2008). The lower bound of $2.035n$, obtained using a family of binary words, contradicts the conjecture of Kolpakov & Kucherov (1999) that the maximal sum of exponents of runs in a string of length $n$ is smaller than $2n$. Comment: 7 pages, 1 figure
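To make the definitions of a run and its exponent concrete, here is a minimal brute-force sketch. It is not a method from the paper: it scans every candidate period, which is far slower than the known linear-time algorithms, and the function names are hypothetical. It assumes the usual convention that the period of a run is the smallest period of the repeated substring, and it uses 0-based, end-exclusive indices.

```python
from fractions import Fraction
from typing import List, Tuple


def smallest_period(w: str) -> int:
    """Smallest p >= 1 with w[k] == w[k + p] for every valid k."""
    n = len(w)
    for p in range(1, n + 1):
        if all(w[k] == w[k + p] for k in range(n - p)):
            return p
    return 0  # only reached for the empty string


def runs(s: str) -> List[Tuple[int, int, int]]:
    """All runs of s as (start, end, period), end exclusive."""
    n = len(s)
    found = []
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if s[k] != s[k + p]:
                k += 1
                continue
            start = k
            # Extend the stretch while the period-p condition holds.
            while k + p < n and s[k] == s[k + p]:
                k += 1
            end = k + p
            # Keep intervals of length >= 2p whose smallest period is p.
            if end - start >= 2 * p and smallest_period(s[start:end]) == p:
                found.append((start, end, p))
    return found


def sum_of_exponents(s: str) -> Fraction:
    """Sum of |v|/p over all runs v of s, as an exact fraction."""
    return sum((Fraction(end - start, p) for start, end, p in runs(s)),
               Fraction(0))


if __name__ == "__main__":
    for word in ["aaaa", "ababa", "aabaab"]:
        print(word, runs(word), sum_of_exponents(word))
```

For example, "aabaab" contains three runs ("aa" twice with period 1, and the whole word with period 3), each of exponent 2, so the sum is 6. Exact fractions are used so that exponents such as 5/2 for "ababa" are not rounded.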

arXiv.org e-Print Archive

CiteSeerX

Crossref

Elsevier - Publisher Connector

King's Research Portal

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Scheduling Jobs in Flowshops with the Introduction of Additional Machines in the Future

Author: A. Apostolico
B. Smyth
C.J. Colbourn
D. Gusfield
E. Ukkonen
E.M. McCreight
G. Manzini
J. Fischer
M.A. Bender
M.I. Abouelhoda
S. Burkhardt
S.J. Puglisi
T. Kasai
U. Manber
Publication venue: Elsevier
Publication date: 01/01/2008

This is the author's peer-reviewed final manuscript, as accepted by the publisher. The published article is copyrighted by Elsevier and can be found at: http://www.journals.elsevier.com/expert-systems-with-applications/. The problem of scheduling jobs to minimize total weighted tardiness in flowshops, with the possibility of evolving into hybrid flowshops in the future, is investigated in this paper. As this research is guided by a real problem in industry, the flowshop considered has considerable flexibility, which stimulated the development of an innovative methodology for this research. Each stage of the flowshop currently has one or several identical machines. However, the manufacturing company is planning to introduce additional machines with different capabilities in different stages in the near future. Thus, the algorithm proposed and developed for the problem is not only capable of solving the current flow line configuration but also the potential new configurations that may result in the future. A meta-heuristic search algorithm based on Tabu search is developed to solve this NP-hard, industry-guided problem. Six different initial solution finding mechanisms are proposed. A carefully planned nested split-plot design is performed to test the significance of different factors and their impact on the performance of the different algorithms. To the best of our knowledge, this research is the first of its kind that attempts to solve an industry-guided problem with the concern for future developments.

CiteSeerX

Crossref

ScholarsArchive@OSU

RMIT Research Repository

Survival-Time Distribution for Inelastic Collapse

Author: A. Puglisi
Alan J. Bray
B. Derrida
B. Yurke
C. W. Gardiner
D.R.M. Williams
E. Ben-Naim
G. Peng
J. Cardy
J. Krug
J.A. McFadden
M. Howard
M. Marcos-Martin
M.R. Swift
Michael R. Swift
P. Langevin
S.J. Cornell
S.N. Majumdar
T.C. Lubensky
T.W. Burkhardt
W.Y. Tam
Publication venue: American Physical Society (APS)
Publication date: 30/11/1998

In a recent publication [PRL 81, 1142 (1998)] it was argued that a randomly forced particle which collides inelastically with a boundary can undergo inelastic collapse and come to rest in a finite time. Here we discuss the survival probability for the inelastic collapse transition. It is found that the collapse-time distribution behaves asymptotically as a power-law in time, and that the exponent governing this decay is non-universal. An approximate calculation of the collapse-time exponent confirms this behaviour and shows how inelastic collapse can be viewed as a generalised persistence phenomenon. Comment: 4 pages, RevTeX

arXiv.org e-Print Archive

Crossref

On the maximal number of cubic subwords in a string

Author: A. Apostolico
A. Thue
A.S. Freankel
C.S. Iliopoulos
D. Damanik
L. Ilie
M. Crochemore
M. Giraud
M. Lothaire
M.G. Main
N.J. Fine
P. Baturo
R.M. Kolpakov
S.J. Puglisi
W. Rytter
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

We investigate the problem of the maximum number of cubic subwords (of the form $www$) in a given word. We also consider square subwords (of the form $ww$). The problem of the maximum number of squares in a word is not well understood. Several new results related to this problem are produced in the paper. We consider two simple problems related to the maximum number of subwords which are squares or which are highly repetitive; then we provide a nontrivial estimation for the number of cubes. We show that the maximum number of squares $xx$ such that $x$ is not a primitive word (nonprimitive squares) in a word of length $n$ is exactly $\lfloor \frac{n}{2}\rfloor - 1$, and the maximum number of subwords of the form $x^k$, for $k\ge 3$, is exactly $n-2$. In particular, the maximum number of cubes in a word is not greater than $n-2$ either. Using very technical properties of occurrences of cubes, we improve this bound significantly. We show that the maximum number of cubes in a word of length $n$ is between $(1/2)n$ and $(4/5)n$. (In particular, we improve the lower bound from the conference version of the paper.) Comment: 14 pages
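The quantities counted in this abstract can be explored on small words with a brute-force sketch. This is only an illustration, not a technique from the paper. It assumes that "number of cubes" means the number of distinct subwords of the form www, and the function name `distinct_powers` is hypothetical.

```python
from typing import Set


def distinct_powers(s: str, k: int) -> Set[str]:
    """Distinct subwords of s of the form w^k with w non-empty
    (k = 2 gives squares, k = 3 gives cubes)."""
    n = len(s)
    found = set()
    for length in range(1, n // k + 1):           # length of the root w
        for i in range(n - k * length + 1):       # start of the candidate
            w = s[i:i + length]
            if s[i:i + k * length] == w * k:
                found.add(w * k)
    return found


if __name__ == "__main__":
    word = "abababab"
    print(sorted(distinct_powers(word, 2)))  # squares
    print(sorted(distinct_powers(word, 3)))  # cubes
```

On "abababab" (n = 8) the sketch finds two distinct cubes, "ababab" and "bababa", which is below the stated upper bound of (4/5)n.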

arXiv.org e-Print Archive

CiteSeerX

Crossref