36 research outputs found

    Kohdista: An efficient method to index and query possible Rmap alignments : Algorithms for Molecular Biology

    Background: Genome-wide optical maps are ordered high-resolution restriction maps that give the position of occurrence of restriction cut sites corresponding to one or more restriction enzymes. These genome-wide optical maps are assembled using an overlap-layout-consensus approach using raw optical map data, which are referred to as Rmaps. Due to the high error rate of Rmap data, finding the overlap between Rmaps remains challenging. Results: We present Kohdista, which is an index-based algorithm for finding pairwise alignments between single molecule maps (Rmaps). The novelty of our approach is the formulation of the alignment problem as automaton path matching, and the application of modern index-based data structures. In particular, we combine the use of the Generalized Compressed Suffix Array (GCSA) index with the wavelet tree in order to build Kohdista. We validate Kohdista on simulated E. coli data, showing the approach successfully finds alignments between Rmaps simulated from overlapping genomic regions. Conclusion: We demonstrate that Kohdista is the only method that is capable of finding a significant number of high quality pairwise Rmap alignments for large eukaryote organisms in reasonable time. © 2019 The Author(s). Peer reviewed.

    A succinct solution to Rmap alignment

    Peer reviewed.

    Wave Energy: a Pacific Perspective

    This is the author's peer-reviewed final manuscript, as accepted by the publisher. The published article is copyrighted by The Royal Society and can be found at: http://rsta.royalsocietypublishing.org/. This paper illustrates the status of wave energy development in Pacific Rim countries by characterizing the available resource and introducing the region's current and potential future leaders in wave energy converter development. It also describes the existing licensing and permitting process as well as potential environmental concerns. Capabilities of Pacific Ocean testing facilities are described in addition to the region's vision of the future of wave energy.

    Computing the antiperiod(s) of a string

    A string S[1, n] is a power (or repetition or tandem repeat) of order k and period n/k if it can be decomposed into k consecutive identical blocks of length n/k. Powers and periods are fundamental structures in the study of strings, and algorithms to compute them efficiently have been widely studied. Recently, Fici et al. (Proc. ICALP 2016) introduced an antipower of order k to be a string composed of k distinct blocks of the same length, n/k, called the antiperiod. An arbitrary string will have antiperiod t if it is a prefix of an antipower with antiperiod t. In this paper, we describe an efficient algorithm for computing the smallest antiperiod of a string S of length n in O(n) time. We also describe an algorithm to compute all the antiperiods of S that runs in O(n log n) time. © Hayam Alamro, Golnaz Badkobeh, Djamal Belazzougui, Costas S. Iliopoulos, and Simon J. Puglisi. Peer reviewed.
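
The antiperiod definition above can be illustrated with a brute-force check. This is only a sketch of the definition, not the O(n) or O(n log n) algorithms the abstract refers to. It assumes that a trailing partial block can always be completed into a block different from all earlier ones (for example with a fresh symbol), so only the full blocks of length t need to be pairwise distinct. The function names are hypothetical.

```python
def has_antiperiod(s: str, t: int) -> bool:
    """Return True if s is a prefix of an antipower with block length t.

    Assumption: a trailing partial block can always be extended to a
    block that differs from the others, so only full blocks are compared.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    full_blocks = len(s) // t
    blocks = [s[i * t:(i + 1) * t] for i in range(full_blocks)]
    # All full blocks must be pairwise distinct.
    return len(set(blocks)) == len(blocks)


def smallest_antiperiod(s: str) -> int:
    """Brute-force search for the smallest antiperiod of s."""
    for t in range(1, max(1, len(s)) + 1):
        if has_antiperiod(s, t):
            return t
    raise AssertionError("unreachable: t = len(s) always qualifies")


def all_antiperiods(s: str) -> list[int]:
    """All antiperiods t with 1 <= t <= len(s), by brute force."""
    return [t for t in range(1, len(s) + 1) if has_antiperiod(s, t)]
```

For instance, under the stated assumption `smallest_antiperiod("abcabc")` is 2, because the blocks `ab`, `ca`, `bc` are pairwise distinct while the single letters are not.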

    Suffix arrays: what are they good for?

    Recently the theoretical community has displayed a flurry of interest in suffix arrays and compressed suffix arrays. New, asymptotically optimal algorithms for construction, search, and compression of suffix arrays have been proposed. In this talk we will present our investigations into the practicalities of these latest developments. In particular, we investigate whether suffix arrays can indeed replace inverted files, as suggested in recent literature on suffix arrays.
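
For readers unfamiliar with the data structure discussed above, the following is a minimal sketch of a suffix array and of pattern search by binary search over it. The construction here is the naive sort of all suffixes, not one of the asymptotically optimal algorithms the abstract mentions, and the function names are hypothetical.

```python
def build_suffix_array(text: str) -> list[int]:
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return sorted start positions of pattern in text via binary search."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])
```

With `text = "banana"`, `build_suffix_array(text)` gives `[5, 3, 1, 0, 4, 2]`, and searching for `"ana"` returns positions `[1, 3]`.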

    Some restrictions on periodicity in strings

    Given a string x = x[1..n], a repetition of period p in x is a substring u^r = x[i..i+rp−1], p = |u|, r ≥ 2, where neither u = x[i..i+p−1] nor x[i..i+(r+1)p−1] is a repetition. The maximum number of repetitions in any string x is well known to be Θ(n log n). A run or maximal periodicity of period p in x is a substring u^r t = x[i..i+rp+|t|−1] of x, where u^r is a repetition, t a proper prefix of u, and no repetition of period p begins at position i−1 of x or ends at position i+rp+|t|. In 2000 Kolpakov & Kucherov showed that the maximum number ρ(n) of runs in any string x is O(n), but their proof was nonconstructive and provided no specific constant of proportionality. At the same time, they presented experimental data strongly suggesting that ρ(n) < n. [...] that the maximum [...] any string x again encourages the belief that in fact σ(n) < n. Recently, Fan et al. (“A new periodicity lemma”, Sixteenth Annual Symp. Combin. Pattern Matching, 2005) took a first step toward proving these conjectures, by presenting results that establish limitations on the number of squares of a specified range of periods that can occur over a specified range of positions in x. In this paper, we further tighten these restrictions by showing how the existence of two squares u and v (v longer than u) at the same position i in x limits the occurrence of smaller squares with period w ∈ (|v| − |u|, |u|) in the neighborhood around i.

    On the maximal sum of exponents of runs in a string

    A run is an inclusion maximal occurrence in a string (as a subinterval) of a repetition $v$ with a period $p$ such that $2p \le |v|$. The exponent of a run is defined as $|v|/p$ and is $\ge 2$. We show new bounds on the maximal sum of exponents of runs in a string of length $n$. Our upper bound of $4.1n$ is better than the best previously known proven bound of $5.6n$ by Crochemore & Ilie (2008). The lower bound of $2.035n$, obtained using a family of binary words, contradicts the conjecture of Kolpakov & Kucherov (1999) that the maximal sum of exponents of runs in a string of length $n$ is smaller than $2n$. Comment: 7 pages, 1 figure
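
The definitions of a run and its exponent above can be made concrete with a brute-force enumeration. This is an illustrative sketch only, unrelated to the proof techniques behind the bounds in the abstract. It assumes the period of a run means the smallest period of the occurrence, and it reports runs as half-open intervals; the function names are hypothetical.

```python
from fractions import Fraction


def smallest_period(w: str) -> int:
    """Smallest q >= 1 such that w[k] == w[k + q] for all valid k."""
    n = len(w)
    for q in range(1, n + 1):
        if all(w[k] == w[k + q] for k in range(n - q)):
            return q
    return n  # only reached for the empty string


def runs(s: str) -> list[tuple[int, int, int]]:
    """Brute-force list of runs as (start, end_exclusive, period)."""
    n = len(s)
    found = []
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if s[k] != s[k + p]:
                k += 1
                continue
            # Maximal stretch of positions k with s[k] == s[k + p].
            start = k
            while k + p < n and s[k] == s[k + p]:
                k += 1
            end = k + p
            # Keep it only if it is at least two periods long and
            # p is the smallest period (assumption stated above).
            if end - start >= 2 * p and smallest_period(s[start:end]) == p:
                found.append((start, end, p))
    return found


def sum_of_exponents(s: str) -> Fraction:
    """Sum of |v| / p over all runs v of s, as an exact fraction."""
    return sum((Fraction(e - b, p) for b, e, p in runs(s)), Fraction(0))
```

For example, `runs("aabaab")` yields `[(0, 2, 1), (3, 5, 1), (0, 6, 3)]`, each with exponent 2, so the sum of exponents is 6.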

    Scheduling Jobs in Flowshops with the Introduction of Additional Machines in the Future

    This is the author's peer-reviewed final manuscript, as accepted by the publisher. The published article is copyrighted by Elsevier and can be found at: http://www.journals.elsevier.com/expert-systems-with-applications/. The problem of scheduling jobs to minimize total weighted tardiness in flowshops, with the possibility of evolving into hybrid flowshops in the future, is investigated in this paper. As this research is guided by a real problem in industry, the flowshop considered has considerable flexibility, which stimulated the development of an innovative methodology for this research. Each stage of the flowshop currently has one or several identical machines. However, the manufacturing company is planning to introduce additional machines with different capabilities in different stages in the near future. Thus, the algorithm proposed and developed for the problem is not only capable of solving the current flow line configuration but also the potential new configurations that may result in the future. A meta-heuristic search algorithm based on Tabu search is developed to solve this NP-hard, industry-guided problem. Six different initial solution finding mechanisms are proposed. A carefully planned nested split-plot design is performed to test the significance of different factors and their impact on the performance of the different algorithms. To the best of our knowledge, this research is the first of its kind that attempts to solve an industry-guided problem with the concern for future developments.

    Survival-Time Distribution for Inelastic Collapse

    In a recent publication [PRL 81, 1142 (1998)] it was argued that a randomly forced particle which collides inelastically with a boundary can undergo inelastic collapse and come to rest in a finite time. Here we discuss the survival probability for the inelastic collapse transition. It is found that the collapse-time distribution behaves asymptotically as a power-law in time, and that the exponent governing this decay is non-universal. An approximate calculation of the collapse-time exponent confirms this behaviour and shows how inelastic collapse can be viewed as a generalised persistence phenomenon. Comment: 4 pages, RevTeX

    On the maximal number of cubic subwords in a string

    We investigate the problem of the maximum number of cubic subwords (of the form $www$) in a given word. We also consider square subwords (of the form $ww$). The problem of the maximum number of squares in a word is not well understood. Several new results related to this problem are produced in the paper. We consider two simple problems related to the maximum number of subwords which are squares or which are highly repetitive; then we provide a nontrivial estimation for the number of cubes. We show that the maximum number of squares $xx$ such that $x$ is not a primitive word (nonprimitive squares) in a word of length $n$ is exactly $\lfloor \frac{n}{2}\rfloor - 1$, and the maximum number of subwords of the form $x^k$, for $k \ge 3$, is exactly $n-2$. In particular, the maximum number of cubes in a word is not greater than $n-2$ either. Using very technical properties of occurrences of cubes, we improve this bound significantly. We show that the maximum number of cubes in a word of length $n$ is between $(1/2)n$ and $(4/5)n$. (In particular, we improve the lower bound from the conference version of the paper.) Comment: 14 pages