Search CORE

39 research outputs found

Sequence searching allowing for non-overlapping adjacent unbalanced translocations

Author: Cantone D.
Faro S.
Pavone A.
Publication venue: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Publication date: 01/01/2020
Field of study

Unbalanced translocations are among the most frequent chromosomal alterations, accounted for 30% of all losses of heterozygosity, a major genetic event causing inactivation of tumor suppressor genes. Despite of their central role in genomic sequence analysis, little attention has been devoted to the problem of matching sequences allowing for this kind of chromosomal alteration. In this paper we investigate the approximate string matching problem when the edit operations are non-overlapping unbalanced translocations of adjacent factors. In particular, we first present a O(nm3)-time and O(m2)-space algorithm based on the dynamic-programming approach. Then we improve our first result by designing a second solution which makes use of the Directed Acyclic Word Graph of the pattern. In particular, we show that under the assumptions of equiprobability and independence of characters, our algorithm has a O(n log2σ m) average time complexity, for an alphabet of size σ, still maintaining the O(nm3)-time and the O(m2)-space complexity in the worst case. To the best of our knowledge this is the first solution in literature for the approximate string matching problem allowing for unbalanced translocations of factors

Archivio istituzionale della ricerca - Università di Palermo

Sequence Searching Allowing for Non-Overlapping Adjacent Unbalanced Translocations

Author: Cantone Domenico
Faro Simone
Pavone Arianna
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 20th International Workshop on Algorithms in Bioinformatics (WABI 2020)
Publication date: 01/01/2020
Field of study

Dagstuhl Research Online Publication Server

String Periods in the Order-Preserving Model

Author: Gourdel Garance
Kociumaka Tomasz
Radoszewski Jakub
Rytter Wojciech
Shur Arseny
Walen Tomasz
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 35th Symposium on Theoretical Aspects of Computer Science (STACS 2018)
Publication date: 01/01/2018
Field of study

The order-preserving model (op-model, in short) was introduced quite recently but has already attracted significant attention because of its applications in data analysis. We introduce several types of periods in this setting (op-periods). Then we give algorithms to compute these periods in time O(n), O(n log log n), O(n log^2 log n/log log log n), O(n log n) depending on the type of periodicity. In the most general variant the number of different periods can be as big as Omega(n^2), and a compact representation is needed. Our algorithms require novel combinatorial insight into the properties of such periods

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Finding an Optimal Alphabet Ordering for Lyndon Factorization Is Hard

Author: Gibney Daniel
Thankachan Sharma V.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)
Publication date: 01/01/2021
Field of study

Dagstuhl Research Online Publication Server

String periods in the order-preserving model

Author: Gourdel G.
Kociumaka T.
Radoszewski J.
Rytter W.
Shur A.
Waleń T.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020
Field of study

In the order-preserving model, two strings match if they share the same relative order between the characters at the corresponding positions. This model is quite recent, but it has already attracted significant attention because of its applications in data analysis. We introduce several types of periods in this setting (op-periods). Then we give algorithms to compute these periods in time O(n), O(nlog⁡log⁡n), O(nlog2⁡log⁡n/log⁡log⁡log⁡n), O(nlog⁡n) depending on the type of periodicity. In the most general variant, the number of different op-periods can be as big as Ω(n2), and a compact representation is needed. Our algorithms require novel combinatorial insight into the properties of op-periods. In particular, we characterize the Fine–Wilf property for coprime op-periods. © 2019 Elsevier Inc.Supported by ISF grants no. 824/17 and 1278/16 and by an ERC grant MPM under the EU's Horizon 2020 Research and Innovation Programme (grant no. 683064).Supported by the Ministry of Science and Higher Education of the Russian Federation, project 1.3253.2017.A part of this work was done during the workshop StringMasters in Warsaw 2017 that was sponsored by the Warsaw Center of Mathematics and Computer Science. The authors thank the participants of the workshop, especially Hideo Bannai and Shunsuke Inenaga, for helpful discussions

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin

Weak factor automata : the failure of failure factor oracles?

Author: Cleophas L.G.W.A. (Loek)
Kourie Derrick G.
Watson Bruce William
Publication venue: Computer Society of South Africa
Publication date: 01/08/2014
Field of study

In indexing of, and pattern matching on, DNA and text sequences, it is often important to represent all factors of a sequence. One e cient, compact representation is the factor oracle (FO). At the same time, any classical deterministic nite automaton (DFA) can be transformed to a so-called failure one (FDFA), which may use failure transitions to replace multiple symbol transitions, potentially yielding a more compact representation. We combine the two ideas and directly construct a failure factor oracle (FFO) from a given sequence, in contrast to ex post facto transformation to an FDFA. The algorithm is suitable for both short and long sequences. We empirically compared the resulting FFOs and FOs on number of transitions for many DNA sequences of lengths 4 - 512, showing gains of up to 10% in total number of transitions, with failure transitions also taking up less space than symbol transitions. The resulting FFOs can be used for indexing, as well as in a variant of the FO-using backward oracle matching algorithm. We discuss and classify this pattern matching algorithm in terms of the keyword pattern matching taxonomies of Watson, Cleophas and Zwaan. We also empirically compared the use of FOs and FFOs in such backward reading pattern matching algorithms, using both DNA and natural language (English) data sets. The results indicate that the decrease in pattern matching performance of an algorithm using an FFO instead of an FO may outweigh the gain in representation space by using an FFO instead of an FO.http://www.journals.co.za/ej/ejour_comp.htmlam201

Directory of Open Access Journals

Stellenbosch University SUNScholar Repository

UPSpace at the University of Pretoria

Algorithms for Order-Preserving Matching

Author: Chhabra Tamanna
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016
Field of study

String matching is a widely studied problem in Computer Science. There have been many recent developments in this field. One fascinating problem considered lately is the order-preserving matching (OPM) problem. The task is to find all the substrings in the text which have the same length and relative order as the pattern, where the relative order is the numerical order of the numbers in a string. The problem finds its applications in the areas involving time series or series of numbers. More specifically, it is useful for those who are interested in the relative order of the pattern and not in the pattern itself. For example, it can be used by analysts in a stock market to study movements of prices. In addition to the OPM problem, we also studied its approximate variation. In approximate order-preserving matching, we search for those substrings in the text which have relative order similar to the pattern, i.e., relative order of the pattern matches with at most k mismatches. With respect to applications of order-preserving matching, approximate search is more meaningful than exact search. We developed various advanced solutions for the problem and its variant. Special emphasis was laid on the practical efficiency of the solutions. Particularly, we introduced a simple solution for the OPM problem using filtration. We proved experimentally that our method was effective and faster than the previous solutions for the problem. In addition, we combined the Single Instruction Multiple Data (SIMD) instruction set architecture with filtration to develop competent solutions which were faster than our previous solution. Moreover, we proposed another efficient solution without filtration using the SIMD architecture. We also presented an offline solution based on the FM-index scheme. Furthermore, we proposed practical solutions for the approximate order-preserving matching problem and one of the solutions was the first sublinear solution on average for the problem

Aaltodoc Publication Archive

Recommended from our members

Text Indexing for Long Patterns: Anchors are All you Need

Author: Ayad L
Loukides G
Pissis S
Publication venue: Association for Computing Machinery
Publication date: 10/07/2023
Field of study

PVLDB Artifact Availability: The source code, data, and/or other artifacts have been made available at https://github.com/lorrainea/BDA- index.Copyright © 2023 the owner/author(s). In many real-world database systems, a large fraction of the data is represented by strings: sequences of letters over some alphabet. This is because strings can easily encode data arising from different sources. It is often crucial to represent such string datasets in a compact form but also to simultaneously enable fast pattern matching queries. This is the classic text indexing problem. The four absolute measures anyone should pay attention to when designing or implementing a text index are: (i) index space; (ii) query time; (iii) construction space; and (iv) construction time. Unfortunately, however, most (if not all) widely-used indexes (e.g., suffix tree, suffix array, or their compressed counterparts) are not optimized for all four measures simultaneously, as it is difficult to have the best of all four worlds. Here, we take an important step in this direction by showing that text indexing with locally consistent anchors (lc-anchors) offers remarkably good performance in all four measures, when we have at hand a lower bound l on the length of the queried patterns --- which is arguably a quite reasonable assumption in practical applications. Specifically, we improve on the construction of the index proposed by Loukides and Pissis, which is based on bidirectional string anchors (bd-anchors), a new type of lc-anchors, by: (i) designing an average-case linear-time algorithm to compute bd-anchors; and (ii) developing a semi-external-memory implementation to construct the index in small space using near-optimal work. We then present an extensive experimental evaluation, based on the four measures, using real benchmark datasets. The results show that, for long patterns, the index constructed using our improved algorithms compares favorably to all classic indexes: (compressed) suffix tree; (compressed) suffix array; and the FM-index.European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreements No 872539 and 956229, respectively; and by UKRI through REPHRAIN (EP/V011189/1)

Brunel University Research Archive