
13 research outputs found

Combinatorial RNA Design Designability and Structure-Approximating Algorithm in Watson-Crick and Nussinov-Jacobson Energy Models

Author: Haleš Jozef
Héliou Alice
Maňuch Ján
Ponty Yann
Stacho Ladislav
Publication date: 01/01/2016

We consider the Combinatorial RNA Design problem, a minimal instance of RNA design where one must produce an RNA sequence that adopts a given secondary structure as its minimal free-energy structure. We consider two free-energy models where the contributions of base pairs are additive and independent: the purely combinatorial Watson-Crick model, which only allows equally-contributing A-U and C-G base pairs, and the real-valued Nussinov-Jacobson model, which associates arbitrary energies to A-U, C-G and G-U base pairs. We first provide a complete characterization of designable structures using restricted alphabets and, in the four-letter alphabet, provide a complete characterization for designable structures without unpaired bases. When unpaired bases are allowed, we characterize extensive classes of (non-)designable structures, and prove the closure of the set of designable structures under the stutter operation. Membership of a given structure in any of the classes can be tested in Θ(n) time, including the generation of a solution sequence for positive instances. Finally, we consider a structure-approximating relaxation of the design, and provide a Θ(n) algorithm which, given a structure S that avoids two trivially non-designable motifs, transforms S into a designable structure constructively by adding at most one base pair to each of its stems.
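To make the design condition concrete, the following is a minimal sketch of a checker for the Watson-Crick model, not the Θ(n) method of the paper. It assumes each base pair contributes the same energy, no minimum hairpin-loop length, and dot-bracket input without pseudoknots; the names `parse_pairs` and `is_design` are hypothetical. A sequence is accepted only if the target structure is compatible with it, reaches the maximum number of base pairs, and is the only structure that does so.

```python
from functools import lru_cache

# Pairs allowed in the purely combinatorial Watson-Crick model.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def parse_pairs(structure):
    """Convert a dot-bracket string into a set of (i, j) base pairs."""
    stack, pairs = [], set()
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def is_design(sequence, structure):
    """Return True if `structure` is the unique max-pair structure of `sequence`."""
    n = len(sequence)
    if len(structure) != n:
        return False
    target = parse_pairs(structure)
    # The target must only use Watson-Crick pairs.
    if any((sequence[i], sequence[j]) not in WATSON_CRICK for i, j in target):
        return False

    @lru_cache(maxsize=None)
    def best(i, j):
        """(max number of pairs, number of structures reaching it) on sequence[i..j]."""
        if i > j:
            return (0, 1)
        # Case 1: position i is left unpaired.
        top, count = best(i + 1, j)
        # Case 2: position i pairs with some k; the cases are mutually exclusive,
        # so every structure is counted exactly once.
        for k in range(i + 1, j + 1):
            if (sequence[i], sequence[k]) in WATSON_CRICK:
                inner_pairs, inner_count = best(i + 1, k - 1)
                outer_pairs, outer_count = best(k + 1, j)
                total = 1 + inner_pairs + outer_pairs
                if total > top:
                    top, count = total, inner_count * outer_count
                elif total == top:
                    count += inner_count * outer_count
        return top, count

    top, count = best(0, n - 1)
    return len(target) == top and count == 1


print(is_design("GCAU", "()()"))  # the target is the only two-pair structure
print(is_design("GCGC", "()()"))  # the nested structure "(())" ties with the target
```

This cubic-time check tests one candidate sequence against one structure; deciding whether any such sequence exists is the designability question the abstract addresses.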

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL-Polytechnique

emMAW : computing minimal absent words in external memory

Author: Héliou Alice
Pissis Solon P.
Puglisi Simon J.
Publication date: 01/01/2017

Motivation: The biological significance of minimal absent words has been investigated in genomes of organisms from all domains of life. For instance, three minimal absent words of the human genome were found in Ebola virus genomes. There exists an O(n)-time and O(n)-space algorithm for computing all minimal absent words of a sequence of length n on a fixed-sized alphabet based on suffix arrays. A standard implementation of this algorithm, when applied to a large sequence of length n, requires more than 20n bytes of RAM. Such memory requirements are a significant hurdle to the computation of minimal absent words in large datasets. Results: We present emMAW, the first external-memory algorithm for computing minimal absent words. A free open-source implementation of our algorithm is made available. This allows for computation of minimal absent words on far bigger data sets than was previously possible. Our implementation requires less than 3 h on a standard workstation to process the full human genome when as little as 1 GB of RAM is made available. We stress that our implementation, despite making use of external memory, is fast; indeed, even on relatively smaller datasets when enough RAM is available to hold all necessary data structures, it is less than two times slower than state-of-the-art internal-memory implementations.

Crossref

INRIA a CCSD electronic archive server

Helsingin yliopiston digitaalinen arkisto

King's Research Portal

HAL-Polytechnique

Parallelising the Computation of Minimal Absent Words

Author: Barton Carl
Héliou Alice
Mouchard Laurent
Pissis Solon P.
Publication venue: Springer Science and Business Media LLC
Publication date: 02/04/2015

An absent word of a word y of length n is a word that does not occur in y. It is a minimal absent word if all its proper factors occur in y. Minimal absent words have been computed in genomes of organisms from all domains of life; their computation also provides a fast alternative for measuring approximation in sequence comparison. There exists an O(n)-time and O(n)-space algorithm for computing all minimal absent words on a fixed-sized alphabet based on the construction of a suffix array (Barton et al., 2014). An implementation of this algorithm was also provided by the authors and is currently the fastest available. In this article, we present a new O(n)-time and O(n)-space algorithm for computing all minimal absent words; it has the desirable property that, given the indexing data structure at hand, the computation of minimal absent words can be executed in parallel. Experimental results show that a multiprocessing implementation of this algorithm can accelerate the overall computation by more than a factor of two compared to state-of-the-art approaches. By excluding the indexing data structure construction time, we show that the implementation achieves near-optimal speed-ups.
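The definition of a minimal absent word can be illustrated with a brute-force sketch. This is not the O(n)-time suffix-array algorithm mentioned in the abstract: it enumerates every factor of y explicitly and is only practical for short words. The function name `minimal_absent_words` is hypothetical. It relies on the observation that a word is a minimal absent word exactly when it is absent while both its longest proper prefix and its longest proper suffix occur in y.

```python
def minimal_absent_words(y, alphabet):
    """Return the set of minimal absent words of y over the given alphabet."""
    n = len(y)
    # All factors (substrings) of y, including the empty word.
    factors = {y[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    maws = set()
    for prefix in factors:          # the longest proper prefix must occur in y
        for letter in alphabet:
            word = prefix + letter
            # The word itself is absent, but its longest proper suffix occurs.
            if word not in factors and word[1:] in factors:
                maws.add(word)
    return maws


print(sorted(minimal_absent_words("abaab", "ab")))
# → ['aaa', 'aaba', 'bab', 'bb']
```

Because the empty word is treated as a factor, letters of the alphabet that never appear in y are reported as minimal absent words of length one.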

HAL - Normandie Université

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

King's Research Portal

HAL-Polytechnique

HAL-Rennes 1

Efficient dynamic range minimum query

Author: Héliou Alice
Léonard Martine
Mouchard Laurent
Salson Mikael
Publication venue: Elsevier BV
Publication date: 01/01/2016

The Range Minimum Query problem consists in efficiently answering a simple question: "what is the minimal element appearing between two specified indices of a given array?". In this paper we present a novel approach that offers a satisfying trade-off between time and space. Moreover, we show how the structure can be easily maintained whenever an insertion, modification or deletion modifies the array.
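For readers unfamiliar with the query itself, here is a minimal sketch using a standard iterative segment tree. This is a swapped-in textbook technique, not the structure proposed in the paper, and it supports only modification of existing elements; insertions and deletions, which the abstract also covers, would need a different structure. The class name `MinSegmentTree` is hypothetical.

```python
class MinSegmentTree:
    """Range minimum queries with point updates, both in O(log n) time."""

    def __init__(self, values):
        self.n = len(values)
        # Leaves live at positions n..2n-1; internal node k covers nodes 2k and 2k+1.
        self.tree = [float("inf")] * self.n + list(values)
        for node in range(self.n - 1, 0, -1):
            self.tree[node] = min(self.tree[2 * node], self.tree[2 * node + 1])

    def update(self, index, value):
        """Set values[index] = value and repair the minima on the path to the root."""
        node = index + self.n
        self.tree[node] = value
        node //= 2
        while node >= 1:
            self.tree[node] = min(self.tree[2 * node], self.tree[2 * node + 1])
            node //= 2

    def query(self, left, right):
        """Return the minimum of values[left..right], both indices inclusive."""
        lo, hi = left + self.n, right + self.n + 1  # half-open range over the leaves
        best = float("inf")
        while lo < hi:
            if lo & 1:              # lo is a right child: take it and step right
                best = min(best, self.tree[lo])
                lo += 1
            if hi & 1:              # hi is a right child: step left and take it
                hi -= 1
                best = min(best, self.tree[hi])
            lo //= 2
            hi //= 2
        return best


rmq = MinSegmentTree([5, 2, 7, 1, 9])
print(rmq.query(2, 4))  # → 1
rmq.update(3, 8)
print(rmq.query(2, 4))  # → 7
```

The tree uses about 2n cells of memory, which is one simple point on the time/space trade-off the abstract refers to.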

HAL - Normandie Université

Crossref

INRIA a CCSD electronic archive server

Absent words in a sliding window with applications

Author: Crochemore Maxime
Héliou Alice
Kucherov Gregory
Mouchard Laurent
Pissis Solon
Ramusat Yann
Publication venue: Elsevier
Publication date: 01/09/2019

An absent word of a word y is a word that does not occur in y. It is then called minimal if all its proper factors occur in y. In fact, minimal absent words (MAWs) provide useful information about y and thus have several applications. In this paper, we propose an algorithm that maintains the set of MAWs of a fixed-length window sliding over y online. Our algorithm represents MAWs through nodes of the suffix tree. Specifically, the suffix tree of the sliding window is maintained using modified Senft's algorithm (Senft, 2005), itself generalizing Ukkonen's online algorithm (Ukkonen, 1995). We then apply this algorithm to the approximate pattern-matching problem under the Length Weighted Index distance (Chairungsee and Crochemore, 2012). This results in an online -time algorithm for finding approximate occurrences of a word x in y, , where σ is the alphabet size.

HAL - Normandie Université

CWI's Institutional Repository

INRIA a CCSD electronic archive server

King's Research Portal

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM