Designing RNA secondary structures is hard

Author: Bonnet Edouard
Rzążewski Paweł
Sikora Florian
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/04/2018

An RNA sequence is a word over an alphabet of four elements {A, C, G, U} called bases. RNA sequences fold into secondary structures where some bases match one another while others remain unpaired. Pseudoknot-free secondary structures can be represented as well-parenthesized expressions with additional dots, where pairs of matching parentheses symbolize paired bases and dots symbolize unpaired bases. The two fundamental problems in RNA algorithmics are to predict how sequences fold within some model of energy and to design sequences of bases which will fold into targeted secondary structures. Predicting how a given RNA sequence folds into a pseudoknot-free secondary structure has been known to be solvable in cubic time since the eighties, and in truly subcubic time by a recent result of Bringmann et al. (FOCS 2016), whereas Lyngsø has shown it is NP-complete if pseudoknots are allowed (ICALP 2004). In stark contrast, it is unknown whether or not designing a given RNA secondary structure is a tractable task; this has been raised as a challenging open question by Anne Condon (ICALP 2003). Because of its crucial importance in a number of fields such as pharmaceutical research and biochemistry, there are dozens of heuristics and software libraries dedicated to RNA secondary structure design. It is therefore rather surprising that the computational complexity of this central problem in bioinformatics has been unsettled for decades. In this paper we show that, in the simplest model of energy, which is the Watson-Crick model, the design of secondary structures is NP-complete if one adds natural constraints of the form: index i of the sequence has to be labeled by base b. This negative result suggests that the same lower bound holds for more realistic models of energy.
It is noteworthy that the additional constraints are by no means artificial: they are available in all RNA design software and they correspond to actual practice (see for example the instances of the EteRNA project). Our reduction from a variant of 3-Sat has as main ingredients: arches of parentheses of different widths, a linear order interleaving variables and clauses, and an intended rematching strategy which increases the number of pairs iff the three literals of the same clause are not satisfied. The correctness of the construction is also quite intricate; it relies on the polynomial algorithm for the design of saturated structures (secondary structures without dots) by Haleš et al. (Algorithmica 2016), counting arguments, and a concise case analysis.
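
The dot-bracket representation and the constraints of the form "index i has to be labeled by base b" can be illustrated with a minimal Python sketch. The function names `parse_dot_bracket` and `is_compatible`, the 0-based indexing, and the dictionary encoding of the constraints are assumptions made for illustration and do not come from the paper. The check below only tests compatibility (every paired position carries a Watson-Crick pair); it does not decide whether the sequence actually folds into the structure.

```python
# Watson-Crick pairs only (A-U and C-G), as in the simplest energy model.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def parse_dot_bracket(structure):
    """Return the sorted list of 0-based (i, j) pairs encoded by the parentheses."""
    stack, pairs = [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def is_compatible(sequence, structure, constraints=None):
    """Check that paired positions hold Watson-Crick pairs and that every
    constraint {index: base} (hypothetical encoding) is respected."""
    if len(sequence) != len(structure):
        return False
    for index, base in (constraints or {}).items():
        if sequence[index] != base:
            return False
    return all(
        (sequence[i], sequence[j]) in WATSON_CRICK
        for i, j in parse_dot_bracket(structure)
    )
```

For instance, `parse_dot_bracket("((..))")` yields the pairs `(0, 5)` and `(1, 4)`, and `is_compatible("GCAAGC", "((..))")` holds because positions 0 and 5 carry G and C while positions 1 and 4 carry C and G.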

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Middlesex University Research Repository

Hal-Diderot

Designing RNA secondary structures is hard

Author: Bonnet E.
Rzążewski P.
Sikora F.
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2018

Middlesex University Research Repository

Designing RNA Secondary Structures is Hard

Author: Bonnet Edouard
Rzążewski Paweł
Sikora Florian
Publication venue: HAL CCSD
Publication date: 21/04/2018

An RNA sequence is a word over an alphabet of four elements {A, C, G, U} called bases. RNA sequences fold into secondary structures where some bases pair with one another while others remain unpaired. Pseudoknot-free secondary structures can be represented as well-parenthesized expressions with additional dots, where pairs of matching parentheses symbolize paired bases and dots symbolize unpaired bases. The two fundamental problems in RNA algorithmics are to predict how sequences fold within some model of energy and to design sequences of bases which will fold into targeted secondary structures. Predicting how a given RNA sequence folds into a pseudoknot-free secondary structure has been known to be solvable in cubic time since the eighties, and in truly subcubic time by a recent result of Bringmann et al. (FOCS 2016), whereas Lyngsø has shown it is NP-complete if pseudoknots are allowed (ICALP 2004). In stark contrast, it is unknown whether or not designing a given RNA secondary structure is a tractable task; this has been raised as a challenging open question by Anne Condon (ICALP 2003). Because of its crucial importance in a number of fields such as pharmaceutical research and biochemistry, there are dozens of heuristics and software libraries dedicated to RNA secondary structure design. It is therefore rather surprising that the computational complexity of this central problem in bioinformatics has been unsettled for decades. In this paper we show that, in the simplest model of energy, which is the Watson-Crick model, the design of secondary structures is NP-complete if one adds natural constraints of the form: index i of the sequence has to be labeled by base b. This negative result suggests that the same lower bound holds for more realistic models of energy.
It is noteworthy that the additional constraints are by no means artificial: they are available in all RNA design software and they correspond to actual practice (see for example the instances of the EteRNA project). Our reduction from a variant of 3-Sat has as main ingredients: arches of parentheses of different widths, a linear order interleaving variables and clauses, and an intended rematching strategy which increases the number of pairs iff the three literals of the same clause are false. The correctness of the construction is also quite intricate; it relies on the polynomial algorithm for the design of saturated structures (secondary structures without dots) by Haleš et al. (Algorithmica 2016), counting arguments, and a concise case analysis.
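
The cubic-time folding prediction mentioned in the abstract can be sketched with a textbook dynamic program that maximizes the number of pairs. This is a hedged illustration, not the paper's construction. It assumes that the energy of a structure in the Watson-Crick model is determined by its number of A-U and C-G pairs, and that no minimum loop length is imposed. The function name `max_pairs` is hypothetical.

```python
# Watson-Crick pairs only (A-U and C-G).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def max_pairs(sequence):
    """Maximum number of pairs in a pseudoknot-free structure of `sequence`.

    Assumption: no minimum hairpin-loop length, so adjacent bases may pair.
    Runs in O(n^3) time and O(n^2) space.
    """
    n = len(sequence)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within sequence[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            value = best[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j):   # case 2: position j pairs with position k
                if (sequence[k], sequence[j]) in WATSON_CRICK:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    value = max(value, left + inner + 1)
            best[i][j] = value
    return best[0][n - 1]
```

For example, `max_pairs("GCAAGC")` is 2, but that optimum is reached both by `((..))` and by `()..()`. Under the assumption that a design requires the target to be the unique optimal structure, `GCAAGC` would therefore not be a design for `((..))`, which hints at why compatibility alone does not settle the design problem.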

INRIA a CCSD electronic archive server

Flexible RNA design under structure and sequence constraints using formal languages

Author: Denise Alain
Ponty Yann
Vialette Stéphane
Waldispühl Jérôme
Zhang Yi
Zhou Yu
Publication date: 01/08/2013

The problem of RNA secondary structure design (also called inverse folding) is the following: given a target secondary structure, one aims to create a sequence that folds into, or is compatible with, that structure. In several practical applications in biology, additional constraints must be taken into account, such as the presence/absence of regulatory motifs, either at a specific location or anywhere in the sequence. In this study, we investigate the design of RNA sequences from their targeted secondary structure, given these additional sequence constraints. To this end, we develop a general framework based on concepts of language theory, namely context-free grammars and finite automata. We efficiently combine a comprehensive set of constraints into a unifying context-free grammar of moderate size. From there, we use generic algorithms to perform a (weighted) random generation, or an exhaustive enumeration, of candidate sequences. The resulting method, whose complexity scales linearly with the length of the RNA, was implemented as a standalone program. The resulting software was embedded into a publicly available dedicated web server. The applicability of the method was demonstrated on a concrete case study dedicated to Exon Splicing Enhancers, in which our approach was successfully used in the design of in vitro experiments. Comment: ACM BCB 2013 - ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics (2013)
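
As a rough illustration of random generation of candidate sequences, the following Python sketch samples a sequence that is compatible with a target dot-bracket structure and respects fixed bases at given positions. It is a much simpler stand-in, not the grammar-based method of the abstract: it handles only Watson-Crick pairs and position-specific bases, ignores motif presence/absence constraints, and draws without weights. The names `pair_table` and `sample_compatible` and the `fixed` dictionary are hypothetical.

```python
import random

# Watson-Crick complement of each base.
COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def pair_table(structure):
    """partner[i] is the 0-based index paired with i, or None if i is unpaired."""
    partner, stack = [None] * len(structure), []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            i = stack.pop()
            partner[i], partner[j] = j, i
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return partner


def sample_compatible(structure, fixed=None, rng=None):
    """Draw one sequence compatible with `structure`, honoring `fixed` = {index: base}."""
    fixed = dict(fixed or {})
    rng = rng or random.Random()
    partner = pair_table(structure)
    sequence = [None] * len(structure)
    for i, j in enumerate(partner):
        if sequence[i] is not None:
            continue  # already filled as the closing side of a pair
        if j is None:
            sequence[i] = fixed.get(i) or rng.choice("ACGU")
            continue
        # Position i opens a pair with j > i: pick one base, the other follows.
        if i in fixed:
            base = fixed[i]
        elif j in fixed:
            base = COMPLEMENT[fixed[j]]
        else:
            base = rng.choice("ACGU")
        if j in fixed and fixed[j] != COMPLEMENT[base]:
            raise ValueError(f"fixed bases at {i} and {j} cannot pair")
        sequence[i], sequence[j] = base, COMPLEMENT[base]
    return "".join(sequence)
```

A call such as `sample_compatible("((..))", fixed={0: "G", 3: "A"})` returns a length-6 sequence starting with G, ending with C, and carrying A at index 3. Like the structure scan itself, the sampler makes a single left-to-right pass, so its running time is linear in the length of the structure.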

arXiv.org e-Print Archive

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL-Polytechnique

HAL - UPEC / UPEM