11 research outputs found
RNAiFold2T: Constraint Programming design of thermo-IRES switches
Motivation: RNA thermometers (RNATs) are cis-regulatory elements that
change secondary structure upon temperature shift. Often involved in the
regulation of heat shock, cold shock and virulence genes, RNATs constitute an
interesting potential resource in synthetic biology, where engineered RNATs
could prove to be useful tools in biosensors and conditional gene regulation.
Results: Solving the 2-temperature inverse folding problem is critical for RNAT
engineering. Here we introduce RNAiFold2T, the first Constraint Programming
(CP) and Large Neighborhood Search (LNS) algorithms to solve this problem.
Benchmarking tests of RNAiFold2T against existing inverse folding programs
(adaptive walk and genetic algorithm) show that our software generates two orders
of magnitude more solutions, thus allowing ample exploration of the space of
solutions. Subsequently, solutions can be prioritized by computing various
measures, including probability of target structure in the ensemble, melting
temperature, etc. Using this strategy, we rationally designed two thermosensor
internal ribosome entry site (thermo-IRES) elements, whose normalized
cap-independent translation efficiency is approximately 50% greater at 42°C
than at 30°C, when tested in reticulocyte lysates. Translation efficiency is lower
than that of the wild-type IRES element, which on the other hand is fully
resistant to temperature shift-up. This appears to be the first purely
computational design of functional RNA thermoswitches, and certainly the first
purely computational design of functional thermo-IRES elements. Availability:
RNAiFold2T is publicly available as part of the new release RNAiFold3.0 at
https://github.com/clotelab/RNAiFold and
http://bioinformatics.bc.edu/clotelab/RNAiFold, the latter of which also has a
web server. The software is written in C++ and uses the OR-Tools CP search engine.
Comment: 24 pages, 5 figures, Intelligent Systems for Molecular Biology (ISMB
2016), to appear in journal Bioinformatics 201
Generative Tertiary Structure-based RNA Design
Learning from 3D biological macromolecules with artificial intelligence
technologies has been an emerging area. Computational protein design, known as
the inverse of protein structure prediction, aims to generate protein sequences
that will fold into the defined structure. Analogous to protein design, RNA
design is also an important topic in synthetic biology, which aims to generate
RNA sequences for given structures. However, existing RNA design methods mainly
focus on the secondary structure, ignoring the informative tertiary structure,
which is commonly used in protein design. To explore the complex coupling
between RNA sequence and 3D structure, we introduce an RNA tertiary structure
modeling method to efficiently capture useful information from the 3D structure
of RNA. For a fair comparison, we collect abundant RNA data and split the data
according to tertiary structures. With the standard dataset, we conduct a
benchmark by employing structure-based protein design approaches with our RNA
tertiary structure modeling method. We believe our work will stimulate the
future development of tertiary structure-based RNA design and bridge the gap
between the RNA 3D structures and sequences.
RNAiFold 2.0: a web server and software to design custom and Rfam-based RNA molecules.
Several algorithms for RNA inverse folding have been used to design synthetic riboswitches, ribozymes and thermoswitches, whose activity has been experimentally validated. The RNAiFold software is unique among approaches for inverse folding in that (exhaustive) constraint programming is used instead of heuristic methods. For that reason, RNAiFold can generate all sequences that fold into the target structure or determine that there is no solution. RNAiFold 2.0 is a complete overhaul of RNAiFold 1.0, rewritten from the now defunct COMET language to C++. The new code properly extends the capabilities of its predecessor by providing a user-friendly pipeline to design synthetic constructs having the functionality of given Rfam families. In addition, the new software supports amino acid constraints, even for proteins translated in different reading frames from overlapping coding sequences; moreover, structure compatibility/incompatibility constraints have been expanded. With these features, RNAiFold 2.0 allows the user to design single RNA molecules as well as hybridization complexes of two RNA molecules. National Science Foundation [DBI-1262439]. Funding for open access charge: National Science Foundation. Conflict of interest statement. None declared.
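To make the idea of a structure compatibility constraint concrete, here is a minimal Python sketch that checks whether a sequence can form every base pair of a target structure given in dot-bracket notation. It is an illustration only and is not taken from RNAiFold: the function names `parse_pairs` and `is_compatible` are hypothetical, and the set of allowed pairs (Watson-Crick plus GU wobble) is an assumption.

```python
# Illustrative sketch only; not RNAiFold code. Names and the allowed-pair
# set (Watson-Crick plus GU wobble) are assumptions made for this example.
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def parse_pairs(structure):
    """Return the sorted list of base pairs (i, j) of a dot-bracket string."""
    stack = []
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


def is_compatible(sequence, structure):
    """True if every paired position in `structure` holds an allowed pair."""
    if len(sequence) != len(structure):
        return False
    seq = sequence.upper()
    return all((seq[i], seq[j]) in ALLOWED_PAIRS
               for i, j in parse_pairs(structure))
```

Compatibility is only a necessary condition: a compatible sequence can form the target structure, but nothing here says that the target is its minimum free energy fold, which is what inverse folding actually requires.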
Identification of functional RNA structures in sequence data
Thesis advisor: Michelle M. Meyer. Thesis advisor: Peter Clote. Structured RNAs have many biological functions ranging from catalysis of chemical reactions to gene regulation. Many of these homologous structured RNAs display most of their conservation at the secondary or tertiary structure level. As a result, strategies for natural structured RNA discovery rely heavily on identification of sequences sharing a common stable secondary structure. However, correctly identifying the functional elements of the structure continues to be challenging. In addition to studying natural RNAs, we improve our ability to distinguish functional elements by studying sequences derived from in vitro selection experiments to select structured RNAs that bind specific proteins. In this thesis, we seek to improve methods for distinguishing functional RNA structures from arbitrarily predicted structures in sequencing data. To do so, we developed novel algorithms that prioritize the structural properties of the RNA that are under selection. In order to identify natural structured ncRNAs, we bring concepts from evolutionary biology to bear on the de novo RNA discovery process. Since there is selective pressure to maintain the structure, we apply molecular evolution concepts such as neutrality to identify functional RNA structures. We hypothesize that alignments corresponding to structured RNAs should consist of neutral sequences. During the course of this work, we developed a novel measure of neutrality, the structure ensemble neutrality (SEN), which calculates neutrality by averaging the magnitude of structure retained over all single point mutations to a given sequence. In order to analyze in vitro selection data for RNA-protein binding motifs, we developed a novel framework that identifies enriched substructures in the sequence pool.
Our method accounts for both sequence and structure components by abstracting the overall secondary structure into smaller substructures composed of a single base-pair stack. Unlike many current tools, our algorithm is designed to deal with the large data sets coming from high-throughput sequencing. In conclusion, our algorithms have similar performance to existing programs. However, unlike previous methods, our algorithms are designed to leverage the evolutionary selective pressures in order to emphasize functional structure conservation. Thesis (PhD), Boston College, 2016. Submitted to: Boston College. Graduate School of Arts and Sciences. Discipline: Biology.
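The neutrality idea described in this abstract (averaging the structure retained over all single point mutations) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the SEN measure as defined in the thesis: it compares one predicted structure per sequence rather than a structure ensemble, it measures retention as the fraction of wild-type base pairs kept, and `fold` is a hypothetical user-supplied function mapping a sequence to a dot-bracket string (for example, a wrapper around a folding program).

```python
# Simplified sketch; NOT the thesis's SEN definition. Assumptions: one
# predicted structure per sequence, retention = fraction of wild-type base
# pairs kept, and `fold` is a caller-supplied sequence -> dot-bracket function.
from typing import Callable, Iterator, Set, Tuple


def base_pairs(structure: str) -> Set[Tuple[int, int]]:
    """Base pairs (i, j) of a balanced dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def single_point_mutants(sequence: str, alphabet: str = "ACGU") -> Iterator[str]:
    """Yield every sequence differing from `sequence` at exactly one position."""
    for i, original in enumerate(sequence):
        for nt in alphabet:
            if nt != original:
                yield sequence[:i] + nt + sequence[i + 1:]


def mutational_neutrality(sequence: str, fold: Callable[[str], str]) -> float:
    """Average fraction of wild-type base pairs retained over all mutants."""
    wild = base_pairs(fold(sequence))
    if not wild:
        return 0.0  # convention chosen here for unstructured sequences
    retained = [
        len(wild & base_pairs(fold(mutant))) / len(wild)
        for mutant in single_point_mutants(sequence)
    ]
    return sum(retained) / len(retained)
```

A sequence of length n has 3n single point mutants, so the cost is dominated by 3n + 1 calls to `fold`; for long sequences or large alignments the folding step would need to be batched or cached.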