71 research outputs found
Free energy estimation of short DNA duplex hybridizations
<p>Abstract</p> <p>Background</p> <p>Estimation of DNA duplex hybridization free energy is widely used for predicting cross-hybridizations in DNA computing and microarray experiments. A number of software programs based on different methods and parametrizations are available for the theoretical estimation of duplex free energies. However, significant differences in free energy values are sometimes observed among estimations obtained with various methods, which makes it difficult to decide which value is accurate.</p> <p>Results</p> <p>We present in this study a quantitative comparison of the similarities and differences among four published DNA/DNA duplex free energy calculation methods and an extended Nearest-Neighbour Model for perfect matches based on triplet interactions. The comparison was performed on a benchmark data set with 695 pairs of short oligos that we collected and manually curated from 29 publications. Sequence lengths range from 4 to 30 nucleotides and span a large GC-content percentage range. For perfect matches, we propose an extension of the Nearest-Neighbour Model that matches or exceeds the performance of the existing ones, both in terms of correlations and root mean squared errors. The proposed model was trained on experimental data with temperature, sodium and sequence concentration characteristics that span a wide range of values, giving the model greater power of generalization when used for free energy estimations of DNA duplexes under non-standard experimental conditions.</p> <p>Conclusions</p> <p>Based on our preliminary results, we conclude that no statistically significant differences exist among free energy approximations obtained with four publicly available and widely used programs, when benchmarked against a collection of 695 pairs of short oligos collected and curated by the authors of this work based on 29 publications. The extended Nearest-Neighbour Model based on triplet interactions presented in this work is capable of performing accurate estimations of free energies for perfect match duplexes under both standard and non-standard experimental conditions and may serve as a baseline for further developments in this area of research.</p>
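The general shape of a nearest-neighbour estimate, and of a triplet-based extension like the one described in the abstract above, can be sketched as a sum of per-k-mer contributions along the sequence. This is a minimal sketch only: the function name `kmer_free_energy`, the `init` term, and both parameter tables are hypothetical, and the numeric values are uniform placeholders rather than published thermodynamic parameters.

```python
from itertools import product
from typing import Dict


def kmer_free_energy(seq: str, params: Dict[str, float], k: int = 2,
                     init: float = 0.0) -> float:
    """Sum per-k-mer contributions along one strand of a perfect-match duplex.

    k=2 corresponds to a doublet (nearest-neighbour) model, k=3 to a
    triplet-based extension. `init` stands in for an initiation term.
    """
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    total = init
    for i in range(len(seq) - k + 1):
        total += params[seq[i:i + k]]
    return total


# Placeholder tables: every doublet/triplet gets the same made-up value.
# Real models use fitted, sequence-dependent parameters instead.
TOY_DOUBLETS = {"".join(p): -1.0 for p in product("ACGT", repeat=2)}
TOY_TRIPLETS = {"".join(p): -1.5 for p in product("ACGT", repeat=3)}

dg_doublet = kmer_free_energy("ACGTAC", TOY_DOUBLETS, k=2, init=2.0)
dg_triplet = kmer_free_energy("ACGTAC", TOY_TRIPLETS, k=3, init=2.0)
```

With real parameters, the only change would be the contents of the lookup tables; the triplet model simply has 64 entries instead of 16.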
Prediction of RNA secondary structure by maximizing pseudo-expected accuracy
<p>Abstract</p> <p>Background</p> <p>Recent studies have revealed the importance of considering the entire distribution of possible secondary structures in RNA secondary structure predictions; therefore, new types of estimators have been proposed, including the maximum expected accuracy (MEA) estimator. The MEA-based estimators have been designed to maximize the expected accuracy of the base-pairs and have achieved the highest level of accuracy. Those methods, however, do not give the single best prediction of the structure, but employ parameters to control the trade-off between the sensitivity and the positive predictive value (PPV). It is unclear which parameter value should be used, and even a well-trained default parameter value does not, in general, give the best result under popular accuracy measures for every RNA sequence.</p> <p>Results</p> <p>Instead of using the expected values of the popular accuracy measures for RNA secondary structure prediction, which are difficult to calculate, the <it>pseudo</it>-expected accuracy, which can easily be computed from base-pairing probabilities, is introduced. It is shown that the pseudo-expected accuracy is a good approximation in terms of sensitivity, PPV, MCC, or F-score. The pseudo-expected accuracy can be approximately maximized for each RNA sequence by stochastic sampling. It is also shown that secondary structures that are well balanced between sensitivity and PPV can be predicted with a small computational overhead by combining the pseudo-expected accuracy of MCC or F-score with the γ-centroid estimator.</p> <p>Conclusions</p> <p>This study gives not only a method for predicting the secondary structure that balances between sensitivity and PPV, but also a general method for approximately maximizing the (pseudo-)expected accuracy with respect to various evaluation measures including MCC and F-score.</p>
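One plausible reading of "computed from base-pairing probabilities" is to replace the true/false positive and false negative counts in an accuracy measure by their expectations under the base-pairing probability matrix, and then plug those into the usual formula. The sketch below does this for the F-score. The function names and the toy probabilities are hypothetical, and the exact definition used in the paper may differ.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]


def expected_counts(probs: Dict[Pair, float], predicted: Set[Pair]):
    """Expected TP, FP and FN of a predicted structure.

    probs maps a base pair (i, j) to its pairing probability; pairs that
    are absent from the dictionary are treated as having probability 0.
    """
    tp = sum(probs.get(pair, 0.0) for pair in predicted)
    fp = len(predicted) - tp
    fn = sum(p for pair, p in probs.items() if pair not in predicted)
    return tp, fp, fn


def pseudo_expected_f(probs: Dict[Pair, float], predicted: Set[Pair]) -> float:
    """F-score evaluated on expected counts instead of true counts."""
    tp, fp, fn = expected_counts(probs, predicted)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


# Toy base-pairing probabilities for an 8-nt sequence (made-up values).
toy_probs = {(0, 7): 0.9, (1, 6): 0.6, (2, 5): 0.2}

# Candidate structures, e.g. as they might come from stochastic sampling.
candidates = [
    {(0, 7)},
    {(0, 7), (1, 6)},
    {(0, 7), (1, 6), (2, 5)},
]
best = max(candidates, key=lambda s: pseudo_expected_f(toy_probs, s))
best_f = pseudo_expected_f(toy_probs, best)
```

Picking the candidate with the highest score mirrors, in miniature, the idea of approximately maximizing the measure over sampled structures.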
Employing machine learning for reliable miRNA target identification in plants
<p>Abstract</p> <p>Background</p> <p>miRNAs are small noncoding RNA molecules, ~21 nucleotides long, formed endogenously in most eukaryotes, which mainly control their target genes post-transcriptionally by interacting with and silencing them. While many tools have been developed for animal miRNA target identification, plant miRNA target identification has seen limited development. Most existing plant tools are centered on exact complementarity matching. Very few consider other factors such as multiple target sites and the role of flanking regions.</p> <p>Result</p> <p>In the present work, a Support Vector Regression (SVR) approach has been implemented for plant miRNA target identification, utilizing position-specific dinucleotide density variation information around the target sites to yield highly reliable results. It has been named p-TAREF (plant-Target Refiner). The performance of p-TAREF was rigorously compared with that of other prediction tools for plants, and p-TAREF was found to perform better in several aspects. Further, p-TAREF was run over experimentally validated miRNA targets from species such as <it>Arabidopsis</it>, <it>Medicago</it>, Rice and Tomato, and detected them accurately, suggesting broad usability of p-TAREF for plant species. Using p-TAREF, target identification was done for the complete Rice transcriptome, supported by expression and degradome-based data. miR156 was found to be an important component of the Rice regulatory system, in which control of genes associated with growth and transcription appeared predominant. The entire methodology has been implemented in a multi-threaded parallel architecture in Java, to enable fast processing in both the web-server and standalone versions. This also allows it to run in concurrent mode even on a simple desktop computer. 
In its web-server version, it also provides a facility to gather experimental support for predictions through on-the-spot expression data analysis.</p> <p>Conclusion</p> <p>A machine learning multivariate feature tool has been implemented in parallel and locally installable form for plant miRNA target identification. Its performance was assessed and compared through comprehensive testing and benchmarking, suggesting reliable performance and broad usability for transcriptome-wide plant miRNA target identification.</p>
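To make "position-specific dinucleotide density around the target sites" more concrete, the following sketch builds a feature vector by sliding fixed-size windows over a target site and its flanks and recording the 16 dinucleotide frequencies in each window. The function names, the flank and window sizes, and the encoding itself are assumptions for illustration; the abstract does not specify how p-TAREF encodes its features. A vector like this could then be passed to a regression model such as SVR, which is not shown here.

```python
from itertools import product
from typing import List

# The 16 RNA dinucleotides in a fixed order (AA, AC, AG, AU, CA, ...).
DINUCS = ["".join(p) for p in product("ACGU", repeat=2)]


def dinucleotide_density(window: str) -> List[float]:
    """Frequency of each dinucleotide among the overlapping pairs of a window."""
    window = window.upper().replace("T", "U")
    n_pairs = len(window) - 1
    counts = {d: 0 for d in DINUCS}
    for i in range(n_pairs):
        pair = window[i:i + 2]
        if pair in counts:  # silently skip ambiguous bases such as N
            counts[pair] += 1
    return [counts[d] / n_pairs if n_pairs > 0 else 0.0 for d in DINUCS]


def site_features(transcript: str, site_start: int, site_end: int,
                  flank: int = 10, window: int = 5) -> List[float]:
    """Concatenate densities of consecutive windows over flank + site + flank."""
    lo = max(0, site_start - flank)
    hi = min(len(transcript), site_end + flank)
    region = transcript[lo:hi]
    features: List[float] = []
    for start in range(0, len(region) - window + 1, window):
        features.extend(dinucleotide_density(region[start:start + window]))
    return features


toy_transcript = "ACGU" * 10  # 40-nt placeholder sequence
toy_features = site_features(toy_transcript, site_start=15, site_end=25)
```

Because each window contributes its own block of 16 values, the position of a density change relative to the site is preserved in the vector.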
An efficient algorithm for the stochastic simulation of the hybridization of DNA to microarrays
<p>Abstract</p> <p>Background</p> <p>Although oligonucleotide microarray technology is ubiquitous in genomic research, reproducibility and standardization of expression measurements still concern many researchers. Cross-hybridization between microarray probes and non-target ssDNA has been implicated as a primary factor in sensitivity and selectivity loss. Since hybridization is a chemical process, it may be modeled at a population level using a combination of material balance equations and thermodynamics. However, the hybridization reaction network may be exceptionally large for commercial arrays, which often possess at least one reporter per transcript. Quantification of the kinetics and equilibrium of exceptionally large chemical systems of this type is numerically infeasible with customary approaches.</p> <p>Results</p> <p>In this paper, we present a robust and computationally efficient algorithm for the simulation of hybridization processes underlying microarray assays. Our method may be utilized to identify the extent to which nucleic acid targets (e.g. cDNA) will cross-hybridize with probes, and by extension, characterize probe robustness using the information specified by MAGE-TAB. Using this algorithm, we characterize cross-hybridization in a modified commercial microarray assay.</p> <p>Conclusions</p> <p>By integrating stochastic simulation with thermodynamic prediction tools for DNA hybridization, one may robustly and rapidly characterize the selectivity of a proposed microarray design at the probe and "system" levels. Our code is available at <url>http://www.laurenzi.net</url>.</p>
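For orientation, stochastic simulation of a single reversible hybridization reaction (probe + target ⇌ duplex) can be sketched with Gillespie's direct method. This is the textbook baseline, not the efficient algorithm the abstract refers to, and the function name, rate constants, and molecule counts below are hypothetical.

```python
import random
from typing import List, Tuple

State = Tuple[float, int, int, int]  # (time, free probe, free target, duplex)


def gillespie_hybridization(n_probe: int, n_target: int, k_on: float,
                            k_off: float, t_end: float,
                            seed: int = 0) -> List[State]:
    """Simulate probe + target <-> duplex with Gillespie's direct method."""
    rng = random.Random(seed)
    t, probe, target, duplex = 0.0, n_probe, n_target, 0
    trajectory: List[State] = [(t, probe, target, duplex)]
    while True:
        a_on = k_on * probe * target   # propensity of association
        a_off = k_off * duplex         # propensity of dissociation
        a_total = a_on + a_off
        if a_total == 0.0:
            break                      # nothing can react any more
        t += rng.expovariate(a_total)  # waiting time to the next event
        if t > t_end:
            break
        if rng.random() * a_total < a_on:
            probe, target, duplex = probe - 1, target - 1, duplex + 1
        else:
            probe, target, duplex = probe + 1, target + 1, duplex - 1
        trajectory.append((t, probe, target, duplex))
    return trajectory


traj = gillespie_hybridization(n_probe=50, n_target=30, k_on=0.01,
                               k_off=0.1, t_end=10.0, seed=1)
```

A real array involves many probes and many targets, so the number of possible reactions grows with their product; that scaling is the difficulty the abstract points to.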
Accurate classification of RNA structures using topological fingerprints
While RNAs are well known to possess complex structures, functionally similar RNAs often have little sequence similarity. While the exact size and spacing of base-paired regions vary, functionally similar RNAs have pronounced similarity in the arrangement, or topology, of base-paired stems. Furthermore, predicted RNA structures often lack pseudoknots (a crucial aspect of biological activity), and are only partially correct, or incomplete. A topological approach addresses all of these difficulties. In this work we describe each RNA structure as a graph that can be converted to a topological spectrum (RNA fingerprint). The set of subgraphs in an RNA structure, its RNA fingerprint, can be compared with the fingerprints of other RNA structures to identify and correctly classify functionally related RNAs. Topologically similar RNAs can be identified even when a large fraction, up to 30%, of the stems are omitted, indicating that highly accurate structures are not necessary. We investigate the performance of the RNA fingerprint approach on a set of eight highly curated RNA families, with diverse sizes and functions, containing pseudoknots, and with little sequence similarity, which makes for an especially difficult test set. In spite of the difficult test set, the RNA fingerprint approach is very successful (ROC AUC > 0.95). Due to the inclusion of pseudoknots, the RNA fingerprint approach covers a wider range of possible structures than methods based only on secondary structure, and its tolerance for incomplete structures suggests that it can be applied even to predicted structures. Source code is freely available at https://github.rcac.purdue.edu/mgribsko/XIOS_RNA_fingerprint
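A much-reduced illustration of the idea: treat each stem as a node, label every pair of stems by how they are arranged (serial, included, or overlapping as in a pseudoknot), and use the multiset of labels as a fingerprint. This sketch only counts two-stem subgraphs, whereas the abstract describes a fingerprint built from the set of subgraphs in general. The stem encoding, the function names, and the similarity measure are all assumptions, not the method in the linked source code.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple

# A stem as (left_start, left_end, right_start, right_end), 0-based positions.
Stem = Tuple[int, int, int, int]


def stem_relation(a: Stem, b: Stem) -> str:
    """Classify the arrangement of two stems."""
    if a[0] > b[0]:
        a, b = b, a                       # make `a` the stem that starts first
    if a[3] < b[0]:
        return "serial"                   # b lies entirely after a
    if a[1] < b[0] and b[3] < a[2]:
        return "included"                 # b is nested inside a
    if a[1] < b[0] < a[2] and b[2] > a[3]:
        return "overlapping"              # crossing stems, i.e. a pseudoknot
    return "other"


def fingerprint(stems: List[Stem]) -> Counter:
    """Multiset of pairwise stem relations (two-stem subgraphs only)."""
    return Counter(stem_relation(a, b) for a, b in combinations(stems, 2))


def similarity(f1: Counter, f2: Counter) -> float:
    """Multiset Jaccard index between two fingerprints."""
    keys = set(f1) | set(f2)
    union = sum(max(f1[k], f2[k]) for k in keys)
    shared = sum(min(f1[k], f2[k]) for k in keys)
    return shared / union if union else 1.0


A, B, C, D = (0, 3, 20, 23), (5, 8, 12, 15), (30, 33, 40, 43), (10, 11, 26, 27)
full = fingerprint([A, B, C, D])
partial = fingerprint([A, B, C])          # same structure with stem D omitted
score = similarity(full, partial)
```

Dropping a stem removes some subgraphs but leaves the rest intact, which gives an intuition for why a fingerprint comparison can tolerate incomplete structures.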
Full design automation of multi-state RNA devices to program gene expression using energy-based optimization
[EN] Small RNAs (sRNAs) can operate as regulatory agents to control protein expression by interaction with the 5′ untranslated
region of the mRNA. We have developed a physicochemical framework, relying on base pair interaction energies, to design
multi-state sRNA devices by solving an optimization problem with an objective function accounting for the stability of the
transition and final intermolecular states. Contrary to the analysis of the reaction kinetics of an ensemble of sRNAs, we solve
the inverse problem of finding sequences satisfying targeted reactions. We show here that our objective function correlates
well with measured riboregulatory activity of a set of mutants. This has enabled the application of the methodology for an
extended design of RNA devices with specified behavior, assuming different molecular interaction models based on
Watson-Crick interaction. We designed several YES, NOT, AND, and OR logic gates, including the design of combinatorial
riboregulators. In sum, our de novo approach provides a new paradigm in synthetic biology to design molecular interaction
mechanisms facilitating future high-throughput functional sRNA design.
Work supported by the grants FP7-ICT-043338 (BACTOCOM) to AJ, and BIO2011-26741 (Ministerio de Economia y Competitividad, Spain) to JAD. GR is supported by an EMBO long-term fellowship co-funded by Marie Curie actions (ALTF-1177-2011), and TEL by a PhD fellowship from the AXA Research Fund. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Rodrigo Tarrega, G.; Landrain, TE.; Majer, E.; Daros Arnau, JA.; Jaramillo, A. (2013). Full design automation of multi-state RNA devices to program gene expression using energy-based optimization. PLoS Computational Biology. 9(8):1003172-1003172. https://doi.org/10.1371/journal.pcbi.1003172
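The inverse problem described in the abstract above, finding sequences that optimize an objective function, can be illustrated with a generic stochastic search. The sketch below uses simulated annealing with a deliberately crude objective that counts antiparallel Watson-Crick matches between a candidate sRNA and a target region. Both the search strategy and the objective are assumptions for illustration; the paper's objective is based on base pair interaction energies of the transition and final states, which this toy score does not model.

```python
import math
import random
from typing import Tuple

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def toy_score(srna, target: str) -> int:
    """Number of antiparallel Watson-Crick matches (higher is better)."""
    return sum(1 for x, y in zip(srna, reversed(target)) if COMPLEMENT[x] == y)


def anneal(target: str, steps: int = 2000, t_start: float = 2.0,
           t_end: float = 0.05, seed: int = 0) -> Tuple[str, int]:
    """Search for an sRNA sequence that maximizes toy_score against target."""
    rng = random.Random(seed)
    current = [rng.choice("ACGU") for _ in target]
    cur_score = toy_score(current, target)
    best, best_score = current[:], cur_score
    for step in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        cand = current[:]
        cand[rng.randrange(len(cand))] = rng.choice("ACGU")  # point mutation
        cand_score = toy_score(cand, target)
        accept = (cand_score >= cur_score or
                  rng.random() < math.exp((cand_score - cur_score) / temp))
        if accept:
            current, cur_score = cand, cand_score
            if cur_score > best_score:
                best, best_score = current[:], cur_score
    return "".join(best), best_score


toy_target = "AUGGCUAGCUAGGAUC"  # placeholder target region
designed, designed_score = anneal(toy_target)
```

Swapping `toy_score` for an energy-based objective would leave the search loop unchanged, which is the main reason to keep the objective and the optimizer separate.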