7 research outputs found
Employing machine learning for reliable miRNA target identification in plants
Background: miRNAs are small noncoding RNA molecules, about 21 nucleotides long, formed endogenously in most eukaryotes. They mainly control their target genes post-transcriptionally by interacting with and silencing them. While many tools have been developed for animal miRNA target identification, plant miRNA target identification has seen limited development. Most existing plant tools are centered on exact complementarity matching, and very few consider other factors such as multiple target sites and the role of flanking regions. Result: In the present work, a Support Vector Regression (SVR) approach has been implemented for plant miRNA target identification, using position-specific dinucleotide density variation around the target sites to yield highly reliable results. It has been named p-TAREF (plant-Target Refiner). p-TAREF was rigorously compared with other prediction tools for plants and was found to perform better in several aspects. Further, p-TAREF was run on experimentally validated miRNA targets from species such as Arabidopsis, Medicago, rice and tomato, and detected them accurately, suggesting general usability of p-TAREF for plant species. Using p-TAREF, target identification was done for the complete rice transcriptome, supported by expression and degradome-based data. miR156 was found to be an important component of the rice regulatory system, in which control of genes associated with growth and transcription appeared predominant. The entire methodology has been implemented in a multi-threaded parallel architecture in Java, to enable fast processing in both the web-server and standalone versions. This also allows it to run in concurrent mode even on a simple desktop computer. The web-server version also provides a facility to gather experimental support for predictions through on-the-spot expression data analysis. Conclusion: A machine learning tool using multivariate features has been implemented in parallel and locally installable form for plant miRNA target identification. Its performance was assessed and compared through comprehensive testing and benchmarking, suggesting reliable performance and general usability for transcriptome-wide plant miRNA target identification.
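To make the idea of position-specific dinucleotide density concrete, the following is a minimal Python sketch, not p-TAREF's actual feature definition (p-TAREF itself is implemented in Java). The function names, the window size, and the number of flanking windows are all hypothetical choices made for illustration. The resulting vector is the kind of input that could be passed to a support vector regressor.

```python
from itertools import product

# The 16 RNA dinucleotides in a fixed order (AA, AC, AG, AU, CA, ...).
DINUCLEOTIDES = ["".join(p) for p in product("ACGU", repeat=2)]


def dinucleotide_density(seq):
    """Return the fraction of overlapping positions held by each dinucleotide."""
    seq = seq.upper().replace("T", "U")
    total = max(len(seq) - 1, 1)  # avoid division by zero for short input
    counts = {d: 0 for d in DINUCLEOTIDES}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip ambiguous bases such as N
            counts[pair] += 1
    return [counts[d] / total for d in DINUCLEOTIDES]


def position_specific_features(transcript, site_start, site_end,
                               window=10, n_windows=2):
    """Concatenate densities for upstream windows, the site, and downstream windows.

    The window layout is an assumption for illustration only.
    """
    features = []
    # Upstream flanking windows, farthest from the site first.
    for k in range(n_windows, 0, -1):
        lo = max(site_start - k * window, 0)
        hi = max(site_start - (k - 1) * window, 0)
        features.extend(dinucleotide_density(transcript[lo:hi]))
    # The target site itself.
    features.extend(dinucleotide_density(transcript[site_start:site_end]))
    # Downstream flanking windows, nearest to the site first.
    for k in range(n_windows):
        lo = site_end + k * window
        hi = site_end + (k + 1) * window
        features.extend(dinucleotide_density(transcript[lo:hi]))
    return features
```

With the default settings each candidate site yields (2 + 1 + 2) x 16 = 80 values. Windows that fall outside the transcript contribute zeros rather than raising an error.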
Recommended from our members
Synthetic biology of polyketide synthases.
Complex reduced polyketides represent the largest class of natural products that have applications in medicine, agriculture, and animal health. This structurally diverse class of compounds shares a common methodology of biosynthesis employing modular enzyme systems called polyketide synthases (PKSs). The modules are composed of enzymatic domains that share sequence and functional similarity across all known PKSs. We have used the nomenclature of synthetic biology to classify the enzymatic domains and modules as parts and devices, respectively, and have generated detailed lists of both. In addition, we describe the chassis (hosts) that are used to assemble, express, and engineer the parts and devices to produce polyketides. We describe a recently developed software tool for designing PKS systems and provide an example of its use. Finally, we provide perspectives on what needs to be accomplished to fully realize the potential that synthetic biology approaches bring to this class of molecules.
Constraining Genome-Scale Models to Represent the Bow Tie Structure of Metabolism for 13C Metabolic Flux Analysis.
Determination of internal metabolic fluxes is crucial for fundamental and applied biology because they map how carbon and electrons flow through metabolism to enable cell function. 13C Metabolic Flux Analysis (13C MFA) and Two-Scale 13C Metabolic Flux Analysis (2S-13C MFA) are two techniques used to determine such fluxes. Both operate on the simplifying approximation that metabolic flux from peripheral metabolism into central "core" carbon metabolism is minimal, and can be omitted when modeling isotopic labeling in core metabolism. The validity of this "two-scale" or "bow tie" approximation is supported both by the ability to accurately model experimental isotopic labeling data, and by experimentally verified metabolic engineering predictions using these methods. However, the boundaries of core metabolism that satisfy this approximation can vary across species, and across cell culture conditions. Here, we present a set of algorithms that (1) systematically calculate flux bounds for any specified "core" of a genome-scale model so as to satisfy the bow tie approximation and (2) automatically identify an updated set of core reactions that can satisfy this approximation more efficiently. First, we leverage linear programming to simultaneously identify the lowest fluxes from peripheral metabolism into core metabolism compatible with the observed growth rate and extracellular metabolite exchange fluxes. Second, we use Simulated Annealing to identify an updated set of core reactions that allow for a minimum of fluxes into core metabolism to satisfy these experimental constraints. Together, these methods accelerate and automate the identification of a biologically reasonable set of core reactions for use with 13C MFA or 2S-13C MFA, as well as provide for a substantially lower set of flux bounds for fluxes into the core as compared with previous methods. We provide an open source Python implementation of these algorithms at https://github.com/JBEI/limitfluxtocore
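The simulated annealing step can be illustrated with a small, self-contained Python sketch. This is not the limitfluxtocore implementation: the toy network, the fixed flux values, the size penalty, and all names below are hypothetical, and the core is modeled as a set of metabolites rather than reactions. In the published method the inflow would presumably be recomputed by linear programming for each candidate core, whereas this sketch simply reads fixed fluxes from a table.

```python
import math
import random

# Hypothetical toy network: (source, target, flux) for each reaction.
REACTIONS = [
    ("p1", "c1", 10.0),
    ("c1", "c2", 9.0),
    ("p2", "c2", 3.0),
    ("p3", "c3", 0.2),
    ("c2", "c3", 8.0),
    ("p4", "c1", 0.1),
    ("p5", "p2", 0.05),
]
REQUIRED_CORE = {"c1", "c2", "c3"}          # always kept in the core
OPTIONAL = ["p1", "p2", "p3", "p4", "p5"]   # may be moved in or out
SIZE_PENALTY = 0.5                          # assumed cost per core member


def cost(core):
    """Total flux entering the core from outside, plus a size penalty."""
    inflow = sum(f for s, t, f in REACTIONS if s not in core and t in core)
    return inflow + SIZE_PENALTY * len(core)


def anneal(steps=2000, t_start=5.0, t_end=0.01, seed=0):
    """Search for a low-cost core by flipping one optional member per step."""
    rng = random.Random(seed)
    current = set(REQUIRED_CORE)
    current_cost = cost(current)
    best, best_cost = set(current), current_cost
    for step in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        candidate = set(current)
        candidate.symmetric_difference_update({rng.choice(OPTIONAL)})
        cand_cost = cost(candidate)
        delta = cand_cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = set(current), current_cost
    return best, best_cost
```

Calling `anneal()` returns the best core found and its cost. Because the best-so-far state is tracked separately, the returned cost is never worse than that of the starting core, even though individual steps may accept worse candidates at high temperature.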
ClusterCAD 2.0: an updated computational platform for chimeric type I polyketide synthase and nonribosomal peptide synthetase design
Megasynthase enzymes such as type I modular polyketide synthases (PKSs) and nonribosomal peptide synthetases (NRPSs) play a central role in microbial chemical warfare because they can evolve rapidly by shuffling parts (catalytic domains) to produce novel chemicals. If we can understand the design rules to reshuffle these parts, PKSs and NRPSs will provide a systematic and modular way to synthesize millions of molecules including pharmaceuticals, biomaterials, and biofuels. However, PKS and NRPS engineering remains difficult due to a limited understanding of the determinants of PKS and NRPS fold and function. We developed ClusterCAD to streamline and simplify the process of designing and testing engineered PKS variants. Here, we present the highly improved ClusterCAD 2.0 release, available at https://clustercad.jbei.org. ClusterCAD 2.0 boasts support for PKS-NRPS hybrid and NRPS clusters in addition to PKS clusters; a vastly enlarged database of curated PKS, PKS-NRPS hybrid, and NRPS clusters; a diverse set of chemical 'starters' and loading modules; the new Domain Architecture Cluster Search Tool; and an offline Jupyter Notebook workspace, among other improvements. Together these features massively expand the chemical space that can be accessed by enzymes engineered with ClusterCAD
The JBEI quantitative metabolic modeling library (jQMM): a python library for modeling microbial metabolism.
Background: Modeling of microbial metabolism is a topic of growing importance in biotechnology. Mathematical modeling helps provide a mechanistic understanding for the studied process, separating the main drivers from the circumstantial ones, bounding the outcomes of experiments and guiding engineering approaches. Among different modeling schemes, the quantification of intracellular metabolic fluxes (i.e. the rate of each reaction in cellular metabolism) is of particular interest for metabolic engineering because it describes how carbon and energy flow throughout the cell. In addition to flux analysis, new methods for the effective use of the ever more readily available and abundant -omics data (i.e. transcriptomics, proteomics and metabolomics) are urgently needed. Results: The jQMM library presented here provides an open-source, Python-based framework for modeling internal metabolic fluxes and leveraging other -omics data for the scientific study of cellular metabolism and bioengineering purposes. Firstly, it presents a complete toolbox for simultaneously performing two different types of flux analysis that are typically disjoint: Flux Balance Analysis and 13C Metabolic Flux Analysis. Moreover, it introduces the capability to use 13C labeling experimental data to constrain comprehensive genome-scale models through a technique called two-scale 13C Metabolic Flux Analysis (2S-13C MFA). In addition, the library includes a demonstration of a method that uses proteomics data to produce actionable insights to increase biofuel production. Finally, the use of the jQMM library is illustrated through the addition of several Jupyter notebook demonstration files that enhance reproducibility and provide the capability to be adapted to the user's specific needs. Conclusions: jQMM will facilitate the design and metabolic engineering of organisms for biofuels and other chemicals, as well as investigations of cellular metabolism and leveraging -omics data. As an open source software project, we hope it will attract additions from the community and grow with the rapidly changing field of metabolic engineering.
Structural insights into dehydratase substrate selection for the borrelidin and fluvirucin polyketide synthases.
Engineered polyketide synthases (PKSs) are promising synthetic biology platforms for the production of chemicals with diverse applications. The dehydratase (DH) domain within modular type I PKSs generates an α,β-unsaturated bond in nascent polyketide intermediates through a dehydration reaction. Several crystal structures of DH domains have been solved, providing important structural insights into substrate selection and dehydration. Here, we present two DH domain structures from two chemically diverse PKSs. The first DH domain, isolated from the third module in the borrelidin PKS, is specific towards a trans-cyclopentane-carboxylate-containing polyketide substrate. The second DH domain, isolated from the first module in the fluvirucin B1 PKS, accepts an amide-containing polyketide intermediate. Sequence-structure analysis of these domains, together with previously published DH structures, reveals many significant similarities and key differences pertaining to substrate selection. The two major differences between BorA DH M3, FluA DH M1 and other DH domains are found in regions of unmodeled residues or residues with high B-factors. These two regions are located between α3-β11 and β7-α2. The residues between the catalytic Asp located in α3 and a conserved Pro in β11 form part of the bottom of the substrate-binding cavity responsible for binding to acyl-ACP intermediates.