A Knowledge Gradient Policy for Sequencing Experiments to Identify the Structure of RNA Molecules Using a Sparse Additive Belief Model
We present a sparse knowledge gradient (SpKG) algorithm for adaptively
selecting the targeted regions within a large RNA molecule to identify which
regions are most amenable to interactions with other molecules. Experimentally,
such regions can be inferred from fluorescence measurements obtained by binding
a complementary probe with fluorescence markers to the targeted regions. We use
a biophysical model which shows that the fluorescence ratio on the log scale
has a sparse linear relationship with the coefficients describing the
accessibility of each nucleotide, since not all sites are accessible (due to
the folding of the molecule). The SpKG algorithm uniquely combines the Bayesian
ranking and selection problem with the frequentist regularized
regression approach Lasso. We use this algorithm to identify the sparsity
pattern of the linear model as well as sequentially decide the best regions to
test before the experimental budget is exhausted. We also develop two
other new algorithms: a batch SpKG algorithm, which sequentially generates
multiple suggestions so that experiments can be run in parallel; and batch SpKG
with a procedure that we call length mutagenesis, which dynamically adds new
alternatives in the form of new types of probes, created by inserting, deleting
or mutating nucleotides within existing probes. In simulation, we demonstrate these
algorithms on the Group I intron (a mid-size RNA molecule), showing that they
efficiently learn the correct sparsity pattern, identify the most accessible
region, and outperform several other policies.
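The sequential choice of which region to test next can be illustrated with the textbook knowledge gradient for independent normal beliefs. This is a minimal sketch and not the SpKG algorithm itself: it assumes one independent normal belief per alternative and a known measurement noise variance, whereas the abstract describes a sparse linear belief model combined with Lasso. The function names `knowledge_gradient` and `update_belief` are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def knowledge_gradient(means, variances, noise_var):
    """KG value of measuring each alternative once.

    Assumes independent normal beliefs (illustrative simplification) and
    at least two alternatives.
    """
    values = []
    for i, (mu, var) in enumerate(zip(means, variances)):
        # Standard deviation of the change in the posterior mean
        # caused by one noisy measurement of alternative i.
        sigma_tilde = var / math.sqrt(var + noise_var)
        if sigma_tilde == 0.0:
            values.append(0.0)  # nothing left to learn about i
            continue
        best_other = max(m for j, m in enumerate(means) if j != i)
        zeta = -abs(mu - best_other) / sigma_tilde
        values.append(sigma_tilde * (zeta * normal_cdf(zeta) + normal_pdf(zeta)))
    return values


def update_belief(mu, var, observation, noise_var):
    """Conjugate normal update of one belief after one observation."""
    precision = 1.0 / var + 1.0 / noise_var
    new_mu = (mu / var + observation / noise_var) / precision
    return new_mu, 1.0 / precision
```

A policy built on this sketch would measure the alternative with the largest KG value, update that belief with `update_belief`, and repeat until the budget runs out.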
Insights into RNA design from novel molecular tools
RNA, previously recognized merely as a messenger of genetic information, has been recently rediscovered as a versatile molecule with a central role in cellular regulation. These regulatory functions are enabled by its specific chemical makeup that allows it to fold into intricate and flexible structures. In stark contrast with DNA, RNA forms a variety of structural motifs that serve as efficient points of contact in molecular recognition. It is therefore clear that dynamic RNA structures dictate the binding availability of interfaces that play important roles in molecular regulation inside living cells. As such, tools that can accurately capture and predict RNA structure in vivo remain essential to understanding RNA function. To this end, my dissertation focuses on the development of molecular tools to predict and characterize accessible RNA interfaces in their native environment. First, I established the usefulness of a fluorescence-based in vivo oligonucleotide hybridization approach to identify accessible interfaces by characterizing numerous RNA regions in several biologically relevant molecules in E. coli. I then described these RNA interactions using a biophysical model based on thermodynamic principles and incorporating large sets of data collected using this fluorescence-based system. This approach displayed improved prediction capabilities of RNA accessibility compared to un-optimized versions without incorporation of in vivo data. Finally, I detailed the development and application of a high throughput tool for the large-scale characterization of accessible interfaces within native RNAs in a single experiment. In this approach, in vivo oligonucleotide hybridization was coupled to transcriptional elongation control to allow analysis via next generation sequencing. This tool was used to obtain complete landscapes of functional structure for 72 regulatory molecules in a single experiment (>1000 regions).
Altogether, the results of this high throughput approach revealed a pattern indicating that RNA-RNA interaction sites are either highly accessible or highly protected, suggesting their binding status (e.g. actively bound or unbound). In addition, within bacterial small RNAs, our approach revealed the role of the global regulator Hfq as a universal structural relaxer. The compendium of these tools provides a unique and fundamental perspective in the study of functional RNA structure, namely, the identification of dynamic structures. Furthermore, the information provided by these approaches significantly aids in the design of synthetic RNAs for a variety of purposes, including gene expression control.
On Using Inductive Biases for Designing Deep Learning Architectures
Recent advancements in the field of Artificial Intelligence, especially in Deep Learning (DL), have paved the way for new and improved solutions to complex problems occurring in almost all domains. Often we have some prior knowledge and beliefs about the system underlying the problem at hand, which we want to capture in the corresponding deep learning architectures. Sometimes it is not clear how to include our prior beliefs in the traditionally recommended deep architectures like Recurrent neural networks, Convolutional neural networks, Variational Autoencoders and others. Often the post-hoc techniques for modifying these architectures are not straightforward and provide little performance gain.
There have been efforts to develop domain specific architectures, but those techniques are generally not transferable to other domains. We ask: can we come up with generic and intuitive techniques to design deep learning architectures that take our prior knowledge of the system as an inductive bias?
In this dissertation, we develop two novel approaches towards this end. The first one, called 'Cooperative Neural Networks', can incorporate the inductive bias from the underlying probabilistic graphical model representation of the domain. The second one, called problem dependent 'Unrolled Algorithms', parameterizes the recurrent structure obtained by unrolling the iterations of an optimization algorithm for the objective function defining the problem. We found that the neural network architectures obtained from our approaches typically end up with far fewer learnable parameters and provide considerable improvement in run-time compared to other deep learning methods. We have successfully applied our techniques to Natural Language Processing tasks, sparse graph recovery, and computational biology problems such as gene regulatory network inference.
Firstly, we introduce the Cooperative Neural Networks approach, a new theoretical approach for implementing learning systems that can exploit both prior insights about the independence structure of the problem domain and the universal approximation capability of deep neural networks. Specifically, we develop the CoNN-sLDA model for the document classification task. We use the popular Latent Dirichlet Allocation graphical model as the inductive bias for the CoNN-sLDA model. We demonstrate a 23% reduction in error on the challenging MultiSent data set compared to the state of the art, and also derive ways to make the learned representations more interpretable.
Secondly, we elucidate the idea of using problem dependent 'Unrolled Algorithms' for the sparse graph recovery task. We propose a deep learning architecture, GLAD, which uses an Alternating Minimization algorithm as our model inductive bias and learns the model parameters via supervised learning. We show that GLAD learns a very compact and effective model for recovering sparse graphs from data. We carry out an extensive theoretical analysis that strengthens our claims for using similar approaches for other problems as well.
Finally, we build further on the proposed 'Unrolled Algorithm' technique for a challenging real world computational biology problem. To this end, we design GRNUlar, a novel deep learning framework for supervised learning of gene regulatory networks (GRNs) from single cell RNA-Sequencing data. Our framework incorporates two intertwined models. We first leverage the expressive ability of neural networks to capture complex dependencies between transcription factors and the corresponding genes they regulate, by developing a multi-task learning framework. Then, in order to capture the sparsity of GRNs observed in the real world, we design an unrolled algorithm technique for our framework. Our deep architecture requires supervision for training, for which we repurpose existing synthetic data simulators that generate scRNA-Seq data guided by an underlying GRN. Experimental results demonstrate that GRNUlar outperforms state-of-the-art methods on both synthetic and real datasets. Our work also demonstrates the novel and successful use of expression data simulators for supervised learning of GRN inference.
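The idea of unrolling an optimization algorithm can be illustrated on a simpler problem than sparse graph recovery. The sketch below is not GLAD and does not use Alternating Minimization. It unrolls a fixed number of iterative soft-thresholding (ISTA) steps for sparse linear regression, with one step size and one threshold per layer. In an unrolled architecture these per-layer values would be learned from data, whereas here they are passed in by hand. The function names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def unrolled_ista(A, y, steps, thresholds):
    """Run len(steps) ISTA layers for min 0.5*||A x - y||^2 + L1 penalty.

    Layer k uses its own step size steps[k] and threshold thresholds[k];
    these play the role of the learnable parameters (assumption: fixed
    by hand in this sketch).
    """
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    for step, threshold in zip(steps, thresholds):
        # Residual A x - y of the current estimate.
        residual = [
            sum(a * xj for a, xj in zip(row, x)) - yi
            for row, yi in zip(A, y)
        ]
        # Gradient of the least-squares term: A^T (A x - y).
        gradient = [
            sum(A[i][j] * residual[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Gradient step followed by shrinkage, which produces exact zeros.
        x = [
            soft_threshold(xj - step * gj, threshold)
            for xj, gj in zip(x, gradient)
        ]
    return x
```

Because the number of layers is fixed in advance, the whole computation is a feed-forward function of the data, which is what allows its few parameters to be trained with supervised learning.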
Women in Science 2016
Women in Science 2016 summarizes research done by Smith College’s Summer Research Fellowship (SURF) Program participants. Ever since its 1967 start, SURF has been a cornerstone of Smith’s science education. In 2016, 150 students participated in SURF (144 hosted on campus and nearby field sites), supervised by 56 faculty mentor-advisors drawn from the Clark Science Center and connected to its eighteen science, mathematics, and engineering departments and programs and associated centers and units. At summer’s end, SURF participants were asked to summarize their research experiences for this publication.
Deep Learning And Uncertainty Quantification: Methodologies And Applications
Uncertainty quantification is an emerging interdisciplinary area that leverages statistical methods, machine learning models, numerical methods and data-driven approaches to provide reliable inference for quantities of interest in natural science and engineering problems. In practice, uncertainty arises from different sources: aleatoric uncertainty, which comes from the observations or from the stochastic nature of the problem; and epistemic uncertainty, which comes from inaccurate mathematical models, computational methods or model parametrization. To cope with these different types of uncertainty, a successful and scalable model for uncertainty quantification requires prior knowledge of the problem, careful design of mathematical models, cautious selection of computational tools, etc. The fast growth of deep learning and probabilistic methods, together with the large volume of data available across different research areas, enables researchers to take advantage of these recent advances to propose novel methodologies for scientific problems where uncertainty quantification plays an important role. The objective of this dissertation is to address the existing gaps and propose new methodologies for uncertainty quantification with deep learning methods and demonstrate their power in engineering applications.
On the methodology side, we first present a generative adversarial framework to model aleatoric uncertainty in stochastic systems. Secondly, we combine the proposed generative model with recent advances in physics-informed deep learning to learn the uncertainty propagation in solutions of partial differential equations. Thirdly, we introduce a simple and effective approach for posterior uncertainty quantification for learning nonlinear operators. Fourthly, we consider inverse problems for physical systems: identifying unknown forms and parameters in dynamical systems from noisy observed data.
On the application side, we first propose an importance sampling approach for sequential decision making. Second, we propose a physics-informed neural network method to quantify the epistemic uncertainty in cardiac activation mapping modeling and conduct active learning. Third, we present an auto-encoder based framework for data augmentation and generation for data that is expensive to obtain, such as single-cell RNA sequencing.
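One common way to make the aleatoric and epistemic split concrete is the law of total variance applied to an ensemble of predictors. The sketch below is an illustration under that assumption and is not one of the dissertation's methods. Each hypothetical ensemble member reports a predictive mean and a predictive variance for the same input. The average of the variances is read as the aleatoric part and the spread of the means as the epistemic part.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(member_means, member_variances):
    """Split predictive variance of an ensemble into two parts.

    aleatoric: average noise variance reported by the members.
    epistemic: disagreement between the members' mean predictions.
    Their sum is the total predictive variance of the equally weighted
    mixture (law of total variance).
    """
    aleatoric = fmean(member_variances)
    epistemic = pvariance(member_means)
    return aleatoric, epistemic, aleatoric + epistemic
```

For example, members that agree on the mean but all report large variances indicate noisy data, while members that report small variances but disagree on the mean indicate a lack of knowledge that more data or a better model could reduce.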
- …