Fast Kinetic Monte Carlo Simulations: Implementation, Application, and Analysis.
This work presents a multi-component kinetic Monte Carlo (KMC) model and its
applications to three example systems: Ga droplet epitaxy, nanowires grown by
the Vapor-Liquid-Solid (VLS) method, and sintering of porous granular material.
The first two systems are examples of liquid mediated growth. We detail how the
liquid phase is modeled. A caching technique is proposed to eliminate redundant
calculations, leading to performance gains. Underlying the cache is a hash
table, indexed by neighborhood patterns of an atom configuration. We present
numerical evidence that such neighborhood patterns are redundant within and
between configurations, justifying the caching procedure. A simulated annealing
search for optimal, system-specific hash functions is performed.
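To make the caching idea concrete, here is a minimal Python sketch (not the thesis code) in which a hash table is keyed by the packed occupancy pattern of a site's neighborhood, so the rate calculation runs only on a cache miss. The toy lattice, bond energy, and Arrhenius-style rate formula are illustrative assumptions, and Python's built-in dict hashing stands in for the system-specific hash functions that the simulated annealing search would optimize.

```python
# Minimal sketch of a neighborhood-pattern cache for KMC rates.
# Lattice, energies, and the rate formula are illustrative assumptions.
import numpy as np

RNG = np.random.default_rng(0)
LATTICE = RNG.integers(0, 2, size=(64, 64))      # toy occupancy lattice
E_BOND, KT = 0.3, 0.025                          # hypothetical energies (eV)

def pattern_key(lattice, i, j, radius=1):
    """Pack the local occupancy pattern around (i, j) into a hashable key."""
    n = lattice.shape[0]
    window = [lattice[(i + di) % n, (j + dj) % n]
              for di in range(-radius, radius + 1)
              for dj in range(-radius, radius + 1)]
    key = 0
    for bit in window:                           # pack bits into one integer
        key = (key << 1) | int(bit)
    return key

def rate_from_pattern(key):
    """Stand-in for the expensive rate evaluation (Arrhenius-like form)."""
    n_neighbors = bin(key).count("1")
    return np.exp(-n_neighbors * E_BOND / KT)

rate_cache = {}                                  # hash table keyed by pattern

def cached_rate(lattice, i, j):
    key = pattern_key(lattice, i, j)
    if key not in rate_cache:                    # compute only on a cache miss
        rate_cache[key] = rate_from_pattern(key)
    return rate_cache[key]

# Repeated patterns within and between configurations hit the cache:
rates = [cached_rate(LATTICE, i, j) for i in range(64) for j in range(64)]
print(f"{len(rates)} sites evaluated, {len(rate_cache)} distinct patterns cached")
```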
Simulation results and analysis of droplet epitaxy are then described. We detail
the calibration of model parameters, showing good agreement with
homoepitaxial thin-film experiments. Droplet epitaxy simulations capture a
variety of nanostructures seen in experiments, ranging from compact dots to
nanorings. The correct trends in growth conditions are also captured, resulting
in a phase diagram consistent with what is seen experimentally. Core-shell
structures are also simulated. We present simulations to suggest the existence
of two mechanisms behind their formation: nucleation at the vapor-liquid
interface and an instability at the vapor-solid interface. An analytical model
is developed that isolates the relevant processes behind the phenomena seen
throughout the simulations and in experiments.
In the VLS nanowire simulations, we present how the catalytic role of the liquid
phase is incorporated into the model and perform an energy parameter study. We
exhibit the role of the catalyzed reaction rate and its contribution to growth,
leading to features such as tapering. The mobility along the liquid-solid
interface is also studied. We show how this affects nanowire growth direction
and kinking. In the sintering simulations, we present the KMC model in contrast
with previous simulation work. A similar parameter study is then performed,
examining the effect of model parameters on coarsening statistics. Grain
statistics are measured as a function of time and capture power-law behavior of
the grain radius. Critical behavior with respect to certain parameters is also
presented.
PhD, Applied and Interdisciplinary Mathematics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/99949/1/kgre_1.pd
A Knowledge Gradient Policy for Sequencing Experiments to Identify the Structure of RNA Molecules Using a Sparse Additive Belief Model
We present a sparse knowledge gradient (SpKG) algorithm for adaptively
selecting the targeted regions within a large RNA molecule to identify which
regions are most amenable to interactions with other molecules. Experimentally,
such regions can be inferred from fluorescence measurements obtained by binding
a complementary probe with fluorescence markers to the targeted regions. We use
a biophysical model which shows that the fluorescence ratio, on a log scale,
has a sparse linear relationship with the coefficients describing the
accessibility of each nucleotide, since not all sites are accessible (due to
the folding of the molecule). The SpKG algorithm uniquely combines the Bayesian
ranking and selection problem with the frequentist regularized
regression approach Lasso. We use this algorithm to identify the sparsity
pattern of the linear model as well as to sequentially decide the best regions to
test before the experimental budget is exhausted. We also develop two other new
algorithms: a batch SpKG algorithm, which sequentially generates several
suggestions for running parallel experiments, and batch SpKG with a procedure we
call length mutagenesis, which dynamically adds new alternatives, in the form of
new probe types, created by inserting, deleting, or mutating nucleotides within
existing probes. In simulation, we demonstrate these
algorithms on the Group I intron (a mid-size RNA molecule), showing that they
efficiently learn the correct sparsity pattern, identify the most accessible
region, and outperform several other policies.
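For orientation, the simplified Python sketch below shows the loop structure of combining Lasso-identified sparsity with a Bayesian selection step. It is not the SpKG policy itself (a posterior-variance rule stands in for the knowledge-gradient calculation), and the problem dimensions, noise level, probe designs, and data-generating model are all assumptions.

```python
# Simplified sketch of the loop structure only: Lasso recovers the sparsity
# pattern, and a Bayesian linear model on that support scores candidate probes.
# A variance-based rule stands in for the knowledge-gradient calculation.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_sites, noise, tau = 30, 0.1, 1.0
true_coef = np.zeros(n_sites)
true_coef[[3, 11, 17]] = [1.2, -0.8, 0.5]         # assumed sparse accessibility

def run_experiment(x):
    """Stand-in for a fluorescence measurement (assumed log-ratio model)."""
    return x @ true_coef + noise * rng.standard_normal()

candidates = rng.integers(0, 2, size=(200, n_sites)).astype(float)  # probe designs
X = [row for row in candidates[:5]]               # small initial batch
y = [run_experiment(x) for x in X]

for step in range(20):
    lasso = Lasso(alpha=0.05).fit(np.array(X), np.array(y))
    support = np.flatnonzero(lasso.coef_)         # current sparsity estimate
    if support.size == 0:
        support = np.arange(n_sites)
    Xs = np.array(X)[:, support]
    # Bayesian linear regression posterior on the identified support
    precision = Xs.T @ Xs / noise**2 + np.eye(support.size) / tau**2
    cov = np.linalg.inv(precision)
    # Select the candidate whose outcome is currently most uncertain
    var = np.einsum("ij,jk,ik->i", candidates[:, support], cov, candidates[:, support])
    x_next = candidates[int(np.argmax(var))]
    X.append(x_next)
    y.append(run_experiment(x_next))

final = Lasso(alpha=0.05).fit(np.array(X), np.array(y))
print("recovered support:", np.flatnonzero(final.coef_))
```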
A Rigorous Uncertainty-Aware Quantification Framework Is Essential for Reproducible and Replicable Machine Learning Workflows
The ability to replicate predictions by machine learning (ML) or artificial
intelligence (AI) models, and results in scientific workflows that incorporate
such ML/AI predictions, is driven by numerous factors. An uncertainty-aware
metric that can quantitatively assess the reproducibility of quantities of
interest (QoI) would contribute to the trustworthiness of results obtained from
scientific workflows involving ML/AI models. In this article, we discuss how
uncertainty quantification (UQ) in a Bayesian paradigm can provide a general
and rigorous framework for quantifying reproducibility for complex scientific
workflows. Such a framework has the potential to fill a critical gap that
currently exists in ML/AI for scientific workflows, as it will enable
researchers to determine the impact of ML/AI model prediction variability on
the predictive outcomes of ML/AI-powered workflows. We expect that the
envisioned framework will contribute to the design of more reproducible and
trustworthy workflows for diverse scientific applications, and ultimately,
accelerate scientific discoveries.
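A minimal illustration of the underlying idea, not the framework envisioned in the article, is sketched below: the variability of repeated model fits is propagated into a downstream quantity of interest and summarized with an interval. The toy polynomial "model", the bootstrap refits, and the chosen QoI are assumptions made for the example.

```python
# Toy sketch: propagate the variability of repeated model fits into a
# quantity of interest (QoI) and summarize it with an interval-style spread.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)

qoi_samples = []
for seed in range(200):
    boot = np.random.default_rng(seed).integers(0, x.size, x.size)  # bootstrap refit
    coeffs = np.polyfit(x[boot], y[boot], deg=5)                    # stand-in "ML model"
    qoi_samples.append(np.polyval(coeffs, 0.75))                    # QoI: prediction at x = 0.75

lo, hi = np.percentile(qoi_samples, [2.5, 97.5])
print(f"QoI 95% interval: [{lo:.3f}, {hi:.3f}]  (width {hi - lo:.3f})")
```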
Identifying Bayesian Optimal Experiments for Uncertain Biochemical Pathway Models
Pharmacodynamic (PD) models are mathematical models of cellular reaction
networks that include drug mechanisms of action. These models are useful for
studying predictive therapeutic outcomes of novel drug therapies in silico.
However, PD models are known to possess significant uncertainty with respect to
constituent parameter data, leading to uncertainty in the model predictions.
Furthermore, experimental data to calibrate these models is often limited or
unavailable for novel pathways. In this study, we present a Bayesian optimal
experimental design approach for improving PD model prediction accuracy. We
then apply our method using simulated experimental data to account for
uncertainty in hypothetical laboratory measurements. This leads to a
probabilistic prediction of drug performance and a quantitative measure of
which prospective laboratory experiment will optimally reduce prediction
uncertainty in the PD model. The methods proposed here provide a way forward
for uncertainty quantification and guided experimental design for models of
novel biological pathways.
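As a hedged sketch of what such a Bayesian optimal experimental design step can look like computationally, the snippet below scores candidate measurement times by a nested Monte Carlo estimate of expected information gain for a one-parameter exponential-decay model. The model, prior, and noise level are assumptions, not the pharmacodynamic model used in the study.

```python
# Hedged sketch of Bayesian optimal experimental design via a nested Monte
# Carlo estimate of expected information gain (EIG). All quantities are toys.
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.05                                     # assumed measurement noise

def simulate(theta, t):
    """Toy pharmacodynamic response: exponential decay with rate theta."""
    return np.exp(-theta * t)

def log_lik(y, theta, t):
    return -0.5 * ((y - simulate(theta, t)) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def eig(t, n_outer=500, n_inner=500):
    """EIG(t) = E_y[ log p(y|theta,t) - log p(y|t) ] under the prior."""
    theta_outer = rng.gamma(2.0, 0.5, n_outer)   # assumed prior on the rate
    y = simulate(theta_outer, t) + sigma * rng.standard_normal(n_outer)
    theta_inner = rng.gamma(2.0, 0.5, n_inner)
    # log p(y|t) by marginalizing the likelihood over inner prior samples
    ll_inner = log_lik(y[:, None], theta_inner[None, :], t)
    log_evidence = np.logaddexp.reduce(ll_inner, axis=1) - np.log(n_inner)
    return np.mean(log_lik(y, theta_outer, t) - log_evidence)

designs = np.linspace(0.1, 5.0, 15)              # candidate measurement times
scores = [eig(t) for t in designs]
print("best measurement time:", designs[int(np.argmax(scores))])
```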
A Bayesian experimental autonomous researcher for mechanical design
While additive manufacturing (AM) has facilitated the production of complex structures, it has also highlighted the immense challenge inherent in identifying the optimum AM structure for a given application. Numerical methods are important tools for optimization, but experiment remains the gold standard for studying nonlinear, but critical, mechanical properties such as toughness. To address the vastness of AM design space and the need for experiment, we develop a Bayesian experimental autonomous researcher (BEAR) that combines Bayesian optimization and high-throughput automated experimentation. In addition to rapidly performing experiments, the BEAR leverages iterative experimentation by selecting experiments based on all available results. Using the BEAR, we explore the toughness of a parametric family of structures and observe an almost 60-fold reduction in the number of experiments needed to identify high-performing structures relative to a grid-based search. These results show the value of machine learning in experimental fields where data are sparse.
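A minimal sketch of the kind of closed loop described above is given below: a Gaussian-process surrogate and an expected-improvement rule pick the next design to test. The one-dimensional design parameter, the stand-in "toughness" measurement, and the specific acquisition function are assumptions for illustration, not the BEAR's actual hardware or policy.

```python
# Hedged sketch of a Bayesian-optimization experiment loop: GP surrogate plus
# expected improvement. The design space and "measurement" are stand-ins.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def measure_toughness(x):
    """Stand-in for an automated toughness test of a printed structure."""
    return float(np.sin(3 * x) * (1 - x) + 0.02 * rng.standard_normal())

candidates = np.linspace(0, 1, 200).reshape(-1, 1)   # parametric design family
X = list(candidates[rng.integers(0, 200, 3)])        # small initial batch
y = [measure_toughness(x[0]) for x in X]

for _ in range(15):
    gp = GaussianProcessRegressor(kernel=RBF(0.1), alpha=1e-3).fit(np.array(X), y)
    mu, std = gp.predict(candidates, return_std=True)
    best = max(y)
    z = (mu - best) / np.maximum(std, 1e-9)
    ei = (mu - best) * norm.cdf(z) + std * norm.pdf(z)  # expected improvement
    x_next = candidates[int(np.argmax(ei))]
    X.append(x_next)
    y.append(measure_toughness(x_next[0]))

print(f"best toughness found: {max(y):.3f} at x = {X[int(np.argmax(y))][0]:.3f}")
```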
Mathematical nuances of Gaussian process-driven autonomous experimentation
The fields of machine learning (ML) and artificial intelligence (AI) have transformed almost every aspect of science and engineering. The excitement for AI/ML methods is in large part due to their perceived novelty, as compared to traditional methods of statistics, computation, and applied mathematics. But clearly, all methods in ML have their foundations in mathematical theories, such as function approximation, uncertainty quantification, and function optimization. Autonomous experimentation is no exception; it is often formulated as a chain of off-the-shelf tools, organized in a closed loop, without emphasis on the intricacies of each algorithm involved. The uncomfortable truth is that the success of any ML endeavor, and this includes autonomous experimentation, strongly depends on the sophistication of the underlying mathematical methods and software that have to allow for enough flexibility to consider functions that are in agreement with particular physical theories. We have observed that standard off-the-shelf tools, used by many in the applied ML community, often hide the underlying complexities and therefore perform poorly. In this paper, we want to give a perspective on the intricate connections between mathematics and ML, with a focus on Gaussian process-driven autonomous experimentation. Although the Gaussian process is a powerful mathematical concept, it has to be implemented and customized correctly for optimal performance. We present several simple toy problems to explore these nuances and highlight the importance of mathematical and statistical rigor in autonomous experimentation and ML. One key takeaway is that ML is not, as many had hoped, a set of agnostic plug-and-play solvers for everyday scientific problems, but instead needs expertise and mastery to be applied successfully.
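To illustrate the point about kernel customization, the toy snippet below fits the same data with an off-the-shelf RBF kernel and with a periodic kernel matched to the structure of the data, then compares the resulting model evidence. The data and both kernels are assumptions chosen only for this example, not the paper's benchmarks.

```python
# Toy illustration: the same GP machinery, different kernels, different
# model evidence. Data and kernels are assumptions made for this example.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, 25)).reshape(-1, 1)
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(25)   # periodic signal

for name, kernel in [("off-the-shelf RBF", RBF(1.0)),
                     ("periodic kernel", ExpSineSquared(length_scale=1.0, periodicity=1.0))]:
    gp = GaussianProcessRegressor(kernel=kernel, alpha=0.01).fit(X, y)
    print(f"{name:20s} log marginal likelihood = {gp.log_marginal_likelihood_value_:.2f}")
```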
Exact Gaussian processes for massive datasets via non-stationary sparsity-discovering kernels
A Gaussian Process (GP) is a prominent mathematical framework for stochastic function approximation in science and engineering applications. Its success is largely attributed to the GP's analytical tractability, robustness, and natural inclusion of uncertainty quantification. Unfortunately, the use of exact GPs is prohibitively expensive for large datasets due to their unfavorable numerical complexity of O(N^3) in computation and O(N^2) in storage. All existing methods addressing this issue utilize some form of approximation, usually considering subsets of the full dataset or finding representative pseudo-points that render the covariance matrix well-structured and sparse. These approximate methods can lead to inaccuracies in function approximations and often limit the user's flexibility in designing expressive kernels. Instead of inducing sparsity via data-point geometry and structure, we propose to take advantage of naturally occurring sparsity by allowing the kernel to discover, rather than induce, sparse structure. The premise of this paper is that the data sets and physical processes modeled by GPs often exhibit natural or implicit sparsities, but commonly used kernels do not allow us to exploit such sparsity. The core concept of exact, and at the same time sparse, GPs relies on kernel definitions that provide enough flexibility to learn and encode not only non-zero but also zero covariances. This principle of ultra-flexible, compactly supported, and non-stationary kernels, combined with HPC and constrained optimization, lets us scale exact GPs well beyond 5 million data points.
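A minimal sketch of the core principle is shown below, assuming a simple Wendland-type compactly supported kernel rather than the paper's learned non-stationary kernels: because the kernel is exactly zero beyond a finite support radius, the covariance matrix can be assembled and stored as a sparse matrix rather than a dense one.

```python
# Minimal sketch: a compactly-supported kernel encodes exact zero covariances,
# so the covariance matrix of a large dataset can be built and stored sparsely.
# The Wendland-type kernel and synthetic 1-D inputs are illustrative only.
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 100, 5000))           # 1-D inputs for simplicity

def compact_kernel(r, support=2.0):
    """Wendland-style kernel: exactly zero beyond the support radius."""
    u = np.clip(r / support, 0.0, 1.0)
    return (1 - u) ** 4 * (4 * u + 1)

# Build only the non-zero entries instead of the dense 5000 x 5000 matrix.
rows, cols, vals = [], [], []
for i, xi in enumerate(x):
    j = np.searchsorted(x, [xi - 2.0, xi + 2.0])  # neighbors within the support
    idx = np.arange(j[0], j[1])
    rows.extend([i] * idx.size)
    cols.extend(idx)
    vals.extend(compact_kernel(np.abs(x[idx] - xi)))
K = sparse.csr_matrix((vals, (rows, cols)), shape=(x.size, x.size))
print(f"non-zero fraction of the covariance matrix: {K.nnz / x.size**2:.4%}")
```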
Optimal Learning in Experimental Design Using the Knowledge Gradient Policy with Application to Characterizing Nanoemulsion Stability
We present a technique for adaptively choosing a sequence of experiments for materials design and optimization. Specifically, we consider the problem of identifying the choice of experimental control variables that optimize the kinetic stability of a nanoemulsion, which we formulate as a ranking and selection problem. We introduce an optimization algorithm called the knowledge gradient with discrete priors (KGDP) that sequentially and adaptively selects experiments and that maximizes the rate of learning the optimal control variables. This is done through a combination of a physical, kinetic model of nanoemulsion stability, Bayesian inference, and a decision policy. Prior knowledge from domain experts is incorporated into the algorithm as well. Through numerical experiments, we show that the KGDP algorithm outperforms the policies of both random exploration (in which an experiment is selected uniformly at random among all potential experiments) and exploitation (which selects the experiment that appears to be the best, given the current state of Bayesian knowledge).
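For intuition, the hedged sketch below implements a knowledge-gradient-style policy over a discrete set of candidate models ("discrete priors"): each prospective experiment is scored by the expected improvement in the value of the apparent best design after a Bayesian update. The candidate stability curves, noise level, and design grid are toy assumptions, not the nanoemulsion model or expert priors from the paper.

```python
# Hedged sketch of a knowledge-gradient policy over discrete candidate models.
# Candidate curves, noise, and designs are toy assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
designs = np.linspace(0, 1, 25)                   # candidate control variables
thetas = [(a, b) for a in (2.0, 4.0, 6.0) for b in (0.3, 0.5, 0.7)]
models = np.array([np.exp(-a * (designs - b) ** 2) for a, b in thetas])  # candidate curves
p = np.full(len(thetas), 1 / len(thetas))         # discrete prior over models
sigma = 0.05
truth = models[4]                                 # ground truth, unknown to the policy

def posterior(p, x_idx, y):
    """Bayes update of the discrete prior after observing y at designs[x_idx]."""
    lik = np.exp(-0.5 * ((y - models[:, x_idx]) / sigma) ** 2)
    post = p * lik
    return post / post.sum()

def knowledge_gradient(p, x_idx, n_mc=300):
    """KG(x) = E_y[ max_d mu_new(d) ] - max_d mu_now(d), estimated by Monte Carlo."""
    mu_now = p @ models
    k = rng.choice(len(thetas), size=n_mc, p=p)   # sample which model is true
    y = models[k, x_idx] + sigma * rng.standard_normal(n_mc)
    new_best = [np.max(posterior(p, x_idx, yi) @ models) for yi in y]
    return np.mean(new_best) - np.max(mu_now)

for step in range(10):
    kg = [knowledge_gradient(p, i) for i in range(designs.size)]
    x_idx = int(np.argmax(kg))                    # run the most informative experiment
    y_obs = truth[x_idx] + sigma * rng.standard_normal()
    p = posterior(p, x_idx, y_obs)

print("estimated best design:", designs[int(np.argmax(p @ models))])
```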