Bayesian optimization for materials design
We introduce Bayesian optimization, a technique developed for optimizing
time-consuming engineering simulations and for fitting machine learning models
on large datasets. Bayesian optimization guides the choice of experiments
during materials design and discovery to find good material designs in as few
experiments as possible. We focus on the case when material designs are
parameterized by a low-dimensional vector. Bayesian optimization is built on a
statistical technique called Gaussian process regression, which allows
predicting the performance of a new design based on previously tested designs.
After providing a detailed introduction to Gaussian process regression, we
introduce two Bayesian optimization methods: expected improvement, for design
problems with noise-free evaluations; and the knowledge-gradient method, which
generalizes expected improvement and may be used in design problems with noisy
evaluations. Both methods are derived using a value-of-information analysis,
and enjoy one-step Bayes-optimality.
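
To make the expected improvement criterion concrete, here is a minimal sketch of its standard closed form for a maximization problem with noise-free evaluations. It assumes the Gaussian process posterior mean and standard deviation at each candidate design are already available. The design names, posterior values, and the `best_so_far` value are hypothetical and do not come from the paper.

```python
from statistics import NormalDist


def expected_improvement(mean, std, best):
    """Closed-form expected improvement (maximization, noise-free case).

    mean, std: posterior mean and standard deviation at a candidate design.
    best: best objective value observed so far.
    """
    if std <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(mean - best, 0.0)
    z = (mean - best) / std
    normal = NormalDist()  # standard normal distribution
    ei = (mean - best) * normal.cdf(z) + std * normal.pdf(z)
    return max(ei, 0.0)  # guard against tiny negative round-off


# Hypothetical posterior summaries (mean, std) for three candidate designs.
candidates = {
    "design_a": (0.80, 0.05),
    "design_b": (0.70, 0.30),
    "design_c": (0.90, 0.00),
}
best_so_far = 0.85  # hypothetical best value among tested designs

scores = {
    name: expected_improvement(m, s, best_so_far)
    for name, (m, s) in candidates.items()
}
# The next experiment is the candidate with the largest expected improvement.
next_design = max(scores, key=scores.get)
```

In this made-up example the criterion selects `design_b`, whose predicted mean is the lowest but whose large uncertainty leaves room for a big improvement. This is the trade-off between exploration and exploitation that expected improvement encodes.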
A User-Guided Bayesian Framework for Ensemble Feature Selection in Life Science Applications (UBayFS)
Feature selection is a way to reduce the complexity of
high-dimensional datasets and gain insights into the systematic variation in
the data. This aspect is of specific importance in domains that rely on model
interpretability, such as life sciences. We propose UBayFS, an ensemble feature
selection technique embedded in a Bayesian statistical framework. Our approach
considers two sources of information: data and domain knowledge. We build a
meta-model from an ensemble of elementary feature selectors and aggregate this
information in a multinomial likelihood. The user guides UBayFS by weighting
features and penalizing specific feature blocks or combinations, implemented
via a Dirichlet-type prior distribution and a regularization term. In a
quantitative evaluation, we demonstrate that our framework (a) allows for a
balanced trade-off between user knowledge and data observations, and (b)
achieves competitive performance with state-of-the-art methods.
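
As a rough illustration of how user weights and ensemble votes can be combined, the sketch below applies the textbook Dirichlet-multinomial conjugate update: prior weights plus selection counts give the posterior parameters. It is a simplified assumption, not the UBayFS implementation. It uses a plain Dirichlet prior and omits the regularization term and the block penalties, and the feature names, weights, and selector outputs are hypothetical.

```python
from collections import Counter

# Hypothetical user-assigned prior weights (Dirichlet parameters);
# the user believes gene_c is important and up-weights it.
prior_weights = {"gene_a": 1.0, "gene_b": 1.0, "gene_c": 5.0, "gene_d": 1.0}

# Hypothetical output of an ensemble of elementary feature selectors:
# each inner list holds the features chosen by one selector.
ensemble_selections = [
    ["gene_a", "gene_b"],
    ["gene_a", "gene_c"],
    ["gene_a", "gene_d"],
    ["gene_b", "gene_d"],
]

# Count how often each feature was selected across the ensemble.
counts = Counter(f for selection in ensemble_selections for f in selection)

# Conjugate update: Dirichlet(alpha) prior with multinomial counts
# yields a Dirichlet(alpha + counts) posterior.
posterior = {f: prior_weights[f] + counts.get(f, 0) for f in prior_weights}

# Posterior mean importance of each feature (sums to 1).
total = sum(posterior.values())
posterior_mean = {f: a / total for f, a in posterior.items()}

# Rank features by posterior mean and keep the two highest.
top_two = sorted(posterior_mean, key=posterior_mean.get, reverse=True)[:2]
```

Here `gene_c` ranks first even though only one selector chose it, because the user's prior weight outweighs the sparse data. A feature selected by most of the ensemble, such as `gene_a`, still ranks highly under a flat prior weight.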