708 research outputs found
K2-ABC: Approximate Bayesian Computation with Kernel Embeddings
Complicated generative models often result in a situation where computing the
likelihood of observed data is intractable, while simulating from the
conditional density given a parameter value is relatively easy. Approximate
Bayesian Computation (ABC) is a paradigm that enables simulation-based
posterior inference in such cases by measuring the similarity between simulated
and observed data in terms of a chosen set of summary statistics. However,
there is no general rule to construct sufficient summary statistics for complex
models. Insufficient summary statistics will "leak" information, which leads to
ABC algorithms yielding samples from an incorrect (partial) posterior. In this
paper, we propose a fully nonparametric ABC paradigm which circumvents the need
for manually selecting summary statistics. Our approach, K2-ABC, uses maximum
mean discrepancy (MMD) as a dissimilarity measure between the distributions
over observed and simulated data. MMD is easily estimated as the squared
difference between their empirical kernel embeddings. Experiments on a
simulated scenario and a real-world biological problem illustrate the
effectiveness of the proposed algorithm
An Efficient Doubly-Robust Test for the Kernel Treatment Effect
The average treatment effect, which is the difference in expectation of the
counterfactuals, is probably the most popular target effect in causal inference
with binary treatments. However, treatments may have effects beyond the mean,
for instance decreasing or increasing the variance. We propose a new
kernel-based test for distributional effects of the treatment. It is, to the
best of our knowledge, the first kernel-based, doubly-robust test with provably
valid type-I error. Furthermore, our proposed algorithm is efficient, avoiding
the use of permutations
Predicting pharmaceutical particle size distributions using kernel mean embedding
In the pharmaceutical industry, the transition to continuous manufacturing of solid dosage forms is adopted by more and more companies. For these continuous processes, high-quality process models are needed. In pharmaceutical wet granulation, a unit operation in the ConsiGmaTM-25 continuous powder-to-tablet system (GEA Pharma systems, Collette, Wommelgem, Belgium), the product under study presents itself as a collection of particles that differ in shape and size. The measurement of this collection results in a particle size distribution. However, the theoretical basis to describe the physical phenomena leading to changes in this particle size distribution is lacking. It is essential to understand how the particle size distribution changes as a function of the unit operation's process settings, as it has a profound effect on the behavior of the fluid bed dryer. Therefore, we suggest a data-driven modeling framework that links the machine settings of the wet granulation unit operation and the output distribution of granules. We do this without making any assumptions on the nature of the distributions under study. A simulation of the granule size distribution could act as a soft sensor when in-line measurements are challenging to perform. The method of this work is a two-step procedure: first, the measured distributions are transformed into a high-dimensional feature space, where the relation between the machine settings and the distributions can be learnt. Second, the inverse transformation is performed, allowing an interpretation of the results in the original measurement space. Further, a comparison is made with previous work, which employs a more mechanistic framework for describing the granules. A reliable prediction of the granule size is vital in the assurance of quality in the production line, and is needed in the assessment of upstream (feeding) and downstream (drying, milling, and tableting) issues. Now that a validated data-driven framework for predicting pharmaceutical particle size distributions is available, it can be applied in settings such as model-based experimental design and, due to its fast computation, there is potential in real-time model predictive control
- …