Structured probabilistic inference
Probabilistic inference is among the main topics of reasoning under uncertainty in AI. For this purpose, Bayesian Networks (BNs) are one of the most successful and efficient Probabilistic Graphical Models (PGMs) so far. Since the mid-90s, a growing number of BN extensions have been proposed. Object-oriented, entity-relationship and first-order logic are the main representation paradigms used to extend BNs. While entity-relationship and first-order models have been successfully used in machine learning for defining lifted probabilistic inference, object-oriented models have been mostly underused. Structured inference, which exploits the structural knowledge encoded in an object-oriented PGM, is a surprisingly understudied technique. In this paper we propose a full object-oriented framework for Probabilistic Relational Models (PRMs) and propose two extensions of the state-of-the-art structured inference algorithm: SPI, which removes the major flaws of existing algorithms, and SPISBB, which largely enhances SPI by using d-separation.
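The d-separation criterion that SPISBB exploits can be illustrated independently of any PRM machinery. Below is a minimal, self-contained check (not the paper's algorithm) using the standard ancestral-moral-graph construction; all names are illustrative:

```python
from itertools import combinations

def d_separated(parents, x, y, z):
    """Check whether x and y are d-separated given evidence set z in a DAG
    described by a {node: set_of_parents} mapping, via the classic
    ancestral-moral-graph criterion."""
    z = set(z)
    # 1. Restrict to x, y, z and all of their ancestors.
    relevant = {x, y} | z
    frontier = list(relevant)
    while frontier:
        n = frontier.pop()
        for p in parents.get(n, set()):
            if p not in relevant:
                relevant.add(p)
                frontier.append(p)
    # 2. Moralize: undirected edges node-parent, plus edges "marrying"
    #    every pair of parents of a common child.
    adj = {n: set() for n in relevant}
    for n in relevant:
        ps = parents.get(n, set()) & relevant
        for p in ps:
            adj[n].add(p); adj[p].add(n)
        for p, q in combinations(ps, 2):
            adj[p].add(q); adj[q].add(p)
    # 3. Delete the evidence nodes and test reachability from x to y:
    #    unreachable means d-separated.
    seen, stack = {x} | z, [x]
    while stack:
        n = stack.pop()
        if n == y:
            return False
        for m in adj[n]:
            if m not in seen:
                seen.add(m); stack.append(m)
    return True
```

For example, in the chain A → B → C, A and C are d-separated given B but not given the empty set; in the collider A → C ← B, the opposite holds.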
Anomaly Detection in Networks via Score-Based Generative Models
Node outlier detection in attributed graphs is a challenging problem for
which there is no method that would work well across different datasets.
Motivated by the state-of-the-art results of score-based models in graph
generative modeling, we propose to incorporate them into the aforementioned
problem. Our method achieves competitive results on small-scale graphs. We
provide an empirical analysis of the Dirichlet energy, and show that generative
models might struggle to accurately reconstruct it. Comment: 16 pages, 8 figures, ICML workshop on Structured Probabilistic Inference & Generative Modeling
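The Dirichlet energy analyzed in this abstract has a standard definition on graphs; one common convention (unnormalized Laplacian L = D - A, each undirected edge counted once) can be sketched as:

```python
import numpy as np

def dirichlet_energy(adj, features):
    """Dirichlet energy of node features on a graph: trace(X^T L X) with
    L = D - A the unnormalized graph Laplacian. For a symmetric adjacency
    matrix this equals the sum over undirected edges of ||x_i - x_j||^2,
    i.e. it measures how much features vary across edges."""
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj
    return float(np.trace(features.T @ lap @ features))
```

On a single edge between two nodes with scalar features 0 and 1, the energy is (0 - 1)^2 = 1.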
10 Years of Probabilistic Querying – What Next?
Over the past decade, the two research areas of probabilistic databases and probabilistic programming have intensively studied the problem of making structured probabilistic inference scalable, but — so far — both areas have developed almost independently of one another. While probabilistic databases have focused on describing tractable query classes based on the structure of query plans and data lineage, probabilistic programming has contributed sophisticated inference techniques based on knowledge compilation and lifted (first-order) inference. Both fields have developed their own variants of — both exact and approximate — top-k algorithms for query evaluation, and both investigate query optimization techniques known from SQL, Datalog, and Prolog — all of which calls for a more intensive study of the commonalities and the integration of the two fields. Moreover, we believe that natural-language processing and information extraction will remain a driving factor, and in fact a longstanding challenge, for developing expressive representation models that can be combined with structured probabilistic inference — for decades to come.
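To make the probabilistic-database side concrete, here is a toy, exponential-time evaluator for a Boolean query over a tuple-independent database, with the query's lineage given as a DNF over tuple identifiers (illustrative names; the tractable query classes mentioned above exist precisely to avoid this enumeration):

```python
from itertools import product

def query_prob(tuples, lineage):
    """Exact probability of a Boolean query over a tuple-independent
    probabilistic database by brute-force possible-worlds enumeration.
    tuples: {tuple_id: marginal probability}
    lineage: DNF as a list of clauses, each a set of tuple_ids that
             must all be present for that clause to fire."""
    ids = list(tuples)
    total = 0.0
    for world in product([False, True], repeat=len(ids)):
        present = {t for t, w in zip(ids, world) if w}
        # Probability of this world under tuple independence.
        p = 1.0
        for t, w in zip(ids, world):
            p *= tuples[t] if w else 1 - tuples[t]
        # The query holds if any DNF clause is fully contained.
        if any(clause <= present for clause in lineage):
            total += p
    return total
```

For two independent tuples each present with probability 0.5 and lineage a ∨ b, this returns 0.75.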
Beyond Intuition, a Framework for Applying GPs to Real-World Data
Gaussian Processes (GPs) offer an attractive method for regression over
small, structured and correlated datasets. However, their deployment is
hindered by computational costs and limited guidelines on how to apply GPs
beyond simple low-dimensional datasets. We propose a framework to identify the
suitability of GPs to a given problem and how to set up a robust and
well-specified GP model. The guidelines formalise the decisions of experienced
GP practitioners, with an emphasis on kernel design and options for
computational scalability. The framework is then applied to a case study of
glacier elevation change, yielding more accurate results at test time. Comment: Accepted at the 1st ICML Workshop on Structured Probabilistic Inference and Generative Modelling (2023)
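As a point of reference for the kind of model the framework targets, baseline exact GP regression with an RBF kernel fits in a few lines (this is not the paper's framework; the kernel choice and noise level are illustrative):

```python
import numpy as np

def rbf(x1, x2, lengthscale=1.0, variance=1.0):
    # Squared-exponential (RBF) kernel on 1-D inputs.
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    """Exact GP posterior mean and covariance at x_test,
    computed via a Cholesky factorization of K + noise*I."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf(x_train, x_test)
    K_ss = rbf(x_test, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    return mean, cov
```

The O(n^3) Cholesky step is exactly the computational cost the abstract's scalability options are meant to address.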
Geometric Constraints in Probabilistic Manifolds: A Bridge from Molecular Dynamics to Structured Diffusion Processes
Understanding the macroscopic characteristics of biological complexes demands
precision and specificity in statistical ensemble modeling. One of the primary
challenges in this domain lies in sampling from particular subsets of the
state-space, driven either by existing structural knowledge or specific areas
of interest within the state-space. We propose a method that enables sampling
from distributions that rigorously adhere to arbitrary sets of geometric
constraints in Euclidean spaces. This is achieved by integrating a constraint
projection operator within the well-regarded architecture of Denoising
Diffusion Probabilistic Models, a framework founded in generative modeling and
probabilistic inference. The significance of this work becomes apparent, for
instance, in the context of deep learning-based drug design, where it is
imperative to maintain specific molecular profile interactions to realize the
desired therapeutic outcomes and guarantee safety. Comment: Published at ICML 2023 Workshop on Structured Probabilistic Inference and Generative Modeling
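The projection pattern described here can be illustrated with a toy sampler: after every reverse diffusion step, project the state onto the constraint set. The denoiser below is a stub standing in for a trained model, and the unit-sphere constraint is an arbitrary example, not the paper's operator:

```python
import numpy as np

def project_unit_sphere(x):
    # Example geometric constraint: project each sample onto the unit sphere.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def constrained_reverse_diffusion(n_samples=64, n_steps=100, dim=3, seed=0):
    """Toy reverse-diffusion loop with a constraint projection applied
    after every step, so all final samples satisfy the constraint."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, dim))
    for t in range(n_steps, 0, -1):
        # Stub denoiser: shrink toward zero with annealed noise
        # (stands in for a trained DDPM denoising network).
        x = 0.98 * x + np.sqrt(0.02) * (t / n_steps) * rng.standard_normal(x.shape)
        # Constraint projection after each reverse step.
        x = project_unit_sphere(x)
    return x
```

Because the projection is the last operation of every step, the returned samples lie exactly on the constraint manifold.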
Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation
The practical utility of causality in decision-making is widely recognized,
with causal discovery and inference being inherently intertwined. Nevertheless,
a notable gap exists in the evaluation of causal discovery methods, where
insufficient emphasis is placed on downstream inference. To address this gap,
we evaluate six established baseline causal discovery methods and a newly
proposed method based on GFlowNets, on the downstream task of treatment effect
estimation. Through the implementation of a robust evaluation procedure, we
offer valuable insights into the efficacy of these causal discovery methods for
treatment effect estimation, considering both synthetic and real-world
scenarios, as well as low-data scenarios. Furthermore, the results of our study
demonstrate that GFlowNets can effectively capture a wide range of useful and diverse average treatment effect (ATE) modes. Comment: Peer-Reviewed and Accepted to ICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling
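The downstream task, treatment effect estimation, reduces in the simplest linear setting to regression adjustment on a known adjustment set. A toy version on synthetic data with a true ATE of 2.0 (illustrative of the evaluation target only, not of any benchmarked method):

```python
import numpy as np

def estimate_ate(n=20000, seed=0):
    """Backdoor-adjusted ATE estimate on synthetic data where the true
    effect of treatment T on outcome Y is 2.0, confounded by Z."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)                          # confounder
    t = (z + rng.standard_normal(n) > 0).astype(float)  # treatment depends on Z
    y = 2.0 * t + z + 0.5 * rng.standard_normal(n)      # outcome
    # Linear regression of Y on [1, T, Z]; the T coefficient is the ATE.
    X = np.column_stack([np.ones(n), t, z])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]
```

Omitting Z from the regression would bias the estimate upward here, which is why the discovered graph (and hence the causal discovery method) matters downstream.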
BatchGFN: Generative Flow Networks for Batch Active Learning
We introduce BatchGFN -- a novel approach for pool-based active learning that
uses generative flow networks to sample sets of data points proportional to a
batch reward. With an appropriate reward function to quantify the utility of
acquiring a batch, such as the joint mutual information between the batch and
the model parameters, BatchGFN is able to construct highly informative batches
for active learning in a principled way. We show our approach enables sampling
near-optimal utility batches at inference time with a single forward pass per
point in the batch in toy regression problems. This alleviates the
computational complexity of batch-aware algorithms and removes the need for
greedy approximations to find maximizers for the batch reward. We also present
early results for amortizing training across acquisition steps, which will
enable scaling to real-world tasks. Comment: Accepted at the Structured Probabilistic Inference & Generative Modeling workshop, ICML 2023
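The sampling target that BatchGFN amortizes can be stated exactly on a tiny pool: draw a size-k batch with probability proportional to the exponentiated batch reward. A brute-force version of that target (what the flow network replaces, since enumeration is infeasible at scale; names and the reward are illustrative):

```python
from itertools import combinations
import math
import random

def sample_batch(pool, k, reward, rng):
    """Sample one size-k batch from `pool` with probability proportional
    to exp(reward(batch)), by enumerating all candidate batches."""
    batches = list(combinations(pool, k))
    weights = [math.exp(reward(b)) for b in batches]
    return rng.choices(batches, weights=weights, k=1)[0]
```

With pool {0, 1, 2}, k = 2 and the batch sum as reward, the batch (1, 2) is drawn most often, as its reward is highest.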
Augmenting Control over Exploration Space in Molecular Dynamics Simulators to Streamline De Novo Analysis through Generative Control Policies
This study introduces the P5 model - a foundational method that utilizes
reinforcement learning (RL) to augment control, effectiveness, and scalability
in molecular dynamics simulations (MD). Our innovative strategy optimizes the
sampling of target polymer chain conformations, marking an efficiency
improvement of over 37.1%. The RL-induced control policies function as an
inductive bias, modulating Brownian forces to steer the system towards the
preferred state, thereby expanding the exploration of the configuration space
beyond what traditional MD allows. This broadened exploration generates a more
varied set of conformations and targets specific properties, a feature pivotal
for progress in polymer development, drug discovery, and material design. Our
technique offers significant advantages when investigating new systems with
limited prior knowledge, opening up new methodologies for tackling complex
simulation problems with generative techniques. Comment: ICML 2023 Workshop on Structured Probabilistic Inference and Generative Modeling (SPIGM), at the International Conference on Machine Learning (ICML)
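The idea of a control policy modulating Brownian forces can be illustrated with overdamped Langevin dynamics on a double-well potential, where a constant bias force (a crude stand-in for the learned RL policy, not the P5 model itself) steers sampling from one well toward the other:

```python
import numpy as np

def biased_langevin(n_steps=5000, dt=1e-3, kT=0.2, bias=4.0, seed=0):
    """Overdamped Langevin dynamics on the double-well U(x) = (x^2 - 1)^2,
    plus a constant bias force steering the system toward the right well.
    Euler-Maruyama update: x += F*dt + sqrt(2*kT*dt)*noise."""
    rng = np.random.default_rng(seed)
    x = -1.0  # start in the left well
    for _ in range(n_steps):
        force = -4.0 * x * (x * x - 1.0) + bias  # -dU/dx + control bias
        x += force * dt + np.sqrt(2.0 * kT * dt) * rng.standard_normal()
    return x
```

Without the bias, crossing the barrier at x = 0 is a rare event at this temperature; with it, the trajectory reliably reaches the target region, which is the exploration-control effect the abstract describes.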
Diffusion Generative Inverse Design
Inverse design refers to the problem of optimizing the input of an objective
function in order to enact a target outcome. For many real-world engineering
problems, the objective function takes the form of a simulator that predicts
how the system state will evolve over time, and the design challenge is to
optimize the initial conditions that lead to a target outcome. Recent
developments in learned simulation have shown that graph neural networks (GNNs)
can be used for accurate, efficient, differentiable estimation of simulator
dynamics, and support high-quality design optimization with gradient- or
sampling-based optimization procedures. However, optimizing designs from
scratch requires many expensive model queries, and these procedures exhibit
basic failures on either non-convex or high-dimensional problems. In this work,
we show how denoising diffusion models (DDMs) can be used to solve inverse
design problems efficiently and propose a particle sampling algorithm for
further improving their efficiency. We perform experiments on a number of fluid
dynamics design challenges, and find that our approach substantially reduces
the number of calls to the simulator compared to standard techniques. Comment: ICML workshop on Structured Probabilistic Inference & Generative Modeling
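The gradient-based baseline the abstract compares against amounts to descending the squared design error through a differentiable simulator. A one-dimensional sketch with an illustrative simulator (real instances use a learned GNN simulator and many more queries):

```python
def invert_design(simulator, grad, target, x0=0.5, lr=0.05, steps=200):
    """Gradient-based inverse design: adjust the design variable x so that
    simulator(x) hits `target`, by descending (simulator(x) - target)^2.
    `grad` is the simulator's derivative d simulator / dx."""
    x = x0
    for _ in range(steps):
        err = simulator(x) - target
        x -= lr * 2.0 * err * grad(x)  # chain rule on the squared error
    return x
```

For the toy simulator f(x) = x^2 with target 4, descent from x0 = 0.5 converges to x = 2. Each iteration costs one simulator query, which is the expense the diffusion-based approach is reported to reduce.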
An Empirical Study of the Effectiveness of Using a Replay Buffer on Mode Discovery in GFlowNets
Reinforcement Learning (RL) algorithms aim to learn an optimal policy by
iteratively sampling actions to learn how to maximize the total expected
return, R(x). GFlowNets are a special class of algorithms designed to
generate diverse candidates, x, from a discrete set, X, by learning a policy
that approximates the proportional sampling of R(x). GFlowNets exhibit
improved mode discovery compared to conventional RL algorithms, which is very
useful for applications such as drug discovery and combinatorial search.
However, since GFlowNets are a relatively recent class of algorithms, many
techniques which are useful in RL have not yet been associated with them. In
this paper, we study the utilization of a replay buffer for GFlowNets. We
explore empirically various replay buffer sampling techniques and assess the
impact on the speed of mode discovery and the quality of the modes discovered.
Our experimental results in the Hypergrid toy domain and a molecule synthesis
environment demonstrate significant improvements in mode discovery when
training with a replay buffer, compared to training only with trajectories
generated on-policy. Comment: Accepted to ICML 2023 workshop on Structured Probabilistic Inference & Generative Modeling
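A minimal version of the replay buffer studied here, with uniform replay sampling only (the abstract also explores other sampling strategies), can be sketched as:

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal trajectory replay buffer for GFlowNet training: stores
    (trajectory, reward) pairs up to a fixed capacity, evicting the
    oldest entries, and draws uniform batches for off-policy updates."""
    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)

    def add(self, trajectory, reward):
        self.buffer.append((trajectory, reward))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement; clip to the current size.
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```

Training batches would then mix these replayed pairs with freshly generated on-policy trajectories, which is the comparison the experiments above evaluate.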