GFlowNet-EM for learning compositional latent variable models
Latent variable models (LVMs) with discrete compositional latents are an
important but challenging setting due to a combinatorially large number of
possible configurations of the latents. A key tradeoff in modeling the
posteriors over latents is between expressivity and tractable optimization. For
algorithms based on expectation-maximization (EM), the E-step is often
intractable without restrictive approximations to the posterior. We propose the
use of GFlowNets, algorithms for sampling from an unnormalized density by
learning a stochastic policy for sequential construction of samples, for this
intractable E-step. By training GFlowNets to sample from the posterior over
latents, we take advantage of their strengths as amortized variational
inference algorithms for complex distributions over discrete structures. Our
approach, GFlowNet-EM, enables the training of expressive LVMs with discrete
compositional latents, as shown by experiments on non-context-free grammar
induction and on images using discrete variational autoencoders (VAEs) without
conditional independence enforced in the encoder.
Comment: ICML 2023; code: https://github.com/GFNOrg/GFlowNet-E
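The abstract above describes replacing the exact E-step of EM with samples from a GFlowNet trained on the posterior over latents. As a minimal sketch of where that substitution happens, the toy EM loop below fits a 2-component binomial mixture; the `E-step` line computes the exact posterior over the latent component, which is precisely the quantity GFlowNet-EM would approximate with amortized samples when the latent space is too large to enumerate. The model and all names here are illustrative, not the paper's implementation.

```python
import math

def em_step(xs, m, pi, p):
    """One EM step for a 2-component binomial mixture.

    xs : list of head counts, each out of m flips
    pi : mixing weight of component 0
    p  : (p0, p1) head probabilities of the two components

    The E-step below is exact; GFlowNet-EM replaces it with samples
    from a learned amortized sampler when the posterior is intractable.
    """
    def lik(x, q):  # binomial likelihood up to the constant C(m, x)
        return q ** x * (1 - q) ** (m - x)

    resp, heads, tot = [], [0.0, 0.0], [0.0, 0.0]
    for x in xs:
        w0 = pi * lik(x, p[0])
        w1 = (1 - pi) * lik(x, p[1])
        r0 = w0 / (w0 + w1)               # E-step: exact posterior P(z=0 | x)
        resp.append(r0)
        heads[0] += r0 * x;       tot[0] += r0 * m
        heads[1] += (1 - r0) * x; tot[1] += (1 - r0) * m
    pi = sum(resp) / len(xs)              # M-step: closed-form updates
    return pi, (heads[0] / tot[0], heads[1] / tot[1])

def loglik(xs, m, pi, p):
    """Marginal log-likelihood (dropping the constant binomial coefficients)."""
    return sum(math.log(pi * p[0] ** x * (1 - p[0]) ** (m - x)
                        + (1 - pi) * p[1] ** x * (1 - p[1]) ** (m - x))
               for x in xs)
```

With an exact E-step, each iteration is guaranteed not to decrease the marginal log-likelihood; with sampled completions (as in GFlowNet-EM) that guarantee becomes approximate, which is the tradeoff the paper manages.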
Amortizing intractable inference in large language models
Autoregressive large language models (LLMs) compress knowledge from their
training data through next-token conditional distributions. This limits
tractable querying of this knowledge to start-to-end autoregressive sampling.
However, many tasks of interest -- including sequence continuation, infilling,
and other forms of constrained generation -- involve sampling from intractable
posterior distributions. We address this limitation by using amortized Bayesian
inference to sample from these intractable posteriors. Such amortization is
algorithmically achieved by fine-tuning LLMs via diversity-seeking
reinforcement learning algorithms: generative flow networks (GFlowNets). We
empirically demonstrate that this distribution-matching paradigm of LLM
fine-tuning can serve as an effective alternative to maximum-likelihood
training and reward-maximizing policy optimization. As an important
application, we interpret chain-of-thought reasoning as a latent variable
modeling problem and demonstrate that our approach enables data-efficient
adaptation of LLMs to tasks that require multi-step rationalization and tool
use.
Comment: 23 pages; code: https://github.com/GFNOrg/gfn-lm-tunin
PhyloGFN: Phylogenetic inference with generative flow networks
Phylogenetics is a branch of computational biology that studies the
evolutionary relationships among biological entities. Its long history and
numerous applications notwithstanding, inference of phylogenetic trees from
sequence data remains challenging: the high complexity of tree space poses a
significant obstacle for the current combinatorial and probabilistic
techniques. In this paper, we adopt the framework of generative flow networks
(GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and
Bayesian phylogenetic inference. Because GFlowNets are well-suited for sampling
complex combinatorial structures, they are a natural choice for exploring and
sampling from the multimodal posterior distribution over tree topologies and
evolutionary distances. We demonstrate that our amortized posterior sampler,
PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real
benchmark datasets. PhyloGFN is competitive with prior works in marginal
likelihood estimation and achieves a closer fit to the target distribution than
state-of-the-art variational inference methods.
BatchGFN: Generative Flow Networks for Batch Active Learning
We introduce BatchGFN -- a novel approach for pool-based active learning that
uses generative flow networks to sample sets of data points proportional to a
batch reward. With an appropriate reward function to quantify the utility of
acquiring a batch, such as the joint mutual information between the batch and
the model parameters, BatchGFN is able to construct highly informative batches
for active learning in a principled way. We show our approach enables sampling
near-optimal utility batches at inference time with a single forward pass per
point in the batch in toy regression problems. This alleviates the
computational complexity of batch-aware algorithms and removes the need for
greedy approximations to find maximizers for the batch reward. We also present
early results for amortizing training across acquisition steps, which will
enable scaling to real-world tasks.
Comment: Accepted at the Structured Probabilistic Inference & Generative Modeling workshop, ICML 202
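The core target in the abstract above is sampling a batch with probability proportional to a batch-level reward. On a pool small enough to enumerate, that target distribution can be sampled directly, as in the sketch below; BatchGFN's contribution is amortizing exactly this target with a sequential sampler when enumerating all size-k subsets is infeasible. The function names and the additive toy reward are assumptions for illustration only.

```python
import itertools
import random

def sample_batch_proportional(pool, k, reward, rng=random.Random(0)):
    """Sample a size-k batch from `pool` with probability proportional
    to reward(batch), by exact enumeration over all C(|pool|, k) subsets.

    `reward` is any nonnegative set function, e.g. an estimate of the
    joint mutual information between the batch and the model parameters.
    This exact sampler is the target that BatchGFN amortizes.
    """
    batches = list(itertools.combinations(pool, k))
    weights = [reward(b) for b in batches]
    total = sum(weights)
    u = rng.random() * total              # inverse-CDF sampling over subsets
    for b, w in zip(batches, weights):
        u -= w
        if u <= 0:
            return b
    return batches[-1]                    # guard against float round-off
```

The enumeration costs C(|pool|, k) reward evaluations per draw, which is what motivates an amortized sampler that pays one forward pass per point in the batch instead.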
Learning GFlowNets from partial episodes for improved convergence and stability
Generative flow networks (GFlowNets) are a family of algorithms for training
a sequential sampler of discrete objects under an unnormalized target density
and have been successfully used for various probabilistic modeling tasks.
Existing training objectives for GFlowNets are either local to states or
transitions, or propagate a reward signal over an entire sampling trajectory.
We argue that these alternatives represent opposite ends of a gradient
bias-variance tradeoff and propose a way to exploit this tradeoff to mitigate
its harmful effects. Inspired by the TD(λ) algorithm in reinforcement
learning, we introduce subtrajectory balance or SubTB(λ), a GFlowNet
training objective that can learn from partial action subsequences of varying
lengths. We show that SubTB(λ) accelerates sampler convergence in
previously studied and new environments and enables training GFlowNets in
environments with longer action sequences and sparser reward landscapes than
what was possible before. We also perform a comparative analysis of stochastic
gradient dynamics, shedding light on the bias-variance tradeoff in GFlowNet
training and the advantages of subtrajectory balance.
Comment: ICML 202
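The subtrajectory balance idea above admits a compact worked form: every subtrajectory s_i → … → s_j of a sampled trajectory contributes a squared balance residual, and SubTB(λ) averages these residuals with geometric weights λ^(j−i). The sketch below computes that loss for one trajectory from precomputed log state flows and log transition probabilities; it is a minimal illustration under these assumed inputs, not the paper's training code.

```python
def subtb_lambda_loss(log_F, log_PF, log_PB, lam=0.9):
    """SubTB(lambda) loss for one complete trajectory s_0 -> ... -> s_n.

    log_F[i]  : log state flow at s_i               (length n + 1)
    log_PF[i] : log P_F(s_{i+1} | s_i), forward     (length n)
    log_PB[i] : log P_B(s_i | s_{i+1}), backward    (length n)

    Each subtrajectory s_i..s_j contributes the squared residual
    (log F(s_i) + sum log_PF - log F(s_j) - sum log_PB)^2,
    weighted by lam^(j - i) and normalized by the total weight.
    """
    n = len(log_PF)
    total_w, loss = 0.0, 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            resid = (log_F[i] - log_F[j]
                     + sum(log_PF[i:j]) - sum(log_PB[i:j]))
            w = lam ** (j - i)
            total_w += w
            loss += w * resid ** 2
    return loss / total_w
```

Setting λ near 0 emphasizes single transitions (low variance, higher bias), while λ near 1 approaches full trajectory balance (low bias, higher variance), which is the tradeoff the abstract describes.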
GFlowOut: Dropout with Generative Flow Networks
Bayesian inference offers principled tools to tackle many critical problems
with modern neural networks such as poor calibration and generalization, and
data inefficiency. However, scaling Bayesian inference to large architectures
is challenging and requires restrictive approximations. Monte Carlo Dropout has
been widely used as a relatively cheap way to perform approximate inference and
estimate uncertainty in deep neural networks. Traditionally, the dropout mask
is sampled independently from a fixed distribution. Recent works show that the
dropout mask can be viewed as a latent variable, which can be inferred with
variational inference. These methods face two important challenges: (a) the
posterior distribution over masks can be highly multi-modal which can be
difficult to approximate with standard variational inference and (b) it is not
trivial to fully utilize sample-dependent information and correlation among
dropout masks to improve posterior estimation. In this work, we propose
GFlowOut to address these issues. GFlowOut leverages the recently proposed
probabilistic framework of Generative Flow Networks (GFlowNets) to learn the
posterior distribution over dropout masks. We empirically demonstrate that
GFlowOut results in predictive distributions that generalize better to
out-of-distribution data, and provide uncertainty estimates which lead to
better performance in downstream tasks.
Learning, Probability and Logic: Toward a Unified Approach for Content-Based Music Information Retrieval
Within the last 15 years, the field of Music Information Retrieval (MIR) has made tremendous progress in the development of algorithms for organizing and analyzing the ever-growing, varied amount of music and music-related data available digitally. However, the development of content-based methods to enable or improve multimedia retrieval remains a central challenge. In this perspective paper, we critically examine the problem of automatic chord estimation from audio recordings as a case study of content-based algorithms, and point out several bottlenecks in current approaches: expressiveness and flexibility are obtained at the expense of robustness and vice versa; available multimodal sources of information are scarcely exploited; current architectures offer limited support for modeling multi-faceted and strongly interrelated musical information; and models are typically restricted to short-term analysis that does not account for the hierarchical temporal structure of musical signals. Dealing with music data requires the ability to handle both uncertainty and complex relational structure at multiple levels of representation. Traditional approaches have generally treated these two aspects separately: probability and learning are the usual means of representing uncertainty in knowledge, while logical representation is the usual means of representing knowledge and complex relational information. We argue that the identified hurdles of current approaches could be overcome by recent developments in the area of Statistical Relational Artificial Intelligence (StarAI), which unifies probability, logic, and (deep) learning. We show that existing approaches used in MIR find powerful extensions and unifications in StarAI, and we explain why we think it is time to consider the new perspectives offered by this promising research field.