Affine monads and lazy structures for Bayesian programming
We show that streams and lazy data structures are a natural idiom for programming with infinite-dimensional Bayesian methods such as Poisson processes, Gaussian processes, jump processes, Dirichlet processes, and Beta processes. The crucial semantic idea, inspired by developments in synthetic probability theory, is to work with two separate monads: an affine monad of probability, which supports laziness, and a commutative, non-affine monad of measures, which does not. (Affine means that T(1) ≅ 1.) We show that the separation is important from a decidability perspective, and that the recent model of quasi-Borel spaces supports these two monads. To perform Bayesian inference with these examples, we introduce new inference methods that are specially adapted to laziness; they are proven correct by reference to the Metropolis-Hastings-Green method. Our theoretical development is implemented as a Haskell library, LazyPPL.
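As a concrete illustration of the lazy idiom (a minimal, self-contained sketch, assuming nothing about the actual LazyPPL API; all names below are ours), an affine probability monad can be built as a reader over a lazy, infinitely branching tree of uniform draws. Because the tree is lazy, a program may return an infinite stream of samples of which only the inspected prefix is ever generated:

```haskell
-- A minimal sketch of the lazy-structures idiom (not the LazyPPL API itself):
-- an affine probability monad as a reader over a lazy, infinitely branching
-- tree of uniform draws.  Only the parts of the tree a program inspects are
-- ever generated, so infinite streams of samples are unproblematic.
import System.Random (StdGen, newStdGen, randomR, split)

-- Infinite binary tree of independent uniform [0,1) draws.
data Tree = Tree Double Tree Tree

newtype Prob a = Prob { runProb :: Tree -> a }

instance Functor Prob where
  fmap f (Prob k) = Prob (f . k)

instance Applicative Prob where
  pure x = Prob (const x)
  Prob f <*> Prob x = Prob (\(Tree _ l r) -> f l (x r))

instance Monad Prob where
  Prob m >>= f = Prob (\(Tree _ l r) -> runProb (f (m l)) r)

-- A single uniform draw reads the label at the root of its seed tree.
uniform :: Prob Double
uniform = Prob (\(Tree u _ _) -> u)

-- A lazy infinite stream of iid uniforms; consumers force only a prefix.
iidUniforms :: Prob [Double]
iidUniforms = do
  u  <- uniform
  us <- iidUniforms
  return (u : us)

-- Grow a lazy seed tree from a splittable generator.
growTree :: StdGen -> Tree
growTree g =
  let (u, g')  = randomR (0, 1) g
      (gl, gr) = split g'
  in Tree u (growTree gl) (growTree gr)

main :: IO ()
main = do
  g <- newStdGen
  let samples = runProb iidUniforms (growTree g)
  print (take 5 samples)  -- forces only five draws of the infinite stream
```

Affineness shows up here as the fact that an unused draw is simply never forced; conditioning, which the abstract assigns to the separate non-affine measure monad, is deliberately left out of this sketch.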
Amortizing intractable inference in large language models
Autoregressive large language models (LLMs) compress knowledge from their training data through next-token conditional distributions. This limits tractable querying of this knowledge to start-to-end autoregressive sampling. However, many tasks of interest -- including sequence continuation, infilling, and other forms of constrained generation -- involve sampling from intractable posterior distributions. We address this limitation by using amortized Bayesian inference to sample from these intractable posteriors. Such amortization is algorithmically achieved by fine-tuning LLMs via diversity-seeking reinforcement learning algorithms: generative flow networks (GFlowNets). We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training and reward-maximizing policy optimization. As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem and demonstrate that our approach enables data-efficient adaptation of LLMs to tasks that require multi-step rationalization and tool use.
Comment: 23 pages; code: https://github.com/GFNOrg/gfn-lm-tunin
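To make the latent-variable reading concrete (a hedged paraphrase of the abstract's framing; the notation X for prompt, Z for chain-of-thought rationale, and Y for answer is ours, not the paper's), the target of sampling is the posterior over rationales under the LLM's own distribution:

```latex
% Sketch of the intractable posterior described in the abstract; notation is ours.
\[
  p_{\mathrm{LM}}(Z \mid X, Y)
    \;=\; \frac{p_{\mathrm{LM}}(Z \mid X)\, p_{\mathrm{LM}}(Y \mid X, Z)}
               {\sum_{Z'} p_{\mathrm{LM}}(Z' \mid X)\, p_{\mathrm{LM}}(Y \mid X, Z')}
    \;\propto\; p_{\mathrm{LM}}(Z \mid X)\, p_{\mathrm{LM}}(Y \mid X, Z).
\]
```

The sum over all candidate rationales Z' is what makes exact sampling intractable; amortized inference, as described above, fine-tunes the model to draw Z approximately in proportion to the unnormalized numerator.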
Probabilistic programming interfaces for random graphs: Markov categories, graphons, and nominal sets
We study semantic models of probabilistic programming languages over graphs, and establish a connection to graphons from graph theory and combinatorics. We show that every well-behaved equational theory for our graph probabilistic programming language corresponds to a graphon, and conversely, every graphon arises in this way. We provide three constructions for showing that every graphon arises from an equational theory. The first is an abstract construction, using Markov categories and monoidal indeterminates. The second and third are more concrete. The second is in terms of traditional measure theoretic probability, which covers ‘black-and-white’ graphons. The third is in terms of probability monads on the nominal sets of Gabbay and Pitts. Specifically, we use a variation of nominal sets induced by the theory of graphs, which covers Erdős-Rényi graphons. In this way, we build new models of graph probabilistic programming from graphons.
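To make the graphon correspondence tangible (an illustrative sketch under names of our own, not the paper's semantics or programming interface), a graphon W : [0,1]^2 → [0,1] induces a random graph by giving each vertex a latent uniform label and flipping an independent coin of bias W(u, v) for each pair of labels; the Erdős–Rényi graphon is a constant function, and 'black-and-white' graphons are the {0,1}-valued ones:

```haskell
-- An illustrative sketch (our own naming, not the paper's interface): sampling
-- a finite random graph from a graphon W : [0,1]^2 -> [0,1].  Every vertex
-- gets a latent uniform label; each pair {i,j} gets an independent edge with
-- probability W(label_i, label_j).
import System.Random (newStdGen, randoms)

type Graphon = Double -> Double -> Double

-- The Erdős–Rényi graphon: constant edge probability p.
erdosRenyi :: Double -> Graphon
erdosRenyi p _ _ = p

-- A 'black-and-white' ({0,1}-valued) graphon: the half graph.
halfGraph :: Graphon
halfGraph u v = if u + v <= 1 then 1 else 0

-- Sample the edge set on n vertices, consuming an infinite supply of uniforms.
sampleGraph :: Graphon -> Int -> [Double] -> [((Int, Int), Bool)]
sampleGraph w n us =
  let labels = take n us
      coins  = drop n us
      pairs  = [ (i, j) | i <- [0 .. n - 1], j <- [i + 1 .. n - 1] ]
  in [ ((i, j), c < w (labels !! i) (labels !! j))
     | ((i, j), c) <- zip pairs coins ]

main :: IO ()
main = do
  g <- newStdGen
  let us = randoms g :: [Double]
  mapM_ print (sampleGraph (erdosRenyi 0.5) 4 us)
```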
A model of stochastic memoization and name generation in probabilistic programming: categorical semantics via monads on presheaf categories
Stochastic memoization is a higher-order construct of probabilistic programming languages that is key in Bayesian nonparametrics, a modular approach that allows us to extend models beyond their parametric limitations and compose them in an elegant and principled manner. Stochastic memoization is simple and useful in practice, but semantically elusive, particularly regarding dataflow transformations. As the naive implementation resorts to the state monad, which is not commutative, it is not clear if stochastic memoization preserves the dataflow property -- i.e., whether we can reorder the lines of a program without changing its semantics, provided the dataflow graph is preserved. In this paper, we give an operational and categorical semantics to stochastic memoization and name generation in the context of a minimal probabilistic programming language, for a restricted class of functions. Our contribution is a first model of stochastic memoization of constant Bernoulli functions with a non-enumerable type, which validates dataflow transformations, bridging the gap between traditional probability theory and higher-order probability models. Our model uses a presheaf category and a novel probability monad on it.
Comment: To be published in the MFPS 2023 Proceedings as part of the Electronic Notes in Theoretical Informatics and Computer Science (ENTICS) series.
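For operational intuition (a naive, stateful sketch in the spirit of the abstract's remark that the obvious implementation resorts to the state monad; the names memBernoulli and bernoulli are ours), stochastic memoization turns a random function, here a constant Bernoulli function, into one whose outcome at each argument is sampled once and then remembered:

```haskell
-- A naive, stateful sketch of stochastic memoization (names are ours): the
-- memoized function samples a fresh Bernoulli outcome the first time it sees
-- an argument and returns the stored outcome on every later call.
import           Data.IORef (newIORef, readIORef, writeIORef)
import qualified Data.Map as Map
import           System.Random (randomRIO)

bernoulli :: Double -> IO Bool
bernoulli p = (< p) <$> randomRIO (0, 1 :: Double)

-- Memoize the constant Bernoulli(p) function on an (here enumerable) argument
-- type; the paper's point is to make sense of this for non-enumerable types.
memBernoulli :: Double -> IO (Int -> IO Bool)
memBernoulli p = do
  table <- newIORef Map.empty
  return $ \x -> do
    m <- readIORef table
    case Map.lookup x m of
      Just b  -> return b
      Nothing -> do
        b <- bernoulli p
        writeIORef table (Map.insert x b m)
        return b

main :: IO ()
main = do
  f <- memBernoulli 0.5
  a <- f 0
  b <- f 0  -- same argument: always equal to a
  c <- f 1  -- fresh argument: an independent flip
  print (a, b, c)
```

The mutable table is precisely what makes commutativity, and hence the dataflow property, non-obvious; the paper's presheaf-based model is concerned with validating such reorderings despite this, for the restricted class of functions described above.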