Combining Models of Approximation with Partial Learning
In Gold's framework of inductive inference, the model of partial learning
requires the learner to output exactly one correct index for the target object
and only the target object infinitely often. Since infinitely many of the
learner's hypotheses may be incorrect, it is not obvious whether a partial
learner can be modified to "approximate" the target object.
Fulk and Jain (Approximate inference and scientific method. Information and
Computation 114(2):179--191, 1994) introduced a model of approximate learning
of recursive functions. The present work extends their research and solves an
open problem of Fulk and Jain by showing that there is a learner which
approximates and partially identifies every recursive function by outputting a
sequence of hypotheses which, in addition, are almost all finite variants
of the target function.
The subsequent study is dedicated to the question of how these findings
generalise to the learning of r.e. languages from positive data. Here three
variants of approximate learning will be introduced and investigated with
respect to whether they can be combined with partial learning.
Following the line of Fulk and Jain's research, further investigations provide
conditions under which partial language learners can eventually output only
finite variants of the target language. The combinability of other partial
learning criteria will also be briefly studied.
Scalable Recommendation with Poisson Factorization
We develop a Bayesian Poisson matrix factorization model for forming
recommendations from sparse user behavior data. These data are large user/item
matrices where each user has provided feedback on only a small subset of items,
either explicitly (e.g., through star ratings) or implicitly (e.g., through
views or purchases). In contrast to traditional matrix factorization
approaches, Poisson factorization implicitly models each user's limited
attention to consume items. Moreover, because of the mathematical form of the
Poisson likelihood, the model needs only to explicitly consider the observed
entries in the matrix, leading to both scalable computation and good predictive
performance. We develop a variational inference algorithm for approximate
posterior inference that scales up to massive data sets. This is an efficient
algorithm that iterates over the observed entries and adjusts an approximate
posterior over the user/item representations. We apply our method to large
real-world user data containing users rating movies, users listening to songs,
and users reading scientific papers. In all these settings, Bayesian Poisson
factorization outperforms state-of-the-art matrix factorization methods.
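As a rough illustration of why only the observed entries enter the computation, the sketch below runs multiplicative (KL-NMF style) point-estimate updates for a Poisson factorization model. The paper itself develops a Bayesian variational algorithm with Gamma priors, so the function name and update scheme here are illustrative assumptions, not the authors' method; the point is that the numerators sum over observed entries alone while the denominators factorize into loading sums.

```python
import numpy as np

# Minimal MAP-style sketch of Poisson matrix factorization on sparse
# feedback data, using multiplicative (KL-NMF) updates. This is NOT the
# paper's variational algorithm; it only illustrates that the numerators
# touch observed entries alone, while the denominators factorize.

def poisson_factorize(rows, cols, vals, n_users, n_items, k=20, n_iters=50):
    """rows/cols/vals encode the observed (user, item, count) triples."""
    rng = np.random.default_rng(0)
    theta = rng.gamma(1.0, 1.0, size=(n_users, k))  # user preferences
    beta = rng.gamma(1.0, 1.0, size=(n_items, k))   # item attributes
    for _ in range(n_iters):
        # Poisson rates are needed only at the observed entries.
        rates = np.sum(theta[rows] * beta[cols], axis=1)
        ratio = vals / np.maximum(rates, 1e-10)
        num = np.zeros_like(theta)
        np.add.at(num, rows, ratio[:, None] * beta[cols])
        theta *= num / np.maximum(beta.sum(axis=0), 1e-10)
        rates = np.sum(theta[rows] * beta[cols], axis=1)
        ratio = vals / np.maximum(rates, 1e-10)
        num = np.zeros_like(beta)
        np.add.at(num, cols, ratio[:, None] * theta[rows])
        beta *= num / np.maximum(theta.sum(axis=0), 1e-10)
    return theta, beta
```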
Group equivariant neural posterior estimation
Simulation-based inference with conditional neural density estimators is a
powerful approach to solving inverse problems in science. However, these
methods typically treat the underlying forward model as a black box, with no
way to exploit geometric properties such as equivariances. Equivariances are
common in scientific models; however, integrating them directly into expressive
inference networks (such as normalizing flows) is not straightforward. Here we
describe an alternative method to incorporate equivariances under joint
transformations of parameters and data. Our method -- called group equivariant
neural posterior estimation (GNPE) -- is based on self-consistently
standardizing the "pose" of the data while estimating the posterior over
parameters. It is architecture-independent, and applies both to exact and
approximate equivariances. As a real-world application, we use GNPE for
amortized inference of astrophysical binary black hole systems from
gravitational-wave observations. We show that GNPE achieves state-of-the-art
accuracy while reducing inference times by three orders of magnitude.
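A schematic of the pose-standardization loop may help. Everything named below (npe_sample, transform, pose_of) is a hypothetical placeholder for, respectively, a trained conditional density estimator, the group action on the data, and the pose coordinates of a parameter draw; the additive pose update assumes a simple translation group such as a time shift. It is a sketch of the idea, not the authors' implementation.

```python
# Schematic sketch of a GNPE-style fixed-point iteration. `npe_sample`,
# `transform`, and `pose_of` are hypothetical placeholders supplied by
# the user. The additive pose update assumes a translation group (e.g. a
# time shift); other groups compose differently.

def gnpe_sample(x, npe_sample, transform, pose_of, n_rounds=10):
    pose = 0.0                          # initial pose proxy
    theta = None
    for _ in range(n_rounds):
        x_std = transform(x, pose)      # move data toward the standard pose
        theta = npe_sample(x_std)       # posterior draw in the standard frame
        pose = pose + pose_of(theta)    # fold the residual pose back in
    return theta, pose
```

Because the network only ever sees data in (approximately) standard pose, it never has to learn the equivariance itself, which is why the scheme is architecture-independent.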
Whither PQL?
Generalized linear mixed models (GLMMs) are generalized linear models with normally distributed random effects in the linear predictor. Penalized quasi-likelihood (PQL), an approximate method of inference in GLMMs, involves repeated fitting of linear mixed models with “working” dependent variables and iterative weights that depend on parameter estimates from the previous cycle of iteration. The generality of PQL, and its implementation in commercially available software, have encouraged the application of GLMMs in many scientific fields. Caution is needed, however, since PQL may sometimes yield badly biased estimates of variance components, especially with binary outcomes.
Recent developments in numerical integration, including adaptive Gaussian quadrature, higher order Laplace expansions, stochastic integration and Markov chain Monte Carlo (MCMC) algorithms, provide attractive alternatives to PQL for approximate likelihood inference in GLMMs. Analyses of some well-known datasets, and simulations based on these analyses, suggest that PQL still performs remarkably well in comparison with more elaborate procedures in many practical situations. Adaptive Gaussian quadrature is a viable alternative for nested designs where the numerical integration is limited to a small number of dimensions. Higher order Laplace approximations hold the promise of accurate inference more generally. MCMC is likely the method of choice for the most complex problems that involve high dimensional integrals.
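The PQL iteration described above is compact enough to sketch. In the code below, fit_weighted_lmm is a hypothetical stand-in for any weighted linear mixed-model fitter that returns the fitted linear predictor; the sketch assumes a binomial GLMM with logit link and the dispersion fixed at one.

```python
import numpy as np

# Sketch of the PQL iteration for a binomial GLMM with logit link.
# `fit_weighted_lmm` is a hypothetical stand-in for a weighted linear
# mixed-model fitter that returns the fitted linear predictor given the
# working response z, fixed-effect design X, random-effect design Z, and
# the iterative weights.

def pql(y, X, Z, fit_weighted_lmm, n_iters=25, tol=1e-6):
    eta = np.zeros(len(y))                      # linear predictor
    for _ in range(n_iters):
        mu = 1.0 / (1.0 + np.exp(-eta))         # inverse logit
        w = np.maximum(mu * (1.0 - mu), 1e-10)  # iterative weights
        z = eta + (y - mu) / w                  # "working" dependent variable
        eta_new = fit_weighted_lmm(z, X, Z, weights=w)
        if np.max(np.abs(eta_new - eta)) < tol:
            break
        eta = eta_new
    return eta
```

The bias problem with binary outcomes arises because the Laplace-style approximation behind the working response is poorest when each observation carries so little information, which is exactly the regime the quadrature and MCMC alternatives target.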
Approximate Bayesian inference for individual-based models with emergent dynamics
Individual-based models are used in a variety of scientific domains to study systems composed of multiple agents that interact
with one another and lead to complex emergent dynamics at the macroscale. A standard approach in the analysis of these systems is
to specify the microscale interaction rules in a simulation model, run simulations, and then qualitatively compare outputs to empirical
observations. Recently, more robust methods of inference for these types of models have been
introduced, notably approximate Bayesian computation; however, major challenges remain due to the
computational cost of simulations and the nonlinear nature of many complex systems. Here, we
compare two methods of approximate inference in a classic individual-based model of group dynamics
with well-studied nonlinear macroscale behaviour; we employ a Gaussian-process-accelerated ABC
method, both with an approximated likelihood and with a synthetic likelihood. We compare the
accuracy of results when re-inferring parameters using a measure of macroscale disorder (the
order parameter) as a summary statistic. Our findings reveal that for a canonical simple model of
animal collective movement, parameter inference is accurate and computationally efficient, even
when the model is poised at the critical transition between order and disorder.
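For concreteness, a bare-bones Gaussian synthetic log-likelihood looks as follows. Here simulate and summarize are user-supplied placeholders (for example, the collective-motion simulator and the order-parameter summary statistic), and the ridge term is a numerical convenience rather than part of the method.

```python
import numpy as np

# Bare-bones Gaussian synthetic log-likelihood (in the spirit of Wood,
# 2010). `simulate` and `summarize` are user-supplied placeholders, e.g.
# the collective-motion simulator and the order-parameter summary.

def synthetic_loglik(theta, simulate, summarize, s_obs, n_reps=100):
    sims = np.array([summarize(simulate(theta)) for _ in range(n_reps)])
    sims = sims.reshape(n_reps, -1)             # allow scalar summaries
    mean = sims.mean(axis=0)
    d = sims.shape[1]
    cov = np.cov(sims, rowvar=False).reshape(d, d)
    cov += 1e-8 * np.eye(d)                     # ridge for stability
    diff = np.atleast_1d(s_obs) - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (quad + logdet + d * np.log(2.0 * np.pi))
```

Plugging this log-likelihood into a standard MCMC or importance sampler over the parameters then yields the approximate posterior; the Gaussian-process acceleration in the paper replaces repeated simulation calls with a cheap surrogate.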
Estimating False Discovery Proportion Under Arbitrary Covariance Dependence
Multiple hypothesis testing is a fundamental problem in high dimensional
inference, with wide applications in many scientific fields. In genome-wide
association studies, tens of thousands of tests are performed simultaneously to
find if any SNPs are associated with some traits and those tests are
correlated. When test statistics are correlated, false discovery control
becomes very challenging under arbitrary dependence. In the current paper, we
propose a novel method based on principal factor approximation, which
subtracts the common dependence and significantly weakens the correlation
structure, allowing us to handle an arbitrary dependence structure. We
derive an approximate expression for false discovery proportion (FDP) in large
scale multiple testing when a common threshold is used and provide a consistent
estimate of realized FDP. This result has important applications in controlling
FDR and FDP. Our estimate of realized FDP compares favorably with Efron's
(2007) approach, as demonstrated in the simulated examples. Our approach is
further illustrated by some real data applications. We also propose a
dependence-adjusted procedure, which is more powerful than the fixed threshold
procedure.
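The core computation can be sketched as follows. The least-squares estimate of the realized factors and the 90% trimming rule below are crude stand-ins for the paper's robust procedure, the number of factors k is taken as given, and summing over all tests upper-bounds the sum over true nulls.

```python
import numpy as np
from scipy.stats import norm

# Sketch of a principal-factor FDP estimate in the spirit of this paper.
# `Sigma` is the (known or estimated) correlation matrix of the z-values.

def estimate_fdp(z, Sigma, t, k=3):
    """Approximate realized FDP at two-sided p-value threshold t."""
    p = len(z)
    vals, vecs = np.linalg.eigh(Sigma)
    B = vecs[:, -k:] * np.sqrt(np.maximum(vals[-k:], 0.0))  # factor loadings
    keep = np.argsort(np.abs(z))[: int(0.9 * p)]   # trim the largest |z|
    W, *_ = np.linalg.lstsq(B[keep], z[keep], rcond=None)  # realized factors
    eta = B @ W                                    # common dependence
    a = 1.0 / np.sqrt(np.clip(1.0 - np.sum(B**2, axis=1), 1e-10, None))
    zt = norm.ppf(t / 2.0)                         # negative cutoff
    # Expected false discoveries given the realized factors, summed over
    # all tests (an upper bound on the sum over true nulls).
    V = np.sum(norm.cdf(a * (zt + eta)) + norm.cdf(a * (zt - eta)))
    R = np.sum(2.0 * norm.sf(np.abs(z)) <= t)      # total rejections
    return V / max(R, 1)
```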