String and Membrane Gaussian Processes
In this paper we introduce a novel framework for performing exact nonparametric
Bayesian inference on latent functions that is particularly suitable for Big
Data tasks. Firstly, we introduce a class of stochastic processes we refer to
as string Gaussian processes (string GPs), which are not to be mistaken for
Gaussian processes operating on text. We construct string GPs so that their
finite-dimensional marginals exhibit suitable local conditional independence
structures, which allow for scalable, distributed, and flexible nonparametric
Bayesian inference, without resorting to approximations, and while ensuring
some mild global regularity constraints. Furthermore, string GP priors
naturally cope with heterogeneous input data, and the gradient of the learned
latent function is readily available for explanatory analysis. Secondly, we
provide some theoretical results relating our approach to the standard GP
paradigm. In particular, we prove that some string GPs are Gaussian processes,
which provides a complementary global perspective on our framework. Finally, we
derive a scalable and distributed MCMC scheme for supervised learning tasks
under string GP priors. The proposed MCMC scheme has computational time
complexity O(N) and memory requirement O(dN), where N is the data size and d
the dimension of the input space. We illustrate the
efficacy of the proposed approach on several synthetic and real-world datasets,
including a dataset with millions of input points and attributes. Comment: To
appear in the Journal of Machine Learning Research (JMLR), Volume 1
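The scalability motivation above can be made concrete against a generic exact-GP baseline (not the string GP construction itself): standard GP regression requires a Cholesky factorisation of the full kernel matrix, which is cubic in the data size. A minimal numpy sketch with an illustrative squared-exponential kernel and fixed hyperparameters:

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between two 1-D input sets."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=0.1):
    """Exact GP posterior mean and variance (Rasmussen-Williams Alg. 2.1)."""
    K = rbf_kernel(x_train, x_train) + noise**2 * np.eye(len(x_train))
    L = np.linalg.cholesky(K)                       # the O(N^3) bottleneck
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = rbf_kernel(x_train, x_test)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = rbf_kernel(x_test, x_test).diagonal() - np.sum(v**2, axis=0)
    return mean, var

x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x) + 0.1 * np.random.default_rng(0).normal(size=50)
mu, var = gp_posterior(x, y, np.array([np.pi / 2]))
```

It is exactly this cubic Cholesky step that motivates schemes, like the one in the abstract, whose cost scales with the data size rather than its cube.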
Distributional Regression for Data Analysis
Flexible modeling of how an entire distribution changes with covariates is an
important yet challenging generalization of mean-based regression that has seen
growing interest over the past decades in both the statistics and machine
learning literature. This review outlines selected state-of-the-art statistical
approaches to distributional regression, complemented with alternatives from
machine learning. Topics covered include the similarities and differences
between these approaches, extensions, properties and limitations, estimation
procedures, and the availability of software. In view of the increasing
complexity and availability of large-scale data, this review also discusses the
scalability of traditional estimation methods, current trends, and open
challenges. Illustrations are provided using data on childhood malnutrition in
Nigeria and Australian electricity prices. Comment: Accepted for publication in
Annual Review of Statistics and its Application
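Distributional regression can be illustrated in its simplest form: a Gaussian model in which both the mean and the (log-linked) scale are linear in a covariate, fitted jointly by maximum likelihood. A minimal sketch on simulated data (not the review's own examples; all parameter values are illustrative):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)
# Heteroscedastic data: both location and scale depend on x.
y = 1.0 + 2.0 * x + np.exp(0.5 + 1.0 * x) * rng.normal(size=1000)

def negloglik(theta):
    b0, b1, g0, g1 = theta
    mu = b0 + b1 * x
    sigma = np.exp(g0 + g1 * x)   # log link keeps the scale positive
    return -norm.logpdf(y, loc=mu, scale=sigma).sum()

res = minimize(negloglik, x0=np.zeros(4), method="BFGS")
b0, b1, g0, g1 = res.x
```

Mean-based regression would recover only (b0, b1); the distributional model additionally recovers how the spread of the response changes with x.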
Finite-dimensional nonparametric priors: theory and applications
The investigation of flexible classes of discrete priors has been an active research line in Bayesian statistics. Several contributions were devoted to the study of nonparametric priors, including the Dirichlet process, the Pitman–Yor process and normalized random measures with independent increments (NRMI). In contrast, only a few finite-dimensional discrete priors are known, and even fewer come with sufficient theoretical guarantees. In this thesis we aim at filling this gap by presenting several novel general classes of parametric priors closely connected to well-known infinite-dimensional processes, which are recovered as limiting cases. A priori and a posteriori properties are extensively studied. For instance, we determine explicit expressions for the induced random partition, the associated urn schemes and the posterior distributions. Furthermore, we exploit finite-dimensional approximations to facilitate posterior computations in complex models beyond the exchangeability framework. Our theoretical and computational findings are employed in a variety of real statistical problems, covering toxicological, sociological, and marketing applications.
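The urn schemes mentioned above can be illustrated with the classic Dirichlet-process case: the Blackwell–MacQueen (Chinese restaurant) urn, in which item i joins an existing cluster with probability proportional to its size, or opens a new cluster with weight proportional to the concentration parameter. A minimal sampler for the induced random partition (the generic DP urn, not the thesis's finite-dimensional constructions):

```python
import random

def chinese_restaurant_process(n, alpha, seed=0):
    """Sample a random partition of n items from the DP-induced urn.

    Item i joins an existing cluster of size s with probability
    s / (i + alpha), or starts a new cluster with probability
    alpha / (i + alpha).
    """
    rng = random.Random(seed)
    clusters = []   # sizes of the clusters created so far
    labels = []     # cluster label assigned to each item
    for i in range(n):
        r = rng.uniform(0, i + alpha)
        acc = 0.0
        for k, size in enumerate(clusters):
            acc += size
            if r < acc:
                clusters[k] += 1
                labels.append(k)
                break
        else:                       # r fell in the alpha-weighted slot
            clusters.append(1)
            labels.append(len(clusters) - 1)
    return labels, clusters

labels, clusters = chinese_restaurant_process(100, alpha=1.0)
```

Explicit expressions for such partition probabilities are exactly the kind of a priori quantity the thesis derives for its parametric classes.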
Variational Bayes inference in high-dimensional time-varying parameter models
This paper proposes a mean field variational Bayes algorithm for efficient posterior and predictive inference in time-varying parameter models. Our approach involves: i) computationally trivial Kalman filter updates of regression coefficients, ii) a dynamic variable selection prior that removes irrelevant variables in each time period, and iii) a fast approximate state-space estimator of the regression volatility parameter. In an exercise involving simulated data we evaluate the new algorithm numerically and establish its computational advantages. Using macroeconomic data for the US, we find that regression models that combine time-varying parameters with the information in many predictors have the potential to improve forecasts over a number of alternatives.
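The Kalman filter updates referred to in (i) follow the standard recursion for a random-walk time-varying-parameter regression, y_t = x_t' beta_t + eps_t with beta_t = beta_{t-1} + eta_t. A plain-filter sketch (no variable selection or stochastic volatility; the noise variances are fixed for illustration rather than estimated as in the paper):

```python
import numpy as np

def tvp_kalman_filter(y, X, sigma_eps=1.0, sigma_eta=0.1):
    """Kalman filter for y_t = x_t' beta_t + eps_t, beta_t a random walk."""
    T, p = X.shape
    beta = np.zeros(p)
    P = 10.0 * np.eye(p)                 # diffuse-ish initial covariance
    betas = np.zeros((T, p))
    for t in range(T):
        P = P + sigma_eta**2 * np.eye(p)   # predict step
        x = X[t]
        S = x @ P @ x + sigma_eps**2       # innovation variance (scalar)
        K = P @ x / S                      # Kalman gain
        beta = beta + K * (y[t] - x @ beta)
        P = P - np.outer(K, x @ P)
        betas[t] = beta
    return betas

rng = np.random.default_rng(0)
T = 300
X = rng.normal(size=(T, 2))
true_beta = np.column_stack([np.linspace(0, 2, T), np.ones(T)])
y = np.sum(X * true_beta, axis=1) + 0.5 * rng.normal(size=T)
betas = tvp_kalman_filter(y, X, sigma_eps=0.5, sigma_eta=0.05)
```

Each update is a handful of matrix-vector products, which is why the coefficient step of the proposed algorithm is described as computationally trivial.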
Amortised Inference in Bayesian Neural Networks
Meta-learning is a framework in which machine learning models train over a
set of datasets in order to produce predictions on new datasets at test time.
Probabilistic meta-learning has received an abundance of attention from the
research community in recent years, but a problem shared by many existing
probabilistic meta-models is that they require a very large number of datasets
in order to produce high-quality predictions with well-calibrated uncertainty
estimates. In many applications, however, such quantities of data are simply
not available.
In this dissertation we present a significantly more data-efficient approach
to probabilistic meta-learning through per-datapoint amortisation of inference
in Bayesian neural networks, introducing the Amortised Pseudo-Observation
Variational Inference Bayesian Neural Network (APOVI-BNN). First, we show that
the approximate posteriors obtained under our amortised scheme are of similar
or better quality to those obtained through traditional variational inference,
despite the fact that the amortised inference is performed in a single forward
pass. We then discuss how the APOVI-BNN may be viewed as a new member of the
neural process family, motivating the use of neural process training objectives
for potentially better predictive performance on complex problems as a result.
Finally, we assess the predictive performance of the APOVI-BNN against other
probabilistic meta-models in both a one-dimensional regression problem and in a
significantly more complex image completion setting. In both cases, when the
amount of training data is limited, our model is the best in its class. Comment:
This thesis served as the author's final project report for the University of
Cambridge Part IIB Engineering Tripos. 37 pages, 7 figures
Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain
The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units, located in Portugal, is established using Stochastic Frontier Analysis. This methodology allows one to discriminate between measurement error and systematic inefficiency in the estimation process, enabling investigation of the main causes of inefficiency. Several suggestions for efficiency improvement are offered for each hotel studied.
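Stochastic Frontier Analysis decomposes deviations from the frontier into symmetric noise v and one-sided inefficiency u, which is what allows measurement error and systematic inefficiency to be separated. A minimal normal/half-normal SFA estimator on simulated data (illustrative only, not the paper's hotel data), using the classic Aigner–Lovell–Schmidt composed-error likelihood:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
v = 0.3 * rng.normal(size=n)            # symmetric measurement error
u = np.abs(0.5 * rng.normal(size=n))    # half-normal inefficiency (one-sided)
y = 2.0 + 1.0 * x + v - u               # output below the frontier by u

def negloglik(theta):
    b0, b1, log_sv, log_su = theta
    sv, su = np.exp(log_sv), np.exp(log_su)
    eps = y - b0 - b1 * x
    sigma = np.hypot(sv, su)            # sqrt(sv^2 + su^2)
    lam = su / sv
    # Normal/half-normal composed-error density: f(e) = (2/s) phi(e/s) Phi(-e*lam/s)
    ll = (np.log(2.0) - np.log(sigma)
          + norm.logpdf(eps / sigma)
          + norm.logcdf(-eps * lam / sigma))
    return -ll.sum()

# Start the frontier coefficients at least squares, scales at a rough guess.
b_ols = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0]
x0 = np.array([b_ols[0], b_ols[1], np.log(0.3), np.log(0.3)])
res = minimize(negloglik, x0=x0, method="BFGS")
b0_hat, b1_hat = res.x[:2]
```

The estimated ratio of the two scale parameters is what signals how much of the residual variation is genuine inefficiency rather than noise.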