25 research outputs found
Antithetic and Monte Carlo kernel estimators for partial rankings
In the modern age, rankings data is ubiquitous and useful for a variety of
applications such as recommender systems, multi-object tracking and
preference learning. However, most rankings data encountered in the real world
is incomplete, which prevents the direct application of existing modelling
tools for complete rankings. Our contribution is a novel way to extend kernel
methods for complete rankings to partial rankings, via consistent Monte Carlo
estimators for Gram matrices: matrices of kernel values between pairs of
observations. We also present a novel variance reduction scheme based on an
antithetic variate construction between permutations to obtain an improved
estimator for the Mallows kernel. The corresponding antithetic kernel estimator
has lower variance and we demonstrate empirically that it has a better
performance in a variety of Machine Learning tasks. Both kernel estimators are
based on extending kernel mean embeddings to the embedding of a set of full
rankings consistent with an observed partial ranking. They form a
computationally tractable alternative to previous approaches for partial
rankings data. An overview of the existing kernels and metrics for permutations
is also provided.
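The abstract's antithetic construction is defined between permutations; as a hedged illustration of the underlying principle, the textbook antithetic-variate scheme below pairs each uniform draw u with 1 - u, so the negatively correlated pair reduces the variance of a Monte Carlo mean estimate for a monotone integrand (a generic sketch, not the paper's permutation-specific construction):

```python
import numpy as np

def plain_mc(f, n, rng):
    # Standard Monte Carlo estimate of E[f(U)], U ~ Uniform(0, 1).
    u = rng.random(n)
    return f(u).mean()

def antithetic_mc(f, n, rng):
    # Antithetic estimate: average f over the negatively correlated
    # pair (u, 1 - u); for monotone f this lowers the variance.
    u = rng.random(n // 2)
    return 0.5 * (f(u) + f(1.0 - u)).mean()

rng = np.random.default_rng(0)
f = lambda u: u ** 2  # E[f(U)] = 1/3

# Compare the spread of the two estimators over repeated runs.
plain = [plain_mc(f, 1000, rng) for _ in range(200)]
anti = [antithetic_mc(f, 1000, rng) for _ in range(200)]
print(np.var(plain), np.var(anti))  # antithetic variance is smaller
```

Both estimators use the same total number of function evaluations per run, so the variance reduction comes purely from the negative correlation within each pair.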
Efficient Bayesian inference via Monte Carlo and machine learning algorithms
International Mention in the doctoral degree.
In many fields of science and engineering, we are faced with an inverse problem where
we aim to recover an unobserved parameter or variable of interest from a set of observed
variables. Bayesian inference is a probabilistic approach for inferring this unknown parameter
that has become extremely popular, finding application in myriad problems in
fields such as machine learning, signal processing, remote sensing and astronomy. In
Bayesian inference, all the information about the parameter is summarized by the posterior
distribution. Unfortunately, the study of the posterior distribution requires the computation
of complicated integrals, that are analytically intractable and need to be approximated.
Monte Carlo is a large family of sampling algorithms for performing optimization
and numerical integration that has become the main workhorse for carrying out Bayesian
inference. The main idea of Monte Carlo is that we can approximate the posterior distribution
by a set of samples, obtained by an iterative process that involves sampling from a
known distribution. Markov chain Monte Carlo (MCMC) and importance sampling (IS)
are two important groups of Monte Carlo algorithms. This thesis focuses on developing
and analyzing Monte Carlo algorithms (either MCMC, IS or combination of both)
under different challenging scenarios presented below. In summary, in this thesis we address
several important points, enumerated (a)–(f), that currently represent a challenge in
Bayesian inference via Monte Carlo. A first challenge that we address is the problematic
exploration of the parameter space by off-the-shelf MCMC algorithms when there
is (a) multimodality, or with (b) highly concentrated posteriors. Another challenge that
we address is the (c) proposal construction in IS. Furthermore, in recent applications we
need to deal with (d) expensive posteriors, and/or we need to handle (e) noisy posteriors.
Finally, the Bayesian framework also offers a way of comparing competing hypotheses
(models) in a principled way by means of marginal likelihoods. Hence, a task that arises
as of fundamental importance is (f) marginal likelihood computation.
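As a hedged illustration of the off-the-shelf MCMC algorithms referred to above, the sketch below is a minimal random-walk Metropolis-Hastings sampler on an illustrative 1-D Gaussian posterior (a generic textbook example, not an algorithm from this thesis):

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_iter, step, rng):
    """Minimal random-walk Metropolis-Hastings sampler."""
    x, lp = x0, log_target(x0)
    samples = np.empty(n_iter)
    for i in range(n_iter):
        prop = x + step * rng.standard_normal()  # Gaussian proposal
        lp_prop = log_target(prop)
        # Accept with probability min(1, target(prop)/target(x)).
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples[i] = x
    return samples

rng = np.random.default_rng(1)
log_post = lambda x: -0.5 * (x - 2.0) ** 2  # N(2, 1) up to a constant
samples = metropolis_hastings(log_post, 0.0, 20000, 1.0, rng)
print(samples[2000:].mean())  # close to the true posterior mean 2.0
```

Challenges (a) and (b) show up directly in this sketch: a multimodal or highly concentrated `log_post` would leave the random walk stuck in one region, which is what the population and minibatch strategies of Chapters 2 and 3 address.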
Chapters 2 and 3 deal with (a), (b), and (c). In Chapter 2, we propose a novel population
MCMC algorithm called Parallel Metropolis-Hastings Coupler (PMHC). PMHC is
very suitable for multimodal scenarios since it works with a population of states, instead
of a single one, hence allowing for sharing information. PMHC combines independent
exploration by the use of parallel Metropolis-Hastings algorithms, with cooperative exploration
by the use of a population MCMC technique called Normal Kernel Coupler.
In Chapter 3, population MCMC is combined with IS within the layered adaptive IS
(LAIS) framework. The combination of MCMC and IS serves two purposes: first, it
provides an automatic proposal construction; second, it increases robustness, since the
MCMC samples are not used directly to form the sample approximation of the posterior.
The use of minibatches of data is proposed to deal with highly concentrated posteriors.
Other extensions for reducing the costs with respect to the vanilla LAIS framework, based on recycling and clustering, are discussed and analyzed.
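In the spirit of the layered scheme just described, the hedged toy below runs a short MCMC chain only to place the means of a Gaussian mixture proposal, and then importance-samples from that mixture; the chain states are never used directly as posterior samples (an illustrative sketch, not the thesis's LAIS implementation):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
log_post = lambda x: -0.5 * (x - 3.0) ** 2  # unnormalized N(3, 1)

# Upper layer: a short random-walk MH run supplies proposal locations.
x, locs = 0.0, []
for _ in range(500):
    prop = x + rng.standard_normal()
    if np.log(rng.random()) < log_post(prop) - log_post(x):
        x = prop
    locs.append(x)
locs = np.array(locs[100:])  # discard burn-in

# Lower layer: importance sampling from a Gaussian mixture centered
# at the chain states (the chain is not used directly as samples).
idx = rng.integers(len(locs), size=5000)
z = locs[idx] + 1.5 * rng.standard_normal(5000)
q = norm.pdf(z[:, None], loc=locs[None, :], scale=1.5).mean(axis=1)
w = np.exp(log_post(z)) / q  # unnormalized importance weights
post_mean = np.sum(w * z) / np.sum(w)
print(post_mean)  # self-normalized IS estimate of the posterior mean
```

The mixture proposal inherits its shape from wherever the chain has visited, which is what makes the lower-layer estimate robust to imperfect chain mixing.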
Chapters 4, 5 and 6 deal with (c), (d) and (e). The use of nonparametric approximations
of the posterior plays an important role in the design of efficient Monte Carlo algorithms.
Nonparametric approximations of the posterior can be obtained using machine learning
algorithms for nonparametric regression, such as Gaussian Processes and Nearest Neighbors.
Then, they can serve as cheap surrogate models, or for building efficient proposal
distributions. In Chapter 4, in the context of expensive posteriors, we propose adaptive
quadratures of posterior expectations and the marginal likelihood using a sequential algorithm
that builds and refines a nonparametric approximation of the posterior. In Chapter
5, we propose Regression-based Adaptive Deep Importance Sampling (RADIS), an adaptive
IS algorithm that uses a nonparametric approximation of the posterior as the proposal
distribution. We illustrate the proposed algorithms in applications of astronomy and remote
sensing. Chapters 4 and 5 consider noiseless posterior evaluations for building the
nonparametric approximations. More generally, in Chapter 6 we give an overview and
classification of MCMC and IS schemes using surrogates built with noisy evaluations.
The motivation here is the study of posteriors that are both costly and noisy. The classification
reveals a connection between algorithms that use the posterior approximation as a
cheap surrogate, and algorithms that use it for building an efficient proposal. We illustrate
specific instances of the classified schemes in an application of reinforcement learning.
Finally, in Chapter 7 we study noisy IS, namely, IS when the posterior evaluations are
noisy, and derive optimal proposal distributions for the different estimators in this setting.
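A hedged toy version of the noisy-IS setting studied in Chapter 7: each evaluation of the unnormalized posterior is corrupted by unbiased multiplicative noise, yet the IS estimate of the normalizing constant remains close to the truth (an illustrative sketch with an invented Gaussian target, not the thesis's optimal-proposal derivation):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

def noisy_pi(x):
    # Unnormalized N(0, 1) density observed through unbiased
    # multiplicative noise, mimicking a noisy posterior evaluation.
    noise = np.exp(rng.standard_normal(x.shape) * 0.3 - 0.5 * 0.3 ** 2)
    return np.exp(-0.5 * x ** 2) * noise

# Proposal: N(0, 2); the true normalizing constant is sqrt(2*pi).
x = 2.0 * rng.standard_normal(200000)
w = noisy_pi(x) / norm.pdf(x, scale=2.0)
Z_hat = w.mean()
print(Z_hat, np.sqrt(2 * np.pi))  # estimate stays close to sqrt(2*pi)
```

The noise has mean one by construction, so it inflates the variance of the weights without biasing the estimator; choosing proposals that control this inflated variance is the question the optimal-proposal analysis addresses.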
Chapter 8 deals with (f). In Chapter 8, we provide an exhaustive review of methods
for marginal likelihood computation, with special focus on the ones based on Monte
Carlo. We derive many connections among the methods and compare them in several
simulation setups. Finally, in Chapter 9 we summarize the contributions of this thesis
and discuss some potential avenues of future research.
Doctoral Program in Mathematical Engineering at Universidad Carlos III de Madrid. President: Valero Laparra Pérez-Muelas. Secretary: Michael Peter Wiper. Committee member: Omer Deniz Akyildi
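The simplest member of the Monte Carlo marginal-likelihood family reviewed in Chapter 8 averages the likelihood over prior draws. The hedged sketch below uses an invented conjugate Gaussian model so the answer can be checked in closed form:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

# Conjugate toy model: theta ~ N(0, 1), one observation y | theta ~ N(theta, 1).
y = 0.7
# Naive Monte Carlo: Z = E_prior[likelihood(theta)], averaged over prior draws.
theta = rng.standard_normal(100000)
Z_hat = norm.pdf(y, loc=theta, scale=1.0).mean()

# Closed form: the marginal distribution of y is N(0, 2).
Z_true = norm.pdf(y, loc=0.0, scale=np.sqrt(2.0))
print(Z_hat, Z_true)
```

Prior sampling works here because the prior and posterior overlap substantially; when they do not, the estimator's variance explodes, which is why the review covers many more refined alternatives.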
Efficient XAI Techniques: A Taxonomic Survey
Recently, there has been a growing demand for the deployment of Explainable
Artificial Intelligence (XAI) algorithms in real-world applications. However,
traditional XAI methods typically suffer from a high computational complexity
problem, which discourages the deployment of real-time systems to meet the
time-demanding requirements of real-world scenarios. Although many approaches
have been proposed to improve the efficiency of XAI methods, a comprehensive
understanding of the achievements and challenges is still needed. To this end,
in this paper we provide a review of efficient XAI. Specifically, we categorize
existing techniques of XAI acceleration into efficient non-amortized and
efficient amortized methods. The efficient non-amortized methods focus on
data-centric or model-centric acceleration upon each individual instance. In
contrast, amortized methods focus on learning a unified distribution of model
explanations, following the predictive, generative, or reinforcement
frameworks, to rapidly derive multiple model explanations. We also analyze the
limitations of an efficient XAI pipeline from the perspectives of the training
phase, the deployment phase, and the use scenarios. Finally, we summarize the
challenges of deploying XAI acceleration methods to real-world scenarios,
overcoming the trade-off between faithfulness and efficiency, and the selection
of different acceleration methods.
Comment: 15 pages, 3 figures
BaCO: A Fast and Portable Bayesian Compiler Optimization Framework
We introduce the Bayesian Compiler Optimization framework (BaCO), a general
purpose autotuner for modern compilers targeting CPUs, GPUs, and FPGAs. BaCO
provides the flexibility needed to handle the requirements of modern autotuning
tasks. Particularly, it deals with permutation, ordered, and continuous
parameter types along with both known and unknown parameter constraints. To
reason about these parameter types and efficiently deliver high-quality code,
BaCO uses Bayesian optimization algorithms specialized towards the autotuning
domain. We demonstrate BaCO's effectiveness on three modern compiler systems:
TACO, RISE & ELEVATE, and HPVM2FPGA for CPUs, GPUs, and FPGAs respectively. For
these domains, BaCO outperforms current state-of-the-art autotuners by
delivering on average 1.36x-1.56x faster code with a tiny search budget, and
BaCO is able to reach expert-level performance 2.9x-3.9x faster.
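The abstract does not spell out BaCO's specialized algorithms, so as a hedged illustration of the kind of search an autotuner performs, the sketch below runs a minimal Bayesian-optimization loop: a Gaussian-process surrogate fit to observed "runtimes" and an expected-improvement criterion over a candidate grid (all names and the objective are invented for illustration):

```python
import numpy as np
from scipy.stats import norm

def rbf(a, b, ls=0.2):
    # Squared-exponential kernel between two 1-D point sets.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # GP posterior mean and standard deviation at candidates Xs.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    alpha = np.linalg.solve(K, y)
    mu = Ks.T @ alpha
    v = np.linalg.solve(K, Ks)
    var = np.clip(1.0 - np.sum(Ks * v, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sd, best):
    # EI for minimization: expected reduction below the current best.
    z = (best - mu) / sd
    return (best - mu) * norm.cdf(z) + sd * norm.pdf(z)

rng = np.random.default_rng(5)
f = lambda x: (x - 0.3) ** 2  # stand-in "runtime" to minimize
cands = np.linspace(0.0, 1.0, 101)
X = rng.random(3)                 # initial random configurations
y = f(X)
for _ in range(10):               # BO loop: fit, score, evaluate
    mu, sd = gp_posterior(X, y, cands)
    nxt = cands[np.argmax(expected_improvement(mu, sd, y.min()))]
    X, y = np.append(X, nxt), np.append(y, f(nxt))
print(X[np.argmin(y)], y.min())   # best configuration found
```

Real autotuning spaces mix permutation, ordered, and constrained parameters, which is precisely where BaCO's specialized kernels depart from this plain continuous sketch.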
Stochastic Methods for Fine-Grained Image Segmentation and Uncertainty Estimation in Computer Vision
In this dissertation, we exploit concepts of probability theory, stochastic methods and machine learning to address three existing limitations of deep learning-based models for image understanding. First, although convolutional neural networks (CNN) have substantially improved the state of the art in image understanding, conventional CNNs provide segmentation masks that poorly adhere to object boundaries, a critical limitation for many potential applications. Second, training deep learning models requires large amounts of carefully selected and annotated data, but large-scale annotation of image segmentation datasets is often prohibitively expensive. And third, conventional deep learning models also lack the capability of uncertainty estimation, which compromises both decision making and model interpretability. To address these limitations, we introduce the Region Growing Refinement (RGR) algorithm, an unsupervised post-processing algorithm that exploits Monte Carlo sampling and pixel similarities to propagate high-confidence labels into regions of low-confidence classification. The probabilistic Region Growing Refinement (pRGR) provides RGR with a rigorous mathematical foundation that exploits concepts of Bayesian estimation and variance reduction techniques. Experiments demonstrate both the effectiveness of (p)RGR for the refinement of segmentation predictions and its suitability for uncertainty estimation, since its variance estimates obtained in the Monte Carlo iterations are highly correlated with segmentation accuracy. We also introduce FreeLabel, an intuitive open-source web interface that exploits RGR to allow users to obtain high-quality segmentation masks with just a few freehand scribbles, in a matter of seconds. Designed to benefit the computer vision community, FreeLabel can be used for both crowdsourced and private annotation and has a modular structure that can be easily adapted for any image dataset.
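The core idea behind RGR, propagating high-confidence labels into low-confidence regions of similar appearance, can be sketched in a hedged 1-D toy (invented data and thresholds, deterministic neighbors rather than the dissertation's Monte Carlo sampling):

```python
import numpy as np

def grow_labels(intensity, labels, conf, thresh=0.6, tol=0.1):
    """Propagate high-confidence labels to similar neighboring pixels.

    labels: initial class per pixel; conf: classifier confidence.
    Pixels below `thresh` inherit a neighbor's label when their
    intensities differ by less than `tol` (toy 1-D version).
    """
    out, conf = labels.copy(), conf.copy()
    changed = True
    while changed:
        changed = False
        for i in range(len(out)):
            if conf[i] >= thresh:
                continue
            for j in (i - 1, i + 1):
                if 0 <= j < len(out) and conf[j] >= thresh and \
                        abs(intensity[i] - intensity[j]) < tol:
                    out[i], conf[i] = out[j], conf[j]
                    changed = True
                    break
    return out

intensity = np.array([0.1, 0.12, 0.11, 0.8, 0.82, 0.81])
labels = np.array([0, 0, 0, 1, 1, 1])
conf = np.array([0.9, 0.3, 0.4, 0.2, 0.9, 0.9])
print(grow_labels(intensity, labels, conf))  # prints [0 0 0 1 1 1]
```

In the probabilistic variant, repeating a randomized version of this growth many times and looking at the variance of the outcomes is what yields the uncertainty estimates described above.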
The practical relevance of the methods developed in this dissertation is illustrated through applications in agricultural and healthcare-related domains. We have combined RGR and modern CNNs for fine segmentation of fruit flowers, motivated by the importance of automated bloom intensity estimation for optimization of fruit orchard management and, possibly, automating procedures such as flower thinning and pollination. We also exploited an early version of FreeLabel to annotate novel datasets for segmentation of fruit flowers, which are currently publicly available. Finally, this dissertation also describes works on fine segmentation and gaze estimation for images collected from assisted living environments, with the ultimate goal of assisting geriatricians in evaluating the health status of patients in such facilities.
Essays in Likelihood-Based Computational Econometrics
The theory of probabilities is basically
only common sense reduced to a calculus.
Pierre Simon Laplace, 1812
The quote above is from Pierre Simon Laplace’s introduction to his seminal work Théorie
analytique des probabilités, in which he lays the groundwork for what is currently known
as Bayesian analysis. He proceeds to describe probability theory, and statistical inference,
as a method that makes one estimate accurately what right-minded people feel by a sort
of instinct, often without being able to give a reason for it. (translation from French: Dale,
1995) This statement contains a profound truth and insight: Probability theory offers
a clean and simple recipe for reasoning under uncertainty which I experienced as eye-opening
when I first learned about it. As my knowledge of probability theory increased,
however, I also realized that in isolation this quote presents things as much simpler than
they actually are: Reducing common sense to a calculus is extremely difficult to do well
in practice. Translating our common sense into the language of probabilities takes a lot of
practice, and if done accurately it often leads to a calculus without any exact solutions. It
is therefore the task of statisticians and econometricians to find practical ways of reducing
our common sense to calculus, and to devise smart new methods for efficiently doing the
resulting calculations. This work represents my contribution towards these goals.
Dependence: From classical copula modeling to neural networks
The development of tools to measure and to model dependence in high-dimensional data is of great interest in a wide range of applications including finance, risk management, bioinformatics and environmental sciences. The copula framework, which allows us to extricate the underlying dependence structure of any multivariate distribution from its univariate marginals, has garnered growing popularity over the past few decades. Within the broader context of this framework, we develop several novel statistical methods and tools for analyzing, interpreting and modeling dependence.
In the first half of this thesis, we advance classical copula modeling by introducing new dependence measures and parametric dependence models. To that end, we propose a framework for quantifying dependence between random vectors. Using the notion of a collapsing function, we summarize random vectors by single random variables, referred to as collapsed random variables. In the context of this collapsing function framework, we develop various tools to characterize the dependence between random vectors including new measures of association computed from the collapsed random variables, asymptotic results required to construct confidence intervals for these measures, collapsed copulas to analytically summarize the dependence for certain collapsing functions and a graphical assessment of independence between groups of random variables. We explore several suitable collapsing functions in theoretical and empirical settings. To showcase tools derived from our framework, we present data applications in bioinformatics and finance.
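The collapsing-function idea just described can be illustrated with a hedged toy: each random vector is summarized by a single collapsed random variable (here the componentwise sum, one plausible collapsing function), and the association between the collapsed variables summarizes the dependence between the vectors (invented data, not the thesis's full framework or its asymptotic theory):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(6)

# Two 3-dimensional random vectors sharing a common latent factor.
n = 2000
factor = rng.standard_normal(n)
X = factor[:, None] + rng.standard_normal((n, 3))
Y = factor[:, None] + rng.standard_normal((n, 3))

# Collapsing function: map each vector to a single random variable.
cx, cy = X.sum(axis=1), Y.sum(axis=1)

# A rank-based measure of association between the collapsed
# variables quantifies the dependence between the two vectors.
rho, pval = spearmanr(cx, cy)
print(rho)  # clearly positive: the vectors share the latent factor
```

Because Spearman's rho is rank-based, the same collapsed-variable summary works regardless of the marginal distributions, consistent with the copula viewpoint of separating dependence from marginals.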
Furthermore, we contribute to the growing literature on parametric copula modeling by generalizing the class of Archimax copulas (AXCs) to hierarchical Archimax copulas (HAXCs). AXCs are typically used to model the dependence at non-extreme levels while accounting for any asymptotic dependence between extremes. HAXCs then enhance the flexibility of AXCs by their ability to model partial asymmetries. We explore two ways of inducing hierarchies. Furthermore, we present various examples of HAXCs along with their stochastic representations, which are used to establish corresponding sampling algorithms.
While the burgeoning research on the construction of parametric copulas has yielded some powerful tools for modeling dependence, the flexibility of these models is already limited in moderately high dimensions and they can often fail to adequately characterize complex dependence structures that arise in real datasets. In the second half of this thesis, we explore utilizing generative neural networks instead of parametric dependence models. In particular, we investigate the use of a type of generative neural network known as the generative moment matching network (GMMN) for two critical dependence modeling tasks. First, we demonstrate how GMMNs can be utilized to generate quasi-random samples from a large variety of multivariate distributions. These GMMN quasi-random samples can then be used to obtain low-variance estimates of quantities of interest. Compared to classical parametric copula methods for multivariate quasi-random sampling, GMMNs provide a more flexible and universal approach. Moreover, we theoretically and numerically corroborate the variance reduction capabilities of GMMN randomized quasi-Monte Carlo estimators. Second, we propose a GMMN--GARCH approach for modeling dependent multivariate time series, where ARMA--GARCH models are utilized to capture the temporal dependence within each univariate marginal time series and GMMNs are used to model the underlying cross-sectional dependence. If the number of marginal time series is large, we embed an intermediate dimension reduction step within our framework. The primary objective of our proposed approach is to produce empirical predictive distributions (EPDs), also known as probabilistic forecasts. In turn, these EPDs are also used to forecast certain risk measures, such as value-at-risk. 
Furthermore, in the context of modeling yield curves and foreign exchange rate returns, we show that the flexibility of our GMMN--GARCH models leads to better EPDs and risk-measure forecasts, compared to classical copula--GARCH models.