6 research outputs found

    Sequential and adaptive Bayesian computation for inference and optimization

    With the advent of cheap and ubiquitous measurement devices, more data is now measured, recorded, and archived in a relatively short span of time than was recorded throughout all of history. Moreover, advances in computation have made it possible to model much more complicated phenomena and to use these vast amounts of data to calibrate the resulting high-dimensional models. In this thesis, we are interested in two fundamental problems that are faced repeatedly in practice as the dimensions of models and datasets grow steadily: the problem of inference in high-dimensional models and the problem of optimization when the number of data points is very large. The inference problem becomes difficult when the model one wants to calibrate and estimate is defined in a high-dimensional space. The behavior of computational algorithms in high-dimensional spaces is complicated and defies intuition: computational methods that work accurately for inferring low-dimensional models, for example, may fail to deliver the same performance on high-dimensional models. In recent years, due to significant interest in high-dimensional models, there has been a plethora of work in signal processing and machine learning on computational methods that are robust in high-dimensional spaces. In particular, the high-dimensional stochastic filtering problem has attracted significant attention, as it arises in several fields of crucial importance such as geophysics, aerospace, and control. A class of algorithms called particle filters has received particular attention and become a fruitful field of research because of their accuracy and robustness in low-dimensional systems. In short, these methods maintain a cloud of particles (samples in the state space) that describes the empirical probability distribution of the state variable of interest.
Particle filters use a model of the phenomenon of interest to propagate and predict future states, and an observation model to assimilate the observations and correct the state estimates. The most common particle filter, the bootstrap particle filter (BPF), consists of an iterative sampling-weighting-resampling scheme. However, BPFs largely fail at inferring high-dimensional dynamical systems, for a number of reasons. In this work, we propose a novel particle filter, the nudged particle filter (NuPF), which specifically aims at improving the performance of particle filters in high-dimensional systems. The algorithm relies on the idea of nudging, which has been widely used in the geophysics literature to tackle high-dimensional inference problems. In addition to the standard sampling-weighting-resampling steps of the particle filter, we define a general nudging step based on the gradient of the likelihood, which generalizes some of the nudging schemes proposed in the literature. This step modifies the particles generated in the sampling step using the gradients of the likelihoods; specifically, it moves a fraction of the particles towards regions of high likelihood. This scheme results in significantly improved behavior in high-dimensional models, and the resulting NuPF is able to track high-dimensional systems successfully. Unlike the nudging schemes previously proposed in the literature, the NuPF does not rely on Gaussianity assumptions and can be defined for a general likelihood. We prove analytically that, because only a fraction of the particles is moved rather than all of them, the algorithm attains a convergence rate that matches that of standard Monte Carlo algorithms. More precisely, the NuPF has the same asymptotic convergence guarantees as the bootstrap particle filter. As a byproduct, we also show that the nudging step improves the robustness of the particle filter against model misspecification.
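To make the sampling, nudging, weighting and resampling steps concrete, here is a minimal sketch of a bootstrap particle filter with a gradient nudging step. It assumes a scalar linear-Gaussian model x_t = a·x_{t-1} + u_t, y_t = x_t + v_t, nudges a random 10% of the particles with a single gradient step on the log-likelihood, and computes the weights at the nudged positions without any correction term. The names `simulate`, `nudged_bpf`, `nudge_frac` and `nudge_step` are hypothetical; this is an illustration of the idea, not the thesis's exact NuPF.

```python
import math
import random


def simulate(T=50, a=0.9, q_var=1.0, r_var=1.0, seed=1):
    """Simulate a scalar linear-Gaussian state-space model (hypothetical test data)."""
    rng = random.Random(seed)
    x, states, obs = 0.0, [], []
    for _ in range(T):
        x = a * x + rng.gauss(0.0, math.sqrt(q_var))
        states.append(x)
        obs.append(x + rng.gauss(0.0, math.sqrt(r_var)))
    return states, obs


def nudged_bpf(ys, n_particles=200, a=0.9, q_var=1.0, r_var=1.0,
               nudge_frac=0.1, nudge_step=0.5, seed=0):
    """Bootstrap particle filter with a gradient nudging step (illustrative sketch)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    n_nudge = int(nudge_frac * n_particles)
    means, log_ml = [], 0.0
    for y in ys:
        # 1) Sampling: propagate each particle through the assumed transition model.
        xs = [a * x + rng.gauss(0.0, math.sqrt(q_var)) for x in xs]
        # 2) Nudging: move a random subset one step along grad_x log g(y | x) = (y - x) / r_var.
        for i in rng.sample(range(n_particles), n_nudge):
            xs[i] += nudge_step * (y - xs[i]) / r_var
        # 3) Weighting: Gaussian log-likelihood at the (possibly nudged) particles.
        logw = [-0.5 * (y - x) ** 2 / r_var - 0.5 * math.log(2 * math.pi * r_var)
                for x in xs]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        total = sum(w)
        log_ml += m + math.log(total / n_particles)  # running log marginal-likelihood estimate
        means.append(sum(wi * x for wi, x in zip(w, xs)) / total)
        # 4) Resampling: multinomial, proportional to the weights.
        xs = rng.choices(xs, weights=w, k=n_particles)
    return means, log_ml
```

Setting `nudge_frac=0.0` turns the sketch back into a plain bootstrap particle filter, which makes it easy to compare the two variants, including their log marginal-likelihood estimates, on the same observations.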
Model misspecification occurs when the true data-generating system and the model posed by the user of the algorithm differ significantly. In this case, most computational inference methods fail because of the discrepancy between the modeling assumptions and the observed data. We show that the nudging step increases the robustness of particle filters against model misspecification. Specifically, we prove that the NuPF generates particle systems with higher marginal likelihoods than the standard bootstrap particle filter. This theoretical result is obtained by showing that the NuPF can be interpreted as a bootstrap particle filter for a modified state-space model. Finally, we demonstrate the empirical behavior of the NuPF on several examples: high-dimensional linear state-space models, a misspecified Lorenz 63 model, a high-dimensional Lorenz 96 model, and a misspecified object-tracking model. In all examples, the NuPF infers the states successfully. The second problem, the so-called scalability problem in optimization, arises from the large number of data points in modern datasets. With the increasing abundance of data, many problems in signal processing, statistical inference, and machine learning turn into large-scale optimization problems. For example, in signal processing one might be interested in estimating a sparse signal given a large number of corrupted observations. Similarly, maximum-likelihood inference problems in statistics result in large-scale optimization problems. Another significant application domain is machine learning, where the most important training methods are defined as optimization problems. For these problems, classical optimization methods developed over the past decades are inefficient, since they need to compute function evaluations or gradients over all the data for a single iteration.
For this reason, a class of optimization methods termed stochastic optimization methods has emerged. The algorithms of this class are designed to tackle problems defined over a large number of data points. In short, these methods use a subsample of the dataset to update the parameter estimate, and do so iteratively until some convergence criterion is met. However, there is a major difficulty to be addressed: although the convergence theory for these algorithms is understood, they can behave unstably in practice. In particular, the most commonly used stochastic optimization method, stochastic gradient descent, can easily diverge if its step-size is poorly set. Over the years, practitioners have developed a number of rules of thumb to alleviate these stability issues. We argue in this thesis that one way to develop robust stochastic optimization methods is to frame them as inference methods. In particular, we show that stochastic optimization schemes can be recast and understood as inference algorithms. Framing the problem as an inference problem opens the way to comparing these methods with optimal inference algorithms and understanding why they might fail or behave unstably. In this vein, we show that there is an intrinsic relationship between a class of stochastic optimization methods, called incremental proximal methods, and Kalman (and extended Kalman) filters. The filtering approach to stochastic optimization results in an automatic calibration of the step-size, which removes the instabilities caused by poorly chosen step-sizes. The probabilistic interpretation of stochastic optimization problems also paves the way to developing new optimization methods based on strategies that are popular in the inference literature. In particular, one can use sampling methods to solve the inference problem and hence obtain the global minimum.
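The link between incremental proximal steps and Kalman filtering can be checked in its simplest form. The sketch below assumes scalar linear regression, with observations y_k = a_k·θ + noise of variance σ² (`noise_var`), one data point per step, and the per-point cost f_k(θ) = (y_k − a_k·θ)²/(2σ²). Under these assumptions the proximal step with step-size γ has a closed form, and it coincides with the Kalman mean update when γ equals the current posterior variance; that variance then shrinks after every observation, acting as an automatically decreasing step-size. All function names and the synthetic data are hypothetical.

```python
import random


def ipm_step(theta, a, y, gamma, noise_var):
    """Incremental proximal step: argmin_t (y - a*t)^2 / (2*noise_var) + (t - theta)^2 / (2*gamma)."""
    return theta + gamma * a * (y - a * theta) / (noise_var + gamma * a * a)


def kalman_step(mean, var, a, y, noise_var):
    """Kalman update for a static parameter observed through y = a*theta + noise."""
    s = noise_var + a * a * var      # innovation variance
    gain = var * a / s               # Kalman gain
    return mean + gain * (y - a * mean), var - gain * a * var


def kalman_stream(data, mean=0.0, var=10.0, noise_var=0.25):
    """Run the Kalman update over a data stream and record the implied step-sizes."""
    step_sizes = []
    for a, y in data:
        step_sizes.append(var)       # gamma_k = current posterior variance
        mean, var = kalman_step(mean, var, a, y, noise_var)
    return mean, step_sizes


def make_stream(n=200, theta=2.0, noise_sd=0.5, seed=0):
    """Hypothetical synthetic data: pairs (a_k, y_k) with y_k = a_k * theta + noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = rng.uniform(0.5, 1.5)
        data.append((a, theta * a + rng.gauss(0.0, noise_sd)))
    return data
```

With a fixed γ, `ipm_step` needs the step-size to be tuned by hand; in `kalman_stream` the same update is driven by the posterior variance, which is the sense in which the filtering view calibrates the step-size automatically.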
Following this idea, we propose a parallel sequential Monte Carlo optimizer (PSMCO), aimed at solving stochastic optimization problems. The PSMCO is a zeroth-order method, i.e., it does not use gradients, and it uses only subsets of the data points to move at each iteration. The PSMCO obtains an estimate of a global minimum at each iteration by means of a cheap kernel density estimator. We prove that the resulting estimator converges to a global minimum almost surely as the number of Monte Carlo samples tends to infinity. We also demonstrate empirically that the algorithm is able to reconstruct multiple global minima and solve difficult global optimization problems. By further exploiting the relationship between inference and optimization, we also propose a probabilistic, online matrix factorization method, termed the dictionary filter, to solve large-scale matrix factorization problems. Matrix factorization methods have received significant interest from the machine learning community due to their expressive representations of high-dimensional data and the interpretability of their estimates. As most matrix factorization methods are defined as optimization problems, they suffer from the same issues as stochastic optimization methods. In particular, when using stochastic gradient descent, one may need extensive trial and error to choose a step-size. To alleviate these problems, we introduce a matrix-variate probabilistic model for which inference results in a matrix factorization scheme. The scheme is online, in the sense that it uses only a single data point at a time to update the factors. The algorithm is related to optimization schemes, namely to the incremental proximal method defined over a matrix-variate cost function.
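As a rough illustration of a zeroth-order, sampling-based optimizer of this kind, the sketch below runs a single sequential Monte Carlo population on a minibatch least-squares cost. The PSMCO, as its name indicates, is parallel; this sketch uses only one population and evaluates the kernel density estimate once at the end rather than at every iteration. Each iteration jitters the particles, weights them by exp(−f_batch(θ)) computed on a random minibatch only, and resamples; a Gaussian kernel density estimate evaluated at the particles then selects the highest-density particle as the minimizer estimate. The cost, jitter scale, bandwidth and function names are assumptions made for the example.

```python
import math
import random


def make_data(n=500, center=1.5, sd=0.5, seed=1):
    """Hypothetical dataset; the full cost F(theta) = mean (theta - d)^2 is minimized at mean(data)."""
    rng = random.Random(seed)
    return [rng.gauss(center, sd) for _ in range(n)]


def smc_minimize(data, n_particles=200, n_iter=100, batch_size=10,
                 jitter_sd=0.05, bandwidth=0.1, seed=0):
    """Single-population SMC optimizer sketch: no gradients, minibatch costs only."""
    rng = random.Random(seed)
    thetas = [rng.uniform(-5.0, 5.0) for _ in range(n_particles)]
    for _ in range(n_iter):
        batch = rng.sample(data, batch_size)
        # Move: random-walk jitter keeps the population diverse (zeroth order).
        thetas = [t + rng.gauss(0.0, jitter_sd) for t in thetas]
        # Weight: log w = -f_batch(theta), the mean squared loss on the minibatch.
        logw = [-sum((t - d) ** 2 for d in batch) / batch_size for t in thetas]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        # Resample: multinomial, proportional to the weights.
        thetas = rng.choices(thetas, weights=w, k=n_particles)

    def kde(x):
        # Unnormalized Gaussian kernel density estimate built from the particles.
        return sum(math.exp(-0.5 * ((x - t) / bandwidth) ** 2) for t in thetas)

    # Estimate of the minimizer: the particle where the estimated density is highest.
    return max(thetas, key=kde)
```

Repeated weighting by exp(−f_batch) makes the population concentrate where the accumulated cost is low, which is why the density mode, rather than the weighted mean, is a natural minimizer estimate when several global minima may exist.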
Using the intuition developed for the optimization-inference relationship, we devise a model that results in update rules for matrix factorization similar to those of the incremental proximal method. However, the probabilistic updates are more stable and efficient. Moreover, the algorithm has no step-size parameter to tune, as its role is played by the posterior covariance matrix. We demonstrate the utility of the algorithm on a missing-data problem and a video processing problem. We show that the algorithm can be successfully used in machine learning problems and that several promising extensions of the method can be constructed easily.
Official Doctoral Programme in Multimedia and Communications (Programa Oficial de Doctorado en Multimedia y Comunicaciones). Chair: Ricardo Cao Abad. Secretary: Michael Peter Wiper. Member: Nicholas Paul Whitele
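The dictionary filter's claim that a posterior covariance can stand in for the step-size can be illustrated with a simplified dictionary update. The sketch below assumes a matrix-variate Gaussian posterior over the dictionary C (an m×n matrix) whose rows share a single n×n covariance V and, unlike the method described above, assumes the coefficient vector x_k of each data point y_k is known. Each step is then a Kalman (recursive least-squares) update in which V scales the correction, much like a matrix-valued step-size. Function names and synthetic data are hypothetical.

```python
import random


def dictionary_update(C, V, x, y, noise_var):
    """One online update of C (m x n, rows share covariance V) from y = C x + noise."""
    m, n = len(C), len(x)
    Vx = [sum(V[i][j] * x[j] for j in range(n)) for i in range(n)]
    s = noise_var + sum(x[i] * Vx[i] for i in range(n))   # predictive variance of each entry of y
    gain = [v / s for v in Vx]                             # shared by every row of C
    resid = [y[r] - sum(C[r][j] * x[j] for j in range(n)) for r in range(m)]
    C_new = [[C[r][j] + resid[r] * gain[j] for j in range(n)] for r in range(m)]
    V_new = [[V[i][j] - gain[i] * Vx[j] for j in range(n)] for i in range(n)]
    return C_new, V_new


def run_filter(C_true, n_steps=200, noise_sd=0.05, prior_var=10.0, seed=0):
    """Hypothetical demo: stream synthetic (x_k, y_k) pairs and update the dictionary online."""
    rng = random.Random(seed)
    m, n = len(C_true), len(C_true[0])
    C = [[0.0] * n for _ in range(m)]
    V = [[prior_var if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(n_steps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [sum(C_true[r][j] * x[j] for j in range(n)) + rng.gauss(0.0, noise_sd)
             for r in range(m)]
        C, V = dictionary_update(C, V, x, y, noise_sd ** 2)
    return C, V
```

Replacing V by a fixed scalar multiple of the identity turns the update into a hand-tuned gradient-style step; keeping V as the posterior covariance lets it shrink as data accumulates, with no step-size to choose.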

    Convergence rates for optimised adaptive importance samplers

    Adaptive importance samplers are adaptive Monte Carlo algorithms that estimate expectations with respect to a target distribution and adapt themselves over a sequence of iterations to obtain better estimators. Although it is straightforward to show that they have the same $\mathcal{O}(1/\sqrt{N})$ convergence rate as standard importance samplers, where $N$ is the number of Monte Carlo samples, the behaviour of adaptive importance samplers over the number of iterations has remained relatively unexplored. In this work, we investigate an adaptation strategy based on convex optimisation, which leads to a class of adaptive importance samplers termed optimised adaptive importance samplers (OAIS). These samplers rely on the iterative minimisation of the $\chi^2$-divergence between an exponential family proposal and the target. The analysed algorithms are closely related to the class of adaptive importance samplers that minimise the variance of the weight function. We first prove non-asymptotic error bounds for the mean squared errors (MSEs) of these algorithms, which depend explicitly on both the number of iterations and the number of samples. The non-asymptotic bounds derived in this paper imply that when the target belongs to the exponential family, the $L_2$ errors of the optimised samplers converge to the optimal rate of $\mathcal{O}(1/\sqrt{N})$, and the rate of convergence in the number of iterations is provided explicitly. When the target does not belong to the exponential family, the rate of convergence is the same, but the asymptotic $L_2$ error increases by a factor $\sqrt{\rho^\star} > 1$, where $\rho^\star - 1$ is the minimum $\chi^2$-divergence between the target and an exponential family proposal.
This work was supported by The Alan Turing Institute for Data Science and AI under EPSRC Grant EP/N510129/1. J.M. acknowledges the support of the Spanish Agencia Estatal de Investigación (awards TEC2015-69868-C2-1-R ADVENTURE and RTI2018-099655-B-I00 CLARA) and the Office of Naval Research (Award No. N00014-19-1-2226).
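A minimal sketch of the adaptation loop follows. It assumes a scalar Gaussian proposal q_θ = N(θ, σ²) with fixed variance σ² = 2 (a one-parameter exponential family in the mean), a normalised target π = N(2, 1), and plain stochastic gradient descent on ρ(θ) = E_q[w²] with w = π/q_θ, whose gradient is −E_q[w²·(x − θ)/σ²]. For this pair, a short calculation gives ρ(θ) = (2/√3)·exp((θ − 2)²/3), so the target is not in the proposal family and ρ⋆ = 2/√3 > 1, the situation covered by the last sentence above. The step-size (tuned for the starting point θ = 0), sample sizes and function names are illustrative assumptions, not the paper's exact algorithm.

```python
import math
import random


def gauss_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def target_pdf(x):
    """Normalised target pi = N(2, 1) (assumed for the example)."""
    return gauss_pdf(x, 2.0, 1.0)


def oais_sketch(theta0=0.0, prop_var=2.0, step=0.1, n_iter=200, n_samples=500, seed=0):
    """Adapt the proposal mean by SGD on rho(theta), then run one IS estimate."""
    rng = random.Random(seed)
    sd = math.sqrt(prop_var)
    theta = theta0
    for _ in range(n_iter):
        xs = [rng.gauss(theta, sd) for _ in range(n_samples)]
        ws = [target_pdf(x) / gauss_pdf(x, theta, prop_var) for x in xs]
        # Monte Carlo estimate of d rho / d theta = -E_q[w^2 * (x - theta) / prop_var].
        grad = -sum(w * w * (x - theta) for x, w in zip(xs, ws)) / (prop_var * n_samples)
        theta -= step * grad
    # Self-normalised importance sampling estimate of E_pi[x] with the adapted proposal.
    xs = [rng.gauss(theta, sd) for _ in range(n_samples)]
    ws = [target_pdf(x) / gauss_pdf(x, theta, prop_var) for x in xs]
    estimate = sum(w * x for x, w in zip(xs, ws)) / sum(ws)
    return theta, estimate
```

With an unnormalised target the same loop still works, but ρ and its gradient are scaled by the squared normalising constant, so the step-size would need to be rescaled accordingly.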

    Nudging the particle filter

    Document deposited in the arXiv.org repository. Version: arXiv:1708.07801v2 [stat.CO]
We investigate a new sampling scheme to improve the performance of particle filters in scenarios where either (a) there is a significant mismatch between the assumed model dynamics and the actual system producing the available observations, or (b) the system of interest is high-dimensional and the posterior probability tends to concentrate in relatively small regions of the state space. The proposed scheme generates nudged particles, i.e., subsets of particles that are deterministically pushed towards specific areas of the state space where the likelihood is expected to be high, an operation known as nudging in the geophysics literature. This device can be plugged into any particle filtering scheme, as it does not involve modifications to the classical algorithmic steps of sampling, computation of weights, and resampling. Since the particles are modified but the importance weights do not account for this modification, the use of nudging leads to additional bias in the resulting estimators. However, we prove analytically that particle filters equipped with the proposed device still attain asymptotic convergence (with the same error rates as conventional particle methods) as long as the nudged particles are generated according to simple and easy-to-implement rules. Finally, we show numerical results that illustrate the improvement in performance and robustness that can be attained using the proposed scheme. In particular, we show the results of computer experiments involving a misspecified Lorenz 63 model, object tracking with misspecified models, and a high-dimensional Lorenz 96 chaotic model.
For the examples we have investigated, the new particle filter empirically outperforms conventional algorithms, with only negligible computational overhead.
This work was partially supported by Ministerio de Economía y Competitividad of Spain (TEC2015-69868-C2-1-R ADVENTURE), the Office of Naval Research Global (N62909-15-1-2011), and the regional government of Madrid (program CA SICAM-CM S2013/ICE-2845).

    The Incremental Proximal Method: A Probabilistic Perspective

    International audience

    A Probabilistic Incremental Proximal Gradient Method


    The safety and efficacy of first-line atezolizumab plus bevacizumab in patients with unresectable hepatocellular carcinoma: A multicenter real-world study from Turkey

    The aim of the study was to evaluate the real-world clinical outcomes of atezolizumab plus bevacizumab (Atez/Bev) as first-line therapy for advanced hepatocellular carcinoma (HCC). We retrospectively analyzed 65 patients treated with Atez/Bev for advanced HCC at 22 institutions in Turkey between September 2020 and March 2023. Responses were evaluated using RECIST v1.1 criteria. Median progression-free survival (PFS) and overall survival (OS) were estimated using the Kaplan-Meier method, and a Cox regression model was used for multivariate analyses. The median age was 65 (range, 22-89) years, and 83.1% of the patients were male. Overall, 1.5% of patients achieved a complete response, 35.4% had a partial response, 36.9% had stable disease, and 26.2% had progressive disease. The disease control rate was 73.8% and was associated with alpha-fetoprotein levels at diagnosis and concomitant antibiotic use. The incidence rates of any-grade and grade ≥ 3 adverse events were 29.2% and 10.7%, respectively. At a median follow-up of 11.3 (3.4-33.3) months, the median PFS and OS were 5.1 (95% CI: 3-7.3) and 18.1 (95% CI: 6.2-29.9) months, respectively. In univariate analyses, ECOG-PS ≥ 1 (relative to 0), Child-Pugh class B (relative to A), neutrophil-to-lymphocyte ratio (NLR) > 2.9 (relative to ≤ 2.9), and concomitant antibiotic use significantly increased the overall risk of mortality. Multivariate analysis revealed that ECOG-PS ≥ 1 (HR: 2.69, P = .02), NLR > 2.9 (HR: 2.94, P = .017), and concomitant antibiotic use (HR: 4.18, P = .003) were independent predictors of OS. Atez/Bev is an effective and safe first-line therapy for advanced-stage HCC in a real-world setting. The survival benefit was especially promising in patients with an ECOG-PS score of 0, Child-Pugh class A, or a lower NLR, and in patients who were not exposed to antibiotics during treatment.