151 research outputs found

    ε-Kernel Coresets for Stochastic Points

    With the dramatic growth in the number of application domains that generate probabilistic, noisy and uncertain data, there has been an increasing interest in designing algorithms for geometric or combinatorial optimization problems over such data. In this paper, we initiate the study of constructing ε-kernel coresets for uncertain points. We consider uncertainty in the existential model, where each point's location is fixed but the point only occurs with a certain probability, and in the locational model, where each point has a probability distribution describing its location. An ε-kernel coreset approximates the width of a point set in any direction. We consider approximating the expected width (an ε-EXP-KERNEL), as well as the probability distribution on the width (an (ε, τ)-QUANT-KERNEL), for any direction. We show that there exists a set of O(ε^{-(d-1)/2}) deterministic points which approximates the expected width under the existential and locational models, and we provide efficient algorithms for constructing such coresets. We show, however, that it is not always possible to find a subset of the original uncertain points which provides such an approximation. Nevertheless, if the existential probability of each point is lower bounded by a constant, an ε-EXP-KERNEL is still possible. We also provide efficient algorithms for constructing an (ε, τ)-QUANT-KERNEL coreset in nearly linear time. Our techniques utilize or connect to several important notions in probability and geometry, such as Kolmogorov distances, VC uniform convergence and Tukey depth, and may be useful in other geometric optimization problems in stochastic settings. Finally, combining with known techniques, we show a few applications to approximating the extent of uncertain functions, maintaining extent measures for stochastic moving points and some shape fitting problems under uncertainty.
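
    The quantity an ε-EXP-KERNEL approximates, the expected width in a given direction under the existential model, can be illustrated with a small sketch. This is not the paper's coreset construction. The function name `expected_width` and the convention that an empty realization has width 0 are assumptions made here for illustration.

```python
def expected_width(points, probs, direction):
    """Exact expected directional width under the existential model.

    points[i] is present independently with probability probs[i].
    Assumption (not from the abstract): an empty realization has width 0.
    """
    # Project every point onto the query direction.
    proj = [sum(c * u for c, u in zip(p, direction)) for p in points]

    def expected_extreme(order):
        # Point i is the extreme one exactly when it is present and every
        # point ranked before it in `order` is absent.
        total = 0.0
        none_before = 1.0
        for i in order:
            total += proj[i] * probs[i] * none_before
            none_before *= 1.0 - probs[i]
        return total

    indices = range(len(points))
    descending = sorted(indices, key=lambda i: -proj[i])
    ascending = sorted(indices, key=lambda i: proj[i])
    # E[max - min] = E[max] - E[min] by linearity of expectation.
    return expected_extreme(descending) - expected_extreme(ascending)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
    print(expected_width(pts, [1.0, 1.0, 0.5], (1.0, 0.0)))
```

    With the third point present only half of the time, the width along the x-axis is 3 or 1 with equal probability, so the expected width is 2.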

    Practical bounds on the error of Bayesian posterior approximations: A nonasymptotic approach

    Bayesian inference typically requires the computation of an approximation to the posterior distribution. An important requirement for an approximate Bayesian inference algorithm is to output high-accuracy posterior mean and uncertainty estimates. Classical Monte Carlo methods, particularly Markov Chain Monte Carlo, remain the gold standard for approximate Bayesian inference because they have a robust finite-sample theory and reliable convergence diagnostics. However, alternative methods, which are more scalable or apply to problems where Markov Chain Monte Carlo cannot be used, lack the same finite-data approximation theory and tools for evaluating their accuracy. In this work, we develop a flexible new approach to bounding the error of mean and uncertainty estimates of scalable inference algorithms. Our strategy is to control the estimation errors in terms of Wasserstein distance, then bound the Wasserstein distance via a generalized notion of Fisher distance. Unlike computing the Wasserstein distance, which requires access to the normalized posterior distribution, the Fisher distance is tractable to compute because it requires access only to the gradient of the log posterior density. We demonstrate the usefulness of our Fisher distance approach by deriving bounds on the Wasserstein error of the Laplace approximation and Hilbert coresets. We anticipate that our approach will be applicable to many other approximate inference methods such as the integrated Laplace approximation, variational inference, and approximate Bayesian computation. Comment: 22 pages, 2 figures

    Multi-Resolution Hashing for Fast Pairwise Summations

    A basic computational primitive in the analysis of massive datasets is summing simple functions over a large number of objects. Modern applications pose an additional challenge in that such functions often depend on a parameter vector $y$ (query) that is unknown a priori. Given a set of points $X\subset \mathbb{R}^{d}$ and a pairwise function $w:\mathbb{R}^{d}\times \mathbb{R}^{d}\to [0,1]$, we study the problem of designing a data-structure that enables sublinear-time approximation of the summation $Z_{w}(y)=\frac{1}{|X|}\sum_{x\in X}w(x,y)$ for any query $y\in \mathbb{R}^{d}$. By combining ideas from Harmonic Analysis (partitions of unity and approximation theory) with Hashing-Based-Estimators [Charikar, Siminelakis FOCS'17], we provide a general framework for designing such data structures through hashing that reaches far beyond what previous techniques allowed. A key design principle is a collection of $T\geq 1$ hashing schemes with collision probabilities $p_{1},\ldots, p_{T}$ such that $\sup_{t\in [T]}\{p_{t}(x,y)\} = \Theta(\sqrt{w(x,y)})$. This leads to a data-structure that approximates $Z_{w}(y)$ using a sub-linear number of samples from each hash family. Using this new framework along with Distance Sensitive Hashing [Aumuller, Christiani, Pagh, Silvestri PODS'18], we show that such a collection can be constructed and evaluated efficiently for any log-convex function $w(x,y)=e^{\phi(\langle x,y\rangle)}$ of the inner product on the unit sphere $x,y\in \mathcal{S}^{d-1}$. Our method leads to data structures with sub-linear query time that significantly improve upon random sampling and can be used for Kernel Density or Partition Function Estimation. We provide extensions of our result from the sphere to $\mathbb{R}^{d}$ and from scalar functions to vector functions. Comment: 39 pages, 3 figures
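
    For orientation, the random-sampling baseline that the abstract compares against can be sketched as follows. This is only the uniform-sampling estimator of the pairwise summation, not the hashing-based data structure. The kernel `exp(<x, y> - 1)` and the names `kernel`, `exact_sum` and `sampled_sum` are illustrative assumptions.

```python
import math
import random


def kernel(x, y):
    # Illustrative choice (an assumption): w(x, y) = exp(<x, y> - 1), a
    # log-convex function of the inner product with values in (0, 1]
    # for unit vectors x and y.
    return math.exp(sum(a * b for a, b in zip(x, y)) - 1.0)


def exact_sum(X, y, w=kernel):
    # Z_w(y) = (1 / |X|) * sum over x in X of w(x, y); costs |X| evaluations.
    return sum(w(x, y) for x in X) / len(X)


def sampled_sum(X, y, m, rng, w=kernel):
    # Unbiased estimate of Z_w(y) from m uniform samples drawn with replacement.
    return sum(w(rng.choice(X), y) for _ in range(m)) / m


if __name__ == "__main__":
    rng = random.Random(0)
    angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(10000)]
    X = [(math.cos(a), math.sin(a)) for a in angles]
    y = (1.0, 0.0)
    print(exact_sum(X, y), sampled_sum(X, y, 200, rng))
```

    The estimator is unbiased, but the number of samples needed for a fixed relative error grows as the true value of the sum shrinks, which is the regime where the abstract claims an improvement.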

    Probabilistic Smallest Enclosing Ball in High Dimensions via Subgradient Sampling

    We study a variant of the median problem for a collection of point sets in high dimensions. This generalizes the geometric median as well as the (probabilistic) smallest enclosing ball (pSEB) problems. Our main objective and motivation is to improve the previously best algorithm for the pSEB problem by reducing its exponential dependence on the dimension to linear. This is achieved via a novel combination of sampling techniques for clustering problems in metric spaces with the framework of stochastic subgradient descent. As a result, the algorithm becomes applicable to shape fitting problems in Hilbert spaces of unbounded dimension via kernel functions. We present an exemplary application by extending the support vector data description (SVDD) shape fitting method to the probabilistic case. This is done by simulating the pSEB algorithm implicitly in the feature space induced by the kernel function.
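
    To make the stochastic subgradient idea concrete, here is a minimal sketch for the plain geometric median, one of the special cases named in the abstract. It is not the authors' pSEB algorithm. The function name, the step-size schedule `step_scale / t` and the choice of starting point are assumptions made for illustration.

```python
import math
import random


def sgd_geometric_median(points, steps, rng, step_scale=3.0):
    """Stochastic subgradient descent on f(c) = mean over p of ||c - p||."""
    center = list(points[0])  # start at an arbitrary input point
    for t in range(1, steps + 1):
        p = rng.choice(points)  # sample a single point uniformly
        diff = [c - q for c, q in zip(center, p)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm == 0.0:
            continue  # zero is a valid subgradient when the center equals p
        eta = step_scale / t  # diminishing step size (an assumed schedule)
        # A subgradient of ||c - p|| at c is the unit vector (c - p) / ||c - p||.
        center = [c - eta * d / norm for c, d in zip(center, diff)]
    return center


if __name__ == "__main__":
    corners = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
    print(sgd_geometric_median(corners, 20000, random.Random(0)))
```

    Each iteration touches a single sampled point and costs time linear in the dimension. For the four corners of a square the iterates should settle near the origin, which is the geometric median of that set.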