2,615 research outputs found

Optimal Clustering under Uncertainty

Author: Benalcázar Marco E.
Dalton Lori A.
Dougherty Edward R.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2018

Classical clustering algorithms typically either lack an underlying probability framework to make them predictive or focus on parameter estimation rather than defining and minimizing a notion of error. Recent work addresses these issues by developing a probabilistic framework based on the theory of random labeled point processes and characterizing a Bayes clusterer that minimizes the number of misclustered points. The Bayes clusterer is analogous to the Bayes classifier. Whereas determining a Bayes classifier requires full knowledge of the feature-label distribution, deriving a Bayes clusterer requires full knowledge of the point process. When the point process is uncertain, one would like to find a robust clusterer that is optimal over the uncertainty, just as one may find optimal robust classifiers with uncertain feature-label distributions. Herein, we derive an optimal robust clusterer by first finding an effective random point process that incorporates all randomness within its own probabilistic structure, and from which a Bayes clusterer can be derived that is optimal and robust relative to the uncertainty. This is analogous to the use of effective class-conditional distributions in robust classification. After evaluating the performance of robust clusterers in synthetic mixture-of-Gaussians models, we apply the framework to granular imaging, where we use the asymptotic granulometric moment theory for granular images to relate robust clustering theory to the application. Comment: 19 pages, 5 eps figures, 1 table
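The error notion in the abstract above (the number of misclustered points) can be illustrated with a short sketch. It assumes, as is common for clustering, that cluster labels are arbitrary, so the count is minimized over all relabelings of the predicted clusters. The function name `misclustered_count` and its arguments are hypothetical and are not taken from the paper; the brute-force search over permutations is only practical for a small number of clusters.

```python
from itertools import permutations


def misclustered_count(true_labels, pred_labels, k):
    """Smallest number of disagreements over all relabelings of the prediction.

    Labels are assumed to be integers in range(k). This is an illustrative
    helper, not the paper's implementation.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    best = len(true_labels)
    # Try every mapping of predicted cluster ids onto true cluster ids.
    for perm in permutations(range(k)):
        errors = sum(1 for t, p in zip(true_labels, pred_labels) if perm[p] != t)
        best = min(best, errors)
    return best
```

For example, a prediction that swaps the two label names but groups the points identically has zero misclustered points under this count.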

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare

Generalized Species Sampling Priors with Latent Beta reinforcements

Author: Blei D.
Blei D.M.
Cardin N.
Curtis C.
Edoardo M. Airoldi
Fabrizio Leisen
Federico Bassetti
Ferguson J.D.
Fortini S.
Jara A.
Michele Guindani
Müller P.
Park J.H.
Sudderth E.B.
Sun W.
Thiago Costa
Publication date: 01/01/2014

Many popular Bayesian nonparametric priors can be characterized in terms of exchangeable species sampling sequences. However, in some applications, exchangeability may not be appropriate. We introduce a novel and probabilistically coherent family of non-exchangeable species sampling sequences characterized by a tractable predictive probability function with weights driven by a sequence of independent Beta random variables. We compare their theoretical clustering properties with those of the Dirichlet Process and the two-parameter Poisson-Dirichlet process. Unlike existing work, the proposed construction provides a complete characterization of the joint process. We then propose the use of such a process as a prior distribution in a hierarchical Bayes modeling framework, and we describe a Markov Chain Monte Carlo sampler for posterior inference. We evaluate the performance of the prior and the robustness of the resulting inference in a simulation study, providing a comparison with popular Dirichlet Process mixtures and Hidden Markov Models. Finally, we develop an application to the detection of chromosomal aberrations in breast cancer by leveraging array CGH data. Comment: For correspondence purposes, Edoardo M. Airoldi's email is [email protected]; Federico Bassetti's email is [email protected]; Michele Guindani's email is [email protected]; Fabrizio Leisen's email is [email protected]. To appear in the Journal of the American Statistical Association
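For orientation, the exchangeable baseline that the abstract compares against, the Dirichlet Process, has a well-known predictive probability function (the Chinese restaurant process): after n observations, a new observation joins an existing cluster of size n_j with probability n_j / (theta + n) and starts a new cluster with probability theta / (theta + n). The sketch below implements only that baseline, not the paper's Beta-driven non-exchangeable construction. The function names and the concentration parameter name `theta` are illustrative assumptions.

```python
import random
from collections import Counter


def crp_predictive(labels, theta):
    """Predictive probabilities for the next label under a Dirichlet Process.

    Returns a dict mapping each existing cluster id to its probability,
    plus the key "new" for opening a new cluster.
    """
    n = len(labels)
    probs = {k: c / (theta + n) for k, c in Counter(labels).items()}
    probs["new"] = theta / (theta + n)
    return probs


def sample_crp(n, theta, seed=0):
    """Draw n cluster labels sequentially from the predictive rule above."""
    rng = random.Random(seed)
    labels = []
    for i in range(n):
        if rng.random() < theta / (theta + i):
            # Open a new cluster, numbered by the count of clusters so far.
            labels.append(len(set(labels)))
        else:
            # Picking a past label uniformly selects a cluster
            # with probability proportional to its size.
            labels.append(rng.choice(labels))
    return labels
```

In a non-exchangeable variant such as the one the abstract describes, the weights of this predictive rule would depend on the order of the observations; how exactly is specified in the paper and is not reproduced here.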

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Archivio Istituzionale della Ricerca - Università degli Studi di Pavia

PubMed Central

eScholarship - University of California

Kent Academic Repository

FigShare

Studies in Astronomical Time Series Analysis. VI. Bayesian Block Representations

Author: Arias-Castro
Band
Brad Jackson
Capra
Claerbout
Coram
Diaconis
Donoho
Du
Efron
Fenimore
Gelman
Hogg
Horvath
Jackson
James Chiang
Jay P. Norris
Jeffrey D. Scargle
McLean
Norris
Papoulis
Prahl
Qin
Scargle
Schmidt
Tompkins
Tong
Way
Xie
Publication venue: 'IOP Publishing'
Publication date: 06/08/2012

This paper addresses the problem of detecting and characterizing local variability in time series and other forms of sequential data. The goal is to identify and characterize statistically significant variations while suppressing the inevitable corrupting observational errors. We present a simple nonparametric modeling technique and an algorithm implementing it - an improved and generalized version of Bayesian Blocks (Scargle 1998) - that finds the optimal segmentation of the data in the observation interval. The structure of the algorithm allows it to be used in either a real-time trigger mode or a retrospective mode. Maximum likelihood or marginal posterior functions to measure model fitness are presented for events, binned counts, and measurements at arbitrary times with known error distributions. Problems addressed include those connected with data gaps, variable exposure, extension to piecewise linear and piecewise exponential representations, multivariate time series data, analysis of variance, data on the circle, other data modes, and dispersed data. Simulations provide evidence that the detection efficiency for weak signals is close to a theoretical asymptotic limit derived by Arias-Castro, Donoho and Huo (2003). In the spirit of Reproducible Research (Donoho et al. 2008), all of the code and data necessary to reproduce all of the figures in this paper are included as auxiliary material. Comment: Added some missing script files and updated other ancillary data (code and data files). To be submitted to the Astrophysical Journal
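The optimal-segmentation idea in the abstract above can be sketched as a dynamic program over candidate blocks. The sketch below is a simplified illustration for event data only and is not the authors' released code. It assumes distinct event times, uses the block fitness N (log N - log T) for a block holding N events over a length T, and subtracts a constant penalty per block; the name `ncp_prior` and its default value are assumptions chosen for illustration.

```python
import math


def bayesian_blocks_events(times, ncp_prior=4.0):
    """Return block edges of an optimal piecewise-constant segmentation.

    times: at least two distinct event times.
    ncp_prior: assumed constant penalty charged for each block.
    """
    t = sorted(times)
    n = len(t)
    if n < 2 or any(b <= a for a, b in zip(t, t[1:])):
        raise ValueError("need at least two distinct event times")
    # One cell per event: edges are the data range ends plus midpoints.
    edges = [t[0]] + [0.5 * (t[i] + t[i + 1]) for i in range(n - 1)] + [t[-1]]
    best = [0.0] * n  # best[r]: optimal total fitness for cells 0..r
    last = [0] * n    # last[r]: first cell of the final block in that optimum
    for r in range(n):
        best_val, best_start = -math.inf, 0
        for s in range(r + 1):
            count = r - s + 1                # events in block s..r
            width = edges[r + 1] - edges[s]  # length of block s..r
            fit = count * (math.log(count) - math.log(width)) - ncp_prior
            total = fit + (best[s - 1] if s > 0 else 0.0)
            if total > best_val:
                best_val, best_start = total, s
        best[r], last[r] = best_val, best_start
    # Backtrack from the last cell to recover where each block starts.
    starts = []
    r = n
    while r > 0:
        s = last[r - 1]
        starts.append(s)
        r = s
    starts.reverse()
    return [edges[s] for s in starts] + [edges[n]]
```

Each step reuses the optimal solutions for all shorter prefixes, so the search over all possible segmentations costs on the order of n squared fitness evaluations rather than an exponential number. Processing the data in time order in this way is also what makes a trigger-style use conceivable, since the optimum for the first r cells is available before later data arrive.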

arXiv.org e-Print Archive

Crossref

Boise State University - ScholarWorks

NASA Technical Reports Server

Multiscale change-point segmentation: beyond step functions.

Author: Guo Q.
Li H.
Munk A.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 21/01/2019

Modern multiscale segmentation methods are known to detect multiple change-points with high statistical accuracy, while allowing for fast computation. The underpinning (minimax) estimation theory has been developed mainly for models that assume the signal is a piecewise constant function. In this paper, for a large collection of multiscale segmentation methods (including various existing procedures), such theory is extended to certain function classes beyond step functions in a nonparametric regression setting. This both extends the interpretation of such methods and shows that they are robust to deviations from piecewise constant functions. Our main finding is the adaptation over nonlinear approximation classes for a universal thresholding, which includes bounded variation functions and (piecewise) Hölder functions of smoothness order 0 < alpha <= 1 as special cases. From this we derive statistical guarantees on feature detection in terms of jumps and modes. Another key finding is that these multiscale segmentation methods perform nearly (up to a log-factor) as well as the oracle piecewise constant segmentation estimator (with known jump locations) and the best piecewise constant approximants of the (unknown) true signal. Theoretical findings are examined by various numerical simulations.

arXiv.org e-Print Archive

MPG.PuRe