Optimal Clustering under Uncertainty
Classical clustering algorithms typically either lack an underlying
probability framework to make them predictive or focus on parameter estimation
rather than defining and minimizing a notion of error. Recent work addresses
these issues by developing a probabilistic framework based on the theory of
random labeled point processes and characterizing a Bayes clusterer that
minimizes the number of misclustered points. The Bayes clusterer is analogous
to the Bayes classifier. Whereas determining a Bayes classifier requires full
knowledge of the feature-label distribution, deriving a Bayes clusterer
requires full knowledge of the point process. When uncertain of the point
process, one would like to find a robust clusterer that is optimal over the
uncertainty, just as one may find optimal robust classifiers with uncertain
feature-label distributions. Herein, we derive an optimal robust clusterer by
first finding an effective random point process that incorporates all
randomness within its own probabilistic structure and from which a Bayes
clusterer can be derived that provides an optimal robust clusterer relative to
the uncertainty. This is analogous to the use of effective class-conditional
distributions in robust classification. After evaluating the performance of
robust clusterers in synthetic mixtures of Gaussians models, we apply the
framework to granular imaging, where we make use of the asymptotic
granulometric moment theory for granular images to relate robust clustering
theory to the application. Comment: 19 pages, 5 EPS figures, 1 table
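The idea of building an effective process and then taking its Bayes clusterer can be illustrated with a brute-force toy. This is a minimal sketch under strong assumptions that are not from the abstract: points are 1D, each model in the uncertainty class is an equal-weight two-component Gaussian mixture with a shared known standard deviation, the uncertainty class is finite with a given prior, and the point set is small enough to enumerate every labeling. The effective joint density is the prior-weighted average of the per-model joint densities, and the clusterer picks the partition with the smallest expected number of misclustered points, where errors are counted up to swapping the two cluster labels. All function names are hypothetical; this is not the authors' implementation.

```python
import itertools
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def joint_density(points, labels, model):
    """f_theta(points, labels) for an equal-weight two-component 1D Gaussian mixture (assumed model)."""
    mu0, mu1, sigma = model
    dens = 1.0
    for x, lab in zip(points, labels):
        dens *= 0.5 * normal_pdf(x, mu1 if lab else mu0, sigma)
    return dens


def misclustered(psi, labels):
    """Number of misclustered points between two 2-cluster labelings, minimised over the label swap."""
    mismatches = sum(a != b for a, b in zip(psi, labels))
    return min(mismatches, len(psi) - mismatches)


def robust_bayes_clusterer(points, models, prior):
    """Hypothetical brute-force robust clusterer for a finite uncertainty class with a prior."""
    n = len(points)
    labelings = list(itertools.product((0, 1), repeat=n))
    # Effective process: average each labeling's joint density over the uncertainty class.
    eff = [sum(p * joint_density(points, lab, m) for m, p in zip(models, prior))
           for lab in labelings]
    total = sum(eff)
    posterior = [w / total for w in eff]
    # Bayes clusterer of the effective process: minimise the expected number of misclustered points.
    best, best_risk = None, float("inf")
    for psi in labelings:
        risk = sum(post * misclustered(psi, lab) for post, lab in zip(posterior, labelings))
        if risk < best_risk:
            best, best_risk = psi, risk
    return best, best_risk


points = [-3.0, -2.8, -3.1, 3.0, 2.9, 3.2]
models = [(-3.0, 3.0, 1.0), (-2.0, 2.0, 1.0)]  # (mu0, mu1, sigma), assumed uncertainty class
prior = [0.5, 0.5]
best, risk = robust_bayes_clusterer(points, models, prior)
print(best, round(risk, 4))
```

The enumeration costs 4^n evaluations, so it only serves to make the definitions concrete; a practical clusterer would need the structure the paper develops rather than brute force.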
Asymptotic Theory for Clustered Samples
We provide a complete asymptotic distribution theory for clustered data with
a large number of independent groups, generalizing the classic laws of large
numbers, uniform laws, central limit theory, and clustered covariance matrix
estimation. Our theory allows for clustered observations with heterogeneous and
unbounded cluster sizes. Our conditions cleanly nest the classical results for
independent but not necessarily identically distributed (i.n.i.d.) observations,
in the sense that our conditions specialize to the classical conditions under
independent sampling. We use this theory to develop
a full asymptotic distribution theory for estimation based on linear
least-squares, 2SLS, nonlinear MLE, and nonlinear GMM …
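For linear least squares, clustered covariance matrix estimation is commonly done with the cluster-robust sandwich form V = (X'X)^{-1} (Σ_g X_g' u_g u_g' X_g) (X'X)^{-1}, where u_g are the OLS residuals in cluster g. The sketch below implements that textbook CR0 form with NumPy, with no small-sample degrees-of-freedom correction. It is an illustration of the general estimator, not necessarily the exact estimator or conditions analyzed in the paper, and the function name `ols_cluster_robust` is hypothetical.

```python
import numpy as np


def ols_cluster_robust(X, y, groups):
    """OLS coefficients with a cluster-robust (CR0) sandwich covariance.

    X: (n, k) regressor matrix, y: (n,) outcome, groups: (n,) cluster ids.
    No finite-sample correction is applied (an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        idx = groups == g
        # Cluster score: sum over i in cluster g of x_i * u_i (a k-vector).
        score_g = X[idx].T @ resid[idx]
        meat += np.outer(score_g, score_g)
    vcov = XtX_inv @ meat @ XtX_inv
    return beta, vcov


# Usage: 30 clusters of unequal size with a shared cluster-level shock.
rng = np.random.default_rng(0)
sizes = rng.integers(2, 8, size=30)
groups = np.repeat(np.arange(30), sizes)
n = groups.size
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=30)[groups] + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta, vcov = ols_cluster_robust(X, y, groups)
print(beta, np.sqrt(np.diag(vcov)))
```

Because the middle term sums scores within each cluster before taking outer products, arbitrary within-cluster correlation is allowed while clusters are treated as independent; with one observation per cluster the estimator reduces to the heteroskedasticity-robust (HC0) form.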