Robust EM algorithm for model-based curve clustering
Model-based clustering approaches concern the paradigm of exploratory data
analysis relying on the finite mixture model to automatically find a latent
structure governing observed data. They are one of the most popular and
successful approaches in cluster analysis. The mixture density estimation is
generally performed by maximizing the observed-data log-likelihood by using the
expectation-maximization (EM) algorithm. However, it is well-known that the EM
algorithm initialization is crucial. In addition, the standard EM algorithm
requires the number of clusters to be known a priori. Some solutions have been
provided in [31, 12] for model-based clustering with Gaussian mixture models
for multivariate data. In this paper we focus on model-based curve clustering
approaches, when the data are curves rather than vectorial data, based on
regression mixtures. We propose a new robust EM algorithm for clustering
curves. We extend the model-based clustering approach presented in [31] for
Gaussian mixture models, to the case of curve clustering by regression
mixtures, including polynomial regression mixtures as well as spline or
B-spline regression mixtures. Our approach handles both the problem of
initialization and that of choosing the optimal number of clusters as the EM
learning proceeds, rather than in a two-fold scheme. This is achieved by
optimizing a penalized log-likelihood criterion. A simulation study confirms
the potential benefit of the proposed algorithm in terms of robustness
regarding initialization and finding the actual number of clusters.
Comment: In Proceedings of the 2013 International Joint Conference on Neural
Networks (IJCNN), 2013, Dallas, TX, US
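For orientation, the baseline that the abstract starts from can be sketched as follows. This is a minimal, illustrative implementation of the standard EM algorithm for a mixture of polynomial regressions with a fixed number of clusters K and a random soft initialization. It is not the robust algorithm proposed in the paper: it includes no penalized criterion and no automatic selection of K. The function name, its parameters, and the synthetic data are assumptions made for this sketch.

```python
import numpy as np

def em_regression_mixture(x, Y, K=2, degree=1, n_iter=100, seed=0):
    """Standard EM for a polynomial regression mixture (fixed K, illustrative).

    x: (m,) sampling grid shared by all curves; Y: (n, m), one curve per row.
    """
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    X = np.vander(x, degree + 1, increasing=True)    # (m, degree + 1) design matrix
    tau = rng.dirichlet(np.ones(K), size=n)          # random soft responsibilities
    loglik = []
    for _ in range(n_iter):
        # M-step: mixing proportions, regression coefficients, noise variances.
        nk = tau.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        # Every point of curve i carries the same weight tau[i, k], so weighted
        # least squares reduces to ordinary least squares on the weighted mean curve.
        mean_curves = (tau.T @ Y) / nk[:, None]                      # (K, m)
        betas = np.linalg.lstsq(X, mean_curves.T, rcond=None)[0].T   # (K, degree + 1)
        fitted = betas @ X.T                                         # (K, m)
        sq = ((Y[:, None, :] - fitted[None, :, :]) ** 2).sum(axis=2) # (n, K)
        sigma2 = (tau * sq).sum(axis=0) / (m * nk) + 1e-12
        # E-step: posterior cluster probabilities, computed in log space.
        logp = np.log(pi) - 0.5 * m * np.log(2 * np.pi * sigma2) - 0.5 * sq / sigma2
        mx = logp.max(axis=1, keepdims=True)
        lse = mx + np.log(np.exp(logp - mx).sum(axis=1, keepdims=True))
        tau = np.exp(logp - lse)
        loglik.append(float(lse.sum()))              # observed-data log-likelihood
    return pi, betas, sigma2, tau, loglik

# Synthetic example (assumed data): two groups of noisy straight lines.
x = np.linspace(0, 1, 20)
noise = np.random.default_rng(1)
Y = np.vstack([
    1 + 2 * x + 0.1 * noise.standard_normal((15, 20)),
    4 - 3 * x + 0.1 * noise.standard_normal((15, 20)),
])
pi, betas, sigma2, tau, loglik = em_regression_mixture(x, Y, K=2, degree=1)
labels = tau.argmax(axis=1)                          # hard cluster assignment
```

Both K and the random seed have to be supplied by hand here, which are the two dependencies the abstract says the proposed algorithm is designed to remove.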
The Role of Randomness and Noise in Strategic Classification
We investigate the problem of designing optimal classifiers in the strategic
classification setting, where the classification is part of a game in which
players can modify their features to attain a favorable classification outcome
(while incurring some cost). Previously, the problem has been considered from a
learning-theoretic perspective and from the algorithmic fairness perspective.
Our main contributions include: 1. Showing that if the objective is to maximize
the efficiency of the classification process (defined as the accuracy of the
outcome minus the sunk cost of the qualified players manipulating their
features to gain a better outcome), then using randomized classifiers (that is,
ones where the probability of a given feature vector to be accepted by the
classifier is strictly between 0 and 1) is necessary. 2. Showing that in many
natural cases, the imposed optimal solution (in terms of efficiency) has the
structure where players never change their feature vectors (the randomized
classifier is structured in such a way that the gain in the probability of
being classified as a 1 does not justify the expense of changing one's
features). 3. Observing that the randomized classification is not a stable
best-response from the classifier's viewpoint, and that the classifier doesn't
benefit from randomized classifiers without creating instability in the system.
4. Showing that in some cases, a noisier signal leads to better equilibrium
outcomes, improving both accuracy and fairness when more than one
subpopulation with different feature adjustment costs is involved. This is
interesting from a policy perspective, since it is hard to force institutions
to stick to a particular randomized classification strategy (especially in a
context of a market with multiple classifiers), but it is possible to alter the
information environment to make the feature signals inherently noisier.
Comment: 22 pages. Appeared in FORC, 202
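The second contribution above can be illustrated with a toy example. The sketch below is not the paper's model. It assumes a single scalar feature in [0, 1], a linear manipulation cost, a payoff of 1 for being accepted, and players who pick the feature value that maximizes acceptance probability minus cost. All names and numbers are hypothetical. Under these assumptions, a randomized classifier whose acceptance probability rises more slowly than the cost of moving gives no player a reason to manipulate, whereas a deterministic threshold does.

```python
def best_response(x, accept_prob, cost_per_unit, grid):
    """Feature value maximizing accept_prob(x2) - cost * |x2 - x|; ties mean staying put."""
    best, best_utility = x, accept_prob(x)
    for x2 in grid:
        utility = accept_prob(x2) - cost_per_unit * abs(x2 - x)
        if utility > best_utility + 1e-12:
            best, best_utility = x2, utility
    return best

COST = 2.0                                   # assumed cost per unit of feature change
GRID = [i / 100 for i in range(101)]         # candidate feature values in [0, 1]

def threshold_classifier(x):
    # Deterministic: accept exactly when the feature reaches 0.5.
    return 1.0 if x >= 0.5 else 0.0

def ramp_classifier(x):
    # Randomized: acceptance probability rises with slope 1.5, which is below COST.
    return min(1.0, max(0.0, 0.5 + 1.5 * (x - 0.5)))

players = [0.30, 0.45, 0.60]                 # assumed true feature values

moves_det = [best_response(x, threshold_classifier, COST, GRID) for x in players]
moves_rand = [best_response(x, ramp_classifier, COST, GRID) for x in players]

# Total cost spent on manipulation under each classifier.
sunk_det = sum(COST * abs(new - old) for new, old in zip(moves_det, players))
sunk_rand = sum(COST * abs(new - old) for new, old in zip(moves_rand, players))
```

In this toy setting the two players below the threshold move up to it under the deterministic rule and pay for doing so, while under the ramp every player keeps their original feature and no manipulation cost is incurred.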