9,468 research outputs found
Semi-supervised cross-entropy clustering with information bottleneck constraint
In this paper, we propose a semi-supervised clustering method, CEC-IB, that
models data with a set of Gaussian distributions and that retrieves clusters
based on a partial labeling provided by the user (partition-level side
information). By combining the ideas from cross-entropy clustering (CEC) with
those from the information bottleneck method (IB), our method trades between
three conflicting goals: the accuracy with which the data set is modeled, the
simplicity of the model, and the consistency of the clustering with side
information. Experiments demonstrate that CEC-IB has a performance comparable
to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but
is faster, more robust to noisy labels, automatically determines the optimal
number of clusters, and performs well when not all classes are present in the
side information. Moreover, in contrast to other semi-supervised models, it can
be successfully applied in discovering natural subgroups if the partition-level
side information is derived from the top levels of a hierarchical clustering
Labeling the Features Not the Samples: Efficient Video Classification with Minimal Supervision
Feature selection is essential for effective visual recognition. We propose
an efficient joint classifier learning and feature selection method that
discovers sparse, compact representations of input features from a vast sea of
candidates, with an almost unsupervised formulation. Our method requires only
the following knowledge, which we call the \emph{feature sign}---whether or not
a particular feature has on average stronger values over positive samples than
over negatives. We show how this can be estimated using as few as a single
labeled training sample per class. Then, using these feature signs, we extend
an initial supervised learning problem into an (almost) unsupervised clustering
formulation that can incorporate new data without requiring ground truth
labels. Our method works both as a feature selection mechanism and as a fully
competitive classifier. It has important properties, low computational cost and
excellent accuracy, especially in difficult cases of very limited training
data. We experiment on large-scale recognition in video and show superior speed
and performance to established feature selection approaches such as AdaBoost,
Lasso, greedy forward-backward selection, and powerful classifiers such as SVM.Comment: arXiv admin note: text overlap with arXiv:1411.771
Temporal Model Adaptation for Person Re-Identification
Person re-identification is an open and challenging problem in computer
vision. Majority of the efforts have been spent either to design the best
feature representation or to learn the optimal matching metric. Most approaches
have neglected the problem of adapting the selected features or the learned
model over time. To address such a problem, we propose a temporal model
adaptation scheme with human in the loop. We first introduce a
similarity-dissimilarity learning method which can be trained in an incremental
fashion by means of a stochastic alternating directions methods of multipliers
optimization procedure. Then, to achieve temporal adaptation with limited human
effort, we exploit a graph-based approach to present the user only the most
informative probe-gallery matches that should be used to update the model.
Results on three datasets have shown that our approach performs on par or even
better than state-of-the-art approaches while reducing the manual pairwise
labeling effort by about 80%
Variable selection and updating in model-based discriminant analysis for high dimensional data with food authenticity applications
Food authenticity studies are concerned with determining if food samples have been correctly labelled or not. Discriminant analysis methods are an integral part of the methodology for food authentication. Motivated by food authenticity applications, a model-based discriminant analysis method that includes variable selection is presented. The discriminant analysis model is fitted in a semi-supervised manner using both labeled and unlabeled data. The method is shown to give excellent classification
performance on several high-dimensional multiclass food authenticity datasets with more variables than observations. The variables selected by the proposed method provide information about which variables are meaningful for classification purposes. A headlong search strategy for variable selection is shown to be efficient in terms of computation and achieves excellent classification performance. In applications to several food authenticity datasets, our proposed method outperformed default implementations of Random Forests, AdaBoost, transductive SVMs and Bayesian Multinomial Regression by substantial margins
- ā¦