Don't Just Divide; Polarize and Conquer!
In data containing heterogeneous subpopulations, classification performance
benefits from incorporating the knowledge of cluster structure in the
classifier. Previous methods for such combined clustering and classification
are either 1) classifier-specific and not generic, or 2) independently perform
clustering and classifier training, which may not form clusters that can
potentially benefit classifier performance. The question of how to perform
clustering to improve the performance of classifiers trained on the clusters
has received scant attention in previous literature, despite its importance in
several real-world applications. In this paper, we design a simple and
efficient classification algorithm called Clustering Aware Classification
(CAC), to find clusters that are well suited for being used as training
datasets by classifiers for each underlying subpopulation. Our experiments on
synthetic and real benchmark datasets demonstrate the efficacy of CAC over
previous methods for combined clustering and classification.
Comment: 19 pages, 5 figures
Z Distance Function for KNN Classification
This paper proposes a new distance function, called Z distance, for KNN
classification. The Z distance is not the straight-line geometric distance
between two data points; it also takes the class attribute of the training
dataset into account when measuring the affinity between data points.
Concretely, the Z distance between two data points combines their class
center distance with their real distance, and its shape resembles the letter
"Z". As a result, the affinity between two data points in the same class is
always stronger than that between points in different classes; in other
words, intraclass data points are always closer than interclass data points.
We evaluated the Z distance experimentally and showed that the proposed
distance function achieves better performance in KNN classification.
Parameter Free Large Margin Nearest Neighbor for Distance Metric Learning
We introduce a novel supervised metric learning algorithm named parameter free large margin nearest neighbor (PFLMNN), which can be seen as an improvement of the classical large margin nearest neighbor (LMNN) algorithm. Our work makes two contributions. First, our method discards the cost term in LMNN that shrinks the distances between an inquiry input and its k target neighbors (the k nearest neighbors with the same label as the inquiry input) and focuses only on pushing the imposters (the samples with labels different from the inquiry input) out of the neighborhood of the inquiry. As a result, our method has no parameter that must be tuned on a validation set, which makes it more convenient to use. Second, by leveraging the geometric information of the imposters, we construct a novel cost function that penalizes small distances between each inquiry and its imposters. Unlike LMNN, which considers every imposter located in the neighborhood of each inquiry, our method considers only the nearest imposter, because once the nearest imposter is pushed out of the neighborhood of its inquiry, all other imposters are out as well. Our model therefore has far fewer constraints than LMNN, which makes the optimal distance metric much easier to find. Consequently, our method not only learns a better distance metric than LMNN but also runs faster. Extensive experiments on data sets of various sizes and difficulties show that PFLMNN achieves better classification results than LMNN.
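The nearest-imposter idea described in the abstract can be illustrated with a minimal sketch. It shows only the step of finding, for each sample, the closest sample with a different label; it is not the PFLMNN cost function or optimizer, which the abstract does not specify. The function name `nearest_imposters` and the optional linear map `L` are hypothetical names chosen for illustration.

```python
import numpy as np


def nearest_imposters(X, y, L=None):
    """For each sample, return the index of and distance to its nearest imposter.

    An imposter is any sample whose label differs from the query's label.
    L is an optional linear map (a hypothetical name, not from the paper);
    distances are Euclidean in the transformed space X @ L.T.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    Z = X if L is None else X @ np.asarray(L, dtype=float).T

    # Pairwise squared Euclidean distances between all samples.
    diff = Z[:, None, :] - Z[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)

    # Mask same-label pairs (including each sample with itself) so that
    # only imposters remain as candidates.
    same_label = y[:, None] == y[None, :]
    d2 = np.where(same_label, np.inf, d2)

    idx = d2.argmin(axis=1)
    dist = np.sqrt(d2[np.arange(len(Z)), idx])
    return idx, dist


# Four points on a line, two classes.
X = [[0, 0], [1, 0], [3, 0], [10, 0]]
y = [0, 0, 1, 1]
idx, dist = nearest_imposters(X, y)
print(idx)   # nearest imposter index per sample
print(dist)  # distance to that imposter
```

Because every other imposter is at least as far away as the nearest one, a constraint placed on the nearest imposter alone bounds all of them, which is the reason the abstract gives for needing fewer constraints than LMNN.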