Classification with the nearest neighbor rule in general finite dimensional spaces: necessary and sufficient conditions
Given a sample of random vectors whose
joint law is unknown, the long-standing problem of supervised classification
aims to \textit{optimally} predict the label of a new observation.
In this context, the nearest neighbor rule is a popular, flexible, and
intuitive method in non-parametric situations.
Although this algorithm is commonly used in the machine learning and
statistics communities, less is known about its prediction ability in general
finite dimensional spaces, especially regarding the support of the density of the
observations. This paper is devoted to the study of the
statistical properties of the nearest neighbor rule in various situations. In
particular, attention is paid to the marginal law of the observations, as well as the
smoothness and margin properties of the \textit{regression function}. We identify two necessary and sufficient conditions to
obtain uniform consistency rates of classification and to derive sharp
estimates in the case of the nearest neighbor rule. Some numerical experiments
are proposed at the end of the paper to help illustrate the discussion.
Comment: 53 pages, 3 figures
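The nearest neighbor rule discussed in this abstract can be sketched in a few lines. This is a minimal illustration of the classical k-NN majority vote under a Euclidean metric, not the paper's analysis; the function name and the choice of metric are assumptions of this sketch.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=5):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distances from x_new to every training point.
    d = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k nearest neighbors.
    nearest = np.argsort(d)[:k]
    # Majority vote among their labels.
    values, counts = np.unique(y_train[nearest], return_counts=True)
    return values[np.argmax(counts)]
```

For k = 1 this reduces to the plain nearest neighbor rule whose consistency the paper studies.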
Local feature weighting in nearest prototype classification
The distance metric is the cornerstone of nearest neighbor (NN)-based methods, and therefore of nearest prototype (NP) algorithms, because they classify according to the similarity of the data. When the data are characterized by a set of features that may contribute to the classification task to different degrees, feature weighting or selection is required, sometimes in a local sense. However, local weighting is typically restricted to NN approaches. In this paper, we introduce local feature weighting (LFW) in NP classification. LFW provides each prototype its own weight vector, in contrast to the typical global weighting methods found in the NP literature, where all prototypes share the same one. Providing each prototype its own weight vector has a novel effect on the borders of the generated Voronoi regions: they become nonlinear. We have integrated LFW with a previously developed evolutionary nearest prototype classifier (ENPC). Experiments on both artificial and real data sets demonstrate that the resulting algorithm, which we call LFW in nearest prototype classification (LFW-NPC), avoids overfitting on training data in domains where the features may contribute differently to the classification task in different areas of the feature space. This generalization capability is also reflected in automatically obtaining an accurate and reduced set of prototypes.
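The core LFW idea, a per-prototype weight vector inside the distance, can be sketched as follows. The function name is hypothetical and the evolutionary fitting of prototypes and weights via ENPC is omitted; only the classification step is shown.

```python
import numpy as np

def lfw_classify(x, prototypes, weights, labels):
    """Assign x the label of the nearest prototype under a weighted distance.

    Each prototype j has its OWN weight vector weights[j] (the LFW idea),
    instead of one global weight vector shared by all prototypes; this is
    what makes the induced decision borders nonlinear.
    """
    d = np.sqrt(((weights * (prototypes - x)) ** 2).sum(axis=1))
    return labels[np.argmin(d)]
```

With equal weight vectors this reduces to ordinary nearest prototype classification and piecewise-linear Voronoi borders; unequal per-prototype weights bend those borders.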
Adaptive Graph via Multiple Kernel Learning for Nonnegative Matrix Factorization
Nonnegative Matrix Factorization (NMF) has been continuously evolving in
several areas like pattern recognition and information retrieval. It
factorizes a matrix into a product of two low-rank non-negative matrices that
define a parts-based, linear representation of nonnegative data.
Recently, Graph regularized NMF (GrNMF) was proposed to find a compact
representation, which uncovers the hidden semantics and simultaneously respects
the intrinsic geometric structure. In GrNMF, an affinity graph is constructed
from the original data space to encode the geometrical information. In this
paper, we propose a novel idea which engages a Multiple Kernel Learning
approach to refine the graph structure that reflects the factorization of
the matrix and the new data space. GrNMF is improved by utilizing the graph
refined by the kernel learning, and a novel kernel learning method is then
introduced under the GrNMF framework. Our approach shows encouraging results
in comparison to state-of-the-art clustering algorithms like NMF, GrNMF, and SVD.
Comment: This paper has been withdrawn by the author due to the terrible writing
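For context, the GrNMF baseline that this abstract builds on is usually fitted by multiplicative updates. The sketch below assumes the standard formulation, minimizing ||X - UVᵀ||² + λ·Tr(Vᵀ L V) with graph Laplacian L = D - W; the affinity graph W is taken as given here, whereas the paper's contribution is to refine W via multiple kernel learning (omitted).

```python
import numpy as np

def grnmf(X, W, rank, lam=0.01, iters=300, seed=0):
    """Graph-regularized NMF: factor X (n x m) as U @ V.T, U, V >= 0.

    Standard multiplicative updates; lam controls the graph penalty
    Tr(V.T @ L @ V) that makes nearby samples (per the affinity W)
    get nearby representations.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, rank)) + 1e-3
    V = rng.random((m, rank)) + 1e-3
    D = np.diag(W.sum(axis=1))        # degree matrix of the affinity graph
    eps = 1e-9                        # guards against division by zero
    for _ in range(iters):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V
```

The multiplicative form keeps both factors nonnegative automatically, which is why it is the customary solver for this objective.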
Feature Selection and Weighting by Nearest Neighbor Ensembles
In the field of statistical discrimination, nearest neighbor methods are a well-known, quite simple but successful nonparametric classification tool. In higher dimensions, however, predictive power normally deteriorates. In general, if some covariates are assumed to be noise variables, variable selection is a promising approach. The paper's main focus is the development and evaluation of a nearest neighbor ensemble with implicit variable selection. In contrast to other nearest neighbor approaches, we are not primarily interested in classification but in estimating the (posterior) class probabilities. In simulation studies and on real-world data, the proposed nearest neighbor ensemble is compared to an extended forward/backward variable selection procedure for nearest neighbor classifiers and to several alternative, well-established classification tools (that offer probability estimates as well). Despite its simple structure, the proposed method performs quite well, especially if relevant covariates can be separated from noise variables. Another advantage of the presented ensemble is the easy identification of interactions, which are usually hard to detect. Thus not simply variable selection but rather a form of feature selection is performed.
The paper is a preprint of an article published in Chemometrics and Intelligent Laboratory Systems. Please use the journal version for citation.
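One way to picture such an ensemble is a weighted average of per-feature nearest neighbor probability estimates, where a near-zero weight effectively deselects a feature. This is a sketch of the idea only; the function names are hypothetical and the paper's procedure for fitting the weights is not reproduced.

```python
import numpy as np

def feature_posterior(X, y, x_new, feature, k=3):
    """Posterior class-probability estimate from k-NN on a single feature."""
    d = np.abs(X[:, feature] - x_new[feature])
    nn = np.argsort(d)[:k]
    classes = np.unique(y)
    return np.array([(y[nn] == c).mean() for c in classes])

def ensemble_posterior(X, y, x_new, weights, k=3):
    """Weighted average of the per-feature estimates.

    A weight near zero removes that feature's vote, so the weights
    perform implicit feature selection, not just classification.
    """
    p = sum(w * feature_posterior(X, y, x_new, f, k)
            for f, w in enumerate(weights))
    return p / np.sum(weights)
```

Because each member sees one feature, inspecting the fitted weights directly reveals which covariates (and, with feature-pair members, which interactions) matter.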
Clustering and classifying images with local and global variability
A procedure for clustering and classifying images determined by three classification
variables is presented. It uses a measure of global variability, based on the singular value
decomposition of the image matrices, and two average measures of local variability,
based on spatial correlation and spatial changes. The performance of the procedure is
compared using three different databases.
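The abstract only says the global measure is SVD-based, so the exact form below is an assumption: one natural choice scores an image matrix by the fraction of squared singular-value energy outside the leading component.

```python
import numpy as np

def global_variability(img):
    """A possible SVD-based global-variability score for an image matrix.

    Returns 0 for a rank-one (perfectly structured) image and approaches 1
    as the energy spreads evenly over many singular components.
    NOTE: this exact formula is an illustrative assumption, not the
    paper's definition.
    """
    s = np.linalg.svd(np.asarray(img, dtype=float), compute_uv=False)
    return 1.0 - s[0] ** 2 / np.sum(s ** 2)
```

Images could then be clustered on this score together with the two local-variability measures the abstract mentions.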