A Review on Advanced Decision Trees for Efficient & Effective k-NN Classification
K Nearest Neighbor (KNN) is a well-known classification method in data mining and statistics owing to its simple implementation and strong classification performance. However, conventional KNN methods fix a single k value for all test samples. Prior approaches assign different k values to different test samples via cross validation, but this is usually time-consuming. This work proposes new KNN methods. The first is a KTree method that learns an optimal k value for each test or new sample by adding a training stage to KNN classification. This work also proposes an improved version of KTree, called K*Tree, which speeds up the test stage by storing extra information about the training samples in the leaf nodes of KTree, such as the training samples located in each leaf node, their KNNs, and the nearest neighbors of these KNNs. K*Tree thereby conducts KNN classification using only the subset of training samples in a leaf node rather than all training samples, as previous KNN methods do. This substantially reduces the cost of the test stage.
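The core idea, learning a per-sample k in a training stage and reusing it at test time, can be sketched as follows. This is a hypothetical simplification, not the paper's KTree: here the k for a query is simply borrowed from its nearest training sample, whereas KTree learns the mapping with a tree structure.

```python
import math
from collections import Counter

def knn_predict(train, labels, x, k):
    # Vote among the k training points nearest to x (Euclidean distance).
    idx = sorted(range(len(train)), key=lambda i: math.dist(train[i], x))[:k]
    return Counter(labels[i] for i in idx).most_common(1)[0][0]

def learn_per_sample_k(train, labels, candidate_ks=(1, 3, 5)):
    # Training stage: for each training sample, keep the smallest candidate k
    # that classifies it correctly in a leave-one-out test. (A stand-in for
    # KTree's training stage, which learns the k values with a tree.)
    per_sample_k = []
    for i in range(len(train)):
        rest = train[:i] + train[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        chosen = candidate_ks[0]
        for k in candidate_ks:
            if knn_predict(rest, rest_labels, train[i], k) == labels[i]:
                chosen = k
                break
        per_sample_k.append(chosen)
    return per_sample_k

def variable_k_predict(train, labels, per_sample_k, x):
    # Test stage: reuse the learned k of the query's nearest training sample.
    nearest = min(range(len(train)), key=lambda i: math.dist(train[i], x))
    return knn_predict(train, labels, x, per_sample_k[nearest])
```

K*Tree's further speed-up (answering the query from a leaf node's cached samples and their precomputed neighbors instead of the full training set) changes only where the candidate neighbors come from, not this voting logic.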
One-class classifiers based on entropic spanning graphs
One-class classifiers offer valuable tools to assess the presence of outliers
in data. In this paper, we propose a design methodology for one-class
classifiers based on entropic spanning graphs. Our approach can also process
non-numeric data by means of an embedding procedure. The spanning graph is
learned on the embedded input data and the
resulting partition of vertices defines the classifier. The final partition is
derived by exploiting a criterion based on mutual information minimization.
Here, we compute the mutual information by using a convenient formulation
provided in terms of the α-Jensen difference. Once training is
completed, in order to associate a confidence level with the classifier
decision, a graph-based fuzzy model is constructed. The fuzzification process
is based only on topological information of the vertices of the entropic
spanning graph. As such, the proposed one-class classifier is also suitable for
data characterized by complex geometric structures. We provide experiments on
well-known benchmarks containing both feature vectors and labeled graphs. In
addition, we apply the method to the protein solubility recognition problem by
considering several representations for the input samples. Experimental results
demonstrate the effectiveness and versatility of the proposed method with
respect to other state-of-the-art approaches.
Comment: Extended and revised version of the paper "One-Class Classification Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN, Vancouver, Canada.
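A minimal sketch of the spanning-graph intuition, assuming a plain Euclidean minimum spanning tree and a simple longest-edge acceptance rule in place of the paper's mutual-information partition and graph-based fuzzy confidence model:

```python
import math

def mst_edge_lengths(points):
    # Prim's algorithm: edge lengths of the Euclidean minimum spanning tree
    # over the training points.
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    best[0] = 0.0
    edges = []
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:
            edges.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return edges

def is_inlier(train, edges, x):
    # Accept x if it would attach to the spanning tree with an edge no longer
    # than the longest edge already in the tree.
    return min(math.dist(p, x) for p in train) <= max(edges)
```

The paper instead partitions the graph's vertices by minimizing mutual information and derives a fuzzy confidence level from vertex topology; the sketch above only shows why a spanning graph adapts to complex geometric structures that a single centroid-plus-radius model cannot capture.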
Dimensionality's blessing: Clustering images by underlying distribution
Many high dimensional vector distances tend to a constant. This is typically
considered a negative "contrast-loss" phenomenon that hinders clustering and
other machine learning techniques. We reinterpret "contrast-loss" as a
blessing. Re-deriving "contrast-loss" using the law of large numbers, we show
it results in a distribution's instances concentrating on a thin "hyper-shell".
The hollow center means apparently chaotically overlapping distributions are
actually intrinsically separable. We use this to develop
distribution-clustering, an elegant algorithm for grouping of data points by
their (unknown) underlying distribution. Distribution-clustering creates
notably clean clusters from raw unlabeled data, estimates the number of
clusters for itself and is inherently robust to "outliers" which form their own
clusters. This enables trawling for patterns in unorganized data and may be the
key to enabling machine intelligence.
Comment: Accepted in CVPR 201
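The concentration effect behind "contrast-loss" is easy to verify numerically: as dimensionality grows, the distances of standard Gaussian samples from the origin concentrate on a thin shell, so their relative spread (std/mean) shrinks toward zero. A small stdlib-only demo:

```python
import math
import random

def distance_spread(dim, n=200, seed=0):
    # Coefficient of variation (std / mean) of the distances of n standard
    # Gaussian samples in R^dim from the origin. By the law of large numbers
    # the squared norm concentrates near dim, so this ratio shrinks as dim grows.
    rng = random.Random(seed)
    dists = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        dists.append(math.sqrt(sum(c * c for c in v)))
    mean = sum(dists) / n
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / n)
    return std / mean
```

In 2 dimensions the spread is large (distances vary widely); in 1000 dimensions it drops well below 0.1, i.e. nearly all samples sit on a thin "hyper-shell" around the mean radius, which is the hollow-center separability the abstract exploits.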
On the Selection of Anchors and Targets for Video Hyperlinking
A problem not well understood in video hyperlinking is what qualifies a
fragment as an anchor or target. Ideally, anchors provide good starting points
for navigation, and targets supplement anchors with additional details while
not distracting users with irrelevant, false, or redundant information. The
problem is nontrivial because of the intertwined relationship between data
characteristics and user expectation. Imagine that in a large dataset, there
are clusters of fragments spreading over the feature space. The nature of each
cluster can be described by its size (implying popularity) and structure
(implying complexity). A principle way of hyperlinking can be carried out by
picking centers of clusters as anchors and from there reach out to targets
within or outside of clusters with consideration of neighborhood complexity.
The question is which fragments should be selected as anchors or targets, in a
way that reflects the rich content of a dataset while minimizing the risk of a
frustrating user experience. This paper offers insights into this question from
the perspective of hubness and local intrinsic dimensionality, two statistical
properties for assessing the
popularity and complexity of a data space. Based on these properties, two novel
algorithms are proposed for low-risk automatic selection of anchors and targets.
Comment: ACM International Conference on Multimedia Retrieval (ICMR), 2017. (Oral)
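Both statistics are straightforward to compute on small data. The sketch below is an assumption-laden illustration, not the paper's selection algorithms: hubness as the k-occurrence count (how often a point appears in other points' k-NN lists, high counts indicating popular hub fragments) and LID via a Levina-Bickel-style maximum-likelihood estimate from nearest-neighbor distances.

```python
import math

def k_occurrences(points, k):
    # Hubness statistic: how often each point appears in the k-nearest-neighbor
    # lists of the other points. Outliers tend to score 0; hubs score high.
    n = len(points)
    counts = [0] * n
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        for j in order[:k]:
            counts[j] += 1
    return counts

def local_intrinsic_dim(points, x, k):
    # Maximum-likelihood LID estimate from the k nearest-neighbor distances
    # of query x: -(k - 1) / sum(ln(r_i / r_k)). Low values mean a simple
    # local neighborhood; high values mean a complex one.
    r = sorted(math.dist(p, x) for p in points)[:k]
    return -(k - 1) / sum(math.log(d / r[-1]) for d in r[:-1])
```

For points lying along a line, the LID estimate comes out near 1 regardless of the embedding dimension, which is what makes it a useful proxy for neighborhood complexity when ranking candidate anchors and targets.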