
    Using Choquet integrals for kNN approximation and classification

    k-nearest neighbors (kNN) is a popular method for function approximation and classification. One drawback of this method is that the nearest neighbors can all be located on one side of the query point x. An alternative, the natural neighbors method, is expensive for more than three variables. In this paper we propose using the discrete Choquet integral to combine the values of the nearest neighbors so that redundant information is canceled out. We design a fuzzy measure based on the locations of the nearest neighbors, which favors neighbors located all around x.
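
    The following is a minimal, illustrative sketch (not the paper's implementation) of how a discrete Choquet integral can fuse the target values of the k nearest neighbors. The fuzzy measure symmetric_mu below is a simple size-based stand-in, not the location-aware measure designed in the paper; the helper names (choquet_integral, choquet_knn_predict) are hypothetical, and nonnegative target values are assumed.

```python
import numpy as np

def choquet_integral(values, mu):
    """Discrete Choquet integral of nonnegative `values` w.r.t. fuzzy measure `mu`.

    `mu` maps a tuple of indices (a subset of range(len(values))) to [0, 1],
    with mu(()) == 0 and mu(tuple(range(len(values)))) == 1.
    """
    order = np.argsort(values)                  # indices sorted by increasing value
    v = np.asarray(values, dtype=float)[order]
    total, prev = 0.0, 0.0
    for i in range(len(v)):
        coalition = tuple(sorted(order[i:]))    # elements whose value is >= v[i]
        total += (v[i] - prev) * mu(coalition)
        prev = v[i]
    return total

def symmetric_mu(k, p=0.5):
    """Stand-in fuzzy measure depending only on coalition size (NOT the paper's measure)."""
    return lambda coalition: (len(coalition) / k) ** p

def choquet_knn_predict(X_train, y_train, x, k=5, mu_factory=symmetric_mu):
    """Regression-style kNN in which neighbor targets are fused by a Choquet integral."""
    d = np.linalg.norm(X_train - x, axis=1)     # distances from the query x
    nn = np.argsort(d)[:k]                      # indices of the k nearest neighbors
    return choquet_integral(y_train[nn], mu_factory(k))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = X[:, 0] ** 2 + 0.05 * rng.random(200)   # nonnegative targets
    print(choquet_knn_predict(X, y, np.array([0.3, -0.2]), k=7))
```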

    A Graph-Based Semi-Supervised k Nearest-Neighbor Method for Nonlinear Manifold Distributed Data Classification

    k Nearest Neighbors (kNN) is one of the most widely used supervised learning algorithms for classifying Gaussian distributed data, but it does not achieve good results when applied to nonlinear manifold distributed data, especially when only a very limited number of labeled samples is available. In this paper, we propose a new graph-based kNN algorithm which can effectively handle both Gaussian distributed data and nonlinear manifold distributed data. To achieve this goal, we first propose a constrained Tired Random Walk (TRW) by constructing an R-level nearest-neighbor strengthened tree over the graph, and then compute a TRW matrix for similarity measurement purposes. After this, the nearest neighbors are identified according to the TRW matrix, and the class label of a query point is determined by the sum of all the TRW weights of its nearest neighbors. To deal with online situations, we also propose a new algorithm to handle sequential samples based on local neighborhood reconstruction. Comparison experiments are conducted on both synthetic and real-world data sets to demonstrate the validity of the proposed kNN algorithm and its improvements over other versions of kNN algorithms. Given the widespread appearance of manifold structures in real-world problems and the popularity of the traditional kNN algorithm, the proposed manifold-version kNN shows promising potential for classifying manifold-distributed data. Comment: 32 pages, 12 figures, 7 tables
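
    As a rough illustration of the classification rule described above (not the authors' code), the sketch below builds a Gaussian-kernel graph over labeled points and queries, forms a tired-random-walk similarity matrix, taken here as (1 - alpha)(I - alpha P)^-1 with P the row-stochastic transition matrix, and labels each query by the class with the largest sum of TRW weights over its k most TRW-similar labeled points. The paper's R-level nearest-neighbor strengthened tree constraint and the online variant are omitted, and the normalization of the TRW matrix is an assumption.

```python
import numpy as np

def trw_matrix(X, sigma=0.3, alpha=0.9):
    """Tired-random-walk similarity matrix over the rows of X (normalization assumed)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))          # Gaussian-kernel affinities
    np.fill_diagonal(W, 0.0)                    # no self-loops
    P = W / W.sum(axis=1, keepdims=True)        # row-stochastic transition matrix
    return (1 - alpha) * np.linalg.inv(np.eye(len(X)) - alpha * P)

def trw_knn_classify(X_lab, y_lab, X_qry, k=5, sigma=0.3, alpha=0.9):
    """Label each query by the class with the largest sum of TRW weights over its k neighbors."""
    X = np.vstack([X_lab, X_qry])
    T = trw_matrix(X, sigma, alpha)
    n_lab, classes = len(X_lab), np.unique(y_lab)
    preds = []
    for i in range(n_lab, len(X)):
        w = T[i, :n_lab]                        # TRW similarity of query i to labeled points
        nn = np.argsort(w)[::-1][:k]            # k most TRW-similar labeled points
        scores = [w[nn][y_lab[nn] == c].sum() for c in classes]
        preds.append(classes[int(np.argmax(scores))])
    return np.array(preds)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = rng.uniform(0, 2 * np.pi, 200)          # two noisy concentric circles (a toy manifold)
    r = np.where(rng.random(200) < 0.5, 1.0, 2.5)
    X_lab = np.c_[r * np.cos(t), r * np.sin(t)] + 0.05 * rng.normal(size=(200, 2))
    y_lab = (r > 1.5).astype(int)
    print(trw_knn_classify(X_lab, y_lab, np.array([[0.0, 2.4], [0.9, 0.1]]), k=7))
```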

    Secure k-ish Nearest Neighbors Classifier

    In machine learning, classifiers are used to predict the class of a given query based on an existing (classified) database. Given a database S of n d-dimensional points and a d-dimensional query q, the k-nearest neighbors (kNN) classifier assigns q the majority class of its k nearest neighbors in S. In the secure version of kNN, S and q are owned by two different parties that do not want to share their data. Unfortunately, all known solutions for secure kNN either require a large communication complexity between the parties or are very inefficient to run. In this work we present a classifier based on kNN that can be implemented efficiently with homomorphic encryption (HE). The efficiency of our classifier comes from a relaxation we make on kNN, where we allow it to consider κ nearest neighbors for κ ≈ k with some probability. We therefore call our classifier k-ish Nearest Neighbors (k-ish NN). The success probability of our solution depends on the distribution of the distances from q to S and increases as its statistical distance to a Gaussian decreases. To implement our classifier we introduce the concept of a doubly-blinded coin toss, in which the success probability as well as the output of the toss are encrypted. We use this coin toss to efficiently approximate the average and variance of the distances from q to S. We believe these two techniques may be of independent interest. When implemented with HE, the k-ish NN has a circuit depth that is independent of n, therefore making it scalable. We also implemented our classifier in an open source library based on HELib and tested it on a breast tumor database. The accuracy of our classifier (F_1 score) was 98% and classification took less than 3 hours, compared to (estimated) weeks in current HE implementations.
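
    The sketch below illustrates the k-ish relaxation in plaintext only (no homomorphic encryption): instead of selecting exactly the k nearest points, it fits a Gaussian to the query-to-database distances using only their mean and standard deviation, picks the threshold under which that Gaussian places k/n of its mass, and takes the majority class of the points below the threshold. The secure estimation of these statistics via doubly-blinded coin tosses is omitted, and the function name k_ish_nn_classify is illustrative.

```python
import numpy as np
from collections import Counter
from statistics import NormalDist

def k_ish_nn_classify(S, labels, q, k):
    """Classify q by the majority class of roughly k near points in S (plaintext sketch)."""
    d = np.linalg.norm(S - q, axis=1)           # distances from q to every database point
    mu, sd = d.mean(), d.std()
    # threshold below which a Gaussian fit to the distances puts k/n of its mass
    t = NormalDist(mu, sd).inv_cdf(k / len(S))
    near = labels[d <= t]
    if len(near) == 0:                          # degenerate case: fall back to the nearest point
        near = labels[[int(np.argmin(d))]]
    return Counter(near.tolist()).most_common(1)[0][0]

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    S = np.vstack([rng.normal(0, 1, (100, 5)), rng.normal(3, 1, (100, 5))])
    labels = np.array([0] * 100 + [1] * 100)
    print(k_ish_nn_classify(S, labels, rng.normal(3, 1, 5), k=9))
```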

    Two-spin subsystem entanglement in spin 1/2 rings with long range interactions

    We consider the two-spin subsystem entanglement for eigenstates of the Hamiltonian $H = \sum_{1\leq j<k\leq N} (1/r_{j,k})^{\alpha}\, \boldsymbol{\sigma}_j \cdot \boldsymbol{\sigma}_k$ for a ring of N spins 1/2 with associated spin vector operator $(\hbar/2)\,\boldsymbol{\sigma}_j$ for the j-th spin. Here $r_{j,k}$ is the chord distance between sites j and k. The case α = 2 corresponds to the solvable Haldane-Shastry model, whose spectrum has very high degeneracies not present for α ≠ 2. Two-spin subsystem entanglement shows high sensitivity and distinguishes α = 2 from α ≠ 2. There is no entanglement beyond nearest neighbors for all eigenstates when α = 2, whereas for α ≠ 2 one has selective entanglement at any distance for eigenstates of sufficiently high energy, in a certain interval of α which depends on the energy. The ground state (which is a singlet only for even N) does not have entanglement beyond nearest neighbors, and the nearest-neighbor entanglement is virtually independent of the range of the interaction controlled by α. Comment: 16 figures
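
    For readers who want to probe these statements numerically, the sketch below (not the authors' code) builds the ring Hamiltonian above for a small N with NumPy and computes the Wootters concurrence of a two-spin reduced density matrix of the ground state. The chord-distance normalization r_{j,k} = (N/π) sin(π|j−k|/N) is an assumed convention for such ring models.

```python
import numpy as np
from functools import reduce

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def pauli_on(op, site, N):
    """Embed a single-site Pauli operator at `site` in an N-spin ring."""
    ops = [I2] * N
    ops[site] = op
    return reduce(np.kron, ops)

def ring_hamiltonian(N, alpha):
    """H = sum_{j<k} (1/r_{j,k})^alpha sigma_j . sigma_k, chord distance r_{j,k} (assumed normalization)."""
    H = np.zeros((2 ** N, 2 ** N), dtype=complex)
    for j in range(N):
        for k in range(j + 1, N):
            r = (N / np.pi) * np.sin(np.pi * (k - j) / N)   # chord distance between sites j and k
            w = (1.0 / r) ** alpha
            for s in (sx, sy, sz):
                H += w * pauli_on(s, j, N) @ pauli_on(s, k, N)
    return H

def two_spin_rdm(state, j, k, N):
    """Reduced density matrix of spins j and k from an N-spin pure state."""
    psi = state.reshape([2] * N)
    others = [s for s in range(N) if s not in (j, k)]
    psi = np.transpose(psi, [j, k] + others).reshape(4, -1)
    return psi @ psi.conj().T

def concurrence(rho):
    """Wootters concurrence of a two-qubit density matrix."""
    yy = np.kron(sy, sy)
    R = rho @ yy @ rho.conj() @ yy
    lam = np.sort(np.sqrt(np.abs(np.linalg.eigvals(R).real)))[::-1]
    return max(0.0, lam[0] - lam[1] - lam[2] - lam[3])

if __name__ == "__main__":
    N, alpha = 8, 2.0                   # alpha = 2 is the Haldane-Shastry point
    _, vecs = np.linalg.eigh(ring_hamiltonian(N, alpha))
    ground = vecs[:, 0]
    print("C(0,1) =", concurrence(two_spin_rdm(ground, 0, 1, N)))  # nearest neighbors
    print("C(0,2) =", concurrence(two_spin_rdm(ground, 0, 2, N)))  # next-nearest neighbors
```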

    Solving All-k-Nearest Neighbor Problem without an Index

    Among the similarity queries in metric spaces, there is one that obtains the k nearest neighbors of every element in the database (All-k-NN). One way to solve it is the naïve one: comparing each object in the database with all the others and returning the k elements nearest to it (its k-NN). Another way is to preprocess the database to build an index, and then search that index for the k-NN of each element of the dataset. Answering the All-k-NN problem allows building the k-Nearest Neighbor Graph (kNNG). Given an object collection in a metric space, the Nearest Neighbor Graph (NNG) associates each node with its closest neighbor under the given metric; if we link each object to its k nearest neighbors, we obtain the k-Nearest Neighbor Graph (kNNG). The kNNG can itself be considered an index for the database, one that is quite efficient and enables further improvements. In this work, we propose a new technique to solve the All-k-NN problem that does not use any index to obtain the k-NN of each element. This approach avoids as many comparisons as possible, comparing only some database elements and taking advantage of the properties of the distance function. Its total cost is significantly lower than that of the naïve solution. XVI Workshop Bases de Datos y Minería de Datos. Red de Universidades con Carreras en Informática.
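
    As one concrete (and hedged) illustration of exploiting the distance function's properties without an index, the sketch below solves All-k-NN with a single-pivot triangle-inequality bound: a pair is skipped whenever |d(p, x) − d(p, y)| already exceeds both elements' current k-th best distances. This is a standard metric-space pruning, not necessarily the paper's technique, and the output is the edge list of the kNNG.

```python
import heapq
import numpy as np

def all_knn(X, k, dist=lambda a, b: float(np.linalg.norm(a - b))):
    """k nearest neighbors of every element of X, with single-pivot triangle-inequality pruning."""
    n = len(X)
    pivot = X[0]
    d_pivot = np.array([dist(pivot, X[i]) for i in range(n)])   # one distance per element
    heaps = [[] for _ in range(n)]      # max-heaps (negated distances) of each element's best k
    for i in range(n):
        for j in range(i + 1, n):
            lower = abs(d_pivot[i] - d_pivot[j])                # lower bound on dist(X[i], X[j])
            kth_i = -heaps[i][0][0] if len(heaps[i]) == k else np.inf
            kth_j = -heaps[j][0][0] if len(heaps[j]) == k else np.inf
            if lower >= kth_i and lower >= kth_j:
                continue                # the true distance cannot improve either element's k-NN
            d = dist(X[i], X[j])
            for a, b in ((i, j), (j, i)):
                if len(heaps[a]) < k:
                    heapq.heappush(heaps[a], (-d, b))
                elif d < -heaps[a][0][0]:
                    heapq.heapreplace(heaps[a], (-d, b))
    return {i: sorted((-nd, j) for nd, j in heaps[i]) for i in range(n)}   # kNNG edge lists

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = rng.normal(size=(300, 4))
    print(all_knn(X, k=5)[0])           # (distance, neighbor) pairs for element 0
```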

    CSD: Discriminance with Conic Section for Improving Reverse k Nearest Neighbors Queries

    The reverse k nearest neighbor (RkNN) query finds all points that have the query point as one of their k nearest neighbors (kNN), where the kNN query finds the k closest points to its query point. Based on the characteristics of conic sections, we propose a discriminance, named CSD (Conic Section Discriminance), to determine whether points belong to the RkNN set without issuing any queries with non-constant computational complexity. Using CSD, we also implement an efficient RkNN algorithm, CSD-RkNN, with a computational complexity of $O(k^{1.5}\log k)$. Comparative experiments are conducted between CSD-RkNN and two other state-of-the-art RkNN algorithms, SLICE and VR-RkNN. The experimental results indicate that the efficiency of CSD-RkNN is significantly higher than that of its competitors.
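
    For reference, the sketch below is a brute-force implementation of the RkNN definition used above (not the CSD algorithm): a database point p belongs to RkNN(q) exactly when q is at least as close to p as p's k-th nearest neighbor among the other database points. The function name rknn_bruteforce is illustrative.

```python
import numpy as np

def rknn_bruteforce(P, q, k):
    """Indices of database points whose k nearest neighbors (within P) include q."""
    result = []
    for i, p in enumerate(P):
        d_pq = np.linalg.norm(p - q)
        d_others = np.linalg.norm(np.delete(P, i, axis=0) - p, axis=1)
        kth = np.sort(d_others)[k - 1]          # distance from p to its k-th nearest neighbor
        if d_pq <= kth:
            result.append(i)
    return result

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    P = rng.uniform(size=(500, 2))
    print(rknn_bruteforce(P, q=np.array([0.5, 0.5]), k=3))
```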