
    Near-Neighbor Preserving Dimension Reduction for Doubling Subsets of l_1

    Randomized dimensionality reduction has been recognized as one of the fundamental techniques in handling high-dimensional data. Starting with the celebrated Johnson-Lindenstrauss Lemma, such reductions have been studied in depth for the Euclidean (l_2) metric, but much less for the Manhattan (l_1) metric. Our primary motivation is the approximate nearest neighbor problem in l_1. We exploit its reduction to the decision-with-witness version, called approximate near neighbor, which incurs a roughly logarithmic overhead. In 2007, Indyk and Naor, in the context of approximate nearest neighbors, introduced the notion of nearest neighbor-preserving embeddings. These are randomized embeddings between two metric spaces with bounded distortion guaranteed only for the distances between a query point and a point set. Such embeddings are known to exist for both the l_2 and l_1 metrics, as well as for doubling subsets of l_2. The case that remained open was that of doubling subsets of l_1. In this paper, we propose a dimension reduction by means of a near neighbor-preserving embedding for doubling subsets of l_1. Our approach is to represent the point set with a carefully chosen covering set and then randomly project the latter. We study two types of covering sets, c-approximate r-nets and randomly shifted grids, and we discuss the tradeoff between them in terms of preprocessing time and target dimension. We employ Cauchy variables; certain concentration bounds we derive should be of independent interest.
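    The projection primitive here is a matrix of Cauchy random variables, which are 1-stable for l_1 in the same way Gaussians are 2-stable for l_2. Below is a minimal sketch of that primitive only, not the paper's full construction (which projects a covering set rather than the raw points); the function names are ours.

```python
import numpy as np

def cauchy_projection(points, target_dim, rng=None):
    """Project an (n, d) point set with a matrix of i.i.d. standard
    Cauchy entries. By 1-stability, each coordinate of A @ (x - y) is
    distributed as ||x - y||_1 times a standard Cauchy variable."""
    if rng is None:
        rng = np.random.default_rng()
    A = rng.standard_cauchy(size=(target_dim, points.shape[1]))
    return points @ A.T

def estimate_l1(px, py):
    """Estimate ||x - y||_1 from the projections of x and y. The median
    of |standard Cauchy| equals 1, so the median absolute coordinate
    difference is a consistent estimator; the heavy Cauchy tails rule
    out the mean-based estimator that works for l_2."""
    return np.median(np.abs(px - py))

# Usage: the estimate should land close to the true l_1 distance.
rng = np.random.default_rng(0)
X = rng.random((2, 1000))
P = cauchy_projection(X, target_dim=400, rng=rng)
print(estimate_l1(P[0], P[1]), np.abs(X[0] - X[1]).sum())
```

    The lack of finite moments for Cauchy variables is also why tailored concentration bounds, rather than standard JL-style ones, are needed here.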

    Video Feature Extraction Based on Modified LLE Using Adaptive Nearest Neighbor Approach

    Locally linear embedding (LLE) is an unsupervised learning algorithm which computes low-dimensional, neighborhood-preserving embeddings of high-dimensional data. LLE attempts to discover non-linear structure in high-dimensional data by exploiting the local symmetries of linear reconstructions. In this paper, video feature extraction is performed using a modified LLE along with an adaptive nearest neighbor approach to find the nearest neighbors and the connected components. The proposed feature extraction method is applied to a video; the resulting feature description provides a new tool for video analysis.
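    As a point of reference, standard LLE is available off the shelf; the sketch below runs it on stand-in frame features. The paper's adaptive neighbor selection and connected-component handling are modifications not shown here, and the data shapes are ours.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

# Stand-in for per-frame feature vectors extracted from a video.
rng = np.random.default_rng(0)
frames = rng.random((300, 256))

# LLE: reconstruct each point from its k nearest neighbors, then find
# low-dimensional coordinates that preserve the reconstruction weights.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=3)
embedded = lle.fit_transform(frames)
print(embedded.shape)  # (300, 3): a compact feature description per frame
```

    The fixed n_neighbors above is what the adaptive approach replaces: when sampling density varies across the video, a single k can over- or under-connect the neighborhood graph, which is why the method also tracks connected components.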

    Dimensionality reduction for approximate near neighbor search in the Manhattan metric

    The approximate nearest neighbor problem is one of the fundamental problems in computational geometry and has received much attention during the past decades. Efficient and practical algorithms are known for data sets of low dimension, but modern, high-dimensional data cannot be handled by these algorithms because of the so-called "curse of dimensionality". A new theory for approximate nearest neighbors in high dimensions emerged with an influential paper by Indyk and Motwani, in 1998, yielding algorithms that depend polynomially on the dimension. Nevertheless, it has been realized that designing efficient ANN data structures is closely related to dimension-reducing embeddings. One popular dimension reduction technique is randomized projection. Starting with the celebrated Johnson-Lindenstrauss Lemma, such projections have been studied in depth for the Euclidean (L2) metric and, much less, for the Manhattan (L1) metric. In 2007, Indyk and Naor, in the context of approximate nearest neighbors, introduced the notion of nearest neighbor-preserving embeddings. These are randomized embeddings between two metric spaces with bounded distortion guaranteed only for the N distances between a query point and a point set of N points, instead of all O(N^2) pairwise distances. Such embeddings are known to exist for both the L2 and L1 metrics, as well as for doubling subsets of L2. In this thesis, we consider the approximate nearest neighbor problem in doubling subsets of L1. We exploit the decision-with-witness version, called approximate near neighbor, which incurs a roughly logarithmic overhead, and we propose a dimension-reducing, near neighbor-preserving embedding for doubling subsets of L1. Our approach is to represent the point set with a carefully chosen covering set and then apply a random linear projection to that covering set, using a matrix of Cauchy random variables. We study two cases of covering sets, approximate nets and randomly shifted grids, and we discuss the differences between them in terms of computing time, target dimension, and their algorithmic implications.
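    Of the two covering sets studied, the randomly shifted grid is the simpler to illustrate: snap every point to the center of its cell in a grid whose origin is shifted uniformly at random. A minimal sketch under parameter names of our choosing (the thesis then projects these representatives, not the original points):

```python
import numpy as np

def shifted_grid_cover(points, w, rng=None):
    """Map each point to the center of its cell in an axis-aligned grid
    of side w whose origin is shifted uniformly at random; the distinct
    centers form the covering set. Each coordinate moves by at most w/2,
    so a point and its representative are within d * w / 2 in l_1."""
    if rng is None:
        rng = np.random.default_rng()
    shift = rng.uniform(0.0, w, size=points.shape[1])  # one offset per axis
    centers = np.floor((points - shift) / w) * w + shift + w / 2.0
    return np.unique(centers, axis=0)  # deduplicate coincident cells

# Usage: many points collapse onto far fewer grid representatives.
rng = np.random.default_rng(1)
X = rng.random((10_000, 3))
C = shifted_grid_cover(X, w=0.2, rng=rng)
print(len(C), "representatives for", len(X), "points")
```

    Grids like this are essentially free to compute, while approximate nets are costlier to build; the thesis weighs that preprocessing cost against the target dimension each kind of covering set yields.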

    Efficient Nearest Neighbor Classification Using a Cascade of Approximate Similarity Measures

    Full text link
    Nearest neighbor classification using shape context can yield highly accurate results in a number of recognition problems. Unfortunately, the approach can be too slow for practical applications, and thus approximation strategies are needed to make shape context practical. This paper proposes a method for efficient and accurate nearest neighbor classification in non-Euclidean spaces, such as the space induced by the shape context measure. First, a method is introduced for constructing a Euclidean embedding that is optimized for nearest neighbor classification accuracy. Using that embedding, multiple approximations of the underlying non-Euclidean similarity measure are obtained, at different levels of accuracy and efficiency. The approximations are automatically combined to form a cascade classifier, which applies the slower approximations only to the hardest cases. Unlike typical cascade-of-classifiers approaches, which are applied to binary classification problems, our method constructs a cascade for a multiclass problem. Experiments with a standard shape data set indicate that a speed-up of two to three orders of magnitude is gained over the standard shape context classifier, with minimal losses in classification accuracy.
    National Science Foundation (IIS-0308213, IIS-0329009, EIA-0202067); Office of Naval Research (N00014-03-1-0108)
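    The cascade reduces to a simple pattern: a cheap distance in the learned Euclidean embedding prunes the database, and the expensive non-Euclidean measure is evaluated only on the survivors. Below is a two-stage sketch of that pattern; the paper chains more levels and calibrates them automatically, all names here are ours, and exact_dist is a stand-in for shape context matching.

```python
import numpy as np

def cascade_classify(q_emb, exact_dist, db_embs, db_labels, keep=50):
    """Two-stage cascade: filter with the fast Euclidean embedding,
    then rank the few survivors with the slow exact measure."""
    d = np.linalg.norm(db_embs - q_emb, axis=1)   # cheap: one vectorized pass
    candidates = np.argsort(d)[:keep]             # keep the best `keep` matches
    scores = np.array([exact_dist(i) for i in candidates])  # slow measure
    return db_labels[candidates[np.argmin(scores)]]

# Usage with toy data; the exact measure is mocked as a noisy refinement
# of the embedding distance.
rng = np.random.default_rng(0)
db_embs = rng.random((1000, 32))
db_labels = rng.integers(0, 10, size=1000)
q = rng.random(32)
exact_dist = lambda i: np.linalg.norm(db_embs[i] - q) + 0.01 * rng.random()
print(cascade_classify(q, exact_dist, db_embs, db_labels))
```

    The speed-up comes from evaluating the expensive measure keep times instead of once per database item; accuracy hinges on the embedding rarely dropping the true nearest neighbor from the candidate set.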