    CrossMedia: Supporting Collaborative Research of Media Retrieval

    The goal of our new e-science platform is to support collaborative research communities by providing a simple way to jointly develop semantic and media search algorithms on common, challenging datasets processed by novel feature extractors. Querying nearest neighbor (NN) elements in large data collections is central to many information and content retrieval tasks. This paper introduces a flexible research framework for testing features, metrics, distances, and indexing structures. The core of the content-based retrieval system is the LHI-tree, a disk-based index scheme for fast retrieval of multimodal features. Additionally, we compare the LHI-tree to FLANN, an efficient implementation of ANN search, and show that the LHI-tree returns a similar list of retrieved images.
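
    The comparison against FLANN implies a list-overlap (recall) evaluation against exact search. Below is a minimal sketch of such a check, using brute-force search as ground truth; the approx_query callable is a hypothetical stand-in for any ANN index (the LHI-tree implementation is not given here):

        import numpy as np

        def exact_knn(data, query, k):
            # Ground truth: exhaustive Euclidean k-NN over the whole collection.
            dists = np.linalg.norm(data - query, axis=1)
            return np.argsort(dists)[:k]

        def recall_at_k(data, queries, approx_query, k):
            # Fraction of true k-NN ids also returned by the approximate index.
            # approx_query(query, k) -> iterable of k candidate ids (assumed API).
            hits = 0
            for q in queries:
                truth = set(exact_knn(data, q, k))
                hits += len(truth & set(approx_query(q, k)))
            return hits / (k * len(queries))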

    VLSH: Voronoi-based Locality Sensitive Hashing

    We present a fast yet accurate k-nearest neighbor search algorithm for high-dimensional sampling-based motion planners. Our technique is built on top of Locality Sensitive Hashing (LSH), but is extended to support the arbitrary distance metrics used in motion planning problems and to adapt to the irregular distributions of samples generated in the configuration space. To enable these characteristics, our method embeds samples from the configuration space into a simple l2 norm space by using pivot points. We then implicitly define Voronoi regions and use local LSHs with varying quantization factors within those regions. We have applied our method and prior techniques to high-dimensional motion planning problems. Our method improves performance by a factor of up to three, with higher accuracy, over prior approximate nearest neighbor search techniques.
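
    The pivot-embedding step can be illustrated as follows: map each sample to the vector of its distances to a few pivot points, after which ordinary l2 random-projection LSH applies. This is a simplified sketch under assumed details (pivot choice, bucket width w); the per-region quantization factors of VLSH are omitted:

        import numpy as np

        rng = np.random.default_rng(0)

        def pivot_embed(samples, pivots, metric):
            # Embed samples into l2 space as distances to pivot points;
            # metric(a, b) may be any motion-planning distance.
            return np.array([[metric(s, p) for p in pivots] for s in samples])

        def lsh_keys(embedded, projections, w):
            # Standard l2 LSH: quantize random projections with bucket width w.
            return np.floor(embedded @ projections.T / w).astype(int)

        # Toy usage; Euclidean distance stands in for a planner's metric.
        samples = rng.normal(size=(1000, 12))            # 12-D configurations
        pivots = samples[rng.choice(len(samples), 8, replace=False)]
        emb = pivot_embed(samples, pivots, lambda a, b: np.linalg.norm(a - b))
        keys = lsh_keys(emb, rng.normal(size=(4, 8)), w=2.0)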

    Enhancement of student performance prediction using modified K-nearest neighbor

    The traditional K-nearest neighbor (KNN) algorithm performs an exhaustive search over the complete training set to predict a single test sample, which becomes slow on large datasets. The class assigned to a new sample is chosen by simple majority voting, which does not reflect the varying significance of different samples (i.e., it ignores the similarities among them) and can also cause misclassification when two classes tie for the majority. To address these issues, this work combines a moment descriptor with KNN to optimize sample selection, based on the observation that organizing the training samples before the search takes place can speed up the search and improve the predictive performance of the nearest neighbor classifier. The proposed method is called fast KNN (FKNN). Experimental results on three student datasets, used to predict automatically whether a student will pass or fail an exam, show that FKNN reduces the running time of the original KNN by 75.4% to 90.25% and improves classification accuracy by 20% to 36.3%.
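
    The weaknesses called out here (majority ties and ignored similarity) are commonly handled with distance-weighted voting. The sketch below shows that general remedy; it is an illustration of the idea, not the paper's FKNN or its moment descriptor:

        import numpy as np
        from collections import defaultdict

        def weighted_knn_predict(X_train, y_train, x, k=5, eps=1e-9):
            # Closer neighbors cast larger votes, which breaks majority
            # ties and reflects how similar each neighbor is to x.
            dists = np.linalg.norm(X_train - x, axis=1)
            votes = defaultdict(float)
            for i in np.argsort(dists)[:k]:
                votes[y_train[i]] += 1.0 / (dists[i] + eps)
            return max(votes, key=votes.get)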

    Vector Quantization Techniques for Approximate Nearest Neighbor Search on Large-Scale Datasets

    The technological developments of the last twenty years are leading the world into a new era. The internet, mobile phones, and smart devices are producing an exponential increase in data. As data grows every day, finding similar patterns or matching samples to a query is no longer a simple task, because of computational costs and storage limitations. Special signal processing techniques are required to handle this growth, as simply adding more and more computers cannot keep up.

    Nearest neighbor search (also called similarity search, proximity search, or near-item search) is the problem of finding the item that is nearest or most similar to a query according to a distance or similarity measure. When the reference set is very large, or the distance or similarity calculation is complex, performing the nearest neighbor search can be computationally demanding. Given today's ever-growing datasets, where the number of samples also keeps increasing, a growing interest in approximate methods has emerged in the research community.

    Vector Quantization for Approximate Nearest Neighbor Search (VQ for ANN) has proven to be one of the most efficient and successful approaches to this problem. It compresses vectors into binary strings and approximates distances between vectors using look-up tables. With this approach the distance approximation is very fast, while the storage requirement of the dataset is minimized thanks to extreme compression levels. The distance approximation performance of VQ for ANN has been shown to be sufficient for retrieval and classification tasks, demonstrating that VQ for ANN techniques can be a good replacement for exact distance calculation methods.

    This thesis contributes to the VQ for ANN literature by proposing five advanced techniques that aim to provide fast and efficient approximate nearest neighbor search on very large-scale datasets. The proposed methods fall into two groups. The first group consists of two techniques that introduce subspace clustering to VQ for ANN; these are shown to give state-of-the-art performance on prevalent large-scale benchmarks. The second group consists of three methods that improve residual vector quantization; these are also shown to outperform their predecessors. A sixth contribution is a demonstration of VQ for ANN in an image classification application on large-scale datasets, where a k-NN classifier based on VQ for ANN performs on par with a standard k-NN classifier while requiring much less storage and computation.
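
    The compress-then-look-up mechanism described above can be made concrete with a bare-bones product quantizer. This is a sketch of the general VQ-for-ANN technique (short codes plus per-subspace distance tables), not of the five methods proposed in the thesis; the subspace count m and codebook size ks are illustrative:

        import numpy as np
        from sklearn.cluster import KMeans

        def pq_train(X, m=4, ks=256):
            # Split the D dimensions into m subspaces and learn ks
            # centroids per subspace (one KMeans codebook each).
            d = X.shape[1] // m
            return [KMeans(n_clusters=ks, n_init=4).fit(X[:, i*d:(i+1)*d])
                    for i in range(m)]

        def pq_encode(X, codebooks):
            # Compress every vector to m small integer codes.
            d = X.shape[1] // len(codebooks)
            return np.stack([cb.predict(X[:, i*d:(i+1)*d])
                             for i, cb in enumerate(codebooks)], axis=1)

        def pq_adc(query, codes, codebooks):
            # Asymmetric distance computation: build one table of squared
            # query-to-centroid distances per subspace, then every database
            # distance is just m table look-ups summed together.
            d = len(query) // len(codebooks)
            tables = [np.sum((cb.cluster_centers_ - query[i*d:(i+1)*d])**2, axis=1)
                      for i, cb in enumerate(codebooks)]
            return sum(t[codes[:, i]] for i, t in enumerate(tables))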

    Tree-Independent Dual-Tree Algorithms

    Dual-tree algorithms are a widely used class of branch-and-bound algorithms. Unfortunately, developing dual-tree algorithms for use with different trees and problems is often complex and burdensome. We introduce a four-part logical split: the tree, the traversal, the point-to-point base case, and the pruning rule. We provide a meta-algorithm which allows development of dual-tree algorithms in a tree-independent manner and easy extension to entirely new types of trees. Representations are provided for five common algorithms; for k-nearest neighbor search, this leads to a novel, tighter pruning bound. The meta-algorithm also allows straightforward extensions to massively parallel settings.
    Comment: accepted in ICML 201
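
    The four-part split suggests a traversal parameterized by a base case and a pruning rule. The skeleton below is one illustrative reading of that design, assuming nodes expose .points (held only at leaves here) and .children; it is not the paper's meta-algorithm verbatim:

        def dual_depth_first(q_node, r_node, base_case, score):
            # Generic dual-tree depth-first traversal, independent of tree type.
            # score(q_node, r_node) returning float('inf') prunes the node pair;
            # base_case(q_point, r_point) does the point-to-point work.
            if score(q_node, r_node) == float('inf'):
                return  # pruning rule fired: skip this node combination entirely
            for q in q_node.points:
                for r in r_node.points:
                    base_case(q, r)
            for qc in q_node.children:
                for rc in r_node.children:
                    dual_depth_first(qc, rc, base_case, score)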

    Accelerating Nearest Neighbor Search on Manycore Systems

    We develop methods for accelerating metric similarity search that are effective on modern hardware. Our algorithms factor into easily parallelizable components, making them simple to deploy and efficient on multicore CPUs and GPUs. Despite the simple structure of our algorithms, their search performance is provably sublinear in the size of the database, with a factor dependent only on its intrinsic dimensionality. We demonstrate that our methods provide substantial speedups on a range of datasets and hardware platforms. In particular, we present results on a 48-core server machine, on graphics hardware, and on a multicore desktop.
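
    The "easily parallelizable components" claim can be illustrated with the simplest such decomposition: shard the database, search shards independently, and merge small candidate lists. This generic sketch is not the paper's sublinear algorithm; the thread-based parallelism and chunk count are assumptions:

        import numpy as np
        from concurrent.futures import ThreadPoolExecutor

        def chunk_knn(chunk, query, k):
            # Exact k-NN inside one shard; shards are searched independently.
            d = np.linalg.norm(chunk - query, axis=1)
            return [(d[i], i) for i in np.argsort(d)[:k]]

        def parallel_knn(data, query, k=10, n_chunks=8):
            # Search shards in parallel, then merge n_chunks * k candidates;
            # the merge is a cheap sort, so partial results combine trivially.
            shards = np.array_split(np.arange(len(data)), n_chunks)
            def job(ids):
                return [(d, ids[i]) for d, i in chunk_knn(data[ids], query, k)]
            with ThreadPoolExecutor() as ex:
                parts = ex.map(job, shards)
            return sorted(c for part in parts for c in part)[:k]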