325 research outputs found

    Accurate and Fast Retrieval for Complex Non-metric Data via Neighborhood Graphs

    Full text link
    We demonstrate that a graph-based search algorithm-relying on the construction of an approximate neighborhood graph-can directly work with challenging non-metric and/or non-symmetric distances without resorting to metric-space mapping and/or distance symmetrization, which, in turn, lead to substantial performance degradation. Although the straightforward metrization and symmetrization is usually ineffective, we find that constructing an index using a modified, e.g., symmetrized, distance can improve performance. This observation paves a way to a new line of research of designing index-specific graph-construction distance functions

    The Role of Local Intrinsic Dimensionality in Benchmarking Nearest Neighbor Search

    Get PDF
    This paper reconsiders common benchmarking approaches to nearest neighbor search. It is shown that the concept of local intrinsic dimensionality (LID) allows to choose query sets of a wide range of difficulty for real-world datasets. Moreover, the effect of different LID distributions on the running time performance of implementations is empirically studied. To this end, different visualization concepts are introduced that allow to get a more fine-grained overview of the inner workings of nearest neighbor search principles. The paper closes with remarks about the diversity of datasets commonly used for nearest neighbor search benchmarking. It is shown that such real-world datasets are not diverse: results on a single dataset predict results on all other datasets well.Comment: Preprint of the paper accepted at SISAP 201

    High-Dimensional Similarity Search with Quantum-Assisted Variational Autoencoder

    Full text link
    Recent progress in quantum algorithms and hardware indicates the potential importance of quantum computing in the near future. However, finding suitable application areas remains an active area of research. Quantum machine learning is touted as a potential approach to demonstrate quantum advantage within both the gate-model and the adiabatic schemes. For instance, the Quantum-assisted Variational Autoencoder has been proposed as a quantum enhancement to the discrete VAE. We extend on previous work and study the real-world applicability of a QVAE by presenting a proof-of-concept for similarity search in large-scale high-dimensional datasets. While exact and fast similarity search algorithms are available for low dimensional datasets, scaling to high-dimensional data is non-trivial. We show how to construct a space-efficient search index based on the latent space representation of a QVAE. Our experiments show a correlation between the Hamming distance in the embedded space and the Euclidean distance in the original space on the Moderate Resolution Imaging Spectroradiometer (MODIS) dataset. Further, we find real-world speedups compared to linear search and demonstrate memory-efficient scaling to half a billion data points

    PRILJ: an efficient two-step method based on embedding and clustering for the identification of regularities in legal case judgments

    Get PDF
    In an era characterized by fast technological progress that introduces new unpredictable scenarios every day, working in the law field may appear very difficult, if not supported by the right tools. In this respect, some systems based on Artificial Intelligence methods have been proposed in the literature, to support several tasks in the legal sector. Following this line of research, in this paper we propose a novel method, called PRILJ, that identifies paragraph regularities in legal case judgments, to support legal experts during the redaction of legal documents. Methodologically, PRILJ adopts a two-step approach that first groups documents into clusters, according to their semantic content, and then identifies regularities in the paragraphs for each cluster. Embedding-based methods are adopted to properly represent documents and paragraphs into a semantic numerical feature space, and an Approximated Nearest Neighbor Search method is adopted to efficiently retrieve the most similar paragraphs with respect to the paragraphs of a document under preparation. Our extensive experimental evaluation, performed on a real-world dataset provided by EUR-Lex, proves the effectiveness and the efficiency of the proposed method. In particular, its ability of modeling different topics of legal documents, as well as of capturing the semantics of the textual content, appear very beneficial for the considered task, and make PRILJ very robust to the possible presence of noise in the data
    • …
    corecore