
    Balancing clusters to reduce response time variability in large scale image search

    Many algorithms for approximate nearest neighbor search in high-dimensional spaces partition the data into clusters. At query time, in order to avoid exhaustive search, an index selects the few (or a single) clusters nearest to the query point. Clusters are often produced by the well-known k-means approach since it has several desirable properties. On the downside, it tends to produce clusters having quite different cardinalities. Imbalanced clusters negatively impact both the variance and the expectation of query response times. This paper proposes to modify k-means centroids to produce clusters with more comparable sizes without sacrificing the desirable properties. Experiments with a large-scale collection of image descriptors show that our algorithm significantly reduces the variance of response times without seriously impacting the search quality.
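    The balancing idea can be illustrated as a capacity-constrained assignment step. The sketch below is a minimal toy version, not the paper's actual centroid-modification algorithm: the function name, the fixed capacity of ceil(n/k), and the greedy tie-breaking heuristic are all assumptions made for illustration.

```python
import math

def balanced_assign(points, centroids):
    """Greedily assign each point to its nearest centroid whose cluster
    is not yet full (capacity = ceil(n / k)).  Points closest to their
    preferred centroid are processed first, so contested slots go to
    the points that need them most."""
    n, k = len(points), len(centroids)
    cap = math.ceil(n / k)

    def dist2(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    # Process points in order of distance to their nearest centroid.
    order = sorted(range(n),
                   key=lambda i: min(dist2(points[i], c) for c in centroids))

    sizes = [0] * k
    labels = [-1] * n
    for i in order:
        # Candidate centroids by distance; pick the nearest non-full one.
        for j in sorted(range(k), key=lambda j: dist2(points[i], centroids[j])):
            if sizes[j] < cap:
                labels[i] = j
                sizes[j] += 1
                break
    return labels

labels = balanced_assign([(0, 0), (0, 1), (10, 0), (10, 1)],
                         [(0, 0.5), (10, 0.5)])
print(labels)  # each cluster receives exactly 2 of the 4 points
```

    Capping cluster sizes this way trades a small amount of assignment quality for the bounded per-cluster scan cost that keeps query response times comparable.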

    Privacy-Preserving Outsourced Media Search

    This work proposes a privacy-protection framework for an important application called outsourced media search. This scenario involves a data owner, a client, and an untrusted server, where the owner outsources a search service to the server. Due to lack of trust, the privacy of the client and the owner should be protected. The framework relies on multimedia hashing and symmetric encryption. It requires involved parties to participate in a privacy-enhancing protocol. Additional processing steps are carried out by the owner and the client: (i) before outsourcing low-level media features to the server, the owner has to one-way hash them, and partially encrypt each hash-value; (ii) the client completes the similarity search by re-ranking the most similar candidates received from the server. One-way hashing and encryption add ambiguity to data and make it difficult for the server to infer contents from database items and queries, so the privacy of both the owner and the client is enforced. The proposed framework realizes trade-offs among strength of privacy enforcement, quality of search, and complexity, because the information loss can be tuned during hashing and encryption. Extensive experiments demonstrate the effectiveness and the flexibility of the framework.
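    The division of labour among owner, server, and client can be sketched with a toy model. Everything here is a hedged stand-in: sign-based random-hyperplane hashing substitutes for the paper's multimedia hashing, and simply withholding bits from the server models partial encryption; all names and parameter values are hypothetical.

```python
import random

def sign_hash(vec, planes):
    """Similarity-preserving one-way hash: one bit per random
    hyperplane (a toy stand-in for multimedia hashing)."""
    return tuple(1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
                 for plane in planes)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

random.seed(0)
dim, nbits, visible = 8, 16, 8   # hypothetical parameters
planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(nbits)]

db = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(20)]
codes = [sign_hash(v, planes) for v in db]

# Owner: outsource only the first `visible` bits of each hash;
# the remaining bits stay encrypted (modelled here by withholding them).
server_index = [c[:visible] for c in codes]

query = db[3]                    # client queries something it holds
qcode = sign_hash(query, planes)

# Server: coarse candidate filtering on the visible prefix only.
candidates = sorted(range(len(db)),
                    key=lambda i: hamming(server_index[i], qcode[:visible]))[:5]

# Client: re-rank the short-list using the full hash it can decrypt.
best = min(candidates, key=lambda i: hamming(codes[i], qcode))
print(best)
```

    The server never sees enough bits to reconstruct items or queries, yet still prunes the database; widening or narrowing the visible prefix is one way to realize the privacy/quality/complexity trade-off the abstract describes.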

    Scalability of the NV-tree: Three Experiments

    The NV-tree is a scalable approximate high-dimensional indexing method specifically designed for large-scale visual instance search. In this paper, we report on three experiments designed to evaluate the performance of the NV-tree. Two of these experiments embed standard benchmarks within collections of up to 28.5 billion features, representing the largest single-server collection ever reported in the literature. The results show that indeed the NV-tree performs very well for visual instance search applications over large-scale collections.

    Secure and Efficient Approximate Nearest Neighbors Search

    This paper presents a moderately secure but very efficient approximate nearest neighbor search. After detailing the threats pertaining to the "honest but curious" model, our approach starts from a state-of-the-art algorithm in the domain of approximate nearest neighbor search. We gradually develop mechanisms that partially block the attacks threatening the original algorithm. The loss of performance compared to the original algorithm is mainly a constant overhead in computation time and communication payload, both independent of the size of the database.

    Searching in one billion vectors: re-rank with source coding

    Recent indexing techniques inspired by source coding have been shown to successfully index billions of high-dimensional vectors in memory. In this paper, we propose an approach that re-ranks the neighbor hypotheses obtained by these compressed-domain indexing methods. In contrast to the usual post-verification scheme, which performs exact distance calculation on the short-list of hypotheses, the estimated distances are refined based on short quantization codes, to avoid reading the full vectors from disk. We have released a new public dataset of one billion 128-dimensional vectors and proposed an experimental setup to evaluate high-dimensional indexing algorithms at a realistic scale. Experiments show that our method accurately and efficiently re-ranks the neighbor hypotheses using little memory compared to the full-vector representation.
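    The two-pass idea, estimating distances from short codes and then refining them with a second, finer code instead of reading full vectors, can be sketched with a toy two-level quantizer. This is not the paper's actual scheme; the codebook sizes, dimensions, and data below are arbitrary illustration choices.

```python
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, v):
    """Index of the codebook entry closest to v (squared L2)."""
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i], v))

random.seed(1)
dim = 4
# Coarse codebook for the first distance estimate, plus a small
# residual codebook whose short codes refine that estimate.
coarse = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(4)]
refine = [[random.uniform(-0.2, 0.2) for _ in range(dim)] for _ in range(8)]

def encode(v):
    c = nearest(coarse, v)
    residual = [a - b for a, b in zip(v, coarse[c])]
    return c, nearest(refine, residual)   # two short codes per vector

def reconstruct(code):
    c, r = code
    return [a + b for a, b in zip(coarse[c], refine[r])]

db = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(50)]
codes = [encode(v) for v in db]           # only codes are kept in memory
q = db[7]

# First pass: rank by the coarse code alone.
shortlist = sorted(range(50),
                   key=lambda i: dist2(coarse[codes[i][0]], q))[:10]
# Second pass: re-rank the short-list with the refined estimate,
# still without touching any full database vector.
best = min(shortlist, key=lambda i: dist2(reconstruct(codes[i]), q))
```

    The refinement step only ever decodes the short-listed codes, which is what keeps memory and I/O small compared to fetching the full vectors.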

    Understanding the Security and Robustness of SIFT

    Many content-based image retrieval systems (CBIRS) describe images using SIFT local features because they provide very robust recognition capabilities. While SIFT features have proved to cope with a wide spectrum of general-purpose image distortions, their security has not yet been fully assessed. Hsu et al. [hsu09:_secur_robus_sift] show that very specific anti-SIFT attacks can jeopardize the keypoint detection. These attacks can delude SIFT-based systems targeting applications such as image authentication and (pirated) copy detection. Having some expertise in CBIRS, we were extremely concerned by their analysis. This paper presents our own investigations on the impact of these anti-SIFT attacks on a real CBIRS indexing a large collection of images. The attacks are in fact not able to break the system. A detailed analysis explains this assessment.

    Object grouping in EOS

    Eos is an environment for building distributed object-based systems. Leos, the language for Eos, provides transparency for distribution and persistence. In this paper, we address the problem of declustering the object graph over a number of nodes and of locally clustering objects within pages, with minimal impact on the programming process. We propose a grouping model which, on the one hand, achieves full transparency: grouping is performed dynamically by the run-time system as directed by user-provided hints, and this dynamic object grouping copes automatically with evolutions of the object graph. The implementation incurs little overhead, as it is a side-effect of garbage collection. On the other hand, our model supplies Eos users with explicit and fine-grained control over data and computation placement so they can load-balance the overall system.

    Dynamicity and Durability in Scalable Visual Instance Search

    Visual instance search involves retrieving from a collection of images the ones that contain an instance of a visual query. Systems designed for visual instance search face the major challenge of scalability: a collection of a few million images used for instance search typically creates a few billion features that must be indexed. Furthermore, as real image collections grow rapidly, systems must also provide dynamicity, i.e., be able to handle on-line insertions while concurrently serving retrieval operations. Durability, which is the ability to recover correctly from software and hardware crashes, is the natural complement of dynamicity. Durability, however, has rarely been integrated within scalable and dynamic high-dimensional indexing solutions. This article addresses the issue of dynamicity and durability for scalable indexing of very large and rapidly growing collections of local features for instance retrieval. By extending the NV-tree, a scalable disk-based high-dimensional index, we show how to implement the ACID properties of transactions, which ensure both dynamicity and durability. We present a detailed performance evaluation of the transactional NV-tree: (i) we show that the insertion throughput is excellent despite the overhead of enforcing the ACID properties; (ii) we also show that this transactional index is truly scalable, using a standard image benchmark embedded in collections of up to 28.5 billion high-dimensional vectors, the largest single-server evaluations reported in the literature.

    On Competitiveness of Nearest-Neighbor-Based Music Classification: A Methodological Critique

    The traditional role of nearest-neighbor classification in music classification research is that of a straw-man opponent for the learning approach of the hour. Recent work in high-dimensional indexing has shown that approximate nearest-neighbor algorithms are extremely scalable, yielding results of reasonable quality from billions of high-dimensional features. With such efficient large-scale classifiers, the traditional music classification methodology of reducing both feature dimensionality and feature quantity is incorrect; instead, the approximate nearest-neighbor classifier should be given an extensive data collection to work with. We present a case study, using a well-known MIR classification benchmark with well-known music features, which shows that a simple nearest-neighbor classifier performs very competitively when given ample data. In this position paper, we therefore argue that nearest-neighbor classification has been treated unfairly in the literature and may be much more competitive than previously thought.
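    The classifier being argued for is straightforward to sketch: each feature vector extracted from a query track votes for the labels of its nearest training features, and the track takes the majority. The function name and the toy data below are hypothetical, not the paper's benchmark or features.

```python
from collections import Counter

def knn_predict(train, train_labels, query_feats, k=3):
    """Classify a track by letting each of its feature vectors vote:
    every query feature finds its k nearest training features (exact
    squared-L2 here; an approximate index at scale) and votes for
    their labels; the track takes the overall majority."""
    votes = Counter()
    for q in query_feats:
        ranked = sorted(range(len(train)),
                        key=lambda i: sum((a - b) ** 2
                                          for a, b in zip(train[i], q)))
        for i in ranked[:k]:
            votes[train_labels[i]] += 1
    return votes.most_common(1)[0][0]

# Toy data: two 'genres' occupying different regions of feature space.
train = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1),   # genre A
         (1.0, 1.1), (1.1, 1.0), (0.9, 1.0)]   # genre B
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train, labels, [(0.05, 0.05), (0.15, 0.1)]))  # -> A
```

    At realistic scale the exact scan inside the loop would be replaced by an approximate high-dimensional index, which is precisely the substitution the abstract argues makes the method competitive.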

    Challenging the Security of CBIR Systems

    Content-Based Image Retrieval Systems (CBIRS) are now commonly used as a filtering mechanism against the piracy of multimedia contents. Many publications in the last few years have proposed very robust schemes where pirated contents are detected despite severe modifications. But none of these systems has addressed the piracy problem from a security perspective. It is now time to check whether they are secure: can pirates mount effective attacks against CBIRS by carefully studying the technology they use? This paper analyzes the security flaws of the typical technology blocks used in state-of-the-art CBIRS and shows that it is possible to delude such systems, making them useless in practice.