Search CORE

290 research outputs found

Spreading vectors for similarity search

Author: Douze Matthijs
Jégou Hervé
Sablayrolles Alexandre
Schmid Cordelia
Publication venue
Publication date: 06/05/2019
Field of study

Discretizing multi-dimensional data distributions is a fundamental step of modern indexing methods. State-of-the-art techniques learn parameters of quantizers on training data for optimal performance, thus adapting quantizers to the data. In this work, we propose to reverse this paradigm and adapt the data to the quantizer: we train a neural net which last layer forms a fixed parameter-free quantizer, such as pre-defined points of a hyper-sphere. As a proxy objective, we design and train a neural network that favors uniformity in the spherical latent space, while preserving the neighborhood structure after the mapping. We propose a new regularizer derived from the Kozachenko--Leonenko differential entropy estimator to enforce uniformity and combine it with a locality-aware triplet loss. Experiments show that our end-to-end approach outperforms most learned quantization methods, and is competitive with the state of the art on widely adopted benchmarks. Furthermore, we show that training without the quantization step results in almost no difference in accuracy, but yields a generic catalyzer that can be applied with any subsequent quantizer.Comment: Published at ICLR 201

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Unsupervised Graph-based Rank Aggregation for Improved Retrieval

Author: Dourado Icaro Cavalcante
Pedronette Daniel Carlos Guimarães
Torres Ricardo da Silva
Publication venue: 'Elsevier BV'
Publication date: 18/03/2019
Field of study

This paper presents a robust and comprehensive graph-based rank aggregation approach, used to combine results of isolated ranker models in retrieval tasks. The method follows an unsupervised scheme, which is independent of how the isolated ranks are formulated. Our approach is able to combine arbitrary models, defined in terms of different ranking criteria, such as those based on textual, image or hybrid content representations. We reformulate the ad-hoc retrieval problem as a document retrieval based on fusion graphs, which we propose as a new unified representation model capable of merging multiple ranks and expressing inter-relationships of retrieval results automatically. By doing so, we claim that the retrieval system can benefit from learning the manifold structure of datasets, thus leading to more effective results. Another contribution is that our graph-based aggregation formulation, unlike existing approaches, allows for encapsulating contextual information encoded from multiple ranks, which can be directly used for ranking, without further computations and post-processing steps over the graphs. Based on the graphs, a novel similarity retrieval score is formulated using an efficient computation of minimum common subgraphs. Finally, another benefit over existing approaches is the absence of hyperparameters. A comprehensive experimental evaluation was conducted considering diverse well-known public datasets, composed of textual, image, and multimodal documents. Performed experiments demonstrate that our method reaches top performance, yielding better effectiveness scores than state-of-the-art baseline methods and promoting large gains over the rankers being fused, thus demonstrating the successful capability of the proposal in representing queries based on a unified graph-based model of rank fusions

arXiv.org e-Print Archive

Repositorio da Producao Cientifica e Intelectual da Unicamp

Search in WikiImages using mobile phone

Author: Havasi László Rajmund
Kovács László
Pataki Máté
Szabó Mihály
Szirányi Tamás
Varga Domonkos
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2013
Field of study

Demonstration will focus on the content based retrieval of Wikipedia images (Hungarian version). A mobile application for iOS will be used to gather images and send directly to the crossmodal processing framework. Searching is implemented in a high performance hybrid index tree with total ~500k entries. The hit list is converted to wikipages and ordered by the content based score

Crossref

SZTAKI Publication Repository

Building K-Anonymous User Cohorts with\\ Consecutive Consistent Weighted Sampling (CCWS)

Author: Li Ping
Li Xiaoyun
Zhao Weijie
Zheng Xinyi
Publication venue
Publication date: 26/04/2023
Field of study

To retrieve personalized campaigns and creatives while protecting user privacy, digital advertising is shifting from member-based identity to cohort-based identity. Under such identity regime, an accurate and efficient cohort building algorithm is desired to group users with similar characteristics. In this paper, we propose a scalable

K

-anonymous cohort building algorithm called {\em consecutive consistent weighted sampling} (CCWS). The proposed method combines the spirit of the (

p

-powered) consistent weighted sampling and hierarchical clustering, so that the

K

-anonymity is ensured by enforcing a lower bound on the size of cohorts. Evaluations on a LinkedIn dataset consisting of

>70

M users and ads campaigns demonstrate that CCWS achieves substantial improvements over several hashing-based methods including sign random projections (SignRP), minwise hashing (MinHash), as well as the vanilla CWS

arXiv.org e-Print Archive

Packing bag-of-features

Author: Cordelia Schmid
Hervé Jégou
Matthijs Douze
Publication venue
Publication date: 01/01/2009
Field of study

One of the main limitations of image search based on bag-of-features is the memory usage per image. Only a few million images can be handled on a single machine in reasonable response time. In this paper, we first evaluate how the memory usage is reduced by using lossless index compression. We then propose an approximate representation of bag-of-features obtained by projecting the corresponding histogram onto a set of pre-defined sparse projection functions, producing several image descriptors. Coupled with a proper indexing structure, an image is represented by a few hundred bytes. A distance expectation criterion is then used to rank the images. Our method is at least one order of magnitude faster than standard bag-of-features while providing excellent search quality. 1

HAL-CentraleSupelec

CiteSeerX

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-Rennes 1

LHI-tree: An efficient disk-based image search application

Author: Havasi László
Szirányi Tamás
Varga Domonkos István
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014
Field of study

Crossref

SZTAKI Publication Repository

An adaptive palette reordering method for compressing color-indexed image

Author
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

Crossref