The Power of Asymmetry in Binary Hashing
When approximating binary similarity using the Hamming distance between short
binary hashes, we show that even if the similarity is symmetric, we can have
shorter and more accurate hashes by using two distinct code maps, i.e. by
approximating the similarity between x and x' as the Hamming distance
between f(x) and g(x'), for two distinct binary codes f, g, rather than as
the Hamming distance between f(x) and f(x').
Comment: Accepted to NIPS 2013, 9 pages, 5 figures
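The two-map idea above can be sketched in a few lines. This is a minimal illustration, not the paper's learned codes: the maps `f` and `g` below are arbitrary random-hyperplane sign codes, chosen only to show how an asymmetric Hamming distance is computed with two distinct code maps.

```python
# Sketch: approximate similarity via the Hamming distance between two
# *distinct* binary code maps f and g, versus a single shared map.
# The random-projection maps here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))      # toy database vectors

Wf = rng.normal(size=(16, 4))     # projection for the database-side map f
Wg = rng.normal(size=(16, 4))     # distinct projection for the query-side map g

def f(x):
    """Database-side binary code map."""
    return (x @ Wf > 0).astype(np.uint8)

def g(x):
    """Query-side binary code map, distinct from f."""
    return (x @ Wg > 0).astype(np.uint8)

def hamming(a, b):
    return int(np.count_nonzero(a != b))

x, xp = X[0], X[1]
d_sym = hamming(f(x), f(xp))      # symmetric: same map on both sides
d_asym = hamming(f(x), g(xp))     # asymmetric: two distinct maps
print(d_sym, d_asym)
```

Both distances live in the same 4-bit range; the paper's point is that jointly learning f and g gives a better similarity approximation per bit than any single shared map.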
Anti-sparse coding for approximate nearest neighbor search
This paper proposes a binarization scheme for vectors of high dimension based
on the recent concept of anti-sparse coding, and shows its excellent
performance for approximate nearest neighbor search. Unlike other binarization
schemes, this framework allows, up to a scaling factor, the explicit
reconstruction from the binary representation of the original vector. The paper
also shows that random projections which are used in Locality Sensitive Hashing
algorithms, are significantly outperformed by regular frames for both synthetic
and real data if the number of bits exceeds the vector dimensionality, i.e.,
when high precision is required.
Comment: Submitted to ICASSP'2012; RR-7771 (2011)
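The explicit-reconstruction property can be illustrated with a simpler stand-in. The paper's anti-sparse codes come from an l∞-penalized optimization over a regular frame; the sketch below only demonstrates the surrounding mechanism, using a random (not regular) frame and a plain pseudo-inverse decoder, both assumptions for illustration.

```python
# Sketch: binarize a vector by signs of overcomplete frame projections,
# then reconstruct it (up to scale) linearly from the bits alone.
# Random frame + pseudo-inverse decoder are simplifying assumptions;
# the paper uses regular frames and anti-sparse (l_infinity) coding.
import numpy as np

rng = np.random.default_rng(1)
d, m = 8, 32                       # m > d: more bits than dimensions
A = rng.normal(size=(m, d))        # overcomplete frame

x = rng.normal(size=d)
b = np.sign(A @ x)                 # binary representation of x

x_hat = np.linalg.pinv(A) @ b      # explicit reconstruction from the bits
# Reconstruction is only defined up to a scaling factor, so compare directions.
cos = (x @ x_hat) / (np.linalg.norm(x) * np.linalg.norm(x_hat))
print(round(cos, 3))
```

With m well above d, the direction of x survives sign quantization; this is the regime (bits exceeding dimensionality) where the abstract reports frames beating LSH-style random projections.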
Exploiting multimedia in creating and analysing multimedia Web archives
The data contained on the web and the social web are inherently multimedia and consist of a mixture of textual, visual and audio modalities. Community memories embodied on the web and social web contain a rich mixture of data from these modalities. In many ways, the web is the greatest resource ever created by humankind. However, due to the dynamic and distributed nature of the web, its content changes, appears and disappears on a daily basis. Web archiving provides a way of capturing snapshots of (parts of) the web for preservation and future analysis. This paper provides an overview of techniques we have developed within the context of the EU-funded ARCOMEM (ARchiving COmmunity MEMories) project to allow multimedia web content to be leveraged during the archival process and for post-archival analysis. Through a set of use cases, we explore several practical applications of multimedia analytics within the realm of web archiving, web archive analysis and multimedia data on the web in general.
SADIH: Semantic-Aware DIscrete Hashing
Due to its low storage cost and fast query speed, hashing has been recognized
to accomplish similarity search in large-scale multimedia retrieval
applications. Particularly supervised hashing has recently received
considerable research attention by leveraging the label information to preserve
the pairwise similarities of data points in the Hamming space. However, there
still remain two crucial bottlenecks: 1) the learning process of the full
pairwise similarity preservation is computationally unaffordable and unscalable
to deal with big data; 2) the available category information of data are not
well-explored to learn discriminative hash functions. To overcome these
challenges, we propose a unified Semantic-Aware DIscrete Hashing (SADIH)
framework, which aims to directly embed the transformed semantic information
into the asymmetric similarity approximation and discriminative hashing
function learning. Specifically, a semantic-aware latent embedding is
introduced to asymmetrically preserve the full pairwise similarities while
skillfully handling the cumbersome n × n pairwise similarity matrix.
Meanwhile, a semantic-aware autoencoder is developed to jointly preserve the
data structures in the discriminative latent semantic space and perform data
reconstruction. Moreover, an efficient alternating optimization algorithm is
proposed to solve the resulting discrete optimization problem. Extensive
experimental results on multiple large-scale datasets demonstrate that our
SADIH can clearly outperform the state-of-the-art baselines with the additional
benefit of lower computational costs.
Comment: Accepted by the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)
Memory vectors for similarity search in high-dimensional spaces
We study an indexing architecture to store and search in a database of
high-dimensional vectors from the perspective of statistical signal processing
and decision theory. This architecture is composed of several memory units,
each of which summarizes a fraction of the database by a single representative
vector. The potential similarity of the query to one of the vectors stored in
the memory unit is gauged by a simple correlation with the memory unit's
representative vector. This representative optimizes the test of the following
hypothesis: the query is independent of any vector in the memory unit vs. the
query is a simple perturbation of one of the stored vectors.
Compared to exhaustive search, our approach finds the most similar database
vectors significantly faster without a noticeable reduction in search quality.
Interestingly, the reduction of complexity is provably better in
high-dimensional spaces. We empirically demonstrate its practical interest in a
large-scale image search scenario with off-the-shelf state-of-the-art
descriptors.
Comment: Accepted to IEEE Transactions on Big Data
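The memory-unit test can be sketched with the simplest choice of representative. The paper derives optimized representatives (e.g. pseudo-inverse based); the plain sum vector below is only a baseline standing in for them, and the data is synthetic.

```python
# Sketch: a memory unit summarizes n stored vectors by one representative m;
# a query is screened against the whole unit by a single dot product.
# The sum-vector representative is a simplifying assumption (the paper
# also derives an optimized, pinv-based representative).
import numpy as np

rng = np.random.default_rng(3)
d, n = 512, 10
unit = rng.normal(size=(n, d))
unit /= np.linalg.norm(unit, axis=1, keepdims=True)   # unit-norm database vectors

m = unit.sum(axis=0)                          # representative of the memory unit

member = unit[3] + 0.05 * rng.normal(size=d)  # simple perturbation of a stored vector
outsider = rng.normal(size=d)
outsider /= np.linalg.norm(outsider)          # query independent of the unit

print(member @ m, outsider @ m)
```

The perturbed member correlates strongly with m (score near 1) while the independent query's score concentrates around 0 with variance n/d, which is why the complexity savings improve in high dimension.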
Asymmetric Hamming Embedding
This paper proposes an asymmetric Hamming Embedding scheme for large-scale image search based on local descriptors. The comparison of two descriptors relies on a vector-to-binary code comparison, which limits the quantization error associated with the query compared with the original Hamming Embedding method. The approach is used in combination with an inverted file structure that offers high efficiency, comparable to that of a regular bag-of-features retrieval system. The comparison is performed on two popular datasets. Our method consistently improves the search quality over the symmetric version. The trade-off between memory usage and precision is evaluated, showing that the method is especially useful for short binary signatures.
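The vector-to-binary comparison can be sketched as follows. The database side is binarized, but the query keeps its real-valued projections, so the query contributes no quantization error. The soft per-bit penalty used below (weighting disagreements by |z|) is an illustrative assumption, not the paper's exact scoring.

```python
# Sketch of asymmetric Hamming comparison: binary codes on the database
# side, real-valued projections on the query side. The |z|-weighted
# mismatch penalty is an assumption standing in for the paper's scheme.
import numpy as np

rng = np.random.default_rng(4)
d, bits, n = 32, 16, 100
W = rng.normal(size=(d, bits))          # projection directions
X = rng.normal(size=(n, d))
codes = (X @ W > 0)                     # binary database codes

q = X[7] + 0.05 * rng.normal(size=d)    # query near database item 7
z = q @ W                               # real-valued query projections (not quantized)

# Symmetric: quantize the query too, count disagreeing bits.
sym = np.count_nonzero(codes != (z > 0), axis=1)
# Asymmetric: penalize a disagreeing bit by the query's confidence |z|.
asym = np.where(codes == (z > 0), 0.0, np.abs(z)).sum(axis=1)
print(int(np.argmin(asym)))
```

Bits where the query projection sits near the binarization threshold are exactly those most likely to flip under noise, and the asymmetric score discounts them, which is where the quality gain over the symmetric version comes from.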
A fuzzy asymmetric TOPSIS model for optimizing investment in online advertising campaigns
The high penetration of the Internet and e-commerce in Spain during recent years has increased companies' interest in this medium for advertising planning. In this context, Google offers a great advertising inventory and perfectly segmented content pages. This work is concerned with the optimization of online advertising investments based on pay-per-click campaigns. Our main goal is to rank and select different alternative keyword sets aimed at maximizing the awareness of and traffic to a company's website. The keyword selection problem for online advertising purposes is clearly a multiple-criteria decision-making problem, additionally characterized by the imprecise, ambiguous and uncertain nature of the available data. To address this problem, we propose a Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS)-based approach, which allows us to rank the alternative keyword sets, taking into account the fuzzy nature of the available data. TOPSIS is based on the concept that the chosen alternative should have the shortest distance from the positive ideal solution and the longest distance from the negative ideal solution. In this work, due to the characteristics of the studied problem, we propose the use of an asymmetric distance, allowing us to work with ideal solutions that differ from the maximum or the minimum. The suitability of the proposed model is illustrated with an empirical case of a stock exchange broker's advertising investment problem aimed at generating awareness about the brand and increasing the traffic to the corporate website.
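The TOPSIS ranking mechanics can be sketched in crisp (non-fuzzy) form. The paper's contribution is the fuzzy criterion values and the asymmetric distance; the sketch below keeps the classical Euclidean distance and invented toy data purely to show the closeness-coefficient computation.

```python
# Crisp TOPSIS skeleton for ranking keyword sets. Toy data and the
# Euclidean distance are simplifying assumptions; the paper replaces them
# with fuzzy values and an asymmetric distance to non-extreme ideals.
import numpy as np

# Rows: alternative keyword sets; columns: criteria
# (e.g. expected clicks, cost-per-click, ad quality) - invented numbers.
D = np.array([[120., 0.8, 6.],
              [200., 1.1, 7.],
              [ 90., 0.5, 5.]])
benefit = np.array([True, False, True])   # CPC is a cost criterion
w = np.array([0.5, 0.3, 0.2])             # criterion weights

R = D / np.linalg.norm(D, axis=0)         # vector normalization
V = w * R                                 # weighted normalized matrix
ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))   # positive ideal
anti  = np.where(benefit, V.min(axis=0), V.max(axis=0))   # negative ideal
d_pos = np.linalg.norm(V - ideal, axis=1)
d_neg = np.linalg.norm(V - anti, axis=1)
closeness = d_neg / (d_pos + d_neg)       # higher = better alternative
print(np.argsort(-closeness))             # ranking of the keyword sets
```

Replacing `V.max`/`V.min` with expert-chosen target profiles, and the Euclidean norm with an asymmetric distance, yields the variant the abstract describes, where the ideal need not be the per-criterion maximum or minimum.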
PROFICIENT PRODUCT QUANTIZATION FOR LARGE-SCALE HIGH DIMENSIONAL DATA USING APPROXIMATE NEAREST NEIGHBOR SEARCH
K-nearest neighbor (KNN) classification and regression is broadly used in data mining because of its simplicity and accuracy. When a prediction is required for an unseen data instance, the KNN algorithm searches the training dataset for the k most similar instances. Finding the value of k is application dependent, so a local value is set that maximizes the accuracy for the problem. Assigning a query to the majority class of its k neighbors is called K-nearest neighbors classification. In this paper, the instance to be classified is referred to as the query object. The global KNN approach uses the entire dataset to search for the k nearest neighbors of the query, whereas the local KNN approach uses test objects randomly selected from the training data space. In order to improve the accuracy of finding the correct k neighbors in local KNN, among the various approximate nearest neighbor (ANN) approaches proposed in recent years, those based on vector quantization stand out, achieving state-of-the-art results. Product quantization (PQ) decomposes vectors into subspaces for independent processing, allowing fast lookup-based distance estimation. This thesis work aims to reduce the complexity of additive quantization (AQ) by changing the single most expensive step in the process, that of vector encoding. Both the remarkable search performance and the high cost of AQ stem from its generality, so by imposing some novel external constraints it is possible to achieve a better trade-off: reduced complexity while retaining the accuracy advantage over other ANN methods.
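Product quantization itself can be sketched compactly: split each vector into subvectors, learn a small codebook per subspace, store only centroid indices, and estimate query-to-database distances from per-subspace lookup tables. The mini k-means and all parameters below are illustrative assumptions, not the thesis's configuration.

```python
# Minimal product quantization (PQ) sketch: per-subspace codebooks,
# index-only storage, and asymmetric distance computation (ADC) via
# lookup tables. Codebook training is a bare-bones k-means (assumption).
import numpy as np

rng = np.random.default_rng(5)
d, m, k, n = 16, 4, 8, 500          # dim, subspaces, centroids per subspace, items
X = rng.normal(size=(n, d))
subdim = d // m

def kmeans(data, k, iters=10):
    """Tiny k-means, sufficient for this sketch."""
    C = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((data[:, None] - C) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                C[j] = data[assign == j].mean(axis=0)
    return C

codebooks = [kmeans(X[:, s * subdim:(s + 1) * subdim], k) for s in range(m)]

def encode(x):
    """Store only m small centroid indices per vector."""
    return np.array([
        np.argmin(((x[s * subdim:(s + 1) * subdim] - codebooks[s]) ** 2).sum(-1))
        for s in range(m)])

codes = np.array([encode(x) for x in X])

def adc(q, codes):
    """Asymmetric distances: real query vs. coded database, via m lookup tables."""
    tables = [((q[s * subdim:(s + 1) * subdim] - codebooks[s]) ** 2).sum(-1)
              for s in range(m)]
    return np.array([sum(tables[s][c[s]] for s in range(m)) for c in codes])

q = X[42]
vals = adc(q, codes)
print(int(np.argmin(vals)))
```

Encoding here is cheap because subspaces are independent; in additive quantization the codebooks are not independent, which is exactly why its encoding step is the expensive one the thesis targets.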
Consistent visual words mining with adaptive sampling
State-of-the-art large-scale object retrieval systems usually combine efficient Bag-of-Words indexing models with a spatial verification re-ranking stage to improve query performance. In this paper we propose to directly discover spatially verified visual words as a batch process. Contrary to previous related methods based on feature-set hashing or clustering, we suggest not trading recall for efficiency by sticking to an accurate two-stage matching strategy. The problem then rather becomes a sampling issue: how to effectively and efficiently select relevant query regions while minimizing the number of tentative probes? We therefore introduce an adaptive weighted sampling scheme, starting with some prior distribution and iteratively converging to unvisited regions. Interestingly, the proposed paradigm is generalizable to any input prior distribution, including specific visual concept detectors or efficient hashing-based methods. We show in the experiments that the proposed method makes it possible to discover highly interpretable visual words while providing excellent recall and image representativeness.
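The adaptive sampling loop can be sketched abstractly. The decay rule, the neighborhood width, and the discretization into regions below are all assumptions for illustration; the paper's scheme operates on image regions with learned priors.

```python
# Sketch of adaptive weighted sampling: draw probe regions from a prior,
# then down-weight each probed region (and its neighbors) so later draws
# converge toward unvisited regions. Decay factor and neighborhood width
# are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(6)
n_regions = 50
weights = np.ones(n_regions)          # uniform prior over candidate regions
visited = []

for _ in range(20):
    p = weights / weights.sum()       # current sampling distribution
    r = int(rng.choice(n_regions, p=p))
    visited.append(r)                 # probe region r (matching would go here)
    # Down-weight the probed region and its immediate neighbors.
    lo, hi = max(0, r - 1), min(n_regions, r + 2)
    weights[lo:hi] *= 0.1

print(sorted(set(visited)))
```

Any prior can seed `weights` (e.g. scores from a concept detector or a hashing-based pre-filter), which is the generality the abstract points out; the down-weighting then steers probes away from already-covered regions regardless of the prior.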