
    The Power of Asymmetry in Binary Hashing

    When approximating binary similarity using the Hamming distance between short binary hashes, we show that even if the similarity is symmetric, we can obtain shorter and more accurate hashes by using two distinct code maps, i.e. by approximating the similarity between x and x′ as the Hamming distance between f(x) and g(x′), for two distinct binary codes f, g, rather than as the Hamming distance between f(x) and f(x′). Comment: Accepted to NIPS 2013, 9 pages, 5 figures.
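
A minimal sketch of the idea, using random hyperplane projections as stand-ins for the code maps (the paper learns f and g; the projections here are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, b = 32, 16                        # input dimension, code length
Wf = rng.standard_normal((b, d))     # projection defining f
Wg = rng.standard_normal((b, d))     # a distinct projection defining g

def f(x):
    return (Wf @ x > 0).astype(np.uint8)   # codes for database items

def g(x):
    return (Wg @ x > 0).astype(np.uint8)   # codes for queries

def hamming(u, v):
    return int(np.count_nonzero(u != v))

x, x_prime = rng.standard_normal(d), rng.standard_normal(d)
# Asymmetric estimate: Hamming distance between f(x) and g(x'),
# instead of the symmetric comparison of f(x) with f(x').
print(hamming(f(x), g(x_prime)))
```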

    Anti-sparse coding for approximate nearest neighbor search

    This paper proposes a binarization scheme for high-dimensional vectors based on the recent concept of anti-sparse coding, and shows its excellent performance for approximate nearest neighbor search. Unlike other binarization schemes, this framework allows, up to a scaling factor, the explicit reconstruction of the original vector from its binary representation. The paper also shows that the random projections used in Locality-Sensitive Hashing algorithms are significantly outperformed by regular frames, for both synthetic and real data, once the number of bits exceeds the vector dimensionality, i.e., when high precision is required. Comment: submitted to ICASSP'2012; RR-7771 (2011).
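
To make the reconstruction property concrete, here is an illustrative sketch of frame-expansion binarization. True anti-sparse codes minimize the l-infinity norm of z subject to Az = x, which spreads the energy evenly before the signs are taken; this sketch substitutes the least-squares expansion for brevity, so it only approximates the behaviour described in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 16, 64                        # dimension, number of bits (m > d)
A = rng.standard_normal((d, m))      # frame (the paper advocates regular frames)

x = rng.standard_normal(d)
z = np.linalg.pinv(A) @ x            # frame coefficients satisfying A z = x
                                     # (stand-in for the true l_inf-minimal codes)
bits = np.sign(z)                    # the binary representation
x_hat = A @ bits                     # reconstruction, up to a scaling factor

cos = x @ x_hat / (np.linalg.norm(x) * np.linalg.norm(x_hat))
print(f"cosine between x and its reconstruction: {cos:.3f}")
```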

    Exploiting multimedia in creating and analysing multimedia Web archives

    The data contained on the web and the social web are inherently multimedia and consist of a mixture of textual, visual and audio modalities. Community memories embodied on the web and social web contain a rich mixture of data from these modalities. In many ways, the web is the greatest resource ever created by humankind. However, due to the dynamic and distributed nature of the web, its content changes, appears and disappears on a daily basis. Web archiving provides a way of capturing snapshots of (parts of) the web for preservation and future analysis. This paper provides an overview of techniques we have developed within the context of the EU-funded ARCOMEM (ARchiving COmmunity MEMories) project to allow multimedia web content to be leveraged during the archival process and for post-archival analysis. Through a set of use cases, we explore several practical applications of multimedia analytics within the realm of web archiving, web archive analysis and multimedia data on the web in general.

    SADIH: Semantic-Aware DIscrete Hashing

    Due to its low storage cost and fast query speed, hashing has been widely adopted for similarity search in large-scale multimedia retrieval applications. Supervised hashing in particular has recently received considerable research attention, leveraging label information to preserve the pairwise similarities of data points in the Hamming space. However, two crucial bottlenecks remain: 1) the learning process for full pairwise similarity preservation is computationally unaffordable and unscalable for big data; 2) the available category information of the data is not well explored for learning discriminative hash functions. To overcome these challenges, we propose a unified Semantic-Aware DIscrete Hashing (SADIH) framework, which aims to directly embed the transformed semantic information into the asymmetric similarity approximation and discriminative hashing function learning. Specifically, a semantic-aware latent embedding is introduced to asymmetrically preserve the full pairwise similarities while skillfully handling the cumbersome n × n pairwise similarity matrix. Meanwhile, a semantic-aware autoencoder is developed to jointly preserve the data structures in the discriminative latent semantic space and perform data reconstruction. Moreover, an efficient alternating optimization algorithm is proposed to solve the resulting discrete optimization problem. Extensive experimental results on multiple large-scale datasets demonstrate that SADIH clearly outperforms state-of-the-art baselines with the additional benefit of lower computational cost. Comment: Accepted by the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19).
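
The point about the cumbersome n × n matrix can be illustrated with a standard factorization trick: when the similarity is induced by labels, S = YYᵀ never needs to be materialized. The names Y and B below are illustrative, not SADIH's actual notation or algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)
n, c, b = 1000, 10, 32               # points, classes, code bits
Y = rng.integers(0, 2, size=(n, c)).astype(float)   # label indicator matrix
B = np.sign(rng.standard_normal((n, b)))            # current binary codes

# Naive: materialize the n x n similarity matrix, O(n^2) memory.
S = Y @ Y.T
naive = B.T @ S

# Factored: identical result without ever forming S,
# O(n * (b + c)) memory and O(n * b * c) time.
factored = (B.T @ Y) @ Y.T

print(np.allclose(naive, factored))  # True
```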

    Memory vectors for similarity search in high-dimensional spaces

    We study an indexing architecture for storing and searching a database of high-dimensional vectors from the perspective of statistical signal processing and decision theory. This architecture is composed of several memory units, each of which summarizes a fraction of the database by a single representative vector. The potential similarity of the query to one of the vectors stored in a memory unit is gauged by a simple correlation with the unit's representative vector. This representative optimizes the test of the following hypothesis: the query is independent of any vector in the memory unit, versus the query is a simple perturbation of one of the stored vectors. Compared to exhaustive search, our approach finds the most similar database vectors significantly faster without a noticeable reduction in search quality. Interestingly, the reduction in complexity is provably better in high-dimensional spaces. We empirically demonstrate its practical interest in a large-scale image search scenario with off-the-shelf state-of-the-art descriptors. Comment: Accepted to IEEE Transactions on Big Data.
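
A hedged sketch of the screening step: each memory unit is summarized by one representative vector and a query is gauged with a single dot product. Two natural constructions are shown, a plain sum and a pseudo-inverse-based representative solving Xm = 1; the exact optimal representative derived in the paper may differ:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_unit = 128, 10
X = rng.standard_normal((n_unit, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm descriptors

m_sum = X.sum(axis=0)                           # sum construction
m_pinv = np.linalg.pinv(X) @ np.ones(n_unit)    # pinv construction: X @ m = 1

q = X[0] + 0.1 * rng.standard_normal(d)         # query: perturbation of a member
q /= np.linalg.norm(q)
print("sum score :", q @ m_sum)                 # large when q matches some member
print("pinv score:", q @ m_pinv)                # ~1 for members, ~0 for unrelated q
```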

    Asymmetric Hamming Embedding

    This paper proposes an asymmetric Hamming Embedding scheme for large-scale image search based on local descriptors. The comparison of two descriptors relies on a vector-to-binary-code comparison, which limits the quantization error associated with the query compared with the original Hamming Embedding method. The approach is used in combination with an inverted file structure that offers high efficiency, comparable to that of a regular bag-of-features retrieval system. The comparison is performed on two popular datasets. Our method consistently improves the search quality over the symmetric version. The trade-off between memory usage and precision is evaluated, showing that the method is especially useful for short binary signatures.
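
The generic form of such a vector-to-binary comparison can be sketched as follows: the query keeps its real-valued projections, so each disagreeing bit is penalized by its distance to the quantization threshold rather than by a flat count. This is the common construction, not necessarily the paper's exact weighting:

```python
import numpy as np

rng = np.random.default_rng(4)
d, b, n = 64, 32, 5
W = rng.standard_normal((b, d))                 # projection defining the bits

db = rng.standard_normal((n, d))
db_codes = (db @ W.T > 0)                       # stored binary signatures

q = rng.standard_normal(d)
q_proj = q @ W.T                                # query stays real-valued

def symmetric(code):
    # Baseline: plain Hamming distance after binarizing the query too.
    return np.count_nonzero((q_proj > 0) != code)

def asymmetric(code):
    # Weight each mismatching bit by the query's margin to the threshold.
    return np.abs(q_proj)[(q_proj > 0) != code].sum()

for c in db_codes:
    print(symmetric(c), round(asymmetric(c), 2))
```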

    A fuzzy asymmetric TOPSIS model for optimizing investment in online advertising campaigns

    The high penetration of the Internet and e-commerce in Spain during recent years has increased companies' interest in this medium for advertising planning. In this context, Google offers a great advertising inventory and perfectly segmented content pages. This work is concerned with the optimization of online advertising investments based on pay-per-click campaigns. Our main goal is to rank and select different alternative keyword sets aimed at maximizing the awareness of and traffic to a company's website. The keyword selection problem for online advertising purposes is clearly a multiple-criteria decision-making problem, additionally characterized by the imprecise, ambiguous and uncertain nature of the available data. To address this problem, we propose a Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS)-based approach, which allows us to rank the alternative keyword sets while taking into account the fuzzy nature of the available data. TOPSIS is based on the concept that the chosen alternative should have the shortest distance from the positive ideal solution and the longest distance from the negative ideal solution. In this work, due to the characteristics of the studied problem, we propose the use of an asymmetric distance, allowing us to work with ideal solutions that differ from the maximum or the minimum. The suitability of the proposed model is illustrated with an empirical case of a stock exchange broker's advertising investment problem aimed at generating awareness about the brand and increasing the traffic to the corporate website.
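
For reference, a sketch of the classical crisp TOPSIS loop that the proposal builds on; the paper itself uses fuzzy data and an asymmetric distance whose ideal need not be the column maximum or minimum, and the criteria, weights and numbers below are made up:

```python
import numpy as np

# Rows: alternative keyword sets; columns: criteria
# (e.g. clicks, cost per click, impressions; illustrative values).
X = np.array([[1200., 0.45,  90_000.],
              [ 950., 0.30, 120_000.],
              [1500., 0.60,  70_000.]])
w = np.array([0.5, 0.2, 0.3])            # criterion weights (assumed)
benefit = np.array([True, False, True])  # CPC is a cost criterion: lower is better

V = w * X / np.linalg.norm(X, axis=0)    # normalized, weighted decision matrix
ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))   # positive ideal
anti  = np.where(benefit, V.min(axis=0), V.max(axis=0))   # negative ideal

d_pos = np.linalg.norm(V - ideal, axis=1)
d_neg = np.linalg.norm(V - anti, axis=1)
closeness = d_neg / (d_pos + d_neg)      # 1 = best, 0 = worst
print(np.argsort(-closeness))            # ranking of the keyword sets
```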

    PROFICIENT PRODUCT QUANTIZATION FOR LARGE-SCALE HIGH DIMENSIONAL DATA USING APPROXIMATE NEAREST NEIGHBOR SEARCH

    K-nearest neighbors (KNN) classification and regression are broadly used in data mining because of their simplicity and accuracy. When a prediction is required for an unseen data instance, the KNN algorithm searches the training dataset for the k most similar instances. The choice of k is application dependent, so a value is set that maximizes the accuracy for the problem at hand. Assigning the query the majority class of its k neighbors is called k-nearest neighbors classification; in this work, the instance to be classified is referred to as the query object. The global KNN approach uses the entire dataset to search for the k nearest neighbors of the query, whereas a local KNN approach works with test objects randomly chosen from the training data space. To improve the accuracy of finding the correct k neighbors of local KNN, among the many ANN approaches proposed in recent years, those based on vector quantization stand out, achieving state-of-the-art results. Product quantization (PQ) decomposes vectors into subspaces for independent processing, enabling fast lookup-based distance estimation. This thesis aims to reduce the complexity of AQ by changing the single most expensive step in the process, that of vector encoding. Both the outstanding search performance and the high cost of AQ stem from its generality; by imposing some novel external constraints it is possible to achieve a better trade-off: reduced complexity while retaining the accuracy advantage over other ANN methods.
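
A minimal sketch of the PQ idea mentioned above: split vectors into subvectors, quantize each against a small per-subspace codebook, and estimate query-to-database distances from precomputed lookup tables. Codebooks are random here for brevity; in practice they are trained with k-means:

```python
import numpy as np

rng = np.random.default_rng(5)
d, m, k = 32, 4, 16                    # dim, subspaces, centroids per subspace
ds = d // m
codebooks = rng.standard_normal((m, k, ds))

def encode(x):
    """Assign each subvector to its nearest centroid (PQ encoding)."""
    parts = x.reshape(m, ds)
    return np.array([np.argmin(((codebooks[j] - parts[j]) ** 2).sum(1))
                     for j in range(m)])

def adc(q, codes):
    """Asymmetric distance: real-valued query vs PQ codes, via lookup tables."""
    parts = q.reshape(m, ds)
    tables = np.stack([((codebooks[j] - parts[j]) ** 2).sum(1)
                       for j in range(m)])          # (m, k) distance tables
    return tables[np.arange(m), codes].sum()        # one lookup per subspace

db = rng.standard_normal((100, d))
codes = np.array([encode(x) for x in db])           # compact (100, m) codes
q = rng.standard_normal(d)
dists = np.array([adc(q, c) for c in codes])
print(dists.argmin())                               # approximate nearest neighbor
```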

    Consistent visual words mining with adaptive sampling

    State-of-the-art large-scale object retrieval systems usually combine efficient Bag-of-Words indexing models with a spatial verification re-ranking stage to improve query performance. In this paper we propose to directly discover spatially verified visual words as a batch process. Contrary to previous related methods based on feature-set hashing or clustering, we suggest not trading recall for efficiency, sticking instead to an accurate two-stage matching strategy. The problem then becomes a sampling issue: how to effectively and efficiently select relevant query regions while minimizing the number of tentative probes? We therefore introduce an adaptive weighted sampling scheme, starting from some prior distribution and iteratively converging to unvisited regions. Interestingly, the proposed paradigm generalizes to any input prior distribution, including specific visual concept detectors or efficient hashing-based methods. Our experiments show that the proposed method discovers highly interpretable visual words while providing excellent recall and image representativeness.
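
A hedged sketch of such an adaptive weighted sampling loop: draw a region from the current weights, probe it, then down-weight the visited neighbourhood so later draws drift toward unvisited regions. The grid, the uniform prior and the Gaussian decay are all illustrative choices, not the paper's exact scheme:

```python
import numpy as np

rng = np.random.default_rng(6)
G = 32                                   # image discretized into a G x G grid
w = np.ones((G, G))                      # prior: uniform (could be a detector map)
ys, xs = np.mgrid[0:G, 0:G]

def probe(cell):
    pass                                 # placeholder for the two-stage matching

visited = set()
for _ in range(200):
    p = (w / w.sum()).ravel()
    idx = rng.choice(G * G, p=p)         # draw a region from the current weights
    cy, cx = divmod(idx, G)
    probe((cy, cx))
    visited.add((cy, cx))
    # Gaussian down-weighting around the visited cell pushes mass elsewhere.
    w *= 1.0 - np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * 2.0 ** 2))

print(len(visited), "distinct regions probed out of", G * G)
```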