8,909 research outputs found

    Supporting Exploratory Queries in Database Centric Web Applications

    Full text link
    Users of database-centric Web applications, especially in the e-commerce domain, often resort to exploratory ``trial-and-error'' queries since the underlying data space is huge and unfamiliar, and there are several alternatives for search attributes in this space. For example, scouting for cheap airfares typically involves posing multiple queries, varying flight times, dates, and airport locations. Exploratory queries are problematic from the perspective of both the user and the server. For the database server, it results in a drastic reduction in effective throughput since much of the processing is duplicated in each successive query. For the client, it results in a marked increase in response times, especially when accessing the service through wireless channels. In this paper, we investigate the design of automated techniques to minimize the need for repetitive exploratory queries. Specifically, we present SAUNA, a server-side query relaxation algorithm that, given the user's initial range query and a desired cardinality for the answer set, produces a relaxed query that is expected to contain the required number of answers. The algorithm incorporates a range-query-specific distance metric that is weighted to produce relaxed queries of a desired shape (e.g. aspect ratio preserving), and utilizes multi-dimensional histograms for query size estimation. A detailed performance evaluation of SAUNA over a variety of multi-dimensional data sets indicates that its relaxed queries can significantly reduce the costs associated with exploratory query processing.Comment: Version After chaning the authors, earlier by mistake only first two Authors were take

    Hashing with Mutual Information

    Full text link
    Binary vector embeddings enable fast nearest neighbor retrieval in large databases of high-dimensional objects, and play an important role in many practical applications, such as image and video retrieval. We study the problem of learning binary vector embeddings under a supervised setting, also known as hashing. We propose a novel supervised hashing method based on optimizing an information-theoretic quantity: mutual information. We show that optimizing mutual information can reduce ambiguity in the induced neighborhood structure in the learned Hamming space, which is essential in obtaining high retrieval performance. To this end, we optimize mutual information in deep neural networks with minibatch stochastic gradient descent, with a formulation that maximally and efficiently utilizes available supervision. Experiments on four image retrieval benchmarks, including ImageNet, confirm the effectiveness of our method in learning high-quality binary embeddings for nearest neighbor retrieval

    Learning to Hash for Indexing Big Data - A Survey

    Full text link
    The explosive growth in big data has attracted much attention in designing efficient indexing and search methods recently. In many critical applications such as large-scale search and pattern matching, finding the nearest neighbors to a query is a fundamental research problem. However, the straightforward solution using exhaustive comparison is infeasible due to the prohibitive computational complexity and memory requirement. In response, Approximate Nearest Neighbor (ANN) search based on hashing techniques has become popular due to its promising performance in both efficiency and accuracy. Prior randomized hashing methods, e.g., Locality-Sensitive Hashing (LSH), explore data-independent hash functions with random projections or permutations. Although having elegant theoretic guarantees on the search quality in certain metric spaces, performance of randomized hashing has been shown insufficient in many real-world applications. As a remedy, new approaches incorporating data-driven learning methods in development of advanced hash functions have emerged. Such learning to hash methods exploit information such as data distributions or class labels when optimizing the hash codes or functions. Importantly, the learned hash codes are able to preserve the proximity of neighboring data in the original feature spaces in the hash code spaces. The goal of this paper is to provide readers with systematic understanding of insights, pros and cons of the emerging techniques. We provide a comprehensive survey of the learning to hash framework and representative techniques of various types, including unsupervised, semi-supervised, and supervised. In addition, we also summarize recent hashing approaches utilizing the deep learning models. Finally, we discuss the future direction and trends of research in this area

    A Survey on Learning to Hash

    Full text link
    Nearest neighbor search is a problem of finding the data points from the database such that the distances from them to the query point are the smallest. Learning to hash is one of the major solutions to this problem and has been widely studied recently. In this paper, we present a comprehensive survey of the learning to hash algorithms, categorize them according to the manners of preserving the similarities into: pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, as well as quantization, and discuss their relations. We separate quantization from pairwise similarity preserving as the objective function is very different though quantization, as we show, can be derived from preserving the pairwise similarities. In addition, we present the evaluation protocols, and the general performance analysis, and point out that the quantization algorithms perform superiorly in terms of search accuracy, search time cost, and space cost. Finally, we introduce a few emerging topics.Comment: To appear in IEEE Transactions On Pattern Analysis and Machine Intelligence (TPAMI

    Exploring Auxiliary Context: Discrete Semantic Transfer Hashing for Scalable Image Retrieval

    Full text link
    Unsupervised hashing can desirably support scalable content-based image retrieval (SCBIR) for its appealing advantages of semantic label independence, memory and search efficiency. However, the learned hash codes are embedded with limited discriminative semantics due to the intrinsic limitation of image representation. To address the problem, in this paper, we propose a novel hashing approach, dubbed as \emph{Discrete Semantic Transfer Hashing} (DSTH). The key idea is to \emph{directly} augment the semantics of discrete image hash codes by exploring auxiliary contextual modalities. To this end, a unified hashing framework is formulated to simultaneously preserve visual similarities of images and perform semantic transfer from contextual modalities. Further, to guarantee direct semantic transfer and avoid information loss, we explicitly impose the discrete constraint, bit--uncorrelation constraint and bit-balance constraint on hash codes. A novel and effective discrete optimization method based on augmented Lagrangian multiplier is developed to iteratively solve the optimization problem. The whole learning process has linear computation complexity and desirable scalability. Experiments on three benchmark datasets demonstrate the superiority of DSTH compared with several state-of-the-art approaches

    Binary Generative Adversarial Networks for Image Retrieval

    Full text link
    The most striking successes in image retrieval using deep hashing have mostly involved discriminative models, which require labels. In this paper, we use binary generative adversarial networks (BGAN) to embed images to binary codes in an unsupervised way. By restricting the input noise variable of generative adversarial networks (GAN) to be binary and conditioned on the features of each input image, BGAN can simultaneously learn a binary representation per image, and generate an image plausibly similar to the original one. In the proposed framework, we address two main problems: 1) how to directly generate binary codes without relaxation? 2) how to equip the binary representation with the ability of accurate image retrieval? We resolve these problems by proposing new sign-activation strategy and a loss function steering the learning process, which consists of new models for adversarial loss, a content loss, and a neighborhood structure loss. Experimental results on standard datasets (CIFAR-10, NUSWIDE, and Flickr) demonstrate that our BGAN significantly outperforms existing hashing methods by up to 107\% in terms of~mAP (See Table tab.res.map.comp) Our anonymous code is available at: https://github.com/htconquer/BGAN.Comment: arXiv admin note: text overlap with arXiv:1702.00758 by other author

    Deep Discrete Supervised Hashing

    Full text link
    Hashing has been widely used for large-scale search due to its low storage cost and fast query speed. By using supervised information, supervised hashing can significantly outperform unsupervised hashing. Recently, discrete supervised hashing and deep hashing are two representative progresses in supervised hashing. On one hand, hashing is essentially a discrete optimization problem. Hence, utilizing supervised information to directly guide discrete (binary) coding procedure can avoid sub-optimal solution and improve the accuracy. On the other hand, deep hashing, which integrates deep feature learning and hash-code learning into an end-to-end architecture, can enhance the feedback between feature learning and hash-code learning. The key in discrete supervised hashing is to adopt supervised information to directly guide the discrete coding procedure in hashing. The key in deep hashing is to adopt the supervised information to directly guide the deep feature learning procedure. However, there have not existed works which can use the supervised information to directly guide both discrete coding procedure and deep feature learning procedure in the same framework. In this paper, we propose a novel deep hashing method, called deep discrete supervised hashing (DDSH), to address this problem. DDSH is the first deep hashing method which can utilize supervised information to directly guide both discrete coding procedure and deep feature learning procedure, and thus enhance the feedback between these two important procedures. Experiments on three real datasets show that DDSH can outperform other state-of-the-art baselines, including both discrete hashing and deep hashing baselines, for image retrieval

    Fusion-supervised Deep Cross-modal Hashing

    Full text link
    Deep hashing has recently received attention in cross-modal retrieval for its impressive advantages. However, existing hashing methods for cross-modal retrieval cannot fully capture the heterogeneous multi-modal correlation and exploit the semantic information. In this paper, we propose a novel \emph{Fusion-supervised Deep Cross-modal Hashing} (FDCH) approach. Firstly, FDCH learns unified binary codes through a fusion hash network with paired samples as input, which effectively enhances the modeling of the correlation of heterogeneous multi-modal data. Then, these high-quality unified hash codes further supervise the training of the modality-specific hash networks for encoding out-of-sample queries. Meanwhile, both pair-wise similarity information and classification information are embedded in the hash networks under one stream framework, which simultaneously preserves cross-modal similarity and keeps semantic consistency. Experimental results on two benchmark datasets demonstrate the state-of-the-art performance of FDCH

    Correlation Hashing Network for Efficient Cross-Modal Retrieval

    Full text link
    Hashing is widely applied to approximate nearest neighbor search for large-scale multimodal retrieval with storage and computation efficiency. Cross-modal hashing improves the quality of hash coding by exploiting semantic correlations across different modalities. Existing cross-modal hashing methods first transform data into low-dimensional feature vectors, and then generate binary codes by another separate quantization step. However, suboptimal hash codes may be generated since the quantization error is not explicitly minimized and the feature representation is not jointly optimized with the binary codes. This paper presents a Correlation Hashing Network (CHN) approach to cross-modal hashing, which jointly learns good data representation tailored to hash coding and formally controls the quantization error. The proposed CHN is a hybrid deep architecture that constitutes a convolutional neural network for learning good image representations, a multilayer perception for learning good text representations, two hashing layers for generating compact binary codes, and a structured max-margin loss that integrates all things together to enable learning similarity-preserving and high-quality hash codes. Extensive empirical study shows that CHN yields state of the art cross-modal retrieval performance on standard benchmarks.Comment: 7 page

    Deep Class-Wise Hashing: Semantics-Preserving Hashing via Class-wise Loss

    Full text link
    Deep supervised hashing has emerged as an influential solution to large-scale semantic image retrieval problems in computer vision. In the light of recent progress, convolutional neural network based hashing methods typically seek pair-wise or triplet labels to conduct the similarity preserving learning. However, complex semantic concepts of visual contents are hard to capture by similar/dissimilar labels, which limits the retrieval performance. Generally, pair-wise or triplet losses not only suffer from expensive training costs but also lack in extracting sufficient semantic information. In this regard, we propose a novel deep supervised hashing model to learn more compact class-level similarity preserving binary codes. Our deep learning based model is motivated by deep metric learning that directly takes semantic labels as supervised information in training and generates corresponding discriminant hashing code. Specifically, a novel cubic constraint loss function based on Gaussian distribution is proposed, which preserves semantic variations while penalizes the overlap part of different classes in the embedding space. To address the discrete optimization problem introduced by binary codes, a two-step optimization strategy is proposed to provide efficient training and avoid the problem of gradient vanishing. Extensive experiments on four large-scale benchmark databases show that our model can achieve the state-of-the-art retrieval performance. Moreover, when training samples are limited, our method surpasses other supervised deep hashing methods with non-negligible margins
    • …
    corecore