40 research outputs found

    Image Classification with the Fisher Vector: Theory and Practice

    A standard approach to describing an image for classification and retrieval purposes is to extract a set of local patch descriptors, encode them into a high-dimensional vector and pool them into an image-level signature. The most common patch encoding strategy consists of quantizing the local descriptors into a finite set of prototypical elements, which leads to the popular Bag-of-Visual-Words (BOV) representation. In this work, we propose to use the Fisher Kernel framework as an alternative patch encoding strategy: we describe patches by their deviation from a "universal" generative Gaussian mixture model. This representation, which we call the Fisher Vector (FV), has many advantages: it is efficient to compute, it leads to excellent results even with efficient linear classifiers, and it can be compressed with minimal loss of accuracy using product quantization. We report experimental results on five standard datasets -- PASCAL VOC 2007, Caltech 256, SUN 397, ILSVRC 2010 and ImageNet10K -- with up to 9M images and 10K classes, showing that the FV framework is a state-of-the-art patch encoding technique.
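
    To make the encoding step concrete, here is a minimal sketch of Fisher Vector computation from a set of local descriptors: gradients of a diagonal-covariance GMM with respect to its means and standard deviations, followed by power and L2 normalization. It assumes a scikit-learn GaussianMixture as the "universal" model and is illustrative, not the authors' released implementation.

```python
# Minimal Fisher Vector sketch; assumes descriptors are rows of a (T, D) array
# and a scikit-learn diagonal-covariance GaussianMixture as the universal model.
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """Encode T local descriptors into a 2*K*D Fisher Vector (gradients w.r.t.
    the GMM means and standard deviations, power- and L2-normalized)."""
    T, _ = descriptors.shape
    q = gmm.predict_proba(descriptors)                 # soft assignments, (T, K)
    mu, w = gmm.means_, gmm.weights_                   # (K, D), (K,)
    sigma = np.sqrt(gmm.covariances_)                  # diagonal std devs, (K, D)

    diff = (descriptors[:, None, :] - mu[None]) / sigma[None]          # (T, K, D)
    g_mu = (q[:, :, None] * diff).sum(0) / (T * np.sqrt(w)[:, None])
    g_sigma = (q[:, :, None] * (diff**2 - 1)).sum(0) / (T * np.sqrt(2 * w)[:, None])

    fv = np.hstack([g_mu.ravel(), g_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))             # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)           # L2 normalization

# Usage sketch: in practice the GMM is fitted on a large sample of training patch
# descriptors (e.g. SIFT); random data stands in for them here.
gmm = GaussianMixture(n_components=64, covariance_type="diag").fit(np.random.randn(5000, 64))
fv = fisher_vector(np.random.randn(300, 64), gmm)      # one 2*64*64-dimensional signature
```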

    Bayesian models for visual information retrieval

    Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2000. Includes bibliographical references (leaves 192-208). This thesis presents a unified solution to visual recognition and learning in the context of visual information retrieval. Realizing that the design of an effective recognition architecture requires careful consideration of the interplay between feature selection, feature representation, and similarity function, we start by searching for a performance criterion that can simultaneously guide the design of all three components. A natural solution is to formulate visual recognition as a decision-theoretic problem, where the goal is to minimize the probability of retrieval error. This leads to a Bayesian architecture that is shown to generalize a significant number of previous recognition approaches, solving some of the most challenging problems faced by these: joint modeling of color and texture, objective guidelines for controlling the trade-off between feature transformation and feature representation, and unified support for local and global queries without requiring image segmentation. The new architecture is shown to perform well on color, texture, and generic image databases, providing a good trade-off between retrieval accuracy, invariance, perceptual relevance of similarity judgments, and complexity. Because all that is needed to perform optimal Bayesian decisions is the ability to evaluate beliefs on the different hypotheses under consideration, a Bayesian architecture is not restricted to visual recognition. On the contrary, it establishes a universal recognition language (the language of probabilities) that provides a computational basis for the integration of information from multiple content sources and modalities. As a result, it becomes possible to build retrieval systems that can simultaneously account for text, audio, video, or any other content modalities. Since the ability to learn follows from the ability to integrate information over time, this language is also conducive to the design of learning algorithms. We show that learning is, indeed, an important asset for visual information retrieval by designing both short- and long-term learning mechanisms. Over short time scales (within a retrieval session), learning is shown to assure faster convergence to the desired target images. Over long time scales (between retrieval sessions), it allows the retrieval system to tailor itself to the preferences of particular users. In both cases, all the necessary computations are carried out through Bayesian belief propagation algorithms that, although optimal in a decision-theoretic sense, are extremely simple, intuitive, and easy to implement. By Nuno Miguel Borges de Pinho Cruz de Vasconcelos. Ph.D.
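
    As a small illustration of the decision-theoretic formulation, the sketch below models each database entry by a class-conditional density of its local features and answers a query by ranking entries by posterior probability, which minimizes the probability of retrieval error under the model. The per-entry GMM densities and the simple prior are illustrative assumptions, not the thesis's implementation.

```python
# Retrieval as a Bayesian (MAP) decision; the per-entry GMM densities are an
# illustrative assumption, not the thesis's implementation.
import numpy as np
from sklearn.mixture import GaussianMixture

class BayesianRetriever:
    def __init__(self, n_components=8):
        self.n_components = n_components
        self.models = {}       # one class-conditional density per database entry
        self.log_prior = {}

    def index(self, database):
        """database: dict mapping entry id -> (N_i, D) array of local features."""
        n_total = sum(len(f) for f in database.values())
        for entry_id, feats in database.items():
            self.models[entry_id] = GaussianMixture(
                self.n_components, covariance_type="diag").fit(feats)
            self.log_prior[entry_id] = np.log(len(feats) / n_total)

    def query(self, feats, top_k=5):
        """MAP decision: log-likelihood of the query features plus log prior."""
        scores = {i: m.score_samples(feats).sum() + self.log_prior[i]
                  for i, m in self.models.items()}
        return sorted(scores, key=scores.get, reverse=True)[:top_k]
```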

    Learning to compress and search visual data in large-scale systems

    The problem of high-dimensional and large-scale representation of visual data is addressed from an unsupervised learning perspective. The emphasis is put on discrete representations, where the description length can be measured in bits and hence the model capacity can be controlled. The algorithmic infrastructure is developed based on the synthesis and analysis prior models, whose rate-distortion properties, as well as capacity vs. sample complexity trade-offs, are carefully optimized. These models are then extended to multiple layers, namely the RRQ and ML-STC frameworks, where the latter is further evolved into a powerful deep neural network architecture with fast and sample-efficient training and discrete representations. Three important applications are developed for these algorithms. First, the problem of large-scale similarity search in retrieval systems is addressed, where a double-stage solution is proposed, leading to faster query times and shorter database storage. Second, the problem of learned image compression is targeted, where the proposed models can capture more redundancies from the training images than conventional compression codecs. Finally, the proposed algorithms are used to solve ill-posed inverse problems; in particular, the problems of image denoising and compressive sensing are addressed with promising results. Comment: Ph.D. thesis dissertation.
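
    As a rough illustration of the multi-layer residual quantization idea underlying frameworks such as RRQ, the sketch below stacks k-means codebooks, each quantizing the residual left by the previous layer, so the description length is a fixed number of bits per layer. The regularization and the ML-STC/neural extensions of the thesis are not reproduced; layer count and codebook size are assumptions.

```python
# Multi-layer residual quantization sketch (in the spirit of RRQ, without its
# regularization); layer count and codebook size are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

def train_residual_quantizer(X, n_layers=4, n_codes=256):
    """Learn one codebook per layer; total code length is n_layers * log2(n_codes) bits."""
    codebooks, residual = [], X.copy()
    for _ in range(n_layers):
        km = KMeans(n_clusters=n_codes, n_init=4).fit(residual)
        codebooks.append(km.cluster_centers_)
        residual = residual - km.cluster_centers_[km.labels_]
    return codebooks

def encode(x, codebooks):
    """Greedy layer-by-layer encoding of one vector into discrete codeword indices."""
    codes, residual = [], x.copy()
    for C in codebooks:
        idx = int(np.argmin(np.linalg.norm(residual - C, axis=1)))
        codes.append(idx)
        residual = residual - C[idx]
    return codes

def decode(codes, codebooks):
    """Approximate reconstruction: sum of the selected codewords across layers."""
    return sum(C[i] for i, C in zip(codes, codebooks))
```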

    Efficient Vector Quantization for Fast Approximate Nearest Neighbor Search

    Increasing sizes of databases and data stores mean that traditional tasks, such as locating a nearest neighbor for a given data point, become too complex for classical solutions to handle. Exact solutions have been shown to scale poorly with the dimensionality of the data. Approximate nearest neighbor search (ANN) is a practical compromise between accuracy and performance; it is widely applicable and is a subject of much research. Among the ANN approaches suggested in recent years, those based on vector quantization stand out, achieving state-of-the-art results. Product quantization (PQ) decomposes vectors into subspaces for separate processing, allowing for fast lookup-based distance calculations. Additive quantization (AQ) drops most of the PQ constraints, currently providing the best search accuracy on image descriptor datasets, but at a higher computational cost. This thesis aims to reduce the complexity of AQ by changing the single most expensive step in the process -- vector encoding. Both the outstanding search performance and the high cost of AQ come from its generality; therefore, by imposing some novel external constraints it is possible to achieve a better compromise: reduce complexity while retaining the accuracy advantage over other ANN methods. We propose a new encoding method for AQ -- pyramid encoding. It requires significantly fewer calculations than the original "beam search" encoding, at the cost of an increased greediness of the optimization procedure. As its performance depends heavily on the initialization, the problem of choosing a starting point is also discussed. The results achieved by applying the proposed method are compared with the current state of the art on two widely used benchmark datasets -- GIST1M and SIFT1M -- both generated from real-world image data and therefore closely modeling practical applications. AQ with pyramid encoding, in addition to its computational benefits, is shown to achieve similar or better search performance than competing methods; however, its current advantages seem to be limited to data with a certain internal structure. Further analysis of this drawback provides directions for possible future work.
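
    For context on the lookup-based distance calculations that make PQ fast, the sketch below implements standard product quantization with asymmetric distance computation: the query is compared to each subspace codebook once, after which every database distance is a sum of table lookups. The thesis's pyramid encoding for AQ is not reproduced here, and the subspace and codebook sizes are illustrative.

```python
# Standard PQ with asymmetric distance computation (ADC); assumes the dimensionality
# divides evenly into the chosen number of subspaces.
import numpy as np
from sklearn.cluster import KMeans

def train_pq(X, n_subspaces=8, n_codes=256):
    """Learn one k-means codebook per subspace (block of D / n_subspaces dimensions)."""
    return [KMeans(n_clusters=n_codes, n_init=4).fit(s).cluster_centers_
            for s in np.split(X, n_subspaces, axis=1)]

def pq_encode(X, codebooks):
    """Encode each vector as one codeword index per subspace (1 byte each for 256 codes)."""
    sub = np.split(X, len(codebooks), axis=1)
    return np.stack([np.argmin(((s[:, None, :] - C[None])**2).sum(-1), axis=1)
                     for s, C in zip(sub, codebooks)], axis=1).astype(np.uint8)

def pq_search(query, codes, codebooks, top_k=10):
    """ADC: precompute query-to-codeword distance tables, then sum lookups per vector."""
    q_sub = np.split(query, len(codebooks))
    tables = np.stack([((C - q)**2).sum(1) for q, C in zip(q_sub, codebooks)])  # (m, n_codes)
    dists = tables[np.arange(len(codebooks))[None, :], codes].sum(axis=1)
    return np.argsort(dists)[:top_k]
```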

    p-probabilistic k-anonymous microaggregation for the anonymization of surveys with uncertain participation

    We develop a probabilistic variant of k-anonymous microaggregation, which we term p-probabilistic, resorting to a statistical model of respondent participation in order to aggregate quasi-identifiers in such a manner that k-anonymity is concordantly enforced with a parametric probabilistic guarantee. Succinctly, owing to the possibility that some respondents may ultimately not participate, sufficiently larger cells are created, striving to satisfy k-anonymity with probability at least p. The microaggregation function is designed before the respondents submit their confidential data. More precisely, a specification of the function is sent to them, which they may verify and apply to their quasi-identifying demographic variables prior to submitting the microaggregated data, along with the confidential attributes, to an authorized repository. We propose a number of metrics to assess the performance of our probabilistic approach in terms of anonymity and distortion, which we proceed to investigate theoretically in depth and empirically with synthetic and standardized data. We stress that, in addition to constituting a functional extension of traditional microaggregation, thereby broadening its applicability to the anonymization of statistical databases in a wide variety of contexts, the relaxation of trust assumptions is arguably expected to have a considerable impact on user acceptance and ultimately on data utility through mere availability. Peer reviewed. Postprint (author's final draft).
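
    The cell-sizing idea admits a simple sketch: if each respondent assigned to a cell is assumed to participate independently with some probability, the cell must be made large enough that at least k participants remain with probability at least p. The Bernoulli participation model and the search below are illustrative assumptions, not the paper's exact design.

```python
# Cell sizing under an assumed Bernoulli participation model: find the smallest cell
# size n such that at least k respondents participate with probability >= p.
from scipy.stats import binom

def min_cell_size(k, p, part_prob):
    """Smallest n with P(Binomial(n, part_prob) >= k) >= p."""
    n = k
    while binom.sf(k - 1, n, part_prob) < p:   # sf(k-1, n, q) = P(X >= k)
        n += 1
    return n

# Example: enforcing 5-anonymity with probability 0.95 when only 80% of respondents
# are expected to actually submit their data.
print(min_cell_size(k=5, p=0.95, part_prob=0.8))
```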

    Learning compact hashing codes for large-scale similarity search

    Retrieval of similar objects is a key component in many applications. As databases grow larger, learning compact representations for efficient storage and fast search becomes increasingly important. Moreover, these representations should preserve similarity, i.e., similar objects should have similar representations. Hashing algorithms, which encode objects into compact binary codes that preserve similarity, have demonstrated promising results in addressing these challenges. This dissertation studies the problem of learning compact hashing codes for large-scale similarity search. Specifically, we investigate two classes of approaches: regularized Adaboost and signal-to-noise ratio (SNR) maximization. Regularized Adaboost builds on the classical boosting framework for hashing, while SNR maximization is a novel hashing framework with a theoretical guarantee and great flexibility in designing hashing algorithms for various scenarios. The regularized Adaboost algorithm learns and extracts binary hash codes (fingerprints) of time-varying content by filtering and quantizing perceptually significant features. The proposed algorithm extends the recent symmetric pairwise boosting (SPB) algorithm by taking feature sequence correlation into account. An information-theoretic analysis of the SPB algorithm is given, showing that each iteration of SPB maximizes a lower bound on the mutual information between matching fingerprint pairs. Based on this analysis, two practical regularizers are proposed to penalize filters that generate highly correlated filter responses, and a learning-theoretic analysis of the regularized Adaboost algorithm is given. The proposed algorithm demonstrates significant performance gains over SPB for both audio and video content identification (ID) systems. SNR maximization hashing (SNR-MH) uses the SNR metric to select a set of uncorrelated projection directions, and one hash bit is extracted from each projection direction. We first motivate this approach under a Gaussian model for the underlying signals, in which case maximizing SNR is equivalent to minimizing the hashing error probability. This theoretical guarantee differentiates SNR-MH from other hashing algorithms, where learning has to be carried out with a continuous relaxation of quantization functions. A globally optimal solution can be obtained by solving a generalized eigenvalue problem. Experiments on both synthetic and real datasets demonstrate the power of SNR-MH to learn compact codes. We extend SNR-MH to two different scenarios in large-scale similarity search. The first extension targets applications with a larger bit budget: we propose a multi-bit-per-projection algorithm, SNR multi-bit hashing (SNR-MBH), to learn longer hash codes when the number of high-SNR projections is limited. Extensive experiments demonstrate the superior performance of SNR-MBH. The second extension targets a multi-feature setting, where more than one feature vector is available for each object. We propose two multi-feature hashing methods, SNR joint hashing (SNR-JH) and SNR selection hashing (SNR-SH). SNR-JH jointly considers all feature correlations and learns uncorrelated hash functions that maximize SNR, while SNR-SH separately learns hash functions on each individual feature and selects the final hash functions based on the SNR associated with each one. The proposed methods perform favorably compared to other state-of-the-art multi-feature hashing algorithms on several benchmark datasets.
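
    To illustrate the generalized-eigenvalue view of SNR maximization hashing, the sketch below selects projection directions that maximize the ratio of signal to noise covariance and extracts one sign bit per direction. How the two covariance matrices are estimated from matching and non-matching feature pairs is an assumption here, not taken from the dissertation.

```python
# SNR-maximizing hash projections via a generalized eigenvalue problem; the covariance
# estimation step is an assumption, not taken from the dissertation.
import numpy as np
from scipy.linalg import eigh

def snr_hash_projections(cov_signal, cov_noise, n_bits):
    """Solve cov_signal w = lambda * cov_noise w and keep the n_bits highest-SNR directions."""
    eigvals, eigvecs = eigh(cov_signal, cov_noise)     # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :n_bits]                # top-SNR projections, one bit each

def hash_codes(X, projections):
    """One bit per projection: the sign of the projected, zero-mean data."""
    return (X @ projections > 0).astype(np.uint8)

# Usage sketch: estimate cov_signal from the clean feature covariance and cov_noise from
# the covariance of differences between matching (distorted) pairs, then hash the database.
```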

    Watermarking techniques using knowledge of host database

    Ph.D. (Doctor of Philosophy).