2,342 research outputs found

    Compositional Model based Fisher Vector Coding for Image Classification

    Full text link
    Deriving from the gradient vector of a generative model of local features, Fisher vector coding (FVC) has been identified as an effective coding method for image classification. Most, if not all, FVC implementations employ the Gaussian mixture model (GMM) to depict the generation process of local features. However, the representative power of the GMM could be limited because it essentially assumes that local features can be characterized by a fixed number of feature prototypes and the number of prototypes is usually small in FVC. To handle this limitation, in this paper we break the convention which assumes that a local feature is drawn from one of few Gaussian distributions. Instead, we adopt a compositional mechanism which assumes that a local feature is drawn from a Gaussian distribution whose mean vector is composed as the linear combination of multiple key components and the combination weight is a latent random variable. In this way, we can greatly enhance the representative power of the generative model of FVC. To implement our idea, we designed two particular generative models with such a compositional mechanism.Comment: Fixed typos. 16 pages. Appearing in IEEE T. Pattern Analysis and Machine Intelligence (TPAMI

    Attribute-Graph: A Graph based approach to Image Ranking

    Full text link
    We propose a novel image representation, termed Attribute-Graph, to rank images by their semantic similarity to a given query image. An Attribute-Graph is an undirected fully connected graph, incorporating both local and global image characteristics. The graph nodes characterise objects as well as the overall scene context using mid-level semantic attributes, while the edges capture the object topology. We demonstrate the effectiveness of Attribute-Graphs by applying them to the problem of image ranking. We benchmark the performance of our algorithm on the 'rPascal' and 'rImageNet' datasets, which we have created in order to evaluate the ranking performance on complex queries containing multiple objects. Our experimental evaluation shows that modelling images as Attribute-Graphs results in improved ranking performance over existing techniques.Comment: In IEEE International Conference on Computer Vision (ICCV) 201

    Temporal Extension of Scale Pyramid and Spatial Pyramid Matching for Action Recognition

    Full text link
    Historically, researchers in the field have spent a great deal of effort to create image representations that have scale invariance and retain spatial location information. This paper proposes to encode equivalent temporal characteristics in video representations for action recognition. To achieve temporal scale invariance, we develop a method called temporal scale pyramid (TSP). To encode temporal information, we present and compare two methods called temporal extension descriptor (TED) and temporal division pyramid (TDP) . Our purpose is to suggest solutions for matching complex actions that have large variation in velocity and appearance, which is missing from most current action representations. The experimental results on four benchmark datasets, UCF50, HMDB51, Hollywood2 and Olympic Sports, support our approach and significantly outperform state-of-the-art methods. Most noticeably, we achieve 65.0% mean accuracy and 68.2% mean average precision on the challenging HMDB51 and Hollywood2 datasets which constitutes an absolute improvement over the state-of-the-art by 7.8% and 3.9%, respectively
    corecore