8 research outputs found

    Efficient and effective human action recognition in video through motion boundary description with a compact set of trajectories

    Human action recognition (HAR) is at the core of human-computer interaction and video scene understanding. However, achieving effective HAR in an unconstrained environment is still a challenging task. To that end, trajectory-based video representations are currently widely used. Despite the promising levels of effectiveness achieved by these approaches, problems regarding computational complexity and the presence of redundant trajectories still need to be addressed in a satisfactory way. In this paper, we propose a method for trajectory rejection that reduces the number of redundant trajectories without degrading the effectiveness of HAR. Furthermore, to realize efficient optical flow estimation prior to trajectory extraction, we integrate a method for dynamic frame skipping. Experiments with four publicly available human action datasets show that the proposed approach outperforms state-of-the-art HAR approaches in terms of effectiveness, while simultaneously reducing computational complexity.
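
    A minimal sketch of the two efficiency ideas described above: rejecting near-static trajectories and skipping frames when inter-frame motion is small. The function names, threshold values, and rejection criterion are illustrative assumptions, not the paper's exact rules.

        # Illustrative sketch only; thresholds and criteria are assumed, not the paper's.
        import numpy as np

        def reject_redundant_trajectories(trajectories, motion_thresh=1.0):
            """Keep trajectories whose accumulated displacement exceeds a
            threshold; near-static tracks carry little information about actions."""
            kept = []
            for traj in trajectories:            # traj: (T, 2) array of (x, y) points
                steps = np.diff(traj, axis=0)    # frame-to-frame displacements
                if np.linalg.norm(steps, axis=1).sum() > motion_thresh:
                    kept.append(traj)
            return kept

        def next_frame_step(flow, low_motion=0.1, max_skip=4):
            """Dynamic frame skipping: when the mean optical-flow magnitude is
            small, jump ahead several frames before re-estimating the flow."""
            mean_mag = np.linalg.norm(flow, axis=-1).mean()  # flow: (H, W, 2)
            return max_skip if mean_mag < low_motion else 1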

    Attribute-Graph: A Graph based approach to Image Ranking

    We propose a novel image representation, termed Attribute-Graph, to rank images by their semantic similarity to a given query image. An Attribute-Graph is an undirected fully connected graph, incorporating both local and global image characteristics. The graph nodes characterise objects as well as the overall scene context using mid-level semantic attributes, while the edges capture the object topology. We demonstrate the effectiveness of Attribute-Graphs by applying them to the problem of image ranking. We benchmark the performance of our algorithm on the 'rPascal' and 'rImageNet' datasets, which we have created in order to evaluate the ranking performance on complex queries containing multiple objects. Our experimental evaluation shows that modelling images as Attribute-Graphs results in improved ranking performance over existing techniques. (Comment: In IEEE International Conference on Computer Vision (ICCV) 2015.)
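
    As a toy illustration of the construction described above, the sketch below builds a fully connected graph whose nodes carry attribute vectors for each object plus a global scene node, with edges recording relative object positions. The dictionary keys and the centroid-offset edge feature are assumptions made for the example.

        # Toy Attribute-Graph construction; attribute keys and edge features are assumed.
        import itertools
        import networkx as nx

        def build_attribute_graph(objects, scene_attributes):
            """objects: list of dicts with 'attributes' (semantic attribute vector)
            and 'centroid' (x, y) entries. Returns an undirected graph."""
            g = nx.Graph()
            g.add_node("scene", attributes=scene_attributes)      # global context node
            for i, obj in enumerate(objects):
                g.add_node(i, attributes=obj["attributes"], centroid=obj["centroid"])
            # Fully connect object nodes; edges capture the object topology.
            for i, j in itertools.combinations(range(len(objects)), 2):
                dx = objects[j]["centroid"][0] - objects[i]["centroid"][0]
                dy = objects[j]["centroid"][1] - objects[i]["centroid"][1]
                g.add_edge(i, j, offset=(dx, dy))
            for i in range(len(objects)):
                g.add_edge("scene", i)                            # context-to-object edges
            return g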

    Context-dependent random walk graph kernels and tree pattern graph matching kernels with applications to action recognition

    Graphs are effective tools for modeling complex data. Setting out from two basic substructures, random walks and trees, we propose a new family of context-dependent random walk graph kernels and a new family of tree-pattern graph matching kernels. In our context-dependent graph kernels, context information is incorporated into primary random walk groups. A multiple kernel learning algorithm with a proposed ℓ1,2-norm regularization is applied to combine context-dependent graph kernels of different orders, improving the similarity measurement between graphs. In our tree-pattern graph matching kernel, a quadratic optimization with a sparse constraint is proposed to select the correctly matched tree-pattern groups, augmenting the discriminative power of the tree-pattern graph matching. We apply the proposed kernels to human action recognition, where each action is represented by two graphs that record the spatiotemporal relations between local feature vectors. Experimental comparisons with state-of-the-art algorithms on several benchmark datasets demonstrate the effectiveness of the proposed kernels for recognizing human actions. Our kernel based on tree-pattern groups, which have more complex structures and exploit more local graph topologies than random walks, yields more accurate results but requires more runtime than the context-dependent walk graph kernel.
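
    The context-dependent kernels extend the classical random walk graph kernel, which counts walks shared by two graphs via their direct product. A bare-bones version of that base kernel, without the paper's context grouping or the multiple kernel learning stage, looks roughly like this.

        # Bare-bones p-step random walk kernel on unlabeled graphs; the paper's
        # context grouping and MKL combination are deliberately omitted.
        import numpy as np

        def random_walk_kernel(a1, a2, p=3, lam=0.5):
            """a1, a2: adjacency matrices. Counts common walks of length <= p
            in the direct product graph, geometrically down-weighted by lam."""
            ax = np.kron(a1, a2)                  # adjacency of the direct product graph
            ones = np.ones(ax.shape[0])
            value, walks = 0.0, ones.copy()
            for step in range(p + 1):
                value += (lam ** step) * (ones @ walks)  # common walks of this length
                walks = ax @ walks
            return value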

    Content-Based Visual Landmark Search via Multimodal Hypergraph Learning

    Formerly IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics.

    A Multigraph Representation for Improved Unsupervised/Semi-supervised Learning of Human Actions

    Graph-based methods are a useful class of techniques for improving the performance of unsupervised and semi-supervised machine learning tasks, such as clustering or information retrieval. However, the performance of existing graph-based methods is highly dependent on how well the affinity graph reflects the original data structure. We propose that multimedia items such as images or videos consist of multiple separate components, and that more than one graph is therefore required to fully capture the relationships between them. Accordingly, we present a new spectral method, the Feature Grouped Spectral Multigraph (FGSM), which comprises the following steps. First, mutually independent subsets of the original feature space are generated through feature clustering. Secondly, a separate graph is generated from each feature subset. Finally, a spectral embedding is calculated on each graph, and the embeddings are scaled and aggregated into a single representation. Using this representation, a variety of experiments are performed on three learning tasks (clustering, retrieval and recognition) on human action datasets, demonstrating considerably better performance than the state-of-the-art.
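
    A compact sketch of the three FGSM steps: group the features, build one affinity graph per group, embed each graph spectrally, and aggregate. Grouping features via k-means on the transposed data matrix and aggregating by plain concatenation are assumptions; the paper's exact grouping and scaling may differ.

        # Sketch of the FGSM pipeline; feature grouping and aggregation details are assumed.
        import numpy as np
        from scipy.sparse.csgraph import laplacian
        from sklearn.cluster import KMeans
        from sklearn.neighbors import kneighbors_graph

        def fgsm_embedding(x, n_groups=3, n_neighbors=10, dim=5):
            """x: (n_samples, n_features) data matrix.
            Returns an (n_samples, n_groups * dim) multigraph embedding."""
            groups = KMeans(n_clusters=n_groups, n_init=10).fit_predict(x.T)
            parts = []
            for g in range(n_groups):
                xg = x[:, groups == g]                       # one feature subset
                w = kneighbors_graph(xg, n_neighbors, mode="connectivity")
                w = 0.5 * (w + w.T)                          # symmetrize the affinity
                lap = laplacian(w, normed=True).toarray()
                vals, vecs = np.linalg.eigh(lap)             # ascending eigenvalues
                parts.append(vecs[:, 1:dim + 1])             # drop the trivial eigenvector
            return np.hstack(parts)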

    Online, Supervised and Unsupervised Action Localization in Videos

    Action recognition classifies a given video among a set of action labels, whereas action localization determines the location of an action in addition to its class. The overall aim of this dissertation is action localization. Many existing action localization approaches exhaustively search (spatially and temporally) for an action in a video. However, as the search space grows with high-resolution and longer-duration videos, such sliding-window techniques become impractical. The first part of this dissertation presents an efficient approach for localizing actions by learning contextual relations between different video regions in training. In testing, we use the context information to estimate the probability of each supervoxel belonging to the foreground action and use a Conditional Random Field (CRF) to localize actions. In the above method, as in typical approaches to this problem, localization is performed in an offline manner where all the video frames are processed together. This prevents timely localization and prediction of actions/interactions - an important consideration for many tasks including surveillance and human-machine interaction. Therefore, in the second part of this dissertation we propose an online approach to the challenging problem of localization and prediction of actions/interactions in videos. In this approach, we use human poses and superpixels in each frame to train discriminative appearance models and perform online prediction of actions/interactions with a Structural SVM. The above two approaches rely on human supervision in the form of assigning action class labels to videos and annotating actor bounding boxes in each frame of training videos. Therefore, in the third part of this dissertation we address the problem of unsupervised action localization. Given unlabeled videos without annotations, this approach aims at: 1) discovering action classes using a discriminative clustering approach, and 2) localizing actions using a variant of the Knapsack problem.
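
    The unsupervised stage frames localization as a Knapsack-style selection. As a toy analogue, the sketch below uses the classic 0/1 knapsack to pick video segments that maximize a total "actionness" score under a frame budget; the scoring and budget semantics are assumptions, and only the selection mechanics carry over from the dissertation's variant.

        # Toy 0/1 knapsack over video segments; the dissertation's variant of the
        # Knapsack problem imposes different constraints.
        def select_segments(scores, lengths, budget):
            """scores[i]: actionness of segment i; lengths[i]: its frame count.
            Returns the indices of the selected segments."""
            n = len(scores)
            best = [[0.0] * (budget + 1) for _ in range(n + 1)]
            for i in range(1, n + 1):
                for b in range(budget + 1):
                    best[i][b] = best[i - 1][b]              # option: skip segment i-1
                    if lengths[i - 1] <= b:
                        take = best[i - 1][b - lengths[i - 1]] + scores[i - 1]
                        best[i][b] = max(best[i][b], take)   # option: keep segment i-1
            chosen, b = [], budget                           # backtrack the choices
            for i in range(n, 0, -1):
                if best[i][b] != best[i - 1][b]:
                    chosen.append(i - 1)
                    b -= lengths[i - 1]
            return chosen[::-1]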