3,396 research outputs found

    Temporal Cross-Media Retrieval with Soft-Smoothing

    Full text link
    Multimedia information have strong temporal correlations that shape the way modalities co-occur over time. In this paper we study the dynamic nature of multimedia and social-media information, where the temporal dimension emerges as a strong source of evidence for learning the temporal correlations across visual and textual modalities. So far, cross-media retrieval models, explored the correlations between different modalities (e.g. text and image) to learn a common subspace, in which semantically similar instances lie in the same neighbourhood. Building on such knowledge, we propose a novel temporal cross-media neural architecture, that departs from standard cross-media methods, by explicitly accounting for the temporal dimension through temporal subspace learning. The model is softly-constrained with temporal and inter-modality constraints that guide the new subspace learning task by favouring temporal correlations between semantically similar and temporally close instances. Experiments on three distinct datasets show that accounting for time turns out to be important for cross-media retrieval. Namely, the proposed method outperforms a set of baselines on the task of temporal cross-media retrieval, demonstrating its effectiveness for performing temporal subspace learning.Comment: To appear in ACM MM 201

    Comparative Evaluation of Action Recognition Methods via Riemannian Manifolds, Fisher Vectors and GMMs: Ideal and Challenging Conditions

    Full text link
    We present a comparative evaluation of various techniques for action recognition while keeping as many variables as possible controlled. We employ two categories of Riemannian manifolds: symmetric positive definite matrices and linear subspaces. For both categories we use their corresponding nearest neighbour classifiers, kernels, and recent kernelised sparse representations. We compare against traditional action recognition techniques based on Gaussian mixture models and Fisher vectors (FVs). We evaluate these action recognition techniques under ideal conditions, as well as their sensitivity in more challenging conditions (variations in scale and translation). Despite recent advancements for handling manifolds, manifold based techniques obtain the lowest performance and their kernel representations are more unstable in the presence of challenging conditions. The FV approach obtains the highest accuracy under ideal conditions. Moreover, FV best deals with moderate scale and translation changes

    Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective

    Get PDF
    This paper takes a problem-oriented perspective and presents a comprehensive review of transfer learning methods, both shallow and deep, for cross-dataset visual recognition. Specifically, it categorises the cross-dataset recognition into seventeen problems based on a set of carefully chosen data and label attributes. Such a problem-oriented taxonomy has allowed us to examine how different transfer learning approaches tackle each problem and how well each problem has been researched to date. The comprehensive problem-oriented review of the advances in transfer learning with respect to the problem has not only revealed the challenges in transfer learning for visual recognition, but also the problems (e.g. eight of the seventeen problems) that have been scarcely studied. This survey not only presents an up-to-date technical review for researchers, but also a systematic approach and a reference for a machine learning practitioner to categorise a real problem and to look up for a possible solution accordingly
    • …
    corecore