3,587 research outputs found

    Compressive Sequential Learning for Action Similarity Labeling

    Get PDF
    Human action recognition in videos has been extensively studied in recent years due to its wide range of applications. Instead of classifying video sequences into a number of action categories, in this paper, we focus on a particular problem of action similarity labeling (ASLAN), which aims at verifying whether a pair of videos contain the same type of action or not. To address this challenge, a novel approach called compressive sequential learning (CSL) is proposed by leveraging the compressive sensing theory and sequential learning. We first project data points to a low-dimensional space by effectively exploring an important property in compressive sensing: the restricted isometry property. In particular, a very sparse measurement matrix is adopted to reduce the dimensionality efficiently. We then learn an ensemble classifier for measuring similarities between pairwise videos by iteratively minimizing its empirical risk with the AdaBoost strategy on the training set. Unlike conventional AdaBoost, the weak learner for each iteration is not explicitly defined and its parameters are learned through greedy optimization. Furthermore, an alternative of CSL named compressive sequential encoding is developed as an encoding technique and followed by a linear classifier to address the similarity-labeling problem. Our method has been systematically evaluated on four action data sets: ASLAN, KTH, HMDB51, and Hollywood2, and the results show the effectiveness and superiority of our method for ASLAN

    Multimodal Grounding for Language Processing

    Get PDF
    This survey discusses how recent developments in multimodal processing facilitate conceptual grounding of language. We categorize the information flow in multimodal processing with respect to cognitive models of human information processing and analyze different methods for combining multimodal representations. Based on this methodological inventory, we discuss the benefit of multimodal grounding for a variety of language processing tasks and the challenges that arise. We particularly focus on multimodal grounding of verbs which play a crucial role for the compositional power of language.Comment: The paper has been published in the Proceedings of the 27 Conference of Computational Linguistics. Please refer to this version for citations: https://www.aclweb.org/anthology/papers/C/C18/C18-1197

    Pose Embeddings: A Deep Architecture for Learning to Match Human Poses

    Full text link
    We present a method for learning an embedding that places images of humans in similar poses nearby. This embedding can be used as a direct method of comparing images based on human pose, avoiding potential challenges of estimating body joint positions. Pose embedding learning is formulated under a triplet-based distance criterion. A deep architecture is used to allow learning of a representation capable of making distinctions between different poses. Experiments on human pose matching and retrieval from video data demonstrate the potential of the method

    Adaptive Nonparametric Image Parsing

    Get PDF
    In this paper, we present an adaptive nonparametric solution to the image parsing task, namely annotating each image pixel with its corresponding category label. For a given test image, first, a locality-aware retrieval set is extracted from the training data based on super-pixel matching similarities, which are augmented with feature extraction for better differentiation of local super-pixels. Then, the category of each super-pixel is initialized by the majority vote of the kk-nearest-neighbor super-pixels in the retrieval set. Instead of fixing kk as in traditional non-parametric approaches, here we propose a novel adaptive nonparametric approach which determines the sample-specific k for each test image. In particular, kk is adaptively set to be the number of the fewest nearest super-pixels which the images in the retrieval set can use to get the best category prediction. Finally, the initial super-pixel labels are further refined by contextual smoothing. Extensive experiments on challenging datasets demonstrate the superiority of the new solution over other state-of-the-art nonparametric solutions.Comment: 11 page
    • …
    corecore