515 research outputs found

    Connectionist Temporal Modeling for Weakly Supervised Action Labeling

    Full text link
    We propose a weakly-supervised framework for action labeling in video, where only the order of occurring actions is required during training time. The key challenge is that the per-frame alignments between the input (video) and label (action) sequences are unknown during training. We address this by introducing the Extended Connectionist Temporal Classification (ECTC) framework to efficiently evaluate all possible alignments via dynamic programming and explicitly enforce their consistency with frame-to-frame visual similarities. This protects the model from distractions of visually inconsistent or degenerate alignments without the need for temporal supervision. We further extend our framework to the semi-supervised case, where a few frames are sparsely annotated in a video. With less than 1% of labeled frames per video, our method is able to outperform existing semi-supervised approaches and achieve comparable performance to that of fully supervised approaches. Comment: To appear in ECCV 201
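The dynamic-programming evaluation of all possible alignments can be sketched with a CTC-style forward pass. This is a minimal illustration, not the authors' ECTC implementation: it omits blank symbols and the visual-similarity weighting that ECTC adds, and all names are ours.

```python
import numpy as np

def ctc_forward(log_probs, labels):
    """Sum the probability of every monotonic alignment of an ordered
    action sequence to T frames (simplified CTC forward, no blanks).

    log_probs : (T, C) per-frame log-probabilities over C action classes.
    labels    : ordered action sequence, length L <= T.
    Returns the total log-probability over all valid alignments.
    """
    T, L = log_probs.shape[0], len(labels)
    # alpha[t, l] = log-prob of having emitted labels[:l+1] over frames[:t+1]
    alpha = np.full((T, L), -np.inf)
    alpha[0, 0] = log_probs[0, labels[0]]
    for t in range(1, T):
        for l in range(L):
            stay = alpha[t - 1, l]                             # repeat current action
            move = alpha[t - 1, l - 1] if l > 0 else -np.inf   # advance to next action
            alpha[t, l] = np.logaddexp(stay, move) + log_probs[t, labels[l]]
    return alpha[T - 1, L - 1]
```

With uniform per-frame probabilities, the result reduces to the count of valid alignments times the probability of any single one, which gives a quick sanity check of the recursion.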

    Leveraging triplet loss for unsupervised action segmentation

    Get PDF
    © 2023 IEEE. In this paper, we propose a novel fully unsupervised framework that learns action representations suitable for the action segmentation task from the single input video itself, without requiring any training data. Our method is a deep metric learning approach rooted in a shallow network with a triplet loss operating on similarity distributions and a novel triplet selection strategy that effectively models temporal and semantic priors to discover actions in the new representational space. Under these circumstances, we successfully recover temporal boundaries in the learned action representations with higher quality compared with existing unsupervised approaches. The proposed method is evaluated on two widely used benchmark datasets for the action segmentation task and it achieves competitive performance by applying a generic clustering algorithm on the learned representations. This work was supported by the project PID2019-110977GA-I00 funded by MCIN/AEI/10.13039/501100011033 and by "ESF Investing in your future". Peer Reviewed. Postprint (author's final draft).
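The two ingredients named in the abstract, a triplet margin loss and a temporally informed triplet selection strategy, can be illustrated as follows. This is a hedged sketch: the loss below operates on raw embedding distances rather than the similarity distributions the paper uses, and `temporal_triplets` is a simplified stand-in for the paper's selection strategy, with parameters (`near`, `far`) invented for illustration.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss on L2 distances: pull the positive
    closer to the anchor than the negative by at least `margin`."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)

def temporal_triplets(n_frames, near=5, far=50, rng=None):
    """Temporal-prior triplet selection: positives are frames close to
    the anchor in time, negatives are frames far away in time."""
    rng = rng or np.random.default_rng(0)
    triplets = []
    for a in range(n_frames):
        p = min(n_frames - 1, a + int(rng.integers(1, near + 1)))  # nearby frame
        n = (a + far) % n_frames                                   # distant frame
        triplets.append((a, p, n))
    return triplets
```

The temporal prior encodes the assumption that neighbouring frames usually belong to the same action, so they should map to nearby points in the learned space.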

    Leveraging triplet loss for unsupervised action segmentation

    Full text link
    In this paper, we propose a novel fully unsupervised framework that learns action representations suitable for the action segmentation task from the single input video itself, without requiring any training data. Our method is a deep metric learning approach rooted in a shallow network with a triplet loss operating on similarity distributions and a novel triplet selection strategy that effectively models temporal and semantic priors to discover actions in the new representational space. Under these circumstances, we successfully recover temporal boundaries in the learned action representations with higher quality compared with existing unsupervised approaches. The proposed method is evaluated on two widely used benchmark datasets for the action segmentation task and it achieves competitive performance by applying a generic clustering algorithm on the learned representations. Comment: Accepted to the Workshop on Learning with Limited Labelled Data in conjunction with CVPR 202

    Video trajectory analysis using unsupervised clustering and multi-criteria ranking

    Get PDF
    The use of surveillance cameras for visual monitoring has increased significantly. Manual analysis of the large volumes of video they record is not feasible at scale. In various applications, deep learning-guided supervised systems are used to track and identify unusual patterns. However, such systems depend on supervised learning, which may not always be possible. Unsupervised methods rely on suitable features and demand cluster analysis by experts. In this paper, we propose an unsupervised trajectory clustering method referred to as t-Cluster. Our proposed method prepares indexes of object trajectories by fusing high-level interpretable features such as origin, destination, path, and deviation. Next, the clusters are fused using multi-criteria decision making and trajectories are ranked accordingly. The method is able to place abnormal patterns at the top of the list. We have evaluated our algorithm and compared it against established baseline trajectory clustering methods applied to videos taken from publicly available benchmark datasets. We have obtained higher clustering accuracies on public datasets with significantly lower computational overhead.
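The interpretable trajectory features the abstract lists (origin, destination, path, deviation) and the ranking that surfaces abnormal patterns can be sketched as follows. This is an illustrative simplification, not the t-Cluster method: it measures deviation as the gap between path length and the straight origin-to-destination chord, and ranks by that single criterion rather than fusing multiple criteria.

```python
import numpy as np

def trajectory_features(traj):
    """Interpretable features for one trajectory of shape (T, 2):
    origin, destination, path length, and deviation from the straight
    origin-to-destination line (zero for a perfectly straight path)."""
    origin, dest = traj[0], traj[-1]
    steps = np.diff(traj, axis=0)
    path_len = np.linalg.norm(steps, axis=1).sum()
    chord = np.linalg.norm(dest - origin)
    deviation = path_len - chord
    return np.concatenate([origin, dest, [path_len, deviation]])

def rank_by_abnormality(trajs):
    """Rank trajectories most-deviant first, echoing the idea of
    placing abnormal patterns at the top of the list."""
    feats = np.array([trajectory_features(t) for t in trajs])
    return np.argsort(-feats[:, -1])  # sort by deviation, descending
```

A trajectory that wanders between its endpoints accumulates extra path length over the chord, so it rises to the top of the ranking, while direct trajectories sink to the bottom.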

    Trip purpose identification using pairwise constraints based semi-supervised clustering

    Get PDF
    Clustering of smart card data captured by automated fare collection (AFC) systems has traditionally been viewed as an unsupervised method. However, a small number of labelled data points in addition to the unlabelled smart card data can facilitate better partitioning and classification. In this paper, prior knowledge about the activities is translated into pairwise constraints and used in the COP-KMeans clustering algorithm to identify the trip purpose. The effectiveness of the method was evaluated by comparing the results with ground truth. The results demonstrate that semi-supervised clustering enhances the accuracy of trip purpose identification.
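The core of COP-KMeans is its feasibility test during cluster assignment: a point may only join a cluster if doing so breaks no must-link or cannot-link constraint. A minimal sketch of that check (variable names are ours; the surrounding k-means loop is omitted):

```python
def violates_constraints(point, cluster, assignment, must_link, cannot_link):
    """COP-KMeans feasibility test. `assignment` maps already-assigned
    point ids to cluster ids. Returns True if putting `point` in
    `cluster` would break a pairwise constraint."""
    for a, b in must_link:
        other = b if a == point else a if b == point else None
        # A must-link partner already placed in a different cluster.
        if other is not None and other in assignment and assignment[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == point else a if b == point else None
        # A cannot-link partner already placed in this same cluster.
        if other is not None and assignment.get(other) == cluster:
            return True
    return False
```

In the trip-purpose setting, a must-link might pair two taps known to belong to the same activity (e.g. home-based work trips from the same labelled traveller), while a cannot-link separates taps known to serve different purposes.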

    First-person activity recognition: how to generalize to unseen users?

    Get PDF
    In collaboration with the Universitat de Barcelona (UB) and the Universitat Rovira i Virgili (URV). Recent advances in wearable technology, accompanied by the decreasing cost of data storage and the increasing availability of data, have made it possible to take pictures everywhere at any time. Wearable cameras are nowadays among the most popular wearable devices. Besides leisure, wearable cameras are attracting a lot of attention for the improvement of working conditions, productivity, and safety monitoring. Since the collected data can potentially be used for memory training and for extracting lifestyle patterns useful to prevent noncommunicable diseases such as obesity, they are being investigated in the context of Preventive Medicine. Most of these applications require automatically recognizing the activity performed by the user. This work aims to take a step towards activity recognition from photo-streams captured by a wearable camera by developing a method that allows labelling new images with minimal effort from the user and generalizes well to unseen users.