Connectionist Temporal Modeling for Weakly Supervised Action Labeling
We propose a weakly-supervised framework for action labeling in video, where
only the order of occurring actions is required during training time. The key
challenge is that the per-frame alignments between the input (video) and label
(action) sequences are unknown during training. We address this by introducing
the Extended Connectionist Temporal Classification (ECTC) framework to
efficiently evaluate all possible alignments via dynamic programming and
explicitly enforce their consistency with frame-to-frame visual similarities.
This protects the model from being distracted by visually inconsistent or
degenerate alignments, without the need for temporal supervision. We further
extend our framework to the semi-supervised case when a few frames are sparsely
annotated in a video. With less than 1% of labeled frames per video, our method
is able to outperform existing semi-supervised approaches and achieve
comparable performance to that of fully supervised approaches.
Comment: To appear in ECCV 201
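The dynamic programme at the heart of a CTC-style objective can be pictured with a minimal sketch. This is not the authors' ECTC (which additionally weights alignments by frame-to-frame visual similarity); it is a plain forward pass that sums the probability of every monotone alignment of frames to the ordered action transcript. The function name and input layout are illustrative assumptions:

```python
# Hedged sketch of the forward dynamic programme over all monotone alignments.
# probs[t][j] = model probability that frame t belongs to the j-th action in
# the ordered transcript (each action must cover at least one frame).

def alignment_likelihood(probs):
    """Total probability, summed over every monotone alignment of the
    T frames to the ordered actions 0..L-1."""
    T, L = len(probs), len(probs[0])
    # alpha[j]: probability mass of all alignment prefixes ending in action j
    alpha = [0.0] * L
    alpha[0] = probs[0][0]          # frame 0 must start the first action
    for t in range(1, T):
        new = [0.0] * L
        for j in range(L):
            stay = alpha[j]                             # frame t extends action j
            advance = alpha[j - 1] if j > 0 else 0.0    # frame t starts action j
            new[j] = probs[t][j] * (stay + advance)
        alpha = new
    return alpha[L - 1]             # all actions must have been emitted
```

Each frame either extends the current action or advances to the next one, so the recursion covers every alignment in O(T·L) time instead of enumerating them, which is what makes training over unknown alignments tractable.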
Leveraging triplet loss for unsupervised action segmentation
In this paper, we propose a novel fully unsupervised framework that learns
action representations suitable for the action segmentation task from the
single input video itself, without requiring any training data. Our method is a
deep metric learning approach rooted in a shallow network with a triplet loss
operating on similarity distributions and a novel triplet selection strategy
that effectively models temporal and semantic priors to discover actions in the
new representational space. Under these circumstances, we successfully recover
temporal boundaries in the learned action representations with higher quality
compared with existing unsupervised approaches. The proposed method is
evaluated on two widely used benchmark datasets for the action segmentation
task and it achieves competitive performance by applying a generic clustering
algorithm on the learned representations.
Comment: Accepted to the Workshop on Learning with Limited Labelled Data in
conjunction with CVPR 202
This work was supported by the project PID2019-110977GA-I00 funded by MCIN/AEI/10.13039/501100011033 and by "ESF Investing in your future".
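The metric-learning ingredient can be illustrated with a much simpler stand-in: a generic triplet margin loss plus a temporally motivated triplet selection. This is not the paper's similarity-distribution loss or its exact selection strategy; the function names, defaults and margin value are assumptions for illustration:

```python
import math

# Hedged sketch: generic triplet margin loss with a temporal-prior triplet
# selection. A simplified stand-in, not the paper's exact formulation.

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: pull the anchor towards the positive, push it away
    from the negative until the gap exceeds the margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def temporal_triplet(frames, i, near=1, far=None):
    """Temporal prior: a nearby frame likely shares the anchor's action
    (positive); a distant frame likely does not (negative)."""
    far = far if far is not None else len(frames) // 2
    positive = frames[min(i + near, len(frames) - 1)]
    negative = frames[(i + far) % len(frames)]
    return frames[i], positive, negative
```

Minimising such a loss over triplets mined from a single video pulls frames of the same action together in the learned space, which is what lets a generic clustering algorithm recover the temporal boundaries afterwards.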
Video trajectory analysis using unsupervised clustering and multi-criteria ranking
Surveillance camera usage has increased significantly for visual surveillance. Manual analysis of the large volumes of video recorded by such cameras may not be feasible at scale. In various applications, deep learning-guided supervised systems are used to track and identify unusual patterns. However, such systems depend on training, which may not always be possible. Unsupervised methods rely on suitable features and demand cluster analysis by experts. In this paper, we propose an unsupervised trajectory clustering method referred to as t-Cluster. Our proposed method prepares indexes of object trajectories by fusing high-level interpretable features such as origin, destination, path, and deviation. Next, the clusters are fused using multi-criteria decision making and trajectories are ranked accordingly. The method is able to place abnormal patterns at the top of the list. We have evaluated our algorithm and compared it against competent baseline trajectory clustering methods applied to videos taken from publicly available benchmark datasets. We have obtained higher clustering accuracies on public datasets with significantly lower computational overhead.
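The fusion-and-ranking step can be pictured with a simple stand-in: min-max normalise each criterion's abnormality score and combine the criteria with a weighted sum. The paper uses a proper multi-criteria decision-making scheme; the data layout, function name and weights below are illustrative assumptions:

```python
# Hedged sketch: weighted-sum stand-in for multi-criteria trajectory ranking.
# scores maps each trajectory id to one abnormality score per criterion
# (e.g. origin, destination, path, deviation); weights set criterion importance.

def rank_trajectories(scores, weights):
    """Return trajectory ids sorted most-abnormal first."""
    n_crit = len(weights)
    cols = [[s[k] for s in scores.values()] for k in range(n_crit)]
    lo = [min(c) for c in cols]
    span = [max(c) - min(c) or 1.0 for c in cols]  # guard against zero range

    def fused(s):
        # min-max normalise each criterion, then take the weighted sum
        return sum(w * (s[k] - lo[k]) / span[k] for k, w in enumerate(weights))

    return sorted(scores, key=lambda t: fused(scores[t]), reverse=True)
```

Because abnormal trajectories score high on several criteria at once, the fused score pushes them to the top of the ranked list, mirroring the behaviour the abstract describes.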
Trip purpose identification using pairwise constraints based semi-supervised clustering
Clustering of smart card data captured by automated fare collection (AFC) systems has traditionally
been viewed as an unsupervised method. However, the small number of labelled data points in addition
to the unlabelled smart card data can facilitate better partitioning and classification. In this paper, prior
knowledge about the activities is translated into pairwise constraints and used in the COP-KMEANS
clustering algorithm to identify the trip purpose. The effectiveness of the method was evaluated by
comparison of the results with the ground truth. The results demonstrate that semi-supervised clustering
enhances the accuracy of trip purpose identification.
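COP-KMEANS (Wagstaff et al.) is easy to sketch: it is k-means whose assignment step skips any cluster that would break a must-link or cannot-link constraint. The version below is a minimal illustration with hypothetical helper names, and it omits the convergence checks a real implementation would have:

```python
import random

# Hedged sketch of COP-KMEANS: k-means assignment constrained by
# must-link / cannot-link pairs of point indices.

def violates(i, c, assign, must, cannot):
    """Would assigning point i to cluster c break a constraint?"""
    for a, b in must:
        other = b if a == i else a if b == i else None
        if other is not None and assign[other] is not None and assign[other] != c:
            return True  # must-link partner sits in another cluster
    for a, b in cannot:
        other = b if a == i else a if b == i else None
        if other is not None and assign[other] == c:
            return True  # cannot-link partner already in this cluster
    return False

def cop_kmeans(points, k, must=(), cannot=(), iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [None] * len(points)
    for _ in range(iters):
        assign = [None] * len(points)
        for i, p in enumerate(points):
            # try clusters nearest-first, keeping only constraint-respecting ones
            order = sorted(range(k),
                           key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            choice = next((c for c in order if not violates(i, c, assign, must, cannot)), None)
            if choice is None:
                raise ValueError("constraints unsatisfiable")
            assign[i] = choice
        # standard k-means centroid update
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign
```

In the trip-purpose setting, the labelled smart card records supply the pairwise constraints, so even a handful of labels steers the otherwise unsupervised partitioning, which is the effect the abstract reports.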
First-person activity recognition: how to generalize to unseen users?
In collaboration with the Universitat de Barcelona (UB) and the Universitat Rovira i Virgili (URV).
Recent advances in wearable technology, accompanied by the decreasing cost of data storage
and the increasing availability of data, have made it possible to take pictures everywhere at any
time. Wearable cameras are nowadays among the most popular wearable devices. Besides
leisure, wearable cameras are attracting a lot of attention for the improvement of working
conditions, productivity and safety monitoring. Since the collected data can potentially be
used for memory training and for extracting lifestyle patterns useful to prevent
noncommunicable diseases such as obesity, they are being investigated in the context of
Preventive Medicine. Most of these applications require automatically recognizing the
activity performed by the user. This work aims to take a step forward towards activity
recognition from photo-streams captured by a wearable camera by developing a method that
allows new images to be labelled with minimal effort from the user and generalizes well to
unseen users.