Detecting complex events in user-generated video using concept classifiers
Automatic detection of complex events in user-generated videos (UGV) is a challenging task because UGV differs in character from broadcast video. In this work, we first summarize the distinctive characteristics of UGV and then explore how to use concept classifiers to recognize complex events in UGV content. The method starts by manually selecting a variety of relevant concepts and constructing classifiers for these concepts. Complex event detectors are then learned using the concatenated probabilistic scores of these concept classifiers as features. We also compare three fusion operations over the probabilistic scores, namely Maximum, Average and Minimum fusion. Experimental results suggest that our method provides promising results, and that Maximum fusion tends to give better performance for most complex events.
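The sketch below illustrates this kind of pipeline: per-frame concept probabilities are pooled with Maximum, Average or Minimum fusion into a video-level vector, and an event detector is trained on the fused vectors. The data shapes, the logistic-regression detector and all names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fuse_concept_scores(frame_scores, op="max"):
    """Fuse per-frame concept probabilities (n_frames x n_concepts)
    into one video-level vector with Max/Average/Min fusion."""
    ops = {"max": np.max, "avg": np.mean, "min": np.min}
    return ops[op](frame_scores, axis=0)

# Hypothetical data: 3 videos, each with per-frame scores for 5 concepts.
rng = np.random.default_rng(0)
videos = [rng.random((rng.integers(20, 40), 5)) for _ in range(3)]
labels = np.array([1, 0, 1])            # event present / absent

# Video-level features are the fused concept scores; an event detector
# is then learned on top of them (logistic regression as a stand-in).
X = np.stack([fuse_concept_scores(v, op="max") for v in videos])
detector = LogisticRegression().fit(X, labels)
print(detector.predict_proba(X)[:, 1])
```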
Who's Better? Who's Best? Pairwise Deep Ranking for Skill Determination
We present a method for assessing skill from video, applicable to a variety
of tasks, ranging from surgery to drawing and rolling pizza dough. We formulate
the problem as pairwise (who's better?) and overall (who's best?) ranking of
video collections, using supervised deep ranking. We propose a novel loss
function that learns discriminative features when a pair of videos exhibit
variance in skill, and learns shared features when a pair of videos exhibit
comparable skill levels. Results demonstrate our method is applicable across
tasks, with the percentage of correctly ordered pairs of videos ranging from
70% to 83% for four datasets. We demonstrate the robustness of our approach via
sensitivity analysis of its parameters. We see this work as an effort toward the automated organization of how-to video collections and toward overall, generic skill determination in video.
Comment: CVPR 201
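A minimal sketch of a pairwise loss in this spirit is shown below: a margin ranking term is used when one video of the pair is clearly more skilled, and an embedding-similarity term when the pair is of comparable skill. The exact form of the loss, the tensor shapes and the hyper-parameters are assumptions, not the authors' published loss.

```python
import torch
import torch.nn.functional as F

def pairwise_skill_loss(score_hi, score_lo, feat_a, feat_b, comparable, margin=1.0):
    """Hypothetical pairwise skill-ranking objective.

    score_hi / score_lo: predicted skill scores for the higher- and
    lower-ranked video of each pair, shape [batch].
    feat_a / feat_b: video embeddings of the pair, shape [batch, d].
    comparable: 1.0 where the pair shows similar skill, 0.0 otherwise.
    """
    # Ranking term: push the better video's score above the worse one's.
    rank = F.relu(margin - (score_hi - score_lo))
    # Similarity term: pull embeddings together when skill is comparable.
    sim = (feat_a - feat_b).pow(2).sum(dim=1)
    return ((1.0 - comparable) * rank + comparable * sim).mean()

# Toy usage with random tensors standing in for network outputs.
b, d = 4, 16
loss = pairwise_skill_loss(torch.randn(b), torch.randn(b),
                           torch.randn(b, d), torch.randn(b, d),
                           comparable=torch.tensor([0.0, 1.0, 0.0, 1.0]))
print(float(loss))
```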
Sparse and Low Rank Decomposition Based Batch Image Alignment for Speckle Reduction of Retinal OCT Images
Optical Coherence Tomography (OCT) is an emerging technique in the field of
biomedical imaging, with applications in ophthalmology, dermatology, coronary imaging, etc. Due to the underlying physics, OCT images usually suffer from a
granular pattern, called speckle noise, which restricts the process of
interpretation. Here, a sparse and low rank decomposition based method is used
for speckle reduction in retinal OCT images. This technique works on input data
that consists of several B-scans of the same location. The next step is the
batch alignment of the images using a sparse and low-rank decomposition based
technique. Finally, the denoised image is created by median filtering of the
low-rank component of the processed data. Simultaneous decomposition and
alignment of the images result in better performance in comparison to simple
registration-based methods that are used in the literature for noise reduction
of OCT images.
Comment: Accepted for presentation at ISBI'1
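The core decomposition step can be sketched as follows: the (already coarsely registered) B-scans of one location are stacked as columns of a data matrix, split into a low-rank part plus a sparse part with a simple robust-PCA iteration, and the denoised image is taken as a pixel-wise median over the low-rank columns. The alignment stage is omitted here, and the solver, parameters and data sizes are illustrative assumptions rather than the paper's exact algorithm.

```python
import numpy as np

def robust_pca(D, lam=None, mu=None, n_iter=100):
    """Minimal robust PCA (inexact augmented Lagrangian): D ~= L + S,
    with L low-rank (shared retinal structure) and S sparse (speckle,
    residual misalignment)."""
    m, n = D.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    mu = mu or 0.25 * m * n / np.abs(D).sum()
    L, S, Y = np.zeros_like(D), np.zeros_like(D), np.zeros_like(D)
    for _ in range(n_iter):
        # Singular-value thresholding for the low-rank component.
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0)) @ Vt
        # Soft thresholding for the sparse component.
        R = D - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0)
        Y += mu * (D - L - S)
    return L, S

# Hypothetical stack of 8 aligned 64x64 B-scans of the same location.
rng = np.random.default_rng(0)
scans = rng.random((8, 64, 64))
D = scans.reshape(8, -1).T                        # pixels x scans
L, S = robust_pca(D)
denoised = np.median(L, axis=1).reshape(64, 64)   # pixel-wise median over scans
```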
Robust Mobile Object Tracking Based on Multiple Feature Similarity and Trajectory Filtering
This paper presents a new algorithm to track mobile objects in different
scene conditions. The main idea of the proposed tracker combines estimation, multi-feature similarity measures and trajectory filtering. A feature set
(distance, area, shape ratio, color histogram) is defined for each tracked
object to search for the best matching object. Its best matching object and its
state estimated by the Kalman filter are combined to update position and size
of the tracked object. However, the mobile object trajectories are usually
fragmented because of occlusions and misdetections. Therefore, we also propose a trajectory filter, named the global tracker, which aims at removing noisy trajectories and fusing fragmented trajectories that belong to the same mobile object. The method has been tested on five videos of different scene
conditions. Three of them are provided by the ETISEO benchmarking project
(http://www-sop.inria.fr/orion/ETISEO) in which the proposed tracker
performance has been compared with seven other tracking algorithms. The advantages of our approach over existing state-of-the-art ones are: (i) no prior knowledge is required (e.g., no calibration and no contextual models are needed); (ii) the tracker is more reliable because it combines multiple feature similarities; (iii) the tracker can operate in different scene conditions: single/several mobile objects, weak/strong illumination, indoor/outdoor scenes; (iv) a trajectory filter is defined and applied to improve the tracker performance; (v) the tracker outperforms many state-of-the-art algorithms.
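A toy version of the data-association step is sketched below: each of the four feature similarities (2D distance, area, shape ratio, colour histogram) is computed and combined with a weighted sum to pick the best-matching detection for a tracked object. The individual similarity formulas and the weights are assumptions for illustration, and the Kalman-filter state update is omitted.

```python
import numpy as np

def feature_similarity(track, det, w=(0.25, 0.25, 0.25, 0.25)):
    """Combine the paper's four feature similarities (2D distance, area,
    shape ratio, colour histogram) into a single matching score.
    The individual formulas and the weights are illustrative assumptions."""
    d = np.hypot(*(np.asarray(track["pos"], float) - np.asarray(det["pos"], float)))
    s_dist = np.exp(-d / 50.0)                                    # closer -> higher
    s_area = min(track["area"], det["area"]) / max(track["area"], det["area"])
    s_shape = min(track["ratio"], det["ratio"]) / max(track["ratio"], det["ratio"])
    s_color = np.minimum(track["hist"], det["hist"]).sum()        # histogram intersection
    return float(np.dot(w, [s_dist, s_area, s_shape, s_color]))

def make_hist(seed, bins=16):
    """Normalised colour-histogram stand-in."""
    return np.random.default_rng(seed).dirichlet(np.ones(bins))

# One tracked object and two candidate detections in the next frame.
track = {"pos": (100, 80), "area": 900.0, "ratio": 0.50, "hist": make_hist(0)}
dets = [{"pos": (104, 82), "area": 880.0, "ratio": 0.52, "hist": make_hist(0)},
        {"pos": (300, 40), "area": 400.0, "ratio": 1.20, "hist": make_hist(1)}]
best = max(dets, key=lambda det: feature_similarity(track, det))
```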
Exploiting Concepts In Videos For Video Event Detection
Video event detection is the task of searching videos for events of interest to a user, where an event is a complex activity that is localized in time and space. The video event detection problem has gained importance as the amount of online video grows by more than 300 hours every minute on YouTube alone.
In this thesis, we tackle three major video event detection problems: video event detection with exemplars (VED-ex), where a large number of example videos are associated with queries; video event detection with few exemplars (VED-ex_few), in which only a small number of example videos are associated with queries; and zero-shot video event detection (VED-zero), where no exemplar videos are associated with queries.
We first define a new way of describing videos concisely, one built around query-independent concepts (i.e., a fixed set of concepts shared by all queries) with a space-efficient representation. Using query-independent concepts enables us to learn a retrieval model for any query without requiring a new set of concepts. Our space-efficient representation reduces the time required to train and test a retrieval model and the space needed to store video representations on disk.
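As a small illustration of the idea (not the thesis' actual representation), the sketch below max-pools per-frame scores over a fixed concept vocabulary and keeps only the strongest responses, so the same compact vector can serve any query; the pooling choice, vocabulary size and top-k sparsification are assumptions.

```python
import numpy as np

def concept_representation(frame_scores, keep_top=50):
    """Max-pool per-frame detector outputs over a fixed, query-independent
    concept vocabulary into one video-level vector, then keep only the
    strongest responses as a sparse {concept_id: score} map (the
    sparsification is one plausible reading of 'space-efficient')."""
    video_vec = frame_scores.max(axis=0)               # one score per concept
    top = np.argsort(video_vec)[-keep_top:]            # highest-scoring concepts
    return {int(i): float(video_vec[i]) for i in top}

# Hypothetical per-frame scores: 300 frames x 1000 concepts.
rng = np.random.default_rng(0)
sparse_repr = concept_representation(rng.random((300, 1000)))
```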
When the number of example videos associated with a query decreases, the retrieval accuracy decreases as well. We present a method that incorporates multiple one-exemplar models into video event detection, aiming to improve retrieval accuracy when only a few exemplars are available. By incorporating multiple one-exemplar models into video event detection with few exemplars, we obtain significant improvements in mean average precision compared to a single monolithic model.
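One simple way to realize this, shown as a hedged sketch below, is to train a separate detector per exemplar against a shared background pool and average the detectors' scores at test time; the linear-SVM choice, the score averaging and the feature dimensions are assumptions, not necessarily the thesis' method.

```python
import numpy as np
from sklearn.svm import LinearSVC

def one_exemplar_ensemble(exemplars, background, test_videos):
    """Train one detector per exemplar (exemplar vs. background negatives)
    and average the decision scores over all one-exemplar models."""
    scores = []
    for ex in exemplars:
        X = np.vstack([ex[None, :], background])
        y = np.r_[1, np.zeros(len(background))]
        clf = LinearSVC(C=1.0).fit(X, y)
        scores.append(clf.decision_function(test_videos))
    return np.mean(scores, axis=0)

rng = np.random.default_rng(0)
fused = one_exemplar_ensemble(rng.random((3, 64)),      # 3 exemplar videos
                              rng.random((50, 64)),     # background pool
                              rng.random((10, 64)))     # videos to rank
```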
Having no exemplar videos associated with queries makes the video event detection problem more challenging, as we cannot train a retrieval model from example videos. It is also more realistic, since compiling a set of example videos can be costly. We tackle this problem with a new and effective zero-shot video event detection model that exploits dependencies among concepts in videos. Our dependency work uses a Markov Random Field (MRF) based retrieval model and considers three dependency settings: 1) full independence, where each concept is considered independently; 2) spatial dependence, where the co-occurrence of two concepts in the same video frame is treated as important; and 3) temporal dependence, where having concepts co-occur in consecutive frames is treated as important. Our MRF based retrieval model improves retrieval accuracy significantly compared to the common bag-of-concepts approach with an independence assumption.
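The three settings can be illustrated with a toy scoring function over binary per-frame concept detections, sketched below; the potential functions, the weighting and the data are assumptions in the spirit of an MRF retrieval model, not the thesis' exact formulation.

```python
import numpy as np
from itertools import combinations

def event_score(occ, query_concepts, mode="independent", lam=0.5):
    """Score a video for a zero-shot event query from per-frame concept
    occurrences (occ: n_frames x n_concepts, values 0/1)."""
    uni = occ[:, query_concepts].mean(axis=0).sum()     # full independence
    if mode == "independent":
        return uni
    pair = 0.0
    for i, j in combinations(query_concepts, 2):
        if mode == "spatial":       # both concepts in the same frame
            pair += (occ[:, i] & occ[:, j]).mean()
        elif mode == "temporal":    # concepts in consecutive frames
            pair += (occ[:-1, i] & occ[1:, j]).mean()
    return uni + lam * pair

rng = np.random.default_rng(0)
occ = (rng.random((200, 50)) > 0.8).astype(int)   # hypothetical detections
query = [3, 17, 42]                               # concepts mapped from the query text
for m in ("independent", "spatial", "temporal"):
    print(m, round(event_score(occ, query, mode=m), 3))
```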
A robust and efficient video representation for action recognition
This paper introduces a state-of-the-art video representation and applies it
to efficient action recognition and detection. We first propose to improve the
popular dense trajectory features by explicit camera motion estimation. More
specifically, we extract feature point matches between frames using SURF
descriptors and dense optical flow. The matches are used to estimate a
homography with RANSAC. To improve the robustness of homography estimation, a
human detector is employed to remove outlier matches from the human body as
human motion is not constrained by the camera. Trajectories consistent with the
homography are considered as due to camera motion, and thus removed. We also
use the homography to cancel out camera motion from the optical flow. This
results in significant improvement on motion-based HOF and MBH descriptors. We
further explore the recent Fisher vector as an alternative feature encoding
approach to the standard bag-of-words histogram, and consider different ways to
include spatial layout information in these encodings. We present a large and
varied set of evaluations, considering (i) classification of short basic
actions on six datasets, (ii) localization of such actions in feature-length
movies, and (iii) large-scale recognition of complex events. We find that our
improved trajectory features significantly outperform previous dense
trajectories, and that Fisher vectors are superior to bag-of-words encodings
for video recognition tasks. In all three tasks, we show substantial
improvements over the state-of-the-art results.
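A condensed sketch of the camera-motion compensation step is given below, using OpenCV: feature matches between consecutive frames give a RANSAC homography, and the flow that the homography alone would induce is subtracted from the dense optical flow. ORB stands in for the SURF descriptors of the paper, the human-detector masking of outlier matches is omitted, and parameter values are assumptions; this is not the authors' released implementation.

```python
import cv2
import numpy as np

def camera_compensated_flow(prev_gray, cur_gray):
    """Estimate a frame-to-frame homography from feature matches and
    subtract the camera-induced flow from the dense optical flow."""
    orb = cv2.ORB_create(1000)                      # stand-in for SURF
    k1, d1 = orb.detectAndCompute(prev_gray, None)
    k2, d2 = orb.detectAndCompute(cur_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 1.0)

    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # Flow that the camera motion alone would induce at each pixel.
    h, w = prev_gray.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).reshape(-1, 1, 2)
    warped = cv2.perspectiveTransform(pts, H).reshape(h, w, 2)
    cam_flow = warped - np.stack([xs, ys], axis=2)
    return flow - cam_flow                           # residual (foreground) motion
```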
Meta-Auxiliary Learning for Adaptive Human Pose Prediction
Predicting high-fidelity future human poses from a historically observed sequence is crucial for intelligent robots that interact with humans. Deep
end-to-end learning approaches, which typically train a generic pre-trained
model on external datasets and then directly apply it to all test samples,
have emerged as the dominant solution to this problem. Despite encouraging progress, they remain suboptimal, as they cannot adapt to the unique properties (e.g., motion style, rhythm) of a specific sequence. More generally, at test time, when unseen motion categories (out-of-distribution) are encountered, the predicted poses tend to be unreliable. Motivated by this observation, we
propose a novel test-time adaptation framework that leverages two
self-supervised auxiliary tasks to help the primary forecasting network adapt
to the test sequence. In the testing phase, our model adjusts its parameters with several gradient updates to improve the generation quality.
However, due to catastrophic forgetting, the auxiliary tasks alone often fail to provide the desired positive incentive for the final prediction performance. For this reason, we also propose a meta-auxiliary learning scheme for better adaptation. In the general setup, our approach obtains higher accuracy, and under two new experimental designs for out-of-distribution data (unseen subjects and categories), it achieves significant improvements.
Comment: 10 pages, 6 figures, accepted at AAAI 2023
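A stripped-down sketch of the test-time adaptation loop is below: before forecasting, a copy of the model takes a few gradient steps on a self-supervised auxiliary loss computed on the observed sequence itself. The toy architecture, the single auxiliary task (denoising the last observed pose) and the hyper-parameters are assumptions; the paper uses two auxiliary tasks and a meta-auxiliary learning scheme.

```python
import copy
import torch
import torch.nn as nn

class PosePredictor(nn.Module):
    """Toy stand-in: maps an observed pose sequence to a future sequence."""
    def __init__(self, joints=17 * 3, hidden=128, t_out=25):
        super().__init__()
        self.backbone = nn.GRU(joints, hidden, batch_first=True)
        self.head = nn.Linear(hidden, joints * t_out)   # primary forecasting task
        self.aux_head = nn.Linear(hidden, joints)       # auxiliary: denoise last pose
        self.t_out, self.joints = t_out, joints

    def forward(self, x):
        h, _ = self.backbone(x)
        fut = self.head(h[:, -1]).view(-1, self.t_out, self.joints)
        return fut, self.aux_head(h[:, -1])

def adapt_and_predict(model, obs, steps=5, lr=1e-4):
    """Test-time adaptation: a few gradient updates on a self-supervised
    auxiliary loss before producing the final forecast."""
    model = copy.deepcopy(model)                 # keep the generic model untouched
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        noisy = obs + 0.01 * torch.randn_like(obs)
        _, recon = model(noisy)
        loss = (recon - obs[:, -1]).pow(2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        future, _ = model(obs)
    return future

obs = torch.randn(1, 10, 17 * 3)                 # one observed sequence
pred = adapt_and_predict(PosePredictor(), obs)
```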