Segmental Spatiotemporal CNNs for Fine-grained Action Segmentation
Joint segmentation and classification of fine-grained actions is important
for applications of human-robot interaction, video surveillance, and human
skill evaluation. However, despite substantial recent progress in large-scale
action classification, the performance of state-of-the-art fine-grained action
recognition approaches remains low. We propose a model for action segmentation
which combines low-level spatiotemporal features with a high-level segmental
classifier. Our spatiotemporal CNN comprises a spatial component that
uses convolutional filters to capture information about objects and their
relationships, and a temporal component that uses large 1D convolutional
filters to capture information about how object relationships change across
time. These features are used in tandem with a semi-Markov model that captures
transitions from one action to another. We introduce an efficient constrained
segmental inference algorithm for this model that is orders of magnitude faster
than the current approach. We highlight the effectiveness of our Segmental
Spatiotemporal CNN on cooking and surgical action datasets for which we observe
substantially improved performance relative to recent baseline methods.
Comment: Updated from the ECCV 2016 version. We fixed an important
mathematical error and made the section on segmental inference clearer.
CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos
Temporal action localization is an important yet challenging problem. Given a
long, untrimmed video consisting of multiple action instances and complex
background contents, we need not only to recognize their action categories, but
also to localize the start time and end time of each instance. Many
state-of-the-art systems use segment-level classifiers to select and rank
proposal segments of pre-determined boundaries. However, a desirable model
should move beyond segment-level and make dense predictions at a fine
granularity in time to determine precise temporal boundaries. To this end, we
design a novel Convolutional-De-Convolutional (CDC) network that places CDC
filters on top of 3D ConvNets, which have been shown to be effective for
abstracting action semantics but reduce the temporal length of the input data.
The proposed CDC filter performs the required temporal upsampling and spatial
downsampling operations simultaneously to predict actions at the frame-level
granularity. It is unique in jointly modeling action semantics in space-time
and fine-grained temporal dynamics. We train the CDC network efficiently in an
end-to-end manner. Our model not only achieves superior performance in
detecting actions in every frame, but also significantly boosts the precision
of localizing temporal boundaries. Finally, the CDC network demonstrates a very
high efficiency with the ability to process 500 frames per second on a single
GPU server. We will update the camera-ready version and publish the source
code online soon.
Comment: IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
201
Action Recognition by Hierarchical Mid-level Action Elements
Realistic videos of human actions exhibit rich spatiotemporal structures at
multiple levels of granularity: an action can always be decomposed into
multiple finer-grained elements in both space and time. To capture this
intuition, we propose to represent videos by a hierarchy of mid-level action
elements (MAEs), where each MAE corresponds to an action-related spatiotemporal
segment in the video. We introduce an unsupervised method to generate this
representation from videos. Our method is capable of distinguishing
action-related segments from background segments and representing actions at
multiple spatiotemporal resolutions. Given a set of spatiotemporal segments
generated from the training data, we introduce a discriminative clustering
algorithm that automatically discovers MAEs at multiple levels of granularity.
We develop structured models that capture a rich set of spatial, temporal and
hierarchical relations among the segments, where the action label and multiple
levels of MAE labels are jointly inferred. The proposed model achieves
state-of-the-art performance on multiple action recognition benchmarks.
Moreover, we demonstrate the effectiveness of our model in real-world
applications such as action recognition in large-scale untrimmed videos and
action parsing.
Big Data in Critical Infrastructures Security Monitoring: Challenges and Opportunities
Critical Infrastructures (CIs), such as smart power grids, transport systems,
and financial infrastructures, are increasingly vulnerable to cyber threats,
due to the adoption of commodity computing facilities. Despite the use of
several monitoring tools, recent attacks have proven that current defensive
mechanisms for CIs are not effective enough against the most advanced threats. In
this paper we explore the idea of a framework leveraging multiple data sources
to improve protection capabilities of CIs. Challenges and opportunities are
discussed along three main research directions: i) use of distinct and
heterogeneous data sources, ii) monitoring with adaptive granularity, and iii)
attack modeling and runtime combination of multiple data analysis techniques.
Comment: EDCC-2014, BIG4CIP-201