A system for learning statistical motion patterns
Analysis of motion patterns is an effective approach for anomaly detection and behavior prediction. Current approaches for the analysis of motion patterns depend on known scenes, where objects move in predefined ways. It is highly desirable to automatically construct object motion patterns which reflect the knowledge of the scene. In this paper, we present a system for automatically learning motion patterns for anomaly detection and behavior prediction based on a proposed algorithm for robustly tracking multiple objects. In the tracking algorithm, foreground pixels are clustered using a fast, accurate fuzzy k-means algorithm. Growing and prediction of the cluster centroids of foreground pixels ensure that each cluster centroid is associated with a moving object in the scene. In the algorithm for learning motion patterns, trajectories are clustered hierarchically using spatial and temporal information, and then each motion pattern is represented with a chain of Gaussian distributions. Based on the learned statistical motion patterns, statistical methods are used to detect anomalies and predict behaviors. Our system is tested using image sequences acquired, respectively, from a crowded real traffic scene and a model traffic scene. Experimental results show the robustness of the tracking algorithm, the efficiency of the algorithm for learning motion patterns, and the encouraging performance of algorithms for anomaly detection and behavior prediction.
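The clustering step mentioned in this abstract can be illustrated with a minimal sketch of the textbook fuzzy k-means (fuzzy c-means) update applied to 2-D foreground pixel coordinates. This is not the paper's "fast accurate" variant, and it omits the growing and prediction of centroids; the function name `fuzzy_kmeans`, the fuzzifier `m`, and the iteration count are assumptions made for illustration only.

```python
import numpy as np

def fuzzy_kmeans(points, k, m=2.0, iters=100, seed=0):
    """Textbook fuzzy c-means (illustrative; not the paper's fast variant).

    points: (n, d) array of foreground pixel coordinates.
    Returns (centroids, memberships) with shapes (k, d) and (n, k).
    """
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    # Random initial soft memberships; each row sums to 1.
    u = rng.random((n, k))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(iters):
        # Centroids are means weighted by membership raised to the fuzzifier m.
        w = u ** m
        centroids = (w.T @ points) / w.sum(axis=0)[:, None]
        # Distance from every point to every centroid, floored to avoid 1/0.
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)
        # Membership update: inversely related to distance, normalized per point.
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return centroids, u
```

In a tracking setting, each returned centroid would stand in for one moving object, and the soft memberships indicate how strongly each foreground pixel belongs to it.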
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework that highlights the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypotheses assumed and, thus, the constraints imposed on the type of video
that each technique is able to address. Making the hypotheses and constraints
explicit makes the framework particularly useful for selecting a method for a
given application. Another advantage of the proposed organization is that it
allows the newest approaches to be categorized seamlessly alongside traditional
ones, while providing an insightful perspective on the evolution of the action
recognition task up to now. That perspective is the basis for the discussion at
the end of the paper, where we also present the main open issues in the area.
Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4 tables
Hierarchical Neural Memory Network for Low Latency Event Processing
This paper proposes a low latency neural network architecture for event-based
dense prediction tasks. Conventional architectures encode entire scene contents
at a fixed rate regardless of their temporal characteristics. Instead, the
proposed network encodes contents at a proper temporal scale depending on their
movement speed. We achieve this by constructing a temporal hierarchy using
stacked latent memories that operate at different rates. Given low latency
event streams, the multi-level memories gradually extract dynamic to static
scene contents by propagating information from the fast to the slow memory
modules. The architecture not only reduces the redundancy of conventional
architectures but also exploits long-term dependencies. Furthermore, an
attention-based event representation efficiently encodes sparse event streams
into the memory cells. We conduct extensive evaluations on three event-based
dense prediction tasks, where the proposed approach outperforms the existing
methods on accuracy and latency, while demonstrating effective event and image
fusion capabilities. The code is available at https://hamarh.github.io/hmnet/
Comment: Accepted to CVPR 202
HALSIE - Hybrid Approach to Learning Segmentation by Simultaneously Exploiting Image and Event Modalities
Standard frame-based algorithms fail to retrieve accurate segmentation maps
in challenging real-time applications like autonomous navigation, owing to the
limited dynamic range and motion blur prevalent in traditional cameras. Event
cameras address these limitations by asynchronously detecting changes in
per-pixel intensity to generate event streams with high temporal resolution,
high dynamic range, and no motion blur. However, event camera outputs cannot be
directly used to generate reliable segmentation maps as they only capture
information at the pixels in motion. To augment the missing contextual
information, we postulate that fusing spatially dense frames with temporally
dense events can generate semantic maps with fine-grained predictions. To this
end, we propose HALSIE, a hybrid approach to learning segmentation by
simultaneously leveraging image and event modalities. To enable efficient
learning across modalities, our proposed hybrid framework comprises two input
branches, a Spiking Neural Network (SNN) branch and a standard Artificial
Neural Network (ANN) branch to process event and frame data respectively, while
exploiting their corresponding neural dynamics. Our hybrid network outperforms
state-of-the-art methods on the DDD17 and MVSEC semantic segmentation
datasets and shows comparable performance on the DSEC-Semantic dataset with
up to 33.23× reduction in network parameters. Further, our method shows
up to 18.92× improvement in inference cost compared to existing SOTA
approaches, making it suitable for resource-constrained edge applications.
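To make the notion of an event stream concrete, the sketch below accumulates asynchronous per-pixel events into a dense two-channel count image, a common and simple way to hand events to a frame-based network. It is not HALSIE's encoding or its SNN branch; the `(x, y, t, polarity)` event layout and the function name `events_to_frame` are assumptions made for illustration.

```python
import numpy as np

def events_to_frame(events, height, width):
    """Accumulate events into a (2, height, width) count image.

    events: (n, 4) integer array with columns x, y, t, polarity (hypothetical
    layout). Channel 0 counts negative-polarity events, channel 1 positive.
    """
    events = np.asarray(events)
    frame = np.zeros((2, height, width), dtype=np.int32)
    xs = events[:, 0]
    ys = events[:, 1]
    channels = (events[:, 3] > 0).astype(np.intp)
    # np.add.at handles repeated pixel indices correctly (unbuffered add).
    np.add.at(frame, (channels, ys, xs), 1)
    return frame
```

Pixels that see no intensity change stay at zero in both channels, which illustrates why the abstract argues that events alone lack the spatial context that dense frames provide.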