A Survey Of Activity Recognition And Understanding The Behavior In Video Surveillance
This paper presents a review of human activity recognition and behaviour
understanding in video sequences. The key objective of this paper is to provide
a general review of the overall process of a surveillance system as used in
current practice. Visual surveillance systems are directed at the automatic
identification of events of interest, especially the tracking and classification
of moving objects. The processing pipeline of a video surveillance system
includes the following stages: surrounding modelling, object representation,
object tracking, activity recognition, and behaviour understanding. The paper
describes techniques used to define a general set of activities that are
applicable to a wide range of scenes and environments in video sequences.
Comment: 14 pages, 5 figures, 5 tables
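The staged pipeline listed above can be sketched as a chain of functions. Every name and return value below is a hypothetical placeholder for illustration, not an API from the survey:

```python
# Hypothetical stage functions mirroring the surveillance pipeline stages;
# each body is a trivial stand-in for a real algorithm.
def surrounding_model(frame):
    return frame  # would return a foreground map in a real system

def represent_objects(foreground):
    return [foreground]  # would return blobs or bounding boxes

def track_objects(objects, tracks):
    return tracks + objects  # would associate detections with tracks

def recognize_activity(tracks):
    return "activity"  # would classify the motion pattern

def understand_behavior(activity):
    return f"interpreted:{activity}"  # would map activity to semantics

def surveillance_pipeline(frames):
    tracks = []
    for frame in frames:
        foreground = surrounding_model(frame)
        objects = represent_objects(foreground)
        tracks = track_objects(objects, tracks)
    return understand_behavior(recognize_activity(tracks))

print(surveillance_pipeline(["frame0", "frame1"]))  # interpreted:activity
```

The point of the sketch is only the data flow: each stage consumes the previous stage's output, and behaviour understanding sits on top of the accumulated tracks.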
Machine Learning Methods for Data Association in Multi-Object Tracking
Data association is a key step within the multi-object tracking pipeline that
is notoriously challenging due to its combinatorial nature. A popular and
general way to formulate data association is as the NP-hard multidimensional
assignment problem (MDAP). Over the last few years, data-driven approaches to
assignment have become increasingly prevalent as these techniques have started
to mature. We focus this survey solely on learning algorithms for the
assignment step of multi-object tracking, and we attempt to unify various
methods by highlighting their connections to linear assignment as well as to
the MDAP. First, we review probabilistic and end-to-end optimization approaches
to data association, followed by methods that learn association affinities from
data. We then compare the performance of the methods presented in this survey,
and conclude by discussing future research directions.
Comment: Accepted for publication in ACM Computing Surveys
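As a concrete instance of the linear assignment step these methods build on, here is a minimal sketch using SciPy's Hungarian-algorithm solver; the cost matrix values are invented for illustration:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy cost matrix: rows are existing tracks, columns are new detections;
# entry (i, j) could be a distance between track i's predicted position
# and detection j. These values are invented for illustration.
cost = np.array([
    [0.2, 0.9, 0.8],
    [0.7, 0.1, 0.9],
    [0.8, 0.9, 0.3],
])

# Solve the linear assignment problem (Hungarian algorithm).
track_idx, det_idx = linear_sum_assignment(cost)
matches = list(zip(track_idx.tolist(), det_idx.tolist()))
print(matches)  # -> [(0, 0), (1, 1), (2, 2)]
```

Learning-based approaches surveyed here typically replace the hand-crafted cost entries with learned affinities, while the assignment solver itself stays the same.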
A Large Scale Urban Surveillance Video Dataset for Multiple-Object Tracking and Behavior Analysis
Multiple-object tracking and behavior analysis have been the essential parts
of surveillance video analysis for public security and urban management. With
billions of surveillance videos captured all over the world, multiple-object
tracking and behavior analysis by manual labor are cumbersome and expensive.
Due to the rapid development of deep learning algorithms in recent years,
automatic object tracking and behavior analysis put forward an urgent demand
for a large-scale, well-annotated surveillance video dataset that can
reflect the diverse, congested, and complicated scenarios in real applications.
This paper introduces an urban surveillance video dataset (USVD) which is by
far the largest and most comprehensive. The dataset consists of 16 scenes
captured in 7 typical outdoor scenarios: street, crossroads, hospital entrance,
school gate, park, pedestrian mall, and public square. Over 200k video frames
are carefully annotated, resulting in more than 3.7 million object bounding
boxes and about 7.1 thousand trajectories. We further use this dataset to
evaluate the performance of typical algorithms for multiple-object tracking and
anomalous behavior analysis, and to explore the robustness of these methods in
congested urban scenarios.
Comment: 6 pages. The dataset is not available due to the data license.
A Scalable Platform for Distributed Object Tracking across a Many-camera Network
Advances in deep neural networks (DNN) and computer vision (CV) algorithms
have made it feasible to extract meaningful insights from large-scale
deployments of urban cameras. Tracking an object of interest across the camera
network in near real-time is a canonical problem. However, current tracking
platforms have two key limitations: 1) They are monolithic, proprietary and
lack the ability to rapidly incorporate sophisticated tracking models; and 2)
They are less responsive to dynamism across wide-area computing resources that
include edge, fog and cloud abstractions. We address these gaps using Anveshak,
a runtime platform for composing and coordinating distributed tracking
applications. It provides a domain-specific dataflow programming model to
intuitively compose a tracking application, supporting contemporary CV advances
like query fusion and re-identification, and enabling dynamic scoping of the
camera network's search space to avoid wasted computation. We also offer
tunable batching and data-dropping strategies for dataflow blocks deployed on
distributed resources to respond to network and compute variability. These
balance the tracking accuracy, its real-time performance and the active
camera-set size. We illustrate the concise expressiveness of the programming
model for tracking applications. Our detailed experiments for a network of
1000 camera-feeds on modest resources exhibit the tunable scalability,
performance and quality trade-offs enabled by our dynamic tracking, batching
and dropping strategies.
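The tunable batching and data-dropping idea can be sketched as a bounded frame buffer. This is a toy illustration of the trade-off, not Anveshak's actual API; all names here are invented:

```python
from collections import deque

class TunableBuffer:
    """Toy batching + dropping buffer.

    Frames accumulate into batches of size `batch_size`; if the backlog
    exceeds `max_backlog`, the oldest frames are dropped so the tracker
    keeps up with the live feed at the cost of some accuracy.
    """
    def __init__(self, batch_size=4, max_backlog=8):
        self.batch_size = batch_size
        self.max_backlog = max_backlog
        self.queue = deque()
        self.dropped = 0

    def push(self, frame):
        self.queue.append(frame)
        while len(self.queue) > self.max_backlog:
            self.queue.popleft()  # drop oldest frames under load
            self.dropped += 1

    def next_batch(self):
        if len(self.queue) < self.batch_size:
            return None
        return [self.queue.popleft() for _ in range(self.batch_size)]

buf = TunableBuffer(batch_size=4, max_backlog=8)
for i in range(12):  # a burst of 12 frames arrives faster than processing
    buf.push(i)
batch = buf.next_batch()
print(buf.dropped, batch)  # -> 4 [4, 5, 6, 7]
```

Raising `max_backlog` favors accuracy (fewer drops, more latency); lowering it favors real-time responsiveness, which is the balance the paper's strategies tune.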
Integrating Graph Partitioning and Matching for Trajectory Analysis in Video Surveillance
In order to track moving objects over long ranges, against occlusion,
interruption, and background clutter, this paper proposes a unified approach
to global trajectory analysis. Instead of traditional frame-by-frame
tracking, our method recovers target trajectories from a short sequence of
video frames. We initially calculate a foreground map at each
frame, as obtained from a state-of-the-art background model. An attribute graph
is then extracted from the foreground map, where the graph vertices are image
primitives represented by the composite features. With this graph
representation, we pose trajectory analysis as a joint task of spatial graph
partitioning and temporal graph matching. The task can be formulated as
maximum a posteriori (MAP) estimation under a Bayesian framework, in which we
integrate spatio-temporal contexts and appearance models. The probabilistic inference
is achieved by a data-driven Markov Chain Monte Carlo (MCMC) algorithm. Given a
period of observed frames, the algorithm simulates an ergodic and aperiodic
Markov Chain, and it visits a sequence of solution states in the joint space of
spatial graph partitioning and temporal graph matching. In the experiments, our
method is tested on several challenging videos from the public datasets of
visual surveillance, and it outperforms the state-of-the-art methods.
Comment: 10 pages, 12 figures
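The sampler's core is the standard Metropolis-Hastings acceptance rule. Below is a generic sketch of that rule with the state reduced to a single integer; the paper's actual chain moves through a joint space of graph partitions and matchings with data-driven proposals, which this toy deliberately omits:

```python
import math
import random

random.seed(0)

def metropolis_hastings(energy, propose, state, n_iter=5000, temperature=1.0):
    """Generic Metropolis-Hastings loop (symmetric proposals assumed)."""
    for _ in range(n_iter):
        candidate = propose(state)
        # Accept with probability min(1, exp(-(E(cand) - E(state)) / T)).
        delta = energy(candidate) - energy(state)
        if delta <= 0 or random.random() < math.exp(-delta / temperature):
            state = candidate
    return state

# Toy energy with a minimum at 3; proposals step by +-1.
best = metropolis_hastings(lambda s: (s - 3) ** 2,
                           lambda s: s + random.choice([-1, 1]),
                           state=10)
print(best)
```

The chain drifts toward low-energy (high-posterior) states while occasionally accepting uphill moves, which is what lets the paper's sampler escape poor partition/matching configurations.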
Design Challenges of Multi-UAV Systems in Cyber-Physical Applications: A Comprehensive Survey, and Future Directions
Unmanned Aerial Vehicles (UAVs) have recently rapidly grown to facilitate a
wide range of innovative applications that can fundamentally change the way
cyber-physical systems (CPSs) are designed. CPSs are a modern generation of
systems with synergic cooperation between computational and physical potentials
that can interact with humans through several new mechanisms. The main
advantages of using UAVs in CPS applications are their exceptional features,
including their mobility, dynamism, effortless deployment, adaptive altitude,
agility, adjustability, and effective appraisal of real-world functions anytime
and anywhere. Furthermore, from the technology perspective, UAVs are predicted
to be a vital element of the development of advanced CPSs. Therefore, in this
survey, we aim to pinpoint the most fundamental and important design challenges
of multi-UAV systems for CPS applications. We highlight key and versatile
aspects that span the coverage and tracking of targets and infrastructure
objects, energy-efficient navigation, and image analysis using machine learning
for fine-grained CPS applications. Key prototypes and testbeds are also
investigated to show how these practical technologies can facilitate CPS
applications. We present and propose state-of-the-art algorithms to address
design challenges with both quantitative and qualitative methods and map these
challenges with important CPS applications to draw insightful conclusions on
the challenges of each application. Finally, we summarize potential new
directions and ideas that could shape future research in these areas.
Background Subtraction in Real Applications: Challenges, Current Models and Future Directions
Computer vision applications based on videos often require the detection of
moving objects in their first step. Background subtraction is then applied in
order to separate the background from the foreground. Background subtraction
is surely among the most investigated fields in computer vision, with a large
body of publications. Most of them concern the application of mathematical and
machine learning models to be more robust to the challenges met in videos.
However, the ultimate goal is for the background subtraction methods developed
in research to be employed in real applications such as traffic surveillance.
Yet, looking at the literature, we can remark that there is often a gap
between the methods currently used in real applications and the current
methods in fundamental research. In addition, the videos evaluated in
large-scale datasets are not exhaustive, in that they cover only a part of the
complete spectrum of challenges met in real applications. In this context, we
attempt to provide as exhaustive a survey as possible of real applications
that use background subtraction, in order to identify the real challenges met
in practice and the background models currently in use, and to provide future
directions. Thus, challenges are investigated in terms of
camera, foreground objects and environments. In addition, we identify the
background models that are effectively used in these applications in order to
find recent background models that are potentially usable in terms of
robustness, time, and memory requirements.
Comment: Submitted to Computer Science Review
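As a point of reference for the simplest family of background models the survey covers, here is a minimal running-average model on toy arrays (the array sizes and thresholds are invented for illustration):

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Exponential running average: a classic, simple background model."""
    return (1 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, threshold=25.0):
    """Pixels far from the background estimate are flagged as foreground."""
    return np.abs(frame - bg) > threshold

# Synthetic example: static dark background with one bright moving "object".
bg = np.zeros((4, 4))
frame = np.zeros((4, 4))
frame[1, 1] = 200.0  # the moving object

mask = foreground_mask(bg, frame)
print(int(mask.sum()))  # -> 1 foreground pixel
bg = update_background(bg, frame)  # object slowly bleeds into the background
```

Real applications face exactly the failure modes the survey catalogues: the `alpha` trade-off between absorbing illumination changes and absorbing stopped objects, and the fragility of a global `threshold` under camera noise.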
Multiple Object Tracking: A Literature Review
Multiple Object Tracking (MOT) is an important computer vision problem which
has gained increasing attention due to its academic and commercial potential.
Although different kinds of approaches have been proposed to tackle this
problem, it still remains challenging due to factors like abrupt appearance
changes and severe object occlusions. In this work, we contribute the first
comprehensive and most recent review on this problem. We inspect the recent
advances in various aspects and propose some interesting directions for future
research. To the best of our knowledge, there has not been any extensive review
on this topic in the community. We endeavor to provide a thorough review on the
development of this problem in recent decades. The main contributions of this
review are fourfold: 1) Key aspects of a multiple object tracking system,
including formulation, categorization, key principles, and evaluation, are
discussed. 2) Instead of enumerating individual works, we discuss existing
approaches according to various aspects, in each of which methods are divided
into different groups and each group is discussed in detail for the principles,
advances and drawbacks. 3) We examine experiments of existing publications and
summarize results on popular datasets to provide quantitative comparisons. We
also point to some interesting discoveries by analyzing these results. 4) We
provide a discussion of open issues in MOT research, as well as some
interesting directions that could become potential research efforts in the future.
Long-Term Identity-Aware Multi-Person Tracking for Surveillance Video Summarization
Multi-person tracking plays a critical role in the analysis of surveillance
video. However, most existing work focuses on shorter-term (e.g. minute-long or
hour-long) video sequences. Therefore, we propose a multi-person tracking
algorithm for very long-term (e.g. month-long) multi-camera surveillance
scenarios. Long-term tracking is challenging because 1) the apparel/appearance
of the same person will vary greatly over multiple days and 2) a person will
leave and re-enter the scene numerous times. To tackle these challenges, we
leverage face recognition information, which is robust to apparel change, to
automatically reinitialize our tracker over multiple days of recordings.
Unfortunately, recognized faces are often unavailable. Therefore, our
tracker propagates identity information to frames without recognized faces by
uncovering the appearance and spatial manifold formed by person detections. We
tested our algorithm on a 23-day 15-camera data set (4,935 hours total), and we
were able to localize a person 53.2% of the time with 69.8% precision. We
further performed video summarization experiments based on our tracking output.
Results on 116.25 hours of video showed that we were able to generate a
reasonable visual diary (i.e. a summary of what a person did) for different
people, thus potentially opening the door to automatic summarization of the
vast amount of surveillance video generated every day.
Intelligent Intersection: Two-Stream Convolutional Networks for Real-time Near Accident Detection in Traffic Video
In Intelligent Transportation Systems, real-time systems that monitor and
analyze road users become increasingly critical as we march toward the
smart-city era. Vision-based frameworks for Object Detection, Multiple Object
Tracking, and Traffic Near Accident Detection are important applications of
Intelligent Transportation Systems, particularly in video surveillance.
Although deep neural networks have recently achieved great success in many
computer vision tasks, a unified framework for all three tasks remains
challenging, as the challenges multiply from the demands of real-time
performance, complex urban settings, highly dynamic traffic events, and many
traffic movements. In this paper, we propose a two-stream Convolutional Network
architecture that performs real-time detection, tracking, and near accident
detection of road users in traffic video data. The two-stream model consists of
a spatial stream network for Object Detection and a temporal stream network to
leverage motion features for Multiple Object Tracking. We detect near accidents
by incorporating appearance features and motion features from two-stream
networks. Using aerial videos, we propose a Traffic Near Accident Dataset
(TNAD) covering various types of traffic interactions that is suitable for
vision-based traffic analysis tasks. Our experiments demonstrate the advantage
of our framework, with overall competitive qualitative and quantitative
performance at high frame rates on the TNAD dataset.
Comment: Submitted to ACM Transactions on Spatial Algorithms and Systems
(TSAS); special issue on Urban Mobility: Algorithms and Systems. arXiv admin
note: text overlap with arXiv:1703.07402 by other authors
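The two-stream late-fusion idea can be sketched with trivial stand-ins for the two networks. Everything here (the per-channel-mean "features", the fusion weights) is a toy assumption, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def spatial_stream(frame):
    """Stand-in for an appearance CNN: per-channel mean as a toy feature."""
    return frame.mean(axis=(0, 1))

def temporal_stream(flow):
    """Stand-in for a motion CNN over an optical-flow field."""
    return np.abs(flow).mean(axis=(0, 1))

def two_stream_score(frame, flow, w):
    """Late fusion: concatenate both streams, apply a linear scorer."""
    feat = np.concatenate([spatial_stream(frame), temporal_stream(flow)])
    return float(feat @ w)

frame = rng.random((8, 8, 3))  # toy RGB patch
flow = rng.random((8, 8, 2))   # toy optical-flow field (dx, dy)
w = np.ones(5) / 5             # toy fusion weights (3 + 2 features)
score = two_stream_score(frame, flow, w)
print(round(score, 3))
```

The structural point is that appearance and motion evidence are extracted separately and only combined at scoring time, which is how the paper fuses cues for near-accident detection.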