Hyper RPCA: Joint Maximum Correntropy Criterion and Laplacian Scale Mixture Modeling On-the-Fly for Moving Object Detection
Moving object detection is critical for automated video analysis in many
vision-related tasks, such as surveillance tracking, video compression coding,
etc. Robust Principal Component Analysis (RPCA), one of the most popular
moving object modeling methods, aims to separate the temporally varying (i.e.,
moving) foreground objects from the static background in video, assuming that
the background frames are low-rank and the foreground is spatially sparse.
Classic RPCA imposes sparsity on the foreground component using the l1-norm and
minimizes the modeling error via the l2-norm. We show that such assumptions can be
too restrictive in practice, which limits the effectiveness of classic
RPCA, especially when processing videos with dynamic backgrounds, camera jitter,
camouflaged moving objects, etc. In this paper, we propose a novel RPCA-based
model, called Hyper RPCA, to detect moving objects on the fly. Unlike
classic RPCA, Hyper RPCA jointly applies the maximum correntropy
criterion (MCC) to the modeling error and a Laplacian scale mixture (LSM) model
to the foreground objects. Extensive experiments demonstrate that Hyper RPCA
achieves foreground detection performance competitive with state-of-the-art
algorithms on several well-known benchmark datasets.
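For reference, the classic RPCA baseline that Hyper RPCA departs from can be written as principal component pursuit: min ||L||_* + lam * ||S||_1 subject to D = L + S. The sketch below solves it with the inexact augmented Lagrange multiplier (ALM) method. It is a minimal illustration under stated assumptions. Each column of D is one vectorized grayscale frame, and the weight lam = 1/sqrt(max(m, n)), the initial penalty 1.25/||D||_2 and the growth factor 1.5 are conventional choices from the RPCA literature rather than values from this paper. Only the l1 sparsity term and the quadratic fitting penalty of classic RPCA are implemented. The MCC loss and LSM prior of Hyper RPCA are not modeled here.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise shrinkage, the proximal operator of tau * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def svd_threshold(x, tau):
    """Singular value thresholding, the proximal operator of tau * ||x||_*."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt


def classic_rpca(d, lam=None, tol=1e-7, max_iter=500):
    """Split D (one vectorized frame per column) into low-rank L + sparse S
    by solving min ||L||_* + lam * ||S||_1 s.t. D = L + S with inexact ALM."""
    m, n = d.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))            # conventional default weight
    norm_two = np.linalg.norm(d, 2)               # largest singular value
    norm_fro = np.linalg.norm(d)                  # Frobenius norm, for stopping
    y = d / max(norm_two, np.abs(d).max() / lam)  # dual variable initialization
    mu, rho = 1.25 / norm_two, 1.5                # penalty and its growth factor
    mu_max = mu * 1e7                             # cap so mu cannot overflow
    s = np.zeros_like(d)
    l = np.zeros_like(d)
    for _ in range(max_iter):
        l = svd_threshold(d - s + y / mu, 1.0 / mu)   # low-rank background update
        s = soft_threshold(d - l + y / mu, lam / mu)  # sparse foreground update
        residual = d - l - s
        y = y + mu * residual
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(residual) / norm_fro < tol:
            break
    return l, s
```

On stacked video frames, the columns of S reshaped back to images give a rough foreground map, and thresholding |S| yields a binary mask.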
Background Subtraction in Real Applications: Challenges, Current Models and Future Directions
Computer vision applications based on videos often require the detection of
moving objects as their first step. Background subtraction is then applied to
separate the background from the foreground. In the literature, background
subtraction is among the most investigated fields in computer vision, with a
large number of publications. Most of them apply mathematical and machine
learning models to improve robustness to the challenges met in videos. However,
the ultimate goal is for background subtraction methods developed in research
to be employed in real applications such as traffic surveillance, and the
literature shows that there is often a gap between the methods used in real
applications and those studied in fundamental research. In addition, the videos
in large-scale datasets are not exhaustive, as they cover only part of the
complete spectrum of challenges met in real applications. In this context, we
attempt to provide as exhaustive a survey as possible of real applications that
use background subtraction, in order to identify the real challenges met in
practice and the background models currently used, and to provide future
directions. Challenges are investigated in terms of camera, foreground objects,
and environments. In addition, we identify the background models that are
effectively used in these applications in order to find potentially usable
recent background models in terms of robustness, time, and memory requirements.
Comment: Submitted to Computer Science Review
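A background model can be made concrete with one of the simplest model families: an exponential running average of past frames, where a pixel is flagged as foreground when it differs from the average by more than a fixed threshold. The minimal sketch below follows that approach. The class name, the learning rate alpha and the threshold are hypothetical illustrative choices. The sketch is not a model drawn from the survey.

```python
import numpy as np


class RunningAverageBackground:
    """Toy background model: exponential running mean plus a fixed threshold."""

    def __init__(self, alpha: float = 0.05, threshold: float = 25.0):
        self.alpha = alpha          # learning rate of the background update
        self.threshold = threshold  # absolute intensity difference for foreground
        self.background = None

    def apply(self, frame: np.ndarray) -> np.ndarray:
        """Return a boolean foreground mask for a grayscale frame."""
        frame = frame.astype(np.float64)
        if self.background is None:
            # The first frame initializes the model; nothing is foreground yet
            self.background = frame.copy()
            return np.zeros(frame.shape, dtype=bool)
        mask = np.abs(frame - self.background) > self.threshold
        # Update only pixels classified as background (selective update)
        bg = ~mask
        self.background[bg] = (
            (1.0 - self.alpha) * self.background[bg] + self.alpha * frame[bg]
        )
        return mask
```

The selective update prevents slow or stopped foreground objects from being absorbed into the model right away, but errors can then become frozen in place. Libraries offer more robust models for practical use, such as the Gaussian-mixture subtractor that OpenCV's cv2.createBackgroundSubtractorMOG2 returns.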
AI Oriented Large-Scale Video Management for Smart City: Technologies, Standards and Beyond
Deep learning has achieved substantial success in a series of tasks in
computer vision. Intelligent video analysis, which can be broadly applied to
video surveillance in various smart city applications, can also be driven by
such powerful deep learning engines. Deploying deep neural network models in
practical large-scale video analysis still poses unprecedented challenges for
large-scale video data management. Deep feature coding, instead of video
coding, provides a practical solution for handling large-scale video
surveillance data. To enable interoperability in the context of deep feature
coding, standardization is urgent and important. However, due to the explosion
of deep learning algorithms and the particularity of feature coding, numerous
problems remain open in the standardization process. This paper envisions the
future deep feature coding standard for AI-oriented large-scale video
management, and discusses existing techniques, standards, and possible
solutions for these open problems.
Comment: 8 pages, 8 figures, 5 tables
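To show in general terms why transmitting compact features can cost less than transmitting raw floats, the sketch below applies uniform scalar quantization to a feature vector. This is one step that many feature-compression schemes include. It is a toy built on assumptions and not the standard the paper envisions. The function names and the per-vector min/max range are hypothetical.

```python
import numpy as np


def encode_features(features: np.ndarray, bits: int = 8):
    """Uniformly quantize a float feature vector to unsigned integer codes."""
    lo, hi = float(features.min()), float(features.max())
    levels = (1 << bits) - 1                      # e.g. 255 steps for 8 bits
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((features - lo) / scale).astype(np.uint16)
    return codes, lo, scale                       # codes plus side information


def decode_features(codes: np.ndarray, lo: float, scale: float) -> np.ndarray:
    """Reconstruct approximate float features from the quantized codes."""
    return codes.astype(np.float64) * scale + lo
```

With 8 bits, each value needs a quarter of the storage of a 32-bit float before any entropy coding. The worst-case reconstruction error is half a quantization step.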
Real Time Object Tracking Based on Inter-frame Coding: A Review
Inter-frame coding plays a significant role in video compression and computer
vision. Computer vision systems have been incorporated into many real-life
applications (e.g., surveillance systems, medical imaging, robot navigation, and
identity verification systems). Object tracking is a key computer vision topic,
which aims at detecting the position of a moving object in a video sequence.
This review considers the application of inter-frame coding to low-frame-rate
as well as low-resolution video. Various top-down methods, such as kernel-based
or mean-shift techniques, are used to track objects in video. Inter-frame
coding algorithms are widely adopted by video coding standards, mainly due to
their simplicity and good distortion performance for object tracking.
Comment: 4 pages
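Inter-frame coding predicts each block of the current frame from a displaced block in a previous frame. The resulting motion vectors already show where moving content went. A minimal full-search block-matching sketch using the sum of absolute differences (SAD) illustrates the idea. The 8x8 block size and the +/-4 pixel search range are arbitrary assumptions. Real encoders use much faster search strategies.

```python
import numpy as np


def block_motion_vector(prev, curr, top, left, block=8, search=4):
    """Full-search block matching: find where a block of `curr` came from in `prev`.

    Returns the offset (dy, dx) into `prev` that minimizes the sum of
    absolute differences (SAD) with the block at (top, left) in `curr`.
    """
    target = curr[top:top + block, left:left + block].astype(np.int32)
    h, w = prev.shape
    best, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue  # candidate block falls outside the reference frame
            cand = prev[y:y + block, x:x + block].astype(np.int32)
            sad = int(np.abs(target - cand).sum())
            if best_sad is None or sad < best_sad:
                best, best_sad = (dy, dx), sad
    return best
```

The returned offset points from the current block back to its match in the previous frame. The block's motion between the two frames is therefore (-dy, -dx).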
Unsupervised Object-Level Video Summarization with Online Motion Auto-Encoder
Unsupervised video summarization plays an important role in digesting,
browsing, and searching the ever-growing volume of video produced every day, yet
the underlying fine-grained semantic and motion information (i.e., objects of
interest and their key motions) in online videos has barely been explored. In
this paper, we investigate a pioneering research direction toward fine-grained
unsupervised object-level video summarization. It differs from existing
pipelines in two aspects: extracting the key motions of participating objects,
and learning to summarize in an unsupervised and online manner. To achieve this
goal, we propose a novel online motion Auto-Encoder (online motion-AE)
framework that operates on super-segmented object motion clips.
Comprehensive experiments on a newly collected surveillance dataset and public
datasets demonstrate the effectiveness of the proposed method.
Region of Interest (ROI) Coding for Aerial Surveillance Video using AVC & HEVC
Aerial surveillance from Unmanned Aerial Vehicles (UAVs), i.e. with moving
cameras, is of growing interest for police as well as disaster area monitoring.
To obtain more detailed ground images, camera resolutions are steadily
increasing. At the same time, the amount of video data to be transmitted is
increasing significantly. To reduce the amount of data, Region of Interest (ROI)
coding systems were introduced, which encode selected regions at higher quality
at the expense of the remaining image regions. We employ an existing ROI coding
system relying on global motion compensation to retain full image resolution
over the entire image. Different ROI detectors are used to automatically
classify a video image on board the UAV into ROI and non-ROI areas. We propose
to replace the modified Advanced Video Coding (AVC) video encoder with a
modified High Efficiency Video Coding (HEVC) encoder. Without any change to the
detection system itself, simply by replacing the video coding back end, we
improve coding efficiency by 32% on average, whereas regular HEVC provides
coding gains of only 12-30% over regular AVC coding for the same test sequences
at similar PSNR. Since the employed ROI coding relies mainly on intra-mode
coding of newly emerging image areas, the gains of HEVC-ROI coding over AVC-ROI
coding, relative to regular coding of entire frames including predictive
(inter) modes, depend on sequence characteristics. We present a detailed
analysis of the bit distribution within the frames to explain the gains. In
total, we achieve coding data rates of 0.7-1.0 Mbit/s for full HDTV video
sequences at 30 fps at a reasonable quality of more than 37 dB.
Comment: 5 pages, 7 figures, 1 table
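To put the reported data rates in perspective, the arithmetic below converts them to bits per pixel and to an approximate compression ratio against uncompressed video. It makes two assumptions that the paper does not state. "Full HDTV" is taken to mean 1920x1080, and the uncompressed reference is taken to be 8-bit YUV 4:2:0, which is 12 bits per pixel.

```python
WIDTH, HEIGHT, FPS = 1920, 1080, 30   # assumed "full HDTV" format at 30 fps
RAW_BITS_PER_PIXEL = 12               # assumed 8-bit YUV 4:2:0 sampling


def raw_rate_mbps() -> float:
    """Uncompressed data rate in Mbit/s under the assumptions above."""
    return WIDTH * HEIGHT * FPS * RAW_BITS_PER_PIXEL / 1e6


def coded_bits_per_pixel(rate_mbps: float) -> float:
    """Average number of coded bits spent per pixel at a given rate."""
    return rate_mbps * 1e6 / (WIDTH * HEIGHT * FPS)


for rate in (0.7, 1.0):
    print(f"{rate} Mbit/s: {coded_bits_per_pixel(rate):.4f} bpp, "
          f"compression ~{raw_rate_mbps() / rate:.0f}:1")
# 0.7 Mbit/s: 0.0113 bpp, compression ~1066:1
# 1.0 Mbit/s: 0.0161 bpp, compression ~746:1
```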
VStore: A Data Store for Analytics on Large Videos
We present VStore, a data store for supporting fast, resource-efficient
analytics over large archival videos. VStore manages video ingestion, storage,
retrieval, and consumption. It controls video formats along the video data
path. It faces three challenges: (i) the huge combinatorial space of video
format knobs; (ii) the complex impacts of these knobs and their high profiling
cost; and (iii) the need to optimize for multiple resource types. It explores an idea called
backward derivation of configuration: in the opposite direction along the video
data path, VStore passes the video quantity and quality expected by analytics
backward to retrieval, to storage, and to ingestion. In this process, VStore
derives an optimal set of video formats, optimizing for different resources in
a progressive manner. VStore automatically derives large, complex
configurations consisting of more than one hundred knobs over tens of video
formats. In response to queries, VStore selects video formats catering to the
executed operators and the target accuracy. It streams video data from disks
through the decoder to operators. It runs queries at up to 362x real-time
video speed.
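Backward derivation can be illustrated with a deliberately tiny model, which is entirely hypothetical. Each operator picks the cheapest consumption format whose profiled accuracy meets its target. The storage format must then be rich enough to derive every consumption format. The VideoFormat fields, the numbers and the "maximum along each knob" rule are illustrative assumptions. VStore itself handles more than one hundred knobs and several resource types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoFormat:
    resolution: int   # frame height in pixels, e.g. 720 (hypothetical knob)
    fps: int          # frames per second (hypothetical knob)
    accuracy: float   # operator accuracy profiled on this format (hypothetical)
    cost: float       # decode plus compute cost per video second (hypothetical)


def cheapest_format(candidates, target_accuracy):
    """Consumption step: lowest-cost format whose accuracy meets the target."""
    feasible = [f for f in candidates if f.accuracy >= target_accuracy]
    if not feasible:
        raise ValueError("no format meets the target accuracy")
    return min(feasible, key=lambda f: f.cost)


def storage_format(consumption_formats):
    """Backward step: a stored format must be rich enough to derive every
    consumption format, so take the maximum along each knob."""
    return (max(f.resolution for f in consumption_formats),
            max(f.fps for f in consumption_formats))
```

For example, an operator with an accuracy target of 0.85 might settle on a mid-resolution format, while another operator targeting 0.95 needs the full one. The stored format then has to cover both.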
Unsupervised Synthesis of Anomalies in Videos: Transforming the Normal
Abnormal activity recognition requires detecting anomalous events, which suffer
from a severe data imbalance. In a video, "normal" describes activities that
conform to usual events, while irregular events that do not conform to the
normal are referred to as "abnormal". It is far more common to observe normal
data than to obtain abnormal data in visual surveillance. In this paper, we
propose an approach that obtains abnormal data by transforming normal data.
This challenging task is solved through a multi-stage pipeline. We use a number
of unsupervised segmentation techniques to synthesize new data samples
transformed from an existing set of normal examples. Further, this synthesis
approach has useful applications as a data augmentation technique. An
incrementally trained Bayesian convolutional neural network (CNN) is used to
carefully select the set of abnormal samples that can be added. Finally,
through this synthesis approach, we obtain a comparable set of abnormal samples
that can be used to train the CNN to classify normal vs. abnormal samples. We
show that this method generalizes to multiple settings by evaluating it on two
real-world datasets, where it achieves improved performance over other
probabilistic techniques previously used for this task.
Comment: Accepted in IJCNN 201
Evaluation of Object Trackers in Distorted Surveillance Videos
Object tracking in realistic scenarios is a difficult problem affected by
various image factors such as occlusion, clutter, confusion, object shape,
unstable speed, and zooming. While these conditions do affect tracking
performance, there is no clear distinction between scene-dependent
challenges like occlusion and clutter, and the challenges imposed by
traditional notions of impairments from capture, compression, processing, and
transmission. This paper is concerned with the latter interpretation of quality
as it affects video tracking performance. In this work, we aim to evaluate two
state-of-the-art trackers (STRUCK and TLD) systematically and experimentally on
surveillance videos affected by in-capture distortions such as under-exposure
and defocus. We evaluate these trackers using the area under the curve (AUC)
of success plots and precision curves. Although STRUCK and TLD have ranked
highly in video tracking surveys, this study concludes that in-capture
distortions severely affect their performance. Therefore, the design of a
tracker robust to these distortions remains an open question, which can be
answered by creating algorithms that use perceptual features to compensate for
the degradations introduced by these distortions.
Comment: 5 pages, 8 figures, presented in SPSWSIVA 201
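The success and precision metrics mentioned above can be computed as shown below. The sketch rests on several assumptions: boxes are given per frame as (x, y, width, height), overlap thresholds run from 0 to 1 in steps of 0.05, and precision uses a 20-pixel center-error threshold. These are common conventions in tracking benchmarks such as OTB, and the abstract does not state them.

```python
import numpy as np


def iou(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Per-frame intersection-over-union of boxes given as rows of (x, y, w, h)."""
    x1 = np.maximum(a[:, 0], b[:, 0])
    y1 = np.maximum(a[:, 1], b[:, 1])
    x2 = np.minimum(a[:, 0] + a[:, 2], b[:, 0] + b[:, 2])
    y2 = np.minimum(a[:, 1] + a[:, 3], b[:, 1] + b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = a[:, 2] * a[:, 3] + b[:, 2] * b[:, 3] - inter
    return inter / union


def success_auc(pred, gt, thresholds=np.linspace(0.0, 1.0, 21)):
    """Success plot: fraction of frames with IoU > t; AUC = mean over all t."""
    overlaps = iou(pred, gt)
    success = np.array([(overlaps > t).mean() for t in thresholds])
    return success.mean()


def precision_at(pred, gt, pixels=20.0):
    """Precision: fraction of frames whose center error is within `pixels`."""
    centers_p = pred[:, :2] + pred[:, 2:] / 2
    centers_g = gt[:, :2] + gt[:, 2:] / 2
    errors = np.linalg.norm(centers_p - centers_g, axis=1)
    return (errors <= pixels).mean()
```

Success counts only overlaps strictly greater than each threshold, and no overlap can exceed 1.0. With 21 thresholds, even a perfect tracker therefore scores 20/21, or about 0.952.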
Anomaly Detection and Localization in Crowded Scenes by Motion-field Shape Description and Similarity-based Statistical Learning
In crowded scenes, detecting and localizing abnormal behaviors is challenging
because high crowd density makes object segmentation and tracking extremely
difficult. We associate the optical flows of multiple frames to capture
short-term trajectories and introduce a histogram-based shape descriptor, known
as shape contexts, to describe such short-term trajectories. Furthermore, we
propose a K-NN similarity-based statistical model to detect anomalies over time
and space, which is an unsupervised one-class learning algorithm requiring
neither clustering nor any prior assumption. First, we retrieve the K nearest
neighbors (K-NN) of the test sample from the training set, and then use the
similarities between every pair of these K-NN samples to construct a Gaussian
model. Finally, the probabilities of the similarities from the test sample to
the K-NN samples under the Gaussian model are calculated in the form of a joint
probability. Abnormal events are detected by checking whether the joint
probability falls below predefined thresholds, separately in time and space.
Such a scheme can adapt to the whole scene, since the probability computed in
this way is not affected by motion distortions arising from perspective
distortion. We conduct experiments on real-world surveillance videos, and the
results demonstrate that the proposed method can reliably detect and locate
abnormal events in video sequences, outperforming state-of-the-art approaches.
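The K-NN scoring steps described above can be sketched as follows. This is a simplified illustration that relies on several assumptions:

- Samples are plain feature vectors, not shape-context descriptors.
- The Gaussian-kernel similarity and its sigma are hypothetical.
- Similarities are treated as independent, so the joint density is the product of per-neighbor Gaussian densities.
- A single threshold replaces the separate temporal and spatial thresholds.

Working in log space avoids numerical underflow when densities are multiplied.

```python
import numpy as np


def similarity(a: np.ndarray, b: np.ndarray, sigma: float = 1.0) -> float:
    """Hypothetical similarity: Gaussian kernel of the Euclidean distance."""
    return float(np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2)))


def knn_log_joint(test, train, k=5, sigma=1.0):
    """Log joint density of test-to-neighbor similarities under a Gaussian
    fitted to the pairwise similarities among the K nearest training samples."""
    dists = np.linalg.norm(train - test, axis=1)
    nbrs = train[np.argsort(dists)[:k]]           # the K-NN samples
    # Similarities between every pair of K-NN samples define the Gaussian model
    pair_sims = np.array([similarity(nbrs[i], nbrs[j], sigma)
                          for i in range(k) for j in range(i + 1, k)])
    mean = pair_sims.mean()
    std = pair_sims.std() + 1e-6                  # guard against zero spread
    test_sims = np.array([similarity(test, nb, sigma) for nb in nbrs])
    # Sum of per-neighbor Gaussian log densities = log of the joint density
    log_pdf = (-0.5 * ((test_sims - mean) / std) ** 2
               - np.log(std * np.sqrt(2.0 * np.pi)))
    return float(log_pdf.sum())


def is_abnormal(test, train, log_threshold, k=5, sigma=1.0):
    """Flag the sample when its log joint density falls below the threshold."""
    return knn_log_joint(test, train, k, sigma) < log_threshold
```

Log densities can be positive when the neighbor similarities cluster tightly. The threshold therefore has to be calibrated on normal data and not fixed a priori.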