Tukey-Inspired Video Object Segmentation
We investigate the problem of strictly unsupervised video object
segmentation, i.e., the separation of a primary object from background in video
without a user-provided object mask or any training on an annotated dataset. We
find foreground objects in low-level vision data using a John Tukey-inspired
measure of "outlierness". This Tukey-inspired measure also estimates the
reliability of each data source as video characteristics change (e.g., a camera
starts moving). The proposed method achieves state-of-the-art results for
strictly unsupervised video object segmentation on the challenging DAVIS
dataset. Finally, we use a variant of the Tukey-inspired measure to combine the
output of multiple segmentation methods, including those using supervision
during training, runtime, or both. This collectively more robust method of
segmentation improves the Jaccard measure of its constituent methods by as much
as 28%.
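The abstract does not spell out its "outlierness" measure, so the following is only a minimal sketch of the underlying idea, assuming Tukey's classic interquartile-range fences as the outlier test. The function names, the fence multiplier `k=1.5`, and the flat-list mask representation are illustrative assumptions, not the authors' method. The second function computes the Jaccard measure (intersection over union) mentioned above.

```python
from statistics import quantiles


def tukey_outlierness(values, k=1.5):
    """Score each value by how far it lies beyond Tukey's fences, in IQR units.

    Assumption: the classic fences Q1 - k*IQR and Q3 + k*IQR stand in for
    the paper's (unspecified) Tukey-inspired measure.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Degenerate spread: anything different from the common value is an outlier.
        return [0.0 if v == q1 else float("inf") for v in values]
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [max(lower - v, v - upper, 0.0) / iqr for v in values]


def jaccard(mask_a, mask_b):
    """Intersection over union of two flat binary masks of equal length."""
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return inter / union if union else 1.0
```

Values inside the fences score zero, so only measurements that stand out from the bulk of the data (for example, motion magnitudes that differ from the background) receive a positive score.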
Anomaly Detection and Localization in Crowded Scenes by Motion-field Shape Description and Similarity-based Statistical Learning
In crowded scenes, detecting and localizing abnormal behaviors is
challenging because high crowd density makes object segmentation and tracking
extremely difficult. We associate the optical flows of multiple frames to
capture short-term trajectories and introduce the histogram-based shape
descriptor referred to as shape contexts to describe such short-term
trajectories. Furthermore, we propose a K-NN similarity-based statistical model
to detect anomalies over time and space, which is an unsupervised one-class
learning algorithm requiring neither clustering nor any prior assumption. First,
we retrieve from the training set the K-NN samples of the testing
sample, and then use the similarities between every pair of the K-NN samples to
construct a Gaussian model. Finally, the probabilities of the similarities from
the testing sample to the K-NN samples under the Gaussian model are calculated
in the form of a joint probability. Abnormal events can be detected by judging
whether the joint probability is below predefined thresholds in terms of time
and space, separately. Such a scheme can adapt to the whole scene, since the
probability computed as such is not affected by motion distortions arising from
perspective distortion. We conduct experiments on real-world surveillance
videos, and the results demonstrate that the proposed method can reliably
detect and locate the abnormal events in the video sequences, outperforming the
state-of-the-art approaches.
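The three steps described in this abstract (retrieve the K-NN samples, fit a Gaussian to their pairwise similarities, score the test sample's similarities jointly) can be sketched roughly as follows. This is a sketch under stated assumptions rather than the authors' implementation: the similarity function, the use of a log-density in place of a probability, the `min_std` floor, and all names are hypothetical, and the separate temporal and spatial thresholds are reduced to a single score.

```python
import math
from itertools import combinations


def similarity(a, b):
    """Assumed similarity: exp of the negative Euclidean distance, in (0, 1]."""
    return math.exp(-math.dist(a, b))


def knn_log_joint(test, train, k=4, min_std=1e-3):
    """Joint log-density of the test-to-neighbor similarities (needs k >= 2)."""
    # 1. Retrieve the k training samples most similar to the test sample.
    neighbors = sorted(train, key=lambda s: similarity(test, s), reverse=True)[:k]
    # 2. Fit a 1-D Gaussian to the similarities between every pair of neighbors.
    pair_sims = [similarity(a, b) for a, b in combinations(neighbors, 2)]
    mean = sum(pair_sims) / len(pair_sims)
    var = sum((s - mean) ** 2 for s in pair_sims) / len(pair_sims)
    std = max(math.sqrt(var), min_std)  # floor avoids division by zero
    # 3. Sum the Gaussian log-densities of the test-to-neighbor similarities.
    log_joint = 0.0
    for n in neighbors:
        z = (similarity(test, n) - mean) / std
        log_joint += -0.5 * z * z - math.log(std * math.sqrt(2 * math.pi))
    return log_joint
```

A sample would be flagged as abnormal when its score falls below a chosen threshold. Because the Gaussian is fitted locally to each test sample's own neighbors, the score is relative to the local neighborhood rather than to one global model.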
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
The paper presents futuristic challenges discussed in the cvpaper.challenge. In
2015 and 2016, we thoroughly studied 1,600+ papers from several
conferences/journals such as CVPR/ICCV/ECCV/NIPS/PAMI/IJCV.
A Survey Of Activity Recognition And Understanding The Behavior In Video Survelliance
This paper presents a review of human activity recognition and behaviour
understanding in video sequences. The key objective of this paper is to provide
a general review of the overall process of surveillance systems in current
use. Visual surveillance systems are directed at the automatic
identification of events of interest, especially the tracking and classification
of moving objects. The processing pipeline of a video surveillance system
includes the following stages: surrounding model, object representation, object
tracking, activity recognition and behaviour understanding. The paper describes
techniques used to define a general set of activities that are applicable
to a wide range of scenes and environments in video sequences.
Comment: 14 pages, 5 figures, 5 tables
A Survey on Content-Aware Video Analysis for Sports
Sports data analysis is becoming increasingly large-scale, diversified, and
shared, but difficulty persists in rapidly accessing the most crucial
information. Previous surveys have focused on the methodologies of sports video
analysis from the spatiotemporal viewpoint instead of a content-based
viewpoint, and few of these studies have considered semantics. This study
develops a deeper interpretation of content-aware sports video analysis by
examining the insight offered by research into the structure of content under
different scenarios. On the basis of this insight, we provide an overview of
the themes particularly relevant to the research on content-aware systems for
broadcast sports. Specifically, we focus on the video content analysis
techniques applied in sportscasts over the past decade from the perspectives of
fundamentals and general review, a content hierarchical model, and trends and
challenges. Content-aware analysis methods are discussed with respect to
object-, event-, and context-oriented groups. In each group, the gap between
sensation and content excitement must be bridged using proper strategies. In
this regard, a content-aware approach is required to determine user demands.
Finally, the paper summarizes the future trends and challenges for sports video
analysis. We believe that our findings can advance the field of research on
content-aware video analysis for broadcast sports.
Comment: Accepted for publication in IEEE Transactions on Circuits and Systems
for Video Technology (TCSVT)
Object Detection by Spatio-Temporal Analysis and Tracking of the Detected Objects in a Video with Variable Background
In this paper we propose a novel approach for detecting and tracking objects
in videos with variable background, i.e., videos captured by moving cameras
without any additional sensor. In a video captured by a moving camera, both the
background and foreground are changing in each frame of the image sequence. So
for these videos, modeling a single background with traditional background
modeling methods is infeasible, and thus detecting the actual moving objects
in a variable background is a challenging task. To detect the actual moving objects
in this work, spatio-temporal blobs are generated in each frame by
spatio-temporal analysis of the image sequence using a three-dimensional Gabor
filter. Then individual blobs that are parts of one object are merged using
a minimum spanning tree to form the moving object in the variable background. The
height, width and four-bin gray-value histogram of the object are calculated as
its features and an object is tracked in each frame using these features to
generate the trajectories of the object through the video sequence. In this
work, the data association problem during tracking is solved as a linear
assignment problem, and occlusion is handled by applying a Kalman
filter. The major advantage of our method over most existing tracking
algorithms is that it does not require initialization in the
first frame or training on sample data. The algorithm
has been tested on benchmark videos with very satisfactory results, and its
performance is comparable or superior to that of some benchmark algorithms.
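To illustrate data association as a linear assignment problem, here is a small sketch that matches existing tracks to new detections by minimizing the total feature cost. The feature layout (height, width, then four histogram bins) follows the abstract, but the L1 cost, the brute-force solver, and the function names are assumptions made for illustration. A real tracker would more likely use a Hungarian-style solver such as `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations


def feature_cost(track, detection):
    """Assumed cost: L1 distance between (height, width, 4-bin histogram) vectors."""
    return sum(abs(t - d) for t, d in zip(track, detection))


def associate(tracks, detections):
    """Return, for each track, the index of its assigned detection.

    Brute force over all one-to-one assignments, so it is only practical for
    a handful of objects. Assumes len(tracks) <= len(detections).
    """
    best_cost, best = float("inf"), None
    for perm in permutations(range(len(detections)), len(tracks)):
        cost = sum(feature_cost(tracks[i], detections[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return list(best)
```

Detections left unassigned would start new tracks; during occlusion, when a track has no plausible detection, a Kalman filter prediction can stand in for the missing measurement.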
A survey on trajectory clustering analysis
This paper comprehensively surveys the development of trajectory clustering.
Considering the critical role of trajectory data mining in modern intelligent
systems for surveillance security, abnormal behavior detection, crowd behavior
analysis, and traffic control, trajectory clustering has attracted growing
attention. Existing trajectory clustering methods can be grouped into three
categories: unsupervised, supervised and semi-supervised algorithms. In spite
of achieving a certain level of development, trajectory clustering is limited
in its success by complex conditions such as application scenarios and data
dimensions. This paper provides a holistic understanding and deep insight into
trajectory clustering, and presents a comprehensive analysis of representative
methods and promising future directions.
Deep Curiosity Loops in Social Environments
Inspired by infants' intrinsic motivation to learn, which values informative
sensory channels contingent on their immediate social environment, we developed
a deep curiosity loop (DCL) architecture. The DCL is composed of a learner,
which attempts to learn a forward model of the agent's state-action transition,
and a novel reinforcement-learning (RL) component, namely, an
Action-Convolution Deep Q-Network, which uses the learner's prediction error as
reward. The environment for our agent consists of visual social scenes
taken from sitcom video streams; accordingly, both the learner and the RL
component are constructed as deep convolutional neural networks. The agent's learner learns
to predict the zero-th order of the dynamics of visual scenes, resulting in
intrinsic rewards proportional to changes within its social environment. The
sources of these socially informative changes within the sitcom are
predominantly motions of faces and hands, leading to the unsupervised
curiosity-based learning of social interaction features. The face and hand
detection is represented by the value function and the social interaction
optical-flow is represented by the policy. Our results suggest that face and
hand detection are emergent properties of curiosity-based learning embedded in
social environments.
Comment: 10 pages, 3 figures, submitted to NIPS 201
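One way to read "zero-th order" prediction is that the learner predicts the next frame to equal the current one, so its prediction error grows with how much the scene changes. The toy sketch below follows that reading; it is an interpretation, not the paper's architecture. The fixed predictor stands in for the learned deep network, frames are flat lists of pixel intensities, and both function names are hypothetical.

```python
def zeroth_order_prediction(frame):
    """Zero-th order forward model: predict that the next frame equals this one."""
    return list(frame)


def intrinsic_reward(frame, next_frame):
    """Mean squared prediction error, used here as the curiosity reward."""
    predicted = zeroth_order_prediction(frame)
    errors = [(p - a) ** 2 for p, a in zip(predicted, next_frame)]
    return sum(errors) / len(errors)
```

Under this reading a static scene yields zero reward, while regions that move, such as faces and hands, yield larger rewards.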
Dynamic Environment Prediction in Urban Scenes using Recurrent Representation Learning
A key challenge for autonomous driving is safe trajectory planning in
cluttered, urban environments with dynamic obstacles, such as pedestrians,
bicyclists, and other vehicles. A reliable prediction of the future
environment, including the behavior of dynamic agents, would allow planning
algorithms to proactively generate a trajectory in response to a rapidly
changing environment. We present a novel framework that predicts the future
occupancy state of the local environment surrounding an autonomous agent by
learning a motion model from occupancy grid data using a neural network. We
take advantage of the temporal structure of the grid data by utilizing a
convolutional long short-term memory network in the form of the PredNet
architecture. This method is validated on the KITTI dataset and demonstrates
higher accuracy and better predictive power than baseline methods.
Comment: 8 pages, updated final draft, accepted into Intelligent
Transportation Systems Conference (ITSC) 201
T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos
The state-of-the-art performance for object detection has been significantly
improved over the past two years. Besides the introduction of powerful deep
neural networks such as GoogleNet and VGG, novel object detection frameworks
such as R-CNN and its successors, Fast R-CNN and Faster R-CNN, play an
essential role in improving the state-of-the-art. Despite their effectiveness
on still images, those frameworks are not specifically designed for object
detection from videos. The temporal and contextual information of videos is not
fully investigated and utilized. In this work, we propose a deep learning
framework that incorporates temporal and contextual information from tubelets
obtained in videos, which dramatically improves the baseline performance of
existing still-image detection frameworks when they are applied to videos. It
is called T-CNN, i.e., tubelets with convolutional neural networks. The
proposed framework won the recently introduced object-detection-from-video
(VID) task with provided data in the ImageNet Large-Scale Visual Recognition
Challenge 2015 (ILSVRC2015).
Comment: ImageNet 2015 VID challenge tech report. The first two authors share
co-first authorship. Accepted as a Transactions paper by T-CSVT Special Issue
on Large Scale and Nonlinear Similarity Learning for Intelligent Video
Analysis