4 research outputs found
FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras
In this paper, we develop deep spatio-temporal neural networks to
sequentially count vehicles from low quality videos captured by city cameras
(citycams). Citycam videos have low resolution, low frame rate, high occlusion
and large perspective, making most existing methods lose their efficacy. To
overcome limitations of existing methods and incorporate the temporal
information of traffic video, we design a novel FCN-rLSTM network to jointly
estimate vehicle density and vehicle count by connecting fully convolutional
neural networks (FCN) with long short term memory networks (LSTM) in a residual
learning fashion. Such design leverages the strengths of FCN for pixel-level
prediction and the strengths of LSTM for learning complex temporal dynamics.
The residual learning connection reformulates the vehicle count regression as
learning residual functions with reference to the sum of densities in each
frame, which significantly accelerates the training of networks. To preserve
feature map resolution, we propose a Hyper-Atrous combination to integrate
atrous convolution in FCN and combine feature maps of different convolution
layers. FCN-rLSTM enables refined feature representation and a novel end-to-end
trainable mapping from pixels to vehicle count. We extensively evaluated the
proposed method on different counting tasks with three datasets, with
experimental results demonstrating its effectiveness and robustness. In
particular, FCN-rLSTM reduces the mean absolute error (MAE) from 5.31 to 4.21
on TRANCOS, and reduces the MAE from 2.74 to 1.53 on WebCamT. The training
process is accelerated by 5 times on average.
Comment: Accepted by International Conference on Computer Vision (ICCV), 201
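The residual formulation described in this abstract can be illustrated with a minimal NumPy sketch. It assumes only what the abstract states: the final count is the sum of the per-frame density map plus a learned residual. The function name `count_from_density` and the toy values are hypothetical, and the `residual` argument is a plain number standing in for what the LSTM would output in the actual network.

```python
import numpy as np

def count_from_density(density_map, residual=0.0):
    """Return the sum of the density map plus a residual correction.

    `residual` is a placeholder (an assumption of this sketch) for the
    correction that a temporal model such as an LSTM would learn.
    """
    base_count = float(np.sum(density_map))
    return base_count + residual

# Toy 4x4 density map whose entries sum to 3.0.
density = np.zeros((4, 4))
density[0, 0] = 1.0
density[1, 2] = 1.5
density[3, 3] = 0.5

base = count_from_density(density)                      # density sum only
corrected = count_from_density(density, residual=0.25)  # sum plus residual
```

Because the density sum already gives a rough count, the temporal model only has to learn a small correction, which is one plausible reading of why the abstract reports faster training.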
Understanding Traffic Density from Large-Scale Web Camera Data
Understanding traffic density from large-scale web camera (webcam) videos is
a challenging problem because such videos have low spatial and temporal
resolution, high occlusion and large perspective. To deeply understand traffic
density, we explore both deep learning based and optimization based methods. To
avoid individual vehicle detection and tracking, both methods map the image
to a vehicle density map, one based on rank-constrained regression and the
other based on fully convolutional networks (FCN). The regression based
method learns different weights for different blocks in the image to increase
the degrees of freedom of the weights and embed perspective information. The FCN based
method jointly estimates vehicle density map and vehicle count with a residual
learning framework to perform end-to-end dense prediction, allowing arbitrary
image resolution, and adapting to different vehicle scales and perspectives. We
analyze and compare both methods, and draw insights from the optimization based
method to improve the deep model. Since existing datasets do not cover all the
challenges in our work, we collected and labelled a large-scale traffic video
dataset, containing 60 million frames from 212 webcams. Both methods are
extensively evaluated and compared on different counting tasks and datasets.
The FCN based method significantly reduces the mean absolute error from 10.99 to
5.31 on the public dataset TRANCOS compared with the state-of-the-art baseline.
Comment: Accepted by CVPR 2017. Preprint version was uploaded on
http://welcome.isr.tecnico.ulisboa.pt/publications/understanding-traffic-density-from-large-scale-web-camera-data
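For readers unfamiliar with the metric, the following short sketch shows how a mean absolute error over per-frame counts is computed and what the reported drop from 10.99 to 5.31 amounts to in relative terms. The function names and the toy count lists are hypothetical. Only the two MAE figures come from the abstract.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true counts."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

def relative_reduction(before, after):
    """Fraction by which an error value decreased."""
    return (before - after) / before

# Toy counts (illustrative only, not from any dataset).
toy_mae = mean_absolute_error([10, 12, 7], [12, 11, 7])

# Relative improvement implied by the MAE values quoted in the abstract.
trancos_gain = relative_reduction(10.99, 5.31)
```

Under these figures the relative reduction is roughly 52 percent, that is, the error on TRANCOS is about halved.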
Moving Object Based Collision-Free Video Synopsis
Video synopsis, summarizing a video to generate a shorter video by exploiting
the spatial and temporal redundancies, is important for surveillance and
archiving. Existing trajectory-based video synopsis algorithms cannot
work in real time, because of the complexity due to the number of object tubes
that need to be included in the complex energy minimization algorithm. We
propose a real-time algorithm by using a method that incrementally stitches
each frame of the synopsis by extracting object frames from the user specified
number of tubes in the buffer in contrast to global energy-minimization based
systems. This also gives flexibility to the user to set the threshold of
maximum number of objects in the synopsis video according to his or her tracking
ability and creates collision-free summarized videos which are visually
pleasing. Experiments with six common test videos, indoors and outdoors with
many moving objects, show that the proposed video synopsis algorithm produces
better frame reduction rates than existing approaches.
Comment: The summarized output videos are available at
https://anton-jeran.github.io/M2SYN
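The incremental stitching idea in this abstract can be sketched loosely as follows. This is a simplified illustration, not the authors' algorithm: it assumes each object tube is just a list of per-frame object snapshots, it takes one snapshot from each of at most `max_objects` buffered tubes per synopsis frame, and it does not model collision avoidance at all. The names `stitch_synopsis`, `tubes` and `max_objects` are hypothetical.

```python
from collections import deque

def stitch_synopsis(tubes, max_objects):
    """Build synopsis frames one at a time from buffered object tubes.

    tubes: list of lists, each inner list holding one object's frames in order.
    max_objects: user-chosen cap on objects shown in any synopsis frame.
    Returns a list of synopsis frames, each a list of object frames.
    """
    if max_objects < 1:
        raise ValueError("max_objects must be at least 1")
    waiting = deque(deque(t) for t in tubes if t)  # tubes not yet shown
    active = []                                    # tubes currently shown
    synopsis = []
    while waiting or active:
        # Fill free slots with the next waiting tubes.
        while waiting and len(active) < max_objects:
            active.append(waiting.popleft())
        # Stitch one synopsis frame from the next frame of each active tube.
        synopsis.append([tube.popleft() for tube in active])
        # Drop tubes that have been fully played out.
        active = [tube for tube in active if tube]
    return synopsis

result = stitch_synopsis([["a1", "a2", "a3"], ["b1"], ["c1", "c2"]], max_objects=2)
```

In this toy run, six object frames are packed into three synopsis frames, with object c entering as soon as object b has finished. Each output frame is produced without a global optimization step, which is the property the abstract contrasts with energy-minimization based systems.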
Forensic Video Analytic Software
Law enforcement officials heavily depend on Forensic Video Analytic (FVA)
Software in their evidence extraction process. However, present-day FVA software
is complex, time consuming, equipment dependent and expensive. Developing
countries struggle to gain access to this gateway to a secure haven. The term
forensic pertains to the application of scientific methods to the investigation of
crime through post-processing, whereas surveillance is the close monitoring of
real-time feeds.
The principal objective of this Final Year Project was to develop an
efficient and effective FVA Software, addressing the shortcomings through a
stringent and systematic review of scholarly research papers, online databases
and legal documentation. The scope spans multiple object detection, multiple
object tracking, anomaly detection, activity recognition, tampering detection,
general and specific image enhancement and video synopsis.
Methods employed include many machine learning techniques, GPU acceleration
and efficient, integrated architecture development both for real-time and
postprocessing. For this, CNN, GMM, multithreading and OpenCV C++ coding were
used. The proposed methodology would substantially speed up the FVA process,
especially through the novel video synopsis research arena. This
project has resulted in three research outcomes: Moving Object Based Collision
Free Video Synopsis, Forensic and Surveillance Analytic Tool Architecture, and
Tampering Detection Inter-Frame Forgery.
The results include forensic and surveillance panel outcomes with emphasis on
video synopsis and the Sri Lankan context. Principal conclusions include the
optimization and efficient algorithm integration to overcome limitations in
processing power, memory and compromise between real-time performance and
accuracy.
Comment: The Forensic Video Analytic Software demo video is available at
https://www.youtube.com/watch?v=vsZlYKQxSk