Unsupervised Object-Level Video Summarization with Online Motion Auto-Encoder
Unsupervised video summarization plays an important role in digesting,
browsing, and searching the ever-growing volume of videos produced every day,
yet the underlying fine-grained semantic and motion information (i.e., objects
of interest and their key motions) in online videos has barely been touched. In
this paper, we investigate a pioneering research direction: fine-grained
unsupervised object-level video summarization. It is distinguished from
existing pipelines in two aspects: extracting key motions of participating
objects, and learning to summarize in an unsupervised and online manner. To
achieve this goal, we propose a novel online motion auto-encoder (online
motion-AE) framework that operates on super-segmented object motion clips.
Comprehensive experiments on a newly collected surveillance dataset and on
public datasets demonstrate the effectiveness of the proposed method.
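The core idea, reconstruction error from an auto-encoder as a novelty signal for picking summary clips, can be sketched with a linear auto-encoder (PCA), which is a deliberate simplification of the paper's recurrent model; the function name and feature layout are hypothetical:

```python
import numpy as np

def summarize_clips(clips, k=1, top=3):
    """Rank motion-clip feature vectors by reconstruction error under a
    linear auto-encoder (the principal subspace): clips the model
    reconstructs poorly are treated as novel and kept for the summary.
    clips: (n, d) array, one feature vector per motion clip."""
    X = clips - clips.mean(axis=0)
    # right singular vectors = encoder/decoder weights of a linear AE
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    W = vt[:k]                        # (k, d) encoder
    recon = X @ W.T @ W               # decode(encode(x))
    err = np.linalg.norm(X - recon, axis=1)
    return np.argsort(err)[::-1][:top]  # indices of the most novel clips
```

An online variant would refit (or incrementally update) the subspace as new clips stream in, which is the role the paper's online training plays.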
Depth Sequence Coding with Hierarchical Partitioning and Spatial-domain Quantisation
Depth coding in 3D-HEVC for the multiview video plus depth (MVD) architecture
(i) deforms object shapes due to block-level edge-approximation; (ii) misses an
opportunity for high compressibility at near-lossless quality by failing to
exploit strong homogeneity (clustering tendency) in depth syntax, motion vector
components, and residuals at frame-level; and (iii) restricts interactivity and
limits responsiveness of independent use of depth information for "non-viewing"
applications due to texture-depth coding dependency. This paper presents a
standalone depth sequence coder, which operates in the lossless to
near-lossless quality range while compressing depth data superior to lossy
3D-HEVC. It preserves edges implicitly by limiting quantisation to the
spatial-domain and exploits clustering tendency efficiently at frame-level with
a novel binary tree based decomposition (BTBD) technique. For mono-view coding
of standard MVD test sequences, on average, (i) lossless BTBD achieved a
superior compression ratio and coding gain against pseudo-lossless 3D-HEVC
operating at its lowest quantisation parameter, and (ii) near-lossless BTBD
achieved significant Bjøntegaard delta bitrate (BD-BR) savings and distortion
(BD-PSNR) gains against 3D-HEVC. In view-synthesis applications, decoded depth
maps from BTBD rendered superior-quality synthetic views compared to 3D-HEVC,
with lower depth BD-BR and higher synthetic-texture BD-PSNR on average.
Comment: Submitted to IEEE Transactions on Image Processing. 13 pages, 5
figures, and 5 tables
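The binary-tree decomposition idea, recursively splitting depth data until each leaf is homogeneous so that flat regions collapse to a few (length, value) codes, can be sketched in one dimension; this is a hypothetical illustration of the clustering-tendency argument, not the paper's BTBD coder:

```python
import numpy as np

def btbd_leaves(block, tol=0):
    """Recursively halve a 1-D run of depth samples until each leaf is
    homogeneous (max - min <= tol), then represent each leaf as
    (length, value). Strongly clustered depth data yields very few
    leaves, which is what makes it highly compressible."""
    block = np.asarray(block)
    if block.max() - block.min() <= tol:
        return [(len(block), int(block.min()))]
    mid = len(block) // 2
    return btbd_leaves(block[:mid], tol) + btbd_leaves(block[mid:], tol)
```

With tol = 0 the leaves reproduce the samples exactly (lossless); a small positive tol gives the near-lossless regime with a hard per-sample error bound, mirroring spatial-domain quantisation.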
Covariance of Motion and Appearance Features for Spatio-Temporal Recognition Tasks
In this paper, we introduce an end-to-end framework for video analysis
focused on practical scenarios and built on theoretical foundations from
sparse representation, including a novel descriptor for general purpose video
analysis. In our approach, we compute kinematic features from optical flow and
first and second-order derivatives of intensities to represent motion and
appearance respectively. These features are then used to construct covariance
matrices which capture joint statistics of both low-level motion and appearance
features extracted from a video. Using an over-complete dictionary of the
covariance based descriptors built from labeled training samples, we formulate
low-level event recognition as a sparse linear approximation problem. Within
this, we pose the sparse decomposition of a covariance matrix, which must
also conform to the space of positive semi-definite matrices, as a determinant
maximization problem. Also, since covariance matrices lie on non-linear
Riemannian manifolds, we compare our former approach with a sparse linear
approximation alternative that is suitable for equivalent vector spaces of
covariance matrices. This is done by searching for the best projection of the
query data on a dictionary using an Orthogonal Matching Pursuit algorithm. We
show the applicability of our video descriptor in two different application
domains - namely low-level event recognition in unconstrained scenarios and
gesture recognition using one-shot learning. Our experiments provide promising
insights into large-scale video analysis.
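The region-covariance descriptor at the heart of this abstract is straightforward to sketch: stack one feature vector (motion plus appearance components) per pixel or sample, and compute their joint covariance. The function name is ours; the feature layout is an assumption:

```python
import numpy as np

def covariance_descriptor(features):
    """Region covariance sketch: `features` is (n, d), one d-dimensional
    feature vector (e.g. kinematic + intensity-derivative features) per
    pixel. The d x d covariance captures the joint statistics of the
    low-level motion and appearance features, and is by construction
    symmetric positive semi-definite."""
    X = np.asarray(features, dtype=float)
    X = X - X.mean(axis=0)          # center each feature dimension
    return (X.T @ X) / (len(X) - 1)  # unbiased sample covariance
```

Because the descriptor's size depends only on d, not on the region size, regions of different shapes become directly comparable, which is what makes the dictionary of covariance atoms in the abstract possible.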
A Robust Background Initialization Algorithm with Superpixel Motion Detection
Scene background initialization allows the recovery of a clear image without
foreground objects from a video sequence, which is generally the first step in
many computer vision and video processing applications. The process may be
strongly affected by some challenges such as illumination changes, foreground
cluttering, intermittent movement, etc. In this paper, a robust background
initialization approach based on superpixel motion detection is proposed. Both
spatial and temporal characteristics of frames are adopted to effectively
eliminate foreground objects. A subsequence with stable illumination condition
is first selected for background estimation. Images are segmented into
superpixels to preserve spatial texture information and foreground objects are
eliminated by superpixel motion filtering process. A low-complexity
density-based clustering is then performed to generate reliable background
candidates for final background determination. The approach has been evaluated
on the SBMnet dataset and achieves performance superior or comparable to other
state-of-the-art works, with faster processing speed. Moreover, in the complex
and dynamic categories, the algorithm produces the best results, showing
robustness against very challenging scenarios.
Comment: Submitted to Elsevier Signal Processing: Image Communication
Near-Lossless Deep Feature Compression for Collaborative Intelligence
Collaborative intelligence is a new paradigm for efficient deployment of deep
neural networks across the mobile-cloud infrastructure. By dividing the network
between the mobile and the cloud, it is possible to distribute the
computational workload such that the overall energy and/or latency of the
system is minimized. However, this necessitates sending deep feature data from
the mobile to the cloud in order to perform inference. In this work, we examine
the differences between the deep feature data and natural image data, and
propose a simple and effective near-lossless deep feature compressor. The
proposed method achieves up to 5% bit rate reduction compared to HEVC-Intra and
even more against other popular image codecs. Finally, we suggest an approach
for reconstructing the input image from compressed deep features in the cloud,
which could serve to supplement the inference performed by the deep model.
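A minimal sketch of near-lossless feature compression, not the paper's codec, just the bounded-error principle it relies on: min-max scale the feature tensor to 8-bit integers, so the reconstruction error is bounded by half a quantisation step. Function names and the uniform quantiser are assumptions:

```python
import numpy as np

def compress_features(x):
    """Quantise a float feature tensor to 8-bit integers (min-max
    scaling). Near-lossless: per-element error is bounded by half a
    quantisation step, (hi - lo) / 255 / 2."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    q = np.round((x - lo) / (hi - lo) * 255).astype(np.uint8)
    return q, lo, hi

def decompress_features(q, lo, hi):
    """Invert the quantisation up to the bounded rounding error."""
    return q.astype(float) / 255 * (hi - lo) + lo
```

A real deployment would follow the quantiser with an entropy/image codec (the paper compares against HEVC-Intra); this sketch only shows why the scheme can guarantee a hard error bound on the features sent to the cloud.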
Background Subtraction in Real Applications: Challenges, Current Models and Future Directions
Computer vision applications based on videos often require the detection of
moving objects in their first step. Background subtraction is then applied in
order to separate the background and the foreground. In the literature,
background subtraction is surely among the most investigated fields in
computer vision, with a very large number of publications. Most of them
concern the application of mathematical and machine learning models to be more
robust to the challenges met in videos. However, the ultimate goal is that the
background subtraction methods developed in research can be employed in real
applications like traffic surveillance. Looking at the literature, we can
remark that there is often a gap between the methods currently used in real
applications and the current methods in fundamental research. In addition, the
videos evaluated in large-scale datasets are not exhaustive, in that they
cover only part of the complete spectrum of the challenges met in real
applications. In this context, we attempt to provide as exhaustive a survey as
possible of real applications that use background subtraction, in order to
identify the real challenges met in practice and the background models
currently used, and to provide future directions. Thus, challenges are
investigated in terms of camera, foreground objects, and environments. In
addition, we identify the background models that are effectively used in these
applications in order to find potentially usable recent background models in
terms of robustness, time, and memory requirements.
Comment: Submitted to Computer Science Review
Anomaly Detection in Traffic Scenes via Spatial-aware Motion Reconstruction
Anomaly detection from a driver's perspective when driving is important to
autonomous vehicles. As part of Advanced Driver Assistance Systems (ADAS), it
can alert the driver to dangers in a timely manner. Compared with traditionally
studied scenes such as university campuses and market surveillance videos, it
is difficult to detect abnormal events from a driver's perspective due to
camera shake, a constantly moving background, drastic changes in vehicle
velocity, etc.
To tackle these specific problems, this paper proposes a spatial localization
constrained sparse coding approach for anomaly detection in traffic scenes,
which firstly measures the abnormality of motion orientation and magnitude
respectively and then fuses these two aspects to obtain a robust detection
result. The main contributions are threefold: 1) This work describes the motion
orientation and magnitude of the object respectively in a new way, which is
demonstrated to be better than traditional motion descriptors. 2) The
spatial localization of objects is taken into account in the sparse
reconstruction framework, which utilizes the scene's structural information and
outperforms conventional sparse coding methods. 3) Results of motion
orientation and magnitude are adaptively weighted and fused by a Bayesian
model, which makes the proposed method more robust and able to handle more
kinds of abnormal events. The efficiency and effectiveness of the proposed
method are
validated by testing on nine difficult video sequences captured by ourselves.
From the experimental results, the proposed method is more effective
and efficient than the popular competitors and yields higher performance.
Comment: IEEE Transactions on Intelligent Transportation Systems
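The common core of sparse-reconstruction anomaly detection can be sketched with a least-squares stand-in for the paper's sparse coding (the dictionary, feature layout, and function name are assumptions): events that a dictionary of normal motion patterns cannot reconstruct well receive a high score.

```python
import numpy as np

def anomaly_score(query, dictionary):
    """Reconstruction-based anomaly sketch. `dictionary` is (d, n_atoms),
    columns are normal motion patterns; `query` is a d-dim motion
    feature. The residual of the best reconstruction is the anomaly
    score: large residual = the event is unlike anything normal."""
    D = np.asarray(dictionary, dtype=float)
    y = np.asarray(query, dtype=float)
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return float(np.linalg.norm(y - D @ coef))
```

The paper additionally enforces sparsity and spatial-localization constraints on `coef` and fuses orientation and magnitude scores with a Bayesian model; this sketch keeps only the reconstruction-error principle.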
Coded aperture compressive temporal imaging
We use mechanical translation of a coded aperture for code-division
multiple-access compression of video. We present experimental results for
reconstruction at 148 frames per coded snapshot.
Comment: 19 pages (when compiled with Optics Express' TEX template), 15 figures
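The forward model behind coded-aperture compressive temporal imaging is compact enough to sketch directly: each high-speed frame is modulated by a (translated) binary aperture code and the modulated frames are summed into a single coded snapshot, from which the frames are later reconstructed. Names are ours:

```python
import numpy as np

def cacti_snapshot(frames, codes):
    """Compressive temporal imaging forward model: elementwise-modulate
    each frame by its per-frame binary code (the shifted aperture) and
    integrate over time into one snapshot.
    frames: (t, h, w), codes: (t, h, w) binary masks."""
    F = np.asarray(frames, dtype=float)
    C = np.asarray(codes, dtype=float)
    return (F * C).sum(axis=0)   # single coded measurement
```

Recovering the t frames from the one snapshot is the (underdetermined) inverse problem the paper solves; at 148 frames per snapshot the compression ratio of the measurement step alone is 148:1.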
Deep Learning Algorithms with Applications to Video Analytics for A Smart City: A Survey
Deep learning has recently achieved very promising results in a wide range of
areas such as computer vision, speech recognition and natural language
processing. It aims to learn hierarchical representations of data by using deep
architecture models. In a smart city, a lot of data (e.g. videos captured from
many distributed sensors) need to be automatically processed and analyzed. In
this paper, we review the deep learning algorithms applied to video analytics
in a smart city, organized by research topic: object detection, object
tracking, face recognition, image classification, and scene labeling.
Comment: 8 pages, 18 figures
Efficient Multiple Line-Based Intra Prediction for HEVC
Traditional intra prediction usually utilizes the nearest reference line to
generate the predicted block when considering strong spatial correlation.
However, this kind of single line-based method does not always work well, for
at least two reasons. One is incoherence caused by signal noise or by the
texture of another object, which deviates from the inherent texture of the
current block. The other is that the nearest reference line usually has worse
reconstruction quality in block-based video coding. To address these two
issues, this paper proposes an efficient multiple line-based intra
prediction scheme to improve coding efficiency. Besides the nearest reference
line, further reference lines are also utilized. The further reference lines
with relatively higher quality can provide potential better prediction. At the
same time, the residue compensation is introduced to calibrate the prediction
of boundary regions in a block when we utilize further reference lines. To
speed up the encoding process, this paper designs several fast algorithms.
Experimental results show that, compared with HM-16.9, the proposed fast search
method achieves 2.0% bit saving on average and up to 3.7%, while increasing the
encoding time by 112%.
Comment: Accepted for publication in IEEE Transactions on Circuits and Systems
for Video Technology
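The multiple-reference-line idea can be sketched for the vertical mode: instead of repeating only the nearest row above the block, repeat a weighted combination of several reconstructed rows. This is an illustrative simplification, not HM's implementation; names and the default equal weights are assumptions:

```python
import numpy as np

def multi_line_intra_pred(ref_lines, block_shape, weights=None):
    """Vertical intra prediction from multiple reference lines.
    ref_lines: (n_lines, block_width), reconstructed rows above the
    block (nearest first). Each predicted row repeats a weighted
    combination of the co-located reference samples, so a noisy or
    poorly reconstructed nearest line can be down-weighted."""
    R = np.asarray(ref_lines, dtype=float)
    if weights is None:
        weights = np.full(len(R), 1.0 / len(R))  # equal weights
    top = np.asarray(weights) @ R                # combined reference row
    return np.tile(top, (block_shape[0], 1))
```

The residue compensation mentioned in the abstract would then correct the boundary rows of the prediction, where the distance to the further reference lines hurts most.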