52 research outputs found
Application of region-based video surveillance in smart cities using deep learning
Smart video surveillance helps to build a more robust smart city environment. Cameras at varied angles act as smart sensors, collecting visual data from the smart city environment and transmitting it for further visual analysis. The transmitted visual data must be of high quality for efficient analysis, which is challenging when transmitting video over low-bandwidth communication channels. In the latest smart surveillance cameras, high-quality video transmission is maintained through video encoding techniques such as High Efficiency Video Coding. However, these techniques still provide limited capabilities, and the demand for high-quality encoding of salient regions, such as pedestrians, vehicles, cyclists/motorcyclists, and roads, in video surveillance systems is still not met. This work is a contribution towards building an efficient salient-region-based surveillance framework for smart cities. The proposed framework integrates a deep-learning-based video surveillance technique that extracts salient regions from a video frame without information loss and then encodes them at a reduced size. We have applied this approach in diverse smart-city case-study environments to test the applicability of the framework. The proposed work achieves a bitrate saving of 56.92%, a peak signal-to-noise ratio gain of 5.35 dB, and SR-based segmentation accuracies of 92% and 96% on two different benchmark datasets. Consequently, the generation of computationally lighter region-based video data makes the framework adaptable for improving surveillance solutions in smart cities.
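The extract-then-encode idea above can be sketched in miniature. The snippet below uses simple frame differencing as a hypothetical stand-in for the paper's deep-learning salient-region extractor, and "encodes" by transmitting only the changed crop plus its coordinates; the function names (`salient_bbox`, `encode_frame`) and the threshold value are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

def salient_bbox(prev, curr, thresh=25):
    """Bounding box of pixels that changed between two frames -- a toy
    stand-in for the framework's deep-learning salient-region extractor."""
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16)) > thresh
    ys, xs = np.nonzero(diff)
    if ys.size == 0:
        return None  # nothing salient in this frame
    return int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1

def encode_frame(prev, curr, thresh=25):
    # transmit only the salient crop plus its coordinates; the bitrate
    # saving grows with the fraction of the frame that is static
    box = salient_bbox(prev, curr, thresh)
    if box is None:
        return None
    y0, x0, y1, x1 = box
    return box, curr[y0:y1, x0:x1].copy()
```

Note the cast to `int16` before subtracting, which avoids unsigned-integer wraparound on `uint8` frames.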
Applying the Convolutional Neural Network Deep Learning Technology to Behavioural Recognition in Intelligent Video
In order to improve the accuracy and real-time performance of abnormal behaviour identification in massive video monitoring data, the authors design intelligent video technology based on convolutional neural network deep learning and apply it to the smart city, building on a summary of video technology development. First, the technical framework of the intelligent video monitoring algorithm is divided into bottom (object detection), middle (object identification), and high (behaviour analysis) layers. Object detection based on background modelling is applied to routine real-time detection and early warning, while object detection based on object modelling is applied to post-event data query and retrieval. Related optical flow algorithms are used to identify and detect abnormal behaviours. To improve the accuracy, effectiveness, and intelligence of identification, deep learning technology based on convolutional neural networks is applied to enhance the learning and identification ability of the learning machine and to realise real-time upgrades of the intelligent video "brain". This research has good popularisation value in the application field of intelligent video technology.
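The bottom layer's "object detection based on background modelling" can be illustrated with a minimal running-average background subtractor. This is a generic sketch of the technique the abstract names, not the paper's implementation; the class name, `alpha`, and `thresh` values are assumptions chosen for illustration.

```python
import numpy as np

class BackgroundModel:
    """Running-average background model for the framework's bottom layer
    (object detection); alpha controls how fast the background adapts."""
    def __init__(self, first_frame, alpha=0.05, thresh=30):
        self.bg = first_frame.astype(float)
        self.alpha = alpha
        self.thresh = thresh

    def step(self, frame):
        # pixels that deviate strongly from the background are foreground
        mask = np.abs(frame.astype(float) - self.bg) > self.thresh
        # update the background only where no object was detected
        self.bg = np.where(mask, self.bg,
                           (1 - self.alpha) * self.bg + self.alpha * frame)
        return mask
```

Each call to `step` returns a boolean foreground mask that a higher layer could pass on to object identification and behaviour analysis.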
Robust Subspace Estimation via Low-Rank and Sparse Decomposition and Applications in Computer Vision
Recent advances in robust subspace estimation have made dimensionality reduction and noise and outlier suppression an active area of research, alongside continuous improvements in computer vision applications. Because image and video signals require a high-dimensional representation, their storage, processing, transmission, and analysis are often difficult tasks. It is therefore desirable to obtain a low-dimensional representation of such signals and, at the same time, to correct for corruptions, errors, and outliers, so that the signals can be readily used in later processing. Major recent advances in low-rank modelling in this context were initiated by the work of Candès et al. [17], where the authors provided a solution to the long-standing problem of decomposing a matrix into low-rank and sparse components within a Robust Principal Component Analysis (RPCA) framework. However, for computer vision applications RPCA is often too complex and/or may not yield desirable results. The low-rank component obtained by RPCA usually has an unnecessarily high rank, while certain tasks require lower-dimensional representations. RPCA can robustly estimate noise and outliers and separate them from the low-rank component via a sparse part, but it provides no insight into the structure of the sparse solution, nor a way to further decompose the sparse part into random noise and a structured sparse component, which would be advantageous in many computer vision tasks. Moreover, as video signals are usually captured by a moving camera, obtaining a low-rank component by RPCA becomes impossible. In this thesis, novel Approximated RPCA algorithms are presented, targeting different shortcomings of RPCA. The RPCA was analysed to identify its most time-consuming solution steps and to replace them with simpler yet tractable alternatives. The proposed method is able to obtain the exact desired rank for the low-rank component while estimating a global transformation that describes camera-induced motion. Furthermore, it is able to decompose the sparse part into a foreground sparse component and a random-noise part that contains no useful information for computer vision processing. The foreground sparse component is obtained by several novel structured sparsity-inducing norms that better encapsulate the required pixel structure in visual signals. Additionally, algorithms for reducing the complexity of low-rank estimation have been proposed that achieve significant complexity reduction without sacrificing the visual representation of video and image information. The proposed algorithms are applied to several fundamental computer vision tasks, namely high efficiency video coding, batch image alignment, inpainting and recovery, video stabilisation, background modelling and foreground segmentation, robust subspace clustering and motion estimation, face recognition, and ultra-high-definition image and video super-resolution. The algorithms proposed in this thesis, including batch image alignment and recovery, background modelling and foreground segmentation, robust subspace clustering and motion segmentation, and ultra-high-definition image and video super-resolution, achieve results that are state-of-the-art or comparable to existing methods.
Semantic-aware video compression for automotive cameras
Assisted and automated driving functions in vehicles exploit sensor data to build situational awareness; however, the amount of data required by these functions might exceed the bandwidth of current wired vehicle communication networks. Consequently, sensor data reduction and automotive camera video compression need investigation. Conventional video compression schemes, such as H.264 and H.265, have been optimised mainly for human vision. In this paper, we propose a semantic-aware (SA) video compression (SAC) framework that separately and simultaneously compresses the region-of-interest and region-out-of-interest of automotive camera video frames before transmitting them to processing unit(s), where the data are used for perception tasks such as object detection and semantic segmentation. Using our newly proposed technique, the region-of-interest (ROI), encapsulating most of the road stakeholders, retains higher quality through a lower compression ratio. The experimental results show that, under the same overall compression ratio, our proposed SAC scheme maintains similar or better image quality, measured according to both traditional metrics and our newly proposed semantic-aware metrics. The new metrics, namely SA-PSNR, SA-SSIM, and iIoU, give more emphasis to ROI quality, which has an immediate impact on the planning and decisions of assisted and automated driving functions. Using our SA-X264 compression, SA-PSNR and SA-SSIM increase by 2.864 and 0.008, respectively, compared to traditional H.264, with higher ROI quality at the same compression ratio. Finally, a segmentation-based perception algorithm has been used to compare reconstructed frames, demonstrating a 2.7% mIoU improvement when using the proposed SAC method versus traditional compression techniques.
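The core SAC idea, spending more bits on the ROI than on the background, can be sketched with plain quantisation standing in for the two encoder quality settings. This is a toy model, not the paper's H.264/H.265-based pipeline, and `sa_psnr` below is one plausible simplification of the paper's SA-PSNR (PSNR restricted to the ROI); the exact weighting is defined in the paper.

```python
import numpy as np

def quantize(x, step):
    # uniform quantisation; a coarser step means stronger compression
    return np.round(x / step) * step

def sa_compress(frame, roi_mask, roi_step=4, bg_step=32):
    """Quantise the region of interest finely and the background coarsely,
    a stand-in for encoding the two regions at different quality levels."""
    return np.where(roi_mask, quantize(frame, roi_step),
                    quantize(frame, bg_step))

def psnr(ref, rec, peak=255.0):
    mse = np.mean((ref.astype(float) - rec.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def sa_psnr(ref, rec, roi_mask, peak=255.0):
    # PSNR evaluated only over ROI pixels, so background degradation
    # does not mask the quality of the road stakeholders
    return psnr(ref[roi_mask], rec[roi_mask], peak)
```

With this split, overall PSNR reflects the coarse background while SA-PSNR stays high, mirroring the trade-off the paper reports.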
MuLViS: Multi-Level Encryption Based Security System for Surveillance Videos
Video Surveillance (VS) systems are commonly deployed for real-time abnormal event detection and autonomous video analytics. Video captured by surveillance cameras in real time often contains identifiable personal information, which must be privacy-protected, sometimes along with the locations of the surveillance and other sensitive information. Within the surveillance system, these videos are processed and stored on a variety of devices. The processing and storage heterogeneity of those devices, together with their network requirements, makes real-time surveillance systems complex and challenging. This paper proposes a surveillance system, named Multi-Level Video Security (MuLViS), for privacy-protected cameras. Firstly, a Smart Surveillance Security Ontology (SSSO) is integrated within MuLViS, with the aim of autonomously selecting the privacy level matching the operating device's hardware specifications and network capabilities. Overall, along with its device-specific security, the system leads to relatively fast indexing and retrieval of surveillance video. Secondly, information within the videos is protected during capture, streaming, and storage by means of differing encryption levels. An extensive evaluation of the system, through visual inspection and statistical analysis of experimental video results, such as by the Encryption Space Ratio (ESR), has demonstrated the aptness of the security level assignments. The system is suitable for surveillance footage protection and can be made General Data Protection Regulation (GDPR) compliant, ensuring that lawful data access respects individuals' privacy rights.
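The two MuLViS ideas, capability-driven level selection and level-dependent encryption scope, can be sketched as follows. The level thresholds, device fields, and the SHA-256 keystream are all illustrative assumptions: the real SSSO is an ontology rather than two `if` statements, and a deployment would use a vetted cipher such as AES rather than this toy construction.

```python
import hashlib

def keystream(key, n):
    # expand a key into n pseudo-random bytes (toy SHA-256 counter-mode
    # construction -- not a vetted cipher; for illustration only)
    out = bytearray()
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(out[:n])

def select_level(device):
    # stand-in for the SSSO ontology query: map device capability to a level
    if device["cpu_mhz"] < 600 or device["bandwidth_mbps"] < 2:
        return 1   # lightweight: encrypt sensitive regions only
    if device["cpu_mhz"] < 1500:
        return 2   # medium: sensitive regions plus metadata
    return 3       # strong: full-frame encryption

def encrypt_frame(frame, roi_slices, key, level):
    """XOR-encrypt the listed (start, stop) byte ranges (levels 1-2) or the
    whole frame (level 3); applying the same call twice decrypts."""
    data = bytearray(frame)
    if level >= 3:
        ks = keystream(key, len(data))
        return bytes(b ^ k for b, k in zip(data, ks))
    for start, stop in roi_slices:
        ks = keystream(key + start.to_bytes(4, "big"), stop - start)
        for i, k in enumerate(ks):
            data[start + i] ^= k
    return bytes(data)
```

Because encryption is a keyed XOR, region-based levels leave the rest of the bitstream untouched, which is what keeps indexing and retrieval fast on constrained devices.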
Visual Saliency Estimation Via HEVC Bitstream Analysis
Since information technology began developing rapidly in the 1950s, digital images and video have become ubiquitous. In the last decade, image and video processing have become increasingly popular in biomedical, industrial, artistic, and other fields. Progress has been made in the display, storage, and transmission of visual information such as images and video. The attendant problem is that video-processing tasks in the time domain have become particularly arduous.
Based on a study of existing compressed-domain video saliency detection models, a new saliency estimation model for video based on High Efficiency Video Coding (HEVC) is presented. First, the relevant features are extracted from the HEVC-encoded bitstream. A naive Bayesian model is used to train and test the features against the original YUV videos and ground truth. The intra-frame saliency map is obtained after training and testing the intra features, and the inter-frame saliency map is obtained by combining intra saliency with motion vectors. The area under the ROC curve for the proposed intra mode is 0.9561. Other classification methods, such as the support vector machine (SVM), k-nearest neighbours (KNN), and the decision tree, are compared against the experimental outcomes. The effect of varying the compression ratio on the estimated saliency has also been analysed.
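The classification step above can be sketched with a minimal Gaussian naive Bayes, a common concrete choice of "naive Bayesian model". In the paper the feature vectors would come from the HEVC bitstream (for example bit allocation, CU depth, or motion-vector magnitude per block); here they are just arrays, and the class below is a generic implementation rather than the paper's.

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian naive Bayes: per-class, per-feature mean and
    variance, with independence assumed across features."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9
                             for c in self.classes])
        self.prior = np.array([np.mean(y == c) for c in self.classes])
        return self

    def predict_proba(self, X):
        # log p(c) + sum_d log N(x_d | mu_cd, var_cd), then normalise
        log_like = -0.5 * (np.log(2 * np.pi * self.var)[None]
                           + (X[:, None, :] - self.mu[None]) ** 2
                           / self.var[None]).sum(-1)
        log_post = log_like + np.log(self.prior)
        log_post -= log_post.max(axis=1, keepdims=True)  # for stability
        p = np.exp(log_post)
        return p / p.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes[np.argmax(self.predict_proba(X), axis=1)]
```

Trained on salient versus non-salient blocks, `predict_proba` yields a per-block saliency score from which an intra-frame saliency map can be assembled.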