
    Analysis Of Cross-Layer Optimization Of Facial Recognition In Automated Video Surveillance

    Interest in automated video surveillance systems has grown dramatically, and with it so has research on the topic. Recent approaches have begun addressing the issues of scalability and cost. One such method uses cross-layer information to adjust the bandwidth allocated to each video source. Prior work on this topic used distortion and face-detection accuracy as the adjustment metrics, relying on older, less efficient codecs. That framework was shown to increase face-detection accuracy by interpreting dynamic network conditions to manage the application rates and transmission opportunities of the video sources, with the added benefit of reducing overall network load and power consumption.

    In this thesis, we analyze the effectiveness of an accuracy-based cross-layer bandwidth allocation solution when used in conjunction with facial recognition tasks, and we consider the effectiveness of the optimization when combined with H.264. We analyze the Honda/UCSD face database to characterize the relationship between facial recognition accuracy and bitrate. Using OPNET, we develop a realistic automated video surveillance system that includes a full video streaming and facial recognition implementation, and we conduct extensive experiments examining how effectively the framework maximizes facial recognition accuracy while using the H.264 video codec. We also examine network load and power consumption to observe what benefits arise from a codec that maintains video quality at lower bitrates more effectively than previously tested codecs.

    We propose two enhancements to the accuracy-based cross-layer bandwidth optimization solution. The first places a cap on per-source bandwidth to reduce excessive bandwidth usage; the second distributes computer vision tasks to smart cameras in order to reduce network load. The results show that cross-layer optimization of facial recognition is effective in reducing load and power consumption in automated video surveillance networks, and that the solution remains effective when using H.264. Additionally, the proposed enhancements demonstrate further reductions in network load and power consumption while maintaining facial recognition accuracy across larger network sizes.
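    To make the allocation idea concrete, the sketch below greedily grants bandwidth increments to whichever camera gains the most recognition accuracy from them, subject to the per-camera cap of the first enhancement. It is a minimal illustration only: the accuracy-versus-bitrate curve, the greedy policy, and all names and parameter values are assumptions for exposition, not the thesis's actual optimization.

        # Illustrative greedy, cap-limited bandwidth allocation (all values hypothetical).
        from dataclasses import dataclass

        @dataclass
        class Camera:
            name: str
            bitrate_kbps: float = 0.0

        def recognition_accuracy(bitrate_kbps: float) -> float:
            """Assumed accuracy-vs-bitrate curve: accuracy rises quickly at low
            bitrates and then saturates."""
            return 1.0 - 1.0 / (1.0 + bitrate_kbps / 100.0)

        def allocate(cameras, total_kbps, cap_kbps, step_kbps=10.0):
            """Repeatedly give step_kbps to the camera whose accuracy improves
            the most, never letting any camera exceed the bandwidth cap."""
            remaining = total_kbps
            while remaining >= step_kbps:
                best, best_gain = None, 0.0
                for cam in cameras:
                    if cam.bitrate_kbps + step_kbps > cap_kbps:
                        continue  # enhancement 1: cap excessive bandwidth usage
                    gain = (recognition_accuracy(cam.bitrate_kbps + step_kbps)
                            - recognition_accuracy(cam.bitrate_kbps))
                    if gain > best_gain:
                        best, best_gain = cam, gain
                if best is None:
                    break  # every camera has reached the cap
                best.bitrate_kbps += step_kbps
                remaining -= step_kbps
            return cameras

        for cam in allocate([Camera(f"cam{i}") for i in range(4)], 1000.0, 400.0):
            print(cam.name, cam.bitrate_kbps,
                  round(recognition_accuracy(cam.bitrate_kbps), 3))

    Because the assumed curve saturates, the greedy loop naturally spreads bandwidth across cameras instead of starving any single source, which is the intuition behind accuracy-based allocation.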

    Video content analysis for intelligent forensics

    The networks of surveillance cameras installed in public places and on private territories continuously record video data with the aim of detecting and preventing unlawful activities. This enhances the importance of video content analysis applications, whether for real-time (i.e. analytic) or post-event (i.e. forensic) analysis. In this thesis, the primary focus is on four key aspects of video content analysis:

    1. Moving object detection and recognition;
    2. Correction of colours in the video frames and recognition of colours of moving objects;
    3. Make and model recognition of vehicles and identification of their type;
    4. Detection and recognition of text information in outdoor scenes.

    To address the first issue, a framework is presented in the first part of the thesis that efficiently detects and recognizes moving objects in videos. The framework targets the problem of object detection in the presence of complex backgrounds. The object detection part relies on a background modelling technique and a novel post-processing step in which the contours of the foreground regions (i.e. moving objects) are refined by classifying edge segments as belonging either to the background or to the foreground region. Further, a novel feature descriptor is devised for the classification of moving objects into humans, vehicles, and background; it captures the texture information present in the silhouettes of foreground objects.

    To address the second issue, a framework for the correction and recognition of the true colours of objects in videos is presented, with novel noise reduction, colour enhancement, and colour recognition stages. The colour recognition stage makes use of temporal information to reliably recognize the true colours of moving objects across multiple frames. The framework is specifically designed to perform robustly on videos of poor quality caused by surrounding illumination, camera sensor imperfections, and artefacts due to high compression.

    In the third part of the thesis, a framework for vehicle make and model recognition and type identification is presented. As part of this work, a novel feature representation technique for the distinctive representation of vehicle images has emerged. It uses dense feature description and a mid-level feature encoding scheme to capture the texture in the frontal view of the vehicles, and it is insensitive to minor in-plane rotation and skew within the image. The proposed framework can be extended to any number of vehicle classes without re-training. Another important contribution of this work is the publication of a comprehensive, up-to-date dataset of vehicle images to support future research in this domain.

    The problem of text detection and recognition in images is addressed in the last part of the thesis. A novel technique is proposed that exploits the colour information in the image to identify text regions; the colour information is also used to segment characters from the words. The recognition of identified characters is performed using shape features and supervised learning, and a lexicon-based alignment procedure finalizes the recognition of the strings present in word images.

    Extensive experiments have been conducted on benchmark datasets to analyse the performance of the proposed algorithms. The results show that the proposed moving object detection and recognition technique outperformed well-known baseline techniques. The proposed framework for the correction and recognition of object colours in video frames achieved all the aforementioned goals. The performance analysis of the vehicle make and model recognition framework on multiple datasets has shown the strength and reliability of the technique across various scenarios. Finally, the experimental results for the text detection and recognition framework on benchmark datasets have revealed the potential of the proposed scheme for accurate detection and recognition of text in the wild.
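    As a concrete illustration of the temporal colour-recognition idea, the sketch below majority-votes per-frame colour labels for a tracked object, so a few badly illuminated or heavily compressed frames cannot flip the final decision. The prototype palette and the nearest-prototype classifier are hypothetical stand-ins for the thesis's enhancement and recognition stages.

        # Minimal temporal colour voting (palette and classifier are assumptions).
        from collections import Counter
        import numpy as np

        COLOUR_PROTOTYPES = {  # hypothetical RGB prototypes for a small label set
            "red": (200, 30, 30), "green": (30, 160, 60), "blue": (40, 60, 190),
            "white": (230, 230, 230), "black": (25, 25, 25),
        }

        def label_frame(mean_rgb):
            """Assign one frame's mean object colour to the nearest prototype."""
            return min(COLOUR_PROTOTYPES, key=lambda c: np.linalg.norm(
                np.subtract(mean_rgb, COLOUR_PROTOTYPES[c])))

        def recognise_colour(per_frame_means):
            """Majority vote across frames: temporal information suppresses
            single-frame noise in the colour decision."""
            votes = Counter(label_frame(rgb) for rgb in per_frame_means)
            return votes.most_common(1)[0][0]

        # Eight clean observations of a red car plus two frames washed out by glare.
        frames = [(190, 40, 35)] * 8 + [(225, 225, 220)] * 2
        print(recognise_colour(frames))  # -> "red"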

    Maximizing Resource Utilization In Video Streaming Systems

    Video streaming has recently grown dramatically in popularity over the Internet, cable TV, and wireless networks. Because of the resource-demanding nature of video streaming applications, maximizing resource utilization in any video streaming system is a key factor in increasing the scalability and decreasing the cost of the system. Resources to utilize include server bandwidth, network bandwidth, battery life in battery-operated devices, and processing time in devices with limited processing power. In this work, we propose new techniques to maximize the utilization of video-on-demand (VOD) server resources. In addition, we propose a new framework to maximize the utilization of network bandwidth in wireless video streaming systems.

    Providing video streaming users in a VOD system with expected waiting times enhances their perceived quality-of-service (QoS) and encourages them to wait, thereby increasing server throughput and utilization. In this work, we analyze waiting-time predictability in scalable video streaming. We also propose two prediction schemes and study their effectiveness when applied with various stream merging techniques and scheduling policies. The results demonstrate that the waiting time can be predicted accurately, especially when enhanced cost-based scheduling is applied; the combination of waiting-time prediction and cost-based scheduling leads to outstanding performance benefits. The resource sharing achieved by stream merging depends greatly on how the waiting requests are scheduled for service. Motivated by the development of cost-based scheduling, we investigate its effectiveness in detail and discuss opportunities for further tuning and enhancement. Additionally, we analyze the effectiveness of incorporating video prediction results into the scheduling decisions, study the interaction between scheduling policies and stream merging techniques, and explore new ways for enhancement.

    Interest in video surveillance systems has grown dramatically during the last decade. Automated video surveillance (AVS) serves as an efficient approach for the real-time detection of threats and for monitoring their progress. Wireless networks in AVS systems have limited available bandwidth that has to be estimated accurately and distributed efficiently. In this research, we develop two cross-layer optimization frameworks that maximize bandwidth utilization in 802.11 wireless networks. The first is a distortion-based cross-layer optimization framework that manages bandwidth in the wireless network so as to minimize the overall distortion. The second is an accuracy-based cross-layer optimization framework that maximizes the overall detection accuracy of the computer vision algorithm(s) running in the system. Both frameworks manage the application rates and transmission opportunities of the various video sources based on dynamic network conditions, and each utilizes a novel online approach for estimating the effective airtime of the network. Moreover, we propose a bandwidth pruning mechanism that can be used with the accuracy-based framework to achieve any desired tradeoff between detection accuracy and power consumption. We demonstrate the effectiveness of the proposed frameworks, including the effective airtime estimation algorithms and the bandwidth pruning mechanism, through extensive experiments using OPNET.
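    The intuition behind cost-based scheduling with stream merging can be illustrated with a small sketch: when the server can admit a new stream, it serves the pending video that is cheapest per waiting request, so one stream absorbs as many viewers as possible. The cost model below is a deliberately simplified placeholder; the enhanced cost-based scheduling studied in this work uses richer cost functions.

        # Toy cost-based scheduling (the cost model is a simplified assumption).
        from dataclasses import dataclass

        @dataclass
        class PendingVideo:
            video_id: str
            stream_len_s: float      # cost proxy: seconds of server bandwidth
            waiting_requests: int    # requests one merged stream would serve

            def cost_per_request(self) -> float:
                # With stream merging, one stream serves every waiting request,
                # so its delivery cost is amortised across all of them.
                return self.stream_len_s / max(self.waiting_requests, 1)

        def schedule_next(pending):
            """Serve the pending video with the lowest cost per waiting request."""
            return min(pending, key=PendingVideo.cost_per_request)

        pending = [PendingVideo("news", 7200, 12), PendingVideo("movie", 5400, 3)]
        print(schedule_next(pending).video_id)  # "news": 600 s/request vs 1800

    Publishing the expected waiting times that fall out of such a policy is what encourages deferred users to keep waiting, which in turn raises server throughput.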

    VIDEO FOREGROUND LOCALIZATION FROM TRADITIONAL METHODS TO DEEP LEARNING

    These days, the detection of Visual Attention Regions (VAR), such as moving objects, has become an integral part of many Computer Vision applications, viz. pattern recognition, object detection and classification, video surveillance, autonomous driving, human-machine interaction (HMI), and so forth. Moving-object identification using bounding boxes has matured to the level of localizing objects along their rigid borders, a process called foreground localization (FGL). Over the decades, many image segmentation methodologies have been well studied, devised, and extended to suit video FGL. Despite that, the problem of video foreground (FG) segmentation remains an intriguing yet appealing task due to its ill-posed nature and myriad of applications. Maintaining spatial and temporal coherence, particularly at object boundaries, remains challenging and computationally burdensome. It gets even harder when the background is dynamic, with swaying tree branches or a shimmering water body, when illumination varies or moving objects cast shadows, or when the video sequences have jittery frames caused by vibrating or unstable camera mounts on a surveillance post or moving robot. At the same time, in the analysis of traffic flow or human activity, the performance of an intelligent system depends substantially on the robustness of its localization of the VAR, i.e., the FG. This raises the natural question: what is the best way to deal with these challenges? The goal of this thesis is therefore to investigate plausible real-time implementations, from traditional approaches to modern-day deep learning (DL) models, for FGL that are applicable to many video content-aware applications (VCAA). It focuses mainly on improving existing methodologies by harnessing multimodal spatial and temporal cues for delineated FGL.

    The first part of the dissertation is dedicated to enhancing conventional sample-based and Gaussian mixture model (GMM)-based video FGL using probability mass functions (PMF), temporal median filtering, the fusion of CIEDE2000 colour similarity, colour distortion, and illumination measures, and an appropriate adaptive threshold to extract the FG pixels. Subjective and objective evaluations show the improvements over a number of similar conventional methods.

    The second part of the thesis focuses on exploiting and improving deep convolutional neural networks (DCNN) for the problem mentioned earlier. Three models akin to encoder-decoder (EnDec) networks are implemented with various innovative strategies to improve the quality of the FG segmentation. The strategies include, but are not limited to, double-encoding/slow-decoding feature learning, multi-view receptive-field feature fusion, and the incorporation of spatiotemporal cues through long short-term memory (LSTM) units in both the subsampling and upsampling subnetworks. Experimental studies are carried out thoroughly on all conditions, from baselines to challenging video sequences, to prove the effectiveness of the proposed DCNNs. The analysis demonstrates the architectural efficiency of the proposed models over other methods, while quantitative and qualitative experiments show their competitive performance compared to the state-of-the-art.
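    For readers unfamiliar with the conventional models the first part builds on, the sketch below is a stripped-down, single-Gaussian background subtractor with a k-sigma adaptive threshold. The CIEDE2000 colour-similarity, colour-distortion, and illumination terms of the proposed method are omitted for brevity, and every parameter value here is an illustrative assumption.

        # Single-Gaussian running background model with an adaptive threshold
        # (a simplified stand-in for the sample-based/GMM pipelines discussed above).
        import numpy as np

        class RunningGaussianBackground:
            def __init__(self, first_frame, alpha=0.02, k=2.5):
                self.mean = first_frame.astype(np.float64)
                self.var = np.full(first_frame.shape, 15.0 ** 2)
                self.alpha, self.k = alpha, k  # learning rate, threshold in sigmas

            def apply(self, frame):
                frame = frame.astype(np.float64)
                diff = frame - self.mean
                fg = np.abs(diff) > self.k * np.sqrt(self.var)  # adaptive threshold
                bg = ~fg
                # Update the model only on background pixels so moving objects
                # are not absorbed into the background.
                self.mean[bg] += self.alpha * diff[bg]
                self.var[bg] += self.alpha * (diff[bg] ** 2 - self.var[bg])
                return fg

        rng = np.random.default_rng(0)
        frames = rng.normal(120, 5, size=(10, 48, 64))  # static, noisy background
        frames[-1, 20:30, 30:40] += 80                  # a bright moving object
        model = RunningGaussianBackground(frames[0])
        for f in frames[1:]:
            mask = model.apply(f)
        print(mask.sum(), "foreground pixels")          # ~100: the 10x10 object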

    Efficient algorithms for scalable video coding

    A scalable video bitstream specifically designed for the needs of various client terminals, network conditions, and user demands is much desired in current and future video transmission and storage systems. The scalable extension of the H.264/AVC standard (SVC) has been developed to satisfy the new challenges posed by heterogeneous environments, as it permits a single video stream to be decoded fully or partially with variable quality, resolution, and frame rate in order to adapt to a specific application. This thesis presents novel improved algorithms for SVC, including:

    1) a fast inter-frame and inter-layer coding mode selection algorithm based on motion activity;
    2) a hierarchical fast mode selection algorithm;
    3) a two-part Rate Distortion (RD) model targeting the properties of different prediction modes for the SVC rate control scheme; and
    4) an optimised Mean Absolute Difference (MAD) prediction model.

    The proposed fast inter-frame and inter-layer mode selection algorithm is based on the empirical observation that a macroblock (MB) with slow movement is more likely to be best matched by one in the same resolution layer, whereas a macroblock with fast movement requires motion estimation between layers. Simulation results show that the algorithm can reduce the encoding time by up to 40%, with negligible degradation in RD performance.

    The proposed hierarchical fast mode selection scheme comprises four levels and makes full use of inter-layer, temporal, and spatial correlation as well as the texture information of each macroblock. Overall, the new technique demonstrates the same coding performance in terms of picture quality and compression ratio as that of the SVC standard, yet produces a saving in encoding time of up to 84%. Compared with state-of-the-art SVC fast mode selection algorithms, the proposed algorithm achieves a superior reduction in computational time under very similar RD performance conditions.

    The existing SVC rate distortion model cannot accurately represent the RD properties of the prediction modes, because it is influenced by the use of inter-layer prediction. A separate RD model for inter-layer prediction coding in the enhancement layer(s) is therefore introduced. Overall, the proposed algorithms improve the average PSNR by up to 0.34 dB or produce an average saving in bit rate of up to 7.78%, while the control accuracy is maintained to within 0.07% on average.

    As a MAD prediction error always exists and cannot be avoided, an optimised MAD prediction model for the spatial enhancement layers is proposed that considers the MAD from previous temporal frames and previous spatial frames together, to achieve a more accurate MAD prediction. Simulation results indicate that the proposed MAD prediction model reduces the MAD prediction error by up to 79% compared with the JVT-W043 implementation.
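    As a rough illustration of combining temporal and spatial cues for MAD prediction, the sketch below forms a weighted estimate from the MAD of the previous frame in the same enhancement layer and the MAD of the co-located lower-layer frame, adapting the weights online from the prediction error. The linear form, the update rule, and all values are assumptions for exposition, not the model derived in the thesis.

        # Hypothetical two-cue MAD predictor with online weight adaptation.
        class MadPredictor:
            def __init__(self, w_temporal=0.5, w_spatial=0.5, lr=0.05):
                self.wt, self.ws, self.lr = w_temporal, w_spatial, lr

            def predict(self, mad_temporal, mad_spatial):
                # Weighted blend of the temporal and spatial MAD cues.
                return self.wt * mad_temporal + self.ws * mad_spatial

            def update(self, predicted, actual, mad_temporal, mad_spatial):
                # Gradient step on the squared prediction error, so each weight
                # grows when its cue tracks the true MAD well.
                err = actual - predicted
                self.wt += self.lr * err * mad_temporal
                self.ws += self.lr * err * mad_spatial

        p = MadPredictor()
        mad_t, mad_s, actual = 4.2, 3.6, 4.0    # made-up MAD values
        guess = p.predict(mad_t, mad_s)         # 3.9
        p.update(guess, actual, mad_t, mad_s)   # weights nudge toward the truth
        print(round(guess, 2), round(p.wt, 3), round(p.ws, 3))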