
    Analysis Of Cross-Layer Optimization Of Facial Recognition In Automated Video Surveillance

    Interest in automated video surveillance systems has grown dramatically, and with it, research on the topic. Recent approaches have begun addressing the issues of scalability and cost. One method utilizes cross-layer information to adjust the bandwidth allocated to each video source. Prior work on this topic used distortion and face-detection accuracy as adjustment metrics, but relied on older, less efficient codecs. The framework was shown to increase face-detection accuracy by interpreting dynamic network conditions to manage application rates and transmission opportunities for video sources, with the added benefit of reducing overall network load and power consumption. In this thesis, we analyze the effectiveness of an accuracy-based cross-layer bandwidth allocation solution when used in conjunction with facial recognition tasks. In addition, we consider the effectiveness of the optimization when combined with H.264. We analyze the Honda/UCSD face database to characterize the relationship between facial recognition accuracy and bitrate. Using OPNET, we develop a realistic automated video surveillance system that includes a full video streaming and facial recognition implementation. We conduct extensive experiments examining the framework's ability to maximize facial recognition accuracy while using the H.264 video codec. Network load and power consumption characteristics are also examined to observe what benefits arise from a codec that maintains video quality at lower bitrates more effectively than previously tested codecs. We propose two enhancements to the accuracy-based cross-layer bandwidth optimization solution. The first evaluates the effectiveness of placing a cap on bandwidth to reduce excessive bandwidth usage. The second explores distributing computer vision tasks to smart cameras in order to reduce network load. The results show that cross-layer optimization of facial recognition is effective in reducing load and power consumption in automated video surveillance networks, and that the solution remains effective when using H.264. Additionally, the proposed enhancements demonstrate further reductions in network load and power consumption while maintaining facial recognition accuracy across larger network sizes.
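
    The abstract does not spell out the allocation mechanism, so the following Python sketch only illustrates the general idea of accuracy-driven bandwidth allocation with an optional cap: a greedy water-filling loop that grants each increment of bandwidth to the camera with the largest marginal accuracy gain. The accuracy curve, step size, and all values are hypothetical stand-ins, not the thesis's measured Honda/UCSD figures.

```python
from typing import List, Optional

def accuracy_at(bitrate_kbps: float) -> float:
    """Toy accuracy-vs-bitrate curve: rises steeply, then saturates."""
    return min(1.0, 1.2 * bitrate_kbps / (bitrate_kbps + 200.0))

def allocate(sources: int, capacity_kbps: float,
             cap_kbps: Optional[float] = None,
             step: float = 16.0) -> List[float]:
    """Greedy water-filling: repeatedly grant `step` kbps to the source
    with the largest marginal accuracy gain, honouring an optional cap."""
    rates = [0.0] * sources
    budget = capacity_kbps
    while budget >= step:
        gains = [accuracy_at(r + step) - accuracy_at(r)
                 if cap_kbps is None or r + step <= cap_kbps else -1.0
                 for r in rates]
        best = max(range(sources), key=lambda i: gains[i])
        if gains[best] <= 0:
            break  # every source is capped or saturated
        rates[best] += step
        budget -= step
    return rates

print(allocate(sources=4, capacity_kbps=1024, cap_kbps=400))
```

    With the cap in place, saturated sources stop competing for bandwidth, which is the mechanism the first proposed enhancement exploits to curb excessive usage.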

    Secure and Efficient Video Transmission in VANET

    Vehicular communications have become a reality, used by various applications, especially those that broadcast video in real time. However, received video quality is penalized by the poor characteristics of the transmission channel (availability, non-stationarity, signal-to-noise ratio, etc.). To improve and guarantee a minimum video quality at reception, we propose in this work a mechanism entitled "Secure and Efficient Transmission of Videos in VANET (SETV)". It is based on Quality of Experience (QoE) and uses hierarchical packet management driven by the importance of the images in the video stream. To this end, transmission error correction with uneven error protection has proven effective in delivering high-quality video with low network overhead. This is done based on the specific details of the video encoding and actual network conditions such as signal-to-noise ratio, network density, vehicle position, and current packet loss rate (PLR), as well as the prediction of the future DPP. Machine learning models were developed in our work to estimate perceived audio-visual quality. The protocol first gathers information about its neighbouring vehicles to perform distributed jump reinforcement learning. The simulation results obtained for several types of realistic vehicular scenarios show that our proposed mechanism offers significant improvements in received video quality and end-to-end delay compared to conventional schemes. The results show an 11% to 18% improvement in video quality and a 9% load gain compared to ShieldHEVC.
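
    As a rough illustration of hierarchical packet management with uneven error protection, the sketch below scales per-frame redundancy with frame importance and the current packet loss rate. The frame classes, redundancy ratios, and placeholder parity bytes are assumptions for illustration, not SETV's actual scheme.

```python
# Parity ratios per frame class: I-frames carry the most decoder state,
# so they get the strongest protection (values are illustrative only).
FEC_RATIO = {"I": 0.50, "P": 0.25, "B": 0.10}

def protect(frame_type: str, payload: bytes, plr: float) -> bytes:
    """Append FEC parity sized by frame importance and current loss rate."""
    ratio = FEC_RATIO.get(frame_type, 0.10)
    parity_len = int(len(payload) * ratio * (1.0 + plr))
    parity = bytes(parity_len)  # placeholder for real Reed-Solomon parity
    return payload + parity

packet = protect("I", b"\x00" * 1200, plr=0.08)
print(f"I-frame packet grows from 1200 to {len(packet)} bytes")
```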

    3D Medical Image Lossless Compressor Using Deep Learning Approaches

    The ever-increasing importance of accelerated information processing, communication, and storage is a major requirement of the big-data era. With the extensive rise in data availability, ease of information acquisition, and growing data rates, efficient handling becomes a critical challenge. Even with advanced hardware developments and the availability of multiple Graphics Processing Units (GPUs), there remains strong demand to utilise these technologies effectively. Healthcare systems are one of the domains yielding explosive data growth, especially considering modern scanners, which annually produce higher-resolution and more densely sampled medical images with massive storage requirements. The bottleneck in data transmission and storage is best addressed with an effective compression method. Since medical information is critical and plays an influential role in diagnosis accuracy, exact reconstruction with no loss in quality must be guaranteed, which is the main objective of any lossless compression algorithm. Given the revolutionary impact of Deep Learning (DL) methods in solving many tasks with state-of-the-art results, including data compression, there are tremendous opportunities for contributions. While considerable effort has been made to address lossy performance using learning-based approaches, less attention has been paid to lossless compression. This PhD thesis investigates and proposes novel learning-based approaches for compressing 3D medical images losslessly. Firstly, we formulate the lossless compression task as a supervised sequential prediction problem, whereby a model learns a projection function to predict a target voxel given a sequence of samples from its spatially surrounding voxels. Such 3D local sampling efficiently exploits spatial similarities and redundancies in a volumetric medical context. The proposed NN-based data predictor is trained to minimise the difference from the original data values, and the residual errors are encoded using arithmetic coding to allow lossless reconstruction. Following this, we explore the effectiveness of Recurrent Neural Networks (RNNs) as 3D predictors for learning the mapping function from the spatial medical domain (16 bit depths). We analyse the generalisability and robustness of Long Short-Term Memory (LSTM) models in capturing the 3D spatial dependencies of a voxel's neighbourhood across samples taken from various scanning settings. We evaluate our proposed MedZip models in losslessly compressing unseen Computerized Tomography (CT) and Magnetic Resonance Imaging (MRI) modalities against other state-of-the-art lossless compression standards. This work also investigates input configurations and sampling schemes for a many-to-one sequence prediction model, specifically for losslessly compressing 3D medical images (16 bit depths). The main objective is to determine the optimal practice for enabling the proposed LSTM model to achieve a high compression ratio and fast encoding-decoding performance. A solution to the non-deterministic environment problem is also proposed, allowing models to run in parallel without much loss in compression performance. Experimental evaluations were carried out on datasets acquired by different hospitals, representing different body segments and distinct scanning modalities (i.e., CT and MRI), and compared against well-known lossless codecs. To conclude, we present a novel data-driven sampling scheme utilising weighted gradient scores for training LSTM prediction-based models. The objective is to determine whether some training samples are significantly more informative than others, specifically in medical domains where samples are available on the scale of billions. Models trained with the presented importance sampling scheme were evaluated against alternative strategies such as uniform, Gaussian, and slice-based sampling.
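
    A minimal sketch of the prediction-plus-residual idea described above: a causal predictor guesses each voxel from previously decoded neighbours, and only the small, sharply peaked residuals would be passed to an entropy coder. The simple averaging predictor stands in for the thesis's LSTM, and the arithmetic-coding stage is elided.

```python
import numpy as np

def predict(vol: np.ndarray, z: int, y: int, x: int) -> int:
    """Average of three causal neighbours: left, above, previous slice.
    A stand-in for the thesis's learned LSTM predictor."""
    ctx = int(vol[z, y, x - 1]) + int(vol[z, y - 1, x]) + int(vol[z - 1, y, x])
    return round(ctx / 3)

def residuals(vol: np.ndarray) -> np.ndarray:
    """Prediction residuals for every voxel with a full causal context."""
    res = np.zeros(vol.shape, dtype=np.int64)
    for z in range(1, vol.shape[0]):
        for y in range(1, vol.shape[1]):
            for x in range(1, vol.shape[2]):
                res[z, y, x] = int(vol[z, y, x]) - predict(vol, z, y, x)
    return res  # a real codec would arithmetic-code these, losslessly

# Smooth synthetic 16-bit volume: residuals end up far more compressible.
zz, yy, xx = np.indices((4, 8, 8))
vol = (900 * zz + 70 * yy + 9 * xx).astype(np.uint16)
r = residuals(vol)
print("raw std:", float(vol.std()), "residual std:", float(r.std()))
```

    Because the decoder can recompute the same predictions from already-decoded voxels, storing the residuals alone is sufficient for exact reconstruction.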

    Real-time neural network based video super-resolution as a service: design and implementation of a real-time video super-resolution service using public cloud services

    Despite advancements in video streaming, limitations remain when real-time video must be streamed at a higher resolution (e.g., super-resolution) to mobile devices with limited resources. This thesis work aims to address this challenge through a cloud service. Two main code components were used to create this service. The first was aiortc (the Python implementation of WebRTC), serving as the streaming protocol. The second was the Efficient Sub-Pixel Convolutional Neural Network (ESPCN) model, one of the outstanding current methods for upscaling video. These two components were implemented in a virtual machine in the Microsoft Azure cloud environment with a customized configuration. Qualitative as well as quantitative results were obtained and analyzed. For the qualitative results, two versions of the ESPCN model were developed; for the quantitative outcomes, three different configurations of HW/SW codecs and CPU/GPU utilization were produced and analyzed. Besides establishing the code components above as suitable for an efficient cloud-based real-time video super-resolution service, another conclusion of this project is that sending or receiving frames between the CPU and the GPU has a very large negative impact on the efficiency of the whole service. Hence, limiting this CPU-GPU interaction, or using only the GPU (e.g., with the NVIDIA Video Processing Framework [VPF]), is critical for an efficient service. As the quantitative results show, this issue can be avoided if a codec that only uses the GPU (e.g., an NVIDIA HW codec) is employed. Furthermore, the Azure cloud environment enables efficient execution of the service on diverse mobile devices. As future work, measuring the quality of the super-resolved video produced by the ESPCN model is suggested.
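
    For reference, a minimal PyTorch sketch of the ESPCN upscaling step, with the frame kept on a single device to avoid the per-frame CPU-GPU copies the thesis identifies as the main bottleneck. Layer sizes follow the original ESPCN paper (Shi et al., 2016); the model here is untrained, and the streaming pipeline around it is omitted.

```python
import torch
import torch.nn as nn

class ESPCN(nn.Module):
    """Three conv layers plus a sub-pixel shuffle, as in Shi et al. (2016)."""
    def __init__(self, scale: int = 3, channels: int = 1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, 5, padding=2), nn.Tanh(),
            nn.Conv2d(64, 32, 3, padding=1), nn.Tanh(),
            nn.Conv2d(32, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),  # rearranges channels onto the HR grid
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = ESPCN().to(device).eval()
frame = torch.rand(1, 1, 360, 640, device=device)  # decode onto this device
with torch.no_grad():
    hr = model(frame)  # no host<->device copy inside the per-frame loop
print(hr.shape)  # torch.Size([1, 1, 1080, 1920])
```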

    Cloud media video encoding: review and challenges

    In recent years, Internet traffic patterns have been changing. Most of the traffic demanded by end users is multimedia; in particular, video streaming accounts for over 53%. This demand has led to improved network infrastructures and computing architectures to meet the challenges of delivering these multimedia services while maintaining an adequate quality of experience. Focusing on the preparation and adaptation of multimedia content for broadcasting, Cloud and Edge Computing infrastructures have been and will be crucial for offering high- and ultra-high-definition multimedia content in live, real-time, or video-on-demand scenarios. For these reasons, this review paper presents a detailed study of research papers related to encoding and transcoding techniques in cloud computing environments. It begins by discussing the evolution of streaming and the importance of the encoding process, with a focus on the latest streaming methods and codecs. It then examines the role of cloud systems in multimedia environments and details the cloud infrastructure for media scenarios. Through a systematic literature review, we found 49 valid papers that meet the requirements specified in the research questions. Each paper has been analyzed and classified according to several criteria, in addition to inspecting its relevance. To conclude the review, we identify and elaborate on several challenges and open research issues associated with developing video codecs optimized for diverse factors within both cloud and edge architectures. We also discuss emerging challenges in designing new cloud/edge architectures aimed at more efficient delivery of media traffic. This involves investigating ways to improve the overall performance, reliability, and resource utilization of architectures that support the transmission of multimedia content over both cloud and edge computing environments, while ensuring a good quality of experience for the final user.

    Using machine learning to select and optimise multiple objectives in media compression

    The growing complexity of emerging image and video compression standards places additional demands on computational time and energy resources in a variety of environments. Additionally, the steady increase in sensor resolution, display resolution, and the demand for increasingly high-quality media in consumer and professional applications means that an ever-greater quantity of media is being compressed. This work focuses on a methodology for improving and understanding the quality of media compression algorithms using an empirical approach. Consequently, the outcomes of this research can be deployed on existing standard compression algorithms, and are also likely to be applicable to future standards without substantial redevelopment, increasing productivity and decreasing time-to-market. Using machine learning techniques, this thesis proposes a means of using past information about how images and videos are compressed in terms of content, leveraging this information to guide and improve industry-standard media compressors to achieve the desired outcome in a time- and energy-efficient way. The methodology is implemented and evaluated on the JPEG, WebP, and x265 codecs, allowing the system to automatically target multiple performance characteristics such as file size, image quality, compression time, and efficiency, based on user preferences. Compared to previous work, this system achieves a prediction error three times smaller for quality and size for JPEG, and a four-times speed-up of compression for WebP, targeting the same objectives. For x265 video compression, the system allows multiple objectives to be considered simultaneously, enabling faster encoding at similar levels of quality.
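
    The thesis's models and features are not given in the abstract, so the following sketch only illustrates the general pattern: learn a regressor from logged past encodes that predicts an outcome (here, JPEG file size) from content features plus a candidate setting, then search the settings for one that meets the user's target. The data, features, and model choice are all placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy training set: [edge_density, mean_luma, jpeg_quality] -> size in kB,
# standing in for logged outcomes of past real encodes.
rng = np.random.default_rng(1)
X = rng.uniform([0.0, 0.0, 10.0], [1.0, 255.0, 95.0], size=(500, 3))
y = 5 + 40 * X[:, 0] * (X[:, 2] / 95.0) ** 1.5 + rng.normal(0, 1, 500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

def pick_quality(features, size_budget_kb: float) -> int:
    """Highest JPEG quality whose predicted size fits the budget."""
    for q in range(95, 9, -5):
        if model.predict([[*features, float(q)]])[0] <= size_budget_kb:
            return q
    return 10  # fall back to the lowest setting

print(pick_quality([0.6, 120.0], size_budget_kb=25.0))
```

    The same pattern extends to multiple objectives by predicting several outcomes (size, quality, time) per candidate setting and scoring candidates against user preferences.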