26 research outputs found

    Learning for Video Compression with Hierarchical Quality and Recurrent Enhancement

    Full text link
    In this paper, we propose a Hierarchical Learned Video Compression (HLVC) method with three hierarchical quality layers and a recurrent enhancement network. The frames in the first layer are compressed by an image compression method at the highest quality. Using these frames as references, we propose the Bi-Directional Deep Compression (BDDC) network to compress the second layer with relatively high quality. Then, the third-layer frames are compressed at the lowest quality by the proposed Single Motion Deep Compression (SMDC) network, which adopts a single motion map to estimate the motions of multiple frames, thus saving bits for motion information. In our deep decoder, we develop the Weighted Recurrent Quality Enhancement (WRQE) network, which takes both compressed frames and the bit stream as inputs. In the recurrent cell of WRQE, the memory and update signal are weighted by quality features to reasonably leverage multi-frame information for enhancement. In our HLVC approach, the hierarchical quality benefits coding efficiency, since the high-quality information facilitates the compression and enhancement of low-quality frames at the encoder and decoder sides, respectively. Finally, experiments validate that our HLVC approach advances the state of the art in deep video compression and outperforms the "Low-Delay P (LDP) very fast" mode of x265 in terms of both PSNR and MS-SSIM. The project page is at https://github.com/RenYang-home/HLVC.
    Comment: Published in CVPR 2020; corrected a minor typo in the footnote of Table 1; corrected Figure 1
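    The quality-weighted gating described for the WRQE recurrent cell can be sketched as follows. This is a minimal illustrative sketch, assuming a GRU-like cell on flat vectors with a scalar per-frame quality score; the paper's actual network operates on convolutional feature maps and derives its quality features from the bit stream.

```python
# Sketch of a quality-weighted recurrent cell in the spirit of WRQE.
# All shapes, names, and the exact weighting rule are assumptions for
# illustration, not the paper's implementation.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class QualityWeightedCell:
    """GRU-like cell whose update signal (and hence memory blend) is
    scaled by a per-frame quality score (hypothetical scalar here)."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.Wz = rng.standard_normal((dim, 2 * dim)) * 0.1  # update gate weights
        self.Wh = rng.standard_normal((dim, 2 * dim)) * 0.1  # candidate weights

    def step(self, x, h, quality):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh) * quality   # quality-weighted update signal
        h_cand = np.tanh(self.Wh @ xh)        # candidate memory
        return (1.0 - z) * h + z * h_cand     # weighted blend of old and new memory

cell = QualityWeightedCell(dim=4)
h = np.zeros(4)
for q in (1.0, 0.6, 0.3):                     # decreasing frame quality
    h = cell.step(np.ones(4), h, quality=q)
print(h.shape)  # (4,)
```

    Low-quality frames thus contribute less to the recurrent memory, which matches the stated goal of leveraging high-quality frames to enhance low-quality ones.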

    Video Compression using Neural Weight Step and Huffman Coding Techniques

    Get PDF
    Background: This paper proposes a Hierarchical Video Compression Scheme (HVCS) with three hierarchical quality layers and a Recurrent Quality Enhancement (RQEN) network. Image compression techniques are used to compress the frames in the first layer, where frames have the highest quality. Using these high-quality frames as references, the Bi-Directional Deep Compression (BDC) network is proposed for frame compression in the second layer with considerable quality. In the third layer, frames are compressed at low quality using the adopted Single Motion Compression (SMC) network, which proposes a single motion map for motion estimation across multiple frames. As a result, SMC provides motion information using fewer bits. At the decoding stage, a weighted Recurrent Quality Enhancement (RQEN) network is developed to take both the bit stream and the compressed frames as inputs.
In the RQEN cell, the update signal and memory are weighted using quality features to positively leverage multi-frame information for enhancement. HVCS adopts hierarchical quality to benefit the efficiency of frame coding, whereas high-quality information improves frame compression and enhances low-quality frames at the encoding and decoding stages, respectively. Experimental results validate that the proposed HVCS approach surpasses state-of-the-art compression methods. Materials and Methods: Tables 1 and 2 present the rate-distortion values yielded on both video datasets. As aforementioned, PSNR and MS-SSIM are used for quality evaluation, and bit-rates are calculated in bits per pixel (bpp). Table 1 illustrates PSNR performance, showing better PSNR performance for the proposed compression model than other methods such as Chao et al. [7] or optimized methods [1]; it also outperforms H.265 on the standard JCT-VC dataset. Likewise, the proposed compression scheme yielded better bit-rate performance than H.265 on UVG. As shown in Table 2, the MS-SSIM evaluation showed better performance for the proposed scheme than all other learned approaches, surpassing both H.264 and H.265. Regarding bit-rate performance on UVG, Lee et al. [11] achieved comparable performance, and Guo et al. [10] yielded lower performance than H.265. On JCT-VC, DVC [10] is only comparable with H.265. In contrast, the rate-distortion performance of HVCS is clearly better than that of H.265. Furthermore, the Bjøntegaard Delta Bit-Rate (BDBR) [47] is also computed with respect to H.265. The BDBR measure computes the average difference in bit-rate against the H.265 anchor, where lower BDBR values indicate better performance [48]. In Table 3, BDBR performance is reported for both PSNR and MS-SSIM, in which bit-rate reduction against the anchor is indicated by negative numbers.
These results outperform H.265, where bold numbers represent the best results achieved by learned methods. Table 3 provides a fair comparison with the (MS-SSIM & PSNR)-optimized DVC techniques [10] against the H.265 anchor.
Conclusion: This work proposes a learned video compression scheme utilizing hierarchical frame quality with recurrent enhancement. Specifically, frames are divided into hierarchical levels 1, 2 and 3 of decreasing quality. For the first layer, image compression methods are used, while BDC and SMC are proposed for layers 2 and 3, respectively. The RQEN network is developed with compressed frames, together with frame-quality and bit-rate information, as inputs for multi-frame enhancement. Experimental results validated the efficiency of the proposed HVCS compression scheme. As with other compression techniques, the frame structure is manually set in this scheme. A promising direction for future work is developing DNNs that automatically learn the prediction structure and hierarchy.
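The BDBR metric used in Table 3 can be sketched as follows: fit log bit-rate as a cubic polynomial of PSNR for each codec, integrate both fits over the shared quality range, and report the average rate difference against the anchor. A minimal sketch, where the rate-distortion points are hypothetical illustrative numbers, not results from the paper:

```python
# Simplified Bjontegaard Delta Bit-Rate (BDBR) computation.
import numpy as np

def bd_rate(anchor_rd, test_rd):
    """anchor_rd, test_rd: lists of (bitrate, psnr) points. Returns the
    average bit-rate change of `test` vs the anchor in percent; negative
    values mean bit-rate savings, i.e. better performance."""
    r1 = np.log(np.array([r for r, _ in anchor_rd]))
    p1 = np.array([p for _, p in anchor_rd])
    r2 = np.log(np.array([r for r, _ in test_rd]))
    p2 = np.array([p for _, p in test_rd])
    lo, hi = max(p1.min(), p2.min()), min(p1.max(), p2.max())
    f1 = np.polyfit(p1, r1, 3)                # cubic fit: log-rate vs PSNR
    f2 = np.polyfit(p2, r2, 3)
    # integrate each cubic over the overlapping PSNR interval
    int1 = np.polyval(np.polyint(f1), hi) - np.polyval(np.polyint(f1), lo)
    int2 = np.polyval(np.polyint(f2), hi) - np.polyval(np.polyint(f2), lo)
    avg_log_diff = (int2 - int1) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

anchor = [(100, 32.0), (200, 34.5), (400, 37.0), (800, 39.5)]  # hypothetical H.265 RD points
test   = [(90, 32.0), (180, 34.5), (360, 37.0), (720, 39.5)]   # 10% cheaper at each quality
print(round(bd_rate(anchor, test), 1))  # → -10.0
```

A codec that reaches every PSNR level at 10% fewer bits than the anchor thus scores a BDBR of -10%, matching the convention that lower values indicate better performance.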

    Spatial-temporal autoencoder with attention network for video compression

    Get PDF
    Deep learning-based approaches are now state of the art in numerous tasks, including video compression, and are having a revolutionary influence on video processing. Recently, learned video compression methods have exhibited a fast development trend with promising results. In this paper, taking advantage of the powerful non-linear representation ability of neural networks, we replace each standard component of video compression with a neural network. We propose a spatial-temporal video compression network (STVC) using spatial-temporal priors with an attention module (STPA). On the one hand, joint spatial-temporal priors are used for generating latent representations and reconstructing compressed outputs, because efficient temporal and spatial information representation plays a crucial role in video coding. On the other hand, we also add an efficient and effective attention module so that the model devotes more effort to restoring artifact-rich areas. Moreover, we formalize the rate-distortion optimization into a single loss function, in which the network learns to leverage the spatial-temporal redundancy present in the frames and to decrease the bit rate while maintaining visual quality in the decoded frames. The experimental results show that our approach delivers state-of-the-art learned video compression performance in terms of MS-SSIM and PSNR
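    The single rate-distortion loss mentioned above is commonly of the form L = R + λ·D, where R approximates the bit-rate of the latent representation and D is a distortion such as MSE (1 − MS-SSIM would be used for MS-SSIM-optimized models). A minimal sketch, with an illustrative rate value standing in for the learned entropy estimate, not the paper's exact formulation:

```python
# Sketch of a joint rate-distortion loss: L = R + lambda * D.
import numpy as np

def rd_loss(latent_bits_per_pixel, original, decoded, lam=0.01):
    """Rate term plus weighted distortion term for one frame."""
    rate = latent_bits_per_pixel                      # R, in bits per pixel
    distortion = np.mean((original - decoded) ** 2)   # D, per-pixel MSE
    return rate + lam * distortion

frame = np.zeros((8, 8))
recon = frame + 0.5                                   # uniform error of 0.5
loss = rd_loss(0.2, frame, recon, lam=4.0)
print(loss)  # 0.2 + 4.0 * 0.25 = 1.2
```

    Larger λ trades more bits for higher fidelity; training the same network at several λ values is the usual way to trace out a rate-distortion curve.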

    Advanced methods and deep learning for video and satellite data compression

    Get PDF
    The abstract is in the attachment.

    Applications in Electronics Pervading Industry, Environment and Society

    Get PDF
    This book features the manuscripts accepted for the Special Issue “Applications in Electronics Pervading Industry, Environment and Society—Sensing Systems and Pervasive Intelligence” of the MDPI journal Sensors. Most of the papers come from a selection of the best papers of the 2019 edition of the “Applications in Electronics Pervading Industry, Environment and Society” (APPLEPIES) Conference, which was held in November 2019. All these papers have been significantly enhanced with novel experimental results. The papers give an overview of the trends in research and development activities concerning the pervasive application of electronics in industry, the environment, and society. The focus of these papers is on cyber physical systems (CPS), with research proposals for new sensor acquisition and ADC (analog to digital converter) methods, high-speed communication systems, cybersecurity, big data management, and data processing including emerging machine learning techniques. Physical implementation aspects are discussed as well as the trade-off found between functional performance and hardware/system costs

    Artificial Intelligence in the Creative Industries: A Review

    Full text link
    This paper reviews the current state of the art in Artificial Intelligence (AI) technologies and applications in the context of the creative industries. A brief background of AI, and specifically Machine Learning (ML) algorithms, is provided including Convolutional Neural Network (CNNs), Generative Adversarial Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement Learning (DRL). We categorise creative applications into five groups related to how AI technologies are used: i) content creation, ii) information analysis, iii) content enhancement and post production workflows, iv) information extraction and enhancement, and v) data compression. We critically examine the successes and limitations of this rapidly advancing technology in each of these areas. We further differentiate between the use of AI as a creative tool and its potential as a creator in its own right. We foresee that, in the near future, machine learning-based AI will be adopted widely as a tool or collaborative assistant for creativity. In contrast, we observe that the successes of machine learning in domains with fewer constraints, where AI is the `creator', remain modest. The potential of AI (or its developers) to win awards for its original creations in competition with human creatives is also limited, based on contemporary technologies. We therefore conclude that, in the context of creative industries, maximum benefit from AI will be derived where its focus is human centric -- where it is designed to augment, rather than replace, human creativity