26 research outputs found
Learning for Video Compression with Hierarchical Quality and Recurrent Enhancement
In this paper, we propose a Hierarchical Learned Video Compression (HLVC)
method with three hierarchical quality layers and a recurrent enhancement
network. The frames in the first layer are compressed by an image compression
method with the highest quality. Using these frames as references, we propose
the Bi-Directional Deep Compression (BDDC) network to compress the second layer
with relatively high quality. Then, the third layer frames are compressed with
the lowest quality, by the proposed Single Motion Deep Compression (SMDC)
network, which adopts a single motion map to estimate the motions of multiple
frames, thus saving bits for motion information. In our deep decoder, we
develop the Weighted Recurrent Quality Enhancement (WRQE) network, which takes
both compressed frames and the bit stream as inputs. In the recurrent cell of
WRQE, the memory and update signal are weighted by quality features to
reasonably leverage multi-frame information for enhancement. In our HLVC
approach, the hierarchical quality benefits the coding efficiency, since the
high quality information facilitates the compression and enhancement of low
quality frames at encoder and decoder sides, respectively. Finally, the
experiments validate that our HLVC approach advances the state-of-the-art of
deep video compression methods, and outperforms the "Low-Delay P (LDP) very
fast" mode of x265 in terms of both PSNR and MS-SSIM. The project page is at
https://github.com/RenYang-home/HLVC. Comment: Published in CVPR 2020.
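The quality-weighted recurrent cell described above (WRQE) can be sketched as a GRU-style update in which a scalar quality feature scales the memory and update signals, so that high-quality frames contribute more to the recurrent state. This is a minimal illustration only: the cell dimensions, random weights, and the exact way the quality feature enters the gates are assumptions for exposition, not the paper's formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class QualityWeightedGRUCell:
    """GRU-style cell whose update gate and candidate (memory) signal are
    scaled by a per-step quality feature, loosely mirroring WRQE's idea of
    trusting high-quality frames more during recurrent enhancement.
    Weights are random placeholders (untrained, illustrative only)."""

    def __init__(self, dim):
        self.Wz = rng.standard_normal((dim, 2 * dim)) * 0.1  # update-gate weights
        self.Wh = rng.standard_normal((dim, 2 * dim)) * 0.1  # candidate weights

    def step(self, h, x, quality):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh) * quality        # quality-weighted update signal
        h_tilde = np.tanh(self.Wh @ xh) * quality  # quality-weighted memory signal
        return (1.0 - z) * h + z * h_tilde

dim = 4
cell = QualityWeightedGRUCell(dim)
h = np.zeros(dim)
for q in [1.0, 0.3, 0.8]:          # hypothetical quality scores per frame
    x = rng.standard_normal(dim)   # stand-in for compressed-frame features
    h = cell.step(h, x, q)
print(h.shape)  # (4,)
```

With quality 0 the state passes through unchanged, so low-quality frames perturb the accumulated multi-frame memory less, which is the intuition the abstract describes.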
Video Compression using Neural Weight Step and Huffman Coding Techniques
Background:
This paper proposes a Hierarchical Video Compression Scheme (HVCS) with three hierarchical quality layers and a Recurrent Quality Enhancement (RQEN) network. Image compression techniques are used to compress the frames in the first layer, which have the highest quality. Using a high-quality frame as a reference, the Bi-Directional Deep Compression (BDC) network is proposed to compress frames in the second layer with considerable quality. In the third layer, frames are compressed at low quality by the adopted Single Motion Compression (SMC) network, which uses a single motion map to estimate motion across multiple frames. As a result, SMC conveys motion information using fewer bits. At the decoding stage, the weighted RQEN network is developed to take both the bit stream and the compressed frames as inputs. In the RQEN cell, the update signal and memory are weighted by quality features so that multi-frame information is leveraged effectively for enhancement. HVCS adopts hierarchical quality to benefit coding efficiency, since high-quality information facilitates the compression and the enhancement of low-quality frames at the encoding and decoding stages, respectively. Experimental results validate that the proposed HVCS approach advances the state of the art in compression methods.
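The three-layer frame hierarchy can be illustrated with a small helper that assigns each frame in a group of pictures to a quality layer. The GOP length of 10 and the exact frame-to-layer mapping below are illustrative assumptions, not necessarily the scheme's actual configuration.

```python
def layer_of(frame_index, gop=10):
    """Assign a frame to a hierarchical quality layer (1 = highest quality).
    Illustrative mapping: GOP boundaries -> layer 1 (image codec),
    the middle frame -> layer 2 (BDC), all remaining frames -> layer 3 (SMC)."""
    pos = frame_index % gop
    if pos == 0:
        return 1
    if pos == gop // 2:
        return 2
    return 3

layers = [layer_of(i) for i in range(10)]
print(layers)  # [1, 3, 3, 3, 3, 2, 3, 3, 3, 3]
```

Layer-1 frames then serve as references for layer 2, and layers 1 and 2 together as references for the low-quality layer-3 frames, matching the coding order sketched in the abstract.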
Results:
Tables 1 and 2 report the rate-distortion values on both video datasets. As mentioned above, PSNR and MS-SSIM are used for quality evaluation, and bit-rates are measured in bits per pixel (bpp). Table 1 reports PSNR performance, showing that the proposed compression model outperforms other methods such as Chao et al. [7] and optimized methods [1]; it also outperforms H.265 on the standard JCT-VC dataset. Likewise, the proposed compression scheme yields better bit-rate performance than H.265 on UVG. As shown in Table 2, the MS-SSIM evaluation places the proposed scheme above all other learned approaches, and it also outperforms H.264 and H.265. In terms of bit-rate performance on UVG, Lee et al. [11] is comparable, while Guo et al. [10] performs worse than H.265. On JCT-VC, DVC [10] is only comparable with H.265; in contrast, the rate-distortion performance of HVCS is clearly better than that of H.265.
Furthermore, the Bjøntegaard Delta Bit-Rate (BDBR) [47] is also computed with respect to H.265. The BDBR measure computes the average bit-rate difference against the H.265 anchor, where lower values indicate better performance [48]. Table 3 reports BDBR performance in terms of both PSNR and MS-SSIM, in which negative numbers indicate a bit-rate reduction relative to the anchor. These results outperform H.265, and bold numbers mark the best results achieved by learned methods. Table 3 also provides a fair comparison with the (MS-SSIM- and PSNR-) optimized DVC techniques [10] against the H.265 anchor.
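The BDBR computation described here can be sketched as follows: fit a polynomial of log bit-rate over quality for each codec, then average the gap between the two curves over their overlapping quality range. This is a minimal sketch of the standard Bjøntegaard procedure; the cubic fit is conventional, and the rate-distortion points below are hypothetical toy values, not results from the paper.

```python
import numpy as np

def bdbr(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjøntegaard delta bit-rate (percent) of a test codec vs. an anchor.
    Fits cubic polynomials of log-rate over PSNR and integrates the gap
    over the overlapping quality range; negative values mean the test
    codec needs fewer bits than the anchor at equal quality."""
    lr_a, lr_t = np.log(rate_anchor), np.log(rate_test)
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    lo = max(min(psnr_anchor), min(psnr_test))   # overlapping PSNR interval
    hi = min(max(psnr_anchor), max(psnr_test))
    ia = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    it = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_diff = (it - ia) / (hi - lo)             # average log-rate difference
    return (np.exp(avg_diff) - 1.0) * 100.0

# Toy RD points: the "test" codec spends ~10% fewer bits at each quality level.
psnr = np.array([32.0, 34.0, 36.0, 38.0])
rate = np.array([0.10, 0.18, 0.33, 0.60])        # bpp, hypothetical
print(round(bdbr(rate, psnr, rate * 0.9, psnr), 1))  # -10.0
```

A uniform 10% rate saving at equal quality yields a BDBR of -10%, which matches how the negative numbers in Table 3 should be read.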
Conclusion:
This work proposes a learned video compression scheme utilizing hierarchical frame quality with recurrent enhancement. Specifically, frames are divided into hierarchical levels 1, 2, and 3 of decreasing quality. Image compression methods are used for the first layer, while BDC and SMC are proposed for layers 2 and 3, respectively. The RQEN network is developed with the compressed frames and bit-rate information as inputs for multi-frame enhancement. Experimental results validate the efficiency of the proposed HVCS compression scheme.
As with other compression techniques, the frame structure is set manually in this scheme. A promising direction for future work is to develop DNNs that automatically learn the prediction structure and hierarchy.
Spatial-temporal autoencoder with attention network for video compression
Deep learning-based approaches are now state of the art in numerous tasks, including video compression, and are having a revolutionary influence on video processing. Recently, learned video compression methods have shown a fast development trend with promising results. In this paper, taking advantage of the powerful non-linear representation ability of neural networks, we replace each standard component of video compression with a neural network. We propose a spatial-temporal video compression network (STVC) using spatial-temporal priors with an attention module (STPA). On the one hand, joint spatial-temporal priors are used for generating latent representations and reconstructing compressed outputs, because efficient temporal and spatial information representation plays a crucial role in video coding. On the other hand, we add an efficient and effective attention module so that the model devotes more effort to restoring artifact-rich areas. Moreover, we formalize the rate-distortion optimization into a single loss function, in which the network learns to leverage the spatial-temporal redundancy present in the frames and to decrease the bit rate while maintaining visual quality in the decoded frames. Experimental results show that our approach delivers state-of-the-art learned video compression performance in terms of MS-SSIM and PSNR.
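The single rate-distortion loss mentioned above typically takes the form L = R + λ·D: a rate term plus a weighted distortion term. The sketch below is a minimal illustration with a hypothetical bit-rate estimate and MSE distortion; the λ value and the toy frame data are assumptions, not the paper's settings.

```python
import numpy as np

def rd_loss(bits_per_pixel, mse, lam=0.01):
    """Single-objective rate-distortion loss L = R + lambda * D.
    bits_per_pixel: estimated rate term R (e.g. from an entropy model).
    mse: distortion term D between original and decoded frames.
    lam: trade-off weight; a larger lam penalizes distortion more,
    pushing the codec toward higher quality at a higher bit-rate."""
    return bits_per_pixel + lam * mse

# Toy example: compare an original frame with a slightly perturbed "decoded" one.
orig = np.random.default_rng(1).random((8, 8))
recon = orig + 0.01                      # stand-in for a decoded frame
mse = float(np.mean((orig - recon) ** 2))
print(rd_loss(0.2, mse))
```

In training, both R and D are differentiable, so minimizing this one scalar jointly trades bit-rate against reconstruction quality, which is the optimization the abstract describes.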
Advanced methods and deep learning for video and satellite data compression
The abstract is in the attachment.
Applications in Electronics Pervading Industry, Environment and Society
This book features the manuscripts accepted for the Special Issue “Applications in Electronics Pervading Industry, Environment and Society—Sensing Systems and Pervasive Intelligence” of the MDPI journal Sensors. Most of the papers come from a selection of the best papers of the 2019 edition of the “Applications in Electronics Pervading Industry, Environment and Society” (APPLEPIES) Conference, which was held in November 2019. All these papers have been significantly enhanced with novel experimental results. The papers give an overview of the trends in research and development activities concerning the pervasive application of electronics in industry, the environment, and society. The focus of these papers is on cyber-physical systems (CPS), with research proposals for new sensor acquisition and ADC (analog-to-digital converter) methods, high-speed communication systems, cybersecurity, big data management, and data processing including emerging machine learning techniques. Physical implementation aspects are discussed, as well as the trade-offs found between functional performance and hardware/system costs.
Artificial Intelligence in the Creative Industries: A Review
This paper reviews the current state of the art in Artificial Intelligence
(AI) technologies and applications in the context of the creative industries. A
brief background of AI, and specifically Machine Learning (ML) algorithms, is
provided including Convolutional Neural Networks (CNNs), Generative Adversarial
Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement
Learning (DRL). We categorise creative applications into five groups related to
how AI technologies are used: i) content creation, ii) information analysis,
iii) content enhancement and post production workflows, iv) information
extraction and enhancement, and v) data compression. We critically examine the
successes and limitations of this rapidly advancing technology in each of these
areas. We further differentiate between the use of AI as a creative tool and
its potential as a creator in its own right. We foresee that, in the near
future, machine learning-based AI will be adopted widely as a tool or
collaborative assistant for creativity. In contrast, we observe that the
successes of machine learning in domains with fewer constraints, where AI is
the 'creator', remain modest. The potential of AI (or its developers) to win
awards for its original creations in competition with human creatives is also
limited, based on contemporary technologies. We therefore conclude that, in the
context of creative industries, maximum benefit from AI will be derived where
its focus is human-centric, that is, where it is designed to augment, rather
than replace, human creativity.