295 research outputs found

    Improved Hybrid Layered Image Compression using Deep Learning and Traditional Codecs

    Recently, deep learning-based methods have been applied to image compression and have achieved many promising results. In this paper, we propose an improved hybrid layered image compression framework that combines deep learning with traditional image codecs. At the encoder, we first use a convolutional neural network (CNN) to obtain a compact representation of the input image, which is losslessly encoded by the FLIF codec as the base layer of the bit stream. A coarse reconstruction of the input is obtained by another CNN from the reconstructed compact representation. The residual between the input and the coarse reconstruction is then encoded by the H.265/HEVC-based BPG codec as the enhancement layer of the bit stream. Experimental results on the Kodak and Tecnick datasets show that the proposed scheme outperforms the state-of-the-art deep learning-based layered coding scheme as well as traditional codecs, including BPG, in both PSNR and MS-SSIM across a wide range of bit rates when the images are coded in the RGB444 domain. (Comment: Submitted to Signal Processing: Image Communication)
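
    As a rough illustration of the two-layer bit-stream structure described in this abstract, here is a minimal Python/NumPy sketch. The network and codec calls (encoder_cnn, decoder_cnn, flif_encode, bpg_encode) are hypothetical placeholders, since the paper's actual models and codec bindings are not given here.

        import numpy as np

        def encode_layered(image, encoder_cnn, decoder_cnn, flif_encode, bpg_encode):
            """Sketch of the layered encoder: a lossless base layer from a CNN's
            compact representation, plus an enhancement layer from the residual."""
            # 1. Compact representation of the input (learned analysis transform).
            compact = encoder_cnn(image)
            # 2. Base layer: the compact representation, coded losslessly with FLIF.
            base_layer = flif_encode(compact)
            # 3. Coarse reconstruction from the compact representation.
            coarse = decoder_cnn(compact)
            # 4. Enhancement layer: residual between input and coarse reconstruction,
            #    coded with the HEVC-based BPG codec.
            residual = image.astype(np.float32) - coarse
            enhancement_layer = bpg_encode(residual)
            return base_layer, enhancement_layer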

    MPAI-EEV: Standardization Efforts of Artificial Intelligence based End-to-End Video Coding

    The rapid advancement of artificial intelligence (AI) technology has made the standardization of neural-network-based video processing, coding, and transmission a priority. To address this, the Moving Picture, Audio, and Data Coding by Artificial Intelligence (MPAI) group is developing a suite of standards called MPAI-EEV for "end-to-end optimized neural video coding." The aim of this AI-based video standard project is to reduce the number of bits required to represent high-fidelity video data by utilizing data-trained neural coding technologies, an approach that is not constrained by how data coding has traditionally been applied within a hybrid framework. This paper presents an overview of recent and ongoing standardization efforts in this area and highlights the key technologies and design philosophy of EEV. It also compares and reports on some primary efforts, such as the coding efficiency of the reference model. Additionally, it discusses emerging activities, such as learned Unmanned Aerial Vehicle (UAV) video coding, that are currently planned, under development, or in the exploration phase. With a focus on UAV video signals, the paper addresses the current status of these preliminary efforts, indicates development timelines, summarizes the main technical details, and provides pointers to further references. The exploration experiment shows that the EEV model outperforms the state-of-the-art video coding standard H.266/VVC in terms of the perceptual evaluation metric.

    The Comprehensive Review of Neural Network: An Intelligent Medical Image Compression for Data Sharing

    In the healthcare environment, digital images are the most commonly shared information and have become a vital resource that facilitates decision-making and treatment procedures. Medical images require large volumes of storage, and the required scale continues to grow with the advancement of medical imaging technology. To enhance interaction and coordination between healthcare institutions, medical information must be exchanged efficiently; the sharing of medical images therefore needs to be both efficient and guaranteed to lose no information. Image compression for this purpose should be as intelligent as possible: it must preserve diagnostically valuable information while minimizing unnecessary data. Artificial neural networks have been used to solve many image processing problems and have proven superior to traditional methods in compression applications involving noisy or incomplete images, yielding high compression ratios and reduced noise in the resulting image. This paper reviews previous studies on intelligent medical image compression with neural network approaches for data sharing.

    Improved Nonlinear Transform Source-Channel Coding to Catalyze Semantic Communications

    Recent deep learning methods have spurred interest in solving high-efficiency end-to-end transmission problems. These methods, which we call nonlinear transform source-channel coding (NTSCC), extract the semantic latent features of the source signal and learn an entropy model that guides variable-rate joint source-channel coding to transmit the latent features over wireless channels. In this paper, we propose a comprehensive framework for improving NTSCC that achieves higher system coding gain, better model versatility, and a more flexible adaptation strategy aligned with semantic guidance. This more sophisticated NTSCC model is ready to support large-size data interaction in emerging XR applications, which catalyzes the adoption of semantic communications. Specifically, we propose three improvements. First, we introduce a contextual entropy model to better capture the spatial correlations among the semantic latent features; more accurate rate allocation and contextual joint source-channel coding are developed accordingly to achieve higher coding gain. On that basis, we propose response network architectures that make NTSCC versatile: a once-trained model supports various rates and channel states, which benefits practical deployment. Finally, we propose an online latent feature editing method that enables more flexible coding rate control aligned with specific semantic guidance. Applying these three improvements together yields a deployment-friendly semantic coded transmission system. Our improved NTSCC system is experimentally verified to achieve considerable bandwidth savings over the state-of-the-art engineered VTM + 5G LDPC coded transmission system, with lower processing latency.
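
    To make the rate-allocation idea concrete, a minimal sketch follows: a learned entropy model estimates the information content of each latent patch, and channel bandwidth is assigned proportionally. The names entropy_model and symbols_per_bit are hypothetical stand-ins, not the paper's actual interface.

        import numpy as np

        def allocate_channel_uses(latent_patches, entropy_model, symbols_per_bit=0.25):
            """Assign more channel symbols to latent patches that the entropy
            model deems more informative (variable-rate joint source-channel coding)."""
            # Estimated bits per latent patch, from the learned entropy model.
            est_bits = np.array([entropy_model(z) for z in latent_patches])
            # Channel bandwidth proportional to estimated information content.
            channel_uses = np.ceil(est_bits * symbols_per_bit).astype(int)
            return channel_uses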

    Learned Image Compression with Generalized Octave Convolution and Cross-Resolution Parameter Estimation

    The context-adaptive entropy model significantly improves rate-distortion (R-D) performance by jointly utilizing hyperpriors and autoregressive models to capture the spatial redundancy of the latent representations. However, the latent representations still contain some spatial correlations, and methods based on the context-adaptive entropy model cannot be accelerated during decoding by parallel computing devices such as FPGAs or GPUs. To alleviate these limitations, we propose a learned multi-resolution image compression framework that exploits the recently developed octave convolutions to factorize the latent representations into high-resolution (HR) and low-resolution (LR) parts, similar to a wavelet transform, which further improves R-D performance. To speed up decoding, our scheme does not use a context-adaptive entropy model; instead, we exploit an additional hyper layer, comprising a hyper encoder and hyper decoder, to further remove the spatial redundancy of the latent representation. Moreover, cross-resolution parameter estimation (CRPE) is introduced into the proposed framework to enhance the flow of information and further improve rate-distortion performance. An additional information-fidelity loss is added to the total loss function to adjust the contribution of the LR part to the final bit stream. Experimental results show that our method reduces decoding time by approximately 73.35% and 93.44%, respectively, compared with state-of-the-art learned image compression methods, while its R-D performance remains better than H.266/VVC (4:2:0) and some learning-based methods on both PSNR and MS-SSIM metrics across a wide range of bit rates. (Comment: Accepted by Signal Processing)
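
    For intuition on the octave factorization mentioned above, here is a simplified NumPy sketch: features are split into a high-resolution part and a low-resolution part held at half resolution, with cross paths exchanging information between the two. This is a generic octave-convolution skeleton under simplifying assumptions (even spatial dimensions, 1x1 channel mixing in place of full conv2d), not the paper's generalized variant.

        import numpy as np

        def avg_pool2(x):
            # 2x2 average pooling over (H, W, C); assumes even spatial dims.
            return 0.25 * (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2])

        def upsample2(x):
            # Nearest-neighbour 2x spatial upsampling.
            return x.repeat(2, axis=0).repeat(2, axis=1)

        def mix(x, w):
            # 1x1 "convolution": per-pixel channel mixing, standing in for conv2d.
            return x @ w

        def octave_forward(x_h, x_l, w_hh, w_hl, w_lh, w_ll):
            """One octave layer: HR and LR feature maps update each other."""
            y_h = mix(x_h, w_hh) + upsample2(mix(x_l, w_lh))   # HR output path
            y_l = mix(x_l, w_ll) + mix(avg_pool2(x_h), w_hl)   # LR output path
            return y_h, y_l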

    Adaptive Semantic Communications: Overfitting the Source and Channel for Profit

    Most semantic communication systems leverage deep learning models to provide end-to-end transmission performance surpassing established source and channel coding approaches. So far, research has mainly focused on architecture and model improvements, but a model trained over a full dataset and ergodic channel responses is unlikely to be optimal for every test instance. Due to limited model capacity and imperfect optimization and generalization, such learned models are suboptimal, especially when the test data distribution or channel response differs from that seen in training, as is likely in practice. To tackle this, we propose a novel semantic communication paradigm that exploits the deep learning model's ability to overfit. Our model can, for instance, be updated after deployment, leading to substantial gains in transmission rate-distortion (RD) performance. We name this system adaptive semantic communication (ASC). In our ASC system, the wirelessly transmitted stream includes both the semantic representations of the source data and the adapted decoder model parameters. Specifically, we take the overfitting concept to the extreme, proposing a series of methods to adapt the semantic codec or representations to an individual data or channel-state instance. The whole ASC system design is formulated as an optimization problem whose goal is to minimize a loss function expressing a tripartite tradeoff among the data rate, model rate, and distortion terms. Experiments (including a user study) verify the effectiveness and efficiency of our ASC system. Notably, the substantial gain of our overfitted coding paradigm can catalyze the upgrading of semantic communication to a new era.
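
    The tripartite tradeoff can be written as a single objective. A hedged sketch follows, where lambda_model and lambda_dist are hypothetical weighting factors: the abstract names the three terms but not their exact formulation or weights.

        def asc_loss(rate_data, rate_model, distortion,
                     lambda_model=1.0, lambda_dist=50.0):
            """Tripartite objective: bits spent on the semantic representations,
            bits spent on the adapted decoder parameters (also transmitted),
            and the reconstruction distortion at the receiver."""
            return rate_data + lambda_model * rate_model + lambda_dist * distortion

    Under this view, per-instance overfitting pays off whenever the distortion reduction it buys outweighs the extra model-rate term for transmitting the parameter update.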

    Fast and High-Performance Learned Image Compression With Improved Checkerboard Context Model, Deformable Residual Module, and Knowledge Distillation

    Deep learning-based image compression has made great progress recently. However, many leading schemes use a serial context-adaptive entropy model to improve rate-distortion (R-D) performance, which is very slow, and the complexities of their encoding and decoding networks are too high for many practical applications. In this paper, we introduce four techniques to balance the trade-off between complexity and performance. First, we are the first to introduce a deformable convolutional module into a compression framework; it removes more redundancies from the input image and thereby enhances compression performance. Second, we design a checkerboard context model with two separate distribution parameter estimation networks and different probability models, which enables parallel decoding without sacrificing performance compared to the sequential context-adaptive model. Third, we develop an improved three-step knowledge distillation and training scheme that transfers both the final and intermediate results of the teacher network to the student network, achieving different trade-offs between the complexity and performance of the decoder network. Fourth, we introduce L1 regularization to make the latent representation more sparse and then encode only the non-zero channels, which greatly reduces encoding and decoding time. Experiments show that, compared to the state-of-the-art learned image coding scheme, our method is about 20 times faster in encoding and 70 to 90 times faster in decoding, while its R-D performance is 2.3% better. Our method also outperforms the traditional H.266/VVC-intra (4:4:4) approach and some leading learned schemes in terms of PSNR and MS-SSIM on the Kodak and Tecnick-40 datasets. (Comment: Submitted to Trans. Journal)
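
    A minimal NumPy sketch of the checkerboard decoding order described above: "anchor" positions are decoded in parallel first (their distribution parameters depend only on the hyperprior), then the remaining "non-anchor" positions are decoded in parallel using the anchors as spatial context. The two parameter networks of the paper are abstracted here as hypothetical params_anchor and params_nonanchor placeholders, as is decode_at.

        import numpy as np

        def checkerboard_masks(h, w):
            """Split spatial positions into two interleaved checkerboard sets."""
            ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
            anchor = (ii + jj) % 2 == 0          # decoded first, fully in parallel
            return anchor, ~anchor

        def decode_two_pass(decode_at, params_anchor, params_nonanchor, h, w):
            anchor, nonanchor = checkerboard_masks(h, w)
            y = np.zeros((h, w), dtype=np.float32)
            # Pass 1: anchor positions use only hyperprior-based parameters.
            y[anchor] = decode_at(anchor, params_anchor(None))
            # Pass 2: non-anchors condition on the decoded anchors as context,
            # and are themselves decoded in parallel.
            y[nonanchor] = decode_at(nonanchor, params_nonanchor(y))
            return y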

    Data-driven visual quality estimation using machine learning

    Today a lot of visual content is produced and accessible, owing to improvements in technology such as smartphones and the internet. This creates a need to assess the quality perceived by users in order to further improve the experience. However, only a few state-of-the-art quality models are specifically designed for higher resolutions, predict more than the mean opinion score, or use machine learning. One goal of this thesis is to train and evaluate such machine learning models for higher resolutions on several datasets. First, an objective evaluation of image quality at higher resolutions is performed. The images are compressed using video encoders, and AV1 is shown to perform best in terms of quality and compression. This evaluation is followed by the analysis of a crowdsourcing test in comparison with a lab test investigating image quality. Afterward, deep learning-based models for image quality prediction, and an extension for video quality, are proposed. However, the deep learning-based video quality model is not practically usable because of performance constraints.
For this reason, pixel-based video quality models using well-motivated features covering image and motion aspects are proposed and evaluated. These models can be used to predict mean opinion scores for videos, or even other video quality-related information such as rating distributions. The introduced model architecture can be applied to other video problems, such as video classification, gaming video quality prediction, gaming genre classification, or encoding parameter estimation. Another important aspect is the processing time of such models; hence, a generic approach to speeding up state-of-the-art video quality models is introduced, showing that a significant amount of processing time can be saved while maintaining similar prediction accuracy. The models have been made publicly available as open source so that the developed frameworks can be used for further research. Moreover, the presented approaches may serve as building blocks for newer media formats.
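
    As an illustration of the pixel-based feature idea, a small sketch: simple image features (spatial activity) and motion features (frame differences) are pooled per video and would be fed to a regressor to predict a mean opinion score. The specific features and the choice of regressor are generic stand-ins, not the thesis's actual feature set.

        import numpy as np

        def video_features(frames):
            """frames: (T, H, W) grayscale video. Returns a small feature vector
            covering image (spatial activity) and motion (temporal change) aspects."""
            gy, gx = np.gradient(frames.astype(np.float32), axis=(1, 2))
            spatial = np.sqrt(gx**2 + gy**2).mean(axis=(1, 2))    # per-frame activity
            motion = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))
            return np.array([spatial.mean(), spatial.std(),
                             motion.mean(), motion.std()])

        # A standard regressor (e.g., sklearn's RandomForestRegressor) could then
        # map such per-video feature vectors to mean opinion scores.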