5 research outputs found

    Adaptive Semantic Communications: Overfitting the Source and Channel for Profit

    Full text link
    Most semantic communication systems leverage deep learning models to provide end-to-end transmission performance surpassing the established source and channel coding approaches. While, so far, research has mainly focused on architecture and model improvements, but such a model trained over a full dataset and ergodic channel responses is unlikely to be optimal for every test instance. Due to limitations on the model capacity and imperfect optimization and generalization, such learned models will be suboptimal especially when the testing data distribution or channel response is different from that in the training phase, as is likely to be the case in practice. To tackle this, in this paper, we propose a novel semantic communication paradigm by leveraging the deep learning model's overfitting property. Our model can for instance be updated after deployment, which can further lead to substantial gains in terms of the transmission rate-distortion (RD) performance. This new system is named adaptive semantic communication (ASC). In our ASC system, the ingredients of wireless transmitted stream include both the semantic representations of source data and the adapted decoder model parameters. Specifically, we take the overfitting concept to the extreme, proposing a series of ingenious methods to adapt the semantic codec or representations to an individual data or channel state instance. The whole ASC system design is formulated as an optimization problem whose goal is to minimize the loss function that is a tripartite tradeoff among the data rate, model rate, and distortion terms. The experiments (including user study) verify the effectiveness and efficiency of our ASC system. Notably, the substantial gain of our overfitted coding paradigm can catalyze semantic communication upgrading to a new era

    Frequency Disentangled Features in Neural Image Compression

    Full text link
    The design of a neural image compression network is governed by how well the entropy model matches the true distribution of the latent code. Apart from the model capacity, this ability is indirectly under the effect of how close the relaxed quantization is to the actual hard quantization. Optimizing the parameters of a rate-distortion variational autoencoder (R-D VAE) is ruled by this approximated quantization scheme. In this paper, we propose a feature-level frequency disentanglement to help the relaxed scalar quantization achieve lower bit rates by guiding the high entropy latent features to include most of the low-frequency texture of the image. In addition, to strengthen the de-correlating power of the transformer-based analysis/synthesis transform, an augmented self-attention score calculation based on the Hadamard product is utilized during both encoding and decoding. Channel-wise autoregressive entropy modeling takes advantage of the proposed frequency separation as it inherently directs high-informational low-frequency channels to the first chunks and conditions the future chunks on it. The proposed network not only outperforms hand-engineered codecs, but also neural network-based codecs built on computation-heavy spatially autoregressive entropy models.Comment: Accepted to 30th^{th} IEEE International Conference on Image Processing (ICIP 2023

    Learning-based Wavelet-like Transforms For Fully Scalable and Accessible Image Compression

    Full text link
    The goal of this thesis is to improve the existing wavelet transform with the aid of machine learning techniques, so as to enhance coding efficiency of wavelet-based image compression frameworks, such as JPEG 2000. In this thesis, we first propose to augment the conventional base wavelet transform with two additional learned lifting steps -- a high-to-low step followed by a low-to-high step. The high-to-low step suppresses aliasing in the low-pass band by using the detail bands at the same resolution, while the low-to-high step aims to further remove redundancy from detail bands by using the corresponding low-pass band. These two additional steps reduce redundancy (notably aliasing information) amongst the wavelet subbands, and also improve the visual quality of reconstructed images at reduced resolutions. To train these two networks in an end-to-end fashion, we develop a backward annealing approach to overcome the non-differentiability of the quantization and cost functions during back-propagation. Importantly, the two additional networks share a common architecture, named a proposal-opacity topology, which is inspired and guided by a specific theoretical argument related to geometric flow. This particular network topology is compact and with limited non-linearities, allowing a fully scalable system; one pair of trained network parameters are applied for all levels of decomposition and for all bit-rates of interest. By employing the additional lifting networks within the JPEG2000 image coding standard, we can achieve up to 17.4% average BD bit-rate saving over a wide range of bit-rates, while retaining the quality and resolution scalability features of JPEG2000. Built upon the success of the high-to-low and low-to-high steps, we then study more broadly the extension of neural networks to all lifting steps that correspond to the base wavelet transform. The purpose of this comprehensive study is to understand what is the most effective way to develop learned wavelet-like transforms for highly scalable and accessible image compression. Specifically, we examine the impact of the number of learned lifting steps, the number of layers and the number of channels in each learned lifting network, and kernel support in each layer. To facilitate the study, we develop a generic training methodology that is simultaneously appropriate to all lifting structures considered. Experimental results ultimately suggest that to improve the existing wavelet transform, it is more profitable to augment a larger wavelet transform with more diverse high-to-low and low-to-high steps, rather than developing deep fully learned lifting structures

    Task-specific and interpretable feature learning

    Get PDF
    Deep learning models have had tremendous impacts in recent years, while a question has been raised by many: Is deep learning just a triumph of empiricism? There has been emerging interest in reducing the gap between the theoretical soundness and interpretability, and the empirical success of deep models. This dissertation provides a comprehensive discussion on bridging traditional model-based learning approaches that emphasize problem-specific reasoning, and deep models that allow for larger learning capacity. The overall goal is to devise the next-generation feature learning architectures that are: 1) task-specific, namely, optimizing the entire pipeline from end to end while taking advantage of available prior knowledge and domain expertise; and 2) interpretable, namely, being able to learn a representation consisting of semantically sensible variables, and to display predictable behaviors. This dissertation starts by showing how the classical sparse coding models could be improved in a task-specific way, by formulating the entire pipeline as bi-level optimization. Then, it mainly illustrates how to incorporate the structure of classical learning models, e.g., sparse coding, into the design of deep architectures. A few concrete model examples are presented, ranging from the ℓ0\ell_0 and ℓ1\ell_1 sparse approximation models, to the ℓ∞\ell_\infty constrained model and the dual-sparsity model. The analytic tools in the optimization problems can be translated to guide the architecture design and performance analysis of deep models. As a result, those customized deep models demonstrate improved performance, intuitive interpretation, and efficient parameter initialization. On the other hand, deep networks are shown to be analogous to brain mechanisms. They exhibit the ability to describe semantic content from the primitive level to the abstract level. This dissertation thus also presents a preliminary investigation of the synergy between feature learning with cognitive science and neuroscience. Two novel application domains, image aesthetics assessment and brain encoding, are explored, with promising preliminary results achieved

    Annealed learning based block transforms for HEVC video coding

    No full text
    International audienc
    corecore