
    Extreme Image Compression using Fine-tuned VQGANs

    Recent advances in generative compression methods have demonstrated remarkable progress in enhancing the perceptual quality of compressed data, especially at low bitrates. However, their efficacy and applicability in achieving extreme compression ratios (<0.05 bpp) remain limited. In this work, we propose a simple yet effective coding framework that introduces vector quantization (VQ)-based generative models into the image compression domain. The main insight is that the codebook learned by the VQGAN model has strong expressive capacity, facilitating efficient compression of continuous information in the latent space while maintaining reconstruction quality. Specifically, an image can be represented as VQ indices by finding the nearest codeword, and these indices can be encoded into bitstreams using lossless compression methods. We propose clustering a pre-trained large-scale codebook into smaller codebooks with the K-means algorithm, yielding variable bitrates and different levels of reconstruction quality within the coding framework. Furthermore, we introduce a transformer to predict lost indices and restore images in unstable environments. Extensive qualitative and quantitative experiments on various benchmark datasets demonstrate that the proposed framework outperforms state-of-the-art codecs in terms of perceptual quality-oriented metrics and human perception at extremely low bitrates (≤0.04 bpp). Remarkably, even with the loss of up to 20% of indices, the images can be effectively restored with minimal perceptual loss.
    Comment: Generative Compression, Extreme Compression, VQGANs, Low Bitrate
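    The index-based pipeline described above (nearest-codeword quantization, K-means shrinking of the codebook, and the resulting bitrate) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, plain Lloyd's K-means stands in for whatever clustering setup the paper uses, the latents and codebook are random vectors rather than VQGAN outputs, and the bitrate helper assumes fixed-length index codes (the lossless coding stage in the paper would lower it further). The worked numbers assume a hypothetical 256×256 image downsampled 16× to a 16×16 token grid.

```python
import math

import numpy as np


def nearest_codeword_indices(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Return, for each latent vector (N, D), the index of its nearest codeword (K, D)."""
    # Pairwise squared Euclidean distances, shape (N, K).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def kmeans_shrink_codebook(codebook: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Cluster a large codebook (K, D) into k centroids with plain Lloyd's K-means."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct codewords (astype makes a float copy).
    centroids = codebook[rng.choice(len(codebook), size=k, replace=False)].astype(float)
    for _ in range(iters):
        assign = nearest_codeword_indices(codebook, centroids)
        for j in range(k):
            members = codebook[assign == j]
            if len(members) > 0:  # keep the previous centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def fixed_length_bpp(num_tokens: int, codebook_size: int, height: int, width: int) -> float:
    """Bits per pixel when every index is stored with ceil(log2 K) bits (no entropy coding)."""
    bits_per_index = math.ceil(math.log2(codebook_size))
    return num_tokens * bits_per_index / (height * width)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    large = rng.normal(size=(1024, 8))         # stand-in for a pre-trained codebook
    small = kmeans_shrink_codebook(large, 64)  # smaller codebook -> fewer bits per index
    latents = rng.normal(size=(256, 8))        # stand-in for a 16x16 grid of encoder outputs
    idx = nearest_codeword_indices(latents, small)
    print(idx.shape, idx.max() < 64)
    print(fixed_length_bpp(256, 16384, 256, 256))  # 14 bits/index -> 0.0546875
    print(fixed_length_bpp(256, 1024, 256, 256))   # 10 bits/index -> 0.0390625
```

    In this hypothetical setting, shrinking the codebook from 16,384 to 1,024 entries cuts each index from 14 to 10 bits, moving the fixed-length rate from about 0.055 bpp to about 0.039 bpp, which shows how codebook clustering alone trades reconstruction quality for bitrate.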

    Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks

    The last decade of machine learning has seen drastic increases in scale and capabilities. Deep neural networks (DNNs) are increasingly being deployed in the real world. However, they are difficult to analyze, raising concerns about using them without a rigorous understanding of how they function. Effective tools for interpreting them will be important for building more trustworthy AI by helping to identify problems, fix bugs, and improve basic understanding. In particular, "inner" interpretability techniques, which focus on explaining the internal components of DNNs, are well-suited for developing a mechanistic understanding, guiding manual modifications, and reverse engineering solutions. Much recent work has focused on DNN interpretability, and rapid progress has thus far made a thorough systematization of methods difficult. In this survey, we review over 300 works with a focus on inner interpretability tools. We introduce a taxonomy that classifies methods by what part of the network they help to explain (weights, neurons, subnetworks, or latent representations) and whether they are implemented during (intrinsic) or after (post hoc) training. To our knowledge, we are also the first to survey a number of connections between interpretability research and work in adversarial robustness, continual learning, modularity, network compression, and studying the human visual system. We discuss key challenges and argue that the status quo in interpretability research is largely unproductive. Finally, we highlight the importance of future work that emphasizes diagnostics, debugging, adversaries, and benchmarking in order to make interpretability tools more useful to engineers in practical applications.

    Semiotic Aggregation in Deep Learning

    Convolutional neural networks utilize a hierarchy of neural network layers. The statistical aspects of information concentration in successive layers can bring insight into the feature abstraction process. We analyze the saliency maps of these layers from the perspective of semiotics, also known as the study of signs and sign-using behavior. In computational semiotics, this aggregation operation (known as superization) is accompanied by a decrease in spatial entropy: signs are aggregated into supersigns. Using spatial entropy, we compute the information content of the saliency maps and study the superization processes that take place between successive layers of the network. In our experiments, we visualize the superization process and show how the obtained knowledge can be used to explain the neural decision model. In addition, we attempt to optimize the architecture of the neural model using a semiotic greedy technique. To the best of our knowledge, this is the first application of computational semiotics to the analysis and interpretation of deep neural networks.
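    To make "spatial entropy decreases under superization" concrete, the sketch below computes one common 2D image entropy: the Shannon entropy of the joint histogram of each pixel's quantized value and its quantized 3×3 neighbourhood mean. This is an assumption chosen for illustration; the abstract does not specify the exact spatial entropy the authors use, and the function name and the `levels` parameter are hypothetical. A map made of a few large uniform regions (a "supersign"-like map) should score lower than a noisy map.

```python
import numpy as np


def spatial_entropy_2d(saliency: np.ndarray, levels: int = 16) -> float:
    """Joint (pixel value, 3x3 neighbourhood mean) Shannon entropy in bits.

    Illustrative stand-in only; not necessarily the paper's definition.
    """
    s = saliency.astype(float)
    span = s.max() - s.min()
    # Min-max normalise to [0, 1]; a constant map becomes all zeros.
    s = (s - s.min()) / span if span > 0 else np.zeros_like(s)
    # 3x3 neighbourhood mean via edge padding and nine shifted views.
    p = np.pad(s, 1, mode="edge")
    h, w = s.shape
    neigh = sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    # Quantise both quantities to integer bins 0..levels-1.
    a = np.minimum((s * levels).astype(int), levels - 1)
    b = np.minimum((neigh * levels).astype(int), levels - 1)
    joint = np.zeros((levels, levels))
    np.add.at(joint, (a.ravel(), b.ravel()), 1)
    prob = joint[joint > 0] / joint.sum()
    return float(-(prob * np.log2(prob)).sum()) + 0.0  # + 0.0 turns -0.0 into 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.random((32, 32))                                # fine-grained "signs"
    blocky = np.kron(rng.random((2, 2)), np.ones((16, 16)))     # a few large "supersigns"
    print(spatial_entropy_2d(noisy), spatial_entropy_2d(blocky))
```

    Comparing per-layer values of such a measure across successive layers is one way to track the entropy drop the abstract associates with superization.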

    Perceptual Video Coding for Machines via Satisfied Machine Ratio Modeling

    Video Coding for Machines (VCM) aims to compress visual signals for machine analysis. However, existing methods consider only a few machines and neglect the majority. Moreover, machine perceptual characteristics are not effectively leveraged, leading to suboptimal compression efficiency. In this paper, we introduce the Satisfied Machine Ratio (SMR) to address these issues. SMR statistically measures the quality of compressed images and videos for machines by aggregating their satisfaction scores. Each score is calculated from the difference in machine perception between the original and compressed images. Targeting image classification and object detection tasks, we build two representative machine libraries for SMR annotation and construct a large-scale SMR dataset to facilitate SMR studies. We then propose an SMR prediction model based on the correlation between deep feature differences and SMR. Furthermore, we introduce an auxiliary task that increases prediction accuracy by predicting the SMR difference between two images at different quality levels. Extensive experiments demonstrate that using the SMR models significantly improves compression performance for VCM, and that the SMR models generalize well to unseen machines, traditional and neural codecs, and datasets. In summary, SMR enables perceptual coding for machines and advances VCM from specificity to generality. Code is available at https://github.com/ywwynm/SMR
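    The aggregation step, turning per-machine satisfaction into a single ratio, can be sketched as follows. This is a hypothetical simplification, not the SMR definition from the linked repository: each "machine" is any callable returning a class label, and a machine counts as satisfied when compression leaves its top-1 prediction unchanged. The toy machines below threshold the mean of a list of pixel values and exist only for illustration.

```python
from typing import Callable, Sequence

# Hypothetical machine interface: image in, predicted class label out.
Machine = Callable[[Sequence[float]], int]


def is_satisfied(machine: Machine, original: Sequence[float], compressed: Sequence[float]) -> bool:
    """Assumed satisfaction rule: the top-1 prediction survives compression."""
    return machine(original) == machine(compressed)


def satisfied_machine_ratio(
    machines: Sequence[Machine], original: Sequence[float], compressed: Sequence[float]
) -> float:
    """Fraction of machines that are satisfied with the compressed image."""
    if not machines:
        raise ValueError("need at least one machine")
    return sum(is_satisfied(m, original, compressed) for m in machines) / len(machines)


def make_threshold_machine(t: float) -> Machine:
    """Toy classifier: label 1 if the mean pixel value exceeds t, else 0."""
    return lambda img: int(sum(img) / len(img) > t)


if __name__ == "__main__":
    machines = [make_threshold_machine(t) for t in (0.3, 0.5, 0.7)]
    original = [0.6] * 4    # mean 0.6
    compressed = [0.45] * 4  # mean 0.45: only the t=0.5 machine changes its label
    print(satisfied_machine_ratio(machines, original, compressed))
```

    With these toy inputs two of the three machines keep their label, so the ratio is 2/3. Averaging over a large, diverse machine library is what lets such a score reflect "the majority" of machines rather than a single task model.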

    Combined AI Capabilities for Enhancing Maritime Safety in a Common Information Sharing Environment

    The complexity of maritime traffic operations creates an unprecedented need for the joint introduction and exploitation of artificial intelligence (AI) technologies that take advantage of the vast amount of vessel data offered by disparate surveillance systems to face challenges at sea. This paper reviews recent Big Data and AI technology implementations for enhancing maritime safety in the Common Information Sharing Environment (CISE) of the maritime agencies, including vessel behavior and anomaly monitoring and ship collision risk assessment. Specifically, it presents the trajectory fusion implemented with the InSyTo soft information fusion and management toolbox, together with the Early Notification module for Vessel Collision, both within the EFFECTOR Project. The focus is on the technical architecture of these modules and on the combined AI capabilities that achieve the desired interoperability and complementarity between maritime systems, aiming to provide better decision support and appropriate information for distribution among CISE maritime safety stakeholders.
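    Ship collision risk assessment is commonly built on two kinematic indicators, the distance at closest point of approach (DCPA) and the time to it (TCPA). The sketch below computes both for two vessels under a constant-velocity assumption and raises a simple alert. It is a generic illustration of that standard technique, not the EFFECTOR Early Notification module; the `Vessel` type, the flat east/north coordinates, and the alert thresholds are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Vessel:
    x: float   # east position, nautical miles (flat local frame, assumed)
    y: float   # north position, nautical miles
    vx: float  # east velocity, knots
    vy: float  # north velocity, knots


def cpa_tcpa(own: Vessel, target: Vessel) -> tuple[float, float]:
    """Return (DCPA in nm, TCPA in hours), assuming both vessels hold course and speed."""
    rx, ry = target.x - own.x, target.y - own.y      # relative position
    vx, vy = target.vx - own.vx, target.vy - own.vy  # relative velocity
    v2 = vx * vx + vy * vy
    if v2 == 0.0:
        # No relative motion: the separation never changes.
        return math.hypot(rx, ry), 0.0
    # Time minimising |r + v t|; clamp at 0 if the closest point is already past.
    tcpa = max(-(rx * vx + ry * vy) / v2, 0.0)
    dcpa = math.hypot(rx + vx * tcpa, ry + vy * tcpa)
    return dcpa, tcpa


def collision_alert(own: Vessel, target: Vessel, dcpa_limit: float = 0.5, tcpa_limit: float = 0.25) -> bool:
    """Hypothetical rule: alert when the vessels will pass close, soon."""
    dcpa, tcpa = cpa_tcpa(own, target)
    return dcpa < dcpa_limit and tcpa < tcpa_limit


if __name__ == "__main__":
    own = Vessel(0.0, 0.0, 0.0, 10.0)      # heading north at 10 kn
    target = Vessel(2.0, 2.0, -10.0, 0.0)  # 2 nm east and north, heading west at 10 kn
    print(cpa_tcpa(own, target))           # converging: DCPA 0 nm in 0.2 h (12 min)
    print(collision_alert(own, target))
```

    In a fused-trajectory setting, the positions and velocities fed to such a check would come from the fused tracks rather than from a single sensor, which is where trajectory fusion and collision notification meet.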