Extreme Image Compression using Fine-tuned VQGANs
Recent advances in generative compression methods have demonstrated
remarkable progress in enhancing the perceptual quality of compressed data,
especially in scenarios with low bitrates. However, their efficacy and
applicability remain limited at extreme compression ratios ( bpp). In this
work, we propose a simple yet effective coding framework that introduces
vector quantization (VQ)-based generative models into the image
compression domain. The main insight is that the codebook learned by the VQGAN
model yields a strong expressive capacity, facilitating efficient compression
of continuous information in the latent space while maintaining reconstruction
quality. Specifically, an image can be represented as VQ-indices by finding the
nearest codeword, which can be encoded using lossless compression methods into
bitstreams. We propose clustering a pre-trained large-scale codebook into
smaller codebooks through the K-means algorithm, yielding variable bitrates and
different levels of reconstruction quality within the coding framework.
Furthermore, we introduce a transformer to predict lost indices and restore
images in unstable environments. Extensive qualitative and quantitative
experiments on various benchmark datasets demonstrate that the proposed
framework outperforms state-of-the-art codecs in terms of perceptual
quality-oriented metrics and human perception at extremely low bitrates
( bpp). Remarkably, even with the loss of up to of indices, the images
can be effectively restored with minimal perceptual loss.
Comment: Generative Compression, Extreme Compression, VQGANs, Low Bitrate
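The two core coding steps described above, mapping each latent vector to its nearest codeword and shrinking a large pre-trained codebook with K-means to trade bitrate for quality, can be sketched as follows. This is a minimal NumPy sketch under our own assumptions; the paper's actual VQGAN codebook, latent dimensions, and training details are not reproduced here.

```python
import numpy as np

def quantize_to_indices(latents, codebook):
    """Map each latent vector to the index of its nearest codeword (L2 distance)."""
    # latents: (N, D), codebook: (K, D) -> pairwise squared distances (N, K)
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)  # these indices are what gets losslessly encoded

def shrink_codebook(codebook, k_small, iters=20, seed=0):
    """Cluster a large codebook into a smaller one with plain K-means,
    giving a coarser, lower-bitrate quantizer."""
    rng = np.random.default_rng(seed)
    centers = codebook[rng.choice(len(codebook), size=k_small, replace=False)].copy()
    for _ in range(iters):
        assign = quantize_to_indices(codebook, centers)
        for j in range(k_small):
            members = codebook[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers
```

With a 64-entry codebook each index costs at most 6 bits before entropy coding; shrinking it to 8 entries halves that to 3 bits per index at the cost of coarser reconstruction, which is how the framework exposes variable bitrates.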
Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks
The last decade of machine learning has seen drastic increases in scale and
capabilities. Deep neural networks (DNNs) are increasingly being deployed in
the real world. However, they are difficult to analyze, raising concerns about
using them without a rigorous understanding of how they function. Effective
tools for interpreting them will be important for building more trustworthy AI
by helping to identify problems, fix bugs, and improve basic understanding. In
particular, "inner" interpretability techniques, which focus on explaining the
internal components of DNNs, are well-suited for developing a mechanistic
understanding, guiding manual modifications, and reverse engineering solutions.
Much recent work has focused on DNN interpretability, and rapid progress has
thus far made a thorough systematization of methods difficult. In this survey,
we review over 300 works with a focus on inner interpretability tools. We
introduce a taxonomy that classifies methods by what part of the network they
help to explain (weights, neurons, subnetworks, or latent representations) and
whether they are implemented during (intrinsic) or after (post hoc) training.
To our knowledge, we are also the first to survey a number of connections
between interpretability research and work in adversarial robustness, continual
learning, modularity, network compression, and studying the human visual
system. We discuss key challenges and argue that the status quo in
interpretability research is largely unproductive. Finally, we highlight the
importance of future work that emphasizes diagnostics, debugging, adversaries,
and benchmarking in order to make interpretability tools more useful to
engineers in practical applications.
Semiotic Aggregation in Deep Learning
Convolutional neural networks utilize a hierarchy of neural network layers. The statistical aspects of information concentration in successive layers can offer insight into the feature abstraction process. We analyze the saliency maps of these layers from the perspective of semiotics, the study of signs and sign-using behavior. In computational semiotics, this aggregation operation (known as superization) is accompanied by a decrease of spatial entropy: signs are aggregated into supersigns. Using spatial entropy, we compute the information content of the saliency maps and study the superization processes that take place between successive layers of the network. In our experiments, we visualize the superization process and show how the obtained knowledge can be used to explain the neural decision model. In addition, we attempt to optimize the architecture of the neural model employing a semiotic greedy technique. To the best of our knowledge, this is the first application of computational semiotics to the analysis and interpretation of deep neural networks.
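The entropy decrease that accompanies superization can be estimated concretely. A minimal sketch, assuming a simple histogram estimator over saliency-map intensities, which may differ from the paper's exact spatial-entropy definition:

```python
import numpy as np

def spatial_entropy(saliency, bins=32):
    """Shannon entropy (in bits) of a saliency map's intensity histogram.
    Lower values suggest information has been aggregated (superization)."""
    s = np.asarray(saliency, dtype=float)
    hist, _ = np.histogram(s, bins=bins, range=(s.min(), s.max() + 1e-9))
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins; 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())
```

Comparing this quantity across the saliency maps of successive layers is one way to make the superization trend visible: a diffuse early-layer map scores high, while a map concentrated on a few supersigns scores low.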
Perceptual Video Coding for Machines via Satisfied Machine Ratio Modeling
Video Coding for Machines (VCM) aims to compress visual signals for machine
analysis. However, existing methods only consider a few machines, neglecting
the majority. Moreover, the machine perceptual characteristics are not
effectively leveraged, leading to suboptimal compression efficiency. In this
paper, we introduce Satisfied Machine Ratio (SMR) to address these issues. SMR
statistically measures the quality of compressed images and videos for machines
by aggregating satisfaction scores from them. Each score is calculated based on
the difference in machine perceptions between original and compressed images.
Targeting image classification and object detection tasks, we build two
representative machine libraries for SMR annotation and construct a large-scale
SMR dataset to facilitate SMR studies. We then propose an SMR prediction model
based on the correlation between deep feature differences and SMR.
Furthermore, we introduce an auxiliary task to increase prediction accuracy
by predicting the SMR difference between two images at different quality
levels. Extensive experiments demonstrate that using the SMR models
significantly improves compression performance for VCM, and the SMR models
generalize well to unseen machines, traditional and neural codecs, and
datasets. In summary, SMR enables perceptual coding for machines and advances
VCM from specificity to generality. Code is available at
\url{https://github.com/ywwynm/SMR}
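At its core, SMR aggregates per-machine satisfaction into a single ratio. A minimal sketch under our own assumptions: a fixed tolerance on per-machine score drop stands in for the paper's perception-difference-based satisfaction score.

```python
def satisfied_machine_ratio(orig_metrics, comp_metrics, tol=0.02):
    """Fraction of machines still 'satisfied' with the compressed input.
    orig_metrics / comp_metrics: per-machine task scores (e.g. accuracy)
    on the original vs. compressed image; a machine counts as satisfied
    when its score drop stays within tol (the threshold is an assumption)."""
    satisfied = sum(o - c <= tol for o, c in zip(orig_metrics, comp_metrics))
    return satisfied / len(orig_metrics)
```

Averaging this ratio over a library of machines, rather than optimizing for one fixed model, is what lets SMR-guided coding generalize to unseen machines and codecs.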
Combined AI Capabilities for Enhancing Maritime Safety in a Common Information Sharing Environment
The complexity of maritime traffic operations creates a pressing need for the joint introduction and exploitation of artificial intelligence (AI) technologies that take advantage of the vast amount of vessel data offered by disparate surveillance systems to face challenges at sea. This paper reviews recent Big Data and AI implementations for enhancing maritime safety in the common information sharing environment (CISE) of the maritime agencies, including vessel behavior and anomaly monitoring and ship collision risk assessment. Specifically, the trajectory fusion implemented with the InSyTo soft information fusion and management toolbox module, and the Early Notification module for vessel collision, are presented within the EFFECTOR Project. The focus is on the technical architecture of these modules and their combined AI capabilities for achieving the desired interoperability and complementarity between maritime systems, aiming to provide better decision support and proper information distribution among CISE maritime safety stakeholders.