Visibility metrics and their applications in visually lossless image compression
Visibility metrics are image metrics that predict the probability that a human observer can detect differences between a pair of images. These metrics can provide localized information in the form of visibility maps, in which each value represents a probability of detection. An important application of visibility metrics is visually lossless image compression, which aims to compress a given image to the lowest number of bits per pixel while keeping the compression artifacts invisible.
In previous work, most visibility metrics were built on heavily simplified assumptions and mathematical models of the human visual system. This approach generally fits experimental data measured with simple stimuli, such as Gabor patches. However, it does not predict complex non-linear effects, such as contrast masking in natural images, particularly well. To predict the visibility of image differences accurately, we collected the largest visibility dataset under fixed viewing conditions for calibrating existing visibility metrics and proposed a deep neural network-based visibility metric. Our experiments showed that this metric significantly outperformed existing visibility metrics.
However, the deep neural network-based visibility metric cannot predict visibility under varying viewing conditions, such as display brightness and viewing distance, which strongly affect the visibility of distortions. To extend the metric to varying viewing conditions, we collected the largest visibility dataset under varying display brightness and viewing distances. We proposed incorporating white-box modules, namely luminance masking and viewing distance adaptation, into the black-box deep neural network, and found that this combination generalizes our proposed visibility metric to varying viewing conditions.
To demonstrate the application of our proposed deep neural network-based visibility metric to visually lossless image compression, we collected a visually lossless image compression dataset under fixed viewing conditions. We significantly improved the metric's accuracy in predicting the visually lossless compression threshold by pre-training it on a synthetic dataset generated by the state-of-the-art white-box visibility metric HDR-VDP \cite{Mantiuk2011}. In a large-scale study of 1000 images, we found that our improved visibility metric can save around 60% to 70% of the bits for visually lossless image compression compared to the default visually lossless quality level of 90.
Because predicting image visibility and predicting image quality are closely related research topics, we also proposed a trained perceptually uniform transform for high dynamic range image and video quality assessment, obtained by training a perceptual encoding function on a set of subjective quality assessment datasets. We have shown that combining the trained perceptual encoding function with standard dynamic range image quality metrics, such as peak signal-to-noise ratio (PSNR), achieves better performance than the untrained version.
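The visually lossless compression application described in this abstract can be pictured as a search over encoder quality levels: find the lowest quality at which a visibility metric still predicts that artifacts go undetected. The sketch below is only an illustration under stated assumptions. The callable `detection_prob`, the 0.5 threshold, the 1 to 100 quality range, and the monotonicity assumption (higher quality never increases detection probability) are all hypothetical and not taken from the thesis.

```python
from typing import Callable


def lowest_invisible_quality(
    detection_prob: Callable[[int], float],
    threshold: float = 0.5,  # assumed detection threshold, not from the source
    q_min: int = 1,
    q_max: int = 100,
) -> int:
    """Binary-search the smallest quality whose predicted detection
    probability is at or below `threshold`.

    Assumes `detection_prob` is non-increasing in quality. If no quality
    level satisfies the threshold, `q_max` is returned.
    """
    lo, hi = q_min, q_max
    while lo < hi:
        mid = (lo + hi) // 2
        if detection_prob(mid) <= threshold:
            hi = mid        # artifacts predicted invisible: try lower quality
        else:
            lo = mid + 1    # artifacts predicted visible: need higher quality
    return lo


def toy_metric(quality: int) -> float:
    """Hypothetical stand-in for a visibility metric (not a real model)."""
    return max(0.0, min(1.0, (70 - quality) / 40))
```

In practice `detection_prob` would encode the image at the given quality and run the visibility metric on the original and decoded pair; the toy function above only mimics the expected monotone shape.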
Amata: An Annealing Mechanism for Adversarial Training Acceleration
Despite their empirical success in various domains, deep neural networks have
been shown to be vulnerable to maliciously perturbed input data that severely
degrade their performance. Such perturbations are known as adversarial attacks. To
counter adversarial attacks, adversarial training formulated as a form of
robust optimization has been demonstrated to be effective. However,
adversarial training incurs substantial computational overhead compared with
standard training. To reduce this cost, we propose an annealing mechanism,
Amata. The proposed method is provably convergent, well motivated through the
lens of optimal control theory, and can be combined with existing acceleration
methods
to further enhance performance. It is demonstrated that on standard datasets,
Amata can achieve similar or better robustness with around 1/3 to 1/2 the
computational time compared with traditional methods. In addition, Amata can be
incorporated into other adversarial training acceleration algorithms (e.g.
YOPO, Free, Fast, and ATTA), which leads to further reduction in computational
time on large-scale problems.
Comment: accepted by AAA
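To make the cost argument concrete: in PGD-style adversarial training, each attack step needs an extra gradient computation, so the number of attack steps per update largely drives the overhead. The sketch below illustrates one plausible reading of an annealing mechanism, namely using few attack steps early in training and more later. The linear schedule, the logistic-regression model, the step-size rule, and all function names are assumptions made for illustration; they are not the Amata algorithm as specified in the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sign(v):
    return (v > 0) - (v < 0)


def pgd_attack(w, x, y, eps, steps):
    """L-infinity PGD on the logistic loss of one example (y in {0, 1})."""
    x_adv = list(x)
    step_size = eps / max(steps, 1)  # assumed coupling of step size to steps
    for _ in range(steps):
        err = sigmoid(dot(w, x_adv)) - y  # d(loss)/d(logit)
        # Ascend the loss, then project back into the eps-ball around x.
        x_adv = [
            min(max(xa + step_size * sign(err * wi), xi - eps), xi + eps)
            for xa, wi, xi in zip(x_adv, w, x)
        ]
    return x_adv


def annealed_steps(epoch, epochs, k_min=1, k_max=5):
    """Hypothetical linear schedule from k_min to k_max attack steps."""
    frac = epoch / max(epochs - 1, 1)
    return int(round(k_min + frac * (k_max - k_min)))


def train(xs, ys, epochs=20, eps=0.1, lr=0.5):
    """Adversarial training of logistic regression with annealed attacks."""
    w = [0.0] * len(xs[0])
    for epoch in range(epochs):
        k = annealed_steps(epoch, epochs)  # cheap attacks early, stronger later
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            x_adv = pgd_attack(w, x, y, eps, k)
            err = sigmoid(dot(w, x_adv)) - y
            for j, xj in enumerate(x_adv):
                grad[j] += err * xj / len(xs)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w
```

With a fixed schedule of `k_max` steps every epoch, the attack cost would be constant throughout; the annealed schedule spends less on early epochs, which is where the claimed savings would come from in this toy setting.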
Achieving Adversarial Robustness via Sparsity
Network pruning is known to produce compact models without much
accuracy degradation. However, how the pruning process affects a network's
robustness, and the mechanism behind this effect, remain unresolved. In this work, we
theoretically prove that the sparsity of network weights is closely associated
with model robustness. Through experiments on a variety of adversarial pruning
methods, we find that weight sparsity does not hurt robustness but improves it,
and that both weight inheritance from the lottery ticket and adversarial training
improve model robustness in network pruning. Based on these findings, we
propose a novel adversarial training method called inverse weights inheritance,
which imposes a sparse weight distribution on a large network by inheriting
weights from a small network, thereby improving the robustness of the large
network.
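As a reminder of what weight sparsity means operationally, the sketch below shows plain magnitude pruning, a common generic pruning scheme. It is an assumption-laden illustration only: the abstract does not say which pruning criteria are used, and this is not the inverse weights inheritance method. The function names are hypothetical.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending |w|; the first n_prune are removed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def sparsity_of(weights: List[float]) -> float:
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)
```

For example, pruning `[0.5, -0.1, 0.05, -2.0]` at 50% sparsity keeps only the two largest-magnitude weights.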
G-NAS: Generalizable Neural Architecture Search for Single Domain Generalization Object Detection
In this paper, we focus on a realistic yet challenging task, Single Domain
Generalization Object Detection (S-DGOD), where only one source domain's data
can be used for training object detectors, which must then generalize to multiple
distinct target domains. In S-DGOD, both high-capacity fitting and
generalization abilities are needed due to the task's complexity.
Differentiable Neural Architecture Search (NAS) is known for its high capacity
for complex data fitting and we propose to leverage Differentiable NAS to solve
S-DGOD. However, it may confront severe over-fitting issues due to the feature
imbalance phenomenon, where parameters optimized by gradient descent are biased
to learn from the easy-to-learn features, which are usually non-causal and
spuriously correlated to ground truth labels, such as the features of
background in object detection data. Consequently, this leads to serious
performance degradation, especially in generalizing to unseen target domains
with huge domain gaps between the source domain and target domains. To address
this issue, we propose the Generalizable loss (G-loss), which is an OoD-aware
objective, preventing NAS from over-fitting by using gradient descent to
optimize parameters not only on a subset of easy-to-learn features but also the
remaining predictive features for generalization, and the overall framework is
named G-NAS. Experimental results on the S-DGOD urban-scene datasets
demonstrate that the proposed G-NAS achieves SOTA performance compared to
baseline methods. Code is available at https://github.com/wufan-cse/G-NAS.
Comment: Accepted by AAAI2
DecAug: Out-of-Distribution Generalization via Decomposed Feature Representation and Semantic Augmentation
While deep learning demonstrates a strong ability to handle independent and
identically distributed (IID) data, it often struggles with out-of-distribution
(OoD) generalization, where the test data come from a distribution that differs
from the training distribution. Designing a general OoD generalization framework for
a wide range of applications is challenging, mainly due to possible correlation
shift and diversity shift in the real world. Most of the previous approaches
can only solve one specific distribution shift, such as shift across domains or
the extrapolation of correlation. To address that, we propose DecAug, a novel
decomposed feature representation and semantic augmentation approach for OoD
generalization. DecAug disentangles the category-related and context-related
features. Category-related features contain causal information of the target
object, while context-related features describe the attributes, styles,
backgrounds, or scenes, causing distribution shifts between training and test
data. The decomposition is achieved by orthogonalizing the two gradients
(w.r.t. intermediate features) of losses for predicting category and context
labels. Furthermore, we perform gradient-based augmentation on context-related
features to improve the robustness of the learned representations. Experimental
results show that DecAug outperforms other state-of-the-art methods on various
OoD datasets and is among the very few methods that can deal with different
types of OoD generalization challenges.
Comment: Accepted by AAAI202
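The idea of orthogonalizing the two gradients can be illustrated with a simple penalty that is zero when the category-loss gradient and the context-loss gradient are orthogonal and grows as they align. The squared-cosine form below is an assumption chosen for illustration, since the abstract does not state the exact regularizer; the function names and the `tiny` stabilizer are likewise hypothetical.

```python
from typing import Sequence


def orthogonality_penalty(
    grad_category: Sequence[float],
    grad_context: Sequence[float],
    tiny: float = 1e-12,  # guards against division by zero
) -> float:
    """Squared cosine similarity between two gradient vectors.

    Returns 0 for orthogonal gradients and approaches 1 for parallel ones.
    """
    inner = sum(a * b for a, b in zip(grad_category, grad_context))
    norm_cat = sum(a * a for a in grad_category)
    norm_ctx = sum(b * b for b in grad_context)
    return inner * inner / (norm_cat * norm_ctx + tiny)


def remove_component(
    grad_context: Sequence[float], grad_category: Sequence[float]
) -> list:
    """Project grad_context onto the subspace orthogonal to grad_category."""
    norm_cat = sum(a * a for a in grad_category)
    if norm_cat == 0.0:
        return list(grad_context)
    scale = sum(a * b for a, b in zip(grad_category, grad_context)) / norm_cat
    return [b - scale * a for a, b in zip(grad_category, grad_context)]
```

Adding such a penalty to the training loss would push the two feature branches toward carrying non-overlapping information; the projection helper shows the same idea as a hard constraint instead of a soft one.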
Bayesian adversarial learning
Deep neural networks are known to be vulnerable to adversarial attacks, raising many security concerns for practical deployment. Popular defensive approaches can be formulated as a (distributionally) robust optimization problem, which minimizes a "point estimate" of the worst-case loss derived from either a per-datum perturbation or an adversarial data-generating distribution within certain predefined constraints. This point estimate ignores potential test adversaries that lie beyond the predefined constraints, so model robustness might deteriorate sharply when the test adversarial data are stronger. In this work, a novel robust training framework, Bayesian Robust Learning, is proposed to alleviate this issue. It places a distribution on the adversarial data-generating distribution to account for the uncertainty of the adversarial data-generating process. This uncertainty directly helps to account for potential adversaries that are stronger than the point estimate used in distributionally robust optimization. The uncertainty of model parameters is also incorporated to accommodate the full Bayesian framework. We design a scalable Markov Chain Monte Carlo sampling strategy to obtain the posterior distribution over model parameters. Various experiments are conducted to verify the superiority of BAL over existing adversarial training methods. The code for BAL is available at https://tinyurl.com/yexsaewr.
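For the per-datum perturbation case, the robust optimization problem mentioned in this abstract is commonly written as a min-max objective. The notation below (model $f_\theta$, loss $\ell$, budget $\epsilon$, data distribution $\mathcal{D}$, and the $\ell_p$ norm ball) follows the usual convention in the adversarial training literature and is assumed here; it is not taken from the abstract itself.

```latex
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}}
  \left[ \max_{\|\delta\|_p \le \epsilon}
         \ell\big(f_\theta(x + \delta),\, y\big) \right]
```

The inner maximum plays the role of the "point estimate" of the worst-case loss: any adversary whose perturbation falls outside the $\epsilon$-ball is not covered by this objective, which is the gap the abstract sets out to address.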