When Causal Intervention Meets Adversarial Examples and Image Masking for Deep Neural Networks
Discovering and exploiting the causality in deep neural networks (DNNs) is a
crucial challenge for understanding and reasoning about causal effects (CE) in
an explainable visual model. "Intervention" has been widely used for recognizing a
causal relation ontologically. In this paper, we propose a causal inference
framework for visual reasoning via do-calculus. To study the intervention
effects on pixel-level features for causal reasoning, we introduce pixel-wise
masking and adversarial perturbation. In our framework, CE is calculated using
features in a latent space and perturbed prediction from a DNN-based model. We
further provide the first look into the characteristics of discovered CE of
adversarially perturbed images generated by gradient-based methods
\footnote{https://github.com/jjaacckkyy63/Causal-Intervention-AE-wAdvImg}.
Experimental results show that CE is a competitive and robust index for
understanding DNNs when compared with conventional methods such as
class-activation mappings (CAMs) on the Chest X-Ray-14 dataset for
human-interpretable feature(s) (e.g., symptom) reasoning. Moreover, CE holds
promise for detecting adversarial examples, as it possesses distinct
characteristics in the presence of adversarial perturbations.
Comment: Note that our camera-ready version has changed the title: "When Causal
Intervention Meets Adversarial Examples and Image Masking for Deep Neural
Networks" is the v3 official paper title in the IEEE proceedings. Please use it
in your formal reference. Accepted at IEEE ICIP 2019. PyTorch code has been
released at https://github.com/jjaacckkyy63/Causal-Intervention-AE-wAdvImg
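The intervention idea above can be illustrated with a minimal sketch: compare a classifier's target-class score on the original input against its score after a do-style pixel masking. The `toy_model`, weights, and mask layout here are all hypothetical stand-ins for the paper's DNN and latent-space setup, not its actual implementation.

```python
import numpy as np

def toy_model(x, W):
    """Stand-in for a DNN classifier: softmax over a linear map."""
    logits = W @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()

def causal_effect(x, W, mask, target):
    """Crude CE-style index: the change in the target-class score when a
    pixel region is intervened on (set to zero), approximating
    P(y | x) - P(y | do(mask))."""
    p_obs = toy_model(x, W)[target]
    x_do = x * (1 - mask)          # do-intervention: zero out masked pixels
    p_do = toy_model(x_do, W)[target]
    return p_obs - p_do

rng = np.random.default_rng(0)
x = rng.random(16)                 # flattened 4x4 "image"
W = rng.standard_normal((3, 16))   # 3-class toy classifier
mask = np.zeros(16)
mask[:4] = 1.0                     # intervene on the first row of pixels
ce = causal_effect(x, W, mask, target=0)
```

A large |CE| under masking (or under an adversarial perturbation in place of the mask) is the kind of signal the abstract proposes as an interpretability index.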
Space Net Optimization
Most metaheuristic algorithms rely on a few searched solutions to guide later
searches during the convergence process for a simple reason: the limited
computing resources of a computer make it impossible to retain all the searched
solutions. This also reveals that each search of most metaheuristic algorithms
is just like a ballpark guess. To help address this issue, we present a novel
metaheuristic algorithm called space net optimization (SNO). It is equipped
with a new mechanism called space net; thus, making it possible for a
metaheuristic algorithm to use most information provided by all searched
solutions to depict the landscape of the solution space. With the space net, a
metaheuristic algorithm in effect gains a ``vision'' of the solution space.
Simulation results show that SNO outperforms the other metaheuristic
algorithms compared in this study on a set of well-known single-objective
bound-constrained problems in most cases.
Comment: 12 pages, 6 figures
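The core contrast the abstract draws, retaining every evaluated solution instead of only a few, can be sketched generically. The snippet below is NOT SNO's space-net mechanism (which the abstract does not specify); it only illustrates an archive that keeps all searched solutions and uses the archive's best point to bias new samples.

```python
import random

def archive_guided_minimize(f, lo, hi, iters=200, seed=0):
    """Generic archive-based search sketch: every evaluated solution is
    retained, and new candidates are sampled around the archive's
    current best point. Illustrative only, not the SNO algorithm."""
    rng = random.Random(seed)
    archive = [(x, f(x)) for x in (rng.uniform(lo, hi) for _ in range(10))]
    for _ in range(iters):
        best_x, _ = min(archive, key=lambda p: p[1])
        cand = min(max(best_x + rng.gauss(0, 0.1 * (hi - lo)), lo), hi)
        archive.append((cand, f(cand)))   # nothing is ever discarded
    return min(archive, key=lambda p: p[1])

best_x, best_val = archive_guided_minimize(lambda x: (x - 2.0) ** 2,
                                           lo=-5.0, hi=5.0)
```

A full-archive scheme like this trades memory for a richer picture of the landscape, which is the trade-off the abstract argues most metaheuristics avoid at the cost of "ballpark" guesses.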
Treatment Learning Causal Transformer for Noisy Image Classification
Current top-notch deep learning (DL) based vision models are primarily based
on exploring and exploiting the inherent correlations between training data
samples and their associated labels. However, a known practical challenge is
their degraded performance against "noisy" data, induced by different
circumstances such as spurious correlations, irrelevant contexts, domain shift,
and adversarial attacks. In this work, we incorporate this binary information
of "existence of noise" as treatment into image classification tasks to improve
prediction accuracy by jointly estimating the treatment effects. Motivated by
causal variational inference, we propose a transformer-based architecture,
Treatment Learning Causal Transformer (TLT), that uses a latent generative
model to estimate robust feature representations from current observational
input for noisy image classification. Depending on the estimated noise level
(modeled as a binary treatment factor), TLT assigns the corresponding inference
network trained by the designed causal loss for prediction. We also create new
noisy image datasets incorporating a wide range of noise factors (e.g., object
masking, style transfer, and adversarial perturbation) for performance
benchmarking. The superior performance of TLT in noisy image classification is
further validated by several refutation evaluation metrics. As a by-product,
TLT also improves visual salience methods for perceiving noisy images.
Comment: Accepted to IEEE WACV 2023. The first version was finished in May 201
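TLT's routing idea, estimate a binary treatment (noise present or absent) and dispatch the input to the matching inference network, can be sketched as below. The variance-threshold detector and the placeholder heads are hypothetical stand-ins; the paper's noise estimate comes from a latent generative model, not a pixel statistic.

```python
import numpy as np

def estimate_treatment(x, threshold=0.05):
    """Hypothetical binary noise detector: treat high pixel variance as
    'noise present'. A stand-in for TLT's latent generative estimate."""
    return 1 if np.var(x) > threshold else 0

def classify(x, clean_head, noisy_head):
    """Route the input to the inference network matching the estimated
    treatment, mirroring TLT's treatment-conditioned heads."""
    t = estimate_treatment(x)
    return (noisy_head if t == 1 else clean_head)(x), t

clean_head = lambda x: "cat"                   # placeholder inference nets
noisy_head = lambda x: "cat (denoised path)"

flat_image = np.full(16, 0.5)                  # uniform -> zero variance
noisy_image = np.array([0.0, 1.0] * 8)         # alternating -> high variance
label_clean, t_clean = classify(flat_image, clean_head, noisy_head)
label_noisy, t_noisy = classify(noisy_image, clean_head, noisy_head)
```

Conditioning the prediction path on the treatment, rather than averaging over it, is what lets the causal loss target each regime separately.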
Interpretable Self-Attention Temporal Reasoning for Driving Behavior Understanding
Performing driving behaviors based on causal reasoning is essential to ensure
driving safety. In this work, we investigated how state-of-the-art 3D
Convolutional Neural Networks (CNNs) perform on classifying driving behaviors
based on causal reasoning. We proposed a perturbation-based visual explanation
method to inspect the models' performance visually. By examining the video
attention saliency, we found that existing models could not precisely capture
the causes (e.g., traffic light) of the specific action (e.g., stopping).
Therefore, the Temporal Reasoning Block (TRB) was proposed and introduced to
the models. With the TRB models, we achieved higher classification accuracy,
outperforming the state-of-the-art 3D CNNs from previous works. The
attention saliency also demonstrated that TRB helped models focus on the causes
more precisely. With both numerical and visual evaluations, we concluded that
our proposed TRB models were able to provide accurate driving behavior
prediction by learning the causal reasoning of the behaviors.
Comment: Submitted to IEEE ICASSP 2020; PyTorch code will be released soon
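A perturbation-based explanation of the kind described above can be sketched by occluding one frame at a time and measuring the drop in the target-class score: frames whose occlusion hurts most are the candidate causes (e.g., the frame showing the traffic light). The `toy_model` here is a hypothetical stand-in for a 3D CNN, and the exact perturbation scheme in the paper may differ.

```python
import numpy as np

def temporal_saliency(video, model, target):
    """Occlude each frame in turn and record the drop in the target
    score; larger drops mark frames the model relies on most."""
    base = model(video)[target]
    sal = np.zeros(video.shape[0])
    for t in range(video.shape[0]):
        occluded = video.copy()
        occluded[t] = 0.0               # mask out frame t
        sal[t] = base - model(occluded)[target]
    return sal

def toy_model(video):
    """Placeholder for a 3D CNN whose score depends mostly on frame 2
    (imagine that frame contains the red traffic light)."""
    s = video[2].mean() * 2.0 + video.mean()
    return np.array([s, 1.0 - s])

video = np.full((5, 4, 4), 0.5)         # 5 frames of 4x4 "pixels"
sal = temporal_saliency(video, toy_model, target=0)
```

As expected, the saliency peaks at the frame the toy model actually depends on, which is the kind of check the abstract uses to show existing 3D CNNs attend to the wrong frames.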
Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting
We explore the ability of large language models (LLMs) to act as speech
recognition post-processors that perform rescoring and error correction. Our
first focus is on instruction prompting to let LLMs perform these tasks without
fine-tuning, for which we evaluate different prompting schemes, both zero- and
few-shot in-context learning, and a novel task-activating prompting method that
combines causal instructions and demonstrations to enrich the context window.
Next, we show that rescoring only by in-context learning with frozen LLMs
achieves results that are competitive with rescoring by domain-tuned LMs, using
a pretrained first-pass recognition system and rescoring output on two
out-of-domain tasks (ATIS and WSJ). By combining prompting techniques with
fine-tuning we achieve error rates below the N-best oracle level, showcasing
the generalization power of the LLMs.
Comment: Accepted to IEEE Automatic Speech Recognition and Understanding
(ASRU) 2023. 8 pages. 2nd version revised from Sep 29th's version
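The prompting setup described above, a causal instruction plus in-context demonstrations followed by the N-best hypotheses to correct, can be sketched as a prompt builder. The field layout below is illustrative, not the paper's exact template, and the instruction text is a made-up example.

```python
def build_tap_prompt(instruction, demos, hypotheses):
    """Assemble a task-activating-style prompt: a causal instruction,
    worked demonstrations, then the N-best list awaiting correction."""
    parts = [instruction]
    for noisy, corrected in demos:
        parts.append(f"Hypotheses: {noisy}\nCorrected: {corrected}")
    parts.append("Hypotheses: " + " | ".join(hypotheses))
    parts.append("Corrected:")
    return "\n\n".join(parts)

prompt = build_tap_prompt(
    "You fix speech recognition errors. Pick or repair the best hypothesis.",
    [("i red a book", "i read a book")],          # one demonstration
    ["the whether is nice", "the weather is nice"],  # N-best to rescore
)
```

Feeding such a prompt to a frozen LLM and taking its completion as the corrected transcript is the in-context rescoring setup the abstract compares against domain-tuned LMs.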