Adversarial Zoom Lens: A Novel Physical-World Attack to DNNs
Although deep neural networks (DNNs) are known to be fragile, the effect that zooming images in and out in the physical world has on DNN performance has not been studied. In this paper, we demonstrate a novel physical adversarial attack technique called Adversarial Zoom Lens (AdvZL), which uses a zoom lens to zoom in and out of pictures of the physical world, fooling DNNs without changing the characteristics of the target object. To our knowledge, the proposed method is so far the only adversarial attack technique that fools DNNs without adding any physical adversarial perturbation. In the digital environment, we construct a dataset based on AdvZL to verify the adversarial effect of equal-scale enlarged images on DNNs. In the physical environment, we manipulate the zoom lens to zoom in and out of the target object and generate adversarial samples. The experimental results demonstrate the effectiveness of AdvZL in both digital and physical environments. We further analyze the adversarial effect of the proposed dataset on improved DNNs, and we provide a guideline for defending against AdvZL by means of adversarial training. Finally, we discuss the threats the proposed approach poses to future autonomous driving, as well as variant attack ideas similar to the proposed attack.
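Since the digital analogue of AdvZL is simply equal-scale rescaling, its core idea can be probed in a few lines. The sketch below checks whether a classifier's prediction flips as a picture is zoomed in and out, with an off-the-shelf ResNet-50 standing in for the target DNN and a hypothetical input image; the paper's actual dataset construction and zoom factors are not given in the abstract.

```python
# A minimal sketch of the digital side of AdvZL, assuming an off-the-shelf
# ResNet-50 as the target DNN and a hypothetical input image.
import torch
import torchvision.transforms.functional as TF
from torchvision import models
from PIL import Image

def zoom(img: Image.Image, factor: float) -> Image.Image:
    """Approximate an optical zoom while keeping the output resolution fixed."""
    w, h = img.size
    if factor >= 1.0:
        # Zoom in: crop the central 1/factor region, then scale back up.
        img = TF.center_crop(img, [int(h / factor), int(w / factor)])
    else:
        # Zoom out: shrink the image (a real zoom-out would also reveal new
        # background, which cannot be recovered digitally).
        img = TF.resize(img, [int(h * factor), int(w * factor)])
    return TF.resize(img, [h, w])

weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()

img = Image.open("target_object.jpg").convert("RGB")  # hypothetical image
with torch.no_grad():
    for factor in (0.5, 1.0, 1.5, 2.0, 3.0):
        logits = model(preprocess(zoom(img, factor)).unsqueeze(0))
        # A prediction that changes with the zoom factor, while the target
        # object itself is unchanged, is the AdvZL failure mode.
        print(f"zoom x{factor}: predicted class {logits.argmax(1).item()}")
```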
Efficient Cross-Task Prompt Tuning for Few-Shot Conversational Emotion Recognition
Emotion Recognition in Conversation (ERC) has been widely studied due to its
importance in developing emotion-aware empathetic machines. The rise of
pre-trained language models (PLMs) has further pushed the limit of ERC
performance. However, most recent works on ERC using PLMs are heavily data-driven and require fine-tuning of the entire PLM. To improve both sample
and computational efficiency, we propose a derivative-free optimization method
called Cross-Task Prompt Tuning (CTPT) for few-shot conversational emotion
recognition. Unlike existing methods that learn independent knowledge from
individual tasks, CTPT leverages sharable cross-task knowledge by exploiting
external knowledge from other source tasks to improve learning performance
under the few-shot setting. Moreover, CTPT only needs to optimize a vector of low intrinsic dimensionality, without gradients, which makes it highly parameter-efficient compared with existing approaches. Experiments on five
different contextual conversation datasets demonstrate that our CTPT method has
superior results in both few-shot scenarios and zero-shot transfer.
Comment: Findings of EMNLP 2023
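The abstract's two efficiency ideas, derivative-free optimization and a low intrinsic dimensionality, can be illustrated compactly: optimize a small vector z and map it into the frozen PLM's prompt-embedding space through a fixed random projection. The sketch below is a minimal, assumption-laden rendition: all dimensions are invented, the loss function is a stub for the frozen PLM's few-shot loss, a simple (1+1) evolution strategy stands in for whatever derivative-free optimizer the paper employs, and the cross-task knowledge transfer that gives CTPT its name is not reproduced.

```python
# A minimal sketch of gradient-free prompt tuning in a low intrinsic
# dimension. Everything here is a placeholder, not the paper's method.
import numpy as np

PROMPT_LEN, EMB_DIM, INTRINSIC_DIM = 20, 768, 100
rng = np.random.default_rng(0)

# Fixed random projection from the small search space to the full
# prompt-embedding space: only INTRINSIC_DIM numbers are ever optimized.
A = rng.normal(0.0, 0.01, size=(PROMPT_LEN * EMB_DIM, INTRINSIC_DIM))

def loss_on_few_shot_set(prompt_embeddings: np.ndarray) -> float:
    """Stub: prepend these soft-prompt embeddings to the frozen PLM's input
    and return the loss on the few-shot training set. Toy quadratic here."""
    return float((prompt_embeddings ** 2).mean())

def loss_of(z: np.ndarray) -> float:
    return loss_on_few_shot_set((A @ z).reshape(PROMPT_LEN, EMB_DIM))

# A simple (1+1) evolution strategy: mutate, keep improvements, adapt the
# step size. No gradients through the PLM are required, so the model can be
# treated as a pure black box.
z, sigma = np.zeros(INTRINSIC_DIM), 1.0
best = loss_of(z)
for _ in range(500):
    cand = z + sigma * rng.normal(size=INTRINSIC_DIM)
    f = loss_of(cand)
    if f < best:
        z, best, sigma = cand, f, sigma * 1.1
    else:
        sigma *= 0.98
print("final few-shot loss:", best)
```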
TreeMAN: Tree-enhanced Multimodal Attention Network for ICD Coding
ICD coding assigns disease codes to electronic health records (EHRs) upon discharge, which is crucial for billing and clinical
statistics. In an attempt to improve the effectiveness and efficiency of manual
coding, many methods have been proposed to automatically predict ICD codes from
clinical notes. However, most previous works ignore the decisive information contained in the structured medical data of EHRs, which is hard to capture from noisy clinical notes. In this paper, we propose a Tree-enhanced Multimodal
Attention Network (TreeMAN) to fuse tabular features and textual features into
multimodal representations by enhancing the text representations with
tree-based features via the attention mechanism. Tree-based features are
constructed according to decision trees learned from structured multimodal
medical data, and they capture the decisive information relevant to ICD coding. The same multi-label classifier used by previous text-based models can then be applied to the multimodal representations to predict ICD codes. Experiments on two MIMIC
datasets show that our method outperforms prior state-of-the-art ICD coding
approaches. The code is available at https://github.com/liu-zichen/TreeMAN.
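As a rough illustration of the fusion described above, the sketch below turns tabular EHR features into decision-tree leaf indices, embeds them, and lets the text representation attend over them before multi-label classification. A random forest stands in for the learned decision trees, all data and dimensions are invented placeholders, and the linked repository has the actual model.

```python
# A rough sketch of tree-enhanced multimodal fusion with invented dimensions.
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

N_TREES, N_LEAVES, HIDDEN, N_CODES = 8, 64, 256, 50

# 1) Learn decision trees on structured medical data (random stand-ins here),
#    then map each record to one leaf index per tree.
X_tab = np.random.rand(1000, 30)                   # tabular EHR features
y_bin = np.random.randint(0, 2, 1000)              # e.g. one frequent ICD code
forest = RandomForestClassifier(n_estimators=N_TREES,
                                max_leaf_nodes=N_LEAVES, random_state=0)
forest.fit(X_tab, y_bin)
# apply() returns tree node ids, which can reach 2 * N_LEAVES - 2.
leaf_ids = torch.as_tensor(forest.apply(X_tab[:4]), dtype=torch.long)

# 2) Let the text representation attend over embedded tree leaves, then
#    reuse an ordinary multi-label classification head.
class TreeTextFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.leaf_emb = nn.Embedding(2 * N_LEAVES, HIDDEN)
        self.attn = nn.MultiheadAttention(HIDDEN, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(HIDDEN, N_CODES)

    def forward(self, text_hidden, leaf_ids):
        # text_hidden: (B, T, HIDDEN) from any clinical-note encoder.
        tree_feats = self.leaf_emb(leaf_ids)            # (B, N_TREES, HIDDEN)
        fused, _ = self.attn(text_hidden, tree_feats, tree_feats)
        pooled = (text_hidden + fused).mean(dim=1)      # residual fusion + pooling
        return torch.sigmoid(self.classifier(pooled))   # multi-label ICD probs

model = TreeTextFusion()
probs = model(torch.randn(4, 128, HIDDEN), leaf_ids)
print(probs.shape)  # torch.Size([4, 50])
```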
CG-HOI: Contact-Guided 3D Human-Object Interaction Generation
We propose CG-HOI, the first method to address the task of generating dynamic
3D human-object interactions (HOIs) from text. We model the motion of both
human and object in an interdependent fashion, as semantically rich human
motion rarely happens in isolation without any interactions. Our key insight is
that explicitly modeling contact between the human body surface and object
geometry can be used as strong proxy guidance, both during training and
inference. Using this guidance to bridge human and object motion enables
generating more realistic and physically plausible interaction sequences, where
the human body and corresponding object move in a coherent manner. Our method
first learns to model human motion, object motion, and contact in a joint
diffusion process, inter-correlated through cross-attention. We then leverage this learned contact to guide inference-time synthesis of realistic, coherent HOIs. Extensive evaluation shows that our joint contact-based
human-object interaction approach generates realistic and physically plausible
sequences, and we show two applications highlighting the capabilities of our
method. Conditioned on a given object trajectory, we can generate the
corresponding human motion without re-training, demonstrating strong
human-object interdependency learning. Our approach is also flexible, and can
be applied to static real-world 3D scene scans.
Comment: Project page: https://cg-hoi.christian-diller.de Video: https://www.youtube.com/watch?v=GNyQwTwZ15
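The contact-guidance idea, using predicted contact as proxy guidance during inference, can be read as classifier-style guidance: at each denoising step, differentiate a contact cost through the current sample and nudge the sample along the negative gradient. The sketch below is a hedged rendition under that reading; `denoise_step` and the extractor callbacks are hypothetical stand-ins, and the paper's joint human/object/contact diffusion model and cross-attention are not reproduced.

```python
# A hedged sketch of contact-guided diffusion sampling; all callbacks and
# shapes below are assumptions, not the paper's implementation.
import torch

def contact_cost(human_pts, object_pts, predicted_contact):
    """Penalize mismatch between the motion's own predicted contact distances
    and the actual nearest human-to-object surface distances."""
    d = torch.cdist(human_pts, object_pts)   # (T, H, O) pairwise distances
    nearest = d.min(dim=-1).values           # (T, H)
    return ((nearest - predicted_contact) ** 2).mean()

def guided_sample(x, steps, denoise_step, to_human_pts, to_object_pts,
                  to_contact, scale=0.1):
    """Reverse-diffusion loop with a classifier-guidance-style correction
    pulling the sample toward its own predicted contacts."""
    for t in reversed(range(steps)):
        x = denoise_step(x, t)                       # ordinary reverse step
        x = x.detach().requires_grad_(True)
        cost = contact_cost(to_human_pts(x), to_object_pts(x), to_contact(x))
        grad = torch.autograd.grad(cost, x)[0]
        x = (x - scale * grad).detach()              # steer toward coherence
    return x

# Toy demo: 8 frames, 16 human surface points, 12 object points, packed into
# one state vector per frame; every component below is a placeholder.
T, H, O = 8, 16, 12
x = guided_sample(
    torch.randn(T, H * 3 + O * 3 + H), steps=5,
    denoise_step=lambda x, t: 0.99 * x,
    to_human_pts=lambda x: x[:, :H * 3].reshape(T, H, 3),
    to_object_pts=lambda x: x[:, H * 3:H * 3 + O * 3].reshape(T, O, 3),
    to_contact=lambda x: x[:, -H:],
)
```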