CIM: Constrained Intrinsic Motivation for Sparse-Reward Continuous Control
Intrinsic motivation is a promising exploration technique for solving
reinforcement learning tasks with sparse or absent extrinsic rewards. There
exist two technical challenges in implementing intrinsic motivation: 1) how to
design a proper intrinsic objective to facilitate efficient exploration; and 2)
how to combine the intrinsic objective with the extrinsic objective to help
find better solutions. In the current literature, intrinsic objectives are all
designed in a task-agnostic manner and combined with the extrinsic objective
via simple addition (or used on their own for reward-free pre-training). In
this work, we show that these designs fail in typical sparse-reward
continuous control tasks. To address the problem, we propose Constrained
Intrinsic Motivation (CIM) to leverage readily attainable task priors to
construct a constrained intrinsic objective, and at the same time, exploit the
Lagrangian method to adaptively balance the intrinsic and extrinsic objectives
via a simultaneous-maximization framework. We empirically show, on multiple
sparse-reward continuous control tasks, that our CIM approach achieves greatly
improved performance and sample efficiency over state-of-the-art methods.
Moreover, the key techniques of our CIM can also be plugged into existing
methods to boost their performance.
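
As a rough illustration of the Lagrangian balancing idea, the sketch below
performs dual gradient ascent on a multiplier that weights the intrinsic term.
The constraint form (a lower bound on the intrinsic objective), the threshold,
and the step sizes are illustrative assumptions, not the paper's exact
formulation.

```python
# Hedged sketch of Lagrangian balancing between extrinsic and intrinsic
# objectives. The constraint form (a lower bound on the intrinsic
# objective), threshold, and learning rates are illustrative assumptions.

def dual_ascent_step(extrinsic_return, intrinsic_return, lam,
                     intrinsic_threshold=1.0, lr_lambda=1e-3):
    """One dual-ascent update on the Lagrange multiplier lam.

    The policy maximizes J = extrinsic_return + lam * intrinsic_return,
    while lam grows when the intrinsic constraint is violated and decays
    toward 0 once exploration is sufficient.
    """
    violation = intrinsic_threshold - intrinsic_return  # > 0 if violated
    lam = max(0.0, lam + lr_lambda * violation)         # project to lam >= 0
    combined_objective = extrinsic_return + lam * intrinsic_return
    return combined_objective, lam

# Toy usage: lam adapts as the (stubbed) intrinsic return improves.
lam = 1.0
for step in range(1000):
    intrinsic = min(2.0, 0.01 * step)   # stub: exploration improves
    extrinsic = 0.1 * step              # stub: task reward accrues
    _, lam = dual_ascent_step(extrinsic, intrinsic, lam)
print(f"final lambda: {lam:.4f}")
```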
Distilling Cognitive Backdoor Patterns within an Image
This paper proposes a simple method to distill and detect backdoor patterns
within an image: \emph{Cognitive Distillation} (CD). The idea is to extract the
"minimal essence" from an input image responsible for the model's prediction.
CD optimizes an input mask to extract a small pattern from the input image that
can lead to the same model output (i.e., logits or deep features). The
extracted pattern can help understand the cognitive mechanism of a model on
clean vs. backdoor images and is thus called a \emph{Cognitive Pattern} (CP).
Using CD and the distilled CPs, we uncover an interesting phenomenon of
backdoor attacks: despite the various forms and sizes of trigger patterns used
by different attacks, the CPs of backdoor samples are all surprisingly and
suspiciously small. One thus can leverage the learned mask to detect and remove
backdoor examples from poisoned training datasets. We conduct extensive
experiments to show that CD can robustly detect a wide range of advanced
backdoor attacks. We also show that CD can potentially be applied to detect
biases in face datasets. Code is available at
\url{https://github.com/HanxunH/CognitiveDistillation} (ICLR 2023).
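
For concreteness, here is a minimal sketch of the mask optimization CD
describes, assuming a PyTorch classifier. The plain multiplicative masking,
the L1 weight alpha, and the optimizer settings are illustrative assumptions
rather than the paper's exact objective.

```python
import torch

def cognitive_distill(model, x, alpha=0.01, steps=100, lr=0.1):
    """Learn a sparse input mask whose masked image keeps f(x) unchanged."""
    model.eval()
    with torch.no_grad():
        target = model(x)                       # reference logits f(x)
    # Unconstrained parameter; sigmoid keeps mask values in [0, 1].
    param = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([param], lr=lr)
    for _ in range(steps):
        mask = torch.sigmoid(param)
        out = model(x * mask)                   # model output on masked input
        # Match the original output, plus an L1 sparsity penalty on the mask.
        loss = (out - target).abs().mean() + alpha * mask.mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(param).detach()        # the Cognitive Pattern mask
```

Since the distilled CPs of backdoor samples are abnormally small, detection
could then amount to thresholding the L1 norm of the learned mask.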
Unlearnable Examples For Time Series
Unlearnable examples (UEs) refer to training samples modified to be
unlearnable to Deep Neural Networks (DNNs). These examples are usually
generated by adding error-minimizing noises that can fool a DNN model into
believing that there is nothing (no error) to learn from the data. The concept
of UEs has been proposed as a countermeasure against unauthorized exploitation
of personal data. While UEs have been extensively studied for images,
it is unclear how to craft effective UEs for time series data. In this work, we
introduce the first UE generation method to protect time series data from
unauthorized training by deep learning models. To this end, we propose a new
form of error-minimizing noise that can be \emph{selectively} applied to
specific segments of time series, rendering them unlearnable to DNN models
while remaining imperceptible to human observers. Through extensive experiments
on a wide range of time series datasets, we demonstrate that the proposed UE
generation method is effective in both classification and generation tasks. It
can protect time series data against unauthorized exploitation, while
preserving their utility for legitimate usage, thereby contributing to the
development of secure and trustworthy machine learning systems.
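
A hedged sketch of what segment-selective error-minimizing noise might look
like in PyTorch follows. The PGD-style inner loop, the epsilon bound, and the
segment_mask interface are assumptions for illustration, not the paper's
published generator.

```python
import torch
import torch.nn.functional as F

def errmin_noise(model, x, y, segment_mask, eps=0.05, steps=20, step_size=0.01):
    """Craft error-minimizing noise restricted to masked segments of x.

    x: (batch, length, channels) time series; segment_mask broadcasts over x
    and is 1 on the segments to protect, 0 elsewhere.
    """
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta * segment_mask), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            # Error-minimizing: step *against* the loss gradient so the
            # perturbed samples look "already learned" to the model.
            delta -= step_size * grad.sign()
            delta.clamp_(-eps, eps)             # keep the noise imperceptible
    return (delta * segment_mask).detach()
```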
Hufu: A Modality-Agnostic Watermarking System for Pre-Trained Transformers via Permutation Equivariance
With the blossom of deep learning models and services, it has become an
imperative concern to safeguard the valuable model parameters from being
stolen. Watermarking is considered an important tool for ownership
verification. However, current watermarking schemes are customized for
specific models and tasks, making them hard to integrate into a unified
intellectual property protection service. We propose Hufu, a
modality-agnostic watermarking system
for pre-trained Transformer-based models, relying on the permutation
equivariance property of Transformers. Hufu embeds a watermark by fine-tuning
the pre-trained model on a set of specifically permuted data samples, and the
embedded model essentially contains two sets of weights -- one for normal use
and the other for watermark extraction, which is triggered by permuted inputs.
The permutation equivariance ensures minimal interference between these two
sets of model weights and thus high fidelity on downstream tasks. Since our
method only depends on the model itself, it is naturally modality-agnostic,
task-independent, and trigger-sample-free. Extensive experiments on the
state-of-the-art vision Transformers, BERT, and GPT2 have demonstrated Hufu's
superiority in meeting watermarking requirements including effectiveness,
efficiency, fidelity, and robustness, showing its great potential to be
deployed as a uniform ownership verification service for various Transformers.
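
The property Hufu relies on can be checked in a few lines: a self-attention
layer without positional encodings is permutation-equivariant, so permuting
the input tokens permutes the outputs identically. The layer and sizes below
are illustrative; Hufu's actual scheme fine-tunes a full pre-trained
Transformer.

```python
import torch

torch.manual_seed(0)
attn = torch.nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
attn.eval()

x = torch.randn(1, 10, 16)      # (batch, tokens, dim), no positional encoding
perm = torch.randperm(10)       # a random token permutation P

with torch.no_grad():
    out, _ = attn(x, x, x)                                  # f(x)
    out_perm, _ = attn(x[:, perm], x[:, perm], x[:, perm])  # f(Px)

# Equivariance: f(Px) == P f(x), up to numerical error.
print(torch.allclose(out[:, perm], out_perm, atol=1e-5))    # True
```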