27 research outputs found
Using feedback in adaptive and user-dependent one-step decision making
International audienceSeveral machine learning approaches are used to train systems and agents while exploiting users' feedback over the given service. For example, different semi-supervised approaches employ this kind of information in the learning process to guide the agent to a more adaptive and possibly person-alized behavior. Whether for recommendation systems , companion robots or smart home assistance, the trained agent must face the challenges of adapting to different users (with different profiles, preferences , etc.), coping with dynamic environments (dynamic preferences, etc.) and scaling up with a minimal number of training examples. We are interested in this paper in one-step decision making for adaptive and user-dependent services using users' feedback. We focus on the quality of such services while dealing with ambiguities (noise) in the received feedback. We describe our problem and we concentrate on presenting a state of the art of possible methods that can be applied. We detail two algorithms that are based on existing approaches. We present comparative results by showing scaling and convergence analysis with clean and noisy simulated data
Neural Polarizer: A Lightweight and Effective Backdoor Defense via Purifying Poisoned Features
Recent studies have demonstrated the susceptibility of deep neural networks
to backdoor attacks. Given a backdoored model, its prediction of a poisoned
sample with trigger will be dominated by the trigger information, though
trigger information and benign information coexist. Inspired by the mechanism
of the optical polarizer that a polarizer could pass light waves with
particular polarizations while filtering light waves with other polarizations,
we propose a novel backdoor defense method by inserting a learnable neural
polarizer into the backdoored model as an intermediate layer, in order to
purify the poisoned sample via filtering trigger information while maintaining
benign information. The neural polarizer is instantiated as one lightweight
linear transformation layer, which is learned through solving a well designed
bi-level optimization problem, based on a limited clean dataset. Compared to
other fine-tuning-based defense methods which often adjust all parameters of
the backdoored model, the proposed method only needs to learn one additional
layer, such that it is more efficient and requires less clean data. Extensive
experiments demonstrate the effectiveness and efficiency of our method in
removing backdoors across various neural network architectures and datasets,
especially in the case of very limited clean data
Limitations of the Empirical Fisher Approximation for Natural Gradient Descent
Natural gradient descent, which preconditions a gradient descent update with
the Fisher information matrix of the underlying statistical model, is a way to
capture partial second-order information. Several highly visible works have
advocated an approximation known as the empirical Fisher, drawing connections
between approximate second-order methods and heuristics like Adam. We dispute
this argument by showing that the empirical Fisher---unlike the Fisher---does
not generally capture second-order information. We further argue that the
conditions under which the empirical Fisher approaches the Fisher (and the
Hessian) are unlikely to be met in practice, and that, even on simple
optimization problems, the pathologies of the empirical Fisher can have
undesirable effects.Comment: V3: Minor corrections (typographic errors
Mapping Instructions and Visual Observations to Actions with Reinforcement Learning
We propose to directly map raw visual observations and text input to actions
for instruction execution. While existing approaches assume access to
structured environment representations or use a pipeline of separately trained
models, we learn a single model to jointly reason about linguistic and visual
input. We use reinforcement learning in a contextual bandit setting to train a
neural network agent. To guide the agent's exploration, we use reward shaping
with different forms of supervision. Our approach does not require intermediate
representations, planning procedures, or training different models. We evaluate
in a simulated environment, and show significant improvements over supervised
learning and common reinforcement learning variants.Comment: In Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP), 201