
    Using feedback in adaptive and user-dependent one-step decision making

    Several machine learning approaches train systems and agents by exploiting users' feedback on the service provided. For example, various semi-supervised approaches use this kind of information during learning to guide the agent toward more adaptive and possibly personalized behavior. Whether for recommendation systems, companion robots, or smart-home assistance, the trained agent must adapt to different users (with different profiles, preferences, etc.), cope with dynamic environments (changing preferences, etc.), and scale up with a minimal number of training examples. In this paper we are interested in one-step decision making for adaptive and user-dependent services based on users' feedback. We focus on the quality of such services when dealing with ambiguities (noise) in the received feedback. We describe the problem, survey the state of the art of applicable methods, and detail two algorithms based on existing approaches. We present comparative results with scaling and convergence analyses on clean and noisy simulated data.
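
    A minimal sketch of the kind of setting described above (not the authors' algorithms): per-user action values are estimated online from binary feedback, and each feedback signal is flipped with some probability to simulate ambiguous or noisy responses. The user names, noise level, and exploration rate are illustrative.

        # One-step, user-dependent decision making from noisy binary feedback (toy sketch).
        import random
        from collections import defaultdict

        random.seed(0)

        n_actions = 3
        true_pref = {"alice": 0, "bob": 2}                 # hypothetical ground-truth best action per user
        values = defaultdict(lambda: [0.5] * n_actions)    # per-user estimated action values
        counts = defaultdict(lambda: [0] * n_actions)
        noise = 0.2                                        # probability that a feedback signal is flipped
        epsilon = 0.1                                      # exploration rate

        def choose(user):
            if random.random() < epsilon:
                return random.randrange(n_actions)
            v = values[user]
            return v.index(max(v))

        for step in range(2000):
            user = random.choice(list(true_pref))
            a = choose(user)
            feedback = 1.0 if a == true_pref[user] else 0.0
            if random.random() < noise:                    # ambiguity in the received feedback
                feedback = 1.0 - feedback
            counts[user][a] += 1
            values[user][a] += (feedback - values[user][a]) / counts[user][a]  # running mean

        for user in true_pref:
            print(user, "-> estimated best action:", values[user].index(max(values[user])))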

    Neural Polarizer: A Lightweight and Effective Backdoor Defense via Purifying Poisoned Features

    Recent studies have demonstrated the susceptibility of deep neural networks to backdoor attacks. Given a backdoored model, its prediction on a poisoned sample containing the trigger is dominated by the trigger information, even though trigger information and benign information coexist. Inspired by the mechanism of an optical polarizer, which passes light waves with particular polarizations while filtering out waves with other polarizations, we propose a novel backdoor defense that inserts a learnable neural polarizer into the backdoored model as an intermediate layer, purifying poisoned samples by filtering trigger information while preserving benign information. The neural polarizer is instantiated as a single lightweight linear transformation layer, learned by solving a well-designed bi-level optimization problem on a limited clean dataset. Compared to other fine-tuning-based defenses, which typically adjust all parameters of the backdoored model, the proposed method only needs to learn one additional layer, making it more efficient and less demanding of clean data. Extensive experiments demonstrate the effectiveness and efficiency of our method in removing backdoors across various neural network architectures and datasets, especially when very limited clean data is available.
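
    A rough sketch of the idea as stated above (not the authors' released code): one learnable linear layer is inserted between a frozen feature extractor and a frozen classifier head, and only that layer is trained on a small clean set. The paper's bi-level objective is simplified here to plain cross-entropy, and the stand-in modules and dimensions are illustrative.

        import torch
        import torch.nn as nn

        class PolarizedModel(nn.Module):
            def __init__(self, feature_extractor, classifier_head, feat_dim):
                super().__init__()
                self.features = feature_extractor                 # frozen backdoored layers
                self.polarizer = nn.Linear(feat_dim, feat_dim)    # the only trainable layer
                nn.init.eye_(self.polarizer.weight)               # start as the identity map
                nn.init.zeros_(self.polarizer.bias)
                self.head = classifier_head                       # frozen classifier
                for p in self.features.parameters():
                    p.requires_grad_(False)
                for p in self.head.parameters():
                    p.requires_grad_(False)

            def forward(self, x):
                z = self.features(x)
                z = self.polarizer(z)                             # filter trigger-related features
                return self.head(z)

        # Toy stand-ins; in practice these come from the backdoored network.
        feat_dim, n_classes = 64, 10
        backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, feat_dim), nn.ReLU())
        model = PolarizedModel(backbone, nn.Linear(feat_dim, n_classes), feat_dim)

        opt = torch.optim.Adam(model.polarizer.parameters(), lr=1e-3)
        loss_fn = nn.CrossEntropyLoss()
        clean_x = torch.randn(32, 3, 8, 8)                        # small clean dataset
        clean_y = torch.randint(0, n_classes, (32,))
        for _ in range(100):
            opt.zero_grad()
            loss_fn(model(clean_x), clean_y).backward()
            opt.step()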

    Limitations of the Empirical Fisher Approximation for Natural Gradient Descent

    Natural gradient descent, which preconditions a gradient descent update with the Fisher information matrix of the underlying statistical model, is a way to capture partial second-order information. Several highly visible works have advocated an approximation known as the empirical Fisher, drawing connections between approximate second-order methods and heuristics like Adam. We dispute this argument by showing that the empirical Fisher, unlike the Fisher, does not generally capture second-order information. We further argue that the conditions under which the empirical Fisher approaches the Fisher (and the Hessian) are unlikely to be met in practice, and that, even on simple optimization problems, the pathologies of the empirical Fisher can have undesirable effects.
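
    For orientation, the two matrices being contrasted can be written out. This is the standard textbook form for a model p_theta(y|x) and training data {(x_n, y_n)}, not notation copied from the paper:

        F(\theta) = \sum_n \mathbb{E}_{y \sim p_\theta(y \mid x_n)}\left[ \nabla_\theta \log p_\theta(y \mid x_n)\, \nabla_\theta \log p_\theta(y \mid x_n)^\top \right]
        \tilde{F}(\theta) = \sum_n \nabla_\theta \log p_\theta(y_n \mid x_n)\, \nabla_\theta \log p_\theta(y_n \mid x_n)^\top

    The Fisher takes the expectation over the model's own predictive distribution, while the empirical Fisher plugs in the observed labels y_n; the two coincide only when the model distribution matches the data-generating distribution, which is exactly the condition the abstract argues is rarely met in practice.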

    Mapping Instructions and Visual Observations to Actions with Reinforcement Learning

    We propose to directly map raw visual observations and text input to actions for instruction execution. While existing approaches assume access to structured environment representations or use a pipeline of separately trained models, we learn a single model to jointly reason about linguistic and visual input. We use reinforcement learning in a contextual bandit setting to train a neural network agent. To guide the agent's exploration, we use reward shaping with different forms of supervision. Our approach does not require intermediate representations, planning procedures, or training different models. We evaluate in a simulated environment and show significant improvements over supervised learning and common reinforcement learning variants. Published in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017.
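
    A minimal sketch of the contextual-bandit training idea mentioned above (not the paper's model or environment): a softmax policy over actions receives a one-step policy-gradient update, with a hand-crafted shaping term added to the sparse task reward. The feature vector, "correct action" rule, and shaping signal are toy stand-ins.

        import numpy as np

        rng = np.random.default_rng(0)
        n_features, n_actions = 8, 4
        W = np.zeros((n_actions, n_features))        # policy parameters
        lr = 0.1

        def softmax(z):
            z = z - z.max()
            e = np.exp(z)
            return e / e.sum()

        def shaped_reward(task_reward, shaping):
            # reward shaping: add an auxiliary signal to the sparse task reward
            return task_reward + 0.5 * shaping

        for episode in range(3000):
            x = rng.normal(size=n_features)          # toy joint encoding of instruction + observation
            probs = softmax(W @ x)
            a = rng.choice(n_actions, p=probs)
            correct = 0 if x[0] > 0 else 1           # toy "correct action" rule
            task_reward = 1.0 if a == correct else 0.0
            shaping = -abs(a - correct) / n_actions  # toy "distance to goal" shaping term
            r = shaped_reward(task_reward, shaping)
            grad = -np.outer(probs, x)               # d log pi(a|x) / dW for a softmax policy
            grad[a] += x
            W += lr * r * grad                       # one-step (contextual-bandit) policy gradient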