
    CopyCAT: Taking Control of Neural Policies with Constant Attacks

    We propose a new perspective on adversarial attacks against deep reinforcement learning agents. Our main contribution is CopyCAT, a targeted attack able to consistently lure an agent into following an outsider's policy. Because it is pre-computed, it is fast at inference time and could thus be used in a real-time scenario. We show its effectiveness on Atari 2600 games in the novel read-only setting. In this setting, the adversary cannot directly modify the agent's state (its representation of the environment) but can only attack the agent's observation (its perception of the environment). Directly modifying the agent's state would require write access to the agent's inner workings, and we argue that this assumption is too strong in realistic settings. Comment: AAMAS 202

    Investigating Circumferential Non-Uniformities in Throughflow Calculations using an Harmonic Reconstruction

    Peer reviewed. The flow field in a multistage turbomachine is very complex: it is 3D, unsteady and turbulent. Even if modern simulation tools can describe most of the flow features, the computation time needed and the difficulty of extracting useful information remain severe drawbacks to the systematic use of URANS codes in a design procedure. In this context, throughflow simulation has proved to be more convenient. Nevertheless, the need for empiricism limits the potential of throughflow solvers. As an alternative, Adamczyk (1984) proposed three averaging operators (ensemble, time and passage) that lead to the average-passage model, linking the unsteady turbulent flow field to a steady flow field in a typical blade passage. This model involves additional terms that respectively bring back the mean effect of turbulence, deterministic unsteadiness and aperiodicity on the mean periodic flow. These terms need to be modelled; this is the closure problem. Harmonic closure, which consists of solving a linearized perturbation system in the frequency domain, has proved to be an efficient method to approximate deterministic stresses (He and Ning, 1998, Stridh, 2005, Vilmin, 2006). A fourth averaging, a circumferential averaging, can be performed, giving rise to the throughflow model. Additional terms appear: the so-called circumferential stresses. It has been shown that these terms play an important role in the description of the flow (Jennions, 1986, Perrin, 1995), being at least as significant as deterministic stresses. Introducing these terms in a throughflow simulation makes it possible to reproduce the averaged 3D steady flow field (Simon, 2007). The aim of the present contribution is to show that the harmonic method can potentially be used to reconstruct circumferential stresses. The importance of circumferential stresses in a throughflow simulation is first highlighted on a single-stage low-speed compressor test case, for viscous and non-viscous flow fields. The second step is the characterization of the frequency spectrum of the circumferential perturbation field. Next, the stresses associated with a Fourier reconstruction of the perturbation field are compared with the real ones. Finally, the approximated circumferential stresses are injected into a throughflow simulation tool to analyse and demonstrate their capability to reproduce a 3D averaged flow field

    Compressor and Turbine Blade Design by Optimization

    Compressor and turbine blade design involves thermodynamic, aerodynamic and mechanical aspects, resulting in a large number of iterations. Inverse methods and optimization procedures help the designer in this long and potentially frustrating process. In this paper an optimization procedure is presented which solves two types of two-dimensional or quasi-three-dimensional problems: the inverse problem, for which a target velocity distribution is imposed, and a more global problem, in which the aerodynamic load is maximized

    Compaction behavior of out-of-autoclave prepreg materials

    The main challenges with composite parts manufacturing are related to the curing means, mainly autoclaves, the length of their cycles and their operating costs. In order to decrease this dependency, out-of-autoclave materials have been considered as a solution for high production rate parts such as spars, flaps, etc. However, most out-of-autoclave processes do not possess the same maturity as their autoclave counterparts, especially concerning part quality. Some pre-cure processes such as compaction and ply lay-up are usually less of a concern for autoclave manufacturing: the pressure applied during the cycle helps reduce potential defects (porosity caused by a poor-quality lay-up, bad compaction, entrapped air or humidity…). For out-of-autoclave parts, these are crucial steps which may have many consequences on the final quality of the laminate. In order to avoid this quality loss, those steps must be well understood

    Primal Wasserstein Imitation Learning

    Imitation Learning (IL) methods seek to match the behavior of an agent with that of an expert. In the present work, we propose a new IL method based on a conceptually simple algorithm: Primal Wasserstein Imitation Learning (PWIL), which ties to the primal form of the Wasserstein distance between the expert and the agent state-action distributions. We present a reward function which is derived offline, as opposed to recent adversarial IL algorithms that learn a reward function through interactions with the environment, and which requires little fine-tuning. We show that we can recover expert behavior on a variety of continuous control tasks of the MuJoCo domain in a sample-efficient manner in terms of agent interactions and of expert interactions with the environment. Finally, we show that the behavior of the agent we train matches the behavior of the expert with the Wasserstein distance, rather than the commonly used proxy of performance. Comment: Published in International Conference on Learning Representations (ICLR 2021)

    Vibrated polar disks: spontaneous motion, binary collisions, and collective dynamics

    We study the spontaneous motion, binary collisions, and collective dynamics of "polar disks", i.e. purpose-built particles which, when vibrated between two horizontal plates, move coherently along a direction strongly correlated to their intrinsic polarity. The motion of our particles, although nominally three-dimensional and complicated, is well accounted for by a two-dimensional persistent random walk. Their binary collisions are spatiotemporally extended events during which multiple actual collisions happen, yielding a weak average effective alignment. We show that this well-controlled, "dry active matter" system can display collective motion with orientationally-ordered regions of the order of the system size. We provide evidence of strong number density fluctuations in the most ordered regimes observed. These results are discussed in the light of the limitations of our system, notably those due to the inevitable presence of walls. Comment: 13 pages, 10 figures, 4 movies
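
    The two-dimensional persistent random walk mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a constant speed and a heading that diffuses through small Gaussian kicks; the function name, parameters and default values are hypothetical and are not taken from the paper.

```python
import math
import random


def persistent_random_walk(n_steps, speed=1.0, angular_noise=0.1, dt=1.0, seed=0):
    """Simulate a 2D persistent random walk (illustrative assumptions only).

    The walker moves at constant speed along its heading; the heading
    receives a Gaussian kick of standard deviation
    angular_noise * sqrt(dt) at every step.
    """
    rng = random.Random(seed)
    x, y, theta = 0.0, 0.0, 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        # Rotational diffusion of the heading angle.
        theta += rng.gauss(0.0, angular_noise) * math.sqrt(dt)
        # Advance along the current heading.
        x += speed * dt * math.cos(theta)
        y += speed * dt * math.sin(theta)
        path.append((x, y))
    return path


path = persistent_random_walk(100)
straight = persistent_random_walk(10, angular_noise=0.0)
```

    With `angular_noise=0.0` the walker moves in a straight line; larger values shorten the distance over which the heading stays correlated.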

    Offline Reinforcement Learning as Anti-Exploration

    Offline Reinforcement Learning (RL) aims at learning an optimal control from a fixed dataset, without interactions with the system. An agent in this setting should avoid selecting actions whose consequences cannot be predicted from the data. This is the converse of exploration in RL, which favors such actions. We thus take inspiration from the literature on bonus-based exploration to design a new offline RL agent. The core idea is to subtract a prediction-based exploration bonus from the reward, instead of adding it for exploration. This allows the policy to stay close to the support of the dataset. We connect this approach to a more common regularization of the learned policy towards the data. Instantiated with a bonus based on the prediction error of a variational autoencoder, we show that our agent is competitive with the state of the art on a set of continuous control locomotion and manipulation tasks
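
    The core idea above, subtracting a bonus from the reward, can be sketched in a few lines. This is only an illustration: the abstract uses the prediction error of a variational autoencoder as the bonus, whereas the sketch swaps in a much simpler stand-in, the distance to the nearest logged state-action pair. All names and the `scale` parameter are hypothetical.

```python
import math


def nearest_neighbor_bonus(state_action, dataset):
    """Stand-in bonus (an assumption, not the paper's VAE prediction error):
    Euclidean distance from a state-action pair to the closest logged pair."""
    return min(math.dist(state_action, logged) for logged in dataset)


def anti_exploration_reward(reward, state_action, dataset, scale=1.0):
    """Subtract the scaled bonus from the reward instead of adding it,
    so pairs far from the dataset support are penalized."""
    return reward - scale * nearest_neighbor_bonus(state_action, dataset)


# Toy dataset of logged (state, action) pairs, here just 2D points.
dataset = [(0.0, 0.0), (1.0, 0.0)]

in_support = anti_exploration_reward(1.0, (0.0, 0.0), dataset)
out_of_support = anti_exploration_reward(1.0, (3.0, 4.0), dataset)
```

    A pair that appears in the dataset keeps its original reward, while a pair far from the data receives a lower one, which pushes a policy trained on these rewards toward the support of the dataset.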

    Offline Reinforcement Learning with Pseudometric Learning

    Offline Reinforcement Learning methods seek to learn a policy from logged transitions of an environment, without any interaction. In the presence of function approximation, and under the assumption of limited coverage of the state-action space of the environment, it is necessary to constrain the policy to visit state-action pairs close to the support of logged transitions. In this work, we propose an iterative procedure to learn a pseudometric (closely related to bisimulation metrics) from logged transitions, and use it to define this notion of closeness. We show its convergence and extend it to the function approximation setting. We then use this pseudometric to define a new lookup-based bonus in an actor-critic algorithm: PLOFF. This bonus encourages the actor to stay close, in terms of the defined pseudometric, to the support of logged transitions. Finally, we evaluate the method on hand manipulation and locomotion tasks. Comment: ICML 202

    Model-based verification of a security protocol for conditional access to services

    Peer reviewed. We use the formal language LOTOS to specify and verify the robustness of the Equicrypt protocol under design in the European OKAPI project for conditional access to multimedia services. We state some desired security properties and formalize them. We describe a generic intruder process and its modelling, and show that some properties are falsified in the presence of this intruder. The diagnostic sequences can be used almost directly to exhibit the scenarios of possible attacks on the protocol. Finally, we propose an improvement of the protocol which satisfies our properties