CopyCAT: Taking Control of Neural Policies with Constant Attacks
We propose a new perspective on adversarial attacks against deep
reinforcement learning agents. Our main contribution is CopyCAT, a targeted
attack able to consistently lure an agent into following an outsider's policy.
It is pre-computed, therefore fast at inference, and could thus be used in a
real-time scenario. We show its effectiveness on Atari 2600 games in the novel
read-only setting. In this setting, the adversary cannot directly modify the
agent's state -- its representation of the environment -- but can only attack
the agent's observation -- its perception of the environment. Directly
modifying the agent's state would require a write-access to the agent's inner
workings, and we argue that this assumption is too strong in realistic settings. Comment: AAMAS 202
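The read-only, pre-computed attack described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the table `masks` (one pre-computed perturbation per target action), the callable `outsider_policy`, the flattened observation layout, and the pixel range [0, 255] are all hypothetical. The point it shows is that run-time work reduces to a lookup and an addition on the observation, with no access to the agent's internal state.

```python
from typing import Callable, Dict, List

# Hypothetical layout: a flattened list of pixel intensities, assumed in [0, 255].
Observation = List[float]


def apply_constant_attack(
    observation: Observation,
    masks: Dict[int, List[float]],
    outsider_policy: Callable[[Observation], int],
    low: float = 0.0,
    high: float = 255.0,
) -> Observation:
    """Perturb an observation with the pre-computed mask of the target action.

    Assumption: `masks` maps each action to a perturbation computed offline,
    and the outsider's policy is queried on the clean observation.
    """
    target_action = outsider_policy(observation)
    mask = masks[target_action]
    # Only the observation is modified; values are clipped to the valid range.
    return [min(high, max(low, pixel + delta)) for pixel, delta in zip(observation, mask)]
```

Because the masks are fixed ahead of time, the per-step cost in this sketch is one dictionary lookup and one element-wise addition.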
Investigating Circumferential Non-Uniformities in Throughflow Calculations using an Harmonic Reconstruction
Peer reviewed. The flow field in a multistage turbomachine is very complex: it is 3D, unsteady and turbulent. Even if modern simulation tools can describe most of the flow features, the computation time needed and the difficulty of extracting useful information remain severe drawbacks to the systematic use of URANS codes in a design procedure. In this context, throughflow simulation has proved to be more convenient. Nevertheless, the need for empiricism limits the potential of throughflow solvers.
As an alternative, Adamczyk (1984) proposed three averaging operators (ensemble, time and passage) that lead to the average-passage model, linking the unsteady turbulent flow field to a steady flow field in a typical blade passage. This model involves additional terms that respectively bring back the mean effect of turbulence, deterministic unsteadiness and aperiodicity on the mean periodic flow. These terms need to be modelled; this is the closure problem. Harmonic closure, which consists of solving a linearized perturbation system in the frequency domain, has proved to be an efficient method for approximating deterministic stresses (He and Ning, 1998; Stridh, 2005; Vilmin, 2006).
A fourth averaging, a circumferential averaging, can be performed, giving rise to the throughflow model. Additional terms appear: the so-called circumferential stresses. It has been shown that these terms play an important role in the description of the flow (Jennions, 1986; Perrin, 1995), being at least as significant as the deterministic stresses. Introducing these terms in a throughflow simulation makes it possible to reproduce the averaged 3D steady flow field (Simon, 2007). The aim of the present contribution is to show that the harmonic method can potentially be used to reconstruct circumferential stresses.
The importance of circumferential stresses in a throughflow simulation is first highlighted on a single-stage low-speed compressor test case, for viscous and non-viscous flow fields. The second step is the characterization of the frequency spectrum of the circumferential perturbation field. Next, the stresses associated with a Fourier reconstruction of the perturbation field are compared with the real ones. Finally, the approximated circumferential stresses are injected into a throughflow simulation tool to analyse and demonstrate their capability to reproduce a 3D averaged flow field.
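The comparison between real stresses and those obtained from a Fourier reconstruction can be illustrated with a minimal sketch. It assumes a plain (unweighted) circumferential average over one pitch sampled at uniformly spaced points, and a stress defined as the average of the product of two perturbations; the paper's own averaging may be weighted differently, and all function names here are hypothetical. The harmonic estimate uses the fact that, for real signals, the mean of u'v' equals the sum over harmonics k >= 1 of 2 Re(U_k conj(V_k)).

```python
import cmath
import math
from typing import List


def circumferential_mean(values: List[float]) -> float:
    """Unweighted average over uniformly spaced circumferential samples."""
    return sum(values) / len(values)


def direct_stress(u: List[float], v: List[float]) -> float:
    """Average of u'v', with primes denoting deviation from the circumferential mean."""
    u_bar, v_bar = circumferential_mean(u), circumferential_mean(v)
    return circumferential_mean([(a - u_bar) * (b - v_bar) for a, b in zip(u, v)])


def fourier_coefficient(values: List[float], k: int) -> complex:
    """k-th discrete Fourier coefficient, normalised by the number of samples."""
    m = len(values)
    return sum(x * cmath.exp(-2j * math.pi * k * j / m) for j, x in enumerate(values)) / m


def harmonic_stress(u: List[float], v: List[float], n_harmonics: int) -> float:
    """Stress rebuilt from the first n_harmonics harmonics (n_harmonics < len(u) / 2)."""
    total = 0.0
    for k in range(1, n_harmonics + 1):
        total += 2.0 * (fourier_coefficient(u, k) * fourier_coefficient(v, k).conjugate()).real
    return total
```

Truncating the sum shows how much of the stress each harmonic carries: for a signal such as u = cos θ + cos 2θ, the first harmonic alone recovers half of the mean of u'u', and two harmonics recover all of it.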
Compressor and Turbine Blade Design by Optimization
Compressor and turbine blade design involves thermodynamic, aerodynamic and mechanical aspects, resulting in a large number of iterations. Inverse methods and optimization procedures help the designer in this long and potentially frustrating process. In this paper an optimization procedure is presented which solves two types of two-dimensional or quasi-three-dimensional problems: the inverse problem, for which a target velocity distribution is imposed, and a more global problem, in which the aerodynamic load is maximized.
Compaction behavior of out-of-autoclave prepreg materials
The main challenges in composite parts manufacturing are related to the curing means, mainly autoclaves, the length of their cycles and their operating costs. To reduce this dependency, out-of-autoclave materials have been considered as a solution for high-production-rate parts such as spars, flaps, etc. However, most out-of-autoclave processes do not have the same maturity as their autoclave counterparts, especially concerning part quality. Some pre-cure steps such as compaction and ply lay-up are usually less of a concern in autoclave manufacturing: the pressure applied during the cycle helps reduce potential defects (porosity caused by a poor-quality lay-up, bad compaction, entrapped air or humidity…). For out-of-autoclave parts, these are crucial steps which may have many consequences for the final quality of the laminate. To avoid this quality loss, these steps must be well understood.
Primal Wasserstein Imitation Learning
Imitation Learning (IL) methods seek to match the behavior of an agent with
that of an expert. In the present work, we propose a new IL method based on a
conceptually simple algorithm: Primal Wasserstein Imitation Learning (PWIL),
which ties to the primal form of the Wasserstein distance between the expert
and the agent state-action distributions. We present a reward function which is
derived offline, as opposed to recent adversarial IL algorithms that learn a
reward function through interactions with the environment, and which requires
little fine-tuning. We show that we can recover expert behavior on a variety of
continuous control tasks of the MuJoCo domain in a sample efficient manner in
terms of agent interactions and of expert interactions with the environment.
Finally, we show that the behavior of the agent we train matches the behavior
of the expert with the Wasserstein distance, rather than the commonly used
proxy of performance. Comment: Published in International Conference on Learning Representations
(ICLR 2021
Vibrated polar disks: spontaneous motion, binary collisions, and collective dynamics
We study the spontaneous motion, binary collisions, and collective dynamics
of "polar disks", i.e. purpose-built particles which, when vibrated between two
horizontal plates, move coherently along a direction strongly correlated to
their intrinsic polarity. The motion of our particles, although nominally
three-dimensional and complicated, is well accounted for by a two-dimensional
persistent random walk. Their binary collisions are spatiotemporally extended
events during which multiple actual collisions happen, yielding a weak average
effective alignment. We show that this well-controlled, "dry active matter"
system can display collective motion with orientationally-ordered regions of
the order of the system size. We provide evidence of strong number density in
the most ordered regimes observed. These results are discussed in the light of
the limitations of our system, notably those due to the inevitable presence of
walls. Comment: 13 pages, 10 figures, 4 movie
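For readers unfamiliar with the model named above, a two-dimensional persistent random walk can be sketched as follows. This is a generic textbook version, not the authors' fitted model: it assumes constant speed and a heading that diffuses with a rotational diffusion coefficient `rot_diffusion`, and every parameter name and default value is hypothetical.

```python
import math
import random
from typing import List, Tuple


def persistent_random_walk(
    steps: int,
    speed: float = 1.0,
    dt: float = 0.01,
    rot_diffusion: float = 0.5,
    seed: int = 0,
) -> List[Tuple[float, float]]:
    """Simulate a 2D walker moving at constant speed along a slowly diffusing heading."""
    rng = random.Random(seed)
    x, y, heading = 0.0, 0.0, 0.0
    path = [(x, y)]
    for _ in range(steps):
        # Move along the current heading, then let the heading diffuse.
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        heading += math.sqrt(2.0 * rot_diffusion * dt) * rng.gauss(0.0, 1.0)
        path.append((x, y))
    return path
```

With `rot_diffusion` set to zero the walker moves in a straight line; larger values shorten the distance over which the direction of motion stays correlated.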
Design and implementation of an authoring system with an integrated simulator of systems of differential equations: profCOMP and simEDO
Offline Reinforcement Learning as Anti-Exploration
Offline Reinforcement Learning (RL) aims at learning an optimal control from
a fixed dataset, without interactions with the system. An agent in this setting
should avoid selecting actions whose consequences cannot be predicted from the
data. This is the converse of exploration in RL, which favors such actions. We
thus take inspiration from the literature on bonus-based exploration to design
a new offline RL agent. The core idea is to subtract a prediction-based
exploration bonus from the reward, instead of adding it for exploration. This
allows the policy to stay close to the support of the dataset. We connect this
approach to a more common regularization of the learned policy towards the
data. Instantiated with a bonus based on the prediction error of a variational
autoencoder, we show that our agent is competitive with the state of the art on
a set of continuous control locomotion and manipulation tasks.
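The core idea, subtracting a bonus from the reward instead of adding it, can be sketched in a few lines. The bonus used here is a deliberately simple stand-in (distance to the nearest logged state-action pair), not the variational autoencoder prediction error mentioned in the abstract; the function names, the scalar state and action types, and the scale `alpha` are all hypothetical.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical toy setting: scalar states and actions.
Transition = Tuple[float, float]  # (state, action)


def make_nearest_neighbor_bonus(dataset: List[Transition]) -> Callable[[float, float], float]:
    """Stand-in bonus: distance from (state, action) to the closest logged pair."""
    def bonus(state: float, action: float) -> float:
        return min(math.hypot(state - s, action - a) for s, a in dataset)
    return bonus


def anti_exploration_reward(
    reward: float,
    state: float,
    action: float,
    bonus_fn: Callable[[float, float], float],
    alpha: float = 1.0,
) -> float:
    """Subtract the scaled bonus, so poorly covered actions look less attractive."""
    return reward - alpha * bonus_fn(state, action)
```

A pair that appears in the dataset keeps its reward unchanged, while a pair far from the data is penalized in proportion to its distance, which is what keeps the learned policy near the support of the dataset in this toy setting.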
Offline Reinforcement Learning with Pseudometric Learning
Offline Reinforcement Learning methods seek to learn a policy from logged
transitions of an environment, without any interaction. In the presence of
function approximation, and under the assumption of limited coverage of the
state-action space of the environment, it is necessary to constrain the policy to
visit state-action pairs close to the support of logged transitions. In this
work, we propose an iterative procedure to learn a pseudometric (closely
related to bisimulation metrics) from logged transitions, and use it to define
this notion of closeness. We show its convergence and extend it to the function
approximation setting. We then use this pseudometric to define a new lookup
based bonus in an actor-critic algorithm: PLOFF. This bonus encourages the
actor to stay close, in terms of the defined pseudometric, to the support of
logged transitions. Finally, we evaluate the method on hand manipulation and
locomotion tasks. Comment: ICML 202
Model-based verification of a security protocol for conditional access to services
Peer reviewed. We use the formal language LOTOS to specify and verify the robustness of the Equicrypt protocol under design in the European OKAPI project for conditional access to multimedia services. We state some desired security properties and formalize them. We describe a generic intruder process and its modelling, and show that some properties are falsified in the presence of this intruder. The diagnostic sequences can be used almost directly to exhibit the scenarios of possible attacks on the protocol. Finally, we propose an improvement of the protocol which satisfies our properties.