
    Path integral policy improvement with differential dynamic programming

    Path Integral Policy Improvement with Covariance Matrix Adaptation (PI2-CMA) is a step-based, model-free reinforcement learning approach that combines statistical estimation techniques with fundamental results from Stochastic Optimal Control. In essence, a policy distribution is improved iteratively using reward-weighted averaging of the corresponding rollouts. It was assumed that PI2-CMA somehow exploited gradient information contained in the reward-weighted statistics. To our knowledge, we are the first to expose the principle of this gradient extraction rigorously. Our findings reveal that PI2-CMA essentially obtains gradient information in a way similar to the forward and backward passes of the Differential Dynamic Programming (DDP) method. It is then straightforward to extend the analogy with DDP by introducing a feedback term in the policy update. This suggests a novel algorithm, which we coin Path Integral Policy Improvement with Differential Dynamic Programming (PI2-DDP). The resulting algorithm is similar to the previously proposed Sampled Differential Dynamic Programming (SaDDP), but we derive the method independently as a generalization of the PI2-CMA framework. Our derivations suggest some small variations to SaDDP to increase performance. We validated our claims on a robot trajectory learning task.
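    The reward-weighted averaging mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' PI2-CMA or PI2-DDP implementation: it works on whole parameter vectors rather than per time step, and the function names, the temperature value, the cost normalization and the quadratic test cost are all assumptions made for illustration.

```python
import numpy as np

def reward_weighted_update(samples, costs, old_mean, temperature=10.0):
    """One reward-weighted averaging step (illustrative sketch, not the paper's code)."""
    samples = np.asarray(samples, dtype=float)
    costs = np.asarray(costs, dtype=float)
    old_mean = np.asarray(old_mean, dtype=float)
    # Normalize costs to [0, 1] so the temperature has a comparable effect
    # across problems (an assumption of this sketch).
    span = costs.max() - costs.min()
    scaled = (costs - costs.min()) / span if span > 0 else np.zeros_like(costs)
    # Low-cost samples receive exponentially larger weights.
    weights = np.exp(-temperature * scaled)
    weights /= weights.sum()
    new_mean = weights @ samples
    # Covariance from weighted deviations around the previous mean.
    deviations = samples - old_mean
    new_cov = (weights[:, None] * deviations).T @ deviations
    return new_mean, new_cov

def run_demo(iterations=30, n_samples=50, seed=0):
    """Apply the update repeatedly to a hypothetical quadratic cost."""
    rng = np.random.default_rng(seed)
    target = np.array([1.0, -2.0])  # hypothetical optimum
    mean, cov = np.zeros(2), np.eye(2)
    for _ in range(iterations):
        samples = rng.multivariate_normal(mean, cov, size=n_samples)
        costs = np.sum((samples - target) ** 2, axis=1)
        mean, cov = reward_weighted_update(samples, costs, mean)
        cov += 1e-6 * np.eye(2)  # small jitter keeps the covariance well conditioned
    return mean, target
```

    Calling `run_demo()` returns the final mean together with the hypothetical target, so the two can be compared directly. Centering the covariance on the previous mean is a design choice of this sketch; it keeps exploration from collapsing while the mean is still moving.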

    On Entropy Regularized Path Integral Control for Trajectory Optimization

    In this article we present a generalised view on Path Integral Control (PIC) methods. PIC refers to a particular class of policy search methods that are closely tied to the setting of Linearly Solvable Optimal Control (LSOC), a restricted subclass of nonlinear Stochastic Optimal Control (SOC) problems. This class is unique in the sense that it can be solved explicitly to yield a formal optimal state trajectory distribution. In this contribution we first review the PIC theory and discuss related algorithms tailored to policy search in general. We are able to identify a generic design strategy that relies on the existence of an optimal state trajectory distribution and finds a parametric policy by minimizing the cross entropy between the optimal state trajectory distribution and one parametrized through its policy. Inspired by this observation, we then aim to formulate a SOC problem that shares traits with the LSOC setting yet covers a less restrictive class of problem formulations. We refer to this SOC problem as Entropy Regularized Trajectory Optimization. The problem is closely related to the Entropy Regularized Stochastic Optimal Control setting, which has lately often been addressed by the Reinforcement Learning (RL) community. We analyse the convergence behaviour of the theoretical state trajectory distribution sequence and draw connections with stochastic search methods tailored to classic optimization problems. Finally, we derive explicit updates and compare the implied Entropy Regularized PIC with earlier work in the context of both PIC and RL for derivative-free trajectory optimization.
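    The design strategy described in this abstract can be written compactly in the form commonly used in the path integral control literature. The notation is an assumption and is not taken from the article: $\tau$ is a state trajectory, $p_0$ the uncontrolled trajectory distribution, $S$ the accumulated path cost, $\lambda > 0$ a temperature, and $p_\theta$ the trajectory distribution induced by a policy with parameters $\theta$.

```latex
\begin{align}
  p^{*}(\tau) &= \frac{1}{Z}\, p_{0}(\tau)\,
      \exp\!\left(-\tfrac{1}{\lambda} S(\tau)\right),
  \qquad
  Z = \mathbb{E}_{p_{0}}\!\left[\exp\!\left(-\tfrac{1}{\lambda} S(\tau)\right)\right], \\
  \theta^{*} &= \arg\min_{\theta}\;
      -\,\mathbb{E}_{p^{*}}\!\left[\log p_{\theta}(\tau)\right].
\end{align}
```

    The first line is the formal optimal trajectory distribution; the second is the cross-entropy fit of the parametric distribution to it. In practice the expectation is typically estimated from sampled rollouts with weights proportional to $\exp(-S(\tau_k)/\lambda)$.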

    On entropy regularized Path Integral Control for trajectory optimization

    In this article, we present a generalized view on Path Integral Control (PIC) methods. PIC refers to a particular class of policy search methods that are closely tied to the setting of Linearly Solvable Optimal Control (LSOC), a restricted subclass of nonlinear Stochastic Optimal Control (SOC) problems. This class is unique in the sense that it can be solved explicitly, yielding a formal optimal state trajectory distribution. In this contribution, we first review the PIC theory and discuss related algorithms tailored to policy search in general. We are able to identify a generic design strategy that relies on the existence of an optimal state trajectory distribution and finds a parametric policy by minimizing the cross-entropy between the optimal state trajectory distribution and one parametrized by a parametric stochastic policy. Inspired by this observation, we then aim to formulate a SOC problem that shares traits with the LSOC setting yet covers a less restrictive class of problem formulations. We refer to this SOC problem as Entropy Regularized Trajectory Optimization. The problem is closely related to the Entropy Regularized Stochastic Optimal Control setting, which has often been addressed lately by the Reinforcement Learning (RL) community. We analyze the convergence behavior of the theoretical state trajectory distribution sequence and draw connections with stochastic search methods tailored to classic optimization problems. Finally, we derive explicit updates and compare the implied Entropy Regularized PIC with earlier work in the context of both PIC and RL for derivative-free trajectory optimization.

    Surface modification of talc particles with hydrophobic nanometric silica (by dry coating): influence on their physico-chemical properties and their dispersibility in an aqueous phase

    Divided solids, or powders, are widely used in many industrial sectors. In the pharmaceutical industry, powders can be granulated and/or compressed to produce solid dosage forms, and they can also be dispersed to formulate oral suspensions, for example. In cosmetics, powder mixing and suspension are crucial steps for certain products. The food industry generates and formulates many products in the form of divided solids. In paints, the dispersion of pigments in the liquid phase is essential. In all these cases, solid-solid and solid-liquid interfacial interactions strongly govern the industrial processes. This is the case for the dispersion of a powder in a liquid, which is strongly influenced by the interfacial energy between the powder and the liquid, a component directly linked to the surface tension of the liquid and the surface energy of the powder through the contact angle (Young equation). In this study, the dispersion in an aqueous phase of talc, a hydrophobic material widely used in industry, notably in papermaking, is investigated. To modify the solid-liquid (and solid-solid) interactions, the talc surface is modified by dry coating with hydrophobic nanometric silica (Aerosil® R972). Two coating devices were used for this purpose: a high-shear mixer (Cyclomix®) and a planetary ball mill. The aim of this work is to study the influence of changes in the solid-liquid interactions on the dispersibility of the particles. The surface modifications of the coated particles were analysed by the sessile drop method, capillary rise, inverse gas chromatography (IGC) and dynamic vapour sorption (DVS). These modifications were then compared with the dispersion rate of the particles in water. Relationships were established between the dispersion rate of the particles and the particle/water work of adhesion. Beyond the solid-liquid interactions, processing in the coating devices modifies other properties of the talc, such as its density, rheology and floatability, all of which also affect its dispersibility. Finally, the stirring power of the aqueous phase is also studied.
    Abstract: Powder technology concerns the phenomena encountered in industrial processes involving divided solids. Such processes are found in the pharmaceutical, cosmetics, and food industries (where many products are based on powders: tablets, granules, suspensions), and in the paint industry, where solid pigments are dispersed in liquids. In particular, this thesis examines the dispersion of powders in liquids, which depends strongly on the interactions between solids and liquids and can be assessed by the solid-liquid interfacial energy. This energy is directly linked to the surface tension of the liquid and the surface energy of the solid through the contact angle in the Young equation. The contact angle is used to calculate the work of adhesion between a powder and a liquid and characterises the wettability of the powder. Talc powder, mainly used in the paper industry, is used in this study. The surface of the talc powder has been modified by dry coating with hydrophobic nano-silica (Aerosil® R972) using two different devices: a high-shear mixer (Cyclomix®) and a planetary ball mill. The aim is to observe the effect of surface modification on the wettability and surface energy of coated talc particles, and the effect of these surface energy changes on the dispersibility of the particles. Different parameters have been considered: the concentration of hydrophobic silica, and the duration of processing without silica and with 3 % silica. Surface modifications have been assessed using the sessile drop method, capillary rise, inverse gas chromatography (IGC) and dynamic vapour sorption (DVS). This study shows that treatment of the particles in the coating devices modifies not only their surface wettability but also physical characteristics such as their bulk density, rheology and buoyancy, and thus their dispersibility. Finally, the effect of the stirring power of the aqueous phase on the dispersibility of talc powders coated with different concentrations of hydrophobic silica is studied.
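    The link between contact angle and work of adhesion mentioned in this abstract is usually expressed with the Young-Dupré relation, W_a = γ_LV (1 + cos θ). The sketch below only illustrates that textbook relation; the function name is hypothetical, the water surface tension of 72.8 mN/m (about 20 °C) is a standard reference value, and the contact angles are illustrative rather than measurements from the thesis.

```python
import math

def work_of_adhesion(surface_tension_mn_per_m, contact_angle_deg):
    """Young-Dupre work of adhesion in mJ/m^2 (numerically equal to mN/m)."""
    theta = math.radians(contact_angle_deg)
    return surface_tension_mn_per_m * (1.0 + math.cos(theta))

WATER_SURFACE_TENSION = 72.8  # mN/m near 20 degrees C (reference value)

# Illustrative angles only: a larger contact angle means poorer wetting
# and therefore a lower work of adhesion with water.
for angle in (60.0, 90.0, 120.0):
    w_a = work_of_adhesion(WATER_SURFACE_TENSION, angle)
    print(f"contact angle {angle:5.1f} deg -> W_a = {w_a:6.1f} mJ/m^2")
```

    Under this relation, a coating that raises the contact angle above 90° lowers the work of adhesion below the surface tension of the liquid itself.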

    Modeling and analysis of tumor heterogeneity during treatment resistance: the case of GIST liver metastases

    This thesis deals with the analysis and modeling of tumor heterogeneity during resistance to treatment. A patient-dependent PDE model that takes two kinds of treatment into account is presented. It qualitatively and quantitatively reproduces the different stages of growth of a tumor undergoing these treatments. To overcome a numerical instability linked to this type of modeling, a new numerical scheme is built: the twin-WENO5. An image synthesis method is then developed, producing synthetic scanner images that enable a better comparison between the numerical results and the clinical data. Finally, a robust criterion that quantifies tumor heterogeneity from both the clinical data and the synthetic images is built.

    Optimizing state trajectories using surrogate models with application on a mechatronic example

    The classic design and simulation methodologies that constitute today's main engineering tools are falling behind industry's ever-increasing complexity. The drive for technological advancement heralds new performance requirements, and optimality is no longer a concern limited to steady-state operation. Since the corresponding dynamic optimization problems incorporate accurate system models, current techniques are plagued by the high computational cost that these multi-disciplinary, high-dimensional system models carry. This imbalance calls for adapting the existing approaches. In this study we propose an algorithmic framework as an extension of the direct transcription method, which has already proven its usefulness in this respect. It is suggested to construct a surrogate model of the derivative function that is iteratively refined in a region of interest. The method is then illustrated on an academic yet nonlinear example.

    Polynomial chaos explicit solution of the optimal control problem in model predictive control

    A difficulty still hindering the widespread application of Model Predictive Control (MPC) methodologies is the computational burden of solving the associated Optimal Control (OC) problem in every control period. In contrast to the numerous approximation techniques that pursue acceleration of the online optimization procedure, relatively little work has been devoted to shifting the optimization effort to a precomputational phase, especially for nonlinear system dynamics. Recently, interest has revived in the theory of general Polynomial Chaos (gPC) as a means to appraise the influence of variable parameters on dynamic system behaviour, and it has proved to yield reliable results. This article establishes an explicit solution of the multi-parametric Nonlinear Problem (mp-NLP) based on the theoretical framework of gPC, which enables the formulation of a polynomial approximation of the nonlinear feedback law. This results in computations fast enough for real-time MPC, with corresponding control frequencies up to 2 kHz.

    Polynomial Chaos reformulation in Nonlinear Stochastic Optimal Control with application on a drivetrain subject to bifurcation phenomena

    This paper discusses a method enabling optimal control of nonlinear systems that are subject to parametric uncertainty. A stochastic optimal tracking problem is formulated that can be expressed as a function of the first two stochastic moments of the state. The proposed formulation allows system performance and system robustness to be penalized independently. The use of polynomial chaos expansions is investigated to arrive at a computationally tractable formulation that rigorously expresses the stochastic moments as a function of the polynomial expansion coefficients. It is then demonstrated how the stochastic optimal control problem can be reformulated as a deterministic optimal control problem in terms of these coefficients. The proposed method is applied to find a robust control input for the start-up of an eccentrically loaded drive train that is inherently prone to bifurcation behaviour. A reference trajectory is chosen to deliberately provoke a bifurcation. The proposed framework is able to avoid the bifurcation behaviour regardless. Comment: 7 pages; 5 figures; ICSTCC 2018, 22nd International Conference on System Theory, Control and Computing, 10-12 October, Sinaia, Romania.
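    How the first two moments follow from polynomial chaos coefficients can be sketched for the simplest case. The example assumes a single standard normal uncertain parameter and probabilists' Hermite polynomials, for which E[He_j He_k] = k! when j = k and 0 otherwise; the function names and the test function are hypothetical and not taken from the paper, which treats dynamic systems.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as He

def pce_coefficients(f, order, n_quad=16):
    """Project f(xi), xi ~ N(0, 1), onto Hermite polynomials He_0..He_order."""
    nodes, weights = He.hermegauss(n_quad)
    # hermegauss integrates against exp(-x^2 / 2); rescale to the normal pdf.
    weights = weights / math.sqrt(2.0 * math.pi)
    values = f(nodes)
    coeffs = []
    for k in range(order + 1):
        basis_k = He.hermeval(nodes, [0.0] * k + [1.0])  # He_k at the nodes
        coeffs.append(np.sum(weights * values * basis_k) / math.factorial(k))
    return np.array(coeffs)

def pce_moments(coeffs):
    """Mean and variance read directly from the expansion coefficients."""
    mean = coeffs[0]
    variance = sum(c ** 2 * math.factorial(k)
                   for k, c in enumerate(coeffs) if k > 0)
    return mean, variance

# Hypothetical response: 1 + 2 xi + 3 xi^2 = 4 He_0 + 2 He_1 + 3 He_2.
coeffs = pce_coefficients(lambda xi: 1.0 + 2.0 * xi + 3.0 * xi ** 2, order=4)
mean, variance = pce_moments(coeffs)
```

    Because the mean is the zeroth coefficient and the variance is a weighted sum of squares of the remaining ones, a cost on the first two moments becomes a deterministic function of the coefficients, which is the kind of reformulation the abstract describes.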