
    Smoothing Policies and Safe Policy Gradients

    Policy gradient algorithms are among the best candidates for the much-anticipated application of reinforcement learning to real-world control tasks, such as those arising in robotics. However, the trial-and-error nature of these methods introduces safety issues whenever the learning phase itself must be performed on a physical system. In this paper, we address a specific safety formulation, where danger is encoded in the reward signal and the learning agent is constrained to never worsen its performance. By studying actor-only policy gradient from a stochastic optimization perspective, we establish improvement guarantees for a wide class of parametric policies, generalizing existing results on Gaussian policies. This, together with novel upper bounds on the variance of policy gradient estimators, allows us to identify those meta-parameter schedules that guarantee monotonic improvement with high probability. The two key meta-parameters are the step size of the parameter updates and the batch size of the gradient estimators. By a joint, adaptive selection of these meta-parameters, we obtain a safe policy gradient algorithm

    The geometric Cauchy problem for developable submanifolds

    Given a smooth distribution $\mathscr{D}$ of $m$-dimensional planes along a smooth regular curve $\gamma$ in $\mathbb{R}^{m+n}$, we consider the following problem: to find an $m$-dimensional developable submanifold of $\mathbb{R}^{m+n}$, that is, a ruled submanifold with constant tangent space along the rulings, such that its tangent bundle along $\gamma$ coincides with $\mathscr{D}$. In particular, we give sufficient conditions for the local well-posedness of the problem, together with a parametric description of the solution. Comment: 15 pages

    Renormalisation of gauge theories on general anisotropic lattices and high-energy scattering in QCD

    We study the renormalisation of $SU(N_c)$ gauge theories on general anisotropic lattices, to one-loop order in perturbation theory, employing the background field method. The results are then applied in the context of two different approaches to hadronic high-energy scattering. In the context of the Euclidean nonperturbative approach to soft high-energy scattering based on Wilson loops, we refine the nonperturbative justification of the analytic continuation relations of the relevant Wilson-loop correlators, required to obtain physical results. In the context of longitudinally-rescaled actions, we study the consequences of one-loop corrections on the relation between the $SU(N_c)$ gauge theory and its effective description in terms of two-dimensional principal chiral models. Comment: Revised version with minor corrections, matches published version; 40 pages, 4 figures

    An Energy-based Approach to Ensure the Stability of Learned Dynamical Systems

    Non-linear dynamical systems represent a compact, flexible, and robust tool for reactive motion generation. The effectiveness of dynamical systems relies on their ability to accurately represent stable motions. Several approaches have been proposed to learn stable and accurate motions from demonstration. Some approaches work by separating accuracy and stability into two learning problems, which increases the number of open parameters and the overall training time. Alternative solutions exploit single-step learning but restrict the applicability to one regression technique. This paper presents a single-step approach to learning stable and accurate motions that works with any regression technique. The approach applies energy considerations to the learned dynamics to stabilize the system at run-time while introducing small deviations from the demonstrated motion. Since the initial value of the energy injected into the system affects the reproduction accuracy, it is estimated from training data using an efficient procedure. Experiments on a real robot and a comparison on a public benchmark show the effectiveness of the proposed approach. Comment: Accepted at the International Conference on Robotics and Automation 202

    Stochastic thermodynamics of entropic transport

    Seifert derived an exact fluctuation relation for diffusion processes using the concept of "stochastic system entropy". In this note we extend his formalism to entropic transport. We introduce the notion of relative stochastic entropy, or "relative surprisal", and use it to generalize Seifert's system/medium decomposition of the total entropy. This result allows one to apply the concepts of stochastic thermodynamics to diffusion processes in confined geometries, such as ion channels, cellular pores or nanoporous materials. It can be seen as the equivalent for diffusion processes of Esposito and Schaller's generalized fluctuation theorem for "Maxwell demon feedbacks". Comment: 3 pages, wording change
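
    For reference, Seifert's system/medium decomposition and the resulting fluctuation relation are usually stated as below. This is a sketch of the standard form in units where $k_B = 1$; the symbols $p$, $s$, $\Delta s_{\mathrm{m}}$ and $\Delta s_{\mathrm{tot}}$ are notation assumed here rather than taken from the listed paper, and the relative-surprisal generalization it introduces is not reproduced.

```latex
\begin{align}
  s(t) &= -\ln p\bigl(x(t), t\bigr)
    && \text{(stochastic system entropy along a trajectory } x(t)\text{)} \\
  \Delta s_{\mathrm{tot}} &= \Delta s + \Delta s_{\mathrm{m}}
    && \text{(system plus medium contributions)} \\
  \bigl\langle e^{-\Delta s_{\mathrm{tot}}} \bigr\rangle &= 1
    && \text{(integral fluctuation relation)}
\end{align}
```

    By Jensen's inequality the last line implies $\langle \Delta s_{\mathrm{tot}} \rangle \ge 0$.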

    Preserving Co-Location Privacy in Geo-Social Networks

    The number of people on social networks has grown exponentially. Users share very large volumes of personal information and content every day. This content may be tagged with geo-spatial and temporal coordinates that some users consider sensitive. While there is clearly a demand for users to share this information with each other, there is also substantial demand for greater control over the conditions under which their information is shared. Content published in a geo-aware social network (GeoSN) often involves multiple users and is often accessible to multiple users, without the publisher being aware of the privacy preferences of those users. This makes it difficult for GeoSN users to control which information about them is available and to whom. Thus, the lack of means to protect users' privacy worries people who are concerned about privacy issues. This paper addresses a particular privacy threat that occurs in GeoSNs: the co-location privacy threat. It concerns the availability of information about the presence of multiple users in the same location at given times, against their will. The challenge addressed is that of supporting privacy while still enabling useful services. Comment: 10 pages, 5 figures
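
    To make the co-location notion in this abstract concrete, the sketch below flags pairs of users whose check-ins at the same location fall within a time window. It only illustrates what information is exposed, not the paper's protection method; the function name `find_colocations`, the `(user, location, timestamp)` tuple layout, and the 600-second default window are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def find_colocations(checkins, window_seconds=600):
    """Return the set of (user_a, user_b, location) triples for distinct
    users who checked in at the same location within window_seconds.

    checkins: iterable of (user, location, timestamp_in_seconds) tuples
    (a hypothetical layout chosen for this illustration).
    """
    # Group visits by location so only same-place check-ins are compared.
    by_location = defaultdict(list)
    for user, location, ts in checkins:
        by_location[location].append((ts, user))

    colocated = set()
    for location, visits in by_location.items():
        for (t1, u1), (t2, u2) in combinations(visits, 2):
            if u1 != u2 and abs(t1 - t2) <= window_seconds:
                # Order the pair so (a, b) and (b, a) collapse to one entry.
                colocated.add((min(u1, u2), max(u1, u2), location))
    return colocated


checkins = [
    ("alice", "cafe", 1000),
    ("bob", "cafe", 1300),
    ("carol", "cafe", 5000),
    ("alice", "park", 2000),
]
colocations = find_colocations(checkins)
```

    Here only alice and bob are reported as co-located at the cafe, since carol's check-in falls outside the window. A privacy mechanism would need to prevent exactly this kind of inference when any involved user objects.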

    The best paying jobs of the future
