386 research outputs found

    Autonomous Driving: A Multi-Objective Deep Reinforcement Learning Approach

    Autonomous driving is a challenging domain that entails multiple aspects: a vehicle should be able to drive to its destination as fast as possible while avoiding collisions, obeying traffic rules and ensuring the comfort of passengers. It is representative of the complex reinforcement learning tasks humans encounter in real life. The aim of this thesis is to explore the effectiveness of multi-objective reinforcement learning for such tasks, as typified by autonomous driving. In particular, it shows that: 1. Multi-objective reinforcement learning is effective at overcoming some of the difficulties faced by scalar-reward reinforcement learning, and a multi-objective DQN agent based on a variant of thresholded lexicographic Q-learning is successfully trained to drive on multi-lane roads and intersections, yielding and changing lanes according to traffic rules. 2. Data efficiency of (multi-objective) reinforcement learning can be significantly improved by exploiting the factored structure of a task. Specifically, factored Q functions learned on the factored state space can be used as features to the original Q function to speed up learning. 3. Inclusion of history-dependent policies enables an intuitive exact algorithm for multi-objective reinforcement learning with thresholded lexicographic order.
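
The action-selection idea behind thresholded lexicographic Q-learning can be sketched as follows. This is a minimal illustration of the generic textbook rule, not the thesis's variant (which the abstract does not specify): objectives are listed from highest to lowest priority, each Q-value is clipped at its threshold, and only the actions that are maximal under the clipped value survive to the next objective. The function name `tlq_select`, the tolerance `tol`, and the example numbers are all assumptions made for illustration.

```python
from typing import List, Sequence


def tlq_select(q_values: Sequence[Sequence[float]],
               thresholds: Sequence[float],
               tol: float = 1e-9) -> int:
    """Pick an action by thresholded lexicographic ordering.

    q_values[i][a] is the estimated Q-value of action a for objective i,
    with objectives ordered from highest to lowest priority.
    thresholds[i] caps objective i; use float("inf") for "no cap".
    """
    candidates: List[int] = list(range(len(q_values[0])))
    for q_i, tau in zip(q_values, thresholds):
        # Clip at the threshold: anything "good enough" counts as equal.
        clipped = {a: min(q_i[a], tau) for a in candidates}
        best = max(clipped.values())
        # Keep only actions that tie for the best clipped value.
        candidates = [a for a in candidates if clipped[a] >= best - tol]
        if len(candidates) == 1:
            break
    # Remaining ties are broken by lowest action index.
    return candidates[0]


# Hypothetical example: objective 0 is safety, objective 1 is speed.
safety = [-0.10, -0.05, -5.00]
speed = [2.0, 1.0, 3.0]

# With a safety threshold of -0.2, actions 0 and 1 are both "safe enough",
# so speed decides between them.
with_threshold = tlq_select([safety, speed], [-0.2, float("inf")])

# Without a threshold, safety alone picks the single safest action.
without_threshold = tlq_select([safety, speed], [float("inf"), float("inf")])
```

The threshold is what lets lower-priority objectives matter at all: under a strict lexicographic order, the speed objective would only ever break exact ties in the safety estimate.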

    The ortho-to-para ratio of interstellar NH$_2$: Quasi-classical trajectory calculations and new simulations

    Based on recent Herschel results, the ortho-to-para ratio (OPR) of NH$_2$ has been measured towards the following high-mass star-forming regions: W31C (G10.6-0.4), W49N (G43.2-0.1), W51 (G49.5-0.4), and G34.3+0.1. The OPR at thermal equilibrium ranges from the statistical limit of three at high temperatures to infinity as the temperature tends toward zero, unlike the case of H$_2$. Depending on the position observed along the lines-of-sight, the OPR was found to lie either slightly below the high temperature limit of three (in the range $2.2-2.9$) or above this limit ($\sim3.5$, $\gtrsim 4.2$, and $\gtrsim 5.0$). In low temperature interstellar gas, where the H$_2$ is para-enriched, our nearly pure gas-phase astrochemical models with nuclear-spin chemistry can account for anomalously low observed NH$_2$-OPR values. We have tentatively explained OPR values larger than three by assuming that spin thermalization of NH$_2$ can proceed at least partially by H-atom exchange collisions with atomic hydrogen, thus increasing the OPR with decreasing temperature. In this paper, we present quasi-classical trajectory calculations of the H-exchange reaction NH$_2$ + H, which show the reaction to proceed without a barrier, confirming that the H-exchange will be efficient in the temperature range of interest. With the inclusion of this process, our models suggest both that OPR values below three arise in regions with temperatures $\gtrsim 20-25$ K, depending on time, and values above three but lower than the thermal limit arise at still lower temperatures. Comment: 12 pages, 12 figures. Accepted for publication in A&

    Lyapunov exponent and almost sure asymptotic stability of a stochastic SIRS model

    Epidemiological models with bilinear incidence rate usually have an asymptotically stable trivial equilibrium corresponding to the disease-free state, or an asymptotically stable nontrivial equilibrium (i.e. interior equilibrium) corresponding to the endemic state. In this paper, we consider an epidemiological model, which is a SIRS (susceptible-infected-removed-susceptible) model influenced by random perturbations. We prove that the solutions of the system are positive for all positive initial conditions and that the solutions are global, that is, there is no finite explosion time. We present a necessary and sufficient condition for the almost sure asymptotic stability of the steady state of the stochastic system.

    The cyclicity of the period annulus of a reversible quadratic system

    We prove that perturbing the period annulus of the reversible quadratic polynomial differential system $\dot{x} = y + ax^2$, $\dot{y} = -x$ with $a \neq 0$ inside the class of all quadratic polynomial differential systems we can obtain at most two limit cycles, including their multiplicities. Since the first integral of the unperturbed system contains an exponential function, the traditional methods cannot be applied, except in [6], where a computer-assisted method was used. In this paper we provide a method for studying the problem. This is also the first purely mathematical proof of the conjecture formulated by F. Dumortier and R. Roussarie in [5] for q ≤ 2. The method may be used in other problems.
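
To make the remark about the exponential first integral concrete, here is a small numerical check. It assumes the candidate first integral H(x, y) = exp(2ay) (x^2 + y/a - 1/(2a^2)), which was obtained here by direct differentiation along the flow of x' = y + a x^2, y' = -x rather than quoted from the paper; the paper may use a different but equivalent normalization. The sketch integrates one orbit near the origin with a classical Runge-Kutta step and records how far H drifts. The parameter value, initial point, step size and function names are illustrative assumptions.

```python
import math


def vector_field(x: float, y: float, a: float):
    """Right-hand side of x' = y + a*x^2, y' = -x."""
    return y + a * x * x, -x


def first_integral(x: float, y: float, a: float) -> float:
    """Candidate first integral H = exp(2ay) * (x^2 + y/a - 1/(2a^2))."""
    return math.exp(2.0 * a * y) * (x * x + y / a - 1.0 / (2.0 * a * a))


def rk4_step(x: float, y: float, a: float, h: float):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1x, k1y = vector_field(x, y, a)
    k2x, k2y = vector_field(x + 0.5 * h * k1x, y + 0.5 * h * k1y, a)
    k3x, k3y = vector_field(x + 0.5 * h * k2x, y + 0.5 * h * k2y, a)
    k4x, k4y = vector_field(x + h * k3x, y + h * k3y, a)
    x_new = x + h * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
    y_new = y + h * (k1y + 2.0 * k2y + 2.0 * k3y + k4y) / 6.0
    return x_new, y_new


def max_drift(a: float = 1.0, x0: float = 0.1, y0: float = 0.0,
              h: float = 1e-3, steps: int = 10_000) -> float:
    """Largest |H(t) - H(0)| seen along the numerically integrated orbit."""
    x, y = x0, y0
    h0 = first_integral(x, y, a)
    worst = 0.0
    for _ in range(steps):
        x, y = rk4_step(x, y, a, h)
        worst = max(worst, abs(first_integral(x, y, a) - h0))
    return worst


drift = max_drift()
```

If H is indeed conserved, `drift` should stay at the level of the integrator's truncation error. The origin corresponds to H = -1/(2a^2), and nearby level sets of H are the closed orbits that make up the period annulus.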