4,444 research outputs found

    Understanding when Dynamics-Invariant Data Augmentations Benefit Model-Free Reinforcement Learning Updates

    Recently, data augmentation (DA) has emerged as a method for leveraging domain knowledge to inexpensively generate additional data in reinforcement learning (RL) tasks, often yielding substantial improvements in data efficiency. While prior work has demonstrated the utility of incorporating augmented data directly into model-free RL updates, it is not well understood when a particular DA strategy will improve data efficiency. In this paper, we seek to identify general aspects of DA responsible for observed learning improvements. Our study focuses on sparse-reward tasks with dynamics-invariant data augmentation functions, serving as an initial step towards a more general understanding of DA and its integration into RL training. Experimentally, we isolate three relevant aspects of DA: state-action coverage, reward density, and the number of augmented transitions generated per update (the augmented replay ratio). From our experiments, we draw two conclusions: (1) increasing state-action coverage often has a much greater impact on data efficiency than increasing reward density, and (2) decreasing the augmented replay ratio substantially improves data efficiency. In fact, certain tasks in our empirical study are solvable only when the replay ratio is sufficiently low.
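
    The augmented replay ratio lends itself to a simple implementation pattern. Below is a minimal sketch (not the paper's code), assuming a generic dynamics-invariant augment_fn and hypothetical buffer and parameter names: real and augmented transitions are stored separately so that the number of augmented transitions consumed per update can be controlled directly.

```python
import random
from collections import deque

# Hedged sketch: separate real and augmented replay buffers so the augmented
# replay ratio (augmented transitions used per update) is an explicit knob.
# "augment_fn" stands for any dynamics-invariant augmentation (e.g., a symmetry).
class AugmentedReplay:
    def __init__(self, augment_fn, copies_per_transition=4, capacity=100_000):
        self.real = deque(maxlen=capacity)
        self.aug = deque(maxlen=capacity)
        self.augment_fn = augment_fn
        self.copies_per_transition = copies_per_transition

    def add(self, transition):
        self.real.append(transition)
        for _ in range(self.copies_per_transition):
            self.aug.append(self.augment_fn(transition))

    def sample(self, batch_size, aug_fraction=0.25):
        # aug_fraction controls the augmented replay ratio per update;
        # the abstract's finding suggests keeping this ratio low.
        n_aug = int(batch_size * aug_fraction)
        n_real = batch_size - n_aug
        batch = random.sample(self.real, min(n_real, len(self.real)))
        if self.aug:
            batch += random.sample(self.aug, min(n_aug, len(self.aug)))
        return batch
```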

    On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling

    On-policy reinforcement learning (RL) algorithms perform policy updates using i.i.d. trajectories collected by the current policy. However, after observing only a finite number of trajectories, on-policy sampling may produce data that fails to match the expected on-policy data distribution. This sampling error leads to noisy updates and data-inefficient on-policy learning. Recent work in the policy evaluation setting has shown that non-i.i.d., off-policy sampling can produce data with lower sampling error than on-policy sampling. Motivated by this observation, we introduce an adaptive, off-policy sampling method to improve the data efficiency of on-policy policy gradient algorithms. Our method, Proximal Robust On-Policy Sampling (PROPS), reduces sampling error by collecting data with a behavior policy that increases the probability of sampling actions that are under-sampled with respect to the current policy. Rather than discarding data from old policies -- as is commonly done in on-policy algorithms -- PROPS uses data collection to adjust the distribution of previously collected data to be approximately on-policy. We empirically evaluate PROPS on both continuous-action MuJoCo benchmark tasks and discrete-action tasks, and we demonstrate that (1) PROPS decreases sampling error throughout training and (2) PROPS improves the data efficiency of on-policy policy gradient algorithms. Our work improves the RL community's understanding of a nuance in the on-policy vs. off-policy dichotomy: on-policy learning requires on-policy data, not on-policy sampling.
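
    To make the sampling-error idea concrete, here is a minimal sketch for discrete actions (it is not the PROPS implementation; the function names and the correction rule are illustrative assumptions): the behavior distribution upweights actions that are currently under-sampled relative to the current policy, so the aggregate data distribution tracks the on-policy one more closely than i.i.d. sampling typically does.

```python
import numpy as np

# Hedged sketch of adaptive, off-policy sampling for discrete actions (in the
# spirit of PROPS, not the authors' algorithm): boost the behavior probability
# of actions whose empirical frequency falls short of the current policy's.
def adjusted_sampling_dist(target_probs, empirical_counts, strength=1.0):
    total = max(empirical_counts.sum(), 1.0)
    empirical = empirical_counts / total
    deficit = target_probs - empirical            # positive where under-sampled
    logits = np.log(target_probs + 1e-8) + strength * deficit
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

# Usage: the behavior policy draws actions from the adjusted distribution while
# the aggregate dataset stays approximately on-policy for the gradient update.
rng = np.random.default_rng(0)
target = np.array([0.7, 0.2, 0.1])                # current policy's action probabilities
counts = np.zeros(3)
for _ in range(1000):
    a = rng.choice(3, p=adjusted_sampling_dist(target, counts))
    counts[a] += 1
print(counts / counts.sum())                      # empirical distribution tracks the target
```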

    A Comparison of Some Methods of Deriving the Instantaneous Unit Hydrograph

    The geomorphological instantaneous unit hydrograph (IUH) proposed by Gupta et al. (1980) was compared with the IUH derived by the commonly used time-area and Nash methods. The comparison was performed by analyzing the effective rainfall-direct runoff relationship for four large basins in Central Italy ranging in area from 934 to 4,147 km². The Nash method was found to be the most accurate of the three methods. The geomorphological method, with only one parameter estimated in advance from the observed data, was found to be only slightly less accurate than the Nash method, which has two parameters determined from observations. Furthermore, if the geomorphological and Nash methods employed the same information, represented by basin lag, they produced similar accuracy provided the other Nash parameter, expressed by the product of peak flow and time to peak, was empirically assessed within a wide range of values. It was concluded that the geomorphological method is more appropriate for ungaged basins and the Nash method for gaged basins.
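
    For reference, the Nash IUH is a two-parameter gamma-distribution kernel, u(t) = (1/(k*Gamma(n))) * (t/k)^(n-1) * exp(-t/k), and direct runoff follows from convolving it with effective rainfall. The sketch below uses illustrative values of n and k and a hypothetical rainfall burst, not quantities fitted to the study's basins.

```python
import numpy as np
from scipy.special import gamma

# Hedged sketch of the two-parameter Nash IUH; n (number of linear reservoirs)
# and k (storage coefficient, hours) below are illustrative, not fitted values.
def nash_iuh(t, n=3.0, k=5.0):
    # u(t) = (1 / (k * Gamma(n))) * (t / k)**(n - 1) * exp(-t / k)
    return (1.0 / (k * gamma(n))) * (t / k) ** (n - 1.0) * np.exp(-t / k)

dt = 0.5                                  # time step, hours
t = np.arange(0.0, 72.0, dt)
effective_rainfall = np.zeros_like(t)
effective_rainfall[:12] = 2.0             # hypothetical 6-hour burst of 2 mm/h

# Direct runoff is the convolution of effective rainfall with the IUH kernel.
runoff = np.convolve(effective_rainfall, nash_iuh(t))[: t.size] * dt
print(runoff.max())                       # peak of the simulated direct-runoff hydrograph
```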

    Enhancing sustainability by improving plant salt tolerance through macro- and micro-algal biostimulants

    Algal biomass, extracts, or derivatives have long been considered valuable materials for bringing benefits to humans and cultivated plants. In recent decades, it became evident that algal formulations can induce multiple effects on crops (including increases in biomass, yield, and quality), and that algal extracts contain a series of bioactive compounds and signaling molecules in addition to mineral and organic nutrients. The need to reduce non-renewable chemical inputs in agriculture has recently prompted an increase in the use of algal extracts as plant biostimulants, also because their ability to promote plant growth under suboptimal conditions, such as saline environments, is beneficial. In this article, we discuss some research areas that are critical for the implementation of macro- and microalgal extracts as plant biostimulants in agriculture. Specifically, we provide an overview of current knowledge and achievements regarding extraction methods, composition, and mechanisms of action of algal extracts, focusing on salt-stress tolerance. We also outline current limitations and possible research avenues. We conclude that the comparison and integration of knowledge on the molecular and physiological responses of plants to salt and to algal extracts should also guide extraction procedures and application methods. The effects of algal biostimulants have been investigated mainly from an applied perspective, and input from different scientific disciplines is still much needed for the development of new sustainable strategies to increase crop tolerance to salt stress.

    Finite Fracture Mechanics extension to dynamic loading scenarios

    The coupled criterion of Finite Fracture Mechanics (FFM) has already been successfully applied to assess brittle failure initiation in cracked and notched structures subjected to quasi-static loading conditions. The originality of FFM lies in addressing failure onset through the simultaneous fulfilment of a stress requirement and an energy balance, both computed over a finite distance ahead of the stress raiser. Accordingly, this length turns out to be a structural parameter, able to interact with the geometry under investigation. This work aims to extend the FFM failure criterion to dynamic loadings. To this end, the general requisites of a proper dynamic failure criterion are first outlined. The novel Dynamic extension of FFM (DFFM) is then put forward, assuming the existence of a material time interval related to the coalescence period of microcracks upon macroscopic failure. On this basis, the DFFM model is investigated for the case in which a one-to-one relation between the external solicitation and both the dynamic stress field and the energy release rate holds true. Under this condition, the DFFM is validated against suitable experimental data on rock materials from the literature and shown to properly capture the increase in failure load as the loading rate rises, thus proving to be a technique suitable for modelling the rate dependence of failure initiation in brittle and quasi-brittle materials.
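
    For context, here is a minimal sketch of the quasi-static FFM coupled criterion that the abstract extends (not the dynamic DFFM model), assuming illustrative shapes for the stress field and the energy release rate: failure occurs at the lowest load for which some finite crack advance simultaneously satisfies the average stress requirement and the incremental energy balance.

```python
import numpy as np

# Hedged sketch of the quasi-static FFM coupled criterion, with illustrative
# field shapes: s(x) is the stress ahead of the stress raiser per unit load,
# g(a) is the energy release rate per unit load squared.
def ffm_failure_load(s, g, sigma_c, G_c, deltas):
    loads = []
    for d in deltas:
        x = np.linspace(1e-6, d, 400)
        avg_stress = s(x).mean()              # average stress over the finite length
        avg_G = g(x).mean()                   # average (incremental) energy release rate
        P_stress = sigma_c / avg_stress       # load satisfying the stress requirement
        P_energy = np.sqrt(G_c / avg_G)       # load satisfying the energy balance
        loads.append(max(P_stress, P_energy)) # both conditions must hold together
    return min(loads)                         # the critical finite length minimizes the load

# Illustrative near-tip behaviour (hypothetical shapes and units).
failure_load = ffm_failure_load(
    s=lambda x: 1.0 / np.sqrt(x),             # stress decays away from the notch tip
    g=lambda a: a,                            # energy release rate grows with crack advance
    sigma_c=10.0,
    G_c=0.1,
    deltas=np.linspace(1e-4, 5e-3, 100),
)
print(failure_load)
```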