19 research outputs found

    Task-Oriented Query Reformulation with Reinforcement Learning

    Search engines play an important role in our everyday lives by assisting us in finding the information we need. When we input a complex query, however, results are often far from satisfactory. In this work, we introduce a query reformulation system based on a neural network that rewrites a query to maximize the number of relevant documents returned. We train this neural network with reinforcement learning. The actions correspond to selecting terms to build a reformulated query, and the reward is the document recall. We evaluate our approach on three datasets against strong baselines and show a relative improvement of 5-20% in terms of recall. Furthermore, we present a simple method to estimate a conservative upper-bound performance of a model in a particular environment and verify that there is still substantial room for improvement. Comment: EMNLP 201
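    The action and reward described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the function names, the binary term-selection vector, and the toy document IDs are all assumptions made for illustration, and no neural network or search engine is involved.

```python
def reformulate(original_terms, candidate_terms, selected):
    """Build a new query: original terms plus the candidate terms flagged in `selected`."""
    kept = [term for term, keep in zip(candidate_terms, selected) if keep]
    return list(original_terms) + kept


def recall_reward(retrieved, relevant):
    """Reward = fraction of the relevant documents present in the retrieved list."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(relevant.intersection(retrieved)) / len(relevant)


# Hypothetical action: keep the first and third candidate terms.
query = reformulate(["solar", "energy"], ["photovoltaic", "banana", "panel"], [1, 0, 1])

# Hypothetical retrieval result scored against hypothetical relevance labels.
reward = recall_reward(retrieved=["d1", "d4", "d7"], relevant=["d1", "d2", "d7", "d9"])
```

    In a reinforcement learning setup of this kind, the reward computed for each reformulated query would be fed back to update the policy that chooses the selection vector.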

    Direct policy search reinforcement learning based on particle filtering

    We reveal a link between particle filtering methods and direct policy search reinforcement learning, and propose a novel reinforcement learning algorithm, based heavily on ideas borrowed from particle filters. A major advantage of the proposed algorithm is its ability to perform global search in policy space and thus find the globally optimal policy. We validate the approach on one- and two-dimensional problems with multiple optima, and compare its performance to a global random sampling method and a state-of-the-art Expectation-Maximization-based reinforcement learning algorithm.

    Improving the Protection of Aquatic Ecosystems by Dynamically Constraining Reservoir Operation Via Direct Policy Conditioning

    Water management problems generally involve conflicting and non-commensurable objectives. Assuming a centralized perspective at the system level, the set of Pareto-optimal alternatives represents the ideal solution of most of the problems. Yet, in typical real-world applications, only a few primary objectives are explicitly considered, taking precedence over all other concerns. These remaining concerns are then internalized as static constraints within the problem's formulation. This approach yields solutions that fail to explore the full set of objective tradeoffs. In this paper, we propose a novel method, called direct policy conditioning (DPC), that combines direct policy search, multi-objective evolutionary algorithms, and input variable selection to design dynamic constraints that change according to the current system conditions. The method is demonstrated for the management problem of the Conowingo Dam, located within the Lower Susquehanna River, USA. The DPC method is used to identify environmental protection mechanisms and is contrasted with traditional static constraints defining minimum environmental flow requirements. Results show that the DPC method identifies a set of dynamically constrained control policies that outperform the current alternatives based on the minimum environmental flow constraint, not only in terms of environmental protection but also in terms of the primary objectives.

    Learning concurrent motor skills in versatile solution spaces

    Future robots need to autonomously acquire motor skills in order to reduce their reliance on human programming. Many motor skill learning methods concentrate on learning a single solution for a given task. However, discarding information about additional solutions during learning unnecessarily limits autonomy. Such favoring of single solutions often requires re-learning of motor skills when the task, the environment or the robot’s body changes in a way that renders the learned solution infeasible. Future robots need to be able to adapt to such changes and, ideally, have a large repertoire of movements to cope with such problems. In contrast to current methods, our approach simultaneously learns multiple distinct solutions for the same task, such that a partial degeneration of this solution space does not prevent the successful completion of the task. In this paper, we present a complete framework that is capable of learning different solution strategies for a real robot Tetherball task.

    Partitioning the impacts of streamflow and evaporation uncertainty on the operations of multipurpose reservoirs in arid regions

    Ongoing changes in global climate are expected to alter the hydrologic regime of many river basins worldwide, expanding historically observed variability as well as increasing the frequency and intensity of extreme events. Understanding the vulnerabilities of water systems under such uncertain and variable hydrologic conditions is key to supporting strategic planning and design adaptation options. In this paper, we contribute a multiobjective assessment of the impacts of hydrologic uncertainty on the operations of multipurpose water reservoir systems in arid climates. We focus our analysis on the Dez and Karoun river system in Iran, which is responsible for the production of more than 20% of the total hydropower generation of the country. A system of dams controls most of the water flowing to the lower part of the basin, where irrigation and domestic supply are strategic objectives, along with flood protection. We first design the optimal operations of the system using observed inflows and evaporation rates. Then, we simulate the resulting solutions over different ensembles of stochastic hydrology to partition the impacts of streamflow and evaporation uncertainty. Numerical results show that system operations are extremely sensitive to alterations of both uncertainty sources. In particular, we show that in this arid river basin, long-term objectives are mainly vulnerable to inflow uncertainty, whereas evaporation rate uncertainty mostly affects short-term objectives. Our results suggest that local water authorities should properly characterize hydrologic uncertainty in the design of future operations of the expanded network of reservoirs, possibly also investing in the improvement of the existing monitoring network to obtain more reliable data for modeling streamflow and evaporation processes.

    Multiobjective direct policy search using physically based operating rules in multireservoir systems

    This study explores ways to introduce physical interpretability into the process of optimizing operating rules for multireservoir systems with multiple objectives. Prior studies applied the concept of direct policy search (DPS), in which the release policy is expressed as a set of parameterized functions (e.g., neural networks) that are optimized by simulating the performance of different parameter value combinations over a testing period. The problem with this approach is that operators generally avoid adopting such artificial black-box functions for the direct real-time control of their systems, preferring simpler tools with a clear connection to the system physics. This study addresses this mismatch by replacing the black-box functions in DPS with physically based parameterized operating rules, for example by directly using target levels in dams as decision variables. This leads to results that are physically interpretable and may be more acceptable to operators. The methodology proposed in this work is applied to a network of five reservoirs and four power plants in the Nechi catchment in Colombia, with four interests involved: average energy generation, firm energy generation, flood hazard, and flow regime alteration. The release policy depends on only 12 parameters, which significantly reduces the computational complexity compared to existing approaches of multiobjective DPS. The resulting four-dimensional Pareto-approximate set offers a variety of operational strategies from which operators may choose the one that corresponds best to their preferences. For demonstration purposes, one particular optimized policy is selected and its parameter values are analyzed to illustrate how the physically based operating rules can be directly interpreted by the operators. Peer Reviewed. Preprint
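    The idea of direct policy search with a physically based rule can be sketched for a single hypothetical reservoir. This is a simplified illustration under stated assumptions, not the study's method: the capacity, release limit, demand, inflow series, and candidate grid are invented, and the multiobjective Pareto search is collapsed into a weighted sum evaluated by exhaustive search over one parameter.

```python
def simulate(target_level, inflows, capacity=100.0, max_release=30.0,
             demand=20.0, start=50.0):
    """Simulate one reservoir under a target-level rule; return (deficit, spill)."""
    storage, deficit, spill = start, 0.0, 0.0
    for inflow in inflows:
        available = storage + inflow
        # Rule: release whatever exceeds the target level, up to the release limit.
        release = min(max(available - target_level, 0.0), max_release)
        storage = available - release
        if storage > capacity:
            # Water above capacity is lost as uncontrolled spill.
            spill += storage - capacity
            storage = capacity
        # Unmet demand accumulates whenever the release falls short.
        deficit += max(demand - release, 0.0)
    return deficit, spill


def direct_policy_search(inflows, candidates, spill_weight=1.0):
    """Return the candidate target level with the lowest weighted simulated cost."""
    def cost(level):
        deficit, spill = simulate(level, inflows)
        return deficit + spill_weight * spill
    return min(candidates, key=cost)


# Hypothetical inflow series and candidate target levels.
inflows = [10, 40, 25, 5, 60, 15, 20, 30]
best_target = direct_policy_search(inflows, candidates=range(0, 101, 10))
```

    Because the only decision variable is a target storage level, the optimized value can be read directly as an operating instruction, which is the interpretability argument the abstract makes.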

    Policy learning for time-bounded reachability in Continuous-Time Markov Decision Processes via doubly-stochastic gradient ascent

    Continuous-time Markov decision processes are an important class of models in a wide range of applications, ranging from cyber-physical systems to synthetic biology. A central problem is how to devise a policy to control the system in order to maximise the probability of satisfying a set of temporal logic specifications. Here we present a novel approach based on statistical model checking and an unbiased estimation of a functional gradient in the space of possible policies. The statistical approach has several advantages over conventional approaches based on uniformisation, as it can also be applied when the model is replaced by a black box, and does not suffer from state-space explosion. The use of a stochastic gradient to guide our search considerably improves the efficiency of learning policies. We demonstrate the method on a proof-of-principle non-linear population model, showing strong performance in a non-trivial task.