
    The 2014 International Planning Competition: Progress and Trends

    We review the 2014 International Planning Competition (IPC-2014), the eighth in a series of competitions that began in 1998. IPC-2014 was held in three separate parts to assess the state of the art in three prominent areas of planning research: the deterministic (classical) part (IPCD), the learning part (IPCL), and the probabilistic part (IPPC). Each part evaluated planning systems in ways that pushed the edge of existing planner performance by introducing new challenges, novel tasks, or both. The competition again surpassed its predecessor in the number of competitors, highlighting the competition's central role in shaping the landscape of ongoing developments in evaluating planning systems.

    LEARNING TO ACT WITH ROBUSTNESS

    Reinforcement Learning (RL) is learning to act in different situations to maximize a numerical reward signal. The most common approach to formalizing RL uses the framework of optimal control in an inadequately known Markov Decision Process (MDP). Traditional approaches to solving RL problems rest on two common assumptions: i) exploration is allowed for the purpose of learning the MDP model, and ii) optimizing for the expected objective is sufficient. These assumptions hold comfortably for many simulated domains such as games (e.g. Atari, Go), but not for many real-world problems. Consider, for example, precision medicine for personalized treatment. Adopting a medical treatment for the sole purpose of learning its impact is prohibitive. It is also not permissible to embrace a specific treatment procedure by considering only the expected outcome, ignoring potential worst-case undesirable effects. Applying RL to real-world problems therefore brings additional challenges. In this thesis, we assume that exploration is impossible because of the sensitivity of actions in the domain. We therefore adopt a Batch RL framework, which operates on a fixed, logged dataset without interacting with the environment. We also accept the need to find solutions that work well in both average-case and worst-case situations; we label such solutions robust. We consider the robust MDP (RMDP) framework for handling these challenges. RMDPs provide the foundations for quantifying uncertainty about the model using so-called ambiguity sets. Ambiguity sets represent the set of plausible transition probabilities, usually constructed as a multi-dimensional confidence region. Ambiguity sets determine the trade-off between robustness and average-case performance of an RMDP. This thesis presents a novel approach to optimizing the shape of ambiguity sets constructed with the weighted L1-norm.
We derive new high-confidence sampling bounds for weighted L1 ambiguity sets and describe how to compute near-optimal weights from coarse estimates of value functions. Experimental results on a diverse set of benchmarks show that optimized ambiguity sets provide significantly tighter robustness guarantees. In addition to reshaping the ambiguity sets, it is also desirable to optimize their size and position for further performance gains. To this end, this thesis presents a method for constructing ambiguity sets that achieves less conservative solutions with the same worst-case guarantees by 1) leveraging a Bayesian prior, and 2) relaxing the requirement that the set be a confidence interval. Our theoretical analysis establishes the safety of the proposed method, and the empirical results demonstrate its practical promise. Beyond optimizing ambiguity sets for RMDPs, this thesis also proposes a new paradigm for incorporating robustness into the constrained-MDP framework. We apply robustness to both the rewards and the constrained costs, because robustness is at least as important for the constrained costs. We derive the required gradient-update rules and propose a class of policy-gradient algorithms, whose performance we evaluate on several problem domains. Parallel to robust MDPs, a slightly different perspective on handling model uncertainty is to compute soft-robust solutions using a risk measure (e.g. Value-at-Risk or Conditional Value-at-Risk). In high-stakes domains, it is important to quantify and manage the risk that arises from inherently stochastic transitions between states of the model. Most prior work on robust RL and risk-averse RL addresses inherent transition uncertainty and model uncertainty independently. This thesis proposes a unified Risk-Averse Soft-Robust (RASR) framework that quantifies both model and transition uncertainties together.
We show that the RASR objective can be solved efficiently when formulated using the entropic risk measure. We also report theoretical analysis and empirical evidence on several problem domains. The methods presented in this thesis can potentially be applied in many practical applications of artificial intelligence, such as agriculture, healthcare, and robotics. They broaden our understanding of computing robust solutions for safety-critical domains. Robust and more realistic solutions to sensitive practical problems can inspire widespread adoption of AI for challenging real-world problems, potentially leading toward the pinnacle of the age of automation.
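    The robust value computation the abstract builds on can be illustrated for the simplest, unweighted L1 ambiguity set (the thesis generalizes this to weighted L1 norms). A minimal sketch of the worst-case inner problem, using the standard greedy solution rather than the thesis's own algorithm:

```python
import numpy as np

def worst_case_l1(p_hat, v, psi):
    """Worst-case distribution argmin_p p@v subject to
    ||p - p_hat||_1 <= psi and p in the probability simplex.
    Greedy solution: move up to psi/2 probability mass onto the
    lowest-value state, taking it from the highest-value states first."""
    p = np.array(p_hat, dtype=float)
    v = np.asarray(v, dtype=float)
    k = int(np.argmin(v))             # state with the worst value
    eps = min(psi / 2.0, 1.0 - p[k])  # mass we can shift onto it
    p[k] += eps
    for i in np.argsort(v)[::-1]:     # remove mass from best states first
        if eps <= 0:
            break
        if i == k:
            continue
        take = min(eps, p[i])
        p[i] -= take
        eps -= take
    return p
```

For example, with p_hat = [0.25, 0.25, 0.25, 0.25], v = [1, 2, 3, 4], and psi = 0.2, the greedy step shifts 0.1 of mass from the best state onto the worst, lowering the expected value from 2.5 to 2.2; shrinking or enlarging psi traces out the robustness/average-case trade-off the abstract describes.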

    Agrobiodiversity For Pest Management: An Integrated Bioeconomic Simulation and Machine Learning Approach

    A pressing challenge of modern agriculture is to develop means of decreasing the negative impacts of pesticides while maintaining low pest pressure and high crop yield. Certain crop varieties, especially wild relatives of domesticated crops, provide pest-regulating ecosystem services through chemical defense mechanisms. Benefits from these ecosystem services can be realized by intercropping cash crops with repellent wild varieties to reduce pest pressure. An opportunity cost exists, however, in the form of lower yield and market value. Such is the case with heirloom apple varieties that are more resistant to the codling moth but have a lower market value than commercial apples such as Red Delicious and Gala. In this thesis, I first develop a model to identify the bioeconomically optimal intercropping level of commercial and wild varieties for pest management in the specific case of the codling moth. Second, I develop a model that uses a machine learning technique to determine pesticide application policies for the multi-variety orchard, where the solution is robust to model and data uncertainty. Model 1 is a tree-level, spatially explicit, bioeconomic simulation model. In the baseline case, we find that the bioeconomically optimal variety mix consists of 20% cider variety and 80% commercial variety. We analyze the sensitivity of the optimal mix to the market price difference between the two apple varieties and find that the optimal proportion of cider decreases linearly, with 100% commercial variety becoming optimal if the price difference exceeds $0.3/lb. We consider eight spatial configurations for the intercropping, in addition to the baseline random spatial intercropping, and find that the diagonal configuration yields the highest net present value and requires the lowest amount of cider intercropping (4%). Random spatial intercropping, in contrast, ranks seventh and has the second-highest optimal proportion of cider (30%).
We use the certainty-equivalent measure to determine how the optimal mix changes for a grower with a moderate level of risk aversion, where production risk is driven by the effect of temperature on codling moth infestation over the years. The optimal cider variety percentage for a moderately risk-averse grower increases to 38%, compared with 20% in the baseline case of a risk-neutral grower. We also document the risk-reducing effect of apple agrobiodiversity by characterizing how the risk premium decreases with increasing proportions of cider. In Model 2, we determine the robust optimal pesticide application threshold, given an infested multi-variety orchard consisting of the optimal proportion of cider varieties arranged in a random spatial configuration. We use historical degree-day (DD) data and associated established DD threshold-based spray recommendations to add pesticide application features to Model 1, and then use it as a simulator to generate data on infestation and damage levels over time. We then use Reinforcement Learning (RL) to find a robust optimal pesticide application threshold of around 1,000 insects over the entire orchard. The model solution shows greater sensitivity to pesticide application costs than to the pest growth rate, indicating the importance of addressing the data uncertainty of these parameters.
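    The threshold-policy idea in Model 2 can be sketched with a toy simulator. All parameter names and values below (growth rate, kill fraction, costs, starting population) are hypothetical placeholders, not the thesis's calibrated model; the sketch only shows how a spray threshold trades pesticide cost against crop damage:

```python
import numpy as np

def simulate_orchard(threshold, growth_rate=1.6, spray_kill=0.8,
                     spray_cost=50.0, damage_per_insect=0.05,
                     horizon=20, seed=0):
    """Toy orchard simulator (hypothetical parameters): the pest
    population grows multiplicatively with random year-to-year
    variation; spraying whenever the count exceeds `threshold`
    kills a fraction of pests and incurs a fixed cost."""
    rng = np.random.default_rng(seed)
    pests, total_cost = 100.0, 0.0
    for _ in range(horizon):
        pests *= growth_rate * rng.uniform(0.8, 1.2)  # stochastic growth
        if pests > threshold:
            pests *= (1.0 - spray_kill)               # spray the orchard
            total_cost += spray_cost
        total_cost += damage_per_insect * pests       # crop-damage cost
    return total_cost

# Evaluating candidate thresholds on the simulator, the simplest
# stand-in for the RL policy search described in the abstract:
best = min(range(200, 2001, 200), key=simulate_orchard)
```

An RL agent as in the thesis would instead learn the threshold from logged simulator trajectories, and a robust variant would optimize it against worst-case values of the uncertain parameters (e.g. growth rate and spray cost).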

    Near-optimal PAC bounds for discounted MDPs

    We study upper and lower bounds on the sample complexity of learning near-optimal behaviour in finite-state discounted Markov Decision Processes (MDPs). We prove a new bound for a modified version of Upper Confidence Reinforcement Learning (UCRL) with only cubic dependence on the horizon. The bound is unimprovable in all parameters except the size of the state/action space, where it depends linearly on the number of non-zero transition probabilities. The lower bound strengthens previous work by being both more general (it applies to all policies) and tighter. The upper and lower bounds match up to logarithmic factors provided the transition matrix is not too dense.
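    The cubic horizon dependence can be made concrete with an order-of-magnitude calculation. The constants and the exact logarithmic factor below are illustrative assumptions, not the paper's bound; only the scaling O(SA H^3 / eps^2) with effective horizon H = 1/(1 - gamma) reflects the abstract:

```python
import math

def sample_complexity_estimate(n_states, n_actions, eps, delta, gamma):
    """Illustrative sample-complexity scaling for a discounted MDP:
    O(S * A * H^3 / eps^2 * log(S * A / delta)), where H = 1/(1-gamma)
    is the effective horizon. Constants are assumptions, not the paper's."""
    horizon = 1.0 / (1.0 - gamma)
    return (n_states * n_actions * horizon**3 / eps**2
            * math.log(n_states * n_actions / delta))
```

Because the horizon enters cubically, moving the discount factor from 0.9 to 0.99 inflates the estimate by roughly a factor of a thousand, which is why reducing the horizon exponent matters so much in this line of work.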