64 research outputs found

    On the Locality of Action Domination in Sequential Decision Making

    In the field of sequential decision making and reinforcement learning, it has been observed that good policies for most problems exhibit a significant amount of structure. In practice, this implies that when a learning agent discovers that an action is better than any other in a given state, this action also dominates in a certain neighbourhood around that state. This paper presents new results proving that this notion of locality in action domination can be linked to the smoothness of the environment's underlying stochastic model. Namely, we link the Lipschitz continuity of a Markov Decision Process to the Lipschitz continuity of its policies' value functions and introduce the key concept of influence radius to describe the neighbourhood of states where the dominating action is guaranteed to be constant. These ideas are directly exploited in the proposed Localized Policy Iteration (LPI) algorithm, which is an active learning version of Rollout-based Policy Iteration. Preliminary results on the Inverted Pendulum domain demonstrate the viability and the potential of the proposed approach.
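As a rough illustration of why Lipschitz continuity yields a neighbourhood of constant dominating action, the following sketch derives one possible radius. The symbols $d$, $L_Q$, $\Delta(s)$ and $a^*$ are assumptions introduced here for illustration; the paper's own definition of the influence radius may differ.

```latex
% Assumptions (not taken from the abstract): d is a metric on the state space,
% and for every action a the map s -> Q^\pi(s,a) is L_Q-Lipschitz with respect to d.
% Let a^* dominate in state s with margin
\[
  \Delta(s) \;=\; Q^\pi(s,a^*) - \max_{a \neq a^*} Q^\pi(s,a) \;>\; 0 .
\]
% Then for any state s' and any action a \neq a^*:
\begin{align*}
  Q^\pi(s',a^*) - Q^\pi(s',a)
    &\ge \bigl(Q^\pi(s,a^*) - L_Q\, d(s,s')\bigr) - \bigl(Q^\pi(s,a) + L_Q\, d(s,s')\bigr) \\
    &\ge \Delta(s) - 2 L_Q\, d(s,s') .
\end{align*}
% Hence a^* still dominates in every s' with
\[
  d(s,s') \;<\; \frac{\Delta(s)}{2 L_Q} .
\]
```

Under these assumptions, a larger domination margin or a smoother model (smaller $L_Q$) gives a larger neighbourhood in which the dominating action does not change.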

    Detecting Olives with Synthetic or Real Data? Olive the Above

    Modern robotics has enabled advances in yield estimation for precision agriculture. However, when applied to the olive industry, the high variation of olive colors and their similarity to the background leaf canopy present a challenge. Labeling several thousands of very dense olive grove images for segmentation is a labor-intensive task. This paper presents a novel approach to detecting olives without the need to manually label data. In this work, we present the world's first olive detection dataset comprised of synthetic and real olive tree images. This is accomplished by generating an auto-labeled photorealistic 3D model of an olive tree. Its geometry is then simplified for lightweight rendering purposes. In addition, experiments are conducted with a mix of synthetically generated and real images, yielding an improvement of up to 66% compared to when only using a small sample of real data. When access to real, human-labeled data is limited, a combination of mostly synthetic data and a small amount of real data can enhance olive detection.

    Rollout Sampling Approximate Policy Iteration

    Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervised learning problem. This paper proposes variants of an improved policy iteration scheme which addresses the core sampling problem in evaluating a policy through simulation as a multi-armed bandit machine. The resulting algorithm offers performance comparable to that of the previous algorithm, achieved, however, with significantly less computational effort. An order of magnitude improvement is demonstrated experimentally in two standard reinforcement learning domains: inverted pendulum and mountain-car.
    Comment: 18 pages, 2 figures, to appear in Machine Learning 72(3). Presented at EWRL08, to be presented at ECML 200

    Approximate Policy Iteration using Large-Margin Classifiers

    We present an approximate policy iteration algorithm that uses rollouts to estimate the value of each action under a given policy in a subset of states and a classifier to generalize and learn the improved policy over the entire state space. Using a multiclass support vector machine as the classifier, we obtained successful results on the inverted pendulum and the bicycle balancing and riding domains.
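The rollout-plus-classifier idea described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy chain environment (`step`), the function names, and all parameter defaults are hypothetical, and a 1-nearest-neighbour lookup stands in for the multiclass support vector machine mentioned in the abstract.

```python
import random
from typing import Callable, Dict, List, Tuple


def step(state: int, action: int, rng: random.Random, slip: float) -> Tuple[int, float]:
    """Hypothetical toy chain with states 0..4: action 1 moves right, action 0 moves left.
    With probability `slip` the chosen action is flipped. Reward 1.0 when landing on state 4."""
    if rng.random() < slip:
        action = 1 - action
    nxt = max(0, min(4, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == 4 else 0.0)


def rollout_value(state: int, action: int, policy: Callable[[int], int],
                  rng: random.Random, slip: float = 0.1, horizon: int = 10,
                  gamma: float = 0.9, n_rollouts: int = 20) -> float:
    """Monte Carlo estimate of the discounted return from taking `action` in `state`
    and following `policy` afterwards."""
    total = 0.0
    for _ in range(n_rollouts):
        s, a, ret, discount = state, action, 0.0, 1.0
        for _ in range(horizon):
            s, r = step(s, a, rng, slip)
            ret += discount * r
            discount *= gamma
            a = policy(s)  # after the first step, follow the current policy
        total += ret
    return total / n_rollouts


def improved_action_labels(states: List[int], actions: List[int],
                           policy: Callable[[int], int], rng: random.Random,
                           slip: float = 0.1) -> Dict[int, int]:
    """For each sampled state, label it with the action whose rollout estimate is largest
    (ties go to the first action listed)."""
    labels = {}
    for s in states:
        q = {a: rollout_value(s, a, policy, rng, slip) for a in actions}
        labels[s] = max(q, key=q.get)
    return labels


def nearest_label(labels: Dict[int, int], query: int) -> int:
    """Stand-in classifier: 1-nearest-neighbour over the labelled states (not an SVM)."""
    nearest_state = min(labels, key=lambda s: abs(s - query))
    return labels[nearest_state]
```

One policy-improvement step would label a subset of states with `improved_action_labels` and then use the classifier (here `nearest_label`) as the new policy over all states; repeating this until the labels stop changing gives a simple policy iteration loop.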

    Spatial Knowledge in Humans, Animals and Robots

    Humans, animals and robots are physically existing agents situated in the real world. Their common ability to extract, store and use spatial information is crucial for their successful operation. On the other hand, their idiosyncrasies seem to be reflected in their spatial knowledge. The paper attempts a survey around the term cognitive map, coined to describe exactly the body of spatial knowledge held by an agent. The topic is discussed at both a global and an individual level.
    Keywords: Spatial Knowledge, Spatial Cognition, Cognitive Maps.
    1 Introduction
    Humans, animals and robots are physically existent agents situated in the real world space. The ability to perceive properties of that space and the ability of locomotion in the very same space is something common to all of them. It seems that they are crucial to their successful operation for a number of reasons, including survivability. However, their relatively small size, and the need to be at different places at different time..