365 research outputs found

    Using Monte Carlo Search With Data Aggregation to Improve Robot Soccer Policies

    RoboCup soccer competitions are considered among the most challenging multi-robot adversarial environments, due to their high dynamism and the partial observability of the environment. In this paper we introduce a method based on a combination of Monte Carlo search and data aggregation (MCSDA) to adapt discrete-action soccer policies for a defender robot to the strategy of the opponent team. By exploiting a simple representation of the domain, a supervised learning algorithm is trained over an initial collection of data consisting of several simulations of human expert policies. Monte Carlo policy rollouts are then generated and aggregated with the previous data to improve the learned policy over multiple epochs and games. The proposed approach has been extensively tested both on a soccer-dedicated simulator and on real robots. Using this method, our learning robot soccer team achieves an improvement in ball interceptions, as well as a reduction in the number of opponents' goals. Along with better performance, an overall more efficient positioning of the whole team within the field is achieved.
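The loop described in this abstract (train on expert data, generate Monte Carlo rollouts, aggregate, retrain) can be illustrated with a minimal sketch. This is not the authors' MCSDA implementation: the 1-D field, the reward, the majority-vote "learner", and every name below (`step`, `fit`, `rollout_value`, `improve`, `slip`) are hypothetical choices made only to keep the example self-contained.

```python
import random
from collections import Counter, defaultdict

ACTIONS = (-1, 0, 1)     # hypothetical discrete actions: left, stay, right
N_STATES, TARGET = 5, 2  # toy 1-D field; the "ball" sits at cell 2


def step(pos, action, rng, slip):
    """Move on the line; with probability `slip` the action is replaced at random."""
    if rng.random() < slip:
        action = rng.choice(ACTIONS)
    pos = min(max(pos + action, 0), N_STATES - 1)
    return pos, -abs(pos - TARGET)  # reward: negative distance to the target


def fit(dataset):
    """Stand-in for the supervised learner: majority vote over labels per state."""
    votes = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}


def rollout_value(pos, first_action, policy, rng, slip, horizon):
    """Return of taking `first_action`, then following `policy` for `horizon` steps."""
    pos, total = step(pos, first_action, rng, slip)
    for _ in range(horizon):
        pos, reward = step(pos, policy.get(pos, 0), rng, slip)
        total += reward
    return total


def improve(dataset, epochs=3, n_rollouts=20, horizon=5, slip=0.1, seed=0):
    """Relabel states with the best rollout action and aggregate with old data."""
    rng = random.Random(seed)
    dataset = list(dataset)
    for _ in range(epochs):
        policy = fit(dataset)
        new_labels = []
        for state in range(N_STATES):
            # Monte Carlo estimate of each action's return under the current policy.
            values = {
                a: sum(rollout_value(state, a, policy, rng, slip, horizon)
                       for _ in range(n_rollouts)) / n_rollouts
                for a in ACTIONS
            }
            new_labels.append((state, max(ACTIONS, key=values.get)))
        dataset.extend(new_labels)  # aggregation: earlier data is kept, not replaced
    return fit(dataset)


if __name__ == "__main__":
    expert = [(s, 0) for s in range(N_STATES)]  # a deliberately weak "expert": always stay
    print(improve(expert))
```

Because old labels stay in the dataset, the learned policy only changes once the rollout labels outnumber the expert's, which is the gradual, multi-epoch improvement the abstract refers to.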

    Spatial representation for planning and executing robot behaviors in complex environments

    Robots are already improving our well-being and productivity in applications such as industry, health care and indoor service. However, we are still far from developing (and releasing) a fully functional robotic agent that can autonomously survive in tasks that require human-level cognitive capabilities. Robotic systems on the market, in fact, are designed to address specific applications, and can only run pre-defined behaviors to robustly repeat a few tasks (e.g., assembling object parts, vacuum cleaning). Their internal representation of the world is usually constrained to the task they are performing, and does not allow for generalization to other scenarios. Unfortunately, such a paradigm only applies to a very limited set of domains, where the environment can be assumed to be static, and its dynamics can be handled before deployment. Additionally, robots configured in this way will eventually fail if their "handcrafted" representation of the environment does not match the external world. Hence, to enable more sophisticated cognitive skills, we investigate how to design robots to properly represent the environment and behave accordingly. To this end, we formalize a representation of the environment that enhances the robot's spatial knowledge to explicitly include a representation of its own actions. Spatial knowledge constitutes the core of the robot's understanding of the environment; however, it is not sufficient to represent what the robot is capable of doing in it. To overcome this limitation, we formalize SK4R, a spatial knowledge representation for robots which enhances spatial knowledge with a novel and "functional" point of view that explicitly models robot actions. To this end, we exploit the concept of affordances, introduced to express opportunities (actions) that objects offer to an agent.
To encode affordances within SK4R, we define the "affordance semantics" of actions, which is used to annotate an environment and to represent to what extent robot actions support goal-oriented behaviors. We demonstrate the benefits of a functional representation of the environment in multiple robotic scenarios that traverse and contribute to different research topics: robot knowledge representations, social robotics, multi-robot systems, and robot learning and planning. We show how a domain-specific representation that explicitly encodes affordance semantics provides the robot with a more concrete understanding of the environment and of the effects that its actions have on it. The goal of our work is to design an agent that no longer executes an action out of mere pre-defined routine; rather, it executes an action because it "knows" that the resulting state leads one step closer to success in its task.

    Deep learning based approaches for imitation learning.

    Imitation learning refers to an agent's ability to mimic a desired behaviour by learning from observations. The field is rapidly gaining attention due to recent advances in computational and communication capabilities as well as rising demand for intelligent applications. The goal of imitation learning is to describe the desired behaviour by providing demonstrations rather than instructions. This enables agents to learn complex behaviours with general learning methods that require minimal task-specific information. However, imitation learning faces many challenges. The objective of this thesis is to advance the state of the art in imitation learning by adopting deep learning methods to address two major challenges of learning from demonstrations. The first is representing the demonstrations in a manner that is adequate for learning. We propose novel Convolutional Neural Network (CNN) based methods to automatically extract feature representations from raw visual demonstrations and learn to replicate the demonstrated behaviour. This alleviates the need for task-specific feature extraction and provides a general learning process that is adequate for multiple problems. The second challenge is generalizing a policy to situations unseen in the training demonstrations. This is a common problem because demonstrations typically show the best way to perform a task and don't offer any information about recovering from suboptimal actions. Several methods are investigated to improve the agent's generalization ability based on its initial performance. Our contributions in this area are threefold. Firstly, we propose an active data aggregation method that queries the demonstrator in situations of low confidence. Secondly, we investigate combining learning from demonstrations and reinforcement learning. A deep reward shaping method is proposed that learns a potential reward function from demonstrations.
Finally, memory architectures in deep neural networks are investigated to provide context to the agent when taking actions. Using recurrent neural networks addresses the dependency between the state-action sequences taken by the agent. The experiments are conducted in simulated environments on 2D and 3D navigation tasks that are learned from raw visual data, as well as a 2D soccer simulator. The proposed methods are compared to state-of-the-art deep reinforcement learning methods. The results show that deep learning architectures can learn suitable representations from raw visual data and effectively map them to atomic actions. The proposed methods for addressing generalization show improvements over using supervised learning and reinforcement learning alone. The results are thoroughly analysed to identify the benefits of each approach and situations in which it is most suitable.
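The abstract does not spell out how the learned potential enters the reward. A common form, assumed here rather than taken from the thesis, is potential-based shaping, where the term gamma * Phi(s') - Phi(s) is added to the environment reward. In the sketch below the potential is a hand-written stand-in (negative distance to the nearest demonstrated state) instead of a deep network, and all names are hypothetical.

```python
def potential(state, demo_states):
    """Hypothetical potential: higher (closer to 0) near states seen in demonstrations."""
    return -min(abs(state - d) for d in demo_states)


def shaped_reward(reward, phi_s, phi_next, gamma=0.99):
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return reward + gamma * phi_next - phi_s


if __name__ == "__main__":
    demos = [0, 5]  # made-up 1-D demonstrated states
    # Moving from state 3 to state 4 brings the agent closer to a demonstrated state,
    # so the shaping term is positive.
    print(shaped_reward(0.0, potential(3, demos), potential(4, demos), gamma=1.0))
```

Shaping of this form rewards progress toward demonstrated regions while leaving the environment reward itself untouched.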

    Interactive generation and learning of semantic-driven robot behaviors

    The generation of adaptive and reflexive behavior is a challenging task in artificial intelligence and robotics. In this thesis, we develop a framework for knowledge representation, acquisition, and behavior generation that explicitly incorporates semantics, adaptive reasoning and knowledge revision. By using our model, semantic information can be exploited by traditional planning and decision-making frameworks to generate empirically effective and adaptive robot behaviors, as well as to enable complex but natural human-robot interactions. In our work, we introduce a model of semantic mapping, we connect it with the notion of affordances, and we use those concepts to develop semantic-driven algorithms for knowledge acquisition, update, learning and robot behavior generation. In particular, we apply such models within existing planning and decision-making frameworks to achieve semantic-driven and adaptive robot behaviors in a generic environment. On the one hand, this work generalizes existing semantic mapping models and extends them to include the notion of affordances. On the other hand, this work integrates semantic information within well-defined long-term planning and situated action frameworks to effectively generate adaptive robot behaviors. We validate our approach by evaluating it on a number of problems and robot tasks. In particular, we consider service robots deployed in interactive and social domains, such as offices and domestic environments. To this end, we also develop prototype applications that are useful for evaluation purposes.

    Complementary Layered Learning

    Layered learning is a machine learning paradigm used to develop autonomous robotic-based agents by decomposing a complex task into simpler subtasks and learning each sequentially. Although the paradigm continues to have success in multiple domains, performance can be unexpectedly unsatisfactory. Using Boolean-logic problems and autonomous agent navigation, we show that poor performance is due to the learner forgetting how to perform earlier learned subtasks too quickly (favoring plasticity) or having difficulty learning new things (favoring stability). We demonstrate that this imbalance can hinder learning so that task performance is no better than that of a suboptimal learning technique, monolithic learning, which does not use decomposition. Through the resulting analyses, we have identified factors that can lead to imbalance and their negative effects, providing a deeper understanding of stability and plasticity in decomposition-based approaches, such as layered learning. To combat the negative effects of the imbalance, a complementary learning system is applied to layered learning. The new technique augments the original learning approach with dual storage region policies to preserve useful information from being removed from an agent’s policy prematurely. Through multi-agent experiments, a 28% task performance increase is obtained with the proposed augmentations over the original technique.

    Machine Learning for Ad Publishers in Real Time Bidding

    Advancing the Applicability of Reinforcement Learning to Autonomous Control

    With data-efficient reinforcement learning (RL) methods, impressive results have been achieved, e.g., in the context of gas turbine control. However, in practice the application of RL still requires much human intervention, which hinders the application of RL to autonomous control. This thesis addresses some of the remaining problems, particularly regarding the reliability of the policy generation process. The thesis first discusses RL problems with discrete state and action spaces. In that context, an MDP is often estimated from observations. It is described how to incorporate the estimators' uncertainties into the policy generation process. This information can then be used to reduce the risk of obtaining a poor policy due to flawed MDP estimates. Moreover, it is discussed how to use the knowledge of uncertainty for efficient exploration and the assessment of policy quality without requiring the policy's execution. The thesis then moves on to continuous state problems and focuses on methods based on fitted Q-iteration (FQI), particularly neural fitted Q-iteration (NFQ). Although NFQ has proven to be very data-efficient, it is not as reliable as required for autonomous control. The thesis proposes to use ensembles to increase reliability. Several ways of ensemble usage in an NFQ context are discussed and evaluated on a number of benchmark domains. The results show that in all considered domains good policies can be produced more reliably with ensembles. Next, policy assessment in continuous domains is discussed. The thesis proposes to use fitted policy evaluation (FPE), an adaptation of FQI to policy evaluation, combined with a different function approximator and/or a different dataset to obtain a measure for policy quality. Results of experiments show that extra-tree FPE, applied to policies generated by NFQ, produces value functions that can well be used to reason about the true policy quality. Finally, the thesis combines ensembles and policy assessment to derive methods that can deal with changing environments. The major contribution is the evolving ensemble. The policy of the evolving ensemble changes slowly as new policies are added and old policies removed. It turns out that the evolving ensemble approaches work considerably better than simpler approaches like single policies learned with recent observations or simple ensembles.
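The abstract mentions several ways of using ensembles without listing them. Two simple candidates, assumed here for illustration and not confirmed by the abstract, are averaging the members' Q-values and majority voting over the members' greedy actions. The sketch uses plain dictionaries keyed by `(state, action)` as stand-ins for the neural Q-functions; all names and values are hypothetical.

```python
from collections import Counter


def average_q_action(q_tables, state, actions):
    """Pick the action with the highest mean Q-value across ensemble members."""
    def mean_q(action):
        return sum(q[(state, action)] for q in q_tables) / len(q_tables)
    return max(actions, key=mean_q)


def majority_vote_action(q_tables, state, actions):
    """Let each member vote for its own greedy action; return the most common vote."""
    votes = Counter(max(actions, key=lambda a: q[(state, a)]) for q in q_tables)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    actions = ("L", "R")
    # Three made-up Q-tables for a single state 0; the third member is an outlier.
    ensemble = [
        {(0, "L"): 0.6, (0, "R"): 0.5},
        {(0, "L"): 0.6, (0, "R"): 0.5},
        {(0, "L"): 0.0, (0, "R"): 5.0},
    ]
    print(average_q_action(ensemble, 0, actions))
    print(majority_vote_action(ensemble, 0, actions))
```

The two rules can disagree: averaging is sensitive to a single member with extreme value estimates, whereas voting ignores magnitudes, which is one reason to compare several combination schemes.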