131 research outputs found

    Probabilistic policy reuse for safe reinforcement learning

    This work introduces Policy Reuse for Safe Reinforcement Learning, an algorithm that combines Probabilistic Policy Reuse and teacher advice for safe exploration in dangerous reinforcement learning problems with continuous state and action spaces, in which the dynamic behavior is reasonably smooth and the space is Euclidean. The algorithm uses a continuous, monotonically increasing risk function that identifies the probability of ending up in failure from a given state. This risk function is defined in terms of how far the state is from the state space known by the learning agent. Probabilistic Policy Reuse is used to safely balance the exploitation of the knowledge actually learned, the exploration of new actions, and requests for teacher advice in parts of the state space considered dangerous. Specifically, the pi-reuse exploration strategy is used. Using experiments in the helicopter hover task and a business management problem, we show that the pi-reuse exploration strategy can be used to completely avoid visits to undesirable situations while maintaining the performance (in terms of the classical long-term accumulated reward) of the final policy achieved. This paper has been partially supported by the Spanish Ministerio de Economía y Competitividad TIN2015-65686-C5-1-R and the European Union's Horizon 2020 Research and Innovation programme under Grant Agreement No. 730086 (ERGO). Javier García is partially supported by the Comunidad de Madrid (Spain) funds under the project 2016-T2/TIC-1712.
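
    One way to read the abstract above is as a three-way choice at every step: ask the teacher, explore, or exploit. The sketch below is only an illustration under assumptions and is not the paper's algorithm. The exponential form of `risk`, the `scale` and `epsilon` parameters, and the function names are all hypothetical; the abstract states only that the risk function increases monotonically with the distance from the known state space.

```python
import math
import random


def risk(state, known_states, scale=1.0):
    """Hypothetical risk in [0, 1): grows monotonically with the Euclidean
    distance from `state` to the nearest state the agent already knows."""
    nearest = min(math.dist(state, known) for known in known_states)
    return 1.0 - math.exp(-nearest / scale)


def choose_source(state, known_states, epsilon=0.1, rng=random):
    """Pick who selects the next action: 'teacher', 'explore' or 'exploit'.

    Assumption: the teacher is consulted with probability equal to the risk
    of the current state; otherwise the agent acts epsilon-greedily on its
    own learned policy.
    """
    if rng.random() < risk(state, known_states):
        return "teacher"
    if rng.random() < epsilon:
        return "explore"
    return "exploit"
```

    With this choice of `risk`, a state that coincides with a known state has risk 0, so the teacher is never consulted there, while a state far from everything known has risk close to 1 and is almost always handed to the teacher.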

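    Reinforcement learning for safe decision-making in domains with continuous state and action spaces (Aprendizaje por refuerzo para la toma de decisiones seguras en dominios con espacios de estados y acciones continuos)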

    Decision problems are one of the most fertile fields for applying Artificial Intelligence (AI) techniques. Among these techniques, Reinforcement Learning has emerged as a useful framework for learning decision-making behavior policies from experience generated in complex, dynamic environments. In Reinforcement Learning, the agent interacts with the environment, and a reinforcement function indicates whether it is performing the task it is learning well or badly. Much of Reinforcement Learning rests on value functions, which provide information about the utility of being in a state during a decision-making process, or about the utility of taking an action in a state. When the state and action spaces are very large or even continuous, the traditional tabular representation of the value function is impractical because of the high cost of storing and computing it. In these cases, generalization techniques are needed to obtain more compact representations of both the state space and the action space, so that Reinforcement Learning techniques can be applied efficiently. Besides continuous state and action spaces, another important problem that Reinforcement Learning must face is minimizing the amount of damage (from collisions or falls) that the agent or the system may suffer during learning (e.g., in a task where the agent is learning to fly a helicopter, the helicopter may crash; when a robot is being taught to walk, it may fall). This Thesis sets out two main objectives. The first is how to tackle problems where the state and action spaces are continuous (and therefore infinite) and high-dimensional. One option focuses on generalization techniques based on discretization. This Thesis develops algorithms that successfully combine function approximation and discretization techniques, seeking to exploit the advantages of both. The second objective is to minimize the amount of damage the agent or the system suffers during learning in fully continuous, high-dimensional problems. This Thesis gives a new definition of the concept of risk, which makes it possible to identify states where the agent is more prone to suffering some kind of damage. Achieving these objectives also involves investigating the use of baseline behaviors or suboptimal experts that provide knowledge about the task being learned, which is necessary when tackling complex, high-dimensional problems in which, moreover, the agent can suffer damage.

    A taxonomy for similarity metrics between Markov decision processes

    Although the notion of task similarity is potentially interesting in a wide range of areas such as curriculum learning or automated planning, it has mostly been tied to transfer learning. Transfer is based on the idea of reusing the knowledge acquired in the learning of a set of source tasks in a new learning process in a target task, assuming that the target and source tasks are close enough. In recent years, transfer learning has succeeded in making reinforcement learning (RL) algorithms more efficient (e.g., by reducing the number of samples needed to achieve (near-)optimal performance). Transfer in RL is based on the core concept of similarity: whenever the tasks are similar, the transferred knowledge can be reused to solve the target task and significantly improve the learning performance. Therefore, the selection of good metrics to measure these similarities is a critical aspect when building transfer RL algorithms, especially when this knowledge is transferred from simulation to the real world. In the literature, there are many metrics to measure the similarity between MDPs; hence, many definitions of similarity, or of its complement, distance, have been considered. In this paper, we propose a categorization of these metrics and analyze the definitions of similarity proposed so far, taking into account such categorization. We also follow this taxonomy to survey the existing literature and to suggest future directions for the construction of new metrics. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work has also been supported by the Madrid Government (Comunidad de Madrid-Spain) under the Multiannual Agreement with UC3M in the line of Excellence of University Professors (EPUC3M17), and in the context of the V PRICIT (Regional Programme of Research and Technological Innovation).

    On-line case-based policy learning for automated planning in probabilistic environments

    Many robotic control architectures perform a continuous cycle of sensing, reasoning and acting, where that reasoning can be carried out in a reactive or deliberative form. Reactive methods are fast and provide the robot with high interaction and response capabilities. Deliberative reasoning is particularly suitable in robotic systems because it employs some form of forward projection (reasoning in depth about goals, pre-conditions, resources and timing constraints) and provides the robot with reasonable responses in situations unforeseen by the designer. However, this reasoning, typically conducted using Artificial Intelligence techniques like Automated Planning (AP), is not effective for controlling autonomous agents which operate in complex and dynamic environments. Deliberative planning, although feasible in stable situations, takes too long in unexpected or changing situations which require re-planning. Therefore, planning cannot be done on-line in many complex robotic problems, where quick responses are frequently required. In this paper, we propose an alternative approach based on case-based policy learning which integrates deliberative reasoning through AP and reactive response time through reactive planning policies. The method is based on learning planning knowledge from actual experiences to obtain a case-based policy. The contribution of this paper is twofold. First, it is shown that the learned case-based policy produces reasonable and timely responses in complex environments. Second, it is also shown how one case-based policy that solves a particular problem can be reused to solve a similar but more complex problem in a transfer learning scope. This paper has been partially supported by the Spanish Ministerio de Economía y Competitividad TIN2015-65686-C5-1-R and the European Union's Horizon 2020 Research and Innovation programme under Grant Agreement No. 730086 (ERGO).

    Challenges on the application of automated planning for comprehensive geriatric assessment using an autonomous social robot

    November 22-23, 2018, Madrid, Spain. Comprehensive Geriatric Assessment is a medical procedure to evaluate the physical, social and psychological status of elderly patients. One of its phases consists of performing different tests on the patient or relatives. In this paper we present the challenges of applying Automated Planning to control an autonomous robot that helps the clinician perform such tests. On the one hand, the paper focuses on the modelling decisions taken, from an initial approach where each test was encoded using slightly different domains, to the final unified domain allowing any test to be represented. On the other hand, the paper deals with practical issues that arose when executing the plans. Preliminary tests performed with real users show that the proposed approach is able to seamlessly handle the patient-robot interaction in real time, recovering from unexpected events and adapting to the users' preferred input method, while being able to gather all the information needed by the clinician. This work has been partially funded by the European Union ECHORD++ project (FP7-ICT-601116) and the TIN2015-65686-C5 Spanish Ministerio de Economía y Competitividad project. Javier García is partially supported by the Comunidad de Madrid (Spain) funds under the project 2016-T2/TIC-1712.

    An Automated Planning Model for HRI: Use Cases on Social Assistive Robotics

    Using Automated Planning for the high-level control of robotic architectures is becoming very popular, thanks mainly to its capability to define the tasks to perform in a declarative way. However, classical planning tasks, even in their basic standard Planning Domain Definition Language (PDDL) format, are still very hard to formalize for non-expert engineers when the use case to model is complex. Human-Robot Interaction (HRI) is one of those complex environments. This manuscript describes the rationale followed to design a planning model able to control social autonomous robots interacting with humans. It is the result of the authors' experience in modeling use cases for Social Assistive Robotics (SAR) in two areas related to healthcare: Comprehensive Geriatric Assessment (CGA) and non-contact rehabilitation therapies for patients with physical impairments. In this work a general definition of these two use cases in a single planning domain is proposed, which favors the management and integration with the software robotic architecture, as well as the addition of new use cases. Results show that the model is able to capture all the relevant aspects of the Human-Robot Interaction in those scenarios, allowing the robot to autonomously perform the tasks by using a standard planning-execution architecture. This work has been partially funded by the European Union ECHORD++ project (FP7-ICT-601116), and grants TIN2017-88476-C2-2-R and RTI2018-099522-B-C43 of FEDER/Ministerio de Ciencia e Innovación-Ministerio de Universidades-Agencia Estatal de Investigación. Javier García is partially supported by the Comunidad de Madrid funds under the project 2016-T2/TIC-1712.

    Procedure for detecting cracks in wind turbine blade shafts (Procedimiento para la detección de grietas en ejes de pala de aerogeneradores)

    A procedure is presented for detecting a crack in the blade shafts of a wind turbine without having to dismantle the machine for inspection. The procedure is simple to apply and has fairly low inspection costs. The method is based on the variation of the stresses on the shaft surface, in an area close to where the crack initiates, as the crack grows. A model of the shaft is analysed numerically using finite elements, and two tests are then carried out on blade shafts instrumented with strain gauges, analysing how the stresses evolve with crack length and the influence of the boundary conditions.

    ENSO coupling to the equatorial Atlantic: analysis with an extended improved recharge oscillator model

    © 2023 Crespo-Miguel, Polo, Mechoso, Rodríguez-Fonseca and Cao-García. We acknowledge Javier Jarillo and Lander R. Crespo for their help during the early stages of manuscript writing. We acknowledge the World Climate Research Programme's Working Group on Coupled Modeling, responsible for CMIP, and we thank the climate modeling groups for producing and making available their model output. This work was financially supported by the 817578 TRIATLAS project of the Horizon 2020 Programme (EU) and RTI2018095802-B-I00 and PID2021-125806NB-I00 of Ministerio de Economía y Competitividad (Spain), Fondo Europeo de Desarrollo Regional (FEDER, EU), the European Union Seventh Framework Programme (EU-FP7/2007–2013) PREFACE (Grant Agreement No. 603521), the ERC STERCP project (grant 648982), the ARC Centre of Excellence in Climate Extremes (CE170100023) and the Spanish project (CGL201786415-R). Introduction: Observational and modeling studies have examined the interactions between El Niño-Southern Oscillation (ENSO) and the equatorial Atlantic variability as incorporated into the classical charge-recharge oscillator model of ENSO. These studies included the role of the Atlantic in the predictability of ENSO but assumed stationarity in the relationships, i.e., that the models' coefficients do not change over time. A recent work by the authors has challenged the stationarity assumption in the ENSO framework but without considering the equatorial Atlantic influence on ENSO. Methods: The present paper addresses the changing relationship between ENSO and the Atlantic El Niño using an extended version of the recharge oscillator model. The classical two-variable model of ENSO is extended by adding a linear coupling on the SST anomalies in the equatorial Atlantic. The model's coefficients are computed for different periods. This calculation is done using two methods to fit the model to the data: (1) the traditional method (ReOsc), and (2) a novel method (ReOsc+) based on fitting the Fisher's Z transform of the auto- and cross-correlation functions. Results: We show that, during the 20th century, the characteristic damping rates of the SST and thermocline depth anomalies in the Pacific have decreased in time by a factor of 2 and 3, respectively. Moreover, the damping time of the ENSO fluctuations has doubled from 10 to 20 months, and the oscillation period of ENSO has decreased from 60-70 months before the 1960s to 50 months afterward. These two changes have contributed to enhancing ENSO amplitude. The results also show that correlations between ENSO and the Atlantic SST strengthened after the 1970s, as well as the way in which the impact of the equatorial Atlantic is added to the internal ENSO variability. Conclusions: The remote effects of the equatorial Atlantic on ENSO must be considered in studies of ENSO dynamics and predictability during specific time periods. Our results provide further insight into the evolution of the ENSO dynamics and its coupling to the equatorial Atlantic, as well as an improved tool to study the coupling of climatic and ecological variables. Depto. de Estructura de la Materia, Física Térmica y Electrónica; Depto. de Física de la Tierra y Astrofísica; Fac. de Ciencias Físicas.
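
    The damping time and oscillation period quoted in the abstract above are properties that any damped two-variable linear model has. As a hedged illustration (not the authors' ReOsc or ReOsc+ fitting code), the sketch below derives both quantities from the eigenvalues of a generic system dx/dt = A x and gives the standard definition of Fisher's Z transform. The function names and coefficient values are hypothetical and chosen only for the example.

```python
import cmath
import math


def fisher_z(r):
    """Standard Fisher Z transform of a correlation coefficient r in (-1, 1)."""
    return math.atanh(r)


def damping_time_and_period(a11, a12, a21, a22):
    """Damping time and period of the oscillatory mode of dx/dt = A x,
    where A = [[a11, a12], [a21, a22]].

    The eigenvalues of A are -gamma +/- i*omega; the damping time is
    1/gamma and the period is 2*pi/omega, in the inverse of the units
    used for the coefficients.
    """
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    eigenvalue = (trace + cmath.sqrt(trace * trace - 4.0 * det)) / 2.0
    if eigenvalue.imag == 0.0 or eigenvalue.real >= 0.0:
        raise ValueError("the system has no damped oscillatory mode")
    return -1.0 / eigenvalue.real, 2.0 * math.pi / abs(eigenvalue.imag)


# Illustrative coefficients in 1/month (not fitted values from the paper):
omega = 2.0 * math.pi / 50.0
damping_time, period = damping_time_and_period(-0.05, omega, -omega, -0.05)
```

    With these made-up coefficients the mode has a damping time of about 20 months and a period of about 50 months, the same orders of magnitude the abstract reports for the later part of the record.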

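    Children's rights and global citizenship in the practice of communication sciences: a training proposal for university studies in Journalism, Audiovisual Communication and Advertising (Los derechos de la infancia y la ciudadanía global en la práctica de las ciencias de la comunicación: propuesta formativa para los estudios universitarios de Periodismo, Comunicación audiovisual y Publicidad)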

    We present this guide so that teachers and students of university degrees in Communication, Journalism, Advertising and Public Relations, and Audiovisual Communication can become familiar with children's rights and gain awareness of the power of their professional work in transforming social norms, attitudes and stereotypes to facilitate the fulfilment of those rights.

    Selection of levels for the properties of industrial products in the design phase using machine learning techniques

    In the constant race to achieve organizational efficiency, companies have taken on the task of looking for tools that improve the productivity of processes within a business value chain. Industrial Product Design Engineering has not been left behind: it has been evolving by incorporating technological tools that increase its performance, reduce operational waste and aim at the objective of satisfying the specific needs of customers. This work fuses traditional processes with cutting-edge technologies, namely the application of Machine Learning techniques to predict the levels of product properties in their initial phase. The data to be processed derive directly from Neuroscience and Kansei Engineering processes, which show how products behave in the neuro-sensory domain with respect to their consumers. The aim is also to create a tool that companies can apply, so that putting these applications into practice is as friendly and efficient as possible for any Product Development Department.