22 research outputs found

    EMBEDDED LEARNING ROBOT WITH FUZZY Q-LEARNING FOR OBSTACLE AVOIDANCE BEHAVIOR

    Get PDF
    Fuzzy Q-learning is an extension of the Q-learning algorithm that uses a fuzzy inference system to enable Q-learning to handle continuous actions and states. It has been applied in various robot-learning tasks such as obstacle avoidance and target searching, but most of this work has not been realized on embedded robots. This paper presents an implementation of fuzzy Q-learning for obstacle-avoidance navigation on an embedded mobile robot. The experimental results demonstrate that fuzzy Q-learning enables the robot to learn the right policy, i.e., to avoid obstacles.
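    The abstract gives no implementation details, so the following is a minimal sketch of the general fuzzy Q-learning scheme it refers to: each fuzzy rule holds q-values for a set of discrete candidate actions, the global continuous action is the firing-strength-weighted sum of the per-rule choices, and the temporal-difference error is shared among the active rules. All class names, parameters, and the epsilon-greedy choice are illustrative assumptions, not the authors' code.

        import random

        class FuzzyQLearner:
            """Per-rule q-values; continuous action by firing-strength blending."""

            def __init__(self, n_rules, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
                self.actions = actions                  # discrete candidate actions
                self.q = [[0.0] * len(actions) for _ in range(n_rules)]
                self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

            def act(self, firing):
                """firing: normalized rule firing strengths for the current state."""
                chosen = []
                for i in range(len(firing)):
                    if random.random() < self.epsilon:  # explore inside the rule
                        chosen.append(random.randrange(len(self.actions)))
                    else:                               # exploit the rule's best action
                        chosen.append(max(range(len(self.actions)),
                                          key=lambda a: self.q[i][a]))
                # global continuous action: weighted sum of the per-rule actions
                u = sum(w * self.actions[a] for w, a in zip(firing, chosen))
                Q = sum(w * self.q[i][a]
                        for i, (w, a) in enumerate(zip(firing, chosen)))
                return u, chosen, Q

            def update(self, firing, chosen, Q, reward, next_firing):
                # value of the next state: each rule contributes its best q-value
                v_next = sum(w * max(self.q[i]) for i, w in enumerate(next_firing))
                delta = reward + self.gamma * v_next - Q
                for i, (w, a) in enumerate(zip(firing, chosen)):
                    self.q[i][a] += self.alpha * delta * w  # rule's share of TD error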

    BEHAVIOR BASED CONTROL AND FUZZY Q-LEARNING FOR AUTONOMOUS FIVE LEGS ROBOT NAVIGATION

    Get PDF
    This paper presents the collaboration of behavior-based control and fuzzy Q-learning for a five-legged robot navigation system. Many fuzzy Q-learning algorithms have been proposed to yield individual behaviors such as obstacle avoidance and target finding. For complicated tasks, however, all behaviors must be combined within one control schema using behavior-based control. This paper therefore proposes a control schema that incorporates fuzzy Q-learning into a behavior-based schema to handle complicated navigation tasks for an autonomous five-legged robot. In the proposed schema, two behaviors are learned by fuzzy Q-learning; the other behaviors are constructed at the design step, and all behaviors are coordinated by a hierarchical hybrid coordination node. Simulation results demonstrate that a robot with the proposed schema is able to learn the right policy, avoiding obstacles and finding the target. However, fuzzy Q-learning failed to yield the right policy for avoiding collisions in corner locations. Keywords: behavior-based control, fuzzy Q-learning
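    As a rough illustration of how a hierarchical hybrid coordination node might arbitrate between learned and designed behaviors (the abstract does not specify its mechanics), the sketch below assumes competitive selection between priority layers and cooperative weighted blending within a layer; all names and values are hypothetical.

        def coordinate(layers):
            """layers: highest priority first; each is a list of (activation, action)."""
            for behaviors in layers:
                active = [(w, a) for w, a in behaviors if w > 0.0]
                if active:                       # competitive between layers:
                    total = sum(w for w, _ in active)
                    # cooperative (activation-weighted blend) within the layer
                    return sum(w * a for w, a in active) / total
            return 0.0                           # nothing active: stop the robot

        # Example: the learned avoid-obstacle layer overrides the designed
        # find-target layer whenever it is active.
        command = coordinate([
            [(0.8, -0.5)],                       # learned by fuzzy Q-learning
            [(0.6, 0.3), (0.2, 0.1)],            # designed target-seeking behaviors
        ])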

    Q-learning for Robots

    No full text
    Robot learning is a challenging, and somewhat unique, research domain. If a robot behavior is defined as a mapping between situations occurring in the real world and the actions to be accomplished, then supervised learning of a robot behavior requires a set of representative examples (situation, desired action). To be able to gather such a learning base, the human operator must have a deep understanding of the robot-world interaction (i.e., a model). But there are many application domains where such models cannot be obtained, either because detailed knowledge of the robot's world is unavailable (e.g., spatial or underwater exploration, nuclear or toxic waste management) or because it would be too costly. In this context, the automatic synthesis of a representative learning base is an important issue. It can be sought using reinforcement learning techniques, in particular Q-learning, which does not require a model of the robot-world interaction. Compared to supervised learning, Q-learning examples are triplets (situation, action, Q-value), where the Q-value is the utility of executing the action in the situation. The supervised learning base is then obtained by recruiting the triplets with the highest utility.
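    A minimal sketch of the idea in this abstract, assuming a tabular setting: run model-free Q-learning to obtain (situation, action, Q-value) triplets, then recruit the highest-utility action per situation as a supervised (situation, desired action) base. Names and hyper-parameters are assumptions.

        from collections import defaultdict

        Q = defaultdict(float)   # Q[(s, a)]: utility of action a in situation s

        def q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
            """One model-free temporal-difference update."""
            best_next = max(Q[(s_next, b)] for b in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

        def supervised_base(situations, actions):
            """Recruit the highest-utility triplet per situation as a
            (situation, desired action) example for supervised learning."""
            return {s: max(actions, key=lambda a: Q[(s, a)]) for s in situations}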

    Modeling and Simulation of Elementary Robot Behaviors using Associative Memories

    No full text
    Today, several drawbacks impede the necessary and much-needed use of robot-learning techniques in real applications. First, the time needed to synthesize any behavior is prohibitive. Second, the robot's behavior during the learning phase is, by definition, bad; it may even be dangerous. Third, except within the lazy-learning approach, a new behavior implies a new learning phase. In this paper we propose to use associative memories (self-organizing maps) to encode the non-explicit model of the robot-world interaction sampled by the lazy memory, and then to generate a robot behavior by means of situations to be achieved, i.e., points on the self-organizing maps. Any behavior can be synthesized instantly by defining a goal situation. Its performance will be minimal (not necessarily bad) and will improve through mere repetition of the behavior.
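    The sketch below illustrates, under stated assumptions, the mechanism this abstract outlines: a small self-organizing map trained on (situation, action) vectors sampled by the lazy memory, with a behavior defined only by a goal node on the map; the robot picks the action whose codebook entry lies closest to that goal. Map size, vector dimensions, and the training schedule are invented for the example.

        import numpy as np

        rng = np.random.default_rng(0)
        GRID, DIM = (10, 10), 6   # map size; situation (4) + action (2) vector

        W = rng.normal(size=GRID + (DIM,))   # codebook: one weight vector per node

        def bmu(x):
            """Best-matching unit (grid coordinates) for input vector x."""
            d = np.linalg.norm(W - x, axis=-1)
            return np.unravel_index(d.argmin(), GRID)

        def train(samples, epochs=20, lr0=0.5, sigma0=3.0):
            """Classic SOM training on samples drawn from the lazy memory."""
            for t in range(epochs):
                lr = lr0 * (1 - t / epochs)
                sigma = sigma0 * (1 - t / epochs) + 0.5
                for x in samples:
                    bi, bj = bmu(x)
                    for i in range(GRID[0]):
                        for j in range(GRID[1]):
                            h = np.exp(-((i - bi) ** 2 + (j - bj) ** 2)
                                       / (2 * sigma ** 2))
                            W[i, j] += lr * h * (x - W[i, j])

        def behavior(situation, candidate_actions, goal_node):
            """A behavior is just a goal node: pick the action whose
            (situation, action) vector maps closest to it on the grid."""
            def grid_dist(a):
                i, j = bmu(np.concatenate([situation, a]))
                return (i - goal_node[0]) ** 2 + (j - goal_node[1]) ** 2
            return min(candidate_actions, key=grid_dist)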

    Distributed Lazy Q-learning for Cooperative Mobile Robots

    No full text
    Compared to single-robot learning, cooperative learning adds the challenges of a much larger search space (the combined individual search spaces), awareness of other team members, and the synthesis of the individual behaviors with respect to the task given to the group. Over the years, reinforcement learning has emerged as the main learning approach in autonomous robotics, and lazy learning has become the leading bias, reducing the time required by an experiment to the time needed to test the learned behavior's performance. These two approaches have been combined in what is now called lazy Q-learning, a very efficient single-robot learning paradigm. We propose a derivation of this learning for teams of robots: the «pessimistic» algorithm, able to compute for each team member a lower bound on the utility of executing an action in a given situation. We use the cooperative multi-robot observation of multiple moving targets (CMOMMT) application as an illustrative example and study the efficiency of the pessimistic algorithm in its task of inducing the learning of cooperation.
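    A hedged sketch of the «pessimistic» lower-bound idea as described: each robot rates its own action by the worst case over the unobserved actions of its teammates and then maximizes that guaranteed utility. The joint-Q table and its indexing are assumptions made for illustration.

        import itertools
        from collections import defaultdict

        Q_joint = defaultdict(float)  # Q_joint[(s, a_self, a_others)] from lazy memory

        def pessimistic_value(s, a_self, teammate_actions, n_teammates):
            """Lower bound: assume teammates pick the joint action worst for us."""
            return min(Q_joint[(s, a_self, others)]
                       for others in itertools.product(teammate_actions,
                                                       repeat=n_teammates))

        def act(s, own_actions, teammate_actions, n_teammates):
            """Each robot maximizes its guaranteed (worst-case) utility."""
            return max(own_actions,
                       key=lambda a: pessimistic_value(s, a, teammate_actions,
                                                       n_teammates))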

    Two steps reinforcement learning

    Get PDF
    When applying reinforcement learning in domains with very large or continuous state spaces, the experience obtained by the learning agent in its interaction with the environment must be generalized. Generalization methods are usually based on approximating the value functions used to compute the action policy, and this is tackled in two different ways: on the one hand, by approximating the value functions with a supervised learning method; on the other, by discretizing the environment so as to use a tabular representation of the value functions. In this work, we propose an algorithm that combines both approaches to exploit the benefits of both mechanisms, allowing higher performance. The approach is based on two learning phases. In the first, a learner is used as a supervised function approximator, using a machine learning technique that also outputs a state-space discretization of the environment, as nearest-prototype classifiers or decision trees do. In the second phase, the space discretization computed in the first phase is used to obtain a tabular representation of the value function computed previously, allowing that value-function approximation to be tuned. Experiments in different domains show that executing both learning phases improves on executing only the first, taking into account both the resources used and the performance of the learned behavior. This research was partially conducted while the first author was visiting Carnegie Mellon University from the Universidad Carlos III de Madrid, supported by a generous grant from the Spanish Ministry of Education and Fulbright. Both authors were partially sponsored by the Spanish MEC project TIN2005-08945-C06-05 and regional CAM-UC3M project number CCG06-UC3M/TIC-0831.
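    A minimal sketch of the two-phase scheme, assuming a decision tree as the phase-one approximator (one of the options the abstract names): the fitted tree both predicts values and induces a discretization through its leaves, and phase two tunes a tabular Q-function indexed by those leaves. Data shapes and hyper-parameters are illustrative.

        import numpy as np
        from sklearn.tree import DecisionTreeRegressor

        def phase_one(states, values):
            """Fit a supervised approximator whose leaves discretize the space."""
            return DecisionTreeRegressor(max_leaf_nodes=64).fit(states, values)

        def phase_two(tree, transitions, n_actions, alpha=0.1, gamma=0.95):
            """Tune a tabular Q-function indexed by the tree's leaves."""
            Q = {}
            def leaf(s):
                return int(tree.apply(np.reshape(s, (1, -1)))[0])
            def init(l, s):
                if l not in Q:                   # seed from the phase-one estimate
                    Q[l] = np.full(n_actions,
                                   tree.predict(np.reshape(s, (1, -1)))[0])
            for s, a, r, s_next in transitions:  # experience gathered by the agent
                ls, ln = leaf(s), leaf(s_next)
                init(ls, s); init(ln, s_next)
                Q[ls][a] += alpha * (r + gamma * Q[ln].max() - Q[ls][a])
            return Q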

    Técnicas de aprendizaje automático para el análisis de datos en aplicaciones financieras: Machine learning techniques for data analysis in financial application

    Get PDF
    This article presents some of the most relevant machine learning techniques used in research works dealing with financial applications such as credit evaluation, portfolio management, prediction of markets, stocks, or currencies, and financial planning in general. The purpose of this document is to give a general overview of the machine learning techniques used for these financial applications and of the conclusions reached in the various research articles about the advantages and advances achieved when applying these techniques to financial problems. The results obtained by each investigation are not covered in detail; the aim is only a broad panorama of the use of the following machine learning techniques in these areas: neural networks, expert systems, hybrid intelligence systems, data mining, soft computing techniques, and deep learning techniques.

    FROM ARTIFICIAL NEURAL NETWORKS TO COOPERATIVE ROBOTICS

    Get PDF
    The work described in this dissertation reports on a scientific trajectory of about ten years, constantly guided by the desire to study and develop artificial neural network models in direct contact with the real world. The first part of our research addressed learning within multi-network connectionist systems. Directly descended from the Connectionist Sequential Machine model (MSC, developed during the doctoral thesis), which involves 2 multilayer networks, 6 MSCs were deployed to enable the acquisition and control of walking in a hexapod robot. The paradigm used to distribute the information needed by each connectionist module is penalty-reward learning. A hexapod robot was built, validating the results previously obtained in simulation. Penalty-reward learning belongs to the class of reinforcement learning. The second part of our research studied the interactions between artificial neural networks and reinforcement learning. An implementation of Q-learning on multilayer networks, and then on self-organizing maps, was proposed. We thus obtain reductions in the required memory size and in the number of learning iterations needed, which permit practical use. We then developed mechanisms for distributing reinforcement learning, either within a single robot endowed with several behaviors, or within a group of robots in a task involving cooperation. Unlike current research trends, which advocate the use of a priori knowledge to face the high combinatorics of the search space, we propose the use of a posteriori knowledge: employing lazy learning to build a non-explicit model, and developing tools and methods to aid the design of reinforcement functions. In the medium term, the objective of our research is to automate the decomposition of a complex robotic behavior into a succession of elementary behaviors. The use of temporal and spatial markers is envisaged to allow the sequencing of the self-organizing maps implementing the elementary behaviors. In that case, simply defining the objective to be reached would then suffice to generate the solution behavior.
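    As a rough illustration of the memory reduction mentioned above (Q-learning implemented on self-organizing maps), the sketch below stores one q-vector per map node and runs the standard update over best-matching units; the bmu_fn argument is assumed to come from a pre-trained map such as the one sketched earlier. All shapes and parameters are assumptions.

        import numpy as np

        def make_q_over_som(grid, n_actions):
            """One small q-vector per SOM node instead of a full discretized table."""
            return np.zeros(grid + (n_actions,))

        def q_step(Q, bmu_fn, s, a, r, s_next, alpha=0.1, gamma=0.9):
            """Standard Q-learning update applied over best-matching units."""
            i, j = bmu_fn(s)                     # node coding the current situation
            ni, nj = bmu_fn(s_next)
            Q[i, j, a] += alpha * (r + gamma * Q[ni, nj].max() - Q[i, j, a])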

    LEARNING REFLEXES FOR TELEOPERATED GROUND-BASED RESCUE ROBOTS

    Get PDF
    This thesis presents a system for shared autonomy in which a search and rescue robot uses training data to create a "maintain balance" reflex, enabling the robot to autonomously stop, back up, or change configuration to avoid falling over as the operator drives it through rubble. Currently, the operator is responsible for determining whether the robot is in an unsafe state and about to fall, and falling over often ends the mission for the robot. With a "maintain balance" reflex, the operator can drive the robot with less risk of falling over. The project required retrofitting an ASR/Inuktun Extreme variable geometry robot with an Analog Devices ADXL335 3-axis accelerometer to provide inputs for a fall classifier optimized by a genetic algorithm. The software system, written in C#, uses a Subsumption Architecture in which the reflex takes priority over operator commands that place the robot in danger. The developed system was tested over 3 trials, 2 of them using the NIST Standard Test Method for Response Robots: Mobility: Terrain: Stepfields. Four variants of the system and a control were compared for effectiveness; over the 3 trials, each variant was tested with 45 starting configurations. Variants of the system demonstrated an 8% decrease in the probability of falling on a simple climbing stepfield and a 40% decrease on flat terrain. The results show that the primary mechanism for reducing falls is backing up, which gives a 6% improvement over halting in terms of fall probability. The small improvement reflects the robot's lack of agility and its sensing limitations; the data suggest the classification algorithm was an appropriate choice, as it responds to situations not captured by a simple physically based model. This work is expected to be useful for other search and rescue robots; each type of robot would have to be trained using the procedures described in this thesis.
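    A hedged sketch of the subsumption arrangement the thesis describes: the learned reflex layer sits above teleoperation, so its stop or back-up commands suppress the operator's command whenever the fall classifier flags danger. The classifier interface and command names are illustrative assumptions, not the thesis code (which is in C#).

        def control_step(accel_xyz, operator_cmd, classifier):
            """Reflex layer: subsumes teleoperation when a fall is predicted."""
            danger = classifier.predict(accel_xyz)   # GA-optimized fall classifier
            if danger == "falling":
                return "BACK_UP"                     # primary fall-reduction action
            if danger == "unsafe":
                return "HALT"
            return operator_cmd                      # safe: pass the operator through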