Information theoretic stochastic search
The MAP-i Doctoral Programme in Informatics, of the Universities of Minho, Aveiro and Porto
Optimization is the research field that studies the design of algorithms for finding the
best solutions to the problems we may throw at them. While the whole domain is practically
important, the present thesis focuses on the subfield of continuous black-box
optimization, presenting a collection of novel, state-of-the-art algorithms for solving
problems in that class. In this thesis, we introduce two novel general-purpose
stochastic search algorithms for black-box optimization. Stochastic search algorithms
aim to repeat the type of mutations that led to the fittest search points in a population.
We can model those mutations by a stochastic distribution, typically a multivariate
Gaussian distribution. The key idea is to iteratively change the parameters of the
distribution towards higher expected fitness. We leverage information-theoretic trust
regions to limit the change of the new distribution, and we show how plain maximization
of the expected fitness without bounding the change of the distribution is destined to
fail because of overfitting, which results in premature convergence. Being derived from
first principles, the proposed methods can be elegantly extended to the contextual
learning setting, which allows for learning context-dependent stochastic distributions
that generate optimal individuals for a given context; i.e., instead of learning one
task at a time, we can learn multiple related tasks at once. However, the search
distribution typically uses a parametric model built on hand-defined context features.
Finding good context features is a challenging task, and hence non-parametric methods
are often preferred over their parametric counterparts. We therefore further propose a
non-parametric contextual stochastic search algorithm that can learn a non-parametric
search distribution
for multiple tasks simultaneously.
Funded by FCT - Fundação para a Ciência e a Tecnologia, as well as by the European Union's
FP7 under EuRoC grant agreement CP-IP 608849, and by LIACC (UID/CEC/00027/2015)
and IEETA (UID/CEC/00127/2015)
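The bounded-update idea described in the abstract can be sketched with a diagonal-Gaussian toy version. All names below, the elite-selection rule, and the step-halving line search are illustrative assumptions; the thesis instead derives its update for full search distributions from first principles under the trust-region constraint.

```python
import numpy as np

def kl_diag_gauss(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between diagonal Gaussians."""
    return 0.5 * np.sum(np.log(var_p / var_q)
                        + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def trust_region_step(f, mu, var, rng, epsilon=0.05, n_samples=200, elite_frac=0.2):
    """Sample, select the fittest points, and move the search distribution
    toward them, shrinking the step until KL(new || old) <= epsilon."""
    x = rng.normal(mu, np.sqrt(var), size=(n_samples, mu.size))
    fitness = np.array([f(xi) for xi in x])
    elites = x[np.argsort(fitness)[: max(2, int(n_samples * elite_frac))]]  # minimization
    mu_t, var_t = elites.mean(axis=0), elites.var(axis=0) + 1e-8
    alpha = 1.0
    while alpha > 1e-6:
        mu_new = mu + alpha * (mu_t - mu)
        var_new = var + alpha * (var_t - var)
        if kl_diag_gauss(mu_new, var_new, mu, var) <= epsilon:
            return mu_new, var_new
        alpha *= 0.5  # halve the step until it fits the trust region
    return mu, var

# minimize the sphere function in 3-D
rng = np.random.default_rng(0)
f = lambda x: float(np.dot(x, x))
mu, var = np.full(3, 2.0), np.full(3, 1.0)
for _ in range(80):
    mu, var = trust_region_step(f, mu, var, rng)
print(f(mu) < 12.0)
```

Without the KL check, the distribution can collapse onto a few lucky samples in one step; the trust region is what keeps each update conservative.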
A survey on policy search algorithms for learning robot controllers in a handful of trials
Most policy search algorithms require thousands of training episodes to find
an effective policy, which is often infeasible with a physical robot. This
survey article focuses on the extreme other end of the spectrum: how can a
robot adapt with only a handful of trials (a dozen) and a few minutes? By
analogy with the word "big-data", we refer to this challenge as "micro-data
reinforcement learning". We show that a first strategy is to leverage prior
knowledge on the policy structure (e.g., dynamic movement primitives), on the
policy parameters (e.g., demonstrations), or on the dynamics (e.g.,
simulators). A second strategy is to create data-driven surrogate models of the
expected reward (e.g., Bayesian optimization) or the dynamical model (e.g.,
model-based policy search), so that the policy optimizer queries the model
instead of the real system. Overall, all successful micro-data algorithms
combine these two strategies by varying the kind of model and prior knowledge.
The current scientific challenges essentially revolve around scaling up to
complex robots (e.g., humanoids), designing generic priors, and optimizing the
computing time.
Comment: 21 pages, 3 figures, 4 algorithms, accepted at IEEE Transactions on Robotics
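The second strategy above, a data-driven surrogate of the expected reward queried by Bayesian optimization, can be illustrated with a minimal self-contained sketch. The hand-rolled RBF Gaussian process, the UCB acquisition rule, and all names are assumptions for illustration, not the survey's reference implementation.

```python
import numpy as np

def rbf(a, b, ls=0.3):
    """Squared-exponential kernel between two 1-D point sets."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-4):
    """Exact GP regression: posterior mean and variance at candidates Xs."""
    K_inv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
    Ks = rbf(X, Xs)
    mu = Ks.T @ K_inv @ y
    var = 1.0 - np.sum(Ks * (K_inv @ Ks), axis=0)
    return mu, np.maximum(var, 1e-12)

def bayes_opt(reward, lo, hi, n_init=3, budget=12, beta=2.0, seed=0):
    """Optimize a costly reward with very few evaluations (micro-data regime):
    the policy optimizer queries the surrogate, not the real system."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, n_init)
    y = np.array([reward(x) for x in X])
    cand = np.linspace(lo, hi, 200)
    for _ in range(budget - n_init):
        mu, var = gp_posterior(X, y, cand)
        x_next = cand[np.argmax(mu + beta * np.sqrt(var))]  # UCB acquisition
        X, y = np.append(X, x_next), np.append(y, reward(x_next))
    return X[np.argmax(y)]

# toy "episode reward" over one policy parameter, optimum near x = 0.6
r = lambda x: -(x - 0.6) ** 2
best = bayes_opt(r, 0.0, 1.0)
```

Twelve evaluations of `r` stand in for twelve physical trials; the GP spends a few on exploration (high posterior variance) before exploiting the modeled optimum.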
Humanoid Robots
For many years, human beings have tried, in many ways, to recreate the complex mechanisms that form the human body. This task is extremely complicated and the results are not yet fully satisfactory. However, with increasing technological advances grounded in theoretical and experimental research, we have managed, to some extent, to copy or imitate some systems of the human body. This research not only aims to create humanoid robots, a great part of them constituting autonomous systems, but also to provide deeper knowledge of the systems that form the human body, with possible applications in rehabilitation technology for human beings; it brings together studies related not only to robotics but also to biomechanics, biomimetics, and cybernetics, among other areas. This book presents a series of studies inspired by this ideal, carried out by various researchers worldwide, that analyze and discuss diverse subjects related to humanoid robots. The contributions explore aspects of robotic hands, learning, language, vision, and locomotion
Semantic Robot Programming for Taskable Goal-Directed Manipulation
Autonomous robots have the potential to assist people to be more productive in factories, homes, hospitals, and similar environments. Unlike traditional industrial robots that are pre-programmed for particular tasks in controlled environments, modern autonomous robots should be able to perform arbitrary user-desired tasks. Thus, it is beneficial to provide pathways to enable users to program an arbitrary robot to perform an arbitrary task in an arbitrary world. Advances in robot Programming by Demonstration (PbD) have made it possible for end-users to program robot behavior for performing desired tasks through demonstrations. However, it still remains a challenge for users to program robot behavior in a generalizable, performant, scalable, and intuitive manner.
In this dissertation, we address the problem of robot programming by demonstration in a declarative manner by introducing the concept of Semantic Robot Programming (SRP). In SRP, we focus on addressing the following challenges for robot PbD: 1) generalization across robots, tasks, and worlds, 2) robustness under partial observations of cluttered scenes, 3) efficiency in task performance as the workspace scales up, and 4) feasibly intuitive modalities of interaction for end-users to demonstrate tasks to robots.
Through SRP, our objective is to enable an end-user to intuitively program a mobile manipulator by providing a workspace demonstration of the desired goal scene. We use a scene graph to semantically represent conditions on the current and goal states of the world. To estimate the scene graph given raw sensor observations, we bring together discriminative object detection and generative state estimation for the inference of object classes and poses. The proposed scene estimation method outperformed the state of the art in cluttered scenes. With SRP, we successfully enabled users to program a Fetch robot to set up a kitchen tray on a cluttered tabletop in 10 different start and goal settings.
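The goal-scene idea can be illustrated with a toy scene graph encoded as relation triples; the object and relation names here are hypothetical, purely to show the kind of goal-condition checking SRP performs on estimated scene graphs.

```python
# Hypothetical scene graphs as sets of (subject, relation, object) triples.
goal = {("cup", "on", "tray"), ("spoon", "right_of", "cup")}
current = {("cup", "on", "table"), ("spoon", "right_of", "cup")}

# Relations the robot still has to achieve to reach the demonstrated goal scene.
unmet = goal - current
print(sorted(unmet))  # → [('cup', 'on', 'tray')]
```

In the dissertation the `current` graph is not given but inferred from raw sensor observations, which is where the combined discriminative detection and generative state estimation come in.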
In order to scale up SRP from tabletop to large scale, we propose Contextual-Temporal Mapping (CT-Map) for semantic mapping of large-scale scenes given streaming sensor observations. We model the semantic mapping problem via a Conditional Random Field (CRF), which accounts for spatial dependencies between objects. Over time, object poses and inter-object spatial relations can vary due to human activities. To deal with such dynamics, CT-Map maintains the belief over object classes and poses across an observed environment. We present CT-Map semantically mapping cluttered rooms with robustness to perceptual ambiguities, demonstrating higher accuracy on object detection and 6 DoF pose estimation compared to a state-of-the-art neural-network-based object detector and commonly adopted 3D registration methods.
Towards SRP at the building scale, we explore notions of Generalized Object Permanence (GOP) for robots to search for objects efficiently. We state the GOP problem as the prediction of where an object can be located when it is not being directly observed by a robot. We model object permanence via a factor graph inference model, with factors representing long-term memory, short-term memory, and common sense knowledge over inter-object spatial relations. We propose the Semantic Linking Maps (SLiM) model to maintain the belief over object locations while accounting for object permanence through a CRF. Based on the belief maintained by SLiM, we present a hybrid object search strategy that enables the Fetch robot to actively search for objects on a large scale, with a higher search success rate and less search time compared to state-of-the-art search methods.
PhD, Electrical and Computer Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/155073/1/zengzhen_1.pd
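SLiM maintains its belief through a CRF over inter-object relations; as a much-simplified stand-in, the core "belief over object locations" idea can be sketched as a Bayes filter over a few discrete locations, updated by a negative observation. The room names and miss probability are made up.

```python
import numpy as np

# Hypothetical discrete search locations and a prior belief over them
# (in SLiM the prior would come from long-term memory and common sense).
rooms = ["kitchen", "office", "lab"]
belief = np.array([0.5, 0.3, 0.2])
p_miss = 0.2  # chance the detector misses the object even if it is there

def observe_absent(belief, idx):
    """Bayes update after looking at rooms[idx] and not seeing the object."""
    likelihood = np.ones(len(belief))
    likelihood[idx] = p_miss           # present-but-missed explains the non-detection
    posterior = likelihood * belief
    return posterior / posterior.sum()

# The robot searches the most likely room first, finds nothing, and re-plans.
belief = observe_absent(belief, rooms.index("kitchen"))
print(rooms[np.argmax(belief)])  # → office
```

A hybrid search strategy like the one in the dissertation alternates such belief updates with choosing where to look next, trading off travel cost against the current belief.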
Incorporating Human Expertise in Robot Motion Learning and Synthesis
With the exponential growth of robotics and the fast development of robots' advanced cognitive and motor capabilities, one can start to envision humans and robots working together in unstructured environments. Yet, for that to be possible, robots need to be programmed for such complex scenarios, which demands significant domain knowledge in robotics and control. One viable approach to enable robots to acquire skills more flexibly and efficiently is to give them the capability to learn autonomously from human demonstrations and expertise through interaction. Such a framework helps make the creation of skills in robots more social and less demanding of programming and robotics expertise. Yet current imitation learning approaches suffer from significant limitations, mainly in the flexibility and efficiency with which they represent, learn, and reason about motor tasks. This thesis addresses this problem by exploring cost-function-based approaches to learning robot motion control, perception, and the interplay between them. To begin with, the thesis proposes an efficient probabilistic algorithm for learning an impedance controller to accommodate motion contacts. The learning algorithm is able to incorporate important domain constraints, e.g., about force representation and decomposition, which are nontrivial to handle with standard techniques. Compliant handwriting motions are developed on an articulated robot arm and a multi-fingered hand. This work provides a flexible approach to learning robot motion that conforms to both task and domain constraints. Furthermore, the thesis also contributes techniques to learn from and reason about demonstrations with partial observability. The proposed approach combines inverse optimal control and ensemble methods, yielding tractable learning of cost functions with latent variables. Two task priors are further incorporated.
The first, a human kinematics prior, results in a model which synthesizes rich and believable dynamical handwriting. The second enforces dynamics on the latent variable and facilitates real-time human intention recognition and online motion adaptation in collaborative robot tasks. Finally, the thesis establishes a link between the control and perception modalities. This work offers an analysis that bridges inverse optimal control and deep generative models, as well as a novel algorithm that learns cost features and embeds the modal coupling prior. It contributes an end-to-end system for synthesizing arm joint motion from letter image pixels. The results highlight its robustness against noisy and out-of-sample sensory inputs. Overall, the proposed approach endows robots with the potential to reason about diverse unstructured data, which is nowadays pervasive but hard to process with current imitation learning approaches.
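The impedance controller discussed above can be hinted at with the standard joint-space impedance law. The gains and dimensions below are made-up placeholders; in the thesis the stiffness and damping are what the probabilistic learning algorithm adapts, subject to force representation and decomposition constraints not modeled here.

```python
import numpy as np

def impedance_torque(q, qd, q_des, qd_des, K, D):
    """Joint-space impedance law: spring toward q_des with stiffness K,
    damped by D. Compliance at contact comes from choosing K and D low."""
    return K @ (q_des - q) + D @ (qd_des - qd)

K = np.diag([30.0, 30.0])  # hypothetical stiffness gains (learned in the thesis)
D = np.diag([4.0, 4.0])    # hypothetical damping gains
tau = impedance_torque(np.zeros(2), np.zeros(2),
                       np.array([0.2, -0.1]), np.zeros(2), K, D)
print(tau)  # → [ 6. -3.]
```

With zero velocity error, only the stiffness term acts, so the commanded torque is simply `K @ (q_des - q)`.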
The Meaning of Action: a review on action recognition and mapping
In this paper, we analyze the different approaches taken to date within the computer vision, robotics and artificial intelligence communities for the representation, recognition, synthesis and understanding of action. We deal with action at different levels of complexity and provide the reader with the necessary related literature references. We put the literature references further into context and outline a possible interpretation of action by taking into account the different aspects of action recognition, action synthesis and task-level planning
Hierarchical relative entropy policy search
Many reinforcement learning (RL) tasks, especially in robotics, consist of multiple sub-tasks that
are strongly structured. Such task structures can be exploited by incorporating hierarchical policies
that consist of gating networks and sub-policies. However, this concept has only been partially explored
for real-world settings, and complete methods, derived from first principles, are needed. Real-world
settings are challenging due to large and continuous state-action spaces that are prohibitive
for exhaustive sampling methods. We define the problem of learning sub-policies in continuous
state-action spaces as finding a hierarchical policy that is composed of a high-level gating policy to
select the low-level sub-policies for execution by the agent. In order to efficiently share experience
with all sub-policies, also called inter-policy learning, we treat these sub-policies as latent variables
which allows for distribution of the update information between the sub-policies. We present three
different variants of our algorithm, designed to be suitable for a wide variety of real world robot
learning tasks and evaluate our algorithms in two real robot learning scenarios as well as several
simulations and comparisons
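The gating-plus-sub-policies structure can be sketched as follows; the softmax gating and linear-Gaussian sub-policies are illustrative assumptions rather than the exact parameterization of the paper, and the trained parameters here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
n_options, state_dim, action_dim = 3, 2, 1

# Hypothetical learned parameters: a softmax gating policy pi(o|s) over
# sub-policies, each sub-policy a linear-Gaussian controller pi(a|s,o).
W_gate = rng.normal(size=(n_options, state_dim))
W_sub = rng.normal(size=(n_options, action_dim, state_dim))

def act(s, sigma=0.1):
    logits = W_gate @ s
    p = np.exp(logits - logits.max())
    p /= p.sum()                                   # gating distribution pi(o|s)
    o = rng.choice(n_options, p=p)                 # high-level choice of sub-policy
    a = W_sub[o] @ s + sigma * rng.normal(size=action_dim)  # low-level action
    return o, a, p

o, a, p = act(np.array([0.5, -0.2]))
print(0 <= o < n_options and abs(p.sum() - 1.0) < 1e-9)
```

Treating the option `o` as a latent variable is what lets one reward-weighted update distribute experience across all sub-policies at once, instead of training each in isolation.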
Multilayered skill learning and movement coordination for autonomous robotic agents
With advances in technology expanding the capabilities of robots, while at the same time making robots cheaper to manufacture, robots are rapidly becoming more prevalent in both industrial and domestic settings. An increase in the number of robots, and the likely subsequent decrease in the ratio of people currently trained to directly control the robots, engenders a need for robots to be able to act autonomously. Larger numbers of robots present together provide new challenges and opportunities for developing complex autonomous robot behaviors capable of multirobot collaboration and coordination.
The focus of this thesis is twofold. The first part explores applying machine learning techniques to teach simulated humanoid robots skills such as how to move or walk and manipulate objects in their environment. Learning is performed using reinforcement learning policy search methods, and layered learning methodologies are employed during the learning process in which multiple lower level skills are incrementally learned and combined with each other to develop richer higher level skills. By incrementally learning skills in layers such that new skills are learned in the presence of previously learned skills, as opposed to individually in isolation, we ensure that the learned skills will work well together and can be combined to perform complex behaviors (e.g. playing soccer). The second part of the thesis centers on developing algorithms to coordinate the movement and efforts of multiple robots working together to quickly complete tasks. These algorithms prioritize minimizing the makespan, or time for all robots to complete a task, while also attempting to avoid interference and collisions among the robots. An underlying objective of this research is to develop techniques and methodologies that allow autonomous robots to robustly interact with their environment (through skill learning) and with each other (through movement coordination) in order to perform tasks and accomplish goals asked of them.
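The layered-learning idea, optimizing each new skill in the presence of previously learned, frozen skills, can be sketched with a toy hill climber. All names and the two-layer stack are hypothetical; the thesis uses reinforcement learning policy search on simulated humanoids, not this simple mutation loop.

```python
import numpy as np

def learn_layer(evaluate, theta0, frozen, iters=200, sigma=0.1, seed=0):
    """Hill-climb one skill's parameters while previously learned skills
    ('frozen') stay fixed and are available during evaluation."""
    rng = np.random.default_rng(seed)
    theta, best = theta0, evaluate(theta0, frozen)
    for _ in range(iters):
        cand = theta + sigma * rng.normal(size=theta.shape)
        score = evaluate(cand, frozen)
        if score > best:              # keep only improving mutations
            theta, best = cand, score
    return theta

# Toy layer stack: a "walk" skill is learned first, then a "kick" skill is
# learned on top of (and evaluated together with) the frozen walk.
walk = learn_layer(lambda th, fr: -np.sum((th - 1.0) ** 2), np.zeros(2), [])
kick = learn_layer(lambda th, fr: -np.sum((th - fr[0]) ** 2), np.zeros(2), [walk])
print(np.sum((walk - 1.0) ** 2) < 2.0)
```

Because each layer is scored in the context of the frozen skills below it, the resulting skills are compatible by construction, which is the point of learning them in layers rather than in isolation.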
The work in this thesis is implemented and evaluated in the RoboCup 3D simulation soccer domain, and has been a key component of the UT Austin Villa team winning the RoboCup 3D simulation league world championship six out of the past seven years.
Computer Science