Semantic Robot Programming for Taskable Goal-Directed Manipulation
Autonomous robots have the potential to help people be more productive in factories, homes, hospitals, and similar environments. Unlike traditional industrial robots that are pre-programmed for particular tasks in controlled environments, modern autonomous robots should be able to perform arbitrary user-desired tasks. Thus, it is beneficial to provide pathways that enable users to program an arbitrary robot to perform an arbitrary task in an arbitrary world. Advances in robot Programming by Demonstration (PbD) have made it possible for end-users to program robot behavior for desired tasks through demonstrations. However, it remains a challenge for users to program robot behavior in a generalizable, performant, scalable, and intuitive manner.
In this dissertation, we address the problem of robot programming by demonstration in a declarative manner by introducing the concept of Semantic Robot Programming (SRP). In SRP, we focus on the following challenges for robot PbD: 1) generalization across robots, tasks, and worlds; 2) robustness under partial observations of cluttered scenes; 3) efficiency in task performance as the workspace scales up; and 4) intuitive and feasible modalities of interaction for end-users to demonstrate tasks to robots.
Through SRP, our objective is to enable an end-user to intuitively program a mobile manipulator by providing a workspace demonstration of the desired goal scene. We use a scene graph to semantically represent conditions on the current and goal states of the world. To estimate the scene graph given raw sensor observations, we bring together discriminative object detection and generative state estimation for the inference of object classes and poses. The proposed scene estimation method outperformed the state of the art in cluttered scenes. With SRP, we successfully enabled users to program a Fetch robot to set up a kitchen tray on a cluttered tabletop in 10 different start and goal settings.
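The dissertation does not reproduce its data structures here; the following is a minimal, hypothetical Python sketch of how a scene graph could encode goal conditions as inter-object relations, with all names invented for illustration.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Relation:
    subject: str    # e.g. "mug_1"
    predicate: str  # e.g. "on", "in", "left_of"
    obj: str        # e.g. "tray_1"

@dataclass
class SceneGraph:
    objects: set = field(default_factory=set)
    relations: set = field(default_factory=set)

    def satisfies(self, goal: "SceneGraph") -> bool:
        # The goal is met when every demonstrated relation
        # holds in the current scene estimate.
        return goal.relations <= self.relations

current = SceneGraph({"mug_1", "tray_1"}, {Relation("mug_1", "on", "table")})
goal = SceneGraph({"mug_1", "tray_1"}, {Relation("mug_1", "on", "tray_1")})
assert not current.satisfies(goal)  # the robot still has work to do

Under this reading, a demonstration amounts to specifying the goal graph, and the robot manipulates objects until satisfies() holds for its current scene estimate.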
In order to scale SRP up from the tabletop, we propose Contextual-Temporal Mapping (CT-Map) for semantic mapping of large-scale scenes from streaming sensor observations. We model the semantic mapping problem via a Conditional Random Field (CRF), which accounts for spatial dependencies between objects. Over time, object poses and inter-object spatial relations can vary due to human activities. To deal with such dynamics, CT-Map maintains a belief over object classes and poses across the observed environment. We demonstrate CT-Map semantically mapping cluttered rooms with robustness to perceptual ambiguities, achieving higher accuracy in object detection and 6-DoF pose estimation than a state-of-the-art neural network-based object detector and commonly adopted 3D registration methods.
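As a hedged illustration of the CRF formulation (the actual potentials used by CT-Map are not specified in this abstract), a pairwise CRF scores a joint labeling by combining per-object detection evidence with spatial-context compatibilities:

def crf_log_score(labels, unary, pairwise, edges):
    """Unnormalized log-score of a joint object labeling.
    labels: {node: class}; unary: {node: {class: log potential}} from a detector;
    pairwise: {(class_a, class_b): log potential} encoding spatial context;
    edges: [(node_a, node_b)] pairs of spatially related objects."""
    score = sum(unary[n][labels[n]] for n in labels)
    score += sum(pairwise.get((labels[a], labels[b]), 0.0) for a, b in edges)
    return score

# Context can override an ambiguous detection: a screen-like object
# next to a keyboard is more plausibly a monitor than a TV.
unary = {"screen": {"monitor": -0.7, "tv": -0.6}, "kbd": {"keyboard": -0.1}}
pairwise = {("monitor", "keyboard"): 1.0, ("tv", "keyboard"): -1.0}
edges = [("screen", "kbd")]
best = max([{"screen": c, "kbd": "keyboard"} for c in ("monitor", "tv")],
           key=lambda L: crf_log_score(L, unary, pairwise, edges))

Maximizing such a score over labelings (and, in CT-Map, over poses as well) yields the most probable semantic map; temporal persistence would enter as additional factors linking beliefs across time.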
Towards SRP at the building scale, we explore notions of Generalized Object Permanence (GOP) for robots to search for objects efficiently. We state the GOP problem as the prediction of where an object can be located when it is not being directly observed by a robot. We model object permanence via a factor graph inference model, with factors representing long-term memory, short-term memory, and common-sense knowledge of inter-object spatial relations. We propose the Semantic Linking Maps (SLiM) model to maintain the belief over object locations while accounting for object permanence through a CRF. Based on the belief maintained by SLiM, we present a hybrid object search strategy that enables the Fetch robot to actively search for objects at large scale, with a higher search success rate and shorter search time than state-of-the-art search methods.
PhD, Electrical and Computer Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/155073/1/zengzhen_1.pd
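A minimal sketch of the belief maintenance described in this abstract, with all numbers hypothetical: the belief over a target object's location is a normalized product of long-term memory, short-term memory, and co-occurrence factors, updated by negative observations as the robot searches.

import numpy as np

def normalize(b):
    return b / b.sum()

# Candidate locations for a target object (e.g. a mug).
locations = ["kitchen_counter", "office_desk", "dining_table"]

# Hypothetical factors, all invented for illustration:
long_term = np.array([0.6, 0.2, 0.2])   # where the object usually lives
short_term = np.array([0.1, 0.8, 0.1])  # last sighting was on the desk
cooccur = np.array([0.5, 0.3, 0.2])     # landmark objects seen nearby

belief = normalize(long_term * short_term * cooccur)

# Negative observation: the robot looked at the desk and saw nothing.
miss = np.array([1.0, 0.05, 1.0])  # keep a small false-negative probability
belief = normalize(belief * miss)

# A greedy searcher visits the highest-belief location next.
print(locations[int(np.argmax(belief))])

This product-of-factors update is a simplification of full factor graph inference, but it conveys how long-term, short-term, and common-sense evidence can jointly steer an active search.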
The memristive artificial neuron high level architecture for biologically inspired robotic systems
© 2017 IEEE. In this paper we propose a new hardware architecture for implementing an artificial neuron based on organic memristive elements and operational amplifiers. This architecture is proposed as a possible solution for integrating and deploying a cluster-based, bio-realistic simulation of a mammalian brain into a robotic system. Originally, this simulation was developed through a neuro-biologically inspired cognitive architecture (NeuCogAr) re-implementing basic emotional states, or affects, in a computational system. In this way, the dopamine, serotonin, and noradrenaline pathways developed in NeuCogAr are synthesized through hardware memristors suitable for implementing basic emotional states, or affects, on a biologically inspired robotic system.
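The paper describes a hardware design, but the behavior can be sketched in software. Below is a hedged Python toy model, not the paper's circuit: a linear ion-drift-style memristor acts as an adaptable synaptic weight, and the op-amp summing stage is idealized as a weighted sum followed by an arbitrary threshold.

import numpy as np

class MemristiveSynapse:
    """Toy linear ion-drift memristor: conductance g in [g_min, g_max]."""
    def __init__(self, g=0.5, g_min=0.01, g_max=1.0, lr=0.05):
        self.g, self.g_min, self.g_max, self.lr = g, g_min, g_max, lr

    def apply_voltage(self, v, dt=1.0):
        # Conductance drifts in proportion to the applied flux (v * dt).
        self.g = float(np.clip(self.g + self.lr * v * dt, self.g_min, self.g_max))

    def current(self, v):
        return self.g * v  # Ohm's law: i = g * v

def neuron_output(synapses, inputs, threshold=0.5):
    # Idealized op-amp summing amplifier plus a threshold nonlinearity.
    total = sum(s.current(v) for s, v in zip(synapses, inputs))
    return 1.0 if total > threshold else 0.0

synapses = [MemristiveSynapse(g=0.2), MemristiveSynapse(g=0.4)]
print(neuron_output(synapses, [1.0, 1.0]))  # 0.6 > 0.5 -> fires (1.0)
synapses[1].apply_voltage(-4.0)             # depression: g drops to 0.2
print(neuron_output(synapses, [1.0, 1.0]))  # 0.4 <= 0.5 -> silent (0.0)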
Building expert systems: cognitive emulation.
Chapter 1 briefly introduces the concept of cognitive emulation, and outlines its current status. Chapter 2 reviews psychological research on human expert thinking. First, the study of expert thinking is placed in the context of modern cognitive psychology. Next, the principal methods and techniques employed by psychologists studying expert cognition are examined. The remainder of the chapter is given over to a review of the published literature on the nature and development of human expertise. Chapter 3 reviews the main arguments for and against cognitive emulation in expert system design. The tentative conclusion reached is that a significant degree of emulation is inevitable, but that a pure, unselective strategy of emulation is neither realistic nor desirable. Chapter 4 examines the prospects for cognitive emulation from a more pragmatic angle. Several factors are identified that represent constraints on the usefulness of a cognitive approach. However, a second set of factors is identified which should facilitate an emulation strategy, especially in the longer term. Some guidance is given on when to seriously consider adopting an emulation strategy. Chapter 5 presents a critical survey of expert system research that has already addressed the emulation issue. Six basic approaches to cognitive emulation are distinguished and evaluated. This helps draw out in more detail the implications of an emulation strategy for knowledge acquisition, knowledge representation, and system architecture. The chapter concludes by discussing the issues that arise when different approaches to emulation are combined, and offers some guidance on how this might be achieved. Chapter 6 summarizes the main themes and issues to have emerged, the design advice contained in the thesis, and the original contributions made by the thesis.
SEPEC conference proceedings: Hypermedia and Information Reconstruction. Aerospace applications and research directions
Papers presented at the conference on hypermedia and information reconstruction are compiled. The following subject areas are covered: real-world hypermedia projects, aerospace applications, and future directions in hypermedia research and development.
Freeform User Interfaces for Graphical Computing
Report number: 甲15222; Date of degree conferral: 2000-03-29; Degree type: Doctorate by coursework; Degree: Doctor of Engineering (博士(工学)); Diploma number: 博工第4717号; Graduate school and department: Graduate School of Engineering, Department of Information Engineering
From Line Drawings to Human Actions: Deep Neural Networks for Visual Data Representation
In recent years, deep neural networks have been very successful in computer vision, speech recognition, and artificial intelligence systems. The rapid growth of data and fast-increasing computational resources provide solid foundations for applications that rely on learning large-scale deep neural networks with millions of parameters. Deep learning approaches have proved able to learn powerful representations of their inputs in various tasks, such as image classification, object recognition, and scene understanding. This thesis demonstrates the generality and capacity of deep learning approaches through a series of case studies, including image matching and human activity understanding. In these studies, I explore combinations of neural network models with existing machine learning techniques and extend the deep learning approach for each task. Four related tasks are investigated: 1) image matching through similarity learning; 2) human action prediction; 3) finger force estimation in manipulation actions; and 4) bimodal learning for human action understanding.
Deep neural networks have been shown to be very effective in supervised learning. Further, in some tasks one would like the features of samples from the same category to be grouped close to each other, in addition to being discriminative. Such properties are desired in a number of applications, such as semantic retrieval, image quality measurement, and social network analysis. My first study develops a similarity learning method based on deep neural networks for matching sketch images to 3D models. For this task, I propose using a Siamese network to learn similarities between sketches, and I develop a novel method for sketch-based 3D shape retrieval. The proposed method successfully learns representations of sketch images as well as their similarities, so the 3D shape retrieval problem can then be solved with off-the-shelf nearest neighbor methods.
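The thesis does not reproduce its architecture in this abstract; the following PyTorch sketch (framework, dimensions, and names all assumed) shows the core idea of Siamese-style similarity learning with a contrastive loss, pairing sketches with rendered views of 3D models. Whether weights are shared across the two domains is a design choice; this sketch uses separate encoders.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim))

    def forward(self, x):
        # Unit-norm embeddings make distances comparable.
        return F.normalize(self.net(x), dim=1)

def contrastive_loss(za, zb, same, margin=1.0):
    # Pull matching pairs together, push non-matching pairs
    # apart until they exceed the margin.
    d = F.pairwise_distance(za, zb)
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

sketch_enc, view_enc = Encoder(), Encoder()
loss = contrastive_loss(sketch_enc(torch.randn(8, 1, 64, 64)),
                        view_enc(torch.randn(8, 1, 64, 64)),
                        same=torch.randint(0, 2, (8,)).float())

After training, each 3D model can be indexed by the embeddings of its rendered views, and retrieval reduces to nearest-neighbor search in the shared embedding space.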
After studying representation learning methods for static inputs, my focus turns to learning representations of sequential data. Specifically, I focus on manipulation actions, because they are ubiquitous in daily life and play an important part in human-robot collaboration systems. Deep neural networks have been shown to be powerful at representing short video clips [Donahue et al., 2015]. However, most existing methods treat action recognition as a classification task: they assume the inputs are pre-segmented videos and the outputs are category labels. In scenarios such as human-robot collaboration, the ability to predict an ongoing human action at an early stage is highly important. I first address this issue with a fast manipulation action prediction method, and then build an action prediction model based on the Long Short-Term Memory (LSTM) architecture. The proposed approach processes the sequential inputs as continuous signals and keeps updating its prediction of the intended action based on the learned action representations, as sketched below.
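As a hedged PyTorch sketch of this idea (dimensions and names assumed, not taken from the thesis), an LSTM over per-frame features can emit a refreshed action prediction at every timestep, which is what makes early prediction possible:

import torch
import torch.nn as nn

class StreamingActionPredictor(nn.Module):
    """Emits an updated action prediction after every frame, so an
    ongoing action can be predicted before it completes."""
    def __init__(self, feat_dim=512, hidden=256, n_actions=10):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, frames):        # frames: (B, T, feat_dim)
        h, _ = self.lstm(frames)      # hidden state at every timestep
        return self.head(h)           # per-timestep logits: (B, T, n_actions)

model = StreamingActionPredictor()
logits = model(torch.randn(2, 30, 512))    # predictions refine as T grows
early_guess = logits[:, 4].argmax(dim=-1)  # a guess after only 5 frames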
Further, I study the relationship between visual inputs and the physical information, such as finger forces, involved in manipulation actions. This is motivated by recent studies in cognitive science showing that a subject's intention is strongly related to the hand movements during an action execution. Human observers can interpret others' actions in terms of movements and forces, which can be used to repeat the observed actions. If a robot system had the ability to estimate force feedback, it could learn how to manipulate an object by watching human demonstrations. In this work, the finger forces are estimated by watching only the movement of the hands. A modified LSTM model is used to regress the finger forces from video frames. To facilitate this study, a specially designed sensor glove was used to collect finger force data, and a new dataset was collected to provide synchronized streams of videos and finger forces.
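A minimal sketch of such a visual-to-force regressor, again with PyTorch and all dimensions assumed: the same recurrent backbone as the predictor above, but with a per-timestep regression head trained against the synchronized glove readings.

import torch
import torch.nn as nn

class ForceRegressor(nn.Module):
    """Maps a sequence of per-frame visual features to per-frame
    finger forces (one value per finger)."""
    def __init__(self, feat_dim=512, hidden=128, n_fingers=5):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_fingers)

    def forward(self, feats):         # feats: (B, T, feat_dim)
        h, _ = self.lstm(feats)
        return self.head(h)           # forces: (B, T, n_fingers)

pred = ForceRegressor()(torch.randn(2, 100, 512))
target = torch.randn(2, 100, 5)       # synchronized sensor-glove readings
loss = nn.functional.smooth_l1_loss(pred, target)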
Last, I investigate the usefulness of physical information in human action recognition. This is an application of bimodal learning, where both the visual inputs and the additional information are used to learn the action representation. My study demonstrates that combining the additional information with the visual inputs consistently improves the accuracy of human action recognition. I extend the LSTM architecture to accept both video frames and sensor data as bimodal inputs for predicting the action. A hallucination network is jointly trained to approximate the representations of the additional inputs. During the testing stage, the hallucination network generates approximated representations that are used for classification. In this way, the proposed method does not rely on the additional inputs at test time.
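A hedged PyTorch sketch of the hallucination idea (all module names and sizes assumed): at training time the classifier sees both streams while the hallucination branch is regressed onto the sensor representation; at test time the hallucinated features stand in for the missing sensor stream.

import torch
import torch.nn as nn

feat_dim, sens_dim, hid, n_actions = 512, 64, 128, 10
video_enc = nn.LSTM(feat_dim, hid, batch_first=True)
sensor_enc = nn.LSTM(sens_dim, hid, batch_first=True)
halluc = nn.LSTM(feat_dim, hid, batch_first=True)  # video in, "sensor" out
classifier = nn.Linear(2 * hid, n_actions)

video = torch.randn(4, 50, feat_dim)
sensors = torch.randn(4, 50, sens_dim)

v, _ = video_enc(video)
s, _ = sensor_enc(sensors)
h, _ = halluc(video)

# Train-time path: classify from both real streams, while the
# hallucination branch learns to mimic the sensor representation.
logits = classifier(torch.cat([v, s], dim=-1)[:, -1])
halluc_loss = nn.functional.mse_loss(h, s.detach())

# Test-time path: no glove is needed; h substitutes for s.
test_logits = classifier(torch.cat([v, h], dim=-1)[:, -1])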