
    Learning and Transfer of Modulated Locomotor Controllers

    We study a novel architecture and training procedure for locomotion tasks. A high-frequency, low-level "spinal" network with access to proprioceptive sensors learns sensorimotor primitives by training on simple tasks. This pre-trained module is fixed and connected to a low-frequency, high-level "cortical" network, with access to all sensors, which drives behavior by modulating the inputs to the spinal network. Where a monolithic end-to-end architecture fails completely, learning with a pre-trained spinal module succeeds at multiple high-level tasks, and enables the effective exploration required to learn from sparse rewards. We test our proposed architecture on three simulated bodies: a 16-dimensional swimming snake, a 20-dimensional quadruped, and a 54-dimensional humanoid. Our results are illustrated in the supplemental video at https://youtu.be/sboPYvhpraQ.
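    As a rough sketch of the control structure described above (assumed shapes and a 10:1 frequency ratio for illustration; this is not the authors' code), the low-frequency cortical policy emits a modulation vector that is concatenated with proprioception and fed to the frozen high-frequency spinal policy:

        import numpy as np

        rng = np.random.default_rng(0)

        def mlp(sizes):
            """Random-weight MLP standing in for a trained policy network."""
            return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
                    for m, n in zip(sizes[:-1], sizes[1:])]

        def forward(params, x):
            for i, (W, b) in enumerate(params):
                x = x @ W + b
                if i < len(params) - 1:
                    x = np.tanh(x)
            return x

        PROPRIO, ALL_SENSORS, MODULATION, ACTIONS = 20, 54, 8, 20  # illustrative
        spinal = mlp([PROPRIO + MODULATION, 64, ACTIONS])  # pre-trained, then frozen
        cortical = mlp([ALL_SENSORS, 64, MODULATION])      # trained on high-level tasks

        modulation = np.zeros(MODULATION)
        for t in range(100):                    # high-frequency control loop
            if t % 10 == 0:                     # cortical net runs 10x more slowly
                modulation = forward(cortical, rng.standard_normal(ALL_SENSORS))
            proprio = rng.standard_normal(PROPRIO)  # dummy proprioceptive input
            action = forward(spinal, np.concatenate([proprio, modulation]))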

    Towards Usable End-user Authentication

    Authentication is the process of validating the identity of an entity, e.g., a person or a machine; the entity usually provides a proof of identity in order to be authenticated. When the entity to be authenticated is a human, the process is called end-user authentication. Making end-user authentication usable entails making it easy for a human to obtain, manage, and input the proof of identity in a secure manner. In machine-to-machine authentication, both ends have comparable memory and computational power to securely carry out the authentication process using cryptographic primitives and protocols. By contrast, since a human has limited memory and computational power, cryptography is of little use in end-user authentication. Although password-based end-user authentication has many well-known security and usability problems, it remains the de facto standard. Almost half a century of research effort has produced a multitude of end-user authentication methods more sophisticated than passwords; yet none has come close to replacing them. In this dissertation, taking advantage of the built-in sensing capability of smartphones, we propose an end-user authentication framework for smartphones, called ePet, which does not require any active participation from the user most of the time and is therefore highly usable. Using data collected from subjects, we validate a part of the authentication framework for the Android platform. For web authentication, we propose a novel password creation interface that helps a user remember a newly created password with more confidence by allowing her to perform various memory tasks built upon the new password. Declarative and motor memory help the user remember and efficiently input a password. A within-subjects study shows that declarative memory is sufficient for passwords; motor memory mostly facilitates the input process, so the memory tasks were designed to help cement the declarative memory for a newly created password. The dissertation concludes with an evaluation of the increased usability of the proposed interface through a between-subjects study.

    Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis

    Building general-purpose robots that can operate seamlessly in any environment, with any object, and using various skills to complete diverse tasks has been a long-standing goal in Artificial Intelligence. However, most existing robotic systems are constrained: designed for specific tasks, trained on specific datasets, and deployed within specific environments. These systems usually require extensively labeled data, rely on task-specific models, suffer numerous generalization issues when deployed in real-world scenarios, and struggle to remain robust to distribution shifts. Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models (i.e., foundation models) in research fields such as Natural Language Processing (NLP) and Computer Vision (CV), we devote this survey to exploring (i) how existing foundation models from NLP and CV can be applied to robotics, and (ii) what a robotics-specific foundation model would look like. We begin by providing an overview of what constitutes a conventional robotic system and the fundamental barriers to making it universally applicable. Next, we establish a taxonomy to discuss current work that leverages existing foundation models for robotics or develops ones catered to robotics. Finally, we discuss key challenges and promising future directions in using foundation models to enable general-purpose robotic systems. We encourage readers to view our living GitHub repository of resources, including papers reviewed in this survey as well as related projects and repositories for developing foundation models for robotics.

    The emergence of active perception - seeking conceptual foundations

    The aim of this thesis is to explain the emergence of active perception. It takes an interdisciplinary approach, providing the necessary conceptual foundations for active perception research: the key notions that bridge the conceptual gaps remaining in understanding emergent behaviours of active perception in the context of robotic implementations. On the one hand, the autonomous agent approach to mobile robotics claims that perception is active. On the other hand, while explanations of emergence have been extensively pursued in Artificial Life, these explanations have not yet successfully accounted for active perception.

    The main question dealt with in this thesis is how active perception systems, as behaviour-based autonomous systems, are capable of providing relatively optimal perceptual guidance in response to environmental challenges, which are somewhat unpredictable. The answer is: task-level emergence on the grounds of intricately combined computational strategies, but this notion needs further explanation.

    To study the computational strategies undertaken in active perception research, the thesis surveys twelve implementations. On the basis of the surveyed implementations, discussions in this thesis show that the perceptual task executed in support of bodily actions does not arise from the intentionality of a homunculus, but is identified automatically on the basis of the dynamic small modules of particular robotic architectures. The identified tasks are accomplished by quasi-functional modules and quasi-action modules, which maintain transformations of perceptual inputs, compute critical variables, and guide sensory-motor movements to the most relevant positions for fetching further needed information. Given the nature of these modules, active perception emerges in a different fashion from the global behaviour seen in other autonomous agent research.

    The quasi-functional modules and quasi-action modules cooperate by estimating the internal cohesion of various sources of information in support of the envisaged task. Specifically, such modules reflect various computational facilities by which a species singles out the most important characteristics of its ecological niche. These facilities help to achieve internal cohesion by maintaining a stepwise evaluation over the previously computed information, the required task, and the most relevant features presented in the environment.

    Apart from the above exposition of active perception, the process of task-level emergence is understood through certain principles extracted from four models of the origin of life. First, the fundamental structure of active perception is identified as stepwise computation. Second, stepwise computation is promoted from baseline to elaborate patterns, i.e. from a simple system to a combinatory system. Third, a core requirement for all stepwise computational processes is the comparison between collected and needed information, in order to ensure the contribution to the required task. Interestingly, this point indicates that active perception has an inherent pragmatist dimension.

    The understanding of emergence in the present thesis goes beyond the distinction between external processes and internal representations, which some current philosophers argue is required to explain emergence. The additional factors are links between various knowledge sources, in which the role of conceptual foundations is two-fold. On the one hand, those conceptual foundations elucidate how various knowledge sources can be linked. On the other, they make possible an interdisciplinary view of emergence. Given this two-fold role, this thesis shows the unity of task-level emergence. Thus, the thesis demonstrates a cooperation between science and philosophy for the purpose of understanding the integrity of emergent cognitive phenomena.

    Grounded Semantic Reasoning for Robotic Interaction with Real-World Objects

    Robots are increasingly transitioning from specialized, single-task machines to general-purpose systems that operate in unstructured environments, such as homes, offices, and warehouses. In these real-world domains, robots need to manipulate novel objects while adapting to changes in environments and goals. Semantic knowledge, which concisely describes target domains with symbols, can potentially reveal the meaningful patterns shared between problems and environments. However, existing robots are yet to effectively reason about semantic data encoding complex relational knowledge or jointly reason about symbolic semantic data and multimodal data pertinent to robotic manipulation (e.g., object point clouds, 6-DoF poses, and attributes detected with multimodal sensing). This dissertation develops semantic reasoning frameworks capable of modeling complex semantic knowledge grounded in robot perception and action. We show that grounded semantic reasoning enables robots to more effectively perceive, model, and interact with objects in real-world environments. Specifically, this dissertation makes the following contributions: (1) a survey providing a unified view for the diversity of works in the field by formulating semantic reasoning as the integration of knowledge sources, computational frameworks, and world representations; (2) a method for predicting missing relations in large-scale knowledge graphs by leveraging type hierarchies of entities, effectively avoiding ambiguity while maintaining generalization of multi-hop reasoning patterns; (3) a method for predicting unknown properties of objects in various environmental contexts, outperforming prior knowledge graph and statistical relational learning methods due to the use of n-ary relations for modeling object properties; (4) a method for purposeful robotic grasping that accounts for a broad range of contexts (including object visual affordance, material, state, and task constraint), outperforming existing approaches in novel contexts and for unknown objects; (5) a systematic investigation into the generalization of task-oriented grasping that includes a benchmark dataset of 250k grasps, and a novel graph neural network that incorporates semantic relations into end-to-end learning of 6-DoF grasps; (6) a method for rearranging novel objects into semantically meaningful spatial structures based on high-level language instructions, more effectively capturing multi-object spatial constraints than existing pairwise spatial representations; (7) a novel planning-inspired approach that iteratively optimizes placements of partially observed objects subject to both physical constraints and semantic constraints inferred from language instructions.
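    The knowledge-graph contributions above build on embedding-based link prediction. As a rough illustration of that underlying task (a generic TransE-style scorer over made-up entities and relations, not the dissertation's type-hierarchy or n-ary models):

        import numpy as np

        rng = np.random.default_rng(0)
        # Random vectors stand in for trained embeddings.
        entities = {e: rng.standard_normal(16) for e in ["mug", "cup", "container"]}
        relations = {r: rng.standard_normal(16) for r in ["subclass_of", "used_for"]}

        def score(head, rel, tail):
            """TransE plausibility: a small ||h + r - t|| means a likely triple."""
            return -np.linalg.norm(entities[head] + relations[rel] - entities[tail])

        # Rank candidate tails for the query (mug, subclass_of, ?).
        ranked = sorted(entities, key=lambda t: -score("mug", "subclass_of", t))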

    Action-oriented Scene Understanding

    In order to allow robots to act autonomously, it is crucial that they not only describe their environment accurately but also identify how to interact with their surroundings. While we have witnessed tremendous progress in descriptive computer vision, approaches that explicitly target action are scarcer. This cumulative dissertation approaches the goal of interpreting visual scenes “in the wild” with respect to the actions implied by the scene. We call this approach action-oriented scene understanding. It involves identifying and judging opportunities for interaction with constituents of the scene (e.g. objects and their parts), as well as understanding object functions and how interactions will impact the future. All of these aspects are addressed on three levels of abstraction: elements, perception, and reasoning. On the elementary level, we investigate semantic and functional grouping of objects by analyzing annotated natural image scenes. We compare object label-based and visual context definitions with respect to their suitability for generating meaningful object class representations. Our findings suggest that representations generated from visual context are on par in terms of semantic quality with those generated from large quantities of text. The perceptive level concerns action identification. We propose a system to identify possible interactions with the environment (affordances) for robots and humans at the pixel level using state-of-the-art machine learning methods. Pixel-wise part annotations of images are transformed into 12 affordance maps. Using these maps, a convolutional neural network is trained to densely predict affordance maps from unknown RGB images. In contrast to previous work, this approach operates exclusively on RGB images during both training and testing, and yet achieves state-of-the-art performance. At the reasoning level, we extend the question from which actions are possible to which actions are plausible. For this, we gathered a dataset of household images associated with human ratings of the likelihoods of eight different actions. Based on the judgements provided by the human raters, we train convolutional neural networks to generate plausibility scores from unseen images. Furthermore, having considered only static scenes previously in this thesis, we propose a system that takes video input and predicts plausible future actions. Since this requires careful identification of relevant features in the video sequence, we analyze this particular aspect in detail using a synthetic dataset for several state-of-the-art video models. We identify feature learning as a major obstacle for anticipation in natural video data. The presented projects analyze the role of action in scene understanding from various angles and in multiple settings while highlighting the advantages of assuming an action-oriented perspective. We conclude that action-oriented scene understanding can augment classic computer vision in many real-life applications, in particular robotics.
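    A minimal sketch of the dense affordance-prediction setup described above (a deliberately tiny fully-convolutional network with illustrative channel widths; not the dissertation's model):

        import torch
        import torch.nn as nn

        N_AFFORDANCES = 12  # one output map per affordance class

        # Tiny fully-convolutional network: RGB in, per-pixel logits out.
        model = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, N_AFFORDANCES, kernel_size=1),
        )

        rgb = torch.rand(1, 3, 240, 320)    # dummy RGB image (N, C, H, W)
        maps = torch.sigmoid(model(rgb))    # (1, 12, 240, 320) affordance maps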

    Haptic training for a visuomotor fetch & pursue task

    Can haptic interaction improve tracking performance in a fetch & pursue task, similar to clay pigeon shooting? To answer this question, we challenged the subjects' tracking movements with a saddle-like moving force field, with the unstable manifold aligned along the moving target and the stable manifold orthogonal to it. The experimental results show a positive effect, suggesting that the internal model the subjects acquired to compensate for the target-linked haptic disturbance can improve their prediction capability even under pure visuomotor feedback.
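    A minimal sketch of such a saddle-like field as described above (assumed 2D geometry and illustrative gains; not the experimental apparatus code): displacements along the target's direction of motion are amplified (unstable manifold), while orthogonal displacements are pulled back toward the target (stable manifold):

        import numpy as np

        K_UNSTABLE, K_STABLE = 50.0, 50.0   # illustrative stiffness gains (N/m)

        def saddle_force(hand_pos, target_pos, target_vel):
            u = target_vel / np.linalg.norm(target_vel)  # unstable axis: along motion
            n = np.array([-u[1], u[0]])                  # stable axis: orthogonal
            d = hand_pos - target_pos                    # hand displacement from target
            return K_UNSTABLE * (d @ u) * u - K_STABLE * (d @ n) * n

        f = saddle_force(np.array([0.1, 0.2]), np.zeros(2), np.array([1.0, 0.0]))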

    Xylo-Bot: A Therapeutic Robot-Based Music Platform for Children with Autism

    Children with Autism Spectrum Disorder (ASD) experience deficits in verbal and nonverbal communication skills, including motor control, emotional facial expressions, and eye gaze / joint attention. This Ph.D. dissertation studies the feasibility and effectiveness of using a social robot, NAO, and a toy music instrument, a xylophone, to model and improve the social responses and behaviors of children with ASD. To this end, we designed an autonomous, socially interactive music teaching system. A novel modular robot-music teaching system consisting of three modules is presented. Module 1 provides an autonomous self-aware positioning system that lets the robot localize the instrument and make micro-adjustments to its arm joints to strike the note bars properly. Module 2 allows the robot to play any customized song on the user's request. This design makes it possible to transcribe songs into C major or A minor as a set of hexadecimal numbers, without requiring musical experience; once a score is converted, the robot can play it immediately. Module 3 provides a real-life music teaching experience for users. Its two key features are a) music detection and b) smart scoring and feedback. The short-time Fourier transform and the Levenshtein distance are adopted to meet the design requirements, allowing the robot to understand music and provide an appropriate dosage of practice and verbal feedback to users. Because of the limitations of the original xylophone, a new instrument was designed to convey musical emotion better. This new programmable xylophone provides a wider frequency range of notes, switches easily between major and minor keys, and is easy to control, making it a more engaging, advanced instrument. Because our initial intention was to study emotion in children with autism, we also developed an automated method for emotion classification in children using electrodermal activity (EDA) signals. Time-frequency analysis of the acquired raw EDA signals provides a feature space in which different emotions can be recognized. To this end, the complex Morlet (C-Morlet) wavelet function is applied to the recorded EDA signals. The dataset used in this research includes a set of multimodal recordings of social and communicative behavior, as well as EDA recordings, of 100 children younger than 30 months old. The dataset was annotated by two experts to extract the time sequences corresponding to three primary emotions: “Joy”, “Boredom”, and “Acceptance”. Various experiments were conducted on the annotated EDA signals to classify emotions using a support vector machine (SVM) classifier. The quantitative results show that emotion classification performance improves markedly over other methods when the proposed wavelet-based features are used. Using this emotion classifier, engagement during sessions and emotional responses to different pieces of music can be detected after data analysis. The NAO music education platform can therefore be regarded as a useful tool for improving fine motor control, turn-taking skills, and engagement in social activities. Most of the children with ASD began to develop the striking movement within the first two intervention sessions; some even mastered the motor skill during these early sessions. More than half of the subjects demonstrated proper turn-taking after a few sessions.
    Music teaching lends itself to social-skill tasks because it can take advantage of customized songs selected by each individual. According to the researchers and video annotators, the majority of subjects showed a high level of engagement in all music game activities, especially the free-play mode. Based on their conversation and musical performance with NAO, subjects showed strong interest in challenging the robot in a friendly way.
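    Module 3's smart scoring compares the detected note sequence with the target melody. A minimal sketch of such Levenshtein-based scoring (the note names and score normalization are illustrative; note detection via the short-time Fourier transform is not shown):

        def levenshtein(a, b):
            """Classic dynamic-programming edit distance between two sequences."""
            prev = list(range(len(b) + 1))
            for i, x in enumerate(a, 1):
                cur = [i]
                for j, y in enumerate(b, 1):
                    cur.append(min(prev[j] + 1,              # deletion
                                   cur[j - 1] + 1,           # insertion
                                   prev[j - 1] + (x != y)))  # substitution
                prev = cur
            return prev[-1]

        target = ["C", "D", "E", "C", "E", "G"]
        played = ["C", "D", "E", "E", "G"]  # as detected from audio
        score = 1 - levenshtein(played, target) / max(len(target), len(played))
        print(f"performance score: {score:.2f}")  # 0.83: one missed note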