    Robots that Say ‘No’. Affective Symbol Grounding and the Case of Intent Interpretations

    © 2017 IEEE. This article has been accepted for publication in a forthcoming issue of IEEE Transactions on Cognitive and Developmental Systems. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Modern theories on early child language acquisition tend to focus on referential words, mostly nouns, labeling concrete objects, or physical properties. In this experimental proof-of-concept study, we show how nonreferential negation words, typically belonging to a child's first ten words, may be acquired. A child-like humanoid robot is deployed in speech-wise unconstrained interaction with naïve human participants. In agreement with psycholinguistic observations, we corroborate the hypothesis that affect plays a pivotal role in the socially distributed acquisition process where the adept conversation partner provides linguistic interpretations of the affective displays of the less adept speaker. Negation words are prosodically salient within intent interpretations that are triggered by the learner's display of affect. From there they can be picked up and used by the budding language learner, which may involve the grounding of these words in the very affective states that triggered them in the first place. The pragmatic analysis of the robot's linguistic performance indicates that the correct timing of negative utterances is essential for the listener to infer the meaning of otherwise ambiguous negative utterances. In order to assess the robot's performance thoroughly, comparative data from psycholinguistic studies of parent-child dyads is needed, highlighting the need for further interdisciplinary work. Peer reviewed

    Language Embedded Radiance Fields for Zero-Shot Task-Oriented Grasping

    Grasping objects by a specific part is often crucial for safety and for executing downstream tasks. Yet, learning-based grasp planners lack this behavior unless they are trained on specific object part data, making it a significant challenge to scale object diversity. Instead, we propose LERF-TOGO, Language Embedded Radiance Fields for Task-Oriented Grasping of Objects, which uses vision-language models zero-shot to output a grasp distribution over an object given a natural language query. To accomplish this, we first reconstruct a LERF of the scene, which distills CLIP embeddings into a multi-scale 3D language field queryable with text. However, LERF has no sense of objectness, meaning its relevancy outputs often return incomplete activations over an object, which are insufficient for subsequent part queries. LERF-TOGO mitigates this lack of spatial grouping by extracting a 3D object mask via DINO features and then conditionally querying LERF on this mask to obtain a semantic distribution over the object with which to rank grasps from an off-the-shelf grasp planner. We evaluate LERF-TOGO's ability to grasp task-oriented object parts on 31 different physical objects, and find it selects grasps on the correct part in 81% of all trials and grasps successfully in 69%. See the project website at: lerftogo.github.io
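
    The final ranking step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the nearest-point lookup, and the rule of multiplying semantic weight by planner confidence are all assumptions made for illustration, and the inputs stand in for the outputs of a language query and a grasp planner.

```python
import math

def rank_grasps(points, relevancy, grasps):
    """Rank candidate grasps by the semantic relevancy of the nearest object point.

    points:    (x, y, z) points on the masked object
    relevancy: non-negative scores, one per point (hypothetical language-query output)
    grasps:    (name, (x, y, z), planner_score) tuples (hypothetical planner output)
    """
    total = sum(relevancy)
    # Normalise the relevancy scores into a distribution over object points.
    dist = [r / total for r in relevancy]
    ranked = []
    for name, position, planner_score in grasps:
        # Find the object point closest to the grasp position.
        nearest = min(range(len(points)),
                      key=lambda i: math.dist(points[i], position))
        # Assumed combination rule: semantic weight times planner confidence.
        ranked.append((dist[nearest] * planner_score, name))
    ranked.sort(reverse=True)
    return [name for _, name in ranked]

# Toy object: three points along one axis; the query favours the far end.
points = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.1), (0.0, 0.0, 0.2)]
relevancy = [0.1, 0.2, 0.7]
grasps = [("blade", (0.0, 0.01, 0.0), 0.9),
          ("handle", (0.0, 0.01, 0.2), 0.8)]
print(rank_grasps(points, relevancy, grasps))
```

    In this toy case the grasp near the high-relevancy region is ranked first even though the planner scored it slightly lower, which is the behaviour the abstract attributes to ranking by a semantic distribution.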

    From Biological to Synthetic Neurorobotics Approaches to Understanding the Structure Essential to Consciousness (Part 3)

    This third paper locates the synthetic neurorobotics research reviewed in the second paper in terms of themes introduced in the first paper. It begins with biological non-reductionism as understood by Searle. It emphasizes the role of synthetic neurorobotics studies in accessing the dynamic structure essential to consciousness with a focus on system criticality and self, develops a distinction between simulated and formal consciousness based on this emphasis, reviews Tani and colleagues' work in light of this distinction, and ends by forecasting the increasing importance of synthetic neurorobotics studies for cognitive science and philosophy of mind going forward, finally in regards to most- and myth-consciousness

    An investigation of fast and slow mapping

    Children learn words astonishingly skilfully. Even infants can reliably “fast map” novel category labels to their referents without feedback or supervision (Carey & Bartlett, 1978; Houston-Price, Plunkett, & Harris, 2005). Using both empirical and neural network modelling methods, this thesis presents an examination of both the fast and slow mapping phases of children's early word learning in the context of object and action categorisation. A series of empirical experiments investigates the effect of within-category perceptual variability on two-year-old children’s ability to learn labels for novel categories of objects and actions. Results demonstrate that variability profoundly affects both noun and verb learning. A review paper situates empirical word learning research in the context of recent advances in the application of computational models to developmental research. Data from the noun experiments are then simulated using a Dynamic Neural Field (DNF) model (see Spencer & Schöner, 2009), suggesting that children’s early object categories can emerge dynamically from simple label-referent associations strengthened over time. Novel predictions generated by the model are replicated empirically, providing proof-of-concept for the use of DNF models in simulations of word learning, as well as emphasising the strong featural basis of early categorisation. The noun data are further explored using a connectionist architecture (Morse, de Greef, Belpaeme & Cangelosi, 2010) in a robotic system, providing the groundwork for future research in cognitive robotics. The implications of these different approaches to cognitive modelling are discussed, situating the current work firmly in the dynamic systems tradition whilst emphasising the value of interdisciplinary research in motivating novel research paradigms.

    Uni-Fusion: Universal Continuous Mapping

    We present Uni-Fusion, a universal continuous mapping framework for surfaces, surface properties (color, infrared, etc.) and more (latent features in CLIP embedding space, etc.). We propose the first universal implicit encoding model that supports encoding of both geometry and different types of properties (RGB, infrared, features, etc.) without requiring any training. Based on this, our framework divides the point cloud into regular grid voxels and generates a latent feature in each voxel to form a Latent Implicit Map (LIM) for geometries and arbitrary properties. Then, by fusing a local LIM frame by frame into a global LIM, an incremental reconstruction is achieved. Encoded with corresponding types of data, our Latent Implicit Map is capable of generating continuous surfaces, surface property fields, surface feature fields, and all other possible options. To demonstrate the capabilities of our model, we implement three applications: (1) incremental reconstruction for surfaces and color; (2) 2D-to-3D transfer of fabricated properties; (3) open-vocabulary scene understanding by creating a text CLIP feature field on surfaces. We evaluate Uni-Fusion by comparing it in corresponding applications, from which Uni-Fusion shows high flexibility in various applications while performing best or being competitive. The project page of Uni-Fusion is available at https://jarrome.github.io/Uni-Fusion/ . Comment: Published in IEEE Transactions on Robotics.
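
    The voxel-wise bookkeeping behind the local-to-global fusion described in this abstract can be sketched as follows. This is only an illustration under strong assumptions: the per-voxel "latent feature" is replaced here by a plain mean of a scalar property, and fusion by a count-weighted average, neither of which is claimed to be the paper's encoder or fusion rule. The function names `voxelize` and `fuse` are hypothetical.

```python
import math
from collections import defaultdict

def voxelize(points, values, voxel_size):
    """Group points into regular grid voxels; keep (mean value, count) per voxel.

    The mean is a stand-in for a per-voxel latent feature (an assumption).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (x, y, z), v in zip(points, values):
        # Integer voxel index of the point along each axis.
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        sums[key] += v
        counts[key] += 1
    return {k: (sums[k] / counts[k], counts[k]) for k in sums}

def fuse(global_map, local_map):
    """Fuse a local map into the global one with a count-weighted average."""
    fused = dict(global_map)
    for key, (mean, n) in local_map.items():
        if key in fused:
            g_mean, g_n = fused[key]
            fused[key] = ((g_mean * g_n + mean * n) / (g_n + n), g_n + n)
        else:
            fused[key] = (mean, n)
    return fused

# Two toy frames observing a scalar property (for example, an intensity value).
frame1 = voxelize([(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.7, 0.1, 0.1)],
                  [1.0, 3.0, 5.0], voxel_size=0.5)
frame2 = voxelize([(0.3, 0.3, 0.3)], [8.0], voxel_size=0.5)
global_map = fuse(fuse({}, frame1), frame2)
print(global_map)
```

    Keeping a count next to each voxel value is one simple way to make the fusion incremental: each new frame updates only the voxels it touches, and earlier frames never need to be revisited.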

    A Pooling Approach to Modelling Spatial Relations for Image Retrieval and Annotation

    Over the last two decades we have witnessed strong progress on modeling visual object classes, scenes and attributes, which has significantly contributed to automated image understanding. On the other hand, surprisingly little progress has been made on incorporating a spatial representation and reasoning in the inference process. In this work, we propose a pooling interpretation of spatial relations and show how it improves image retrieval and annotation tasks involving spatial language. Due to the complexity of spatial language, we argue for a learning-based approach that acquires a representation of spatial relations by learning parameters of the pooling operator. We show improvements on previous work on two datasets and two different tasks, and provide additional insights on a new dataset with an explicit focus on spatial relations.

    Applications of Large Scale Foundation Models for Autonomous Driving

    Since the DARPA Grand Challenges (rural) in 2004/05 and the Urban Challenge in 2007, autonomous driving has been the most active field of AI applications. Recently, chat systems powered by large language models (LLMs), such as ChatGPT and PaLM, have emerged and rapidly become a promising direction for achieving artificial general intelligence (AGI) in natural language processing (NLP). A natural next step is to employ these abilities to reformulate autonomous driving. By combining LLMs with foundation models, it is possible to utilize human knowledge, commonsense and reasoning to rebuild autonomous driving systems and move past the current long-tailed AI dilemma. In this paper, we investigate the techniques of foundation models and LLMs applied to autonomous driving, categorized as simulation, world model, data annotation, and planning or E2E solutions, etc. Comment: 23 pages. A survey paper

    Pragmatic Frames for Teaching and Learning in Human-Robot interaction: Review and Challenges

    Vollmer A-L, Wrede B, Rohlfing KJ, Oudeyer P-Y. Pragmatic Frames for Teaching and Learning in Human-Robot interaction: Review and Challenges. FRONTIERS IN NEUROROBOTICS. 2016;10: 10. One of the big challenges in robotics today is to learn from human users who are inexperienced in interacting with robots yet are often used to teaching skills flexibly to other humans and to children in particular. A potential route toward natural and efficient learning and teaching in Human-Robot Interaction (HRI) is to leverage the social competences of humans and the underlying interactional mechanisms. In this perspective, this article discusses the importance of pragmatic frames as flexible interaction protocols that provide important contextual cues to enable learners to infer new action or language skills and teachers to convey these cues. After defining and discussing the concept of pragmatic frames, grounded in decades of research in developmental psychology, we study a selection of HRI work in the literature which has focused on learning-teaching interaction and analyze the interactional and learning mechanisms that were used in the light of pragmatic frames. This allows us to show that many of the works have already used in practice, but not always explicitly, basic elements of the pragmatic frames machinery. However, we also show that pragmatic frames have so far been used in a very restricted way as compared to how they are used in human-human interaction and argue that this has been an obstacle preventing robust natural multi-task learning and teaching in HRI. In particular, we explain that two central features of human pragmatic frames, mostly absent from existing HRI studies, are that (1) social peers use rich repertoires of frames, potentially combined together, to convey and infer multiple kinds of cues; (2) new frames can be learnt continually, building on existing ones, and guiding the interaction toward higher levels of complexity and expressivity. To conclude, we give an outlook on the future research direction describing the relevant key challenges that need to be solved for leveraging pragmatic frames for robot learning and teaching.
