2,717 research outputs found
Robots that Say "No". Affective Symbol Grounding and the Case of Intent Interpretations
© 2017 IEEE. This article has been accepted for publication in a forthcoming issue of IEEE Transactions on Cognitive and Developmental Systems. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Modern theories on early child language acquisition tend to focus on referential words, mostly nouns, labeling concrete objects or physical properties. In this experimental proof-of-concept study, we show how nonreferential negation words, typically belonging to a child's first ten words, may be acquired. A child-like humanoid robot is deployed in speech-wise unconstrained interaction with naïve human participants. In agreement with psycholinguistic observations, we corroborate the hypothesis that affect plays a pivotal role in the socially distributed acquisition process, where the adept conversation partner provides linguistic interpretations of the affective displays of the less adept speaker. Negation words are prosodically salient within intent interpretations that are triggered by the learner's display of affect. From there, they can be picked up and used by the budding language learner, which may involve grounding these words in the very affective states that triggered them in the first place. The pragmatic analysis of the robot's linguistic performance indicates that the correct timing of negative utterances is essential for the listener to infer the meaning of otherwise ambiguous negative utterances. To assess the robot's performance thoroughly, comparative data from psycholinguistic studies of parent-child dyads are needed, highlighting the need for further interdisciplinary work.
Peer reviewed
Language Embedded Radiance Fields for Zero-Shot Task-Oriented Grasping
Grasping objects by a specific part is often crucial for safety and for
executing downstream tasks. Yet, learning-based grasp planners lack this
behavior unless they are trained on specific object part data, making it a
significant challenge to scale object diversity. Instead, we propose LERF-TOGO,
Language Embedded Radiance Fields for Task-Oriented Grasping of Objects, which
uses vision-language models zero-shot to output a grasp distribution over an
object given a natural language query. To accomplish this, we first reconstruct
a LERF of the scene, which distills CLIP embeddings into a multi-scale 3D
language field queryable with text. However, LERF has no sense of objectness,
meaning its relevancy outputs often return incomplete activations over an
object which are insufficient for subsequent part queries. LERF-TOGO mitigates
this lack of spatial grouping by extracting a 3D object mask via DINO features
and then conditionally querying LERF on this mask to obtain a semantic
distribution over the object with which to rank grasps from an off-the-shelf
grasp planner. We evaluate LERF-TOGO's ability to grasp task-oriented object
parts on 31 different physical objects, and find it selects grasps on the
correct part in 81% of all trials and grasps successfully in 69%. See the
project website at: lerftogo.github.io
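The ranking step described in this abstract (restrict the language relevancy to an object mask, then use it to rank grasps from an off-the-shelf planner) can be illustrated with a small sketch. This is not the authors' implementation: the `Grasp` type, the `rank_grasps` function, the neighbourhood `radius`, and the choice to multiply the planner score by the mean masked relevancy are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class Grasp:
    """Hypothetical grasp candidate as a planner might return it."""
    name: str
    position: Point
    planner_score: float  # planner confidence, assumed to lie in [0, 1]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_grasps(
    grasps: Sequence[Grasp],
    points: Sequence[Point],
    relevancy: Sequence[float],
    object_mask: Sequence[bool],
    radius: float = 0.05,
) -> List[Grasp]:
    """Sort grasps by planner score times mean relevancy of nearby masked points.

    Points outside the object mask are ignored, so activations on the
    background cannot raise a grasp's score.
    """
    scored = []
    for grasp in grasps:
        nearby = [
            r
            for p, r, inside in zip(points, relevancy, object_mask)
            if inside and squared_distance(p, grasp.position) <= radius ** 2
        ]
        semantic = sum(nearby) / len(nearby) if nearby else 0.0
        scored.append((grasp.planner_score * semantic, grasp))
    # Sort on the score only, highest first; ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [grasp for _, grasp in scored]
```

With a query such as "handle", a grasp near high-relevancy masked points can outrank a grasp that the planner alone prefers, which is the behaviour the abstract attributes to conditioning on the object mask.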
From Biological to Synthetic Neurorobotics Approaches to Understanding the Structure Essential to Consciousness (Part 3)
This third paper locates the synthetic neurorobotics research reviewed in the second paper in terms of themes introduced in the first paper. It begins with biological non-reductionism as understood by Searle. It emphasizes the role of synthetic neurorobotics studies in accessing the dynamic structure essential to consciousness with a focus on system criticality and self, develops a distinction between simulated and formal consciousness based on this emphasis, reviews Tani and colleagues' work in light of this distinction, and ends by forecasting the increasing importance of synthetic neurorobotics studies for cognitive science and philosophy of mind going forward, finally with regard to most- and myth-consciousness.
An investigation of fast and slow mapping
Children learn words astonishingly skilfully. Even infants can reliably "fast map"
novel category labels to their referents without feedback or supervision (Carey &
Bartlett, 1978; Houston-Price, Plunkett, & Harris, 2005). Using both empirical and
neural network modelling methods this thesis presents an examination of both the fast
and slow mapping phases of children's early word learning in the context of object and
action categorisation. A series of empirical experiments investigates the effect
of within-category perceptual variability on two-year-old children's ability to
learn labels for novel categories of objects and actions. Results demonstrate that
variability profoundly affects both noun and verb learning.
A review paper situates empirical word learning research in the context of recent
advances in the application of computational models to developmental research. Data
from the noun experiments are then simulated using a Dynamic Neural Field (DNF)
model (see Spencer & Schöner, 2009), suggesting that children's early object categories
can emerge dynamically from simple label-referent associations strengthened over time.
Novel predictions generated by the model are replicated empirically, providing
proof-of-concept for the use of DNF models in simulations of word learning, as well
as emphasising the strong featural basis of early categorisation.
The noun data are further explored using a connectionist architecture (Morse, de
Greef, Belpaeme & Cangelosi, 2010) in a robotic system, providing the groundwork for
future research in cognitive robotics. The implications of these different approaches to
cognitive modelling are discussed, situating the current work firmly in the dynamic
systems tradition whilst emphasising the value of interdisciplinary research in
motivating novel research paradigms.
Uni-Fusion: Universal Continuous Mapping
We present Uni-Fusion, a universal continuous mapping framework for surfaces,
surface properties (color, infrared, etc.) and more (latent features in CLIP
embedding space, etc.). We propose the first universal implicit encoding model
that supports encoding of both geometry and different types of properties (RGB,
infrared, features, etc.) without requiring any training. Based on this, our
framework divides the point cloud into regular grid voxels and generates a
latent feature in each voxel to form a Latent Implicit Map (LIM) for geometries
and arbitrary properties. Then, by fusing a local LIM frame by frame into a
global LIM, an incremental reconstruction is achieved. Encoded with
corresponding types of data, our Latent Implicit Map is capable of generating
continuous surfaces, surface property fields, surface feature fields, and all
other possible options. To demonstrate the capabilities of our model, we
implement three applications: (1) incremental reconstruction for surfaces and
color; (2) 2D-to-3D transfer of fabricated properties; (3) open-vocabulary scene
understanding by creating a text CLIP feature field on surfaces. We evaluate
Uni-Fusion by comparing it in corresponding applications, from which Uni-Fusion
shows high flexibility in various applications while performing best or being
competitive. The project page of Uni-Fusion is available at
https://jarrome.github.io/Uni-Fusion/.
Comment: Published in IEEE Transactions on Robotics.
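The voxel-and-fuse structure described in this abstract (split each frame's point cloud into regular grid voxels, store one feature per voxel, then merge the local map into a global one) can be sketched as follows. This is not the Uni-Fusion encoder: the per-voxel "feature" here is simply the centroid of the points in the voxel, and the fusion rule is a count-weighted average. Both are stand-ins chosen for illustration, and the names `voxel_key`, `encode_frame`, and `fuse` are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]
VoxelKey = Tuple[int, int, int]
# Each voxel stores (mean vector, number of points that contributed to it).
VoxelMap = Dict[VoxelKey, Tuple[List[float], int]]


def voxel_key(point: Point, voxel_size: float) -> VoxelKey:
    """Integer grid index of the voxel containing the point."""
    return tuple(math.floor(c / voxel_size) for c in point)


def encode_frame(points: Iterable[Point], voxel_size: float) -> VoxelMap:
    """Build a local map: one centroid (stand-in feature) per occupied voxel."""
    sums: Dict[VoxelKey, List[float]] = {}
    counts: Dict[VoxelKey, int] = {}
    for p in points:
        key = voxel_key(p, voxel_size)
        acc = sums.setdefault(key, [0.0, 0.0, 0.0])
        for i, c in enumerate(p):
            acc[i] += c
        counts[key] = counts.get(key, 0) + 1
    return {k: ([s / counts[k] for s in sums[k]], counts[k]) for k in sums}


def fuse(global_map: VoxelMap, local_map: VoxelMap) -> VoxelMap:
    """Merge a local map into the global map with count-weighted averaging."""
    for key, (local_feat, local_n) in local_map.items():
        if key in global_map:
            global_feat, global_n = global_map[key]
            total = global_n + local_n
            merged = [
                (g * global_n + l * local_n) / total
                for g, l in zip(global_feat, local_feat)
            ]
            global_map[key] = (merged, total)
        else:
            global_map[key] = (list(local_feat), local_n)
    return global_map
```

Calling `fuse` once per incoming frame gives an incremental map whose memory grows with the number of occupied voxels, not with the number of frames.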
A Pooling Approach to Modelling Spatial Relations for Image Retrieval and Annotation
Over the last two decades we have witnessed strong progress on modeling
visual object classes, scenes and attributes that have significantly
contributed to automated image understanding. On the other hand, surprisingly
little progress has been made on incorporating a spatial representation and
reasoning in the inference process. In this work, we propose a pooling
interpretation of spatial relations and show how it improves image retrieval
and annotation tasks involving spatial language. Due to the complexity of
spatial language, we argue for a learning-based approach that acquires a
representation of spatial relations by learning parameters of the pooling
operator. We show improvements over previous work on two datasets and two
different tasks as well as provide additional insights on a new dataset with an
explicit focus on spatial relations.
Applications of Large Scale Foundation Models for Autonomous Driving
Since the DARPA Grand Challenges (rural) in 2004/05 and the Urban Challenge in 2007,
autonomous driving has been the most active field of AI applications. Recently,
chat systems powered by large language models (LLMs), such as ChatGPT and
PaLM, have emerged and rapidly become a promising direction toward artificial
general intelligence (AGI) in natural language processing (NLP). It is
natural to ask whether these abilities could be used to reformulate autonomous
driving. By combining LLMs with foundation models, it is possible to use
human knowledge, commonsense and reasoning to rebuild autonomous driving
systems and escape the current long-tailed AI dilemma. In this paper, we investigate
the techniques of foundation models and LLMs applied to autonomous driving,
categorized as simulation, world models, data annotation, and planning or E2E
solutions.
Comment: 23 pages. A survey paper
Pragmatic Frames for Teaching and Learning in Human-Robot interaction: Review and Challenges
Vollmer A-L, Wrede B, Rohlfing KJ, Oudeyer P-Y. Pragmatic Frames for Teaching and Learning in Human-Robot interaction: Review and Challenges. Frontiers in Neurorobotics. 2016;10:10.
One of the big challenges in robotics today is to learn from human users who are inexperienced in interacting with robots, yet are often used to teaching skills flexibly to other humans, and to children in particular. A potential route toward natural and efficient learning and teaching in Human-Robot Interaction (HRI) is to leverage the social competences of humans and the underlying interactional mechanisms. From this perspective, this article discusses the importance of pragmatic frames as flexible interaction protocols that provide important contextual cues to enable learners to infer new action or language skills and teachers to convey these cues. After defining and discussing the concept of pragmatic frames, grounded in decades of research in developmental psychology, we study a selection of HRI work in the literature which has focused on learning-teaching interaction and analyze the interactional and learning mechanisms that were used in the light of pragmatic frames. This allows us to show that many of these works have already used basic elements of the pragmatic frames machinery in practice, though not always explicitly. However, we also show that pragmatic frames have so far been used in a very restricted way compared to how they are used in human-human interaction, and argue that this has been an obstacle preventing robust natural multi-task learning and teaching in HRI. In particular, we explain that two central features of human pragmatic frames, mostly absent from existing HRI studies, are that (1) social peers use rich repertoires of frames, potentially combined together, to convey and infer multiple kinds of cues; (2) new frames can be learnt continually, building on existing ones, and guiding the interaction toward higher levels of complexity and expressivity.
To conclude, we give an outlook on the future research direction describing the relevant key challenges that need to be solved for leveraging pragmatic frames for robot learning and teaching.