Interactively Picking Real-World Objects with Unconstrained Spoken Language Instructions
Comprehension of spoken natural language is an essential component for robots
to communicate with human effectively. However, handling unconstrained spoken
instructions is challenging due to (1) complex structures including a wide
variety of expressions used in spoken language and (2) inherent ambiguity in
interpretation of human instructions. In this paper, we propose the first
comprehensive system that can handle unconstrained spoken language and is able
to effectively resolve ambiguity in spoken instructions. Specifically, we
integrate deep-learning-based object detection together with natural language
processing technologies to handle unconstrained spoken instructions, and
propose a method for robots to resolve instruction ambiguity through dialogue.
Through our experiments on both a simulated environment as well as a physical
industrial robot arm, we demonstrate the ability of our system to understand
natural instructions from human operators effectively, and how higher success
rates of the object picking task can be achieved through an interactive
clarification process.
Comment: 9 pages. International Conference on Robotics and Automation (ICRA)
2018. Accompanying videos are available at the following links:
https://youtu.be/_Uyv1XIUqhk (the system submitted to ICRA-2018) and
http://youtu.be/DGJazkyw0Ws (with improvements after the ICRA-2018 submission).
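The pipeline this abstract describes (object detection, grounding of the spoken instruction against the detections, and dialogue to resolve ambiguity) can be sketched roughly as follows. The scoring function, question-asking policy, and ambiguity margin are hypothetical stand-ins, not the authors' implementation:

```python
# Hypothetical sketch of an interactive object-picking loop: detect candidate
# objects, ground the instruction against them, and ask a clarifying question
# whenever the top candidates score too closely.

def ground_instruction(instruction, detections, score_fn):
    """Score each detected object against the instruction, best first."""
    return sorted(
        ((score_fn(instruction, det), det) for det in detections),
        key=lambda pair: pair[0],
        reverse=True,
    )

def pick_with_clarification(instruction, detections, score_fn,
                            ask_user, margin=0.2, max_turns=3):
    """Return the chosen detection, asking follow-up questions on near-ties."""
    ranked = ground_instruction(instruction, detections, score_fn)
    for _ in range(max_turns):
        if len(ranked) < 2 or ranked[0][0] - ranked[1][0] >= margin:
            break
        # Ambiguous: the two best candidates are nearly tied, so ask the
        # operator a clarifying question and re-rank with the answer folded in.
        answer = ask_user(ranked[0][1], ranked[1][1])
        instruction += " " + answer
        ranked = ground_instruction(instruction, detections, score_fn)
    return ranked[0][1]
```

With a word-overlap score, "pick up the cup" ties two cups, the operator's answer "red" breaks the tie, and the red cup wins on the re-rank.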
Service humanoid robotics : a novel interactive system based on bionic-companionship framework
At present, industrial robotics focuses more on motion control and vision, whereas humanoid service robots (HSRs) are increasingly being investigated in the field of speech interaction. The problem of human-robot interaction (HRI) quality has become a widely debated topic in academia. Especially when HSRs are applied in the hospitality industry, some researchers believe that the current HRI model is not well adapted to the complex social environment: HSRs generally lack the ability to accurately recognize human intentions and understand social scenarios. This study proposes a novel interactive framework suitable for HSRs, grounded in a novel integration of Trevarthen's (2001) companionship theory and a neural image captioning (NIC) generation algorithm. By generating natural-language descriptions from images and communicating with the environment, the robot can better interact with the stakeholder, thereby moving from mere interaction to a bionic companionship. Compared to previous research, a novel interactive system is developed based on the bionic-companionship framework, and a humanoid service robot was integrated with the system to conduct preliminary tests. The results show that the interactive system based on the bionic-companionship framework can help the service humanoid robot respond effectively to changes in the interactive environment, for example by giving different responses to the same character in different scenes.
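The scene-conditioned behavior the abstract reports (different responses to the same character in different scenes) can be sketched minimally as below; the captioner here is a toy stand-in for a neural image captioning model, and all names and the policy table are invented for illustration:

```python
# Minimal sketch of scene-conditioned response selection: a captioner
# (stand-in for an NIC model) describes the scene, and the robot chooses a
# response conditioned on both the person and the caption, so the same
# person receives different responses in different scenes.

def caption_scene(image_tags):
    """Stand-in for a neural image captioning model: joins scene tags."""
    return "a scene with " + " and ".join(sorted(image_tags))

def respond(person, image_tags, policy):
    """Condition the reply on (person, scene caption); fall back to a greeting."""
    caption = caption_scene(image_tags)
    return policy.get((person, caption), f"Hello, {person}!")
```

The same person keyed against two different scene captions yields two different responses, which is the behavioral change the framework aims for.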
Evolution of Natural Language Processing Technology: Not Just Language Processing Towards General Purpose AI
Since the invention of computers, communication through natural language
(actual human language) has been a dream technology. However, natural language
is extremely difficult to mathematically formulate, making it difficult to
realize as an algorithm without considering programming. While there have been
numerous technological developments, one cannot say that any results allowing
free utilization have been achieved thus far. In the case of language learning
in humans, for instance when learning one's mother tongue or a foreign
language, one must admit that this process follows the adage "practice makes
perfect" in principle, even though the learning method matters up to a point.
Deep learning has played a central role in contemporary AI technology in
recent years. When applied to natural language processing (NLP), this produced
unprecedented results. Achievements exceeding the initial predictions have been
reported from the results of learning vast amounts of textual data using deep
learning. For instance, four arithmetic operations could be performed without
explicit learning, thereby enabling the explanation of complex images and the
generation of images from corresponding explanatory texts. It is an accurate
example of the learner embodying the concept of "practice makes perfect" by
using vast amounts of textual data. This report provides a technological
explanation of how cutting-edge NLP has made it possible to realize the
"practice makes perfect" principle. Additionally, examples of how this can be
applied to business are provided. We reported in June 2022 in Japanese on the
NLP movement from late 2021 to early 2022. We would like to summarize this as a
memorandum since this is just the initial movement leading to the current large
language models (LLMs).
Comment: 40 pages.
Agent AI: Surveying the Horizons of Multimodal Interaction
Multi-modal AI systems will likely become a ubiquitous presence in our
everyday lives. A promising approach to making these systems more interactive
is to embody them as agents within physical and virtual environments. At
present, systems leverage existing foundation models as the basic building
blocks for the creation of embodied agents. Embedding agents within such
environments facilitates the ability of models to process and interpret visual
and contextual data, which is critical for the creation of more sophisticated
and context-aware AI systems. For example, a system that can perceive user
actions, human behavior, environmental objects, audio expressions, and the
collective sentiment of a scene can be used to inform and direct agent
responses within the given environment. To accelerate research on agent-based
multimodal intelligence, we define "Agent AI" as a class of interactive systems
that can perceive visual stimuli, language inputs, and other
environmentally-grounded data, and can produce meaningful embodied actions. In
particular, we explore systems that aim to improve agents based on
next-embodied action prediction by incorporating external knowledge,
multi-sensory inputs, and human feedback. We argue that by developing agentic
AI systems in grounded environments, one can also mitigate the hallucinations
of large foundation models and their tendency to generate environmentally
incorrect outputs. The emerging field of Agent AI subsumes the broader embodied
and agentic aspects of multimodal interactions. Beyond agents acting and
interacting in the physical world, we envision a future where people can easily
create any virtual reality or simulated scene and interact with agents embodied
within the virtual environment.
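The perceive-then-act loop this survey defines (multimodal perception, next-embodied-action prediction, and refinement from human feedback) can be sketched as follows; the class names, the fusion rule, and the lookup-table "predictor" are illustrative stand-ins, not any system from the survey:

```python
# Rough sketch of an Agent AI loop: the agent fuses visual stimuli, language
# input, and other grounded signals into a context, predicts an embodied
# action for that context, and incorporates human feedback as new knowledge.
from dataclasses import dataclass, field

@dataclass
class Observation:
    visual: str
    language: str
    audio: str = ""

@dataclass
class AgentAI:
    knowledge: dict = field(default_factory=dict)

    def perceive(self, obs):
        # Fuse multimodal inputs into a single grounded context key.
        return " | ".join(x for x in (obs.visual, obs.language, obs.audio) if x)

    def act(self, obs):
        # Next-embodied-action prediction, stubbed as a lookup with a safe
        # default; a real system would use a foundation model here.
        return self.knowledge.get(self.perceive(obs), "wait")

    def feedback(self, obs, correct_action):
        # Human feedback grounds the agent, reducing environmentally
        # incorrect outputs for contexts it has seen.
        self.knowledge[self.perceive(obs)] = correct_action
```

Grounding corrections in observed contexts is one concrete way the survey's argument about mitigating hallucination can be read: the agent defaults to a safe action unless the environment has confirmed a better one.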
Teleoperated visual inspection and surveillance with unmanned ground and aerial vehicles
Abstract—This paper introduces our robotic system named UGAV (Unmanned Ground-Air Vehicle), consisting of two semi-autonomous robot platforms: an Unmanned Ground Vehicle (UGV) and an Unmanned Aerial Vehicle (UAV). The paper focuses on three topics of inspection with the combined UGV and UAV: (A) teleoperated control by means of cell or smart phones, with a new concept of automatic configuration of the smart phone based on an RKI-XML description of the vehicle's control capabilities; (B) the camera and vision system, with a focus on real-time feature extraction, e.g. for the tracking of the UAV; and (C) the architecture and hardware of the UAV.
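The automatic smart-phone configuration in (A) could work roughly as sketched below: parse the vehicle's RKI-XML capability description and map each declared capability to a phone UI control. The actual RKI-XML schema is not given in the abstract, so every element and attribute name here is invented for illustration:

```python
# Hypothetical sketch: build phone controls from an XML capability
# description. The <capability> element and its attributes are invented;
# the real RKI-XML schema is not specified in the abstract.
import xml.etree.ElementTree as ET

def build_controls(rki_xml):
    """Map each declared vehicle capability to a phone UI control type."""
    widget_for = {"axis": "joystick", "switch": "toggle", "camera": "video_view"}
    root = ET.fromstring(rki_xml)
    return [(cap.get("name"), widget_for.get(cap.get("type"), "button"))
            for cap in root.iter("capability")]
```

The appeal of such a scheme is that one phone app can drive both the UGV and the UAV: each vehicle advertises its own capabilities, and the UI is generated rather than hard-coded.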
On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective
Our situated environment is full of uncertainty and highly dynamic, thus
hindering the widespread adoption of machine-led Intelligent Decision-Making
(IDM) in real world scenarios. This means IDM should have the capability of
continuously learning new skills and efficiently generalizing across wider
applications. IDM benefits from any new approaches and theoretical
breakthroughs that exhibit Artificial General Intelligence (AGI) breaking the
barriers between tasks and applications. Recent research has thoroughly
examined the Transformer neural architecture as a backbone foundation model and its
generalization to various tasks, including computer vision, natural language
processing, and reinforcement learning. We therefore argue that a foundation
decision model (FDM) can be established by formulating various decision-making
tasks as a sequence decoding task using the Transformer architecture; this
would be a promising solution to advance the applications of IDM in more
complex real world tasks. In this paper, we elaborate on how a foundation
decision model improves the efficiency and generalization of IDM. We also
discuss potential applications of a FDM in multi-agent game AI, production
scheduling, and robotics tasks. Finally, through a case study, we demonstrate
our realization of the FDM, DigitalBrain (DB1) with 1.2 billion parameters,
which achieves human-level performance on 453 tasks, including text
generation, image captioning, video game playing, robotic control, and
traveling salesman problems. As a foundation decision model, DB1 would be a
baby step towards more autonomous and efficient real-world IDM applications.
Comment: 26 pages, 4 figures.
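The core formulation above, casting heterogeneous decision-making tasks as one sequence-decoding problem, can be sketched as follows. The token format and the toy "model" are stand-ins for the paper's Transformer decoder, not its actual serialization scheme:

```python
# Sketch of a foundation decision model's input formulation: serialize
# (task, observation, action) triples into one shared token stream, then
# predict the next action autoregressively from the prefix.

def serialize_episode(task, observations, actions):
    """Flatten a decision episode into a single token sequence."""
    tokens = [f"<task:{task}>"]
    for obs, act in zip(observations, actions):
        tokens += [f"<obs:{obs}>", f"<act:{act}>"]
    return tokens

def decode_next_action(tokens, model):
    """Autoregressive action prediction from the serialized prefix."""
    return model(tuple(tokens))
```

Because every task shares the same token interface, one decoder can in principle be trained across text generation, game playing, and control, which is the generalization argument the paper makes.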
Service Humanoid Robotics: Review and Design of A Novel Bionic-Companionship Framework
At present, industrial robotics focuses more on motion control and vision, whereas Humanoid Service Robots (HSRs) are increasingly being investigated by researchers and practitioners in the field of speech interaction. The problem of human-robot interaction (HRI) quality has become a widely debated topic in academia. This paper proposes a novel interactive framework suitable for HSRs. The proposed framework is grounded in a novel integration of Trevarthen's companionship theory and a neural image generation algorithm from computer vision. By generating natural-language interactivity from images and communicating with the environment, the robot can better interact with the stakeholder, thereby moving from mere interaction to a bionic companionship. In addition, the article reviews research on neural image generation algorithms and summarizes application cases of the algorithm structure in the field of robotics from a critical perspective. We believe that the new interactive bionic-companionship framework can enable HSRs to develop further towards robot companions.
Visual telemetry transmission in marine environment using Robot Operating System platform
National Technical University of Athens--Master's Thesis. Interdisciplinary-Interdepartmental Programme of Postgraduate Studies (D.P.M.S.) "Automation Systems"