160 research outputs found
SiAM-dp: an open development platform for massively multimodal dialogue systems in cyber-physical environments
Cyber-physical environments enhance natural environments of daily life such as homes, factories, offices, and cars by connecting the cybernetic world of computers and communication with the real physical world. Under the keyword Industrie 4.0, cyber-physical environments will play a significant role in the next industrial revolution, and they will also appear in homes, offices, workshops, and numerous other areas. In this new world, classical interaction concepts, in which users exclusively interact with a single stationary device, PC, or smartphone, become less dominant and make room for new forms of interaction between humans and the environment itself. Furthermore, new technologies and a growing spectrum of applicable modalities broaden the possibilities for interaction designers to include more natural and intuitive verbal and non-verbal communication. The dynamic character of a cyber-physical environment and the mobility of its users confront developers with the challenge of building systems that are flexible with respect to the connected devices and modalities in use. This also implies new opportunities for cross-modal interaction that go beyond the dual-modality interaction common today. This thesis addresses the support of application developers with a platform for the declarative, model-based development of multimodal dialogue applications, with a focus on distributed input and output devices in cyber-physical environments. The main contributions can be divided into three parts:
- Design of models and strategies for the specification of dialogue applications in a declarative development approach. This includes models for the definition of project resources, dialogue behaviour, speech recognition grammars, and graphical user interfaces, as well as mapping rules that convert device-specific input and output descriptions into a common representation language.
- The implementation of a runtime platform that provides a flexible and extendable architecture for the easy integration of new devices and components. The platform realises concepts and strategies of multimodal human-computer interaction and is the basis for full-fledged multimodal dialogue applications for arbitrary device setups, domains, and scenarios.
- A software development toolkit that is integrated into the Eclipse Rich Client Platform and provides wizards and editors for creating and editing new multimodal dialogue applications.
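The mapping rules named in the first contribution can be sketched roughly as follows: device-specific input descriptions are converted into one common representation. All rule, event, and field names here are hypothetical illustrations, not SiAM-dp's actual API.

```python
# Sketch of the mapping-rule idea: each rule recognizes one device-specific
# event shape and returns a device-independent common representation.

def touch_rule(event):
    """Map a touchscreen tap to a common 'select' act (hypothetical shape)."""
    if event.get("device") == "touchscreen" and event.get("type") == "tap":
        return {"act": "select", "target": event["widget"]}

def speech_rule(event):
    """Map a recognized utterance to the same common 'select' act."""
    if event.get("device") == "asr" and event.get("type") == "utterance":
        return {"act": "select", "target": event["slot_value"]}

RULES = [touch_rule, speech_rule]

def to_common_representation(event):
    """Apply the first mapping rule that matches the device event."""
    for rule in RULES:
        common = rule(event)
        if common is not None:
            return common
    raise ValueError(f"no mapping rule for event: {event}")

print(to_common_representation(
    {"device": "touchscreen", "type": "tap", "widget": "lamp_1"}))
# prints {'act': 'select', 'target': 'lamp_1'}
```

Because both rules emit the same representation, the dialogue layer never needs to know which device produced the input.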
MOG 2007: Workshop on Multimodal Output Generation: CTIT Proceedings
This volume brings together a wide variety of work offering different perspectives on multimodal generation. Two strands of work can be distinguished: half of the gathered papers present current work on embodied conversational agents (ECAs), while the other half presents current work on multimedia applications. Two general research questions are shared by all: which output modalities are most suitable in which situation, and how should different output modalities be combined?
Persuasive Intelligence: On the Construction of Rhetor-Ethical Cognitive Machines
This work concerns the rhetorical and moral agency of machines, offering paths forward in machine ethics as well as problematizing the issue through the development and use of an interdisciplinary framework informed by rhetoric, philosophy of mind, media studies, and historical narrative. I argue that cognitive machines of the past as well as those of today, such as rapidly improving autonomous vehicles, are unable to make moral decisions themselves, foremost because a moral agent must first be a rhetorical agent, capable of persuading and of being persuaded. I show that current machines, artificially intelligent or otherwise, and especially digital computers, are primarily concerned with control, whereas persuasive behavior requires an understanding of possibility. Further, this dissertation connects rhetorical agency and moral agency (what I call a rhetor-ethical constitution) by way of the Heraclitean notion of syllapsis ("grasping"), a mode of cognition that requires an agent to practice analysis and synthesis at once, cognizing the whole and its parts simultaneously. This argument does not, however, indicate that machines are devoid of ethical or rhetorical activity or future agency. To the contrary, the larger purpose of developing this theoretical framework is to provide avenues of research, exploration, and experimentation in machine ethics and persuasion that have so far been overlooked or ignored through adherence to restricted disciplinary programs; and, given the ontological nature of the ephemeral binary that drives digital computation, I show that, at least in principle, computers share the syllaptic operating principle required for rhetor-ethical decisions and action.
Multimodal interaction: contributions towards simplifying the development of applications
PhD in Informatics Engineering.
The way we interact with the devices around us, in everyday life, is constantly
changing, boosted by emerging technologies and methods, providing better
and more engaging ways to interact with applications. Nevertheless, the
integration of these technologies, to enable their widespread use in current
systems, presents a notable challenge and requires considerable know-how
from developers. While the recent literature has made some advances in
supporting the design and development of multimodal interactive systems,
several key aspects have yet to be addressed to realize their full potential.
Among these, a relevant example is the difficulty of developing and integrating
multiple interaction modalities.
In this work, we propose, design and implement a framework enabling easier
development of multimodal interaction. Our proposal fully decouples the
interaction modalities from the application, allowing the separate development
of each part. The proposed framework already includes a set of generic
modalities and modules ready to be used in novel applications. Among the
proposed generic modalities, the speech modality deserved particular attention,
given the increasing relevance of speech interaction, for example in
scenarios such as Ambient Assisted Living (AAL), and the complexity behind its development.
Additionally, our proposal also tackles support for managing multi-device
applications and includes a method and corresponding module for fusing
events from different modalities.
The development of the architecture and framework profited from a rich R&D
context including several projects, scenarios, and international partners. The
framework successfully supported the design and development of a wide set of
multimodal applications, a notable example being AALFred, the personal
assistant of project PaeLife. These applications, in turn, served the continuous
improvement of the framework by supporting the iterative collection of novel
requirements, enabling the proposed framework to show its versatility and
potential.
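The fusion of events mentioned above can be sketched as a minimal, hypothetical fusion engine that combines events from decoupled modalities arriving within a short time window (the classic "put that there" pattern of speech plus pointing). All class, modality, and field names, and the window length, are illustrative assumptions, not the framework's actual API.

```python
import time

class FusionEngine:
    """Combines events from independent modalities that arrive within a
    short time window, e.g. a spoken command plus a pointing gesture.
    Modalities publish events here; the application only subscribes to
    fused results, staying fully decoupled from the modalities."""

    def __init__(self, window_s=1.5):
        self.window_s = window_s
        self.pending = []          # (timestamp, modality, payload)
        self.handlers = []         # application callbacks

    def subscribe(self, handler):
        self.handlers.append(handler)

    def publish(self, modality, payload, timestamp=None):
        now = timestamp if timestamp is not None else time.time()
        # drop events that fell out of the fusion window
        self.pending = [(t, m, p) for (t, m, p) in self.pending
                        if now - t <= self.window_s]
        self.pending.append((now, modality, payload))
        self._try_fuse()

    def _try_fuse(self):
        """Fuse the newest speech and gesture events into one command."""
        speech = [p for (_, m, p) in self.pending if m == "speech"]
        gesture = [p for (_, m, p) in self.pending if m == "gesture"]
        if speech and gesture:
            fused = {"command": speech[-1], "target": gesture[-1]}
            self.pending.clear()
            for handler in self.handlers:
                handler(fused)

engine = FusionEngine()
engine.subscribe(print)
engine.publish("speech", "delete", timestamp=10.0)
engine.publish("gesture", "file_icon_3", timestamp=10.4)
# prints {'command': 'delete', 'target': 'file_icon_3'}
```

Events that arrive outside the window are discarded rather than fused, so a stale utterance cannot be paired with a much later gesture.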
An Approach for Contextual Control in Dialogue Management with Belief State Trend Analysis and Prediction
This thesis applies the theory of naturalistic decision making (NDM), a model from human psychology, to the study of dialogue management systems across major approaches, from the classical approach based on finite state machines to the most recent approach using partially observable Markov decision processes (POMDPs). While most approaches use various techniques to estimate the system state, a POMDP-based system uses the belief state to make decisions. In addition to state estimation, a POMDP provides a mechanism to model uncertainty and allows error recovery. However, applying the Markov assumption over the belief-state space in current POMDP models causes a significant loss of valuable information in the dialogue history, leading to unfaithful tracking of the user's intention. There is also a need to adapt the interaction to the user's level of knowledge. To improve the performance of POMDP-based dialogue management, this thesis proposes an enabling method that allows dynamic control of the dialogue. Three contributions are made to achieve this dynamism: introducing historical belief information into the POMDP model; analyzing its trend and predicting the user's belief states from the history information; and finally using this derived information to control the system according to the user's intention by switching between contextual control modes. Theoretical derivations of the proposed work and simulation experiments provide evidence that the proposed algorithm enables dynamic dialogue control by the agent and improves human-computer interaction.
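The idea of belief-state trend analysis driving a switch between contextual control modes can be sketched as follows. The mode names are borrowed from Hollnagel's contextual control model, which the NDM literature uses, and the window size and thresholds are illustrative assumptions, not the thesis's actual algorithm.

```python
from collections import deque

class BeliefTrendController:
    """Tracks a short history of the belief assigned to one user
    intention and switches contextual control modes by the trend."""

    def __init__(self, window=5, threshold=0.05):
        self.history = deque(maxlen=window)  # recent belief values
        self.threshold = threshold

    def observe(self, belief):
        """Record the belief assigned to the intention after a turn."""
        self.history.append(belief)

    def trend(self):
        """Average per-turn change over the window (simple linear trend)."""
        if len(self.history) < 2:
            return 0.0
        values = list(self.history)
        deltas = [b - a for a, b in zip(values, values[1:])]
        return sum(deltas) / len(deltas)

    def control_mode(self):
        """Rising belief: the system can act opportunistically; falling
        belief: fall back to a more cautious, recovery-oriented mode."""
        t = self.trend()
        if t > self.threshold:
            return "opportunistic"
        if t < -self.threshold:
            return "scrambled"
        return "tactical"

ctrl = BeliefTrendController()
for b in [0.2, 0.3, 0.45, 0.6, 0.7]:
    ctrl.observe(b)
print(ctrl.control_mode())  # rising trend, prints "opportunistic"
```

The point of the sketch is the history: a plain POMDP policy sees only the current belief, whereas the controller above conditions its behaviour on how that belief has been moving.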
Human-Robot Interaction architecture for interactive and lively social robots
International Mention in the doctoral degree.
Society is experiencing a series of demographic changes that can result in an imbalance between
the active working and non-working age populations. One of the solutions considered to mitigate
this problem is the inclusion of robots in multiple sectors, including the service sector. But for
this to be a viable solution, among other features, robots need to be able to interact with humans
successfully. This thesis seeks to endow a social robot with the abilities required for natural
human-robot interaction. The main objective is to contribute to the body of knowledge on the area
of Human-Robot Interaction with a new, platform-independent, modular approach that focuses on
giving roboticists the tools required to develop applications that involve interactions with humans. In
particular, this thesis focuses on three problems that need to be addressed: (i) modelling interactions
between a robot and a user; (ii) endowing the robot with the expressive capabilities required for
successful communication; and (iii) endowing the robot with a lively appearance.
The approach to dialogue modelling presented in this thesis proposes to model dialogues as a
sequence of atomic interaction units, called Communicative Acts, or CAs. They can be parametrized
at runtime to achieve different communicative goals, and are endowed with mechanisms oriented to
solve some of the uncertainties related to interaction. Two dimensions have been used to identify the
required CAs: initiative (held by the robot or the user) and intention (either to retrieve information
or to convey it). These basic CAs can be combined in a hierarchical manner to create more complex, reusable
structures. This approach simplifies the creation of new interactions, by allowing developers to focus
exclusively on designing the flow of the dialogue, without having to re-implement functionalities
that are common to all dialogues (like error handling, for example).
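The hierarchical combination of Communicative Acts described above can be sketched roughly as follows. The class names, dialogue content, and composition mechanism are illustrative assumptions, not the thesis's actual implementation.

```python
class CommunicativeAct:
    """One atomic interaction unit, parametrized at run time. The two
    dimensions from the abstract are who holds the initiative and
    whether the goal is to obtain or to provide information."""

    def __init__(self, initiative, intention, run):
        self.initiative = initiative   # "robot" or "user"
        self.intention = intention     # "obtain" or "provide"
        self.run = run                 # callable performing the turn

class SequenceCA:
    """Hierarchical combination: a composite CA runs its children in
    order, so a developer designing a new dialogue only arranges the
    flow and never re-implements the per-turn machinery."""

    def __init__(self, *children):
        self.children = children

    def run(self, context):
        for child in self.children:
            context = child.run(context)
        return context

# Hypothetical leaf CAs for a reminder-setting dialogue
ask_time = CommunicativeAct("robot", "obtain",
                            lambda ctx: {**ctx, "time": "9:00"})
confirm = CommunicativeAct("robot", "provide",
                           lambda ctx: {**ctx, "confirmed": True})

dialogue = SequenceCA(ask_time, confirm)
print(dialogue.run({}))  # prints {'time': '9:00', 'confirmed': True}
```

In a real system each leaf CA would wrap speech recognition, error handling, and retries; the composite only sees the shared dialogue context flowing through.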
The expressiveness of the robot is based on the use of a library of predefined multimodal gestures,
or expressions, modelled as state machines. The module managing the expressiveness receives requests
for performing gestures, schedules their execution in order to avoid any possible conflict that might
arise, loads them, and ensures that their execution goes without problems. The proposed approach
is also able to generate expressions at runtime based on a list of unimodal actions (an utterance,
the motion of a limb, etc.). One of the key features of the proposed expressiveness management
approach is the integration of a series of modulation techniques that can be used to modify the
robot's expressions at runtime. This would allow the robot to adapt them to the particularities of a
given situation (which would also increase the variability of the robot expressiveness), and to display
different internal states with the same expressions.
Considering that being recognized as a living being is a prerequisite for engaging in social
encounters, the perception of a social robot as a living entity is key to fostering
human-robot interactions. In this dissertation, two approaches have been proposed. The first
method generates actions for the different interfaces of the robot at certain intervals. The frequency
and intensity of these actions are defined by a signal that represents the pulse of the robot, which can
be adapted to the context of the interaction or the internal state of the robot. The second method
enhances the robot’s utterance by predicting the appropriate non-verbal expressions that should
accompany them, according to the content of the robot’s message, as well as its communicative
intention. A deep learning model receives the transcription of the robot’s utterances, predicts
which expressions should accompany it, and synchronizes them, so each gesture selected starts at
the appropriate time. The model has been developed using a combination of a Long-Short Term
Memory network-based encoder and a Conditional Random Field for generating a sequence of
gestures that are combined with the robot’s utterance.
All the elements presented above form the core of a modular Human-Robot Interaction
architecture that has been integrated in multiple platforms, and tested under different conditions.
Doctoral Programme in Electrical, Electronic and Automation Engineering, Universidad Carlos III de Madrid. Committee: President: Fernando Torres Medina; Secretary: Concepción Alicia Monje Micharet; Member: Amirabdollahian Farshi
Interim research assessment 2003-2005 - Computer Science
This report primarily serves as a source of information for the 2007 Interim Research Assessment Committee for Computer Science at the three technical universities in the Netherlands. The report also provides information for others interested in our research activities.
The Application of Mixed Reality Within Civil Nuclear Manufacturing and Operational Environments
This thesis documents the design and application of Mixed Reality (MR) within a nuclear
manufacturing cell through the creation of a Digitally Assisted Assembly Cell (DAAC). The
DAAC is a proof-of-concept system, combining full-body tracking within a room-sized
environment with a bi-directional feedback mechanism to allow communication between users within
the Virtual Environment (VE) and a manufacturing cell. This allows for training, remote assistance,
delivery of work instructions, and data capture within a manufacturing cell.
The research underpinning the DAAC encompasses four main areas: the nuclear industry, Virtual
Reality (VR) and MR technology, MR within manufacturing, and finally the 4th Industrial
Revolution (IR4.0). Using an array of Kinect sensors, the DAAC was designed to capture user
movements within a real manufacturing cell, which can be transferred in real time to a VE, creating
a digital twin of the real cell. Users can interact with each other via digital assets and laser pointers
projected into the cell, accompanied by a built-in Voice over Internet Protocol (VoIP) system. This
allows for the capture of implicit knowledge from operators within the real manufacturing cell, as
well as transfer of that knowledge to future operators. Additionally, users can connect to the VE
from anywhere in the world. In this way, experts are able to communicate with the users in the real
manufacturing cell and assist with their training. The human tracking data fills an identified gap in
the IR4.0 network of Cyber Physical Systems (CPS), and could allow for future optimisations
within manufacturing systems, Material Resource Planning (MRP) and Enterprise Resource
Planning (ERP).
This project is a demonstration of how MR could prove valuable within nuclear manufacture. The
DAAC is designed to be low cost. It is hoped this will allow for its use by groups who have
traditionally been priced out of MR technology. This could help Small to Medium Enterprises
(SMEs) close the double digital divide between themselves and larger global corporations. For
larger corporations it offers the benefit of being low cost and is, consequently, easier to roll out
across the value chain. Skills developed in one area can also be transferred to others across the
internet, as users from one manufacturing cell can watch and communicate with those in another.
However, as a proof of concept, the DAAC is at Technology Readiness Level (TRL) five or six and,
prior to its wider application, further testing is required to assess and improve the technology.
The work was patented in the UK (S. Reddish et al., 2017a), the US (S. Reddish et al.,
2017b), and China (S. Reddish et al., 2017c). The patents are owned by Rolls-Royce and cover
the methods of bi-directional feedback from which users can interact from the digital to the real
and vice versa.
Stephen Reddish
Mixed Mode Realities in Nuclear Manufacturing
Keywords: Mixed Mode Reality, Virtual Reality, Augmented Reality, Nuclear, Manufacture,
Digital Twin, Cyber Physical System
Multi-Agent Systems
A multi-agent system (MAS) is a system composed of multiple interacting intelligent agents. Multi-agent systems can be used to solve problems that are difficult or impossible for an individual agent or a monolithic system to solve. Agent systems are open and extensible systems that allow for the deployment of autonomous and proactive software components. Multi-agent systems have been developed and used in several application domains.
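As a minimal illustration of the claim that a MAS can solve problems no individual agent can, consider agents that each hold only part of the data, so only their combined answers yield the global result. All names here are hypothetical.

```python
class Agent:
    """An agent holding only a local slice of the data, so no single
    agent can answer the global query alone."""

    def __init__(self, name, readings):
        self.name = name
        self.readings = readings

    def handle(self, query):
        """Answer a query using only local knowledge."""
        if query == "local_max":
            return max(self.readings)
        raise ValueError(f"unknown query: {query}")

class MultiAgentSystem:
    """Broadcasts a query to all agents and combines their answers."""

    def __init__(self, agents):
        self.agents = agents

    def broadcast(self, query):
        # The global maximum emerges from purely local computations.
        return max(agent.handle(query) for agent in self.agents)

mas = MultiAgentSystem([
    Agent("sensor_a", [3, 7, 2]),
    Agent("sensor_b", [9, 1]),
    Agent("sensor_c", [4, 8]),
])
print(mas.broadcast("local_max"))  # prints 9
```

Real MAS platforms add asynchronous messaging, agent lifecycles, and negotiation protocols on top of this basic pattern of local computation plus coordination.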
Proceedings of the International Workshop "Innovation Information Technologies: Theory and Practice": Dresden, Germany, September 06-10, 2010
This International Workshop is a high-quality seminar providing a forum for the exchange of scientific achievements between research communities of different universities and research institutes in the area of innovation information technologies. It is a continuation of the Russian-German workshops previously organized by the universities in Dresden, Karlsruhe, and Ufa.
The workshop was arranged in 9 sessions covering the major topics: Modern Trends in Information Technology, Knowledge-Based Systems and Semantic Modelling, Software Technology and High Performance Computing, Geo-Information Systems and Virtual Reality, System and Process Engineering, Process Control and Management, and Corporate Information Systems.
- …