    SiAM-dp: an open development platform for massively multimodal dialogue systems in cyber-physical environments

    Cyber-physical environments enhance natural environments of daily life such as homes, factories, offices, and cars by connecting the cybernetic world of computers and communication with the real physical world. The possible fields of application are wide-ranging. While cyber-physical environments will play a relevant role in the next industrial revolution under the keyword Industrie 4.0, they will also appear in homes, offices, workshops, and numerous other areas. In this new world, classical interaction concepts, in which users interact exclusively with a single stationary device such as a PC or smartphone, become less dominant and make room for new forms of interaction between humans and the environment itself. Furthermore, new technologies and a growing spectrum of applicable modalities broaden the possibilities for interaction designers to include more natural and intuitive verbal and non-verbal communication. The dynamic character of cyber-physical environments and the mobility of users confront developers with the challenge of building systems that are flexible with respect to the connected and used devices and modalities. This also opens new opportunities for cross-modal interaction that goes beyond the dual-modality interaction common today. This thesis addresses the support of application developers with a platform for the declarative and model-based development of multimodal dialogue applications, with a focus on distributed input and output devices in cyber-physical environments. The main contributions can be divided into three parts:
    - The design of models and strategies for the specification of dialogue applications in a declarative development approach. This includes models for the definition of project resources, dialogue behaviour, speech recognition grammars, and graphical user interfaces, as well as mapping rules that convert the device-specific representation of input and output descriptions into a common representation language.
    - The implementation of a runtime platform that provides a flexible and extensible architecture for the easy integration of new devices and components. The platform realises concepts and strategies of multimodal human-computer interaction and is the basis for full-fledged multimodal dialogue applications for arbitrary device setups, domains, and scenarios.
    - A software development toolkit that is integrated into the Eclipse Rich Client Platform and provides wizards and editors for creating and editing multimodal dialogue applications.
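To make the idea of mapping rules concrete, the following minimal Python sketch shows how device-specific input could be normalised into one shared representation. It is not SiAM-dp's rule language or API: the `CommonEvent` schema, the `MappingRule` and `Mapper` classes, and the device names `speech_recognizer` and `touch_panel` are hypothetical, chosen only to show two different devices producing the same common event.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

@dataclass
class CommonEvent:
    """Device-independent input event (hypothetical schema)."""
    modality: str                      # e.g. "speech", "touch"
    intent: str                        # semantic label shared by all devices
    slots: Dict[str, Any] = field(default_factory=dict)
    source: str = ""                   # id of the originating device

@dataclass
class MappingRule:
    """Recognises one kind of device-specific payload and converts it."""
    device_type: str
    matches: Callable[[Dict[str, Any]], bool]
    convert: Callable[[str, Dict[str, Any]], CommonEvent]

class Mapper:
    def __init__(self) -> None:
        self.rules: List[MappingRule] = []

    def register(self, rule: MappingRule) -> None:
        self.rules.append(rule)

    def to_common(self, device_type: str, device_id: str,
                  payload: Dict[str, Any]) -> Optional[CommonEvent]:
        # The first matching rule wins; unmatched payloads are dropped.
        for rule in self.rules:
            if rule.device_type == device_type and rule.matches(payload):
                return rule.convert(device_id, payload)
        return None

mapper = Mapper()
# Speech (hypothetical): the recogniser already delivers a semantic interpretation.
mapper.register(MappingRule(
    "speech_recognizer",
    lambda p: "semantics" in p,
    lambda dev, p: CommonEvent("speech", p["semantics"]["action"],
                               {"object": p["semantics"]["object"]}, dev),
))
# Touch (hypothetical): buttons are named "<object>_on".
mapper.register(MappingRule(
    "touch_panel",
    lambda p: p.get("button", "").endswith("_on"),
    lambda dev, p: CommonEvent("touch", "switch_on",
                               {"object": p["button"][: -len("_on")]}, dev),
))

e1 = mapper.to_common("speech_recognizer", "mic-1",
                      {"utterance": "turn on the lamp",
                       "semantics": {"action": "switch_on", "object": "lamp"}})
e2 = mapper.to_common("touch_panel", "tablet-2", {"button": "lamp_on"})
print(e1.intent, e1.slots == e2.slots)   # switch_on True
```

Because dialogue logic in this sketch only ever sees `CommonEvent` objects, supporting an additional device means registering new rules rather than changing the dialogue itself.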

    MOG 2007: Workshop on Multimodal Output Generation: CTIT Proceedings

    This volume brings together a wide variety of work offering different perspectives on multimodal generation. Two strands of work can be distinguished: half of the papers present current work on embodied conversational agents (ECAs), while the other half present current work on multimedia applications. All papers share two general research questions: which output modalities are most suitable in which situation, and how should different output modalities be combined?

    Persuasive Intelligence: On the Construction of Rhetor-Ethical Cognitive Machines

    This work concerns the rhetorical and moral agency of machines, offering paths forward in machine ethics as well as problematizing the issue through the development and use of an interdisciplinary framework informed by rhetoric, philosophy of mind, media studies, and historical narrative. I argue that cognitive machines of the past as well as those of today, such as rapidly improving autonomous vehicles, are unable to make moral decisions themselves, foremost because a moral agent must first be a rhetorical agent, capable of persuading and of being persuaded. I show that current machines, artificially intelligent or otherwise, and especially digital computers, are primarily concerned with control, whereas persuasive behavior requires an understanding of possibility. Further, this dissertation connects rhetorical agency and moral agency (what I call a rhetor-ethical constitution) by way of the Heraclitean notion of syllapsis ("grasping"), a mode of cognition that requires an agent to practice analysis and synthesis at once, cognizing the whole and its parts simultaneously. This argument does not, however, indicate that machines are devoid of ethical or rhetorical activity or future agency. On the contrary, the larger purpose of developing this theoretical framework is to provide avenues of research, exploration, and experimentation in machine ethics and persuasion that have so far been overlooked or ignored through adherence to restricted disciplinary programs; and, given the ontological nature of the ephemeral binary that drives digital computation, I show that, at least in principle, computers share the syllaptic operating principle required for rhetor-ethical decisions and action.

    Multimodal interaction: contributions to simplifying application development

    Doctorate in Informatics Engineering. The way we interact with the devices around us in everyday life is constantly changing, driven by emerging technologies and methods that provide better and more engaging ways to interact with applications. Nevertheless, integrating these technologies to enable their widespread use in current systems presents a notable challenge and requires considerable know-how from developers. While the recent literature has made some advances in supporting the design and development of multimodal interactive systems, several key aspects have yet to be addressed before such systems can reach their full potential. A relevant example is the difficulty of developing and integrating multiple interaction modalities. In this work, we propose, design, and implement a framework that makes it easier to develop multimodal interaction. Our proposal fully decouples the interaction modalities from the application, allowing each part to be developed separately. The proposed framework already includes a set of generic modalities and modules ready to be used in new applications. Among these generic modalities, the speech modality received particular attention, given the increasing relevance of speech interaction, for example in scenarios such as AAL (Ambient Assisted Living), and the complexity of its development. Additionally, our proposal supports the management of multi-device applications and includes a method and corresponding module for event fusion. The development of the architecture and framework benefited from a rich R&D context including several projects, scenarios, and international partners.
    The framework supported the design and development of a wide set of multimodal applications, a notable example being AALFred, the personal assistant of the PaeLife project. These applications, in turn, served the continuous improvement of the framework by supporting the iterative collection of new requirements, and demonstrated the framework's versatility and potential.
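The abstract mentions a module for fusing events but does not describe its method. As an illustration only, the sketch below implements a common time-window approach in the "put that there" style: the deictic slots of a spoken command are filled with the pointing events closest in time. The `Event` type, the 1.5-second window, and the slot names are assumptions, not the framework's actual design.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Event:
    modality: str          # e.g. "speech" or "pointing"
    t: float               # timestamp in seconds
    content: Dict[str, object]

def fuse(speech: Event, others: List[Event], window: float = 1.5) -> Optional[Dict[str, object]]:
    """Fill the deictic slots of a spoken command (in spoken order) with the
    pointing targets that occurred within +/- `window` seconds of the utterance."""
    slots = speech.content.get("deictic", [])
    pointing = sorted(
        (e for e in others
         if e.modality == "pointing" and abs(e.t - speech.t) <= window),
        key=lambda e: e.t,
    )
    if len(pointing) < len(slots):
        return None  # complementary input missing: no fused interpretation
    fused: Dict[str, object] = {"action": speech.content["action"]}
    for slot, ev in zip(slots, pointing):
        fused[slot] = ev.content["target"]
    return fused

# "Move that there", with two pointing gestures around the utterance.
speech = Event("speech", 10.0, {"action": "move", "deictic": ["object", "destination"]})
gestures = [
    Event("pointing", 9.6, {"target": "lamp"}),
    Event("pointing", 10.4, {"target": "shelf"}),
    Event("pointing", 14.0, {"target": "door"}),   # too late: outside the window
]
print(fuse(speech, gestures))
# {'action': 'move', 'object': 'lamp', 'destination': 'shelf'}
```

Keeping fusion in its own module, as here, fits the decoupling described above: modalities only emit events, and the application receives the fused interpretation.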

    An Approach for Contextual Control in Dialogue Management with Belief State Trend Analysis and Prediction

    This thesis applies the theory of naturalistic decision making (NDM), a model of human psychology, to the study of dialogue management systems, covering the major approaches from the classical one based on finite state machines to the most recent one using partially observable Markov decision processes (POMDPs). While most approaches use various techniques to estimate the system state, POMDP-based systems use the belief state to make decisions. In addition to state estimation, a POMDP provides a mechanism to model uncertainty and allows error recovery. However, applying the Markov assumption over the belief-state space in current POMDP models causes a significant loss of valuable information in the dialogue history, leading to inaccurate tracking of the user's intention. There is also a need to interact with users appropriately according to their level of knowledge. To improve the performance of POMDP-based dialogue management, this thesis proposes a method that enables dynamic control of dialogue management. Three contributions achieve this dynamism: (1) introducing historical belief information into the POMDP model; (2) analyzing its trend and predicting the user's belief states from this history; and (3) using the derived information to control the system according to the user's intention by switching between contextual control modes. Theoretical derivations of the proposed work and simulation experiments provide evidence that the proposed algorithm enables dynamic dialogue control by the agent and improves human-computer interaction.
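The standard POMDP belief update that this work builds on is b'(s') = η · O(o | s', a) · Σ_s T(s' | s, a) · b(s), where η normalises the result. The Python sketch below implements that update, keeps the belief history that a purely Markovian update discards, extrapolates each state's belief from its recent linear trend, and switches between control modes. The linear-trend predictor, the 0.7 threshold, and the mode names `act`, `clarify`, and `re-elicit` are illustrative assumptions, not the thesis's algorithm.

```python
from typing import Dict, List, Sequence

Belief = Dict[str, float]
Model = Dict[str, Dict[str, Dict[str, float]]]   # model[a][s][x] -> probability

def belief_update(b: Belief, a: str, o: str, T: Model, O: Model) -> Belief:
    """b'(s') = eta * O[a][s'][o] * sum_s T[a][s][s'] * b(s)"""
    unnorm = {s2: O[a][s2][o] * sum(T[a][s][s2] * p for s, p in b.items()) for s2 in b}
    eta = sum(unnorm.values())
    if eta == 0:
        raise ValueError("observation impossible under the model")
    return {s: p / eta for s, p in unnorm.items()}

def slope(ys: Sequence[float]) -> float:
    """Least-squares slope of ys against time steps 0..n-1."""
    n = len(ys)
    if n < 2:
        return 0.0
    mx, my = (n - 1) / 2, sum(ys) / n
    num = sum((i - mx) * (y - my) for i, y in enumerate(ys))
    den = sum((i - mx) ** 2 for i in range(n))
    return num / den

def predict_next(history: List[Belief], k: int = 3) -> Belief:
    """Extrapolate each state's belief one step ahead from its recent linear trend."""
    recent = history[-k:]
    last = recent[-1]
    raw = {s: min(1.0, max(0.0, last[s] + slope([b[s] for b in recent]))) for s in last}
    z = sum(raw.values())
    return {s: p / z for s, p in raw.items()} if z > 0 else dict(last)

def control_mode(history: List[Belief], threshold: float = 0.7) -> str:
    """Hypothetical contextual control: act when the top hypothesis is strong and
    rising, clarify when it is rising but still weak, otherwise re-elicit."""
    pred = predict_next(history)
    top = max(pred, key=pred.get)
    rising = slope([b[top] for b in history[-3:]]) > 0
    if rising and pred[top] >= threshold:
        return "act"
    return "clarify" if rising else "re-elicit"

states = ["coffee", "tea"]
T = {"ask": {s: {s2: 1.0 if s == s2 else 0.0 for s2 in states} for s in states}}  # fixed user goal
O = {"ask": {"coffee": {"heard_coffee": 0.8, "heard_tea": 0.2},
             "tea":    {"heard_coffee": 0.3, "heard_tea": 0.7}}}
history = [{"coffee": 0.5, "tea": 0.5}]
for obs in ["heard_coffee", "heard_coffee"]:
    history.append(belief_update(history[-1], "ask", obs, T, O))
print(round(history[1]["coffee"], 3), control_mode(history))   # 0.727 act
```

The trend is computed over the last few beliefs rather than the current one alone, which is the kind of history information the abstract argues a Markovian belief update loses.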

    Human-Robot Interaction architecture for interactive and lively social robots

    International Mention in the doctoral degree. Society is experiencing a series of demographic changes that can result in an imbalance between the working-age population and the population outside the labour market. One of the solutions considered to mitigate this problem is the introduction of robots in multiple sectors, including the service sector. But for this to be a viable solution, robots need, among other abilities, to be able to interact with humans successfully. In the context of applying social robots to elderly care, this thesis seeks to endow a social robot with the abilities required for natural human-robot interaction. The main objective is to contribute to the body of knowledge in the area of Human-Robot Interaction with a new, platform-independent, modular approach that focuses on giving roboticists the tools required to develop applications that involve interactions with humans.
    In particular, this thesis focuses on three problems that need to be addressed: (i) modelling interactions between a robot and a user; (ii) endowing the robot with the expressive capabilities required for successful communication; and (iii) endowing the robot with a lively appearance. The approach to dialogue modelling presented in this thesis models dialogues as sequences of atomic interaction units called Communicative Acts (CAs). They can be parametrized at runtime to achieve different communicative goals, and are equipped with mechanisms to handle some of the uncertainties that arise during interaction. Two dimensions have been used to identify the required CAs: initiative (held by the robot or by the user) and intention (either to retrieve information or to convey it). These basic CAs can be combined hierarchically to create more complex, reusable structures. This approach simplifies the creation of new interactions by allowing developers to focus exclusively on designing the flow of the dialogue, without having to re-implement functionalities that are common to all dialogues (error handling, for example). The expressiveness of the robot is based on a library of predefined multimodal gestures, or expressions, modelled as state machines. The module managing expressiveness receives requests to perform gestures, schedules their execution to avoid any conflicts that might arise, loads them, and ensures that they execute without problems. The proposed approach can also generate expressions at runtime from a list of unimodal actions (an utterance, the motion of a limb, etc.). One of the key features of the proposed expressiveness management approach is the integration of a series of modulation techniques that can be used to modify the robot's expressions at runtime.
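A minimal Python sketch of the Communicative Act idea described above: the two dimensions (initiative and intention) parametrise an atomic act, shared error handling (asking again when nothing was understood) lives in the act itself, and a composite act nests other acts hierarchically. The class names, fields, retry policy, and the scripted `say`/`listen` callbacks are hypothetical; the user-initiative cases are simplified to "wait for input and store it".

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional, Union

class Initiative(Enum):
    ROBOT = "robot"
    USER = "user"

class Intention(Enum):
    RETRIEVE = "retrieve"   # obtain information
    CONVEY = "convey"       # provide information

Say = Callable[[str], None]
Listen = Callable[[], Optional[str]]

@dataclass
class CommunicativeAct:
    """Atomic interaction unit, parametrised at runtime (hypothetical sketch)."""
    initiative: Initiative
    intention: Intention
    text: str = ""        # what the robot says when it holds the initiative
    slot: str = ""        # where received information is stored
    max_retries: int = 2

    def run(self, say: Say, listen: Listen, results: Dict[str, str]) -> bool:
        if self.initiative is Initiative.ROBOT:
            say(self.text)
            if self.intention is Intention.CONVEY:
                return True                      # robot informs: no answer expected
        for attempt in range(self.max_retries + 1):
            reply = listen()
            if reply:                            # something was understood
                results[self.slot] = reply
                return True
            if attempt < self.max_retries:
                say("Sorry, could you repeat that?")   # shared error handling
        return False

@dataclass
class CompositeAct:
    """Hierarchical, reusable structure built from other acts."""
    children: List[Union[CommunicativeAct, CompositeAct]] = field(default_factory=list)

    def run(self, say: Say, listen: Listen, results: Dict[str, str]) -> bool:
        # Runs the children in order and stops at the first one that fails.
        return all(child.run(say, listen, results) for child in self.children)

ask_name = CompositeAct([
    CommunicativeAct(Initiative.ROBOT, Intention.CONVEY, text="Hello!"),
    CommunicativeAct(Initiative.ROBOT, Intention.RETRIEVE,
                     text="What is your name?", slot="name"),
])
dialogue = CompositeAct([ask_name])          # ask_name reused inside a larger dialogue

said: List[str] = []
replies = iter([None, "Ana"])                # the first answer is not understood
results: Dict[str, str] = {}
ok = dialogue.run(said.append, lambda: next(replies), results)
print(ok, results, said)
```

The dialogue designer only composes acts; the re-asking behaviour comes for free from `CommunicativeAct.run`, which is the division of labour the abstract describes.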
    These modulation techniques allow the robot to adapt its expressions to the particularities of a given situation (which also increases the variability of its expressiveness), and to display different internal states, such as its emotional state, with a limited set of expressions. Since being recognized as a living being is a prerequisite for engaging in social encounters, a social robot must be perceived as a living entity to foster human-robot interaction. This dissertation proposes two approaches to achieve this. The first method generates actions for the different interfaces of the robot at certain intervals. The frequency and intensity of these actions are defined by a signal that represents the pulse of the robot, which can be adapted to the context of the interaction or to the internal state of the robot. The second method enhances the robot's utterances by predicting the appropriate non-verbal expressions that should accompany them, according to the content of the robot's message and its communicative intention. A deep learning model receives the transcription of the robot's utterance, predicts which expressions should accompany it, and synchronizes them so that each selected gesture starts at the appropriate time. The model combines a Long Short-Term Memory (LSTM) network-based encoder with a Conditional Random Field (CRF) that generates the sequence of gestures accompanying the robot's utterance. All the elements presented above form the core of a modular Human-Robot Interaction architecture that has been integrated into multiple platforms and tested under different conditions. Doctoral Programme in Electrical, Electronic and Automation Engineering, Universidad Carlos III de Madrid. Committee: President: Fernando Torres Medina; Secretary: Concepción Alicia Monje Micharet; Member: Amirabdollahian Farshi.
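The first liveliness method described above can be sketched as follows: a pulse whose rate and amplitude follow the robot's internal state triggers idle actions on its interfaces at regular intervals. The linear mapping from arousal to pulse, the interface names, and the random intensity jitter are assumptions for illustration only.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple

INTERFACES = ["head", "eyes", "arms", "leds"]   # illustrative robot interfaces

@dataclass
class Pulse:
    rate_hz: float     # beats per second: how often an idle action is triggered
    amplitude: float   # 0..1: upper bound on action intensity

def pulse_from_state(arousal: float) -> Pulse:
    """Hypothetical mapping: a more aroused robot has a faster, stronger pulse."""
    a = min(1.0, max(0.0, arousal))
    return Pulse(rate_hz=0.2 + 0.8 * a, amplitude=0.3 + 0.7 * a)

def idle_actions(pulse: Pulse, duration_s: float, seed: int = 0) -> List[Tuple[float, str, float]]:
    """Return (time, interface, intensity) tuples, one per beat of the pulse."""
    rng = random.Random(seed)                 # seeded for reproducibility
    period = 1.0 / pulse.rate_hz
    beats = math.ceil(duration_s / period)
    return [
        (i * period, rng.choice(INTERFACES),
         pulse.amplitude * (0.5 + 0.5 * rng.random()))
        for i in range(beats)
    ]

calm = idle_actions(pulse_from_state(0.0), duration_s=10.0)
excited = idle_actions(pulse_from_state(1.0), duration_s=10.0)
print(len(calm), len(excited))
```

Changing only the internal state (here a single arousal value) alters both how often and how strongly the robot moves, which is one simple way a pulse signal could be adapted to context.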

    Interim research assessment 2003-2005 - Computer Science

    This report primarily serves as a source of information for the 2007 Interim Research Assessment Committee for Computer Science at the three technical universities in the Netherlands. It also provides information for others interested in our research activities.

    The Application of Mixed Reality Within Civil Nuclear Manufacturing and Operational Environments

    This thesis documents the design and application of Mixed Reality (MR) within a nuclear manufacturing cell through the creation of a Digitally Assisted Assembly Cell (DAAC). The DAAC is a proof-of-concept system that combines full-body tracking within a room-sized environment with a bi-directional feedback mechanism, allowing communication between users within the Virtual Environment (VE) and a manufacturing cell. This supports training, remote assistance, delivery of work instructions, and data capture within a manufacturing cell. The research underpinning the DAAC encompasses four main areas: the nuclear industry, Virtual Reality (VR) and MR technology, MR within manufacturing, and finally the 4th Industrial Revolution (IR4.0). Using an array of Kinect sensors, the DAAC was designed to capture user movements within a real manufacturing cell; these can be transferred in real time to a VE, creating a digital twin of the real cell. Users can interact with each other via digital assets and laser pointers projected into the cell, accompanied by a built-in Voice over Internet Protocol (VoIP) system. This allows implicit knowledge to be captured from operators within the real manufacturing cell and transferred to future operators. Additionally, users can connect to the VE from anywhere in the world, so experts can communicate with the users in the real manufacturing cell and assist with their training. The human tracking data fills an identified gap in the IR4.0 network of Cyber Physical Systems (CPS) and could allow future optimisations within manufacturing systems, Material Resource Planning (MRP), and Enterprise Resource Planning (ERP). This project demonstrates how MR could prove valuable within nuclear manufacture. The DAAC is designed to be low cost, which it is hoped will allow its use by groups who have traditionally been priced out of MR technology.
    This could help Small to Medium Enterprises (SMEs) close the double digital divide between themselves and larger global corporations. For larger corporations, the low cost makes the system easier to roll out across the value chain. Skills developed in one area can also be transferred to others across the internet, as users in one manufacturing cell can watch and communicate with those in another. However, as a proof of concept, the DAAC is at Technology Readiness Level (TRL) five or six, and further testing is required to assess and improve the technology before its wider application. The work was patented in the UK (S. Reddish et al., 2017a), the US (S. Reddish et al., 2017b), and China (S. Reddish et al., 2017c). The patents are owned by Rolls-Royce and cover the methods of bi-directional feedback through which users can interact from the digital to the real world and vice versa. Stephen Reddish, Mixed Mode Realities in Nuclear Manufacturing. Keywords: Mixed Mode Reality, Virtual Reality, Augmented Reality, Nuclear, Manufacture, Digital Twin, Cyber Physical System
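Capturing a person with an array of sensors, as the DAAC does, requires bringing each sensor's joint estimates into one room coordinate frame before they can drive a digital twin. The sketch below assumes a per-sensor extrinsic calibration (a rotation about the vertical axis plus a translation) and fuses joints by a confidence-weighted average; the `SensorPose` type, the confidence values, and the fusion rule are hypothetical and are not taken from the DAAC or from any Kinect SDK.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Skeleton = Dict[str, Tuple[Vec3, float]]   # joint -> (position in sensor frame, confidence)

@dataclass
class SensorPose:
    """Hypothetical extrinsic calibration: yaw about the vertical y axis,
    then a translation (metres) into the shared room frame."""
    yaw_rad: float
    offset: Vec3

    def to_room(self, p: Vec3) -> Vec3:
        c, s = math.cos(self.yaw_rad), math.sin(self.yaw_rad)
        x, y, z = p
        ox, oy, oz = self.offset
        return (c * x + s * z + ox, y + oy, -s * x + c * z + oz)

def fuse_skeletons(views: List[Tuple[SensorPose, Skeleton]]) -> Dict[str, Vec3]:
    """Confidence-weighted average of each joint over all sensors that see it."""
    acc: Dict[str, List[float]] = {}         # joint -> [wx, wy, wz, total weight]
    for pose, skeleton in views:
        for joint, (p, conf) in skeleton.items():
            if conf <= 0:
                continue                     # joint not tracked by this sensor
            a = acc.setdefault(joint, [0.0, 0.0, 0.0, 0.0])
            for i, v in enumerate(pose.to_room(p)):
                a[i] += conf * v
            a[3] += conf
    return {j: (a[0] / a[3], a[1] / a[3], a[2] / a[3]) for j, a in acc.items()}

front = SensorPose(yaw_rad=0.0, offset=(0.0, 0.0, 0.0))
back = SensorPose(yaw_rad=math.pi, offset=(4.0, 0.0, 0.0))   # faces the opposite way
fused = fuse_skeletons([
    (front, {"head": ((1.0, 1.7, 2.0), 1.0), "hand_left": ((0.8, 1.0, 2.1), 0.5)}),
    (back,  {"head": ((3.0, 1.7, -2.0), 1.0), "hand_left": ((0.0, 0.0, 0.0), 0.0)}),
])
print({j: tuple(round(v, 3) for v in p) for j, p in fused.items()})
```

Both sensors agree on the head once their readings are transformed into room coordinates, while the hand seen by only one sensor is taken from that sensor alone; a real system would also need temporal smoothing and a proper calibration procedure.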

    Multi-Agent Systems

    A multi-agent system (MAS) is a system composed of multiple interacting intelligent agents. Multi-agent systems can be used to solve problems that are difficult or impossible for an individual agent or a monolithic system to solve. Agent systems are open and extensible systems that allow the deployment of autonomous and proactive software components. Multi-agent systems have been proposed and applied in several application domains.
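As one concrete example of agents jointly solving a problem that no single agent could handle alone, the sketch below implements a simplified version of the Contract Net Protocol, a classic MAS task-allocation scheme chosen here for illustration: a manager announces each task, capacity-limited agents bid with a cost, and the cheapest bidder wins. The one-dimensional cost model and the agent names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Agent:
    name: str
    position: float                 # 1-D location used to estimate cost (illustrative)
    capacity: int = 1               # how many more tasks the agent can accept
    assigned: List[str] = field(default_factory=list)

    def bid(self, task: str, location: float) -> Optional[float]:
        """Propose a cost for the announced task, or refuse with None."""
        if self.capacity <= 0:
            return None
        return abs(self.position - location)

    def award(self, task: str) -> None:
        self.assigned.append(task)
        self.capacity -= 1

def contract_net(tasks: Dict[str, float], agents: List[Agent]) -> Dict[str, Optional[str]]:
    """Manager loop: announce each task, collect bids, award to the cheapest bidder."""
    result: Dict[str, Optional[str]] = {}
    for task, location in tasks.items():
        bids = [(cost, a) for a in agents if (cost := a.bid(task, location)) is not None]
        if not bids:
            result[task] = None          # no agent can take the task
            continue
        _, winner = min(bids, key=lambda b: b[0])
        winner.award(task)
        result[task] = winner.name
    return result

agents = [Agent("a", position=0.0), Agent("b", position=10.0)]
print(contract_net({"t1": 1.0, "t2": 2.0, "t3": 5.0}, agents))
# {'t1': 'a', 't2': 'b', 't3': None}
```

Agent `a` is closest to both `t1` and `t2`, but once its capacity is used up it refuses further bids, so the work is spread across the system; the third task shows the case where no agent can help.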

    Proceedings of the International Workshop "Innovation Information Technologies: Theory and Practice": Dresden, Germany, September 6-10, 2010

    This International Workshop is a high-quality seminar providing a forum for the exchange of scientific achievements between research communities of different universities and research institutes in the area of innovation information technologies. It continues the Russian-German Workshops previously organized by the universities in Dresden, Karlsruhe, and Ufa. The workshop was arranged in 9 sessions covering the major topics: Modern Trends in Information Technology, Knowledge Based Systems and Semantic Modelling, Software Technology and High Performance Computing, Geo-Information Systems and Virtual Reality, System and Process Engineering, Process Control and Management, and Corporate Information Systems.