1,054 research outputs found

    Developing Intelligent MultiMedia applications


    Emergence of articulatory-acoustic systems from deictic interaction games in a "vocalize to localize" framework

    Since the 1970s and Lindblom's proposal to "derive language from non-language", phoneticians have developed a number of "substance-based" theories. The starting point is Lindblom's Dispersion Theory and Stevens's Quantal Theory, which opened the way to a rich tradition of work attempting to determine, and possibly model, how phonological systems could be shaped by the perceptuo-motor substance of speech communication. These works seek to derive the shapes of human languages from constraints arising from the perceptual (auditory and perhaps visual) and motor (articulatory and cognitive) properties of the speech communication system: we call them "Morphogenesis Theories". More recently, a number of proposals have been introduced to connect pre-linguistic primate abilities (such as vocalization, gestures, mastication, or deixis) to human language. For instance, in the "Vocalize-to-Localize" framework that we adopt in the present work (Abry et al., 2004), human language is supposed to derive from a precursor deictic function: language could initially have provided an evolutionary development of the ability to "show with the voice". We call this type of theory "Origins Theories". We propose that the principles of Morphogenesis Theories (such as dispersion principles or the quantal nature of speech) can be incorporated into, and to a certain extent derived from, Origins Theories. While Morphogenesis Theories raise questions such as "why are vowel systems shaped the way they are?" and answer that it is to increase auditory dispersion in order to prevent confusion between vowels, we ask questions such as "why do humans attempt to prevent confusion between percepts?" and answer that it could be to "show with the voice", that is, to improve the pre-linguistic deictic function. In this paper, we present a computational Bayesian model incorporating the Dispersion and Quantal Theories of speech sounds inside the Vocalize-to-Localize framework, and show how realistic simulations of vowel systems can emerge from this model.
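    The authors' Bayesian model is not reproduced here, but the dispersion principle it builds on can be illustrated with a minimal, self-contained sketch in the spirit of Liljencrants and Lindblom: place a few vowel prototypes in a normalized (F1, F2) space and hill-climb toward a configuration that maximizes mutual perceptual distance. All parameter values below (the unit square, step size, energy function) are illustrative assumptions, not the paper's model.

```python
import random

def dispersion_energy(vowels):
    """Sum of inverse squared distances between vowel prototypes.
    Lower energy means a more dispersed (less confusable) system."""
    e = 0.0
    for i in range(len(vowels)):
        for j in range(i + 1, len(vowels)):
            d2 = sum((a - b) ** 2 for a, b in zip(vowels[i], vowels[j]))
            e += 1.0 / max(d2, 1e-9)
    return e

def simulate_vowel_system(n_vowels=5, steps=20000, seed=0):
    """Hill-climb n_vowels points in a unit (F1, F2) square toward
    maximal dispersion. Real models use auditory (Bark) distances;
    the unit square is an illustrative simplification."""
    rng = random.Random(seed)
    vowels = [[rng.random(), rng.random()] for _ in range(n_vowels)]
    energy = dispersion_energy(vowels)
    for _ in range(steps):
        i = rng.randrange(n_vowels)
        old = vowels[i][:]
        # propose a small Gaussian perturbation, clipped to the space
        vowels[i] = [min(1.0, max(0.0, c + rng.gauss(0.0, 0.05))) for c in old]
        new_energy = dispersion_energy(vowels)
        if new_energy < energy:
            energy = new_energy   # keep the more dispersed configuration
        else:
            vowels[i] = old       # revert the move
    return vowels

if __name__ == "__main__":
    for f1, f2 in simulate_vowel_system():
        print(round(f1, 2), round(f2, 2))
```

    With five vowels this crude search typically pushes prototypes toward the corners and centre of the space, echoing the symmetric /i a u/-like systems that the paper derives from its richer Bayesian formulation.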

    GestUI: A Model-driven Method and Tool for Including Gesture-based Interaction in User Interfaces

    Among the technological advances in touch-based devices, gesture-based interaction has become a prevalent feature in many application domains, and information systems are starting to explore this type of interaction. As a result, gesture specifications are currently hard-coded by developers at the source-code level, which hinders their reusability and portability. Similarly, defining new gestures that reflect user requirements is a complex process. This paper describes a model-driven approach to including gesture-based interaction in desktop information systems. It incorporates a tool prototype that captures user-sketched multi-stroke gestures, transforms them into a model, and automatically generates both the gesture catalogue for gesture-based interaction technologies and the source code of gesture-based user interfaces. We demonstrate our approach in several applications ranging from CASE tools to form-based information systems.

    This work was supported by SENESCYT and Universidad de Cuenca (Ecuador) and received financial support from Generalitat Valenciana under project IDEO (PROMETEOII/2014/039).

    Parra-González, L.O.; España Cubillo, S.; Pastor López, O. (2016). GestUI: A Model-driven Method and Tool for Including Gesture-based Interaction in User Interfaces. Complex Systems Informatics and Modeling Quarterly, 6:73-92. https://doi.org/10.7250/csimq.2016-6.05
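    The abstract does not describe GestUI's generated recognizers at code level; purely as an illustration of the kind of stroke matching such a tool might generate, here is a minimal template matcher in the style of $1-family recognizers (resample, normalize, compare by mean point distance). Every function name here is a hypothetical sketch, not GestUI's actual API.

```python
import math

def resample(points, n=32):
    """Resample a stroke (list of (x, y) points) to n equidistant points."""
    if len(points) < 2:
        return list(points) * n if points else []
    segs = [math.dist(points[i - 1], points[i]) for i in range(1, len(points))]
    total = sum(segs)
    if total == 0:
        return [points[0]] * n
    interval = total / (n - 1)
    out, acc, prev, i = [points[0]], 0.0, points[0], 1
    while len(out) < n and i < len(points):
        d = math.dist(prev, points[i])
        if acc + d >= interval:
            t = (interval - acc) / d
            prev = (prev[0] + t * (points[i][0] - prev[0]),
                    prev[1] + t * (points[i][1] - prev[1]))
            out.append(prev)
            acc = 0.0
        else:
            acc += d
            prev = points[i]
            i += 1
    while len(out) < n:          # guard against floating-point shortfall
        out.append(points[-1])
    return out

def normalize(points):
    """Translate to the centroid and scale to a unit bounding box."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    pts = [(x - cx, y - cy) for x, y in points]
    w = max(abs(x) for x, _ in pts) or 1.0
    h = max(abs(y) for _, y in pts) or 1.0
    return [(x / w, y / h) for x, y in pts]

def match(stroke, templates):
    """Return (name, distance) of the closest template by mean point distance."""
    s = normalize(resample(stroke))
    best, best_d = None, float("inf")
    for name, tpl in templates.items():
        t = normalize(resample(tpl))
        d = sum(math.dist(a, b) for a, b in zip(s, t)) / len(s)
        if d < best_d:
            best, best_d = name, d
    return best, best_d
```

    A generated gesture catalogue would then reduce to a dictionary of named templates passed to match(); rotation invariance and multi-stroke joining, which production recognizers handle, are omitted for brevity.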

    SiAM-dp : an open development platform for massively multimodal dialogue systems in cyber-physical environments

    Cyber-physical environments enhance natural environments of daily life such as homes, factories, offices, and cars by connecting the cybernetic world of computers and communication with the real physical world. While cyber-physical environments will, under the keyword Industrie 4.0, play a significant role in the next industrial revolution, they will also find their way into homes, offices, workshops, and numerous other areas. In this new world, classical interaction concepts, in which users interact exclusively with a single stationary device, PC, or smartphone, become less dominant and make room for new forms of interaction between humans and the environment itself. Furthermore, new technologies and a growing spectrum of applicable modalities broaden the possibilities for interaction designers to include more natural and intuitive non-verbal and verbal communication. The dynamic character of a cyber-physical environment and the mobility of its users confront developers with the challenge of building systems that are flexible with respect to the connected devices and modalities. This also opens up opportunities for cross-modal interaction that go beyond the dual-modality interaction common today. This thesis addresses the support of application developers with a platform for the declarative, model-based development of multimodal dialogue applications, with a focus on distributed input and output devices in cyber-physical environments. The main contributions can be divided into three parts:
    - Models and strategies for the specification of dialogue applications in a declarative development approach. This includes models for defining project resources, dialogue behaviour, speech recognition grammars, graphical user interfaces, and mapping rules that convert device-specific representations of input and output descriptions into a common representation language (sketched below).
    - A runtime platform that provides a flexible and extensible architecture for the easy integration of new devices and components. The platform realises concepts and strategies of multimodal human-computer interaction and is the basis for full-fledged multimodal dialogue applications for arbitrary device setups, domains, and scenarios.
    - A software development toolkit, integrated into the Eclipse rich client platform, that provides wizards and editors for creating and editing multimodal dialogue applications.
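    The mapping rules mentioned in the first contribution can be pictured with a small, language-agnostic sketch: device-specific raw events are converted by registered rules into a common input representation that the dialogue platform consumes. SiAM-dp itself is built on Eclipse/EMF models; all class and field names below are hypothetical illustrations, not the platform's API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class InputEvent:
    """Hypothetical common representation for input from any modality."""
    modality: str                  # e.g. "speech", "gesture", "touch"
    intent: str                    # normalized intent name
    payload: Dict[str, Any] = field(default_factory=dict)
    timestamp: float = 0.0

class MappingRegistry:
    """Holds mapping rules: one per device type, raw dict -> InputEvent."""

    def __init__(self) -> None:
        self._rules: Dict[str, Callable[[dict], InputEvent]] = {}

    def register(self, device_type: str,
                 rule: Callable[[dict], InputEvent]) -> None:
        self._rules[device_type] = rule

    def map(self, device_type: str, raw: dict) -> InputEvent:
        if device_type not in self._rules:
            raise KeyError(f"no mapping rule for device {device_type!r}")
        return self._rules[device_type](raw)

registry = MappingRegistry()
# Example rule for a speech recognizer that delivers raw dicts such as
# {"utterance": "open the door", "conf": 0.92, "t": 12.3}
registry.register("asr", lambda raw: InputEvent(
    modality="speech",
    intent=raw["utterance"].split()[0].lower(),   # naive intent extraction
    payload={"text": raw["utterance"], "confidence": raw["conf"]},
    timestamp=raw.get("t", 0.0),
))

print(registry.map("asr", {"utterance": "open the door", "conf": 0.92, "t": 12.3}))
```

    New devices then plug in by contributing a rule rather than by changes to the dialogue logic, which is the flexibility the thesis attributes to its declarative mapping models.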

    Multi-modal Task Instructions to Robots by Naive Users

    This thesis presents a theoretical framework for the design of user-programmable robots. The objective of the work is to investigate multi-modal, unconstrained natural instructions given to robots in order to design a learning robot. A corpus-centred approach is used to design an agent that can reason, learn, and interact with a human in a natural, unconstrained way. The corpus-centred design approach is formalised and developed in detail: the developer records humans during interaction, analyses the recordings to find instruction primitives, and then implements these in a robot. The focus of this work is on how to combine speech and gesture using rules extracted from the analysis of a corpus. A multi-modal integration algorithm is presented that uses timing and semantics to group, match, and unify gesture and language; the algorithm achieves correct pairings on the corpus and initiates questions to the user in ambiguous cases or when information is missing (see the sketch below). The domain of card games was investigated because of its variety of games, which are rich in rules and contain sequences. A further focus of the work is the translation of rule-based instructions; most multi-modal interfaces to date have only considered sequential instructions. A combination of frame-based reasoning, a knowledge base organised as an ontology, and a problem-solver engine is used to store these rules, since understanding rule instructions, which contain conditional and imaginary situations, requires an agent with complex reasoning capabilities. A test system for the agent implementation is also described, together with tests that confirm the implementation by playing back the corpus. Furthermore, deployment test results with the implemented agent and human subjects are presented and discussed. The tests showed that the rate of errors caused by sentences not being covered by the grammar does not decrease at an acceptable rate when new grammar is introduced; this was particularly the case for complex verbal rule instructions, which can be expressed in a large variety of ways.
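    The integration algorithm itself is corpus-derived; its core idea, grouping speech references and gestures by temporal proximity, unifying them by semantic type compatibility, and asking the user a question when the pairing is ambiguous or missing, can be sketched as follows. The time window, field names, and type labels are illustrative assumptions, not the thesis's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class SpeechRef:
    word: str       # deictic word, e.g. "this"
    sem_type: str   # semantic type the word expects, e.g. "card"
    time: float     # seconds into the interaction

@dataclass
class Gesture:
    target: str     # object the gesture points at
    sem_type: str   # semantic type of that object
    time: float

def integrate(refs: List[SpeechRef], gestures: List[Gesture],
              window: float = 1.5) -> List[Tuple[str, Optional[str]]]:
    """Pair each speech reference with a temporally close, semantically
    compatible gesture. Zero or several candidates yield None, signalling
    that the agent should ask the user a clarification question."""
    pairs: List[Tuple[str, Optional[str]]] = []
    free = list(gestures)
    for ref in refs:
        candidates = [g for g in free
                      if g.sem_type == ref.sem_type
                      and abs(g.time - ref.time) <= window]
        if len(candidates) == 1:
            g = candidates[0]
            free.remove(g)
            pairs.append((ref.word, g.target))
        else:
            pairs.append((ref.word, None))   # ambiguous or missing: ask
    return pairs

refs = [SpeechRef("this", "card", 1.0), SpeechRef("there", "pile", 2.0)]
gestures = [Gesture("ace_of_spades", "card", 1.2), Gesture("discard", "pile", 2.3)]
print(integrate(refs, gestures))  # [('this', 'ace_of_spades'), ('there', 'discard')]
```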

    Machine Learning and Cognitive Robotics: Opportunities and Challenges

    This chapter reviews recent developments in cognitive robotics and the challenges and opportunities brought by new developments in machine learning (ML) and information and communication technology (ICT), with a view to stimulating further research. To draw insights into current trends and challenges, a review of algorithms and systems is undertaken, and a case study involving human activity recognition, together with face and emotion recognition, is presented. Open research questions and future trends are then discussed.