94 research outputs found

    Introduction for speech and language for interactive robots

    This special issue includes research articles that apply spoken language processing to robots that interact with human users through speech, possibly combined with other modalities. Robots that can listen to human speech, understand it, interact according to the conveyed meaning, and respond represent major research and technological challenges. Their common aim is to equip robots with natural interaction abilities. However, robotics and spoken language processing are typically studied within their respective communities, with limited communication across disciplinary boundaries. The articles in this special issue are examples that address the need for an increased multidisciplinary exchange of ideas.

    Design of Participatory Virtual Reality System for visualizing an intelligent adaptive cyberspace

    The concept of 'Virtual Intelligence' is proposed as an intelligent adaptive interaction between the simulated 3-D dynamic environment and the 3-D dynamic virtual image of the participant in the cyberspace created by a virtual reality system. A system design for such interaction is realised utilising only a stereoscopic optical head-mounted LCD display with an ultrasonic head tracker, a pair of gesture-controlled fibre optic gloves, and a speech recognition and synthesiser device, which are all connected to a Pentium computer. A 3-D dynamic environment is created by physically-based modelling and rendering in real time and by modification of existing object description files using a fractals-based Morph software. It is supported by an extensive library of audio and video functions, and functions characterising the dynamics of various objects. The multimedia database files so created are retrieved or manipulated by intelligent hypermedia navigation and intelligent integration with existing information. Speech commands control the dynamics of the environment and the corresponding multimedia databases. The concept of a virtual camera developed by Zeltzer as well as Thalmann and Thalmann, as automated by Noma and Okada, can be applied for dynamically relating the orientation and actions of the virtual image of the participant with respect to the simulated environment. Utilising the fibre optic gloves, gesture-based commands are given by the participant for controlling his 3-D virtual image using a gesture language. Optimal estimation methods and dataflow techniques enable synchronisation between the commands of the participant expressed through the gesture language and his 3-D dynamic virtual image. Utilising a framework, developed earlier by the author, for adaptive computational control of distributed multimedia systems, the data access required for the environment as well as the virtual image of the participant can be endowed with adaptive capability.

    Developing Intelligent MultiMedia applications

    Interaction Design for Digital Musical Instruments

    The thesis aims to elucidate the process of designing interactive systems for musical performance that combine software and hardware in an intuitive and elegant fashion. The original contribution to knowledge consists of: (1) a critical assessment of recent trends in digital musical instrument design, (2) a descriptive model of interaction design for the digital musician and (3) a highly customisable multi-touch performance system that was designed in accordance with the model. Digital musical instruments are composed of a separate control interface and a sound generation system that exchange information. When designing the way in which a digital musical instrument responds to the actions of a performer, we are creating a layer of interactive behaviour that is abstracted from the physical controls. Often, the structure of this layer depends heavily upon: (1) the accepted design conventions of the hardware in use, (2) established musical systems, acoustic or digital, and (3) the physical configuration of the hardware devices and the grouping of controls that such configuration suggests. This thesis proposes an alternate way to approach the design of digital musical instrument behaviour: examining the implicit characteristics of its composite devices. When we separate the conversational ability of a particular sensor type from its hardware body, we can look in a new way at the actual communication tools at the heart of the device. We can subsequently combine these separate pieces using a series of generic interaction strategies in order to create rich interactive experiences that are not immediately obvious or directly inspired by the physical properties of the hardware. This research ultimately aims to enhance and clarify the existing toolkit of interaction design for the digital musician.

    Cybernetics in Music

    This thesis examines the use of cybernetics (the science of systems) in music by tracing an obscured history. The author postulates that cybernetic music may be thought of as genera of music in its own right, whose practitioners share a common ontology and set of working practices that differ distinctly from traditional approaches to composing electronic music. Ultimately, this critical examination of cybernetics in music provides the framework for a series of original compositions and the foundation for the further study of cybernetic music.

    Altering speech synthesis prosody through real time natural gestural control

    A significant amount of research has been and continues to be undertaken into generating expressive prosody within speech synthesis. Separately, recent developments in HMM-based synthesis (specifically pHTS, developed at University of Mons) provide a platform for reactive speech synthesis, able to react in real time to surroundings or user interaction. Considering both of these elements, this project explores whether it is possible to generate superior prosody in a speech synthesis system, using natural gestural controls, in real time. Building on a previous piece of work undertaken at The University of Edinburgh, a system is constructed in which a user may apply a variety of prosodic effects in real time through natural gestures, recognised by a Microsoft Kinect sensor. Gestures are recognised and prosodic adjustments made through a series of hand-crafted rules (based on data gathered from preliminary experiments), though machine learning techniques are also considered within this project and recommended for future iterations of the work. Two sets of formal experiments are implemented, both of which suggest that, with further development, the system may work successfully in a real-world environment. Firstly, user tests show that, with practice, subjects can learn to control the device successfully, adding prosodic effects to the intended words in the majority of cases. Results are likely to improve further as buffering issues are resolved. Secondly, listening tests show that the prosodic effects currently implemented significantly increase perceived naturalness, and in some cases are able to alter the semantic perception of a sentence in an intended way. Alongside this paper, a demonstration video of the project may be found on the accompanying CD, or online at http://tinyurl.com/msc-synthesis. The reader is advised to view this demonstration as a way of understanding how the system functions and sounds in action.

    Personalising synthetic voices for individuals with severe speech impairment.

    Speech technology can help individuals with speech disorders to interact more easily. Many individuals with severe speech impairment, due to conditions such as Parkinson's disease or motor neurone disease, use voice output communication aids (VOCAs), which have synthesised or pre-recorded voice output. This voice output effectively becomes the voice of the individual and should therefore represent the user accurately. Currently available techniques for personalising speech synthesis require a large amount of data input, which is difficult to produce for individuals with severe speech impairment. These techniques also do not provide a solution for those individuals whose voices have begun to show the effects of dysarthria. The thesis shows that Hidden Markov Model (HMM)-based speech synthesis is a promising approach for 'voice banking' for individuals both before their condition causes deterioration of the speech and once deterioration has begun. Data input requirements for building personalised voices with this technique are investigated using human listener judgement evaluation. The evaluation shows that 100 sentences is the minimum required to build a voice that is significantly different from an average voice model and shows some resemblance to the target speaker. This amount depends on the speaker and the average model used. A neural network analysis trained on extracted acoustic features revealed that spectral features had the most influence in predicting human listener judgements of similarity of synthesised speech to a target speaker. Accuracy of prediction significantly improves if other acoustic features are introduced and combined non-linearly. These results were used to inform the reconstruction of personalised synthetic voices for speakers whose voices had begun to show the effects of their conditions. Using HMM-based synthesis, personalised synthetic voices were built using dysarthric speech, showing similarity to target speakers without recreating the impairment in the synthesised speech output.

    Portfolio of Electroacoustic Compositions with Commentaries

    This portfolio consists of electroacoustic compositions which were primarily realised through the use of corporeally informed compositional practices. The manner in which a composer interacts with the compositional tools and musical materials at their disposal is a defining factor in the creation of musical works. Although the use of computers in the practice of electroacoustic composition has extended the range of sonic possibilities afforded to composers, it has also had a negative impact on the level of physical interaction that composers have with these musical materials. This thesis is an investigation into the use of mediation technologies with the aim of circumventing issues relating to the physical performance of electroacoustic music. This line of inquiry has led me to experiment with embedded computers, wearable technologies, and a range of sensors. The specific tools that were used in the creation of the pieces within this portfolio are examined in detail within this thesis. I also provide commentaries on and analyses of the eleven electroacoustic works which comprise this portfolio, describing the thought processes that led to their inception, the materials used in their creation, and the tools and techniques that I employed throughout the compositional process.