94 research outputs found
Introduction for speech and language for interactive robots
This special issue includes research articles that apply spoken language processing to robots that interact with human users through speech, possibly combined with other modalities. Building robots that can listen to human speech, understand it, interact according to the conveyed meaning, and respond poses major research and technological challenges. The common aim of these articles is to equip robots with natural interaction abilities. However, robotics and spoken language processing are typically studied within their respective communities, with limited communication across disciplinary boundaries. The articles in this special issue represent examples that address the need for an increased multidisciplinary exchange of ideas.
Design of Participatory Virtual Reality System for visualizing an intelligent adaptive cyberspace
The concept of 'Virtual Intelligence' is proposed as an intelligent adaptive interaction between the simulated 3-D dynamic environment and the 3-D dynamic virtual image of the participant in the cyberspace created by a virtual reality system. A system design for such interaction is realised utilising only a stereoscopic optical head-mounted LCD display with an ultrasonic head tracker, a pair of gesture-controlled fibre optic gloves, and a speech recognition and synthesiser device, which are all connected to a Pentium computer. A 3-D dynamic environment is created by physically-based modelling and rendering in real time, and by modification of existing object description files by a fractals-based Morph software. It is supported by an extensive library of audio and video functions, and functions characterising the dynamics of various objects. The multimedia database files so created are retrieved or manipulated by intelligent hypermedia navigation and intelligent integration with existing information. Speech commands control the dynamics of the environment and the corresponding multimedia databases. The concept of a virtual camera developed by Zeltzer as well as Thalmann and Thalmann, as automated by Noma and Okada, can be applied for dynamically relating the orientation and actions of the virtual image of the participant with respect to the simulated environment. Utilising the fibre optic gloves, gesture-based commands are given by the participant for controlling his 3-D virtual image using a gesture language. Optimal estimation methods and dataflow techniques enable synchronisation between the commands of the participant expressed through the gesture language and his 3-D dynamic virtual image. Utilising a framework, developed earlier by the author, for adaptive computational control of distributed multimedia systems, the data access required for the environment as well as the virtual image of the participant can be endowed with adaptive capability.
Interaction Design for Digital Musical Instruments
The thesis aims to elucidate the process of designing interactive systems for musical performance that combine software and hardware in an intuitive and elegant fashion. The original contribution to knowledge consists of: (1) a critical assessment of recent trends in digital musical instrument design, (2) a descriptive model of interaction design for the digital musician and (3) a highly customisable multi-touch performance system that was designed in accordance with the model.
Digital musical instruments are composed of a separate control interface and a sound generation system that exchange information. When designing the way in which a digital musical instrument responds to the actions of a performer, we are creating a layer of interactive behaviour that is abstracted from the physical controls. Often, the structure of this layer depends heavily upon:
1. The accepted design conventions of the hardware in use
2. Established musical systems, acoustic or digital
3. The physical configuration of the hardware devices and the grouping of controls that such configuration suggests
This thesis proposes an alternate way to approach the design of digital musical instrument behaviour – examining the implicit characteristics of its composite devices. When we separate the conversational ability of a particular sensor type from its hardware body, we can look in a new way at the actual communication tools at the heart of the device. We can subsequently combine these separate pieces using a series of generic interaction strategies in order to create rich interactive experiences that are not immediately obvious or directly inspired by the physical properties of the hardware.
This research ultimately aims to enhance and clarify the existing toolkit of interaction design for the digital musician.
Cybernetics in Music
This thesis examines the use of cybernetics (the science of systems) in music, through the tracing of an obscured history. The author postulates that cybernetic music may be thought of as a genre of music in its own right, whose practitioners share a common ontology and set of working practices that distinctly differ from traditional approaches to composing electronic music. Ultimately, this critical examination of cybernetics in music provides the framework for a series of original compositions and the foundation for further study of cybernetic music.
Altering speech synthesis prosody through real time natural gestural control
A significant amount of research has been, and continues to be, undertaken into generating expressive prosody within speech synthesis. Separately, recent developments in HMM-based synthesis (specifically pHTS, developed at the University of Mons) provide a platform for reactive speech synthesis, able to react in real time to its surroundings or to user interaction.
Considering both of these elements, this project explores whether it is possible to generate superior prosody in a speech synthesis system, using natural gestural controls, in real time. Building on a previous piece of work undertaken at The University of Edinburgh, a system is constructed in which a user may apply a variety of prosodic effects in real time through natural gestures, recognised by a Microsoft Kinect sensor. Gestures are recognised and prosodic adjustments made through a series of hand-crafted rules (based on data gathered from preliminary experiments), though machine learning techniques are also considered within this project and recommended for future iterations of the work.
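A hand-crafted rule set of the kind described might be sketched as follows. This is a hypothetical illustration only, not the thesis's actual implementation: the thresholds, field names, and the choice of hand height and velocity as inputs are all assumptions.

```python
# Hypothetical sketch of hand-crafted gesture-to-prosody rules.
# Thresholds and names are illustrative assumptions, not taken from the thesis.
from dataclasses import dataclass


@dataclass
class ProsodyAdjustment:
    pitch_scale: float = 1.0     # multiplier applied to F0
    duration_scale: float = 1.0  # multiplier applied to phone durations


def gesture_to_prosody(hand_y: float, hand_velocity: float) -> ProsodyAdjustment:
    """Map normalised hand-tracking values (0..1) to prosodic effects."""
    adj = ProsodyAdjustment()
    if hand_y > 0.75:              # hand raised high: emphasise with higher pitch
        adj.pitch_scale = 1.3
    elif hand_y < 0.25:            # hand lowered: de-emphasise
        adj.pitch_scale = 0.85
    if abs(hand_velocity) > 0.5:   # fast movement: speed the speech up
        adj.duration_scale = 0.8
    return adj
```

In a reactive synthesiser such as pHTS, adjustments like these would be applied to the prosodic parameters of upcoming speech units as they are generated, which is what makes real-time gestural control possible.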
Two sets of formal experiments are implemented, both of which suggest that, under further development, the system may work successfully in a real-world environment. Firstly, user tests show that subjects can learn to control the device successfully, adding prosodic effects to the intended words in the majority of cases with practice; results are likely to improve further as buffering issues are resolved. Secondly, listening tests show that the prosodic effects currently implemented significantly increase perceived naturalness, and in some cases can alter the semantic perception of a sentence in an intended way.
Alongside this paper, a demonstration video of the project may be found on the accompanying CD, or online at http://tinyurl.com/msc-synthesis. The reader is advised to view this demonstration as a way of understanding how the system functions and sounds in action.
Personalising synthetic voices for individuals with severe speech impairment
Speech technology can help individuals with speech disorders to interact more easily. Many individuals with severe speech impairment, due to conditions such as Parkinson's disease or motor neurone disease, use voice output communication aids (VOCAs), which have synthesised or pre-recorded voice output. This voice output effectively becomes the voice of the individual and should therefore represent the user accurately.
Currently available techniques for personalising speech synthesis require a large amount of input data, which is difficult for individuals with severe speech impairment to produce. These techniques also do not provide a solution for those individuals whose voices have already begun to show the effects of dysarthria.
The thesis shows that Hidden Markov Model (HMM)-based speech synthesis is a promising approach to 'voice banking' for individuals both before their condition causes deterioration of the speech and once deterioration has begun. Data input requirements for building personalised voices with this technique are investigated using human listener judgement evaluation. The results show that 100 sentences are the minimum required to build a voice that is significantly different from an average voice model and shows some resemblance to the target speaker, although this amount depends on the speaker and the average model used.
An analysis using a neural network trained on extracted acoustic features revealed that spectral features had the most influence in predicting human listener judgements of the similarity of synthesised speech to a target speaker. Accuracy of prediction improves significantly if other acoustic features are introduced and combined non-linearly.
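The benefit of combining features non-linearly can be illustrated with a small synthetic sketch. Everything here is invented for illustration (the feature set, the coefficients, and the simulated listener scores are assumptions, not the thesis's data): a purely linear model misses an interaction between two acoustic distances, while adding their product as a feature recovers it.

```python
# Illustration (synthetic data, not the thesis's experiment): predicting
# listener similarity scores from acoustic distance features, with and
# without a non-linear combination of features.
import numpy as np

rng = np.random.default_rng(0)
# Assumed features per utterance: [spectral dist, F0 dist, duration dist]
X = rng.uniform(size=(200, 3))
# Simulated listener scores: spectral distance dominates, plus an
# F0-duration interaction that no linear combination can capture.
y = 5.0 - 1.5 * X[:, 0] - 4.0 * X[:, 1] * X[:, 2]


def r2(features, targets):
    """Coefficient of determination of a least-squares linear fit."""
    A = np.c_[np.ones(len(features)), features]
    coef, *_ = np.linalg.lstsq(A, targets, rcond=None)
    resid = targets - A @ coef
    return 1.0 - np.sum(resid**2) / np.sum((targets - targets.mean()) ** 2)


print("linear features only:", round(r2(X, y), 3))
X_nl = np.c_[X, X[:, 1] * X[:, 2]]  # add the non-linear interaction feature
print("with interaction term:", round(r2(X_nl, y), 3))  # fits almost exactly
```

A neural network with a hidden layer, as used in the thesis, learns such interactions from data rather than requiring them to be specified by hand; the sketch simply makes the gap between the two feature sets visible.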
These results were used to inform the reconstruction of personalised synthetic voices for speakers whose voices had begun to show the effects of their conditions. Using HMM-based synthesis, personalised synthetic voices were built from dysarthric speech, showing similarity to target speakers without recreating the impairment in the synthesised speech output.
Portfolio of Electroacoustic Compositions with Commentaries
This portfolio consists of electroacoustic compositions which were primarily realised through the use of corporeally informed compositional practices. The manner in which a composer interacts with the compositional tools and musical materials at their disposal is a defining factor in the creation of musical works. Although the use of computers in the practice of electroacoustic composition has extended the range of sonic possibilities afforded to composers, it has also had a negative impact on the level of physical interaction that composers have with these musical materials. This thesis is an investigation into the use of mediation technologies with the aim of circumventing issues relating to the physical performance of electroacoustic music.
This line of inquiry has led me to experiment with embedded computers, wearable technologies, and a range of sensors. The specific tools that were used in the creation of the pieces within this portfolio are examined in detail within this thesis. I also provide commentaries on, and analyses of, the eleven electroacoustic works which comprise this portfolio, describing the thought processes that led to their inception, the materials used in their creation, and the tools and techniques that I employed throughout the compositional process.
- …