
    MAUS Goes Iterative

    In this paper we describe further developments of the MAUS system and announce a freeware software package that may be downloaded from the 'Bavarian Archive for Speech Signals' (BAS) web site. The quality of the MAUS output can be considerably improved by using an iterative technique. In this mode MAUS calculates a first pass through all the target speech material using the standard speaker-independent acoustical models of the target language. The segmented and labelled speech data are then used to re-estimate the acoustical models, and the MAUS procedure is applied to the speech data again using these speaker-dependent models. The last two steps are repeated iteratively until the segmentation converges. The paper describes the general algorithm, the German benchmark for evaluating the method, as well as some experiments on German target speakers.
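
    As a rough illustration of the iterative scheme described above, the following Python sketch shows the loop structure only; `segment` and `reestimate` are hypothetical callables standing in for MAUS forced alignment and acoustic-model re-estimation, not part of any published MAUS API.

```python
# Minimal sketch of the iterative segmentation loop, assuming the
# caller supplies `segment` (forced alignment) and `reestimate`
# (acoustic-model re-estimation) as callables; both are hypothetical
# stand-ins, not actual MAUS functions.
def iterative_maus(speech, si_models, segment, reestimate, max_iters=10):
    # First pass: standard speaker-independent models of the target language.
    segmentation = segment(speech, si_models)
    models = si_models
    for _ in range(max_iters):
        # Re-estimate speaker-dependent models from the current
        # segmentation and labelling, then segment again.
        models = reestimate(models, speech, segmentation)
        new_segmentation = segment(speech, models)
        if new_segmentation == segmentation:  # segmentation converged
            break
        segmentation = new_segmentation
    return segmentation, models
```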

    Using qualitative research methods to inform user centred design of an innovative assistive technology device

    The SPECS project aims to develop a speech-driven device that will allow the home environment to be controlled (for example turning the lights or television on or off). The device developed will be targeted at older people and people with disabilities and will be sensitive to disordered speech. Current environmental control systems (ECS) work using either a switch interface or speech recognition software that does not handle disordered speech well. Switch-interface systems are often slow and complicated to use, and the uptake of the available speech recognition system has been poor. A significant proportion of people requiring electronic assistive technology (EAT) have dysarthria, a motor speech disorder, associated with their physical disability. Speech control of EAT is seen as desirable for such people, but machine recognition of dysarthric speech is a difficult problem due to the variability of their articulatory output. Other work on large vocabulary adaptive speech recognition systems and speaker dependent recognisers has not provided a solution for severely dysarthric speech. Building on the work of the STARDUST project, our goal is to develop and implement speech recognition as a viable control interface for people with severe physical disability and severe dysarthria. The SPECS project is funded by the Health Technology Devices Programme of the Department of Health.

    VOICE MODIFICATION USING DIGITAL TECHNOLOGY

    This report describes voice modification using digital technology, implemented with MATLAB software and an additional hardware prototype. The objective of the voice modification is to map a source voice signal into another, target voice signal. A further objective is to create a new voice signal from a given source signal. These modifications are carried out by altering the features of the voice waveform. With such modification, it is expected that a user is able to mimic another person's voice. In addition, the work can be used as a reference and guidance for voice conversion. For the modification algorithm, an optimization technique for the coding is applied to suit the objective of the project [7]. The project's major activities are to develop MATLAB programs that enable speech signal recording, analysis, synthesis, modification, and conversion. Recording is a simple procedure that can be performed on any computer with a microphone. Analysis of the speech signal is an essential step in which the features of the speech waveform are calculated. Here, we model a vocal tract, which acts as a filter for the excitation signal input; this filter is designed based on the features of the source speech waveform. The speech waveform parameters used in this analysis are the excitation source model, the vocal tract model, and a control model consisting of gain and pitch parameters. The synthesis procedure produces a synthetic voice, which is simply the rendering of the speech signal designed in the analysis part based on the source signal. The modification and conversion parts of the software perform the voice modification: the modification part enables the user to change the calculated speech waveform parameters and thus create a new voice, while the conversion part performs a mapping of the source speech signal to a target speech signal. In this work, hardware is also designed as an additional part to demonstrate the application; the prototype performs the conversion process by modulating the frequency of the input signal. Tests were carried out on both software and hardware. The software conversion design can convert any source voice signal to a target voice signal, but the output contains some noise. The hardware model can also modify any input voice signal to another form [14], although the modification is limited to seven types of output voices and noise is again noticeable. Since the source code of the voice conversion software is extensive, it is given in Volume 2, under the title Source Code of Voice Modification Software.
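
    The analysis/synthesis pipeline in this abstract (excitation source, vocal-tract filter, gain and pitch) matches the classic source-filter model. The report itself uses MATLAB; the sketch below is a Python/NumPy illustration of that general idea, not the project's code.

```python
# Source-filter sketch: estimate a vocal-tract filter from a speech
# frame (LPC, autocorrelation method), then resynthesize with an
# impulse-train excitation whose pitch and gain can be modified.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(frame, order=12):
    """Estimate all-pole vocal-tract filter coefficients for one frame
    via the autocorrelation method."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))   # denominator A(z) coefficients

def synthesize(a, pitch_period, gain, n):
    """Drive the filter 1/A(z) with an impulse-train excitation; pitch
    period and gain are the 'control model' knobs named above."""
    excitation = np.zeros(n)
    excitation[::pitch_period] = 1.0     # simple voiced excitation
    return lfilter([gain], a, excitation)

# Example: analyse a toy frame, then resynthesize with a new pitch.
frame = np.sin(2 * np.pi * 120 * np.arange(400) / 8000) \
        + 0.01 * np.random.randn(400)
a = lpc(frame)
modified = synthesize(a, pitch_period=60, gain=0.5, n=400)
```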

    LinguaTag: an Emotional Speech Analysis Application

    The analysis of speech, particularly for emotional content, is an open area of current research. Ongoing work has developed an emotional speech corpus for analysis and defined a vowel stress method by which this analysis may be performed. This paper documents the development of LinguaTag, an open source speech analysis application which implements the vowel stress emotional speech analysis method developed as part of research into the acoustic and linguistic correlates of emotional speech. The analysis output is contained within a file format combining SMIL and SSML markup tags, to facilitate search and retrieval within an emotional speech corpus database. In this manner, analysis performed using LinguaTag aims to combine acoustic, emotional and linguistic descriptors in a single metadata framework.
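
    The abstract does not specify LinguaTag's actual schema, so the following is a purely hypothetical sketch of what a single annotation mixing SMIL-style timing with SSML-style prosody descriptors might look like, generated here with Python's standard library; every tag and attribute is an assumption.

```python
# Hypothetical illustration only: one annotation element combining
# SMIL-style timing attributes with SSML-style prosody markup. The
# real LinguaTag format is not described in the abstract.
import xml.etree.ElementTree as ET

utterance = ET.Element("par")                    # SMIL parallel container
word = ET.SubElement(utterance, "prosody",       # SSML prosody descriptor
                     pitch="+15%", rate="slow")
word.set("begin", "0.42s")                       # SMIL timing attributes
word.set("end", "0.78s")
word.set("emotion", "sad")                       # made-up emotion label
word.text = "sorry"

print(ET.tostring(utterance, encoding="unicode"))
```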

    Development of a Voice-Controlled Human-Robot Interface

    The goal of this thesis is to develop a voice-controlled human-robot interface (HRI) which allows a person to control and communicate with a robot. Dragon NaturallySpeaking, a commercially available automatic speech recognition engine, was chosen for the development of the proposed HRI. To achieve this goal, the Dragon software is used to create custom commands (or macros) which must satisfy the tasks of (a) directly controlling the robot by voice, (b) writing a robot program by voice, and (c) developing an HRI which allows the human and robot to communicate with each other using speech. The key is to generate keystrokes upon recognizing the speech, using three types of macro: step-by-step, macro recorder, and advanced scripting. Experiments were conducted in three phases to test the functionality of the developed macros in accomplishing all three tasks. The results showed that the advanced scripting macro is the only type that works. It is also the most suitable for the task because it is quick and easy to create and can be used to develop flexible and natural voice commands. Since the output of a macro is a series of keystrokes, which forms the syntax of the robot program, macros developed with the Dragon software can be used to communicate with virtually any robot by adjusting the output keystrokes.
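
    The core idea here, recognized command in, robot-program keystrokes out, can be sketched as a simple lookup. The mapping below is written in Python rather than Dragon's own advanced-scripting language, and the command names and keystroke strings are invented examples, not the thesis's actual syntax.

```python
# Sketch of the voice-command -> keystroke mapping idea. Both the
# command phrases and the keystroke strings (a made-up robot-program
# syntax) are hypothetical examples.
COMMANDS = {
    "move forward": "MOVE 100,0,0{Enter}",
    "rotate left":  "ROT Z,-90{Enter}",
    "open gripper": "GRIP OPEN{Enter}",
}

def keystrokes_for(utterance: str) -> str:
    """Return the keystroke sequence a macro would emit for a
    recognized voice command; raises KeyError for unknown phrases."""
    return COMMANDS[utterance.lower()]

print(keystrokes_for("Move forward"))  # -> MOVE 100,0,0{Enter}
```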

    Design of a computerised treatment for short-term memory deficits in aphasia

    The treatment of auditory-verbal short-term memory (STM) deficits in aphasia is a growing avenue of research (Martin & Reilly, 2012; Murray, 2012). STM treatment requires precise timing, which is well suited to computerised delivery. We have designed software that provides STM treatment for aphasia. The treatment is based on matching listening span tasks (Howard & Franklin, 1990), aiming to improve the temporal maintenance of multi-word sequences (Salis, 2012). The person listens to pairs of word lists that differ in word order and decides whether the pairs are the same or different. This approach does not require speech output and is therefore suitable for persons with aphasia who have limited or no output. We describe the software and how reviews from clinicians shaped its design.
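
    As a toy illustration of the trial structure (not the project's software), the sketch below builds one listening-span pair: on 'different' trials, two list positions are swapped so the pair differs only in word order.

```python
# Toy sketch of one matching listening-span trial: a pair of word
# lists that are either identical or differ only in word order.
import random

def make_trial(words, span, different_prob=0.5):
    """Return (list_a, list_b, same) for one listening-span trial."""
    list_a = random.sample(words, span)
    list_b = list(list_a)
    same = random.random() >= different_prob
    if not same:
        i, j = random.sample(range(span), 2)   # swap two positions
        list_b[i], list_b[j] = list_b[j], list_b[i]
    return list_a, list_b, same

words = ["cat", "door", "lamp", "fish", "tree", "book"]
print(make_trial(words, span=3))
```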

    Effective Detection of Local Languages for Tourists Based on Surrounding Features

    The tourism industry is a trillion-dollar industry, with many governments investing heavily in making their countries attractive enough to entice potential visitors. People engage in tourism for different reasons, which may be business, educational, leisure, medical or ancestral. Communication between intending visitors and locals is essential, given the non-homogeneity that occurs across cultures and borders. In this paper, we focus on developing a cross-platform mobile application that listens to surrounding conversations, is able to pick out certain keywords, automatically switches to the local language of its location, and then offers translation capabilities to facilitate conversations. To implement this, we depend on the Google Translate API for the translation capabilities of the application, starting with the English language as our base language. To provide the input (speech) for translation, we rely solely on speech recognition, using the Speech-to-Text package available in Flutter. The translated output, with the correct pronunciation (and local accent), is produced with the Text-to-Speech package. If the application does not recognize any keywords, the local language can be determined using the geographical parameters of the user. Finally, we utilize the cross-platform competence of the Flutter software development kit and the Dart programming language to build the application.
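
    The keyword-then-geolocation decision described above can be sketched as follows. The app itself is built with Flutter/Dart; this Python version is only an illustration, and the keyword table, the `locale_language` lookup, and the coordinates are hypothetical placeholders (a real implementation would then call the Google Translate API with the detected language).

```python
# Control-flow sketch: pick the local language from spotted keywords,
# falling back to the user's geographical position when no keyword is
# recognized. All data below is made up for illustration.
KEYWORDS = {"bonjour": "fr", "hola": "es", "ciao": "it"}

def detect_language(transcript: str, lat: float, lon: float) -> str:
    """Return a language code from keywords, else from geolocation."""
    for word in transcript.lower().split():
        if word in KEYWORDS:
            return KEYWORDS[word]
    return locale_language(lat, lon)  # geolocation fallback (stub)

def locale_language(lat: float, lon: float) -> str:
    # Placeholder: a real app would reverse-geocode the coordinates.
    return "fr" if 42 < lat < 51 and -5 < lon < 8 else "en"

print(detect_language("they said bonjour to us", 48.9, 2.3))  # -> fr
```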