
    Speech Processes for Brain-Computer Interfaces

    Speech interfaces have become widely used and are integrated in many applications and devices. However, speech interfaces require the user to produce intelligible speech, which might be hindered by loud environments, concern about bothering bystanders, or a general inability to produce speech due to disabilities. Decoding a user's imagined speech instead of actual speech would solve this problem. Such a Brain-Computer Interface (BCI) based on imagined speech would enable fast and natural communication without the need to speak out loud, and could provide a voice to otherwise mute people.

    This dissertation investigates BCIs based on speech processes using functional Near-Infrared Spectroscopy (fNIRS) and Electrocorticography (ECoG), two brain activity imaging modalities on opposing ends of an invasiveness scale. Brain activity data have low signal-to-noise ratio and complex spatio-temporal and spectral coherence. To analyze these data, this dissertation combines techniques from machine learning, neuroscience, and Automatic Speech Recognition to facilitate robust classification of detailed speech processes while simultaneously illustrating the underlying neural processes.

    fNIRS is an imaging modality based on cerebral blood flow. It requires only affordable hardware and can be set up within minutes in a day-to-day environment, making it ideally suited for convenient user interfaces. However, the hemodynamic processes measured by fNIRS are slow in nature, so the technology offers poor temporal resolution. We investigate speech in fNIRS and demonstrate classification of speech processes for BCIs based on fNIRS.

    ECoG provides ideal signal properties by invasively measuring electrical potentials artifact-free directly on the brain surface. High spatial resolution and temporal resolution down to millisecond sampling provide localized information with timing accurate enough to capture the fast processes underlying speech production. This dissertation presents the Brain-to-Text system, which harnesses automatic speech recognition technology to decode a textual representation of continuous speech from ECoG. This could allow users to compose messages or issue commands through a BCI. While decoding a textual representation is unparalleled for device control and typing, direct communication is even more natural if the full expressive power of speech, including emphasis and prosody, can be provided. For this purpose, a second system is presented, which directly synthesizes neural signals into audible speech and could enable conversation with friends and family through a BCI.

    Up to now, both systems, Brain-to-Text and the synthesis system, operate on audibly produced speech. To bridge the gap to the final frontier of neural prostheses based on imagined speech processes, we investigate the differences between audibly produced and imagined speech and present first results towards BCIs based on imagined speech processes. This dissertation demonstrates the use of speech processes as a paradigm for BCI for the first time. Speech processes offer a fast and natural interaction paradigm that will help patients and healthy users alike to communicate with computers and with friends and family efficiently through BCIs.
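    The abstract does not spell out the decoding pipeline, but the Brain-to-Text idea of pairing neural features with ASR machinery can be sketched. The following Python sketch is purely illustrative: the high-gamma band edges, window sizes, and the generic Viterbi decoder over phone-like states are assumptions, not the dissertation's actual models.

```python
# Illustrative sketch only: high-gamma band-power features from ECoG plus an
# ASR-style Viterbi decode. All parameters here are assumed, not the thesis's.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def high_gamma_features(ecog, fs, win=0.05, step=0.01, band=(70.0, 170.0)):
    """ecog: channels x samples. Returns frames x channels of log band power."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    power = sosfiltfilt(sos, ecog, axis=1) ** 2
    w, s = int(win * fs), int(step * fs)
    starts = range(0, ecog.shape[1] - w + 1, s)
    return np.log(np.stack([power[:, i:i + w].mean(axis=1) for i in starts]))

def viterbi(log_likes, log_trans):
    """Best state path; log_likes: frames x states, log_trans: states x states."""
    n_frames, n_states = log_likes.shape
    delta = log_likes[0].copy()
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        scores = delta[:, None] + log_trans          # prev state x next state
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_likes[t]
    path = [int(delta.argmax())]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]                                # frame-wise state sequence
```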

    Prosodic modules for speech recognition and understanding in VERBMOBIL

    Within VERBMOBIL, a large project on spoken language research in Germany, two modules for detecting and recognizing prosodic events have been developed. One module operates on speech signal parameters and the word hypothesis graph, whereas the other module, designed for a novel, highly interactive architecture, uses only speech signal parameters as its input. Phrase boundaries, sentence modality, and accents are detected. In spontaneous dialogs, recognition rates reach up to 82.5% for accents and up to 91.7% for phrase boundaries.
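    The kind of signal parameters such a prosody module consumes can be illustrated with a short sketch: frame-level log energy and a rough autocorrelation F0 estimate, which a downstream classifier could map to accent and boundary labels. The frame sizes and F0 search range are assumptions; the actual VERBMOBIL feature set and classifiers are not described in this abstract.

```python
# Illustrative only: frame-level prosodic parameters (log energy, rough F0).
import numpy as np

def prosodic_frames(signal, fs, frame=0.025, hop=0.010):
    n, h = int(frame * fs), int(hop * fs)
    feats = []
    for start in range(0, len(signal) - n, h):
        x = signal[start:start + n] * np.hanning(n)
        log_energy = np.log(np.sum(x ** 2) + 1e-10)
        ac = np.correlate(x, x, mode="full")[n - 1:]   # autocorrelation, lag >= 0
        lo, hi = int(fs / 400), int(fs / 50)           # search F0 in 50-400 Hz
        f0 = fs / (lo + int(np.argmax(ac[lo:hi])))
        feats.append((log_energy, f0))
    return np.array(feats)                             # frames x 2, classifier input
```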

    Current state of digital signal processing in myoelectric interfaces and related applications

    This review discusses the critical issues and recommended practices from the perspective of myoelectric interfaces. The major benefits and challenges of myoelectric interfaces are evaluated. The article aims to fill gaps left by previous reviews and identify avenues for future research. Recommendations are given, for example, for electrode placement, sampling rate, segmentation, and classifiers. Four groups of applications where myoelectric interfaces have been adopted are identified: assistive technology, rehabilitation technology, input devices, and silent speech interfaces. The state-of-the-art applications in each of these groups are presented.
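    To make the segmentation and feature recommendations concrete, here is a minimal sketch of a common myoelectric processing chain: overlapping window segmentation followed by the classic Hudgins time-domain feature set. The 1 kHz sampling rate, 200 ms windows, and thresholds are illustrative assumptions, not the review's prescriptions.

```python
# Sketch of a typical myoelectric chain: sliding windows + Hudgins features.
import numpy as np

def hudgins_features(window, thresh=0.01):
    """Mean absolute value, zero crossings, slope sign changes, waveform length."""
    mav = np.mean(np.abs(window))
    d = np.diff(window)
    zc = np.sum((window[:-1] * window[1:] < 0) & (np.abs(d) > thresh))
    ssc = np.sum(d[:-1] * d[1:] < 0)
    wl = np.sum(np.abs(d))
    return np.array([mav, zc, ssc, wl])

def segment(emg, fs=1000, win_ms=200, step_ms=50):
    """Yield overlapping analysis windows from a 1-D EMG channel."""
    w, s = int(fs * win_ms / 1000), int(fs * step_ms / 1000)
    for start in range(0, len(emg) - w + 1, s):
        yield emg[start:start + w]

# Example: one feature vector per window, ready for an LDA or SVM classifier.
emg = np.random.randn(5000)  # placeholder signal
X = np.array([hudgins_features(w) for w in segment(emg)])
```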

    An experimental DSP-based tactile hearing aid : a feasibility study

    Get PDF

    Adaptive threshold optimisation for colour-based lip segmentation in automatic lip-reading systems

    A thesis submitted to the Faculty of Engineering and the Built Environment, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy. Johannesburg, September 2016.

    Having survived the ordeal of a laryngectomy, the patient must come to terms with the resulting loss of speech. With recent advances in portable computing power, automatic lip-reading (ALR) may become a viable approach to voice restoration. This thesis addresses the image processing aspect of ALR and presents three contributions to colour-based lip segmentation. The first contribution concerns the colour transform used to enhance the contrast between the lips and skin. This thesis presents the most comprehensive study to date by measuring the overlap between lip and skin histograms for 33 different colour transforms. The hue component of HSV obtains the lowest overlap of 6.15%, and results show that selecting the correct transform can increase the segmentation accuracy by up to three times. The second contribution is the development of a new lip segmentation algorithm that utilises the best colour transforms from the comparative study. The algorithm is tested on 895 images and achieves a percentage overlap (OL) of 92.23% and a segmentation error (SE) of 7.39%. The third contribution focuses on the impact of the histogram threshold on the segmentation accuracy, and introduces a novel technique called Adaptive Threshold Optimisation (ATO) to select a better threshold value. The first stage of ATO incorporates SVR (support vector regression) to train the lip shape model. ATO then uses feedback of shape information to validate and optimise the threshold. After applying ATO, the SE decreases from 7.65% to 6.50%, corresponding to an absolute improvement of 1.15 pp or a relative improvement of 15.1%. While this thesis concerns lip segmentation in particular, ATO is a threshold selection technique that can be used in various segmentation applications.
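    A minimal sketch of the hue-based thresholding idea and the two reported measures follows. The hue range, and the OL and SE formulas (common definitions in the lip-segmentation literature), are assumptions for illustration; the thesis's ATO procedure itself is not reproduced here.

```python
# Hedged sketch: hue-threshold lip segmentation plus OL / SE measures.
# Hue range and metric formulas are assumptions, not the thesis's values.
import cv2
import numpy as np

def segment_lips(bgr_image, hue_range=(160, 179)):
    """Binary lip mask from the hue channel of HSV (the transform with the
    lowest lip/skin histogram overlap in the thesis's comparison)."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    hue = hsv[:, :, 0]
    return (hue >= hue_range[0]) & (hue <= hue_range[1])

def overlap_and_error(pred, truth):
    """OL = 2|A1 & A2| / (|A1| + |A2|); SE counts mislabelled pixels
    against twice the ground-truth lip area. Both in percent."""
    inter = np.sum(pred & truth)
    ol = 100.0 * 2 * inter / (pred.sum() + truth.sum())
    outer = np.sum(pred & ~truth)   # background labelled as lip
    inner = np.sum(~pred & truth)   # lip labelled as background
    se = 100.0 * (outer + inner) / (2 * truth.sum())
    return ol, se
```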

    Speech analysis and synthesis using an auditory model

    Many traditional speech analysis/synthesis techniques are designed to produce speech with a spectrum that is as close as possible to the original. This may not be necessary because the auditory nerve is the only link from the auditory periphery to the brain, and all information that is processed by the higher auditory system must exist in the auditory nerve firing patterns. Rather than matching the synthesised speech spectra to the original representation, it should be sufficient that the representations of the synthetic and original speech be similar at the auditory nerve level. This thesis develops a speech analysis system that incorporates a computationally efficient model of the auditory periphery. Timing-synchrony information is employed to exploit the in-synchrony phenomena observed in neuron firing patterns to form a nonlinear relative spectrum intensity measure. This measure is used to select specific dominant frequencies to reproduce the speech based on a synthesis-by-sinusoid approach. The resulting speech is found to be intelligible even when only a fraction of the original frequencies are selected for synthesis. Additionally, the synthesised speech is highly noise immune, and exhibits noise reduction due to the coherence property of the frequency transform algorithm and the dominance effect of the spectrum intensity measure. The noise reduction and low bit rate potential of the speech analysis system are exploited to produce a highly noise immune synthesis that outperforms similar representations formed both by a more physiologically accurate model and by a classical non-biological speech processing algorithm. Such a representation has potential application in low bit rate systems, particularly as a front end to an automatic speech recogniser.
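    The synthesis-by-sinusoid idea can be sketched as follows. For simplicity this sketch selects dominant frequencies by FFT magnitude, whereas the thesis uses a synchrony-based auditory measure; the frame length, overlap, and number of retained peaks are assumptions.

```python
# Sketch: keep only dominant spectral peaks per frame and resynthesize them
# as sinusoids via overlap-add. FFT-magnitude peak picking stands in for the
# thesis's auditory relative-intensity measure; parameters are assumed.
import numpy as np

def sinusoidal_resynthesis(signal, fs, frame=0.032, n_peaks=8):
    n = int(frame * fs)
    window = np.hanning(n)                          # Hann OLA at 50% sums to ~1
    out = np.zeros(len(signal))
    t = np.arange(n) / fs
    for start in range(0, len(signal) - n, n // 2):
        x = signal[start:start + n] * window
        spec = np.fft.rfft(x)
        freqs = np.fft.rfftfreq(n, 1.0 / fs)
        top = np.argsort(np.abs(spec))[-n_peaks:]   # dominant bins only
        frame_out = sum(
            (4.0 / n) * np.abs(spec[k])             # approx. undo Hann analysis gain
            * np.cos(2 * np.pi * freqs[k] * t + np.angle(spec[k]))
            for k in top
        )
        out[start:start + n] += frame_out * window
    return out
```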

    Electroacoustical simulation of listening room acoustics for project ARCHIMEDES
