
    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised, at least for some listeners, by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    Evaluation of bistable systems versus matched filters in detecting bipolar pulse signals

    This paper presents a thorough evaluation of a bistable system versus a matched filter in detecting bipolar pulse signals. The detectability of the bistable system can be optimized by adding noise, i.e. through the stochastic resonance (SR) phenomenon. This SR effect is also demonstrated by approximate statistical detection theory of the bistable system and corresponding numerical simulations. Furthermore, the performance comparison between the bistable system and the matched filter shows that (a) the bistable system is more robust than the matched filter in detecting signals with disturbed pulse rates, and (b) the bistable system approaches the performance of the matched filter in detecting signals with unknown arrival times, with notably better computational efficiency. These results verify the potential applicability of the bistable system in the signal detection field. Comment: 15 pages, 9 figures, MikTex v2.
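For readers unfamiliar with the baseline in this comparison, a matched filter can be sketched as a sliding correlation between the received samples and a known pulse template, with the arrival time estimated as the lag of the largest correlation. The following is a minimal illustration only, not the paper's implementation: the pulse shape `PULSE`, the function names, and the sample values are all assumptions made for the example.

```python
def matched_filter(received, template):
    """Correlate the template with the received samples at every valid lag."""
    n, m = len(received), len(template)
    return [
        sum(received[lag + k] * template[k] for k in range(m))
        for lag in range(n - m + 1)
    ]


def detect_arrival(received, template):
    """Estimate the arrival time as the lag with the largest correlation score."""
    scores = matched_filter(received, template)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical bipolar pulse: a positive half followed by a negative half.
PULSE = [1.0, 1.0, -1.0, -1.0]

# Hypothetical received signal: small disturbances around one pulse starting at index 3.
RECEIVED = [0.1, -0.2, 0.0] + PULSE + [0.05, -0.1]
ARRIVAL = detect_arrival(RECEIVED, PULSE)
```

The template must be known in advance, which is why a mismatch between the assumed and actual pulse rate degrades the filter's output.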

    Hybrid Multiresolution Analysis Of ‘Punch’ In Musical Signals

    This paper presents a hybrid multi-resolution technique for the extraction and measurement of attributes contained within a musical signal. Decomposing music into simpler percussive, harmonic and noise components is useful when detailed extraction of signal attributes is required. The key parameter of interest in this paper is punch. A methodology is explored that decomposes the musical signal using a critically sampled constant-Q filterbank of quadrature mirror filters (QMF) before applying adaptively windowed short-time Fourier transforms (STFT). The proposed hybrid method offers accuracy in both the time and frequency domains. Following the decomposition, the attributes are analyzed. It is shown that analysis of these components may yield parameters that would be of use in mixing and mastering as well as in audio transcription and retrieval.

    A mechatronic approach to supernormal auditory localisation

    Remote audio perception is a fundamental requirement for telepresence and teleoperation in applications that range from work in hostile environments to security and entertainment. This paper presents the use of a mechatronic system to test the efficacy of audio for telepresence. It describes work to determine whether the use of supernormal inter-aural distance is a valid means of achieving enhanced hearing for telepresence. The particular audio variable investigated is the azimuth angle of error; the construction of a dedicated mechatronic test rig is reported, along with the results obtained. The paper concludes by observing that the combination of the mechatronic system and supernormal audition does enhance the ability to localise sound sources and that further work in this area is justified.

    Improving the Speech Intelligibility By Cochlear Implant Users

    In this thesis, we focus on improving the intelligibility of speech for cochlear implant (CI) users. As an auditory prosthetic device, a CI can restore hearing sensations in a quiet background for most patients with profound hearing loss in both ears. However, CI users still have serious problems understanding speech in noisy and reverberant environments. Bandwidth limitation, missing temporal fine structure, and reduced spectral resolution due to a limited number of electrodes are further factors that make hearing in noisy conditions more difficult for CI users, regardless of the type of noise. To mitigate these difficulties for CI listeners, we investigate several contributing factors, such as the effects of low harmonics on tone identification in natural and vocoded speech, the contribution of matched envelope dynamic range to binaural benefits, and the contribution of low-frequency harmonics to tone identification in quiet and in a six-talker babble background. These results revealed several promising methods for improving speech intelligibility for CI patients. In addition, we investigate the benefits of voice conversion in improving speech intelligibility for CI users, motivated by an earlier study showing that familiarity with a talker’s voice can improve understanding of the conversation. Research has shown that when adults are familiar with someone’s voice, they can process and understand what the person is saying more accurately, and even more quickly. This effect, known as the “familiar talker advantage”, motivated us to examine its impact on CI patients using a voice conversion technique. In the present research, we propose a new method based on multi-channel voice conversion to improve the intelligibility of transformed speech for CI patients.

    Virtual sign: a real time bidirectional translator of Portuguese sign language

    Promoting equity, equal opportunities for all and the social inclusion of people with disabilities is a concern of modern societies at large and a key topic on the agenda of European Higher Education. Despite all the progress, we cannot ignore the fact that the conditions society provides for deaf people are still far from perfect. Communication with deaf people by means of written text is not as efficient as it might seem at first. In fact, there is a very deep gap between sign language and spoken or written language. The vocabulary, the sentence construction and the grammatical rules are quite different between these two worlds. These facts make it significantly difficult for deaf people to read and understand the meaning of text and, conversely, make it quite difficult for people with no hearing disabilities to understand sign language. The deployment of tools to assist daily communication between deaf people and others, in schools, public services, museums and elsewhere, may be a significant contribution to the social inclusion of the deaf community. The work described in this paper addresses the development of a bidirectional translator between Portuguese Sign Language and Portuguese text. The translator from sign language to text resorts to two devices, namely the Microsoft Kinect and 5DT Sensor Gloves, to gather data about the motion and shape of the hands. Hand configurations are classified using Support Vector Machines. The movement and orientation of the hands are classified using the Dynamic Time Warping algorithm. The translator exhibits a precision higher than 90%. In the other direction, the translation of Portuguese text to Portuguese Sign Language is supported by a 3D avatar which interprets the entered text and performs the corresponding animations.
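As a rough illustration of how Dynamic Time Warping can classify a movement, the sketch below compares a sample against labelled templates and picks the closest one, tolerating differences in speed. It is a generic textbook formulation, not the translator's actual code: the one-dimensional trajectories, the labels in `TEMPLATES`, and the function names are hypothetical, and a real system would compare multi-dimensional sensor frames.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences (O(len(a) * len(b)))."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(sample, templates):
    """Return the label of the template with the smallest DTW distance to the sample."""
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))


# Hypothetical trajectories (e.g. one hand coordinate over time); not data from the paper.
TEMPLATES = {
    "rise": [0, 1, 2, 3, 4],
    "fall": [4, 3, 2, 1, 0],
}

# A slower execution of the "rise" movement still aligns with its template at zero cost.
LABEL = classify([0, 0, 1, 2, 3, 3, 4], TEMPLATES)
```

Nearest-template matching is only one way to use the distance; the abstract does not state how the distances are turned into a decision.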

    Effects of Aging and Spectral Shaping on the Sub-cortical (Brainstem) Differentiation of Contrastive Stop Consonants

    Purpose: The objectives of this dissertation are to: (1) evaluate the influence of aging on the sub-cortical (brainstem) differentiation of voiced stop consonants (i.e. /b-d-g/); (2) determine whether potential aging deficits at the brainstem level influence behavioral identification of the /b-d-g/ stimuli; (3) investigate whether spectral shaping diminishes any aging impairments at the brainstem level; and (4) if so, whether minimizing these deficits improves the behavioral identification of the speech stimuli. Subjects: Behavioral and electrophysiological responses were collected from 11 older adults (> 50 years old) with near-normal to normal hearing and were compared to those of 16 normal-hearing younger adults (control group). Stimuli and Methods: Speech-evoked auditory brainstem responses (Speech-ABRs) were recorded for three 100-ms long /b-d-g/ consonant-vowel exemplars in unshaped and shaped conditions, for a total of six stimuli. Frequency-dependent spectral shaping enhanced the second formant (F2) transition relative to the rest of the stimulus by reducing gain for low frequencies and increasing gain for mid and high frequencies, the frequency region of the F2 transition in the /b-d-g/ syllables. Behavioral identification of 15-step perceptual unshaped and shaped /b-d-g/ continua was assessed by generating psychometric functions in order to quantify stimulus perception. Speech-ABR peak amplitudes and latencies and stop consonant differentiation scores were measured for the six stimuli (three unshaped and three shaped). Summary of Findings: Older adults exhibited more robust categorical perception and subtle sub-cortical deficits when compared to younger adults. Individual data showed fewer expected latency patterns for the /b-d-g/ Speech-ABRs in older adults than in younger adults, especially for major peaks. Spectral shaping improved the stop consonant differentiation score for major peaks in older adults, such that it moved older adults in the direction of the younger adults’ responses. Conclusion: Sub-cortical impairments, at least those measured in this study, do not seem to influence the behavioral differentiation of stop consonants in older adults. On the other hand, cue enhancement by spectral shaping seems to overcome some of the deficits noted at the electrophysiological level. However, due to a possible ceiling effect, improvements to the originally robust perception of older adults were not found at the behavioral level. Significance: Aging seems to reduce the sub-cortical responsiveness to dynamic spectral cues without distorting the spectral coding, as evidenced by the “reparable” age-related changes seen at the electrophysiological level. Cue enhancement appears to increase the neural responsiveness of aged but intact neurons, yielding a better sub-cortical differentiation of stop consonants.