
    Examining the Effect of Talker Familiarity Using Familiar and Unfamiliar Talkers on Noise-Vocoded Speech Perception in Normal-Hearing Listeners: A Training Study

    Auditory training studies that use stimuli applicable to real-world speech processing have been shown to improve speech perception in normal-hearing populations, people with hearing loss, and cochlear implant wearers. In particular, exposing normal-hearing adults to noise-vocoded speech through auditory training has been shown not only to simulate the perceptual experience of a cochlear implant wearer but also to produce promising improvements in speech perception through the training paradigm. Studies have also highlighted variables that affect speech perception, including talker familiarity, which has been shown to enhance speech perception both in listeners with normal hearing and in those with hearing loss. This study aims to create a multi-session training program for normal-hearing adults in which listeners are exposed to either natural or noise-vocoded sentences. After completing three training sessions over one week, participants will be asked to recognize noise-vocoded sentences spoken by the talker they heard during training as well as by an unfamiliar talker they have not heard before. This portion of the larger study focuses on talker familiarity and the effect of training listeners on familiar versus unfamiliar speakers on noise-vocoded speech perception performance. The study is unique in that it is the first to investigate both talker familiarity and noise-vocoded speech familiarity within a multi-session auditory training program. The results may also indicate which training conditions could best improve noise-vocoded speech perception. The findings may be useful for further research on hearing loss and cochlear implants and could help in designing a pre-implantation training program to ease patients' transition to a CI.
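
    For readers unfamiliar with the stimulus manipulation, the sketch below illustrates a generic channel noise vocoder of the kind typically used to create such stimuli: the speech signal is split into frequency bands, the temporal envelope of each band is extracted, and each envelope modulates band-limited noise before the bands are summed. The number of bands, filter design, and cutoff values here are illustrative assumptions, not the settings used in this study.

```python
# Minimal channel noise-vocoder sketch (illustrative parameters, not the study's).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, n_bands=4, lo=50.0, hi=8000.0, env_cutoff=160.0):
    """Replace the fine structure of each band with envelope-modulated noise."""
    hi = min(hi, 0.45 * fs)                                    # keep upper edge below Nyquist
    edges = np.geomspace(lo, hi, n_bands + 1)                  # band edges (Hz)
    env_sos = butter(2, env_cutoff, btype="lowpass", fs=fs, output="sos")
    noise = np.random.randn(len(speech))
    out = np.zeros(len(speech))
    for b in range(n_bands):
        band_sos = butter(4, [edges[b], edges[b + 1]], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, speech)                   # band-limited speech
        env = np.maximum(sosfiltfilt(env_sos, np.abs(hilbert(band))), 0.0)  # smoothed envelope
        carrier = sosfiltfilt(band_sos, noise)                 # band-limited noise carrier
        out += env * carrier
    return out / (np.max(np.abs(out)) + 1e-12)                 # normalise to avoid clipping
```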

    Privacy protecting biometric authentication systems

    As biometrics gains popularity and proliferates into daily life, there is increased concern over the loss of privacy and the potential misuse of biometric data held in central repositories. The major concerns are (i) the use of biometrics to track people, (ii) the non-revocability of biometrics (e.g., a compromised fingerprint cannot be canceled or reissued), and (iii) the disclosure of sensitive information, such as race, gender, and health problems, that may be revealed by biometric traits. The straightforward suggestion of keeping biometric data in a user-owned token (e.g., a smart card) does not completely solve the problem, since malicious users can claim that their token is broken in order to avoid biometric verification altogether. Taken together, these concerns have driven the need for privacy-preserving biometric authentication methods in recent years. In this dissertation, we survey existing privacy-preserving biometric systems and, in particular, implement and analyze the fuzzy vault; we propose a new privacy-preserving approach; and we study the discriminative capability of online signatures as it relates to their use in existing privacy-preserving biometric verification systems. Our privacy-preserving authentication scheme combines multiple biometric traits to obtain a multi-biometric template that hides the constituent biometrics and allows the creation of non-unique identifiers for a person, so that linking separate template databases is impossible. We provide two separate realizations of the framework: one uses two fingerprints of the same individual to obtain a combined biometric template, while the other combines a fingerprint with a vocal pass-phrase. We show that both realizations successfully verify a person's identity given both biometric traits while preserving privacy (i.e., the biometric data is protected and the combined identifier cannot be used to track people). The fuzzy vault has emerged as a promising construct for protecting biometric templates. It combines biometrics and cryptography to obtain the benefits of both fields: biometrics provides non-repudiation and convenience, while cryptography guarantees privacy and adjustable levels of security. On the other hand, the fuzzy vault is a general construct for unordered data, and it is not straightforward to apply it to different biometric traits. Within the scope of this thesis, we demonstrate realizations of the fuzzy vault using fingerprints and online signatures such that authentication can be performed while biometric templates remain protected. We then demonstrate how to use the fuzzy vault for secret sharing with biometrics. Secret sharing schemes are cryptographic constructs in which a secret is split into shares and distributed among the participants such that it is reconstructed or revealed only when a necessary number of shareholders come together (e.g., joint bank accounts); the revealed secret can then be used for encryption or authentication. Finally, we implement correlation attacks that can be used to unlock the vault, showing that further measures are needed to protect the fuzzy vault against such attacks. The discriminative capability of a biometric modality is based on its uniqueness/entropy and is an important factor in choosing a biometric for large-scale deployment or a cryptographic application.
We present an individuality model for online signatures in order to substantiate their applicability to biometric authentication. To build the model, we adopt a Fourier-domain representation of the signature and propose a matching algorithm. Signature individuality is measured as the probability of a coincidental match between two arbitrary signatures, with model parameters estimated from a large signature database. Based on this preliminary model and the estimated parameters, we conclude that an average online signature provides a high level of security for authentication purposes. Finally, we provide a public online signature database, along with associated testing protocols, that can be used for testing signature verification systems.
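
As context for the fuzzy vault work described above, here is a simplified lock/unlock sketch under strong assumptions: the secret is the coefficient vector of a polynomial over a prime field, genuine (quantised) biometric features are stored with their polynomial evaluations, chaff points off the polynomial hide them, and a query that matches enough genuine abscissae lets the polynomial be recovered by interpolation. The modulus, degree, exact-match criterion, and absence of error-correcting decoding are illustrative simplifications rather than the constructions analysed in the dissertation.

```python
# Simplified fuzzy-vault sketch over a prime field (illustrative; real systems use
# quantised minutiae/signature features and Reed-Solomon-style error correction).
import random

P = 2**31 - 1          # assumed prime modulus
DEGREE = 3             # secret = DEGREE + 1 polynomial coefficients (low to high)

def poly_eval(coeffs, x):
    acc = 0
    for c in reversed(coeffs):                 # Horner evaluation mod P
        acc = (acc * x + c) % P
    return acc

def lock(secret_coeffs, genuine_features, n_chaff=200):
    """Hide the secret polynomial behind genuine points plus random chaff."""
    vault = [(x, poly_eval(secret_coeffs, x)) for x in genuine_features]
    used = set(genuine_features)
    while len(vault) < len(genuine_features) + n_chaff:
        x, y = random.randrange(P), random.randrange(P)
        if x in used or y == poly_eval(secret_coeffs, x):
            continue                           # chaff must be new and off the polynomial
        used.add(x)
        vault.append((x, y))
    random.shuffle(vault)
    return vault

def unlock(vault, query_features):
    """Recover the polynomial from vault points whose abscissae match the query."""
    q = set(query_features)
    matched = [(x, y) for (x, y) in vault if x in q]
    if len(matched) < DEGREE + 1:
        return None                            # not enough overlap to interpolate
    return lagrange_coeffs(matched[:DEGREE + 1])

def lagrange_coeffs(points):
    """Coefficients (mod P, low to high) of the polynomial through the given points."""
    coeffs = [0] * len(points)
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                basis = poly_mul(basis, [(-xj) % P, 1])
                denom = denom * (xi - xj) % P
        scale = yi * pow(denom, -1, P) % P
        for k, c in enumerate(basis):
            coeffs[k] = (coeffs[k] + scale * c) % P
    return coeffs

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P
    return out
```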

    Articulatory-based Speech Processing Methods for Foreign Accent Conversion

    The objective of this dissertation is to develop speech processing methods that enable the conversion of a speaker's foreign accent without altering their identity. We envision accent conversion primarily as a tool for pronunciation training, allowing non-native speakers to hear their native-accented selves. With this application in mind, we present two methods of accent conversion. The first assumes that voice quality/identity resides in the glottal excitation, while the linguistic content is contained in the vocal tract transfer function; accent conversion is achieved by convolving the glottal excitation of a non-native speaker with the vocal tract transfer function of a native speaker. The result is perceived as 60 percent less accented, but it is no longer identified as the same individual. The second method selects segments of speech from a corpus of non-native speech based on their acoustic or articulatory similarity to segments from a native speaker. We predict that articulatory features provide a more speaker-independent representation of speech and are therefore better gauges of linguistic similarity across speakers. To test this hypothesis, we collected a custom database containing simultaneous recordings of speech and the positions of important articulators (e.g., lips, jaw, tongue) for a native and a non-native speaker. Resequencing speech from the non-native speaker based on articulatory similarity with the native speaker achieved a 20 percent reduction in accent. The approach is particularly appealing for pronunciation training because it modifies speech in a way that produces realistically achievable changes in accent (i.e., the technique uses only sounds already produced by the non-native speaker). A second contribution of this dissertation is the development of subjective and objective measures to assess the performance of accent conversion systems. This is a difficult problem because, in most cases, no ground truth exists. Subjective evaluation is further complicated by the interconnected relationship between accent and identity, but modifications of the stimuli (i.e., reverse speech and voice disguises) allow the two components to be separated. Algorithms to objectively measure accent, quality, and identity are shown to correlate well with their subjective counterparts.
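
    As a concrete illustration of the first method's source-filter idea, the sketch below uses frame-wise linear prediction: the non-native frame is inverse-filtered with its own LPC coefficients to approximate the glottal excitation, which is then passed through the native speaker's LPC (vocal-tract) filter. The frame length, LPC order, use of librosa, and assumption of time-aligned parallel utterances are illustrative simplifications, not the dissertation's actual processing pipeline.

```python
# Source-filter recombination sketch: non-native excitation through native vocal tract.
# Frame size, LPC order, and time alignment are assumptions for illustration.
import numpy as np
import librosa
from scipy.signal import lfilter

def convert_frame(nonnative, native, order=16):
    """Drive the native vocal-tract (LPC) filter with the non-native residual."""
    a_nn = librosa.lpc(nonnative, order=order)     # non-native spectral envelope
    a_nat = librosa.lpc(native, order=order)       # native spectral envelope
    residual = lfilter(a_nn, [1.0], nonnative)     # inverse filtering -> excitation estimate
    return lfilter([1.0], a_nat, residual)         # excitation through native tract

def accent_convert(y_nonnative, y_native, frame=1024, hop=512, order=16):
    """Overlap-add frame-wise conversion of a time-aligned parallel utterance pair."""
    out = np.zeros(len(y_nonnative))
    win = np.hanning(frame)
    for start in range(0, min(len(y_nonnative), len(y_native)) - frame, hop):
        nn = y_nonnative[start:start + frame] * win
        nat = y_native[start:start + frame] * win
        if np.max(np.abs(nn)) < 1e-6 or np.max(np.abs(nat)) < 1e-6:
            continue                               # skip near-silent frames
        out[start:start + frame] += convert_frame(nn, nat, order)
    return out / (np.max(np.abs(out)) + 1e-12)
```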

    Do colourless green voices speak furiously? Linkages between phonetic and visual perception in synaesthesia

    Synaesthesia is an unusual phenomenon in which additional sensory perceptions are triggered by apparently unrelated sensory or conceptual stimuli. The main foci of this thesis are speech sound-colour and voice-induced synaesthesia. While grapheme-colour synaesthesia has been intensively researched, few studies have approached types of synaesthesia based on vocal inducers with detailed acoustic-phonetic and colorimetric analyses; that approach is taken here. First, a thorough examination of speech sound-colour synaesthesia was conducted. An experiment is reported that tested to what extent vowel acoustics influence colour associations for synaesthetes and non-synaesthetes. Systematic association patterns between vowel formants and colour measures were found in general, but most strongly in synaesthetes, who also showed a more consistent pattern of vowel-colour associations. The question of whether speech sound-colour synaesthesia is a discrete type of synaesthesia independent of grapheme-colour synaesthesia is discussed, along with how the two might influence each other. Two experiments are then introduced to explore voice-induced synaesthesia. First, a comprehensive voice description task was conducted with voice synaesthetes, phoneticians, and controls to investigate their verbal voice quality descriptions and the colour and texture associations they have with voices. Qualitative analyses characterised the nature of the associations made by the participant groups, while quantitative analyses revealed that, for all groups, acoustic parameters such as pitch, pitch range, vowel formants, and other spectral properties influenced colour and texture associations in a systematic way; above all, a strong connection was found between these measures and luminance. Finally, voice-induced synaesthetes, other synaesthetes, and controls participated in a voice line-up of the kind used in forensic phonetic casework. This experiment, motivated by previous findings of memory advantages in synaesthetes in certain areas, tested whether synaesthetes' voice memory is influenced by their condition. While no difference in performance was found between groups for normal speech, voice-induced synaesthetes outperformed the others in identifying a whispering speaker. These are the first group studies of the otherwise under-researched type of voice-induced synaesthesia, with a focus on acoustic rather than semantic analysis, adding knowledge to the growing field of synaesthesia research from a largely neglected phonetic angle. The debate around (re)defining synaesthesia is revisited; the voice description experiment, in particular, leads to a discussion of a synaesthesia spectrum in the population, as many common mechanisms and associations were found. It was also revealed that less common types of synaesthesia are often difficult to define rigidly using traditional criteria. Finally, the interplay of different types of synaesthesia is discussed and the findings are evaluated against the background of existing theories of synaesthesia.
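
    As a small, concrete companion to the quantitative analyses described above, the sketch below estimates vowel formants from LPC pole angles and correlates a formant measure with the luminance (CIE L*) of the colours participants chose. The LPC-based estimator, parameter values, and function names are illustrative assumptions, not the analysis pipeline used in the thesis.

```python
# Illustrative sketch: rough formant estimation plus a formant-luminance correlation.
import numpy as np
import librosa
from scipy.stats import pearsonr

def estimate_formants(y, sr, order=12, n_formants=2):
    """Approximate formant frequencies (Hz) from the angles of the LPC poles."""
    a = librosa.lpc(y, order=order)
    poles = [r for r in np.roots(a) if np.imag(r) > 0]      # keep upper half-plane poles
    freqs = sorted(np.angle(poles) * sr / (2.0 * np.pi))
    return freqs[:n_formants]                               # approximately F1, F2

def formant_luminance_correlation(f2_hz, luminance):
    """Pearson correlation between F2 values and CIE L* of the associated colours."""
    return pearsonr(np.asarray(f2_hz, float), np.asarray(luminance, float))

# usage (hypothetical inputs): r, p = formant_luminance_correlation(f2_values, l_star_values)
```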

    Exploiting primitive grouping constraints for noise robust automatic speech recognition : studies with simultaneous speech.

    Significant strides have been made in the field of automatic speech recognition over the past three decades. However, the systems are not robust; their performance degrades in the presence of even moderate amounts of noise. This thesis presents an approach to developing a speech recognition system that takes inspiration from human speech recognition.