211 research outputs found
Limited-data automatic speaker verification algorithm using band-limited phase-only correlation function
In this paper, a new method to deal with automatic speaker verification based on band-limited phase-only correlation (BLPOC) is proposed. The aim of this study is to validate the use of the BLPOC function as a new limited-data automatic speaker verification technique. Although some speaker verification techniques have high accuracy,
efficiency usually depends on the extraction of complex theoretical information from speech signals and the amount of the data for training the algorithms. The BLPOC function is a high-accuracy biometric technique traditionally implemented in human identification by fingerprints (through image-matching)
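As a rough illustration of the BLPOC idea named in this abstract, the sketch below computes a band-limited phase-only correlation for two equal-length 1-D signals with NumPy. It is not the paper's pipeline: the function name `blpoc`, the `band_fraction` parameter, and the use of raw 1-D vectors (rather than whatever speech representation the authors match) are all assumptions made for the example.

```python
import numpy as np


def blpoc(f, g, band_fraction=0.5):
    """Band-limited phase-only correlation of two equal-length 1-D signals.

    Hypothetical helper for illustration; band_fraction is the share of the
    spectrum (centred on DC) kept before the inverse transform.
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    if f.shape != g.shape or f.ndim != 1:
        raise ValueError("expected two 1-D signals of equal length")
    n = f.size

    # Centre the spectra so that the low frequencies sit in the middle.
    F = np.fft.fftshift(np.fft.fft(f))
    G = np.fft.fftshift(np.fft.fft(g))

    # Normalised cross-power spectrum: keep phase, discard magnitude.
    cross = F * np.conj(G)
    phase = cross / np.maximum(np.abs(cross), 1e-12)

    # Keep only a symmetric band of low frequencies around DC.
    k = max(1, int(n * band_fraction / 2))
    centre = n // 2
    band = phase[centre - k:centre + k + 1]

    # Inverse transform of the band; the zero-shift term ends up in the middle.
    corr = np.real(np.fft.ifft(np.fft.ifftshift(band)))
    return np.fft.fftshift(corr)


rng = np.random.default_rng(0)
reference = rng.standard_normal(256)
other = rng.standard_normal(256)

same_peak = blpoc(reference, reference).max()
diff_peak = blpoc(reference, other).max()
print(f"matching pair peak: {same_peak:.3f}, non-matching pair peak: {diff_peak:.3f}")
```

A matching pair gives a sharp peak close to 1, while unrelated signals give a much lower peak, so thresholding the peak height is one plausible way to turn the correlation into an accept/reject decision.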
A large-scale analysis of the acoustic-phonetic markers of speaker sex.
The research for this thesis lies within the field of speaker characterisation through the
acoustic-phonetic analysis of speech. The thesis consists of two parts:
1. An investigation of the acoustic-phonetic differences between the speech of women
and men;
2. An examination of the practicalities of automating the investigation to analyse a
large speech database.
The acoustic-phonetic markers of speaker sex examined here are the fundamental frequency,
the formant frequencies, and the relative amplitude of the first harmonic. The
aims of the investigation were, firstly, to establish to what extent these markers differentiate
between the sexes, and secondly, to examine the extent of between- and within-speaker
deviation from the female and male norms, or average values for each sex.
These points were investigated by an automated acoustic-phonetic analysis of the TIMIT
database, involving a data set of almost 16,000 segments of speech. An automated method
was developed to enable the signal processing and statistical analysis of a data set of this
size. The problems to be encountered in the analysis of a highly variable data source (i.e.
the acoustic speech waveform) are addressed
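Fundamental frequency is the first marker listed in this abstract. The sketch below shows one common way to estimate it from a single voiced frame, by picking the strongest autocorrelation peak within a plausible pitch range. The thesis does not say which estimator it used, so the method, the function name `estimate_f0`, and the 60 to 400 Hz search range are assumptions for illustration only.

```python
import numpy as np


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate fundamental frequency (Hz) of one voiced frame by autocorrelation.

    Hypothetical helper; the search range f0_min..f0_max is an assumed default.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()  # remove DC offset

    # One-sided autocorrelation: index 0 is lag 0.
    ac = np.correlate(frame, frame, mode="full")[frame.size - 1:]

    # Convert the pitch range into a lag range (in samples).
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), ac.size - 1)

    # The lag with the strongest autocorrelation is taken as the pitch period.
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sample_rate / lag


sample_rate = 16000
t = np.arange(int(0.05 * sample_rate)) / sample_rate  # one 50 ms frame
low_f0 = estimate_f0(np.sin(2 * np.pi * 120.0 * t), sample_rate)
high_f0 = estimate_f0(np.sin(2 * np.pi * 210.0 * t), sample_rate)
print(f"estimated F0: {low_f0:.1f} Hz and {high_f0:.1f} Hz")
```

The two synthetic tones stand in for a lower-pitched and a higher-pitched voice; real speech would need voicing detection and frame-by-frame tracking before per-speaker averages could be compared.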
Acoustic Approaches to Gender and Accent Identification
There has been considerable research on the problems of speaker and language recognition
from samples of speech. A less researched problem is that of accent recognition. Although this
is a similar problem to language identification, different accents of a language exhibit more
fine-grained differences between classes than languages. This presents a tougher problem
for traditional classification techniques. In this thesis, we propose and evaluate a number of
techniques for gender and accent classification. These techniques are novel modifications and
extensions to state of the art algorithms, and they result in enhanced performance on gender
and accent recognition.
The first part of the thesis focuses on the problem of gender identification, and presents a
technique that gives improved performance in situations where training and test conditions are
mismatched.
The bulk of this thesis is concerned with the application of the i-Vector technique to accent
identification, which is the most successful approach to acoustic classification to have emerged
in recent years. We show that it is possible to achieve high accuracy accent identification without
reliance on transcriptions and without utilising phoneme recognition algorithms. The thesis
describes various stages in the development of i-Vector based accent classification that improve
the standard approaches usually applied for speaker or language identification, which are
insufficient. We demonstrate that very good accent identification performance is possible with
acoustic methods by considering different i-Vector projections, frontend parameters, i-Vector
configuration parameters, and an optimised fusion of the resulting i-Vector classifiers we can
obtain from the same data.
We claim to have achieved the best accent identification performance on the test corpus
for acoustic methods, with up to 90% identification rate. This performance is even better than
previously reported acoustic-phonotactic based systems on the same corpus, and is very close
to performance obtained via transcription based accent identification. Finally, we demonstrate
that the utilization of our techniques for speech recognition purposes leads to considerably
lower word error rates.
Keywords: Accent Identification, Gender Identification, Speaker Identification, Gaussian
Mixture Model, Support Vector Machine, i-Vector, Factor Analysis, Feature Extraction, British
English, Prosody, Speech Recognition
Effects of Attention on Multisensory Integration
The world presents information via a variety of sensory channels. To make sense of this information, we must determine what is relevant and ignore unhelpful noise. We then integrate congruent information within and across modalities to build coherent perceptions. Importantly, immediate goals and prevailing environmental factors may interact to affect our perceptual decisions. This dynamic process of multisensory integration is essential to successful perception in the real world, but can also lead to errors. The current project exploits some of these perceptual errors to explore how endogenous (task-directed) and exogenous (stimulus intensity) factors may influence multisensory integration. In a series of four experiments, we use the sound-induced flash illusion (SFI; Shams et al., 2000; 2002) and related audiovisual effects as indices of multisensory integration. Endogenous attention was manipulated using a focused attention visual task and a novel bimodal conditional attention task. In our first two experiments, we found that participants reported more illusions when attending to both sensory modalities. This effect was larger when the auditory stimuli were presented at near-threshold levels. Perceptual sensitivity (d′) was also found to decrease in the bimodal condition. We then manipulated auditory intensity in each of these tasks independently. Reports of the SFI were found to increase with the higher intensity auditory stimuli. However, differences in reporting these illusions within the same task were attributable to both changes in bias (c) and d′. Event-related potentials recorded in our first experiment revealed that the SFI was associated with smaller P3 potentials than found in valid targets. We also noted differences in the response-locked error positivity (Pe), with illusory stimuli having more positive amplitudes than real targets. However, the earlier occurring error-related negativity (ERN) was indistinguishable in real and illusory targets. 
This suggests that participants were less confident of the illusion during stimulus evaluation and one stage of response monitoring. We evaluate these results in terms of the directed attention and information reliability hypotheses (Andersen et al., 2004, 2005) and discuss how these and similar experiments may deepen our understanding of how multisensory perception is impacted at multiple stages of stimulus and response evaluation
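The abstract above reports both sensitivity (d′) and bias (c). Under the standard equal-variance signal detection model, both follow from the hit rate and the false-alarm rate: d′ = z(H) − z(F) and c = −(z(H) + z(F)) / 2, where z is the inverse of the standard normal CDF. The sketch below applies those formulas; the function name `sdt_measures` and the example rates are hypothetical and are not taken from the study.

```python
from statistics import NormalDist


def sdt_measures(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion_c) under the equal-variance Gaussian model.

    Hypothetical helper. Rates must lie strictly between 0 and 1, because the
    inverse normal CDF is undefined at exactly 0 or 1.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    d_prime = z_hit - z_fa             # sensitivity
    criterion = -0.5 * (z_hit + z_fa)  # response bias
    return d_prime, criterion


# Illustrative rates only (not data from the study).
d_prime, criterion = sdt_measures(0.84, 0.16)
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```

Separating the two measures matters for the kind of argument made here: a rise in reported illusions can come from lower sensitivity, from a more liberal criterion, or from both.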
COGNITIVE RADIO SOLUTION FOR IEEE 802.22
Current wireless systems suffer severe radio spectrum underutilization due to a number of problematic issues, including wasteful static spectrum allocations; fixed radio functionalities and architectures; and limited cooperation between network nodes. A significant number of research efforts aim to find alternative solutions to improve spectrum utilization. Cognitive radio based on software radio technology is one such novel approach, and the impending IEEE 802.22 air interface standard is the first based on such an approach. This standard aims to provide wireless services in wireless regional area networks using TV spectrum white spaces. The cognitive radio devices employed feature two fundamental capabilities, namely supporting multiple modulations and data rates based on wireless channel conditions and sensing a wireless spectrum. Spectrum sensing is a critical functionality with high computational complexity. Although the standard does not specify a spectrum sensing method, the sensing operation has inherent timing and accuracy constraints. This work proposes a framework for developing a cognitive radio system based on a small form factor software radio platform with limited memory resources and processing capabilities. The cognitive radio systems feature adaptive behavior based on wireless channel conditions and are compliant with the IEEE 802.22 sensing constraints. The resource limitations on implementation platforms pose a variety of challenges to transceiver configurability and spectrum sensing. Realizing these fundamental features on small form factors paves the way for portable cognitive radio devices and extends the range of cognitive radio applications. Several techniques are proposed to overcome resource limitations on a small form factor software radio platform based on a hybrid processing architecture comprised of a digital signal processor and a field programmable gate array. 
Hardware reuse and task partitioning over a number of processing devices are among the techniques used to realize a configurable radio transceiver that supports several communication modes, including modulations and data rates. In particular, these techniques are applied to build a configurable modulation architecture and configurable synchronization. A mode-switching architecture based on circular buffers is proposed to facilitate reliable transitioning between different communication modes. The feasibility of efficient spectrum sensing based on a compressive sampling technique called "Fast Fourier Sampling" is examined. The configuration parameters are analyzed mathematically, and performance is evaluated using computer simulations for local spectrum sensing applications. The work proposed herein features a cooperative Fast Fourier sampling scheme to extend the narrowband and wideband sensing performance of this compressive sensing technique. The précis of this dissertation establishes the foundation of efficient cognitive radio implementation on small form factor software radio of hybrid processing architecture
The effects of singing exercises and melodic intonation therapy (MIT) on the male-to-female transgender voice
" The purpose of this study was to test the efficacy of traditional voice therapy approaches in combination with singing exercises and Melodic Intonation Therapy (MIT) to aid male-to-female transgender individuals gain a more feminine sounding voice. Participants from this study were recruited from a transgender support group in Greensboro, North Carolina. Six male-to-female individuals ranging in age from 37 to 63 years volunteered to participate in the study. Participants were randomly divided into two groups: Three individuals received traditional voice therapy plus feminine language structures/vocabulary and nonverbal communication (Group 1), while the remaining three received traditional voice therapy plus singing exercises and MIT (Group 2). All participants received traditional voice therapy techniques. Quantitative results suggested increased Speaking Fundamental Frequencies (SFFs) for participants in both groups, however, a slightly higher SFF was present in Group 2. Descriptive analysis of the results showed that by the study's end, all participants presented with self-voice ratings (1-7 scale) that were higher than the ratings given by the participants at the beginning of the study. Also, at the end of the study, all four judges (two first-year speech-language pathology graduate students and two random volunteers) rated the participants with voice ratings that were above the ratings at the beginning of the study."--Abstract from author supplied metadata