14 research outputs found

    An Interactive and Efficient Voice Processing For Home Automation System

    Home networking has evolved from linked personal computers to a more complex system that encompasses advanced security and automation applications. Once reserved for high-end luxury homes, home networks are now a regular feature in residences. These networks allow users to consolidate heating, air conditioning, lighting, appliances, entertainment, intercom, telecommunication, surveillance and security systems into an easy-to-operate unified network. Interactive applications operated by voice recognition, such as integrated door security systems and voice control of home appliances, are key features of home automation networks. This interactive capability depends on high-quality voice processing technology, including acoustic echo cancellation, low signal distortion and noise reduction techniques. A home automation system must also be scalable to allow future evolution, flexible to support field upgrades, interactive, easy to use, cost-efficient and reliable. This article introduces some of the voice quality performance issues and design challenges unique to home automation systems. It discusses home automation network applications that rely on voice processing, and examines some of the critical features and functionality that can ease design complexity and cost while delivering enhanced performance.
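    Acoustic echo cancellation, one of the voice-processing building blocks this abstract names, is typically realized with an adaptive filter. Below is a minimal sketch of a normalized-LMS (NLMS) echo canceller; the filter length, step size and signal names are illustrative assumptions, not details from the article.

```python
import numpy as np

def nlms_echo_cancel(far_end, mic, taps=128, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `far_end` from `mic`."""
    w = np.zeros(taps)                     # adaptive FIR estimate of the echo path
    out = np.copy(mic)                     # first `taps` samples pass through unchanged
    for n in range(taps, len(mic)):
        x = far_end[n - taps:n][::-1]      # most recent far-end samples, newest first
        e = mic[n] - w @ x                 # residual after subtracting predicted echo
        w += (mu / (eps + x @ x)) * e * x  # NLMS weight update, normalized by input power
        out[n] = e
    return out
```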

    Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation

    We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topically coherent units. We propose two methods for combining lexical and prosodic information using hidden Markov models and decision trees. Lexical information is obtained from a speech recognizer, and prosodic features are extracted automatically from speech waveforms. We evaluate our approach on the Broadcast News corpus, using the DARPA-TDT evaluation metrics. Results show that the prosodic model alone is competitive with word-based segmentation methods. Furthermore, we achieve a significant reduction in error by combining the prosodic and word-based knowledge sources.
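    The core idea is score-level fusion of the two knowledge sources at candidate topic boundaries. The sketch below shows one simple way such a combination can work: a weighted sum of a lexical log-probability and a prosodic classifier posterior. The fusion weight, threshold and both scoring functions are illustrative assumptions, not the paper's HMM or decision-tree models.

```python
import numpy as np

def fuse_boundary_scores(lexical_logprob, prosodic_posterior, lam=0.6, thresh=-1.0):
    """Return indices of candidate boundaries accepted as topic changes.

    lexical_logprob:    log P(boundary | words) per candidate, e.g. from a language model
    prosodic_posterior: P(boundary | prosody) per candidate, e.g. from a decision tree
    """
    prosodic_logprob = np.log(np.clip(prosodic_posterior, 1e-10, 1.0))
    combined = lam * lexical_logprob + (1.0 - lam) * prosodic_logprob
    return np.where(combined > thresh)[0]   # accept boundaries above threshold
```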

    The case for automatic higher-level features in forensic speaker recognition

    Approaches from standard automatic speaker recognition, which rely on cepstral features, suffer from a lack of interpretability in forensic applications. But the growing practice of using "higher-level" features in automatic systems offers promise in this regard. We provide an overview of automatic higher-level systems and discuss potential advantages, as well as issues, for their use in the forensic context.
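    To make the contrast with cepstral features concrete: "higher-level" features are long-span, interpretable quantities such as pitch and duration statistics. Below is a hedged illustration of that feature style; the particular statistics chosen are an assumption for illustration, not the systems surveyed in the paper.

```python
import numpy as np

def prosodic_profile(f0_track, word_durations):
    """Summarize an utterance by interpretable long-span statistics.

    f0_track:       per-frame fundamental frequency in Hz (0 for unvoiced frames)
    word_durations: duration of each word in seconds
    """
    voiced = f0_track[f0_track > 0]              # keep voiced frames only
    return {
        "f0_mean": float(np.mean(voiced)),       # habitual pitch level
        "f0_std": float(np.std(voiced)),         # pitch variability
        "f0_range": float(np.ptp(voiced)),
        "speaking_rate": len(word_durations) / float(np.sum(word_durations)),
    }
```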

    An Investigation of Spectral Subband Centroids for Speaker Authentication

    Most conventional features used in speaker authentication are based, in one way or another, on estimation of spectral envelopes in the form of cepstra, e.g., Mel-scale Filterbank Cepstrum Coefficients (MFCCs), Linear-scale Filterbank Cepstrum Coefficients (LFCCs) and Relative Spectral Perceptual Linear Prediction (RASTA-PLP). In this study, Spectral Subband Centroids (SSCs) are examined. Each SSC is the centroid frequency of one subband; it has properties similar to a formant frequency but is limited to that subband. Preliminary empirical findings on a subset of the XM2VTS database, using Analysis of Variance and Linear Discriminant Analysis, suggest, firstly, that a certain number of centroids (up to about 16) is necessary to cover enough information about the speaker's identity and, secondly, that SSCs could provide complementary information to the conventional MFCCs. Theoretical findings suggest that mean-subtracted SSCs are more robust to additive noise. Further empirical experiments carried out on the more realistic NIST2001 database, using SSCs, MFCCs (respectively LFCCs) and their combination by concatenation, suggest that SSCs are indeed robust features complementary to the conventional MFCC (respectively LFCC) features often used in speaker authentication.
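    Since the abstract defines an SSC as the centroid frequency within each subband, the extraction step is compact enough to sketch. The version below uses the common formulation with a power-weighted centroid per band; the uniform band layout and the exponent gamma are common choices assumed here, not necessarily those of the paper.

```python
import numpy as np

def spectral_subband_centroids(frame, sr=8000, n_fft=256, n_bands=16, gamma=1.0):
    """Return one centroid frequency (Hz) per uniform subband of one frame."""
    weight = (np.abs(np.fft.rfft(frame, n_fft)) ** 2) ** gamma  # P(f)^gamma weighting
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges = np.linspace(0.0, freqs[-1], n_bands + 1)            # uniform band edges
    idx = np.clip(np.digitize(freqs, edges) - 1, 0, n_bands - 1)  # band index per FFT bin
    centroids = np.empty(n_bands)
    for m in range(n_bands):
        band = idx == m
        centroids[m] = np.sum(freqs[band] * weight[band]) / (np.sum(weight[band]) + 1e-10)
    return centroids
```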

    Cost-Sensitive Learning for Emotion Robust Speaker Recognition

    In the field of information security, voice is one of the most important biometrics. In particular, with the spread of voice communication over the Internet and telephone systems, huge voice data resources have become accessible. In speaker recognition, a voiceprint can serve as a unique password with which a user proves his or her identity. However, speech carrying various emotions can cause an unacceptably high error rate and degrade the performance of a speaker recognition system. This paper addresses the problem by introducing a cost-sensitive learning technique that reweights the probability of test affective utterances at the pitch-envelope level, which effectively enhances robustness in emotion-dependent speaker recognition. Based on that technique, a new architecture for the recognition system, together with its components, is proposed. An experiment conducted on the Mandarin Affective Speech Corpus shows an improvement of 8% in identification rate over traditional speaker recognition.
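    The cost-sensitive idea can be pictured as penalizing speaker scores that are unreliable under the detected emotion before the identification decision is made. The sketch below is one minimal reading of that idea; the cost table, the emotion posteriors and the score combination are illustrative assumptions, not the paper's system.

```python
import numpy as np

def cost_sensitive_identify(scores, emotion_posteriors, cost):
    """Pick a speaker after reweighting scores by expected emotion-dependent cost.

    scores:             (n_speakers,) raw log-likelihoods from neutral-speech models
    emotion_posteriors: (n_emotions,) P(emotion | utterance) from an emotion classifier
    cost:               (n_emotions, n_speakers) penalty for trusting each speaker's
                        raw score under each emotion
    """
    expected_cost = emotion_posteriors @ cost   # (n_speakers,) expected penalty
    reweighted = scores - expected_cost         # discount unreliable scores
    return int(np.argmax(reweighted))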

    Use of vocal source features in speaker segmentation.

    Chan Wai Nang. Thesis (M.Phil.)--Chinese University of Hong Kong, 2006. Includes bibliographical references (leaves 77-82). Abstracts in English and Chinese.

    Contents:
    Chapter 1 Introduction: speaker recognition; state of the art of speaker recognition techniques; motivations; thesis outline.
    Chapter 2 Acoustic Features: speech production (physiology of speech production; source-filter model); vocal tract and vocal source related acoustic features; linear predictive analysis of speech; features for speaker recognition (vocal tract related features; vocal source related features); wavelet octave coefficients of residues (WOCOR).
    Chapter 3 Statistical Approaches to Speaker Recognition: statistical modeling (classification and modeling; parametric vs. non-parametric; Gaussian mixture model (GMM); model estimation); classification (multi-class classification for speaker identification; two-speaker recognition; model selection by statistical model; performance evaluation metric).
    Chapter 4 Content Dependency Study of WOCOR and MFCC: database (CU2C); methods and procedures; experimental results; discussion; detailed analysis.
    Chapter 5 Speaker Segmentation: feature extraction; statistical methods for segmentation and clustering (segmentation by spectral difference; segmentation by Bayesian information criterion (BIC); segment clustering by BIC); baseline system (algorithm; speech database; performance metric; results).
    Chapter 6 Application of Vocal Source Features in Speaker Segmentation: discrimination power of WOCOR against MFCC (experimental set-up; results); speaker segmentation using vocal source features (construction of the proposed system).
    Chapter 7 Conclusions.
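    Chapter 5 of the outline above relies on BIC-based change-point detection, which compares modelling a window of feature vectors with one Gaussian versus two Gaussians split at a candidate point. Below is a minimal sketch of that delta-BIC test under standard assumptions (full-covariance Gaussians, tunable penalty weight); it is not the thesis's exact algorithm.

```python
import numpy as np

def delta_bic(X, t, lam=1.0):
    """Positive value suggests a speaker change at frame t of window X.

    X: (N, d) array of feature vectors; t should be a few frames from either edge
       so that both sides have enough data for a covariance estimate.
    """
    N, d = X.shape

    def logdet_cov(Z):
        cov = np.cov(Z, rowvar=False) + 1e-6 * np.eye(d)  # regularized covariance
        return np.linalg.slogdet(cov)[1]

    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(N)   # model complexity term
    return 0.5 * (N * logdet_cov(X)
                  - t * logdet_cov(X[:t])
                  - (N - t) * logdet_cov(X[t:])) - lam * penalty
```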