Search CORE

5,422 research outputs found

Speaker Identification Using a Combination of Different Parameters as Feature Inputs to an Artificial Neural Network Classifier

Author: Moonasar Viresh
Venayagamoorthy Ganesh K.
Publication venue: Scholars\u27 Mine
Publication date: 01/01/1999
Field of study

This paper presents a technique using artificial neural networks (ANNs) for speaker identification that results in a better success rate compared to other techniques. The technique used in this paper uses both power spectral densities (PSDs) and linear prediction coefficients (LPCs) as feature inputs to a self organizing feature map to achieve a better identification performance. Results for speaker identification with different methods are presented and compared

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

Temporal contextual descriptors and applications to emotion analysis.

Author: Balti Haythem
Publication venue: ThinkIR: The University of Louisville\u27s Institutional Repository
Publication date: 01/12/2014
Field of study

The current trends in technology suggest that the next generation of services and devices allows smarter customization and automatic context recognition. Computers learn the behavior of the users and can offer them customized services depending on the context, location, and preferences. One of the most important challenges in human-machine interaction is the proper understanding of human emotions by machines and automated systems. In the recent years, the progress made in machine learning and pattern recognition led to the development of algorithms that are able to learn the detection and identification of human emotions from experience. These algorithms use different modalities such as image, speech, and physiological signals to analyze and learn human emotions. In many settings, the vocal information might be more available than other modalities due to widespread of voice sensors in phones, cars, and computer systems in general. In emotion analysis from speech, an audio utterance is represented by an ordered (in time) sequence of features or a multivariate time series. Typically, the sequence is further mapped into a global descriptor representative of the entire utterance/sequence. This descriptor is used for classification and analysis. In classic approaches, statistics are computed over the entire sequence and used as a global descriptor. This often results in the loss of temporal ordering from the original sequence. Emotion is a succession of acoustic events. By discarding the temporal ordering of these events in the mapping, the classic approaches cannot detect acoustic patterns that lead to a certain emotion. In this dissertation, we propose a novel feature mapping framework. The proposed framework maps temporally ordered sequence of acoustic features into data-driven global descriptors that integrate the temporal information from the original sequence. The framework contains three mapping algorithms. These algorithms integrate the temporal information implicitly and explicitly in the descriptor\u27s representation. In the rst algorithm, the Temporal Averaging Algorithm, we average the data temporally using leaky integrators to produce a global descriptor that implicitly integrates the temporal information from the original sequence. In order to integrate the discrimination between classes in the mapping, we propose the Temporal Response Averaging Algorithm which combines the temporal averaging step of the previous algorithm and unsupervised learning to produce data driven temporal contextual descriptors. In the third algorithm, we use the topology preserving property of the Self-Organizing Maps and the continuous nature of speech to map a temporal sequence into an ordered trajectory representing the behavior over time of the input utterance on a 2-D map of emotions. The temporal information is integrated explicitly in the descriptor which makes it easier to monitor emotions in long speeches. The proposed mapping framework maps speech data of different length to the same equivalent representation which alleviates the problem of dealing with variable length temporal sequences. This is advantageous in real time setting where the size of the analysis window can be variable. Using the proposed feature mapping framework, we build a novel data-driven speech emotion detection and recognition system that indexes speech databases to facilitate the classification and retrieval of emotions. We test the proposed system using two datasets. The first corpus is acted. We showed that the proposed mapping framework outperforms the classic approaches while providing descriptors that are suitable for the analysis and visualization of humans’ emotions in speech data. The second corpus is an authentic dataset. In this dissertation, we evaluate the performances of our system using a collection of debates. For that purpose, we propose a novel debate collection that is one of the first initiatives in the literature. We show that the proposed system is able to learn human emotions from debates

University of Louisville

Speaker segmentation and clustering

Author: Ajmera
Ajmera
Almpanidis
Barras
Bimbot
Campbell
Campbell
Cettolo
Constantine Kotropoulos
Delacourt
Deller
Fiscus
Gales
Garofolo
Godfrey
Graff
Graff
Graff
Hansen
Harb
Hess
Huang
Jain
Kim
Know
Lapidot
Lu
Manjunath
Margarita Kotti
Meignier
Oppenheim
Pellom
Reynolds
Sondhi
Tranter
Vassiliki Moschou
Ververidis
Wang
Wu
Wu
Zhou
Zhu
Publication venue: 'Elsevier BV'
Publication date: 01/01/2008
Field of study

This survey focuses on two challenging speech processing topics, namely: speaker segmentation and speaker clustering. Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker clustering, deterministic and probabilistic algorithms are examined. A comparative assessment of the reviewed algorithms is undertaken, the algorithm advantages and disadvantages are indicated, insight to the algorithms is offered, and deductions as well as recommendations are given. Rich transcription and movie analysis are candidate applications that benefit from combined speaker segmentation and clustering. © 2007 Elsevier B.V. All rights reserved

CiteSeerX

Crossref

Spiral - Imperial College Digital Repository

Recommended from our members

Biologically inspired speaker verification

Author: Tashan T
Publication venue
Publication date: 01/01/2012
Field of study

Speaker verification is an active research problem that has been addressed using a variety of different classification techniques. However, in general, methods inspired by the human auditory system tend to show better verification performance than other methods. In this thesis three biologically inspired speaker verification algorithms are presented

Nottingham Trent Institutional Repository (IRep)

The Recommendation Architecture: Lessons from Large-Scale Electronic Systems Applied to Cognition

Author: coward l andrew
Publication venue: Elsevier
Publication date: 01/12/2001
Field of study

A fundamental approach of cognitive science is to understand cognitive systems by separating them into modules. Theoretical reasons are described which force any system which learns to perform a complex combination of real time functions into a modular architecture. Constraints on the way modules divide up functionality are also described. The architecture of such systems, including biological systems, is constrained into a form called the recommendation architecture, with a primary separation between clustering and competition. Clustering is a modular hierarchy which manages the interactions between functions on the basis of detection of functionally ambiguous repetition. Change to previously detected repetitions is limited in order to maintain a meaningful, although partially ambiguous context for all modules which make use of the previously defined repetitions. Competition interprets the repetition conditions detected by clustering as a range of alternative behavioural recommendations, and uses consequence feedback to learn to select the most appropriate recommendation. The requirements imposed by functional complexity result in very specific structures and processes which resemble those of brains. The design of an implemented electronic version of the recommendation architecture is described, and it is demonstrated that the system can heuristically define its own functionality, and learn without disrupting earlier learning. The recommendation architecture is compared with a range of alternative cognitive architectural proposals, and the conclusion reached that it has substantial potential both for understanding brains and for designing systems to perform cognitive functions

CogPrints Cognitive Sciences Eprint Archive

Optimizing spectral feature based text-Independent speaker recognition

Author: Kinnunen Tomi H.
Publication venue: University of Joensuu
Publication date
Field of study

UEF Electronic Publications

Evaluation of preprocessors for neural network speaker verification

Author: Salleh Sheikh-Hussain
Publication venue: The University of Edinburgh
Publication date: 01/01/1997
Field of study

Edinburgh Research Archive