4,183 research outputs found

    Wavelet-based voice morphing

    Get PDF
    This paper presents a new multi-scale voice morphing algorithm. This algorithm enables a user to transform one person's speech pattern into another person's pattern with distinct characteristics, giving it a new identity, while preserving the original content. The voice morphing algorithm performs the morphing at different subbands by using the theory of wavelets and models the spectral conversion using the theory of Radial Basis Function Neural Networks. The results obtained on the TIMIT speech database demonstrate effective transformation of the speaker identity

    A neural network approach to audio-assisted movie dialogue detection

    Get PDF
    A novel framework for audio-assisted dialogue detection based on indicator functions and neural networks is investigated. An indicator function defines that an actor is present at a particular time instant. The cross-correlation function of a pair of indicator functions and the magnitude of the corresponding cross-power spectral density are fed as input to neural networks for dialogue detection. Several types of artificial neural networks, including multilayer perceptrons, voted perceptrons, radial basis function networks, support vector machines, and particle swarm optimization-based multilayer perceptrons are tested. Experiments are carried out to validate the feasibility of the aforementioned approach by using ground-truth indicator functions determined by human observers on 6 different movies. A total of 41 dialogue instances and another 20 non-dialogue instances is employed. The average detection accuracy achieved is high, ranging between 84.78%±5.499% and 91.43%±4.239%

    WSOM: Building Adaptive Wavelets with Self-organizing Maps

    Full text link
    The WSOM (Wavelet Self-Organizing Map) model, a neural network for the creation of wavelet bases adapted to the distribution of input data, is introduced. The model provides an efficient on-line way to construct high-dimensional wavelet bases. Simulations of a lD function approximation problem illustrate how WSOM adapts to non-uniformly distributed input data, outperforming the discrete wavelet transform. A speaker-independent vowel recognition benchmark task demonstrates how the model constructs high-dimensional bases using low-dimensional wavelets.National Science Foundation (IIU 91-011659); Office of Naval Research (N00014-95-10409, NOOOH-95-0657

    Gaussian Artmap: A Neural Network for Fast Incremental Learning of Noisy Multidimensional Maps

    Full text link
    A new neural network architecture for incremental supervised learning of analalog multidimensional maps is introduced. The architecture, called Gaussian ARTMAP, is a synthesis of a Gaussian classifier and an Adaptive Resonance Theory (ART) neural network, achieved by defining the ART choice function as the discriminant function of a Gaussian classifer with separable distributions, and the ART match function as the same, but with the a priori probabilities of the distributions discounted. While Gaussian ARTMAP retains the attractive parallel computing and fast learning properties of fuzzy ARTMAP, it learns a more efficient internal representation of a mapping while being more resistant to noise than fuzzy ARTMAP on a number of benchmark databases. Several simulations are presented which demonstrate that Gaussian ARTMAP consistently obtains a better trade-off of classification rate to number of categories than fuzzy ARTMAP. Results on a vowel classiflcation problem are also presented which demonstrate that Gaussian ARTMAP outperforms many other classifiers.National Science Foundation (IRI 90-00530); Office of Naval Research (N00014-92-J-4015, 40014-91-J-4100

    Robust ASR using Support Vector Machines

    Get PDF
    The improved theoretical properties of Support Vector Machines with respect to other machine learning alternatives due to their max-margin training paradigm have led us to suggest them as a good technique for robust speech recognition. However, important shortcomings have had to be circumvented, the most important being the normalisation of the time duration of different realisations of the acoustic speech units. In this paper, we have compared two approaches in noisy environments: first, a hybrid HMM–SVM solution where a fixed number of frames is selected by means of an HMM segmentation and second, a normalisation kernel called Dynamic Time Alignment Kernel (DTAK) first introduced in Shimodaira et al. [Shimodaira, H., Noma, K., Nakai, M., Sagayama, S., 2001. Support vector machine with dynamic time-alignment kernel for speech recognition. In: Proc. Eurospeech, Aalborg, Denmark, pp. 1841–1844] and based on DTW (Dynamic Time Warping). Special attention has been paid to the adaptation of both alternatives to noisy environments, comparing two types of parameterisations and performing suitable feature normalisation operations. The results show that the DTA Kernel provides important advantages over the baseline HMM system in medium to bad noise conditions, also outperforming the results of the hybrid system.Publicad

    Robust classification with context-sensitive features

    Get PDF
    This paper addresses the problem of classifying observations when features are context-sensitive, especially when the testing set involves a context that is different from the training set. The paper begins with a precise definition of the problem, then general strategies are presented for enhancing the performance of classification algorithms on this type of problem. These strategies are tested on three domains. The first domain is the diagnosis of gas turbine engines. The problem is to diagnose a faulty engine in one context, such as warm weather, when the fault has previously been seen only in another context, such as cold weather. The second domain is speech recognition. The context is given by the identity of the speaker. The problem is to recognize words spoken by a new speaker, not represented in the training set. The third domain is medical prognosis. The problem is to predict whether a patient with hepatitis will live or die. The context is the age of the patient. For all three domains, exploiting context results in substantially more accurate classification
    corecore