Extraction of vocal-tract system characteristics from speech signals
We propose methods to track natural variations in the characteristics of the vocal-tract system from speech signals. We are especially interested in the cases where these characteristics vary over time, as happens in dynamic sounds such as consonant-vowel transitions. We show that the selection of appropriate analysis segments is crucial in these methods, and we propose a selection based on estimated instants of significant excitation. These instants are obtained by a method based on the average group-delay property of minimum-phase signals. In voiced speech, they correspond to the instants of glottal closure. The vocal-tract system is characterized by its formant parameters, which are extracted from the analysis segments. Because the segments are always at the same relative position in each pitch period, in voiced speech the extracted formants are consistent across successive pitch periods. We demonstrate the results of the analysis for several difficult cases of speech signals.
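The abstract does not say how the formant parameters are computed from each analysis segment. A common choice for short, pitch-synchronous segments is covariance-method linear prediction (LP) followed by root-finding on the LP polynomial, and the sketch below assumes exactly that. The helpers `lp_covariance` and `formants_from_lp`, the LP order, and the synthetic two-resonance test segment are all illustrative assumptions, not the authors' procedure. The segment starts at an excitation instant, mimicking an analysis segment placed at a glottal closure.

```python
import numpy as np

def lp_covariance(seg, order):
    """Covariance-method LP (assumed technique): least-squares fit of
    seg[n] = -sum_k a_k * seg[n-k] for n = order..len(seg)-1.
    Returns the polynomial [1, a_1, ..., a_order]."""
    seg = np.asarray(seg, dtype=float)
    # Row for sample n holds [seg[n-1], seg[n-2], ..., seg[n-order]]
    A = np.array([seg[n - order:n][::-1] for n in range(order, len(seg))])
    b = seg[order:]
    coef, *_ = np.linalg.lstsq(A, -b, rcond=None)
    return np.concatenate(([1.0], coef))

def formants_from_lp(a, fs):
    """Formant frequencies and bandwidths (Hz) from the roots of A(z)."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]            # keep one root of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)   # pole angle -> frequency
    bws = -np.log(np.abs(roots)) * fs / np.pi    # pole radius -> bandwidth
    idx = np.argsort(freqs)
    return freqs[idx], bws[idx]

# Synthetic segment starting at an excitation instant of a known all-pole
# system: resonances at 500 Hz and 1500 Hz, 100 Hz bandwidths, fs = 8 kHz.
fs = 8000
poles = []
for f, bw in [(500.0, 100.0), (1500.0, 100.0)]:
    r = np.exp(-np.pi * bw / fs)
    poles += [r * np.exp(2j * np.pi * f / fs), r * np.exp(-2j * np.pi * f / fs)]
a_true = np.real(np.poly(poles))
seg = np.zeros(200)
for n in range(len(seg)):
    excitation = 1.0 if n == 0 else 0.0          # impulse at the segment start
    seg[n] = excitation - sum(a_true[k] * seg[n - k]
                              for k in range(1, len(a_true)) if n >= k)

freqs, bws = formants_from_lp(lp_covariance(seg, order=4), fs)
print(np.round(freqs), np.round(bws))
```

Because the fit uses only samples after the excitation instant, a noise-free all-pole segment is fitted exactly; on real speech the estimates are only as good as the placement of the segment, which is the point the abstract makes.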
On timing in time-frequency analysis of speech signals
The objective of this paper is to demonstrate the importance of the position of the analysis window in time-frequency analysis of speech signals. Speech signals contain information about the time-varying characteristics of the excitation source and the vocal tract system. Resolution in both the temporal and spectral domains is essential for extracting the source and system characteristics from speech signals. Not only the resolution, as determined by the length of the analysis window in the time domain, but also the position of the window with respect to the production characteristics is important for accurate analysis of speech signals. In this context, we propose an event-based approach to the analysis of speech signals. We define events as occurring at the instants of significant excitation of the vocal tract system. Knowledge of these instants enables us to place the analysis window suitably for extracting the characteristics of the excitation source and the vocal tract system, even from short segments of data. We present a method for extracting the instants of significant excitation from speech signals. We show that, with the knowledge of these instants, it is possible to perform prosodic manipulation of speech and also an accurate analysis of speech for extracting the source and system characteristics.
A robust method for determining instants of major excitations in voiced speech
We propose a method for determining the instants of significant excitation in speech signals using the negative derivative of the unwrapped phase (the group delay) of the short-time Fourier transform. Here, significant excitation refers primarily to the instants of glottal closure in voiced speech. The method computes the average slope of the unwrapped phase spectrum as a function of time. The instants where this phase slope function makes a positive zero-crossing correspond to the major excitations in the signal. For an analysis window size in the range of one to two pitch periods, these instants coincide with the instants of glottal closure in each pitch period. The method is robust because it depends only on the average phase slope value and, further, only on the positive zero-crossing instants of the average phase slope function.
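A minimal sketch of the phase slope function and its positive zero-crossings follows. Assumptions: a rectangular, odd-length window centred on each sample; an unweighted average of the group delay over the non-negative DFT bins; the slope measured relative to the window centre; and a synthetic impulse train instead of real speech or an LP residual. The helper names are hypothetical. The group delay is computed with the standard identity tau = Re{Y X*} / |X|^2, where Y is the DFT of m * x[m], which gives the derivative of the phase without explicit unwrapping.

```python
import numpy as np

def phase_slope_function(x, win_len):
    """Phase slope at each sample n: the negative of the average group delay of an
    odd-length rectangular window centred at n, measured from the window centre.
    Samples where the window does not fit, or the window is silent, are NaN."""
    if win_len % 2 == 0:
        raise ValueError("win_len must be odd so that the window has a centre sample")
    x = np.asarray(x, dtype=float)
    half = win_len // 2
    ramp = np.arange(win_len)
    slope = np.full(len(x), np.nan)
    for n in range(half, len(x) - half):
        seg = x[n - half:n + half + 1]
        X = np.fft.rfft(seg)
        Y = np.fft.rfft(ramp * seg)               # DFT of m * seg[m]
        power = np.abs(X) ** 2
        if power.max() == 0.0:
            continue                              # silent window: leave NaN
        keep = power > 1e-12 * power.max()        # skip bins near spectral zeros
        # Group delay per bin: tau = Re{Y X*} / |X|^2 (no phase unwrapping needed)
        tau = np.real(Y[keep] * np.conj(X[keep])) / power[keep]
        slope[n] = half - tau.mean()              # negate, relative to window centre
    return slope

def positive_zero_crossings(slope):
    """Sample indices where the phase slope goes from negative to non-negative."""
    instants = []
    for n in range(1, len(slope)):
        prev, cur = slope[n - 1], slope[n]
        if prev < 0 <= cur:                       # False whenever either value is NaN
            # Linear interpolation between n-1 and n, rounded to the nearest sample
            instants.append(int(round(n - 1 + (-prev) / (cur - prev))))
    return instants

# Synthetic excitation: impulses every 81 samples, window of one pitch period.
x = np.zeros(450)
x[19::81] = 1.0                                   # impulses at 19, 100, ..., 424
print(positive_zero_crossings(phase_slope_function(x, 81)))  # → [100, 181, 262, 343]
```

With this sign convention the slope is negative while the excitation lies to the right of the window centre and becomes non-negative once the centre passes it, so each positive zero-crossing marks one excitation. Instants too close to the signal edges for a full window (19 and 424 here) are not reported.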
On the use of phase of the Fourier transform for face recognition under variations in illumination
In this paper, we propose a representation of a face image based on the phase of the 2-D Fourier transform of the image, to overcome the adverse effect of illumination. The phase of the Fourier transform preserves the locations of the edges in a given face image. The main problem in using the phase spectrum is the need for phase unwrapping. This problem is avoided by considering two functions of the phase spectrum rather than the phase directly. Each of these functions gives partial evidence for the given face image. The effect of noise is reduced by using, for each of the two phase functions separately, the first few eigenvectors obtained from an eigenanalysis. Experimental results on combining the evidence from the two phase functions show that the proposed method provides an alternative representation of face images for dealing with illumination in face recognition.
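The abstract does not name the two phase functions. The sketch below assumes they are the cosine and sine of the wrapped phase, which are continuous in the phase and therefore need no unwrapping. It also assumes the eigenanalysis is a principal-component projection computed with an SVD. The helpers `phase_functions` and `eigen_features`, the image size, and the number of eigenvectors are hypothetical, and random arrays stand in for face images.

```python
import numpy as np

def phase_functions(img):
    """Cosine and sine of the 2-D Fourier phase (assumed choice of the two
    functions): both are continuous in the wrapped phase, so no unwrapping."""
    phi = np.angle(np.fft.fft2(np.asarray(img, dtype=float)))
    return np.cos(phi), np.sin(phi)

def eigen_features(train, k):
    """Project flattened feature maps (one per row) onto the first k eigenvectors
    of their covariance, computed via an SVD of the mean-removed data."""
    X = np.asarray(train, dtype=float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:k]
    return (X - mean) @ basis.T, mean, basis

rng = np.random.default_rng(0)
faces = rng.random((10, 16, 16))                  # stand-in for 10 face images
cos_maps, sin_maps = zip(*(phase_functions(f) for f in faces))
# Each phase function is projected separately, as the abstract describes.
cos_proj, _, _ = eigen_features([c.ravel() for c in cos_maps], k=5)
sin_proj, _, _ = eigen_features([s.ravel() for s in sin_maps], k=5)

# A global change in illumination gain scales the spectrum but not its phase,
# so both phase functions are unchanged.
c1, s1 = phase_functions(faces[0])
c2, s2 = phase_functions(3.0 * faces[0])
print(cos_proj.shape, np.allclose(c1, c2), np.allclose(s1, s2))
```

Gain invariance follows directly from linearity of the Fourier transform: scaling an image by a positive constant scales every coefficient by that constant and leaves its phase unchanged.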
Voice Conversion by Prosody and Vocal Tract Modification
In this paper we propose flexible methods that are useful in voice conversion. The proposed methods modify the shape of the vocal tract system and the characteristics of the prosody according to the desired requirement. The shape of the vocal tract system is modified by shifting the major resonant frequencies (formants) of the short-term spectrum and altering their bandwidths accordingly. For prosody modification, the required durational and intonational characteristics are imposed on the given speech signal. In the proposed method, the prosodic characteristics are manipulated using instants of significant excitation. The instants of significant excitation correspond to the instants of glottal closure (epochs) in voiced speech, and to random excitations, such as the onset of a burst, in nonvoiced speech. Instants of significant excitation are computed from the Linear Prediction (LP) residual of the speech signals by using the average group-delay property of minimum-phase signals. The durational characteristics and the pitch contour (intonation pattern) are manipulated by modifying the LP residual using the knowledge of the instants of significant excitation. The modified LP residual is used to excite a time-varying filter whose parameters are updated according to the desired vocal tract characteristics. The proposed methods are evaluated using listening tests.
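The abstract does not spell out how the modified excitation instants are chosen. The sketch below assumes one simple scheme: for a pitch factor `alpha` and a duration factor `beta`, each new instant follows the previous one by the original local pitch period, read at the corresponding original time and divided by `alpha`. The helper `modified_epochs` and its parameters are hypothetical. Copying or resampling the LP residual around each original instant to the new instants, before exciting the time-varying filter, is not shown.

```python
import numpy as np

def modified_epochs(epochs, alpha=1.0, beta=1.0):
    """New excitation instants for pitch scaling by alpha (alpha > 1 raises F0)
    and duration scaling by beta (beta > 1 lengthens the utterance).

    The pitch period at each output instant is read from the original epoch
    intervals at the corresponding original time (t_new / beta), then divided
    by alpha. Assumes at least two strictly increasing epochs and positive factors."""
    epochs = np.asarray(epochs, dtype=float)
    periods = np.diff(epochs)                     # period starting at each epoch
    end = beta * epochs[-1]
    new = [beta * epochs[0]]
    while True:
        t_orig = new[-1] / beta                   # map back to the original time axis
        period = np.interp(t_orig, epochs[:-1], periods) / alpha
        if new[-1] + period > end:
            break
        new.append(new[-1] + period)
    return np.round(new).astype(int)

orig = [0, 80, 160, 240, 320]                     # constant 80-sample pitch period
print(modified_epochs(orig, alpha=2.0))           # twice the F0, same duration
print(modified_epochs(orig, beta=2.0))            # same F0, twice the duration
```

Separating the two factors keeps intonation and duration changes independent: `alpha` only changes the spacing of the instants, while `beta` only stretches the time axis on which the original period contour is read.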
Intonation modeling for Indian languages
In this paper we propose models for predicting the intonation for the sequence of syllables present in an utterance. The term intonation refers to the temporal changes of the fundamental frequency (F0). Neural networks are used to capture the implicit intonation knowledge in the sequence of syllables of an utterance. We focus on the development of intonation models for predicting the sequence of fundamental frequency values for a given sequence of syllables. Labeled broadcast news data in Hindi, Telugu and Tamil are used to develop neural network models that predict the F0 of syllables in these languages. The input to the neural network consists of a feature vector representing the positional, contextual and phonological constraints. The interaction between duration and intonation constraints can be exploited to improve the accuracy further. We find that the models predict the F0 (pitch) values of 88% of the syllables within 15% of the actual F0. The performance of the intonation models is evaluated using objective measures such as the average prediction error (μ), the standard deviation (σ) and the correlation coefficient (γ). The prediction accuracy of the intonation models is further evaluated using listening tests. The prediction performance of the proposed neural network intonation models is compared with that of Classification and Regression Tree (CART) models.
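The abstract names the objective measures but not their exact formulas. The sketch below assumes μ is the mean absolute F0 error, σ is the standard deviation of the absolute errors, and γ is the Pearson correlation between actual and predicted F0. It also computes the percentage of syllables predicted within 15% of the actual F0. The function `intonation_scores` is hypothetical, and the four F0 values are made-up illustration data, not results from the paper.

```python
import numpy as np

def intonation_scores(actual, predicted, tol=0.15):
    """Objective measures for predicted syllable F0 values (assumed definitions).

    mu:    mean absolute prediction error (Hz)
    sigma: standard deviation of the absolute errors (Hz)
    gamma: Pearson correlation between actual and predicted F0
    pct:   percentage of syllables predicted within tol * actual F0"""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = np.abs(actual - predicted)
    mu = err.mean()
    sigma = err.std()
    gamma = np.corrcoef(actual, predicted)[0, 1]
    pct = 100.0 * np.mean(err <= tol * actual)
    return mu, sigma, gamma, pct

actual = [100.0, 200.0, 150.0, 120.0]       # made-up syllable F0 values (Hz)
predicted = [110.0, 190.0, 150.0, 150.0]
print(intonation_scores(actual, predicted))
```

Here three of the four syllables fall within 15% of their actual F0 (the last one misses by 30 Hz on 120 Hz), so the within-tolerance percentage is 75%.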
Neural networks for contract bridge bidding
The objective of this study is to explore the possibility of capturing, with an artificial neural network, the reasoning process used in bidding a hand in a game of bridge. We show that a multilayer feedforward neural network can be trained to make an opening bid with a new hand. The game of bridge, like many other games used in artificial intelligence, can easily be represented in a machine. However, unlike most such games, bridge uses subtle reasoning over and above the agreed conventional system to make a bid from the pattern of a given hand. Although it is difficult for a player to spell out the precise reasoning process he uses, we find that a neural network can indeed capture it. We demonstrate the results for the case of one-level opening bids, and discuss the need for a hierarchical architecture to deal with bids at all levels.
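The abstract does not give the input representation or the network size. The sketch below shows only the shape of such a mapping, under stated assumptions. A hand is encoded as a 52-element binary vector with one element per card. The network has a single tanh hidden layer, and a softmax output over a hypothetical set of one-level opening bids plus Pass. `BIDS`, `encode_hand`, and `OpeningBidMLP` are illustrative names. The weights are random and untrained, so the output probabilities are meaningless until the network is trained on labelled hands.

```python
import numpy as np

BIDS = ["Pass", "1C", "1D", "1H", "1S", "1NT"]    # hypothetical output classes

def encode_hand(cards):
    """52-element binary vector; card c has suit c // 13 and rank c % 13 (assumed)."""
    v = np.zeros(52)
    v[list(cards)] = 1.0
    return v

class OpeningBidMLP:
    """Multilayer feedforward network: 52 inputs -> tanh hidden layer -> softmax over BIDS."""

    def __init__(self, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (52, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, len(BIDS)))
        self.b2 = np.zeros(len(BIDS))

    def forward(self, x):
        h = np.tanh(x @ self.W1 + self.b1)        # hidden-layer activations
        z = h @ self.W2 + self.b2                 # one score per candidate bid
        e = np.exp(z - z.max())                   # numerically stable softmax
        return e / e.sum()

rng = np.random.default_rng(1)
hand = rng.choice(52, size=13, replace=False)     # deal a random 13-card hand
net = OpeningBidMLP()
probs = net.forward(encode_hand(hand))
print(BIDS[int(np.argmax(probs))], probs.round(3))
```

Extending this to bids at all levels is where the abstract's hierarchical architecture would come in; a single flat softmax is only a starting point for the one-level case.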