Harmonic Liouville Theorem for Exterior Domains
We give a very simple function-theoretic proof of a Liouville-type theorem for harmonic functions defined on exterior domains, originally obtained by F. Cammaroto and A. Chinnì using a convexity-theoretic method. The theorem itself is also slightly generalized.
The Use of F0 Reliability Function for Prosodic Command Analysis on F0 Contour Generation Model
This paper describes a method that uses the "F0 Reliability Field" (FRF), proposed in our previous work, to estimate prosodic commands in an F0 contour generation model. The FRF is a time-frequency representation of F0 likelihood; its advantage is that F0 errors occurring during automatic F0 determination need not be considered. The FRF may therefore be a more useful feature for automatic prosody analysis than the F0 contour, and our previous paper reported its validity for detecting prosodic boundaries in Japanese continuous speech. In this paper, we examine its validity for estimating the prosodic commands of the superpositional model. Experimental results show that the accuracy of command estimation with the FRF is good and close to that obtained with an ideal F0 contour that has no F0 errors.
Accent Phrase Segmentation by Finding N-Best Sequences of Pitch Pattern Templates
This paper describes a prosodic method for segmenting continuous speech into accent phrases. Optimum sequences are obtained under a least-squared-error criterion by dynamic time warping between the F0 contours of input speech and reference accent patterns called 'pitch pattern templates'. However, the optimum sequence does not always agree well with hand-labeled phrase boundaries, whereas the second or third best candidate sequence may. We therefore extend our system to find multiple candidates using an N-best algorithm. Evaluation tests were carried out on the ATR continuous speech database of 10 speakers. The results showed that about 97% of phrase boundaries were correctly detected when the 30 best candidates were taken, an accuracy 7.5% higher than that of the conventional method without the N-best search algorithm.
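The matching step described in this abstract, dynamic time warping under a squared-error criterion, can be sketched as follows. This is a minimal illustration and not the authors' system: the function names, the toy templates, and the choice of the three standard DTW step directions are all assumptions, and the N-best search is not shown.

```python
import math


def dtw_squared_error(contour, template):
    """Accumulated squared error along the best DTW alignment path.

    Both arguments are sequences of F0 values (hypothetical units).
    The step pattern (vertical, horizontal, diagonal) is an assumption.
    """
    n, m = len(contour), len(template)
    # cost[i][j] = best accumulated error aligning contour[:i] with template[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (contour[i - 1] - template[j - 1]) ** 2
            cost[i][j] = local + min(
                cost[i - 1][j],      # contour advances, template frame repeats
                cost[i][j - 1],      # template advances, contour frame repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def best_template(contour, templates):
    """Return the name of the template with the smallest DTW error."""
    return min(templates, key=lambda name: dtw_squared_error(contour, templates[name]))


# Toy templates, invented for illustration only.
templates = {"rise": [1.0, 2.0, 3.0], "fall": [3.0, 2.0, 1.0]}
chosen = best_template([1.0, 1.9, 2.1, 3.0], templates)
```

An N-best extension would keep several partial hypotheses per cell instead of only the minimum, which is what allows the second or third best sequence to be recovered.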
On Representation of Fundamental Frequency of Speech for Prosody Analysis Using Reliability Function.
This paper presents a method that provides a new prosodic feature called the ‘F0 reliability field’, based on a reliability function of the fundamental frequency (F0). The proposed method does not employ any correction process for F0 estimation errors that occur during automatic F0 extraction. By applying this feature as a score function in prosodic analyses such as prosodic structure estimation or superpositional modeling of prosodic commands, this prosodic information can be acquired with higher accuracy. The feature has been applied to the ‘F0 template matching method’, which detects accent phrase boundaries in Japanese continuous speech. The experimental results show that, compared with the conventional F0 contour, the proposed feature overcomes the harmful influence of F0 errors.
Prosodic phrase segmentation by pitch pattern clustering
This paper proposes a novel data-driven method for detecting the optimal sequence of prosodic phrases in continuous speech. The pitch pattern of input speech is divided into prosodic segments that minimize the overall distortion with respect to pitch pattern templates of accent phrases, using the One Pass search algorithm. The pitch pattern templates are designed by clustering a large number of training samples of accent phrases. On the ATR continuous speech database uttered by 10 speakers, the rate of correct segmentation reached a maximum of 91.7% when training and test data came from speakers of the same sex, and 88.6% for the opposite sex.
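The abstract does not say which clustering algorithm is used to design the templates. As a rough sketch only, the idea can be illustrated with plain k-means over pitch patterns that are assumed to have been resampled to a common length; the function name, the squared Euclidean distance, and the explicit initial centroids are all assumptions made for this example.

```python
def kmeans_templates(patterns, init, iters=10):
    """Cluster fixed-length pitch patterns; the centroids act as templates.

    patterns: list of equal-length lists of F0 values (assumed resampled).
    init: initial centroids, passed explicitly to keep the sketch deterministic.
    """
    centroids = [list(c) for c in init]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in patterns:
            # Assign each pattern to the nearest centroid (squared Euclidean).
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            groups[nearest].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


# Toy data, invented for illustration only.
toy_patterns = [[0, 1], [0, 3], [10, 11], [10, 13]]
toy_templates = kmeans_templates(toy_patterns, init=[[0, 0], [10, 10]])
```

The resulting centroids would then serve as the reference patterns against which the input pitch pattern is segmented.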
Robust Pitch Detection by Narrow Band Spectrum Analysis
This paper proposes a new technique, based on narrow band spectrum analysis, for detecting pitch patterns that are useful for automatic speech recognition. The motivation is that humans perceive some kind of pitch in whispers, where no fundamental frequency can be observed, while most pitch determination algorithms (PDAs) fail to detect such perceptual pitch. Narrow band spectrum analysis enables us to find pitch structure distributed locally in the frequency domain. The technique is incorporated into a PDA by applying it to the lag-window-based PDA. Experimental results show that pitch detection performance could be improved by 4% for voiced sounds and 8% for voiceless sounds.
Modified Minimum Classification Error Learning and Its Application to Neural Networks
A novel method to improve the generalization performance of Minimum Classification Error (MCE) / Generalized Probabilistic Descent (GPD) learning is proposed. MCE/GPD learning, proposed by Juang and Katagiri in 1992, yields better recognition performance than maximum-likelihood (ML) based learning in various areas of pattern recognition. Despite its superior recognition performance, it still suffers, like other learning algorithms, from the problem of "over-fitting" to the training samples. In the present study, a regularization technique is applied to MCE learning to overcome this problem. Feed-forward neural networks are employed as a recognition platform to evaluate the recognition performance of the proposed method. Recognition experiments are conducted on several kinds of data sets.
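For orientation, a regularized MCE objective might look like the sketch below. The misclassification measure and sigmoid loss follow the commonly cited MCE formulation; the L2 weight penalty is purely an assumption, since the abstract does not specify which regularization technique is used, and the function names and the parameters `eta`, `gamma`, and `lam` are hypothetical.

```python
import math


def misclassification_measure(scores, label, eta=1.0):
    """d = -g_label + (1/eta) * log(mean over competitors of exp(eta * g_j))."""
    competitors = [s for k, s in enumerate(scores) if k != label]
    top = max(competitors)  # subtract the max for numerical stability
    mean_exp = sum(math.exp(eta * (s - top)) for s in competitors) / len(competitors)
    return -scores[label] + top + math.log(mean_exp) / eta


def mce_loss(scores, label, eta=1.0, gamma=1.0):
    """Smoothed 0/1 loss: a sigmoid of the misclassification measure."""
    d = misclassification_measure(scores, label, eta)
    return 1.0 / (1.0 + math.exp(-gamma * d))


def regularized_mce(samples, weights, lam=0.01):
    """Mean MCE loss plus an L2 penalty on the weights.

    The L2 penalty is an assumed stand-in for the paper's regularizer.
    samples: list of (scores, label) pairs, where scores are the
    discriminant outputs of the classifier (e.g. a neural network).
    """
    data_term = sum(mce_loss(s, y) for s, y in samples) / len(samples)
    penalty = lam * sum(w * w for w in weights)
    return data_term + penalty
```

For a correctly classified sample such as `mce_loss([2.0, 0.0], 0)` the loss falls below 0.5, and it rises above 0.5 when the label is 1 instead; the penalty term discourages large weights regardless of how well the training samples are fit.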