
    Neural Network vs. Rule-Based G2P: A Hybrid Approach to Stress Prediction and Related Vowel Reduction in Bulgarian

    An effective grapheme-to-phoneme (G2P) conversion system is a critical element of speech synthesis. Rule-based systems were an early method for G2P conversion. In recent years, machine learning tools have been shown to outperform rule-based approaches in G2P tasks. We investigate neural network sequence-to-sequence modeling for the prediction of syllable stress and the resulting vowel reductions in the Bulgarian language. We then develop a hybrid G2P approach that combines manually written grapheme-to-phoneme mapping rules with neural network-enabled syllable stress predictions: stress markers are inserted at the predicted stress position in the transcription produced by the rule-based finite-state transducer. Finally, we apply vowel reduction rules relative to the position of the stress marker to yield the predicted phonetic transcription of the source Bulgarian word written in Cyrillic graphemes. We compare word error rates between the neural network sequence-to-sequence modeling approach and the hybrid approach and find no significant difference between the two. We conclude that our hybrid approach to syllable stress, vowel reduction, and transcription performs as well as the exclusively machine learning powered approach.
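The three-stage pipeline described in this abstract (rule-based mapping, stress insertion, stress-dependent vowel reduction) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the grapheme table is a tiny hypothetical subset, the reduction rules are a simplified textbook approximation rather than the authors' rule set, and the stressed syllable index is passed in by hand where the paper uses a neural network prediction. The stress marker is placed directly before the stressed vowel for simplicity.

```python
# Toy hybrid G2P sketch. All tables are simplified assumptions, not the paper's rules.

# Hypothetical grapheme-to-phoneme table (small subset of Bulgarian Cyrillic).
G2P_RULES = {
    "а": "a", "о": "o", "у": "u", "ъ": "ɤ", "е": "ɛ", "и": "i",
    "в": "v", "д": "d", "м": "m", "л": "l", "к": "k", "н": "n",
    "т": "t", "с": "s", "р": "r", "б": "b", "п": "p",
}
VOWELS = set("aouɤɛi")
# Assumed simplified reduction of unstressed vowels: a, ɤ -> ɐ and o -> u.
REDUCTION = {"a": "ɐ", "ɤ": "ɐ", "o": "u"}
STRESS = "ˈ"


def rule_based_transcribe(word):
    """Stage 1: map each grapheme to a phoneme with hand-written rules."""
    return [G2P_RULES[ch] for ch in word.lower()]


def insert_stress(phonemes, stressed_syllable):
    """Stage 2: put a stress marker before the vowel of the given syllable.

    `stressed_syllable` is a 0-based syllable index; in the paper this
    value comes from a neural model, here it is supplied by the caller.
    """
    vowel_positions = [i for i, p in enumerate(phonemes) if p in VOWELS]
    pos = vowel_positions[stressed_syllable]
    return phonemes[:pos] + [STRESS] + phonemes[pos:]


def reduce_unstressed(phonemes):
    """Stage 3: reduce every vowel not immediately preceded by the marker."""
    out = []
    for i, p in enumerate(phonemes):
        is_stressed = i > 0 and phonemes[i - 1] == STRESS
        if p == STRESS or is_stressed:
            out.append(p)
        else:
            out.append(REDUCTION.get(p, p))  # consonants pass through unchanged
    return out


def hybrid_g2p(word, stressed_syllable):
    """Run the three stages in order and return the transcription string."""
    phonemes = rule_based_transcribe(word)
    phonemes = insert_stress(phonemes, stressed_syllable)
    return "".join(reduce_unstressed(phonemes))


print(hybrid_g2p("вода", 1))  # stress on the second syllable
print(hybrid_g2p("мама", 0))  # stress on the first syllable
```

Keeping the stress position as a plain argument means the predictor can be swapped without touching the rule-based stages, which is the main appeal of a hybrid design of this kind.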

    Voice Identification Using Classification Algorithms

    This article discusses classification algorithms for the problem of identifying a person by voice using machine learning methods. We used mel-frequency cepstral coefficients (MFCC) to preprocess the speech. To solve the problem, a comparative analysis of five classification algorithms was carried out. In the first experiment, the support vector machine (0.90) and the multilayer perceptron (0.83) showed the best results. In the second experiment, a multilayer perceptron combined with the Robust scaler method reached an accuracy of 0.93 for person identification. Therefore, a multilayer perceptron can be used to solve this problem, provided the specifics of the speech signal are taken into account.
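The abstract does not say how the Robust scaler step was implemented. As a hedged illustration, robust scaling is commonly defined as centering each feature on its median and dividing by its interquartile range, which limits the influence of outliers. The sketch below assumes that definition and uses only the Python standard library; the function names and the sample values are hypothetical and are not taken from the article.

```python
from statistics import median, quantiles


def robust_scale_column(values):
    """Scale one feature as (x - median) / IQR.

    Quartiles use linear interpolation over the data (method="inclusive").
    A zero IQR is replaced by 1.0 so constant features do not divide by zero.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        iqr = 1.0
    center = median(values)
    return [(v - center) / iqr for v in values]


def robust_scale(rows):
    """Apply robust scaling to each column of a row-major feature matrix."""
    columns = [list(col) for col in zip(*rows)]
    scaled_columns = [robust_scale_column(col) for col in columns]
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical feature values: the outlier 100 does not distort the other points.
print(robust_scale_column([1, 2, 3, 4, 100]))
```

In a pipeline like the one described, a step of this kind would sit between feature extraction and the classifier, with the median and interquartile range computed on the training data only.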