Search CORE

7 research outputs found

A Measure of Smoothness in Synthesized Speech

Author: Diep Nguyen Thi Bich
Hien Phung Thi Thu
Huong Pham Thi Mai
Nghia Phung Trung
Tao Nguyen Van
Publication venue: 'The Radio and Electronics Association of Vietnam (REV)'
Publication date: 16/09/2016
Field of study

The articulators typically move smoothly during speech production. Therefore, speech features of natural speech are generally smooth. However, over-smoothness causes "muffleness" and, hence, reduction in ability to identify emotions/expressions/styles in synthesized speech that can affect the perception of naturalness in synthesized speech. In the literature, statistical variances of static spectral features have been used as a measure of smoothness in synthesized speech but they are not sufficient enough. This paper proposes another measure of smoothness that can be efficiently applied to evaluate the smoothness of synthesized speech. Experiments showed that the proposed measure is reliable and efficient to measure the smoothness of different kinds of synthesized speech

REV Journal on Electronics and Communications

DEVELOPMENT OF HIGH-PERFORMANCE AND LARGE-SCALE VIETNAMESE AUTOMATIC SPEECH RECOGNITION SYSTEMS

Author: Mai Luong Chi
Phuong Pham Ngoc
Truong Do Quoc
Tung Tran Hoang
Publication venue: 'Publishing House for Science and Technology, Vietnam Academy of Science and Technology'
Publication date: 30/01/2019
Field of study

Automatic Speech Recognition (ASR) systems convert human speech into the corresponding transcription automatically. They have a wide range of applications such as controlling robots, call center analytics, voice chatbot. Recent studies on ASR for English have achieved the performance that surpasses human ability. The systems were trained on a large amount of training data and performed well under many environments. With regards to Vietnamese, there have been many studies on improving the performance of existing ASR systems, however, many of them are conducted on a small-scaled data, which does not reflect realistic scenarios. Although the corpora used to train the system were carefully design to maintain phonetic balance properties, efforts in collecting them at a large-scale are still limited. Specifically, only a certain accent of Vietnam was evaluated in existing works. In this paper, we first describe our efforts in collecting a large data set that covers all 3 major accents of Vietnam located in the Northern, Center, and Southern regions. Then, we detail our ASR system development procedure utilizing the collected data set and evaluating different model architectures to find the best structure for Vietnamese. In the VLSP 2018 challenge, our system achieved the best performance with 6.5% WER and on our internal test set with more than 10 hours of speech collected real environments, the system also performs well with 11% WE

Vietnam Academy of Science and Technology: Journals Online

Trích chọn các tham số đặc trưng tiếng nói cho hệ thống tổng hợp tiếng Việt dựa vào mô hình Markov ẩn

Author: Cường Dương Tử
Sơn Phan Thanh
Publication venue: 'Publishing House for Science and Technology, Vietnam Academy of Science and Technology'
Publication date: 01/04/2013
Field of study

Recently, the statistical framework based on Hidden Markov Models (HMMs) plays an important role in the speech synthesis method. The system can be built without requiring a very large speech corpus for training the system. In this method, statistical modeling is applied to learn distributions of context-dependent acoustic vectors extracted from speech signals, each vector contains a suitable parametric representation of one speech frame and Vietnamese phonetic rules to synthesize the speech. The overall performance of the systems is often limited by the accuracy of the underlying speech parameterization and reconstruction method. The method proposed in this paper allows accurate MFCC, F0 and tone extraction and high-quality reconstruction of speech signals assuming Mel Log Spectral Approximation filter. Its suitability for high-quality HMM-based speech synthesis is shown through evaluations subjectively.Phương pháp tổng hợp tiếng nói dựa trên mô hình Markov ẩn (HMM) chỉ cần một kho ngữ liệu tiếng nói thu âm sẵn đủ lớn (bao hàm tất cả các âm vị của một ngôn ngữ) để phục vụ cho mục đích huấn luyện. Trong phương pháp này, mô hình thống kê được sử dụng để mô hình hóa sự phân bố của các véc tơ âm thanh phụ thuộc ngữ cảnh, các véc tơ này được trích rút từ tín hiệu tiếng nói, mỗi véc tơ là một tham số đặc trưng cho khung tín hiệu và các qui tắc ngữ âm tiếng Việt, phục vụ cho quá trình tổng hợp tiếng nói. Hiệu quả của hệ thống bị hạn chế bởi mức độ chính xác khi tham số hóa các đặc trưng tiếng nói và phương pháp tái tạo tín hiệu tiếng nói từ những tham số này. Bài báo này giới thiệu một phương pháp trích chọn các tham số MFCC, F0 và tái tạo tín hiệu tiếng nói chất lượng cao sử dụng bộ lọc MLSA. Phương pháp này thích hợp cho tổng hợp tiếng nói dựa trên HMM và kết quả của nó được đánh giá qua thực tế là khá tốt so với một số phương pháp khác

Vietnam Academy of Science and Technology: Journals Online

HMM-Based Vietnamese Speech Synthesis

Author
Publication venue: 'IGI Global'
Publication date
Field of study

Crossref

An HMM-based Vietnamese speech synthesis system

Author
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref