12 research outputs found

    Incorporating Duration and Intonation Models in Filipino Speech Synthesis

    Get PDF
    In this paper we describe the development of an intonation model and a duration model to generate prosody for the Filipino language. Z-score-normalized durations are used for the duration model, and Tilt parameters are used for the intonation model. The Filipino Speech Corpus (FSC) is the source of statistical data for modeling duration and intonation. A Classification and Regression Tree (CART) generator is used to build both models. A Harmonic plus Noise Model (HNM) is developed for the FSC. Diphones are concatenated to produce the synthetic speech, and HNM is used to modify the prosody. The synthesized speech is evaluated using the Mean Opinion Score (MOS). Results show that the duration model and the intonation model need improvement. HNM synthesis performs slightly better than TD-PSOLA (time-domain pitch-synchronous overlap-add).
    APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference, 4-7 October 2009, Sapporo, Japan. Oral session: Speech and Music Processing (5 October 2009)
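
    The duration model predicts z-score-normalized phone durations. Below is a minimal sketch of that normalization and its inverse, using hypothetical phone statistics rather than FSC data:

```python
# Minimal sketch of z-score duration normalization; the observations
# below are hypothetical placeholders, not Filipino Speech Corpus data.
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (phone, duration-in-seconds) observations.
observations = [
    ("a", 0.110), ("a", 0.095), ("a", 0.120),
    ("k", 0.060), ("k", 0.075), ("k", 0.070),
]

# Per-phone mean and standard deviation of duration.
by_phone = defaultdict(list)
for phone, dur in observations:
    by_phone[phone].append(dur)
norms = {p: (mean(ds), stdev(ds)) for p, ds in by_phone.items()}

def z_score(phone: str, duration: float) -> float:
    """Normalized duration: standard deviations from the phone's mean."""
    mu, sigma = norms[phone]
    return (duration - mu) / sigma

def duration_from_z(phone: str, z: float) -> float:
    """Invert the normalization: in the paper, the CART model predicts
    a z-score, which is mapped back to a concrete duration like this."""
    mu, sigma = norms[phone]
    return mu + z * sigma

print(round(z_score("a", 0.120), 2))        # ~0.93
print(round(duration_from_z("k", 0.5), 3))  # ~0.072 s
```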

    On-device implementation of an automatic Filipino speech recognition system

    No full text
    This paper describes preliminary results in an effort to develop a speaker-independent Filipino speech-to-text (STT) application on a smartphone. The system is composed of a front-end Symbian C++ application built to run on Symbian OS S60 3rd Edition smartphones. The application covers significant portions of a standard speech recognition system, namely, capture of speech input, feature extraction using PLP-RASTA directly converted to cepstral values, and posterior probability estimation using a multi-layer perceptron. Acoustic and language models were produced using different LM tools. To measure the accuracy of the system, it was tested with a Linux-based decoder (OWAY) from SPRACHcore, a full source-code release of speech recognition tools from ICSI, which was ported to run on a Windows-based desktop computer. Tests were done in a moderately noisy environment. The system achieved average recognition rates of 53.94% and 6.15% for word- and sentence-level recognition, respectively.
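
    As a rough illustration of the posterior-estimation stage, the sketch below runs one frame of cepstral features through a small multi-layer perceptron; the layer sizes and random weights are stand-ins, not the paper's trained network:

```python
import numpy as np

rng = np.random.default_rng(0)
N_CEPSTRA, N_HIDDEN, N_PHONES = 13, 64, 40  # assumed dimensions

# Random stand-in weights; a real system would load trained parameters.
W1 = rng.standard_normal((N_HIDDEN, N_CEPSTRA)) * 0.1
b1 = np.zeros(N_HIDDEN)
W2 = rng.standard_normal((N_PHONES, N_HIDDEN)) * 0.1
b2 = np.zeros(N_PHONES)

def phone_posteriors(frame: np.ndarray) -> np.ndarray:
    """One MLP forward pass: sigmoid hidden layer, softmax output."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ frame + b1)))  # hidden activations
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())             # numerically stable softmax
    return e / e.sum()

frame = rng.standard_normal(N_CEPSTRA)            # stand-in cepstral frame
p = phone_posteriors(frame)
print(round(float(p.sum()), 6))  # 1.0: a proper posterior distribution
print(int(p.argmax()))           # index of the most likely phone
```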

    Experiments and Pilot Study Evaluating the Performance of Reading Miscue Detector and Automated Reading Tutor for Filipino: A Children's Speech Technology for Improving Literacy

    No full text
    The latest advances in speech processing technology have allowed the development of automated reading tutors (ART) for improving children's literacy. An ART is a computer-assisted learning system based on oral reading fluency (ORF) instruction and automatic speech recognition (ASR) technology. However, the design of an ART system is language-specific and thus requires developing a system specifically for the Filipino language. In previous work, the authors presented the development of the children's Filipino speech corpus (CFSC) for the purpose of designing an ART in Filipino. In this paper, the authors present the evaluation of the ART in Filipino, which integrates a reading miscue detector (RMD) based on reference verification (RV) and word duration analysis, a user interface, and a feedback and instruction set. The authors also present the performance evaluation of the RMD in offline tests, and the effectiveness of the ART as shown by the results of the intervention program, a month-long pilot study in which a small group of students used the ART. Offline test results show that the RMD's performance (i.e., FA rate ≈ 3% and MDerr rate ≈ 5%) is on par with state-of-the-art RMDs reported in the literature. The results of the ART intervention experiment showed that, after using the ART, the students on average improved their words-correct-per-minute (WCPM) rate by 4.66 times, their ORF-16 scores by 6.0 times, and their reading comprehension exam scores by 4.4 times.
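
    The two offline RMD metrics quoted above are rate-style quantities; a small sketch of how such rates are commonly computed from counts follows (the counts and the exact definitions are assumptions, since the paper's formulas are not given here):

```python
# Hypothetical counts; the paper's exact metric definitions may differ.
def fa_rate(false_alarms: int, correctly_read_words: int) -> float:
    """False-alarm rate: fraction of correctly read words flagged as miscues."""
    return false_alarms / correctly_read_words

def md_err_rate(missed_miscues: int, actual_miscues: int) -> float:
    """Miscue-detection error: fraction of true miscues the detector missed."""
    return missed_miscues / actual_miscues

print(f"FA rate: {fa_rate(30, 1000):.1%}")       # 3.0%
print(f"MDerr rate: {md_err_rate(5, 100):.1%}")  # 5.0%
```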

    Development, implementation and testing of language identification system for seven Philippine languages

    No full text
    Three Language Identification (LID) approaches, namely acoustic, phonotactic, and prosodic, are explored for Philippine languages. Gaussian Mixture Models (GMMs) are used for the acoustic and prosodic approaches. The acoustic features used were Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), Shifted Delta Cepstra (SDC), and Linear Prediction Cepstral Coefficients (LPCC). Pitch, rhythm, and energy are used as prosodic features. Phone Recognition followed by Language Modelling (PRLM) and Parallel Phone Recognition followed by Language Modelling (PPRLM) are used for the phonotactic approach. After establishing that the acoustic approach using a 32nd-order PLP GMM-EM achieved the best performance among the combinations of approach and feature, three LID systems were built: 7-language LID, pair-wise LID, and hierarchical LID, with average accuracies of 48.07%, 72.64%, and 53.99%, respectively. Among the pair-wise LID systems, the highest accuracy is 92.23% for Tagalog and Hiligaynon and the lowest is 52.21% for Bicolano and Tausug. In the hierarchical LID system, the accuracies for Tagalog, Cebuano, Bicolano, and Hiligaynon reached 80.56%, 80.26%, 78.26%, and 60.87%, respectively. The LID systems that were designed, implemented, and tested are best suited for language verification or for language identification among a small number of closely related target languages such as the Philippine languages.
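
    A minimal sketch of the acoustic GMM approach follows: one mixture model per language trained on frame-level features, with the language chosen by maximum average log-likelihood. The features here are synthetic stand-ins, and the mixture order is illustrative rather than the paper's best configuration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
DIM = 13  # assumed feature dimension (stand-in for the PLP features used)

# Synthetic stand-ins for per-language training frames.
train = {
    "tagalog":    rng.normal(0.0, 1.0, size=(500, DIM)),
    "hiligaynon": rng.normal(0.5, 1.2, size=(500, DIM)),
}

# Fit one GMM per language via EM (8 diagonal mixtures, illustrative).
models = {
    lang: GaussianMixture(n_components=8, covariance_type="diag",
                          random_state=0).fit(frames)
    for lang, frames in train.items()
}

def identify(frames: np.ndarray) -> str:
    """Pick the language whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda lang: models[lang].score(frames))

test = rng.normal(0.5, 1.2, size=(200, DIM))  # drawn like "hiligaynon"
print(identify(test))  # expected: hiligaynon
```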

    Real time karaoke grader using modal distribution for android devices

    No full text
    The increasing market for mobile devices requires us to investigate the viability of writing computationally expensive digital signal processing applications for these devices. The Modal Distribution was used to obtain a time-frequency representation for evaluating singing ability in a karaoke application, with pitch and tempo used as the basis for grading. The trade-off between time-frequency resolution and the execution time of the algorithm, especially on devices with slower processors, was studied to determine the optimal parameters for the Modal Distribution. A resolution of 10 milliseconds for tempo and 31.25 Hz for frequency was thus obtained.
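
    The 10 ms / 31.25 Hz figures follow from standard time-frequency trade-off arithmetic; the sketch below shows one window/hop configuration that yields exactly those resolutions, assuming a 16 kHz sample rate (not stated in the abstract):

```python
# Assumed sample rate; the abstract does not state one.
SAMPLE_RATE = 16_000  # Hz
WINDOW_SIZE = 512     # samples per analysis window
HOP_SIZE = 160        # samples between successive windows

freq_resolution = SAMPLE_RATE / WINDOW_SIZE  # spacing between frequency bins
time_resolution = HOP_SIZE / SAMPLE_RATE     # spacing between analysis frames

print(f"{freq_resolution:.2f} Hz per bin")           # 31.25 Hz
print(f"{time_resolution * 1000:.0f} ms per frame")  # 10 ms
```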

    Development of Feature Set, Classification Implementation and Applications for Vowel Migration/Modification in Sung Filipino (Tagalog) Texts and Perceived Intelligibility

    No full text
    With the emergence of research on real-time visual feedback to supplement vocal pedagogy, technology is now being used in the world of music to accelerate skills learning and enhance cognitive development. The researchers of this project aim to further analyze vowel intelligibility and develop software applications intended to be used not only by professional singers but also by individuals who wish to improve their singing ability. Data in the form of sung vowels and song pieces were obtained from 46 singers. A listening test was then conducted on these samples to obtain the ground truth for vowel classification based on human perception. Human auditory perception of sung Filipino vowels was simulated using formant frequencies and Mel-frequency cepstral coefficients as feature-vector inputs to a two-stage Discriminant Analysis classifier. The setup resulted in an overall training-set accuracy of 89.4% and an overall test-set accuracy of 90.9%. The accuracy of the classifier, measured as the correspondence between its vowel classifications and the results of the listening test, reached 92.3%. Using information obtained from the classifier, offline and online/real-time software applications were developed. The main application features include display of the spectral envelope and spectrogram, pitch and vibrato analysis, and direct feedback on the classification of the sung vowel. These features were recommended by surveyed singers and were incorporated to help singers adjust formant locations, directly determine listeners' perception of sung vowels, perform modeling effectively, and carry out vowel migration.
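
    A minimal sketch of a two-stage discriminant-analysis classifier of the kind described: a first stage assigns a coarse vowel group and a second stage picks the vowel within that group. The front/back grouping, feature dimensions, and synthetic data are all assumptions, not the study's actual setup:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
DIM = 15  # assumed feature dimension (e.g. formants plus MFCCs)
GROUPS = {"front": ["i", "e"], "back": ["o", "u"]}

# Synthetic per-vowel training features (stand-ins for sung-vowel data).
data = {v: rng.normal(loc, 1.0, size=(100, DIM))
        for loc, v in zip([0.0, 1.5, 4.0, 5.5], ["i", "e", "o", "u"])}

# Stage 1: classify the coarse vowel group.
X1 = np.vstack([data[v] for g in GROUPS for v in GROUPS[g]])
y1 = [g for g in GROUPS for v in GROUPS[g] for _ in range(100)]
stage1 = LinearDiscriminantAnalysis().fit(X1, y1)

# Stage 2: one within-group classifier per group.
stage2 = {g: LinearDiscriminantAnalysis().fit(
              np.vstack([data[v] for v in vs]),
              [v for v in vs for _ in range(100)])
          for g, vs in GROUPS.items()}

def classify(x: np.ndarray) -> str:
    """Coarse group first, then the specific vowel within it."""
    group = stage1.predict(x.reshape(1, -1))[0]
    return stage2[group].predict(x.reshape(1, -1))[0]

print(classify(rng.normal(4.0, 1.0, size=DIM)))  # expected: "o"
```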