3 research outputs found

    PLASER: Pronunciation Learning via Automatic Speech Recognition

    PLASER is a multimedia tool with instant feedback, designed to teach English pronunciation to high-school students in Hong Kong whose mother tongue is Cantonese Chinese. The objective is to teach correct pronunciation, not to assess a student's overall pronunciation quality. Major challenges related to speech recognition technology include allowance for non-native accents, reliable and corrective feedback, and visualization of errors.

    Pruning of State-Tying Tree using Bayesian Information Criterion with Multiple Mixtures

    The use of context-dependent phonetic units together with Gaussian mixture models allows modern-day speech recognizers to build very complex and accurate acoustic models. However, because of data sparseness, some sharing of data across different triphone states is necessary. The acoustic model design is typically done in two stages, namely, designing the state-tying map and growing the number of mixtures in each tied state. In the design of the state-tying map, single Gaussians are used to represent the data, ignoring the fact that a single Gaussian is an insufficient model. In this paper, we propose a simple modification to the two-stage process by adding a third stage. In this added stage, the state-tying tree is pruned, and the pruning is based on the mixture representation of the tied states. We propose using the Bayesian Information Criterion (BIC) as the criterion for this pruning and show that by adding this step, the resulting model is more compact and gives better recognition accura..
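
A minimal sketch of how a BIC comparison can decide whether to prune a split in a state-tying tree. This is an illustration only, not the paper's procedure: for brevity each node is scored with a single diagonal Gaussian, whereas the abstract describes scoring with the mixture representation of the tied states. The function names, the `penalty` weight, and the variance floor are all assumptions introduced here.

```python
import numpy as np

def gaussian_bic(data, penalty=1.0, var_floor=1e-6):
    """BIC of a maximum-likelihood diagonal Gaussian fitted to data (frames x dims).

    Hypothetical helper: BIC = log-likelihood - 0.5 * penalty * #params * log(N).
    """
    n, d = data.shape
    var = np.maximum(data.var(axis=0), var_floor)  # ML variance, floored
    # Log-likelihood of the data under its own ML Gaussian (exact when no floor applies).
    log_lik = -0.5 * n * np.sum(np.log(2.0 * np.pi * var) + 1.0)
    num_params = 2 * d  # one mean and one variance per dimension
    return log_lik - 0.5 * penalty * num_params * np.log(n)

def should_prune(left, right, penalty=1.0):
    """Prune (merge the two children) when the merged node scores at least as well."""
    merged = np.vstack([left, right])
    split_score = gaussian_bic(left, penalty) + gaussian_bic(right, penalty)
    return bool(gaussian_bic(merged, penalty) >= split_score)

# Two children holding the same data: the split only adds parameters.
same = np.array([[0.0], [1.0], [2.0], [3.0]])
print(should_prune(same, same.copy()))

# Two well-separated children: the split is worth its extra parameters.
print(should_prune(same, same + 100.0))
```

Raising `penalty` favours more compact trees; replacing `gaussian_bic` with a mixture-based score would bring the sketch closer to the three-stage process the abstract describes.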

    Joint optimization of the frequency-domain and time-domain transformations in deriving generalized static and dynamic MFCCs

    Traditionally, static mel-frequency cepstral coefficients (MFCCs) are derived by discrete cosine transformation (DCT), and dynamic MFCCs are derived by linear regression. Their derivation may be generalized as a frequency-domain transformation of the log filter-bank energies (FBEs) followed by a time-domain transformation. In the past, these two transformations were usually estimated or optimized separately. In this letter, we consider sequences of log FBEs as a set of spectrogram images and investigate an image compression technique to jointly optimize the two transformations so that the reconstruction error of the spectrogram images is minimized; there is an efficient algorithm that solves the optimization problem. The framework allows extension to other optimization costs as well.
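
The traditional pipeline that the abstract takes as its starting point can be sketched as follows: a DCT across the filter-bank axis gives static coefficients, and a linear-regression slope across the time axis gives dynamic coefficients. This shows only the conventional, separately chosen transformations, not the joint optimization the letter proposes. The unnormalized DCT-II, the regression window of two frames, the edge padding, and all function names are assumptions made for illustration.

```python
import numpy as np

def dct_matrix(num_ceps, num_filters):
    """Unnormalized DCT-II basis; each row is applied across the filter-bank axis."""
    k = np.arange(num_ceps)[:, None]
    m = np.arange(num_filters)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / num_filters)

def static_mfcc(log_fbe, num_ceps):
    """Frequency-domain transformation: (frames, filters) -> (frames, num_ceps)."""
    return log_fbe @ dct_matrix(num_ceps, log_fbe.shape[1]).T

def delta(features, window=2):
    """Time-domain transformation: regression slope over +/- `window` frames.

    Edge frames are handled by repeating the first and last frame (an assumption).
    """
    num_frames = features.shape[0]
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    out = np.zeros(features.shape, dtype=float)
    for i in range(1, window + 1):
        ahead = padded[window + i: window + i + num_frames]    # c[t + i]
        behind = padded[window - i: window - i + num_frames]   # c[t - i]
        out += i * (ahead - behind)
    return out / (2.0 * sum(i * i for i in range(1, window + 1)))

# Toy input: 10 frames of 8 log filter-bank energies (synthetic, not real speech).
log_fbe = np.log(1.0 + np.arange(80, dtype=float).reshape(10, 8))
static = static_mfcc(log_fbe, num_ceps=4)
dynamic = delta(static)
print(static.shape, dynamic.shape)
```

Both steps are linear maps, one along frequency and one along time, which is what makes it possible to treat them as a pair of transformations to be estimated together.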