1,361 research outputs found

    The Unsupervised Acquisition of a Lexicon from Continuous Speech

    Get PDF
    We present an unsupervised learning algorithm that acquires a natural-language lexicon from raw speech. The algorithm is based on the optimal encoding of symbol sequences in an MDL framework, and uses a hierarchical representation of language that overcomes many of the problems that have stymied previous grammar-induction procedures. The forward mapping from symbol sequences to the speech stream is modeled using features based on articulatory gestures. We present results on the acquisition of lexicons and language models from raw speech, text, and phonetic transcripts, and demonstrate that our algorithm compares very favorably to other reported results with respect to segmentation performance and statistical efficiency. Comment: 27-page technical report.

    Chinese information processing

    Full text link
    A survey of the field of Chinese information processing is provided. It covers the following areas: the Chinese writing system, several popular Chinese encoding schemes and code conversions, Chinese keyboard entry methods, Chinese fonts, Chinese operating systems, and basic Chinese computing techniques and applications.

    Speech Recognition by Composition of Weighted Finite Automata

    Full text link
    We present a general framework based on weighted finite automata and weighted finite-state transducers for describing and implementing speech recognizers. The framework allows us to represent uniformly the information sources and data structures used in recognition, including context-dependent units, pronunciation dictionaries, language models and lattices. Furthermore, general but efficient algorithms can be used for combining information sources in actual recognizers and for optimizing their application. In particular, a single composition algorithm is used both to combine in advance information sources such as language models and dictionaries, and to combine acoustic observations and information sources dynamically during recognition. Comment: 24 pages, uses psfig.sty.
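The composition operation this abstract describes can be sketched in a few lines for epsilon-free transducers over the tropical semiring (weights add along a path; the best path has minimal total weight). The toy transducers, symbols, and weights below are invented for illustration; the paper itself composes far richer models such as lexicons and n-gram language models.

```python
# Minimal sketch of weighted finite-state transducer composition in the
# tropical semiring. Transducers are plain dicts; all symbols and weights
# here are illustrative, not taken from the paper.

def compose(t1, t2):
    """Compose two epsilon-free transducers.

    Each transducer is a dict with keys:
      'start': start state, 'finals': set of final states,
      'arcs': list of (src, in_sym, out_sym, weight, dst).
    """
    arcs = []
    for (p, a, b, w1, q) in t1['arcs']:
        for (r, b2, c, w2, s) in t2['arcs']:
            if b == b2:  # output of t1 must match input of t2
                arcs.append(((p, r), a, c, w1 + w2, (q, s)))
    return {
        'start': (t1['start'], t2['start']),
        'finals': {(f1, f2) for f1 in t1['finals'] for f2 in t2['finals']},
        'arcs': arcs,
    }

def best_path(t, seq):
    """Minimal-weight accepting path for input `seq` (brute-force search)."""
    best = [None]
    def dfs(state, i, w, outs):
        if i == len(seq):
            if state in t['finals'] and (best[0] is None or w < best[0][0]):
                best[0] = (w, outs)
            return
        for (src, a, c, aw, dst) in t['arcs']:
            if src == state and a == seq[i]:
                dfs(dst, i + 1, w + aw, outs + [c])
    dfs(t['start'], 0, 0.0, [])
    return best[0]

# A toy "pronunciation" transducer: one input symbol, two weighted outputs.
t1 = {'start': 0, 'finals': {1},
      'arcs': [(0, 'f', 'ph', 1.0, 1), (0, 'f', 'f', 0.5, 1)]}
# A toy "language model" acceptor over t1's output alphabet.
t2 = {'start': 0, 'finals': {1},
      'arcs': [(0, 'ph', 'ph', 0.2, 1), (0, 'f', 'f', 0.9, 1)]}

c = compose(t1, t2)
print(best_path(c, ['f']))  # -> (1.2, ['ph']): 1.0+0.2 beats 0.5+0.9
```

The same `compose` can be applied repeatedly, which is the point the abstract makes: one algorithm combines dictionaries with language models offline and with acoustic scores at decoding time.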

    Development of efficient techniques for an ASR system for speech detection and recognition using the Gaussian Mixture Model-Universal Background Model

    Get PDF
    Some practical uses of ASR have been implemented, including the transcription of meetings and the use of smart speakers. ASR is the process by which speech waves are transformed into text, allowing computers to interpret and act upon human speech. Scalable strategies for developing ASR systems in languages where no voice transcriptions or pronunciation dictionaries exist are the primary focus of this work. We first show that the need for voice transcription in the target language can be greatly reduced through cross-lingual acoustic model transfer when phonemic pronunciation lexicons exist in the new language. We then investigate three approaches to dealing with languages that lack a pronunciation lexicon, and examine the efficiency of graphemic acoustic model transfer, which makes it easy to build pronunciation dictionaries. These problems can be solved, in part, by investigating optimization strategies for training on huge corpora (such as GA+HMM and DE+HMM). In the training phase of acoustic modelling, the suggested method is applied alongside traditional methods. Read speech and HMI voice experiments indicated that while each data augmentation strategy alone did not always increase recognition performance, using all three techniques together did. Power-normalised cepstral coefficient (PNCC) features are adjusted slightly in this work to enhance verification accuracy. To increase speaker verification accuracy, we suggest employing multiple Gaussian Mixture Model-Universal Background Model (GMM-UBM) and SVM classifiers. Importantly, pitch-shift data augmentation and multi-task training reduced bias by more than 18% absolute compared to the baseline system for read speech, and applying all three data augmentation techniques during fine-tuning reduced bias by more than 7% for HMI speech, while increasing the recognition accuracy of both native and non-native Dutch speech.
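The GMM-UBM verification scheme named in this abstract is commonly scored as an average log-likelihood ratio between a speaker-adapted GMM and the universal background model. Below is a minimal NumPy sketch of that scoring step with invented toy parameters; the models, dimensions, and numbers are illustrative, not the paper's.

```python
# Hedged sketch of GMM-UBM scoring: frames are scored under a
# speaker-adapted diagonal-covariance GMM and a universal background
# model (UBM); the verification score is the mean log-likelihood ratio.
# All parameters below are toy values for illustration only.
import numpy as np

def gmm_loglik(x, weights, means, variances):
    """Per-frame log-likelihood of frames x (N, D) under a diagonal GMM."""
    x = np.atleast_2d(x)
    comp = []
    for w, mu, var in zip(weights, means, variances):
        ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var,
                           axis=1)
        comp.append(np.log(w) + ll)
    comp = np.stack(comp)                              # (K, N)
    m = comp.max(axis=0)
    return m + np.log(np.exp(comp - m).sum(axis=0))    # stable log-sum-exp

def llr_score(frames, speaker_gmm, ubm):
    """Average log-likelihood ratio: higher favours the claimed speaker."""
    return float(np.mean(gmm_loglik(frames, *speaker_gmm)
                         - gmm_loglik(frames, *ubm)))

# Toy 1-D models: UBM components at -1 and +1, speaker model shifted right.
ubm = ([0.5, 0.5], [np.array([-1.0]), np.array([1.0])],
       [np.array([1.0]), np.array([1.0])])
spk = ([0.5, 0.5], [np.array([1.5]), np.array([2.5])],
       [np.array([1.0]), np.array([1.0])])

frames = np.array([[2.0], [1.8], [2.2]])   # frames near the speaker model
print(llr_score(frames, spk, ubm) > 0)     # -> True (accept)
```

In a full system the speaker model would be MAP-adapted from the UBM and the LLR compared against a calibrated threshold; the sketch only shows the scoring arithmetic.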

    Encryption by using base-n systems with many characters

    Full text link
    It is possible to interpret text as numbers (and vice versa) if one interprets letters and other characters as digits and assumes that they have an inherent, immutable ordering. This is demonstrated by the conventional digit set of the hexadecimal number system, where the letters ABCDEF, in this exact alphabetic sequence, each stand for a digit and thus a numerical value. In this article, we elaborate on this idea and include all symbols, together with the standard ordering of the Unicode standard for digital character coding. We show how this can be used to form digit sets of different sizes and how subsequent simple conversion between bases can result in encryption that mimics the results of wrong encoding or accidental noise. Because of encoding peculiarities, however, switching to a higher base does not automatically result in efficient disk-space compression. Comment: 12 pages, 6 figures.
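The text-as-number idea can be sketched directly: treat characters (ordered by their code points) as digits, interpret a string as an integer in one base, then re-render that integer with a different digit set. The digit sets below are small illustrative alphabets, not the full Unicode repertoire the article works with.

```python
# Minimal sketch of base-n text/number conversion with arbitrary digit
# sets. The source and target alphabets below are invented examples.

def text_to_int(text, digits):
    """Interpret `text` as a base-len(digits) number."""
    base = len(digits)
    n = 0
    for ch in text:
        n = n * base + digits.index(ch)
    return n

def int_to_text(n, digits):
    """Render integer n using `digits` as the digit set."""
    base = len(digits)
    if n == 0:
        return digits[0]
    out = []
    while n > 0:
        n, r = divmod(n, base)
        out.append(digits[r])
    return ''.join(reversed(out))

# Hexadecimal sanity check: letters A-F act as digits 10-15.
hex_digits = '0123456789ABCDEF'
print(text_to_int('FF', hex_digits))   # -> 255

# "Encrypting" by switching digit sets: base-26 letters -> base-80 symbols.
src = 'abcdefghijklmnopqrstuvwxyz'                    # base 26
dst = ''.join(chr(c) for c in range(0x2600, 0x2650))  # base 80, misc symbols
n = text_to_int('hello', src)
cipher = int_to_text(n, dst)           # looks like mis-encoded noise
print(int_to_text(text_to_int(cipher, dst), src))  # -> hello
```

Note the leading-digit caveat the article's caveat hints at: a word beginning with the zero-valued character loses that character on a round trip, which is one of the encoding peculiarities that complicates using this as compression.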

    New Grapheme Generation Rules for Two-Stage Model-Based Grapheme-to-Phoneme Conversion

    Get PDF
    The precise conversion of arbitrary text into its corresponding phoneme sequence (grapheme-to-phoneme or G2P conversion) is implemented in speech synthesis and recognition, pronunciation learning software, spoken term detection and spoken document retrieval systems. Because the quality of this module plays an important role in the performance of such systems and many problems regarding G2P conversion have been reported, we propose a novel two-stage model-based approach, which is implemented using an existing weighted finite-state transducer-based G2P conversion framework, to improve the performance of the G2P conversion model. The first-stage model is built for automatic conversion of words to phonemes, while the second-stage model utilizes the input graphemes and output phonemes obtained from the first stage to determine the best final output phoneme sequence. Additionally, we designed new grapheme generation rules, which enable extra detail for the vowel and consonant graphemes appearing within a word. When compared with previous approaches, the evaluation results indicate that our approach using rules focusing on the vowel graphemes slightly improved the accuracy of the out-of-vocabulary dataset and consistently increased the accuracy of the in-vocabulary dataset.

    The Utilization of Digital Technologies in Learning Speaking Skills: Students’ Problems and Strategies at Islamic School

    Get PDF
    Effective communication through speaking is a crucial element in today's globalized world. As technology continues to advance, it is essential to incorporate digital tools to enhance and teach speaking skills. An analysis was conducted to investigate the challenges encountered by students while learning English speaking skills through the use of digital technologies at an Islamic high school. The study employed a case study methodology with questionnaires and in-depth interviews as data-gathering techniques. The outcomes of this research focused on the students' learning process for English speaking skills, the obstacles they encounter, and the strategies employed to optimize the use of digital technology in acquiring those skills. The demand for curriculum updates and teacher training may contribute to this trend. Urgent changes are needed in English learning to emphasize broader skill development instead of memorizing vocabulary and grammar. Educational institutions should consider these findings and improve English language instruction accordingly.

    Afrikaans learner's dictionaries for a multilingual South Africa

    Get PDF
    Dictionaries have to be compiled in accordance with the specific needs and demands of a well-defined target user. Within the multilingual and multicultural South African society, dictionaries should be aimed at the needs of the different groups of language learners. This article discusses aspects of Afrikaans learner's dictionaries. The emphasis is on the need and criteria for such dictionaries, the typical target user, and the nature of the macro- and microstructural information to be included. In a learner's dictionary the information should be presented in such a way that it can be retrieved without problems. Attention is given to various access structures employed to enhance the retrievability of information. It is argued that a restricted and simplified microstructure leads to a decrease in the density of information but to an increase in explicitness and retrievability. The article proposes a different approach to the inclusion of certain types of encyclopedic information in learner's dictionaries.
    Keywords: access structures, addressing, density of information, encyclopedic information, explicitness, illustrative examples, learner's dictionary, macrostructure, metalexicography, microstructure, pictorial illustrations, retrievability, simplicity, target user, translation equivalent