The Unsupervised Acquisition of a Lexicon from Continuous Speech
We present an unsupervised learning algorithm that acquires a
natural-language lexicon from raw speech. The algorithm is based on the optimal
encoding of symbol sequences in an MDL framework, and uses a hierarchical
representation of language that overcomes many of the problems that have
stymied previous grammar-induction procedures. The forward mapping from symbol
sequences to the speech stream is modeled using features based on articulatory
gestures. We present results on the acquisition of lexicons and language models
from raw speech, text, and phonetic transcripts, and demonstrate that our
algorithm compares very favorably to other reported results with respect to
segmentation performance and statistical efficiency.
Comment: 27-page technical report
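The MDL criterion behind this kind of algorithm trades the cost of storing a codebook (lexicon) against the cost of encoding the data with it. A toy sketch under simplifying assumptions (a unigram code and a flat 8 bits per codebook character; both are illustrative choices, not the paper's exact encoding):

```python
import math
from collections import Counter

def description_length(tokens):
    """Toy MDL score: bits to encode the token sequence under a unigram
    code, plus a crude codebook cost of 8 bits per character of each
    distinct token."""
    counts = Counter(tokens)
    total = sum(counts.values())
    data_bits = -sum(c * math.log2(c / total) for c in counts.values())
    codebook_bits = sum(8 * len(t) for t in counts)
    return data_bits + codebook_bits

# Adding "doggo" to the lexicon pays for itself once it recurs enough:
chars = list("doggodoggodoggo")   # encode character by character
words = ["doggo"] * 3             # encode with one reusable lexicon entry
assert description_length(words) < description_length(chars)
```

The same trade-off drives segmentation: a candidate word enters the lexicon only when the data-encoding savings exceed its codebook cost.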
Chinese information processing
A survey of the field of Chinese information processing is provided. It covers the following areas: the Chinese writing system, several popular Chinese encoding schemes and code conversions, Chinese keyboard entry methods, Chinese fonts, Chinese operating systems, and basic Chinese computing techniques and applications.
Speech Recognition by Composition of Weighted Finite Automata
We present a general framework based on weighted finite automata and weighted
finite-state transducers for describing and implementing speech recognizers.
The framework allows us to represent uniformly the information sources and data
structures used in recognition, including context-dependent units,
pronunciation dictionaries, language models and lattices. Furthermore, general
but efficient algorithms can be used for combining information sources in actual
recognizers and for optimizing their application. In particular, a single
composition algorithm is used both to combine in advance information sources
such as language models and dictionaries, and to combine acoustic observations
and information sources dynamically during recognition.
Comment: 24 pages, uses psfig.sty
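The composition operation at the heart of such a framework can be sketched as a product construction. The toy code below is illustrative only: it works in the tropical semiring (weights are costs, combined by addition) and omits the epsilon handling and lazy, optimized composition that real recognizers rely on. An arc is `(src, in_label, out_label, weight, dst)`:

```python
def compose(arcs_a, arcs_b, start_a, start_b, finals_a, finals_b):
    """Product construction: arcs (q1, i, x, w1, r1) in A and
    (q2, x, o, w2, r2) in B with matching middle label x yield
    ((q1, q2), i, o, w1 + w2, (r1, r2)) in the composition."""
    arcs = []
    for (q1, i, x, w1, r1) in arcs_a:
        for (q2, x2, o, w2, r2) in arcs_b:
            if x == x2:
                arcs.append(((q1, q2), i, o, w1 + w2, (r1, r2)))
    start = (start_a, start_b)
    finals = {(f1, f2) for f1 in finals_a for f2 in finals_b}
    return arcs, start, finals

# A maps "a" to "b" with cost 1.0; B maps "b" to "c" with cost 2.0;
# their composition maps "a" to "c" with cost 3.0.
arcs, start, finals = compose(
    [(0, "a", "b", 1.0, 1)], [(0, "b", "c", 2.0, 1)], 0, 0, {1}, {1})
assert arcs == [((0, 0), "a", "c", 3.0, (1, 1))]
```

The same single operation serves both offline combination (e.g. lexicon with language model) and on-the-fly combination with acoustic lattices during decoding.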
Development of efficient techniques for ASR System for Speech Detection and Recognition system using Gaussian Mixture Model-Universal Background Model
Some practical uses of ASR have been implemented, including the transcription of meetings and the use of smart speakers. ASR is the process by which speech waves are transformed into text, allowing computers to interpret and act upon human speech. Scalable strategies for developing ASR systems in languages with no voice transcriptions or pronunciation dictionaries are the primary focus of this work. We first show that the need for voice transcription in the target language can be greatly reduced through cross-lingual acoustic model transfer when phonemic pronunciation lexicons exist in the new language. We then investigate three approaches to dealing with languages that lack a pronunciation lexicon, including the efficiency of graphemic acoustic model transfer, which makes it easy to build pronunciation dictionaries. These problems can be solved, in part, by investigating optimization strategies for training on huge corpora (such as GA+HMM and DE+HMM). In the training phase of acoustic modelling, the suggested method is applied alongside traditional methods. Read-speech and HMI voice experiments indicated that while each data augmentation strategy alone did not always increase recognition performance, using all three techniques together did. Power-normalised cepstral coefficient (PNCC) features are adjusted slightly in this work to enhance verification accuracy. To increase speaker verification accuracy, we suggest employing multiple Gaussian Mixture Model-Universal Background Model (GMM-UBM) and SVM classifiers. Importantly, pitch-shift data augmentation and multi-task training reduced bias by more than 18% absolute compared to the baseline system for read speech, and applying all three data augmentation techniques during fine-tuning reduced bias by more than 7% for HMI speech, while increasing recognition accuracy of both native and non-native Dutch speech.
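GMM-UBM verification of the kind mentioned above typically scores a trial by the average per-frame log-likelihood ratio between a speaker model and the universal background model. A toy one-dimensional sketch with hand-set mixture parameters (illustrative values only; real systems train the UBM on pooled background data and MAP-adapt it to each speaker):

```python
import math

def gauss_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def gmm_loglik(x, weights, means, variances):
    # log sum_k w_k * N(x; mu_k, var_k), computed via log-sum-exp
    logs = [math.log(w) + gauss_logpdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

ubm = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])   # background model (hand-set)
spk = ([0.5, 0.5], [2.0, 4.0], [1.0, 1.0])    # target speaker model (hand-set)

def llr(frames):
    """Average per-frame log-likelihood ratio: the verification score."""
    return sum(gmm_loglik(x, *spk) - gmm_loglik(x, *ubm)
               for x in frames) / len(frames)

# Frames near the speaker model score positive; background-like frames negative.
assert llr([3.0, 2.5]) > 0 > llr([0.0, -0.5])
```

A decision threshold on this score (or an SVM over GMM-derived features, as the abstract suggests) then accepts or rejects the claimed speaker.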
Encryption by using base-n systems with many characters
It is possible to interpret text as numbers (and vice versa) if one interprets
letters and other characters as digits and assumes that they have an inherent
immutable ordering. This is demonstrated by the conventional digit set of the
hexadecimal system of number coding, where the letters ABCDEF in this exact
alphabetic sequence each stand for a digit and thus a numerical value. In this
article, we elaborate on this idea and include all symbols and the standard
ordering of the Unicode standard for digital character coding. We show
how this can be used to form digit sets of different sizes and how subsequent
simple conversion between bases can result in encryption mimicking results of
wrong encoding and accidental noise. Because of encoding peculiarities,
however, switching to a higher base does not automatically yield efficient
disk-space compression.
Comment: 12 pages, 6 figures
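The core idea can be sketched in a few lines: treat a string as an integer in base `len(alphabet)`, then write that integer back out under a different digit set to re-encode the text. The two alphabets below are small illustrative digit sets, not the Unicode orderings used in the article:

```python
def text_to_int(text, alphabet):
    """Read a string as an integer in base len(alphabet)."""
    base = len(alphabet)
    n = 0
    for ch in text:
        n = n * base + alphabet.index(ch)
    return n

def int_to_text(n, alphabet):
    """Write an integer out as digits drawn from the given alphabet."""
    base = len(alphabet)
    digits = []
    while True:
        n, d = divmod(n, base)
        digits.append(alphabet[d])
        if n == 0:
            break
    return "".join(reversed(digits))

plain_alpha = "abcdefghijklmnopqrstuvwxyz"      # base-26 digit set
cipher_alpha = "0123456789!@#$%^&*()"           # base-20 digit set

n = text_to_int("cab", plain_alpha)             # c=2, a=0, b=1 -> 2*26**2 + 1
assert n == 1353
cipher = int_to_text(n, cipher_alpha)           # same number, base-20 digits
assert text_to_int(cipher, cipher_alpha) == n   # lossless round trip
# Caveat: as with leading zeros, a leading alphabet[0] character is lost.
```

The re-encoded string looks like mojibake or noise, which is the "encryption mimicking wrong encoding" effect the abstract describes.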
New Grapheme Generation Rules for Two-Stage Modelbased Grapheme-to-Phoneme Conversion
The precise conversion of arbitrary text into its corresponding phoneme sequence (grapheme-to-phoneme or G2P conversion) is implemented in speech synthesis and recognition, pronunciation learning software, spoken term detection and spoken document retrieval systems. Because the quality of this module plays an important role in the performance of such systems and many problems regarding G2P conversion have been reported, we propose a novel two-stage model-based approach, which is implemented using an existing weighted finite-state transducer-based G2P conversion framework, to improve the performance of the G2P conversion model. The first-stage model is built for automatic conversion of words to phonemes, while the second-stage model utilizes the input graphemes and output phonemes obtained from the first stage to determine the best final output phoneme sequence. Additionally, we designed new grapheme generation rules, which enable extra detail for the vowel and consonant graphemes appearing within a word. When compared with previous approaches, the evaluation results indicate that our approach using rules focusing on the vowel graphemes slightly improved the accuracy of the out-of-vocabulary dataset and consistently increased the accuracy of the in-vocabulary dataset.
The Utilization of Digital Technologies in Learning Speaking Skills: Students’ Problems and Strategies at Islamic School
Effective communication through speaking is a crucial element in today's globalized world. As technology continues to advance, it is essential to incorporate digital tools to enhance and teach speaking skills. An analysis was conducted to investigate the challenges encountered by students while learning English speaking skills through the utilization of digital technologies at an Islamic High School. The study employed a case study methodology with questionnaires and in-depth interviews as data-gathering techniques. The outcomes of this research focused on the students' learning process of English speaking skills, the obstacles they encounter, and the strategies employed to optimize the use of digital technology in acquiring those skills. These findings also point to a demand for curriculum updates and teacher training. Urgent changes are needed in English learning to emphasize broader skill development instead of memorizing vocabulary and grammar. Educational institutions should consider these findings and improve English language instruction accordingly.
Afrikaans learner's dictionaries for a multilingual South Africa
Dictionaries have to be compiled in accordance with the specific needs and demands of a well-defined target user. Within the multilingual and multicultural South African society dictionaries should be aimed at the needs of the different groups of language learners. This article discusses aspects of Afrikaans learner's dictionaries. The emphasis is on the need and criteria for such dictionaries, the typical target user and on the nature of the macro- and microstructural information to be included. In a learner's dictionary the information should be presented in such a way that it can be retrieved without problems. Attention is given to various access structures employed to enhance the retrievability of information. It is argued that a restricted and simplified microstructure leads to a decrease in the density of information but to an increase in explicitness and retrievability. The article proposes a different approach to the inclusion of certain types of encyclopedic information in learner's dictionaries.
Keywords: access structures, addressing, density of information, encyclopedic information, explicitness, illustrative examples, learner's dictionary, macrostructure, metalexicography, microstructure, pictorial illustrations, retrievability, simplicity, target user, translation equivalent.