A Comparison of Different Machine Transliteration Models
Machine transliteration is a method for automatically converting words in one
language into phonetically equivalent ones in another language. Machine
transliteration plays an important role in natural language applications such
as information retrieval and machine translation, especially for handling
proper nouns and technical terms. Four machine transliteration models --
grapheme-based transliteration model, phoneme-based transliteration model,
hybrid transliteration model, and correspondence-based transliteration model --
have been proposed by several researchers. To date, however, there has been
little research on a framework in which multiple transliteration models can
operate simultaneously. Furthermore, there has been no comparison of the four
models within the same framework and using the same data. We addressed these
problems by 1) modeling the four models within the same framework, 2) comparing
them under the same conditions, and 3) developing a way to improve machine
transliteration through this comparison. Our comparison showed that the hybrid
and correspondence-based models were the most effective and that the four
models can be used in a complementary manner to improve machine transliteration
performance.
Rule-based Korean Grapheme to Phoneme Conversion Using Sound Patterns
PACLIC 23 / City University of Hong Kong / 3-5 December 200
The Mason-Alberta Phonetic Segmenter: A forced alignment system based on deep neural networks and interpolation
Forced alignment systems automatically determine boundaries between segments
in speech data, given an orthographic transcription. These tools are
commonplace in phonetics to facilitate the use of speech data that would be
infeasible to manually transcribe and segment. In the present paper, we
describe a new neural network-based forced alignment system, the Mason-Alberta
Phonetic Segmenter (MAPS). The MAPS aligner serves as a testbed for two
possible improvements we pursue for forced alignment systems. The first is
treating the acoustic model in a forced aligner as a tagging task, rather than
a classification task, motivated by the common understanding that segments in
speech are not truly discrete and commonly overlap. The second is an
interpolation technique to allow boundaries more precise than the common 10 ms
limit in modern forced alignment systems. We compare configurations of our
system to a state-of-the-art system, the Montreal Forced Aligner. The tagging
approach did not generally yield improved results over the Montreal Forced
Aligner. However, a system with the interpolation technique had a 27.92%
increase relative to the Montreal Forced Aligner in the number of boundaries
within 10 ms of the target on the test set. We also reflect on the task and
training process for acoustic modeling in forced alignment, highlighting how
the output targets for these models do not match phoneticians' conception of
similarity between phones and that reconciliation of this tension may require
rethinking the task and output targets or how speech itself should be
segmented.
Comment: submitted for publication
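The evaluation figure quoted in the abstract above (share of boundaries within 10 ms of the target, and a relative increase over a baseline) can be illustrated with a minimal sketch. The function names, the millisecond units, and the sample boundary values are hypothetical and chosen only for illustration. Reading the reported increase as a relative (not absolute) change is an assumption. This is not the MAPS or Montreal Forced Aligner code.

```python
from typing import Sequence


def within_tolerance_rate(predicted_ms: Sequence[float],
                          target_ms: Sequence[float],
                          tol_ms: float = 10.0) -> float:
    """Fraction of predicted boundaries within tol_ms of their target.

    Assumes the two sequences are already paired one-to-one
    (hypothetical setup; real aligner output needs matching first).
    """
    if len(predicted_ms) != len(target_ms):
        raise ValueError("predicted and target must have the same length")
    if not predicted_ms:
        raise ValueError("at least one boundary is required")
    hits = sum(1 for p, t in zip(predicted_ms, target_ms)
               if abs(p - t) <= tol_ms)
    return hits / len(predicted_ms)


def relative_increase_pct(new_rate: float, baseline_rate: float) -> float:
    """Relative increase of new_rate over baseline_rate, in percent."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (new_rate - baseline_rate) / baseline_rate * 100.0


# Made-up boundary times in milliseconds, for illustration only.
predicted = [100.0, 215.0, 330.0, 480.0]
target = [105.0, 200.0, 338.0, 481.0]
rate = within_tolerance_rate(predicted, target)
print(f"within 10 ms: {rate:.2f}")
```

With these made-up values, three of the four boundaries fall within the tolerance. A relative increase is then computed against a baseline rate with `relative_increase_pct(new_rate, baseline_rate)`.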