2,123 research outputs found
Joint morphological-lexical language modeling for processing morphologically rich languages with application to dialectal Arabic
Language modeling for an inflected language
such as Arabic poses new challenges for speech recognition and
machine translation due to its rich morphology. Rich morphology
results in large increases in out-of-vocabulary (OOV) rate and
poor language model parameter estimation in the absence of large
quantities of data. In this study, we present a joint
morphological-lexical language model (JMLLM) that takes
advantage of Arabic morphology. JMLLM combines
morphological segments with the underlying lexical items and
additional available information sources with regards to
morphological segments and lexical items in a single joint model.
Joint representation and modeling of morphological and lexical
items reduces the OOV rate and provides smooth probability
estimates while keeping the predictive power of whole words.
Speech recognition and machine translation experiments in
dialectal-Arabic show improvements over word and morpheme
based trigram language models. We also show that as the
tightness of integration between different information sources
increases, both speech recognition and machine translation
performances improve
Towards Understanding Egyptian Arabic Dialogues
Labelling of user's utterances to understanding his attends which called
Dialogue Act (DA) classification, it is considered the key player for dialogue
language understanding layer in automatic dialogue systems. In this paper, we
proposed a novel approach to user's utterances labeling for Egyptian
spontaneous dialogues and Instant Messages using Machine Learning (ML) approach
without relying on any special lexicons, cues, or rules. Due to the lack of
Egyptian dialect dialogue corpus, the system evaluated by multi-genre corpus
includes 4725 utterances for three domains, which are collected and annotated
manually from Egyptian call-centers. The system achieves F1 scores of 70. 36%
overall domains.Comment: arXiv admin note: substantial text overlap with arXiv:1505.0308
- …