Joint morphological-lexical language modeling for processing morphologically rich languages with application to dialectal Arabic
Language modeling for an inflected language
such as Arabic poses new challenges for speech recognition and
machine translation due to its rich morphology. Rich morphology
results in large increases in out-of-vocabulary (OOV) rate and
poor language model parameter estimation in the absence of large
quantities of data. In this study, we present a joint
morphological-lexical language model (JMLLM) that takes
advantage of Arabic morphology. JMLLM combines
morphological segments with the underlying lexical items and
additional available information sources with regard to
morphological segments and lexical items in a single joint model.
Joint representation and modeling of morphological and lexical
items reduces the OOV rate and provides smooth probability
estimates while keeping the predictive power of whole words.
Speech recognition and machine translation experiments in
dialectal Arabic show improvements over word- and morpheme-based
trigram language models. We also show that as the
tightness of integration between different information sources
increases, both speech recognition and machine translation
performance improve.
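To illustrate why sub-word units lower the OOV rate, here is a minimal sketch. It is not the JMLLM from the abstract: the suffix list, the `segment` helper, and the toy English word lists are all hypothetical stand-ins chosen only to make the effect visible.

```python
# Toy comparison of out-of-vocabulary (OOV) rates for word-level and
# morpheme-level vocabularies. All data and the segmenter are hypothetical.

SUFFIXES = ("ing", "ed", "s")  # assumed toy suffix inventory


def segment(word):
    """Split off at most one known suffix; return a list of segments."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return [word[: -len(suffix)], "+" + suffix]
    return [word]


def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur in the training tokens."""
    vocab = set(train_tokens)
    if not test_tokens:
        return 0.0
    unseen = sum(1 for token in test_tokens if token not in vocab)
    return unseen / len(test_tokens)


train_words = ["talk", "talked", "walks", "jumping"]
test_words = ["walked", "jumps", "talking", "run"]

# Word-level: every test word is a form that was not seen in training.
word_oov = oov_rate(train_words, test_words)

# Morpheme-level: stems and suffixes recombine, so only "run" is unseen.
train_morphs = [m for w in train_words for m in segment(w)]
test_morphs = [m for w in test_words for m in segment(w)]
morph_oov = oov_rate(train_morphs, test_morphs)

print(f"word-level OOV rate:     {word_oov:.2f}")
print(f"morpheme-level OOV rate: {morph_oov:.2f}")
```

In this toy setting the word-level vocabulary misses every test word, while the morpheme-level vocabulary misses only one of seven segments. The trade-off named in the abstract still applies: segments alone carry less predictive context than whole words.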
A Trie-Structured Bayesian Model for Unsupervised Morphological Segmentation
In this paper, we introduce a trie-structured Bayesian model for unsupervised
morphological segmentation. We adopt prior information from different sources
in the model. We use neural word embeddings to discover words that are
morphologically derived from each other and are therefore semantically
similar. We use letter successor variety counts obtained from tries that are
built using neural word embeddings. Our results show that using different
information sources such as neural word embeddings and letter successor variety
as prior information improves morphological segmentation in a Bayesian model.
Our model outperforms other unsupervised morphological segmentation models on
Turkish and gives promising results on English and German for scarce resources.
Comment: 12 pages, accepted and presented at the CICLING 2017 - 18th
International Conference on Intelligent Text Processing and Computational
Linguistic
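As a rough illustration of letter successor variety (LSV), the sketch below counts, for each prefix of a word, how many distinct letters follow that prefix in a trie. It is not the Bayesian model from the abstract. The word list is a hand-picked, hypothetical stand-in for the related words that the paper retrieves with neural word embeddings, and the `$` end-of-word marker is an assumption of this sketch.

```python
# Letter successor variety from a trie of nested dictionaries.
# The word list and the "$" end marker are illustrative assumptions.

END = "$"


def build_trie(words):
    """Insert each word letter by letter into a nested-dict trie."""
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node[END] = {}  # mark that a word ends here
    return root


def successor_variety(trie, word):
    """For each prefix of `word`, count the distinct letters that follow it.

    The end-of-word marker is not counted as a successor.
    """
    counts = []
    node = trie
    for letter in word:
        node = node.get(letter)
        if node is None:  # prefix not present in the trie
            break
        counts.append(sum(1 for key in node if key != END))
    return counts


related_words = ["talk", "talks", "talked", "talking", "tall"]
trie = build_trie(related_words)
lsv = successor_variety(trie, "talked")
print(list(zip("talked", lsv)))
```

For "talked", the count peaks after the prefix "talk", which is followed by three different letters. A peak like this is the usual cue for a morpheme boundary, here between "talk" and "ed".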
A Simple Joint Model for Improved Contextual Neural Lemmatization
English verbs have multiple forms. For instance, talk may also appear as
talks, talked or talking, depending on the context. The NLP task of
lemmatization seeks to map these diverse forms back to a canonical one, known
as the lemma. We present a simple joint neural model for lemmatization and
morphological tagging that achieves state-of-the-art results on 20 languages
from the Universal Dependencies corpora. Our paper describes the model in
addition to training and decoding procedures. Error analysis indicates that
joint morphological tagging and lemmatization is especially helpful in
low-resource lemmatization and languages that display a larger degree of
morphological complexity. Code and pre-trained models are available at
https://sigmorphon.github.io/sharedtasks/2019/task2/.
Comment: NAACL 201
Chart-driven Connectionist Categorial Parsing of Spoken Korean
While most speech and natural language systems developed
for English and other Indo-European languages neglect morphological
processing and integrate speech and natural language at the word level,
morphological processing plays a major role in processing
agglutinative languages such as Korean and Japanese, since these languages
have very complex morphological phenomena and relatively simple syntactic
functionality. Obviously, degenerate morphological processing limits the usable
vocabulary size for the system, and a word-level dictionary results in an exponential
explosion in the number of dictionary entries. For agglutinative languages,
we need sub-word level integration that leaves room for general morphological
processing. In this paper, we develop a phoneme-level integration model of
speech and linguistic processing through general morphological analysis for
agglutinative languages and an efficient parsing scheme for that integration.
Korean is modeled lexically based on the categorial grammar formalism with
unordered argument and suppressed category extensions, and a chart-driven
connectionist parsing method is introduced.
Comment: 6 pages, Postscript file, Proceedings of ICCPOL'9
Generic model of morphological changes in growing colonies of fungi
Fungal colonies are able to exhibit different morphologies depending on the
environmental conditions. This allows them to cope with and adapt to external
changes. When grown in solid or semi-solid media the bulk of the colony is
compact and several morphological transitions have been reported to occur as
the external conditions are varied. Here we show how a unified simple
mathematical model, which includes the effect of the accumulation of toxic
metabolites, can account for the morphological changes observed. Our numerical
results are in excellent agreement with experiments carried out with the fungus
Aspergillus oryzae on solid agar.
Comment: 8 pages, 5 figures, uses epsf. Accepted in Phys Rev
