Compositional Morphology for Word Representations and Language Modelling
This paper presents a scalable method for integrating compositional
morphological representations into a vector-based probabilistic language model.
Our approach is evaluated in the context of log-bilinear language models,
rendered suitably efficient for implementation inside a machine translation
decoder by factoring the vocabulary. We perform both intrinsic and extrinsic
evaluations, presenting results on a range of languages which demonstrate that
our model learns morphological representations that both perform well on word
similarity tasks and lead to substantial reductions in perplexity. When used
for translation into morphologically rich languages with large vocabularies,
our models obtain improvements of up to 1.2 BLEU points relative to a baseline
system using back-off n-gram models.
Comment: Proceedings of the 31st International Conference on Machine Learning
(ICML
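As a rough illustration of what a compositional morphological representation can look like, the sketch below builds a word vector by summing the vectors of its morphological factors. The abstract does not say which composition function is used, so additive composition is an assumption here, and the name `compose`, the segmentations, and the toy vectors are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def compose(factors: List[str], table: Dict[str, Vector]) -> Vector:
    """Sum the vectors of a word's factors.

    Additive composition is an assumption made for illustration;
    the abstract does not specify the composition function.
    """
    dim = len(next(iter(table.values())))
    total = [0.0] * dim
    for factor in factors:
        vec = table[factor]
        for i in range(dim):
            total[i] += vec[i]
    return total


# Toy 2-dimensional factor vectors; the values are made up.
factor_vectors: Dict[str, Vector] = {
    "im": [0.5, -1.0],
    "perfect": [1.0, 2.0],
    "ion": [-0.5, 0.25],
}

# Words that share morphemes share parameters, so related forms
# end up with related vectors even when one of them is rare.
imperfection = compose(["im", "perfect", "ion"], factor_vectors)
perfection = compose(["perfect", "ion"], factor_vectors)
```

In this toy setting the two word vectors differ only by the contribution of the prefix `im`, which is the kind of parameter sharing that can help with large vocabularies in morphologically rich languages.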
Investigating collocational priming in Turkish
This is the final version of the article. Available from the publisher via the link in this record.
Several attempts have been made to illustrate the organization of the monolingual mental lexicon, and each model proposed so far has highlighted different aspects of lexical processing. What they have in common is that their depictions rely on single lexical items, with paradigmatic relations coming to the fore in their explanations. Hoey's lexical priming theory (2005) tries to shed light on collocational processing in the internal lexicon from a cognitive and psycholinguistic perspective, and on its importance for overall creative language production. A number of psycholinguistic studies have tested Hoey's theory as it relates to English, but work in other languages is limited. The present study broadens the scope of work in this area by investigating whether collocational priming also holds for speakers of Turkish. Furthermore, the possible influence of frequency and part of speech on collocational priming is examined by exploring the correlations between response times in the priming experiment and these independent variables. The findings revealed a significant collocational priming effect for Turkish L1 users, in line with Hoey's claims. The regression analysis indicated that frequency and part of speech are important predictors of processing duration. The correlation analysis also showed significant correlations between response times and both word frequency and collocational frequency. A tentative mental lexicon framework is proposed based on the findings of this research.
Minimally supervised induction of morphology through bitexts
A knowledge of morphology can be useful for many natural language processing systems. Thus, much effort has been expended in developing accurate computational tools for morphology that lemmatize, segment and generate new forms. The most powerful and accurate of these have been manually encoded, an endeavor that is without exception expensive and time-consuming. Consequently, there have been many attempts to reduce the cost of developing morphological systems through unsupervised or minimally supervised algorithms and learning methods for the acquisition of morphology. These efforts have yet to produce a tool that approaches the performance of manually encoded systems.
Here, I present a strategy for morphological clustering and segmentation that is minimally supervised but more linguistically informed than previous unsupervised approaches. That is, this study attempts to induce, from an unannotated text, clusters of words that are inflectional variants of each other. A set of inflectional suffixes by part of speech is then induced from these clusters. This level of detail is made possible by a method known as alignment and transfer (AT), among other names, which uses aligned bitexts to transfer linguistic resources developed for one language (the source) to another (the target). This approach has the further advantage of allowing a reduction in the amount of training data without a significant degradation in performance, making it useful in applications targeted at data collected from endangered languages. In the current study, however, I use English as the source and German as the target, for ease of evaluation and for certain typological properties of German. The two main tasks, clustering and segmentation, are approached sequentially, with the clustering informing the segmentation to allow for greater accuracy in morphological analysis.
While the performance of these methods does not exceed that of the current roster of unsupervised or minimally supervised approaches to morphology acquisition, this study attempts to integrate more learning methods than previous studies. Furthermore, it attempts to learn inflectional morphology as opposed to derivational morphology, which is a crucial distinction in linguistics.
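The two-step idea in the abstract above (cluster target-language words through the bitext, then read suffixes off the clusters) can be sketched as follows. This is a simplified illustration, not the thesis's actual method: it assumes word alignments are already given as pairs, that an English lemma lookup is available, and that suffixes are whatever remains after the longest common prefix. It also ignores part of speech. The names `cluster_by_source_lemma` and `induce_suffixes` and the toy data are hypothetical.

```python
from collections import defaultdict
from os.path import commonprefix
from typing import Dict, Iterable, Set, Tuple


def cluster_by_source_lemma(
    aligned_pairs: Iterable[Tuple[str, str]],
    source_lemmas: Dict[str, str],
) -> Dict[str, Set[str]]:
    """Group target words whose aligned source words share a lemma."""
    clusters: Dict[str, Set[str]] = defaultdict(set)
    for source_word, target_word in aligned_pairs:
        lemma = source_lemmas.get(source_word)
        if lemma is not None:  # skip source words with no known lemma
            clusters[lemma].add(target_word)
    return dict(clusters)


def induce_suffixes(cluster: Set[str]) -> Set[str]:
    """Treat the longest common prefix as the stem; the rest are suffixes."""
    stem = commonprefix(sorted(cluster))
    return {word[len(stem):] for word in cluster}


# Toy English-German word alignments, made up for illustration.
pairs = [
    ("says", "sagt"),
    ("said", "sagte"),
    ("say", "sagen"),
    ("child", "kind"),
    ("children", "kinder"),
]
english_lemmas = {
    "says": "say", "said": "say", "say": "say",
    "child": "child", "children": "child",
}

clusters = cluster_by_source_lemma(pairs, english_lemmas)
verb_suffixes = induce_suffixes(clusters["say"])
noun_suffixes = induce_suffixes(clusters["child"])
```

A common-prefix stem is a crude choice: it breaks on stem changes such as German umlaut, which is one reason a real system would need a more careful segmentation step.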