
    One-Shot Neural Cross-Lingual Transfer for Paradigm Completion

    We present a novel cross-lingual transfer method for paradigm completion, the task of mapping a lemma to its inflected forms, using a neural encoder-decoder model, the state of the art for the monolingual task. We use labeled data from a high-resource language to increase performance on a low-resource language. In experiments on 21 language pairs from four different language families, we obtain up to 58% higher accuracy than without transfer and show that even zero-shot and one-shot learning are possible. We further find that the degree of language relatedness strongly influences the ability to transfer morphological knowledge. Comment: Accepted at ACL 201
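
    To make the task concrete, here is a minimal sketch of the paradigm-completion setup in Python, with a toy suffix-replacement baseline standing in for the paper's neural encoder-decoder; the training triples and tag strings are invented for illustration.

        # Toy paradigm completion: map (lemma, morphological tag) -> inflected form.
        # A naive suffix-rule baseline stands in for the neural encoder-decoder.

        def suffix_rule(lemma, form):
            """Return (old_suffix, new_suffix) after the longest shared prefix."""
            i = 0
            while i < min(len(lemma), len(form)) and lemma[i] == form[i]:
                i += 1
            return lemma[i:], form[i:]

        # invented training triples: (lemma, tag, inflected form)
        train = [
            ("hablar", "V;IND;PRS;1;SG", "hablo"),
            ("cantar", "V;IND;PRS;1;SG", "canto"),
        ]

        rules = {tag: suffix_rule(lemma, form) for lemma, tag, form in train}

        def inflect(lemma, tag):
            old, new = rules[tag]
            return (lemma[:-len(old)] if old else lemma) + new

        print(inflect("bailar", "V;IND;PRS;1;SG"))  # -> "bailo"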

    Reconstructing Native Language Typology from Foreign Language Usage

    Linguists and psychologists have long been studying cross-linguistic transfer, the influence of native language properties on linguistic performance in a foreign language. In this work we provide empirical evidence for this process in the form of a strong correlation between language similarities derived from structural features in English as a Second Language (ESL) texts and equivalent similarities obtained from the typological features of the native languages. We leverage this finding to recover native language typological similarity structure directly from ESL text, and to predict typological features in an unsupervised fashion with respect to the target languages. Our method achieves 72.2% accuracy on the typology prediction task, a result that is highly competitive with equivalent methods that rely on typological resources. Comment: CoNLL 201
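
    A minimal sketch of the kind of analysis described (not the authors' code): build one pairwise language-similarity structure from ESL-derived feature vectors and another from typological feature vectors, then correlate the two. All data below are random stand-ins.

        import numpy as np
        from scipy.spatial.distance import pdist
        from scipy.stats import spearmanr

        rng = np.random.default_rng(0)
        n_langs = 10
        esl_features = rng.random((n_langs, 50))    # hypothetical structural features from ESL texts
        typo_features = rng.random((n_langs, 30))   # hypothetical typological (WALS-style) features

        # condensed vectors of pairwise cosine similarities between languages
        esl_sim = 1.0 - pdist(esl_features, metric="cosine")
        typo_sim = 1.0 - pdist(typo_features, metric="cosine")

        rho, p = spearmanr(esl_sim, typo_sim)
        print(f"correlation between the two similarity structures: rho={rho:.3f}, p={p:.3f}")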

    Towards cognitively plausible data science in language research

    Over the past 10 years, Cognitive Linguistics has taken a Quantitative Turn. Yet, concerns have been raised that this preoccupation with quantification and modelling may not bring us any closer to understanding how language works. We show that this objection is unfounded, especially if we rely on modelling techniques based on biologically and psychologically plausible learning algorithms. These make it possible to take a quantitative approach, while generating and testing specific hypotheses that will advance our understanding of how knowledge of language emerges from exposure to usage.

    The Missing Link between Morphemic Assemblies and Behavioral Responses: a Bayesian Information-Theoretical model of lexical processing

    We present the Bayesian Information-Theoretical (BIT) model of lexical processing: a mathematical model illustrating a novel approach to the modelling of language processes. The model shows how a neurophysiological theory of lexical processing relying on Hebbian association and neural assemblies can directly account for a variety of effects previously observed in behavioural experiments. We develop two information-theoretical measures of the distribution of usages of a morpheme or word, and use them to predict responses in three visual lexical decision datasets investigating inflectional morphology and polysemy. Our model offers a neurophysiological basis for the effects of morpho-semantic neighbourhoods. These results demonstrate how distributed patterns of activation naturally give rise to symbolic structures. We conclude by arguing that the modelling framework exemplified here is a powerful tool for integrating behavioural and neurophysiological results.
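
    One concrete, deliberately simplified example of an information-theoretical measure over usage distributions is the Shannon entropy of a word's usages; the sketch below is an illustration under that assumption, not the BIT measures themselves, and the counts are invented.

        import numpy as np

        def usage_entropy(counts):
            """Shannon entropy (bits) of a word's distribution over usages."""
            p = np.asarray(counts, dtype=float)
            p = p / p.sum()
            p = p[p > 0]
            return float(-(p * np.log2(p)).sum())

        # toy counts: a word spread over four usages vs. one dominated by a single usage
        print(usage_entropy([40, 30, 20, 10]))  # higher entropy: usage is dispersed
        print(usage_entropy([95, 5]))           # lower entropy: usage is concentrated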

    Producing power-law distributions and damping word frequencies with two-stage language models

    Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that can generically produce power laws, breaking generative models into two stages. The first stage, the generator, can be any standard probabilistic model, while the second stage, the adaptor, transforms the word frequencies of this model to provide a closer match to natural language. We show that two commonly used Bayesian models, the Dirichlet-multinomial model and the Dirichlet process, can be viewed as special cases of our framework. We discuss two stochastic processes, the Chinese restaurant process and its two-parameter generalization based on the Pitman-Yor process, that can be used as adaptors in our framework to produce power-law distributions over word frequencies. We show that these adaptors justify common estimation procedures based on logarithmic or inverse-power transformations of empirical frequencies. In addition, taking the Pitman-Yor Chinese restaurant process as an adaptor justifies the appearance of type frequencies in formal analyses of natural language and improves the performance of a model for unsupervised learning of morphology.
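
    The adaptor idea can be illustrated with a small simulation: a two-parameter (Pitman-Yor) Chinese restaurant process reuses earlier draws from a base generator, which skews type frequencies toward a power law. The uniform "word id" base below is only a stand-in for a real generator model.

        import random
        from collections import Counter

        def pitman_yor_crp(n, discount=0.5, concentration=1.0,
                           base=lambda: random.randrange(10**6)):
            tables, labels, out = [], [], []   # per-table counts, word served at each table, output
            for i in range(n):
                # probability of opening a new table, i.e. drawing a fresh word from the base
                p_new = (concentration + discount * len(tables)) / (concentration + i)
                if i == 0 or random.random() < p_new:
                    tables.append(1)
                    labels.append(base())
                    out.append(labels[-1])
                else:
                    # reuse an existing table with probability proportional to (count - discount)
                    k = random.choices(range(len(tables)),
                                       weights=[c - discount for c in tables])[0]
                    tables[k] += 1
                    out.append(labels[k])
            return out

        words = pitman_yor_crp(20000)
        freqs = sorted(Counter(words).values(), reverse=True)
        print(freqs[:10])  # a few very frequent types and a long tail of rare ones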

    The Paradigm Discovery Problem

    This work treats the paradigm discovery problem (PDP), the task of learning an inflectional morphological system from unannotated sentences. We formalize the PDP and develop evaluation metrics for judging systems. Using currently available resources, we construct datasets for the task. We also devise a heuristic benchmark for the PDP and report empirical results on five diverse languages. Our benchmark system first makes use of word embeddings and string similarity to cluster forms by cell and by paradigm. Then, we bootstrap a neural transducer on top of the clustered data to predict words to realize the empty paradigm slots. An error analysis of our system suggests clustering by cell across different inflection classes is the most pressing challenge for future work. Our code and data are available for public use. Comment: Forthcoming at ACL 202
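
    As a rough illustration of the first clustering stage (not the paper's system), the sketch below greedily groups word forms into candidate paradigms by surface string similarity alone; the word list, the threshold, and the use of difflib are all assumptions for the example.

        from difflib import SequenceMatcher

        def similar(a, b, threshold=0.7):
            return SequenceMatcher(None, a, b).ratio() >= threshold

        forms = ["walk", "walks", "walked", "walking", "sing", "sings", "singing", "sang"]

        paradigms = []  # greedy single-link grouping of forms into candidate paradigms
        for form in forms:
            for group in paradigms:
                if any(similar(form, member) for member in group):
                    group.append(form)
                    break
            else:
                paradigms.append([form])

        print(paradigms)  # [['walk', 'walks', 'walked', 'walking'], ['sing', 'sings', 'singing', 'sang']]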

    Discrimination in lexical decision.

    In this study we present a novel set of discrimination-based indicators of language processing derived from Naive Discriminative Learning (NDL) theory. We compare the effectiveness of these new measures with classical lexical-distributional measures, in particular frequency counts and form similarity measures, to predict lexical decision latencies when a complete morphological segmentation of masked primes is or is not possible. Data derive from a re-analysis of a large subset of decision latencies from the English Lexicon Project, as well as from the results of two new masked priming studies. Results demonstrate the superiority of discrimination-based predictors over lexical-distributional predictors alone, across both the simple and primed lexical decision tasks. Comparable priming after masked corner- and cornea-type primes, across two experiments, fails to support early obligatory segmentation into morphemes as predicted by the morpho-orthographic account of reading. Results fit well with NDL theory, which, in conformity with Word and Paradigm theory, rejects the morpheme as a relevant unit of analysis. Furthermore, results indicate that readers with greater spelling proficiency and larger vocabularies make better use of orthographic priors and handle lexical competition more efficiently.
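
    The discriminative learning rule commonly used in NDL-style models is the Rescorla-Wagner update of cue-to-outcome association weights; the sketch below is a minimal illustration under that assumption, with letter bigrams as cues, whole words as outcomes, and an invented toy corpus.

        from collections import defaultdict

        def bigrams(word):
            padded = "#" + word + "#"
            return [padded[i:i + 2] for i in range(len(padded) - 1)]

        weights = defaultdict(float)   # (cue, outcome) -> association weight
        rate, lam = 0.1, 1.0           # learning rate and maximum associative strength

        events = [("hand", "HAND"), ("hands", "HANDS"), ("hand", "HAND")] * 50
        outcomes = {o for _, o in events}

        for word, outcome in events:
            cues = bigrams(word)
            for o in outcomes:
                target = lam if o == outcome else 0.0
                activation = sum(weights[(c, o)] for c in cues)
                delta = rate * (target - activation)   # Rescorla-Wagner error term
                for c in cues:
                    weights[(c, o)] += delta

        # activation of each outcome given the cues of "hand"
        for o in outcomes:
            print(o, round(sum(weights[(c, o)] for c in bigrams("hand")), 3))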

    The thermodynamics of human reaction times

    I present a new approach for the interpretation of reaction time (RT) data from behavioral experiments. From a physical perspective, the entropy of the RT distribution provides a model-free estimate of the amount of processing performed by the cognitive system. In this way, the focus is shifted from the conventional interpretation of individual RTs as being either long or short to their distribution being more or less complex in terms of entropy. The new approach enables the estimation of the cognitive processing load without reference to the informational content of the stimuli themselves, thus providing a more appropriate estimate of the cognitive impact of different sources of information that are carried by experimental stimuli or tasks. The paper introduces the formulation of the theory, followed by an empirical validation using a database of human RTs in lexical tasks (visual lexical decision and word naming). The results show that this new interpretation of RTs is more powerful than the traditional one. The method provides theoretical estimates of the processing loads elicited by individual stimuli. These loads sharply distinguish the responses from different tasks. In addition, it provides upper-bound estimates for the speed at which the system processes information. Finally, I argue that the theoretical proposal, and the associated empirical evidence, provide strong arguments for an adaptive system that systematically adjusts its operational processing speed to the particular demands of each stimulus. This finding contradicts Hick's law, which posits a relatively constant processing speed within an experimental context.
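
    A minimal sketch of the central quantity: an entropy estimate computed directly from a sample of reaction times. The histogram (plug-in) estimator and the log-normal toy samples below are assumptions for illustration, not the paper's estimator or data.

        import numpy as np

        def rt_entropy(rts, bins=50):
            """Plug-in estimate of the differential entropy (bits) of an RT sample."""
            counts, edges = np.histogram(rts, bins=bins)
            widths = np.diff(edges)
            p = counts / counts.sum()
            nz = p > 0
            # H = -sum p * log2(p / width): discrete entropy corrected for bin width
            return float(-np.sum(p[nz] * np.log2(p[nz] / widths[nz])))

        rng = np.random.default_rng(1)
        narrow_rts = rng.lognormal(mean=6.3, sigma=0.15, size=5000)  # tightly clustered RTs
        broad_rts = rng.lognormal(mean=6.5, sigma=0.35, size=5000)   # more dispersed RTs

        print(rt_entropy(narrow_rts), rt_entropy(broad_rts))  # the dispersed sample has higher entropy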

    Methods and algorithms for unsupervised learning of morphology

    This is an accepted manuscript of a chapter published by Springer in Computational Linguistics and Intelligent Text Processing (CICLing 2014), Lecture Notes in Computer Science, vol. 8403, 2014, available online at https://doi.org/10.1007/978-3-642-54906-9_15; the accepted version may differ from the final published version. This paper is a survey of methods and algorithms for unsupervised learning of morphology. We describe the methods and algorithms used for morphological segmentation from a computational linguistics point of view, covering approaches based on MDL (minimum description length), MLE (maximum likelihood estimation), MAP (maximum a posteriori) estimation, and parametric and non-parametric Bayesian models. A review of the evaluation schemes for unsupervised morphological segmentation is also provided, along with a summary of results from the Morpho Challenge evaluations.
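
    The MDL family of methods can be illustrated with a toy scoring function: a segmentation is charged for its morph lexicon plus the cost of encoding the corpus with that lexicon, and the cheaper description wins. This is a heavily simplified stand-in for systems such as Morfessor; the corpus and cost terms are invented.

        import math
        from collections import Counter

        corpus = ["walking", "walked", "talking", "talked", "walks", "talks"]

        def description_length(segmented):
            """Lexicon cost (characters over distinct morphs) + corpus cost (-log2 P(morph))."""
            morphs = [m for word in segmented for m in word]
            counts = Counter(morphs)
            total = sum(counts.values())
            lexicon_cost = sum(len(m) + 1 for m in counts)  # rough per-character coding cost
            corpus_cost = -sum(c * math.log2(c / total) for c in counts.values())
            return lexicon_cost + corpus_cost

        unsegmented = [[w] for w in corpus]
        segmented = [["walk", "ing"], ["walk", "ed"], ["talk", "ing"],
                     ["talk", "ed"], ["walk", "s"], ["talk", "s"]]

        # the segmented analysis reuses morphs and yields the shorter description
        print(description_length(unsegmented), description_length(segmented))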