696 research outputs found

    Polish children's productivity with case marking: the role of regularity, type frequency, and phonological diversity

    57 Polish-speaking children aged from 2;4 to 4;8 and 16 adult controls participated in a nonce-word inflection experiment testing their ability to use the genitive, dative and accusative inflections productively. Results show that this ability develops early: the majority of two-year-olds were already productive with all inflections apart from the dative neuter, and the overall performance of the four-year-olds was very similar to that of the adults. All age groups were more productive with inflections that apply to large and/or phonologically diverse classes, although class size and token frequency appeared to be more important for younger children (two- and three-year-olds), and phonological diversity for older children and adults. Regularity, on the other hand, was a very poor predictor of productivity. The results support usage-based models of language acquisition and are problematic for the dual mechanism model.

    A Challenge Set Approach to Evaluating Machine Translation

    Neural machine translation represents an exciting leap forward in translation quality. But what longstanding weaknesses does it resolve, and which remain? We address these questions with a challenge set approach to translation evaluation and error analysis. A challenge set consists of a small set of sentences, each hand-designed to probe a system's capacity to bridge a particular structural divergence between languages. To exemplify this approach, we present an English-French challenge set, and use it to analyze phrase-based and neural systems. The resulting analysis provides not only a more fine-grained picture of the strengths of neural systems, but also insight into which linguistic phenomena remain out of reach. Comment: EMNLP 2017. 28 pages, including appendix. Machine-readable data included in a separate file. This version corrects typos in the challenge set.
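
    To make the methodology concrete, the following is a minimal, hypothetical sketch of how challenge-set items and per-phenomenon results might be represented, assuming each system output is judged with a yes/no answer by an annotator. The field and function names are illustrative assumptions, not taken from the paper's released data.

    ```python
    # Hypothetical sketch: representing challenge-set items and aggregating
    # per-phenomenon pass rates. All names are illustrative assumptions.
    from dataclasses import dataclass
    from collections import defaultdict

    @dataclass
    class ChallengeItem:
        source: str       # hand-designed source sentence (e.g. English)
        phenomenon: str   # structural divergence the item probes
        note: str         # what a correct translation must get right

    def tally_by_phenomenon(items, judgments):
        """Aggregate pass rates per phenomenon from per-item yes/no judgments."""
        passed, total = defaultdict(int), defaultdict(int)
        for item, ok in zip(items, judgments):
            total[item.phenomenon] += 1
            passed[item.phenomenon] += int(ok)
        return {ph: passed[ph] / total[ph] for ph in total}

    # Usage: judgments would come from an annotator inspecting each system
    # output, e.g. tally_by_phenomenon(items, [True, False, ...]).
    ```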

    A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors

    Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features. This paper introduces a la carte embedding, a simple and general alternative to the usual word2vec-based approaches for building such representations, based on recent theoretical results for GloVe-like embeddings. Our method relies mainly on a linear transformation that is efficiently learnable using pretrained word vectors and linear regression. This transform is applicable on the fly whenever a new text feature or rare word is encountered, even if only a single usage example is available. We introduce a new dataset showing how the a la carte method requires fewer examples of words in context to learn high-quality embeddings, and we obtain state-of-the-art results on a nonce task and some unsupervised document classification tasks. Comment: 11 pages, 2 figures, to appear in ACL 2018.
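
    The linear-regression step described in the abstract can be sketched in a few lines. The following is a rough illustration only, assuming pretrained vectors and per-word context averages are already available; the variable and function names are hypothetical, not the authors' code.

    ```python
    # Rough sketch of the a la carte idea: fit a linear map A from a word's
    # average context vector to its pretrained vector, then reuse A to induce
    # vectors for rare or unseen features. Names are illustrative assumptions.
    import numpy as np

    def learn_transform(context_avgs: np.ndarray, word_vecs: np.ndarray) -> np.ndarray:
        """Least-squares fit of A so that A @ context_avg ~ word_vec for known words.

        context_avgs: (n_words, d) average pretrained vector of each word's contexts
        word_vecs:    (n_words, d) the words' own pretrained vectors
        """
        X, *_ = np.linalg.lstsq(context_avgs, word_vecs, rcond=None)  # context_avgs @ X ~ word_vecs
        return X.T  # so X.T @ c ~ v for each known word

    def embed_new_feature(contexts, vecs, A):
        """Induce a vector for a new word/n-gram from one or more usage contexts."""
        ctx = [vecs[w] for c in contexts for w in c if w in vecs]
        return A @ np.mean(ctx, axis=0)
    ```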
