
    Arbitrariness, iconicity, and systematicity in language

    The notion that the form of a word bears an arbitrary relation to its meaning accounts only partly for the attested relations between form and meaning in the languages of the world. Recent research suggests a more textured view of vocabulary structure, in which arbitrariness is complemented by iconicity (aspects of form resemble aspects of meaning) and systematicity (statistical regularities in forms predict function). Experimental evidence suggests these form-to-meaning correspondences serve different functions in language processing, development, and communication: systematicity facilitates category learning by means of phonological cues, iconicity facilitates word learning and communication by means of perceptuomotor analogies, and arbitrariness facilitates meaning individuation through distinctive forms. Processes of cultural evolution help to explain how these competing motivations shape vocabulary structure.

    Pragmatic Constraint on Distributional Semantics

    This paper studies the limits of language models' statistical learning in the context of Zipf's law. First, we demonstrate that a Zipf-law token distribution emerges irrespective of the chosen tokenization. Second, we show that the Zipf distribution is characterized by two distinct groups of tokens that differ both in their frequency and in their semantics: tokens that have a one-to-one correspondence with a single semantic concept have different statistical properties than those with semantic ambiguity. Finally, we demonstrate how these properties interfere with statistical learning procedures motivated by distributional semantics.
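
    As a quick illustration of the rank-frequency behaviour described above, the sketch below counts tokens and prints their ranked frequencies. The toy string and the naive whitespace tokenizer are placeholders for illustration only, not the tokenizations or corpora studied in the paper.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, count) pairs, ranked by descending token frequency."""
    counts = Counter(tokens)
    return list(enumerate(sorted(counts.values(), reverse=True), start=1))

# Toy corpus standing in for a real text collection. Under Zipf's law,
# count is roughly proportional to 1 / rank**s with s close to 1, so
# log(count) falls approximately linearly in log(rank).
text = "the cat sat on the mat and the dog sat on the cat"
tokens = text.lower().split()          # naive whitespace tokenization
for rank, count in rank_frequency(tokens)[:5]:
    print(rank, count)
```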

    A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

    This paper introduces the Multi-Genre Natural Language Inference (MultiNLI) corpus, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding. In addition to being one of the largest corpora available for the task of NLI, at 433k examples, this corpus improves upon available resources in its coverage: it offers data from ten distinct genres of written and spoken English, making it possible to evaluate systems on nearly the full complexity of the language, and it offers an explicit setting for the evaluation of cross-genre domain adaptation. Comment: 10 pages, 1 figure, 5 tables. v2 corrects a misreported accuracy number for the CBOW model in the 'matched' setting. v3 adds a discussion of the difficulty of the corpus to the analysis section. v4 is the version that was accepted to NAACL 2018.
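
    For readers who want to look at the data, the sketch below loads and inspects the corpus. It assumes the third-party Hugging Face `datasets` library and its "multi_nli" configuration, which are distribution details not described in the paper; the split names, field names, and label encoding shown are properties of that distribution.

```python
from collections import Counter
from datasets import load_dataset  # third-party Hugging Face library (assumed installed)

# Splits in this distribution: train, validation_matched, validation_mismatched.
mnli = load_dataset("multi_nli")

example = mnli["train"][0]
print(example["premise"])
print(example["hypothesis"])
print(example["label"])            # 0 = entailment, 1 = neutral, 2 = contradiction

# Cross-genre domain adaptation setup: the matched dev set shares genres with
# training, while the mismatched dev set uses held-out genres.
print(Counter(mnli["train"]["genre"]))
print(Counter(mnli["validation_mismatched"]["genre"]))
```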

    MetaLDA: a Topic Model that Efficiently Incorporates Meta information

    Besides the text content, documents and their associated words usually come with rich sets of meta information, such as categories of documents and semantic/syntactic features of words, like those encoded in word embeddings. Incorporating such meta information directly into the generative process of topic models can improve modelling accuracy and topic quality, especially in the case where the word-occurrence information in the training data is insufficient. In this paper, we present a topic model, called MetaLDA, which is able to leverage either document or word meta information, or both of them jointly. With two data augmentation techniques, we derive an efficient Gibbs sampling algorithm, which benefits from the full local conjugacy of the model. Moreover, the algorithm is favoured by the sparsity of the meta information. Extensive experiments on several real-world datasets demonstrate that our model achieves comparable or improved performance in terms of both perplexity and topic quality, particularly in handling sparse texts. In addition, compared with other models using meta information, our model runs significantly faster. Comment: To appear in ICDM 2017.
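
    As a rough sketch of how document-level meta information can enter a topic model's generative process, the code below builds document-specific Dirichlet parameters over topics from binary document features and samples topic proportions from them. The multiplicative parameterization, the synthetic data, and all variable names are illustrative assumptions, not the paper's exact construction or its Gibbs sampler.

```python
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_feats, n_topics = 4, 3, 5

# Synthetic binary document meta information (e.g. category indicators) and
# positive per-feature, per-topic weights; both are placeholders.
doc_feats = rng.integers(0, 2, size=(n_docs, n_feats))
lam = rng.gamma(shape=1.0, scale=1.0, size=(n_feats, n_topics))

# One possible construction (assumed here): a document's Dirichlet parameter
# for each topic is the product of the weights of its active features.
alpha = np.exp(doc_feats @ np.log(lam))              # shape (n_docs, n_topics)

# Per-document topic proportions drawn from the feature-informed prior.
theta = np.array([rng.dirichlet(a) for a in alpha])
print(theta.round(3))
```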

    Pitch enhancement facilitates word learning across visual contexts

    This study investigates word learning using a new model that integrates three processes: a) extracting a word out of a continuous sound sequence, b) inferring its referential meanings in context, and c) mapping the segmented word onto its broader intended referent, such as other objects of the same semantic category, and onto novel utterances. Previous work has examined the role of statistical learning and/or of prosody in each of these processes separately. Here, we combine these strands of investigation into a single experimental approach, in which participants viewed a photograph belonging to one of three semantic categories while hearing a complex, five-syllable utterance containing a one-syllable target word. Six between-subjects conditions were tested with 20 adult participants each. In condition 1, the only cue to word-meaning mapping was the co-occurrence of word and referents; this statistical cue was present in all conditions. In condition 2, the target word was sounded at a higher pitch. In condition 3, random one-syllable words were sounded at a higher pitch, creating an inconsistent cue. In condition 4, the duration of the target word was lengthened. In conditions 5 and 6, an extraneous acoustic cue and a visual cue were associated with the target word, respectively. Performance in this word-learning task was significantly higher than that observed with simple co-occurrence only when pitch prominence consistently marked the target word. We discuss implications for the intentional value of pitch marking as well as the relevance of our findings to language acquisition and language evolution.

    Early Linguistic Interactions: Distributional Properties of Verbs in Syntactic Patterns

    Honors (Bachelor's), Linguistics, University of Michigan. http://deepblue.lib.umich.edu/bitstream/2027.42/120575/1/liamc.pd

    Judging words by their covers and the company they keep: probabilistic cues support word learning.

    Statistical learning may be central to lexical and grammatical development. The phonological and distributional properties of words provide probabilistic cues to their grammatical and semantic properties. Infants can capitalize on such probabilistic cues to learn grammatical patterns in listening tasks. However, infants often struggle to learn labels when performance requires attending to less obvious cues, raising the question of whether probabilistic cues support word learning. The current experiment presented 22-month-olds with an artificial language containing probabilistic correlations between words' statistical and semantic properties. Only infants with higher levels of grammatical development capitalized on statistical cues to support learning word-referent mappings. These findings suggest that infants' sensitivity to correlations between sounds and meanings may support both word learning and grammatical development.