
    Statistical Inferences for Polarity Identification in Natural Language

    Information forms the basis for all human behavior, including the ubiquitous decision-making that people constantly perform in their everyday lives. It is thus the mission of researchers to understand how humans process information to reach decisions. To facilitate this task, this work proposes a novel method of studying the reception of granular expressions in natural language. The approach utilizes LASSO regularization as a statistical tool to extract decisive words from textual content and to draw statistical inferences based on the correspondence between the occurrences of words and an exogenous response variable. Accordingly, the method has immediate implications for social sciences and Information Systems research: one can now identify text segments and word choices that are statistically relevant to authors or readers and, based on this knowledge, test hypotheses from behavioral research. We demonstrate the contribution of our method by examining how authors communicate subjective information through narrative materials, which allows us to answer the question of which words to choose when communicating negative information. In addition, we show that investors trade not only upon facts in financial disclosures but are also distracted by filler words and non-informative language. Practitioners, for example those in investor communications or marketing, can exploit our insights to enhance their writing based on the actual perception of word choice.
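    The selection step this abstract describes can be sketched in a few lines: a LASSO regression over a document-term matrix zeroes out the coefficients of non-decisive words, leaving only the words whose occurrence tracks the response variable. The corpus, vocabulary, and response values below are invented for illustration, and the solver is a plain iterative soft-thresholding loop rather than whatever implementation the authors used.

```python
import numpy as np

# Hypothetical toy corpus: rows are documents, columns are counts of the
# words in `vocab`; y is an exogenous response (e.g. a reader rating).
vocab = ["loss", "growth", "the", "and"]
X = np.array([
    [2, 0, 3, 4],
    [2, 0, 2, 2],
    [0, 2, 2, 2],
    [0, 2, 3, 3],
    [2, 0, 2, 2],
    [0, 2, 2, 3],
], dtype=float)
y = np.array([-1.0, -1.0, 1.0, 1.0, -1.0, 1.0])

# Standardize columns so the L1 penalty treats all words equally.
X = (X - X.mean(axis=0)) / X.std(axis=0)

def lasso_ista(X, y, lam=0.1, steps=2000):
    """LASSO via iterative soft-thresholding (proximal gradient)."""
    n, d = X.shape
    w = np.zeros(d)
    L = np.linalg.norm(X, 2) ** 2 / n   # Lipschitz constant of the gradient
    for _ in range(steps):
        w = w - X.T @ (X @ w - y) / (n * L)        # gradient step
        w = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)  # soft-threshold
    return w

w = lasso_ista(X, y)
# Words with a non-zero coefficient are the "decisive" ones; the L1
# penalty drives the coefficients of non-informative words to exactly 0.
decisive = [word for word, coef in zip(vocab, w) if abs(coef) > 1e-8]
print(decisive)
```

    In this toy setup the filler words are uncorrelated with the response, so their coefficients are shrunk to exactly zero while the content words survive with signed weights, which is the sparsity property the method relies on.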

    More is more in language learning: reconsidering the less-is-more hypothesis

    The Less-is-More hypothesis was proposed to explain age-of-acquisition effects in first language (L1) acquisition and second language (L2) attainment. We scrutinize different renditions of the hypothesis by examining how learning outcomes are affected by (1) limited cognitive capacity, (2) reduced interference resulting from less prior knowledge, and (3) simplified language input. While there is little to no evidence of benefits of limited cognitive capacity, there is ample support for a More-is-More account linking enhanced capacity with better L1- and L2-learning outcomes, and reduced capacity with childhood language disorders. Instead, reduced prior knowledge (relative to adults) may afford children greater flexibility in inductive inference; this contradicts the idea that children benefit from a more constrained hypothesis space. Finally, studies of child-directed speech (CDS) confirm benefits from less complex input at early stages, but also emphasize how greater lexical and syntactic complexity of the input confers benefits in L1 attainment.

    Cross-Lingual Adaptation using Structural Correspondence Learning

    Cross-lingual adaptation, a special case of domain adaptation, refers to the transfer of classification knowledge between two languages. In this article we describe an extension of Structural Correspondence Learning (SCL), a recently proposed algorithm for domain adaptation, to cross-lingual adaptation. The proposed method uses unlabeled documents from both languages, along with a word translation oracle, to induce cross-lingual feature correspondences. From these correspondences a cross-lingual representation is created that enables the transfer of classification knowledge from the source to the target language. The main advantages of this approach over other approaches are its resource efficiency and task specificity. We conduct experiments in the area of cross-language topic and sentiment classification involving English as source language and German, French, and Japanese as target languages. The results show a significant improvement of the proposed method over a machine translation baseline, reducing the relative error due to cross-lingual adaptation by an average of 30% (topic classification) and 59% (sentiment classification). We further report on empirical analyses that reveal insights into the use of unlabeled data, the sensitivity with respect to important hyperparameters, and the nature of the induced cross-lingual correspondences.
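    A rough sketch of the SCL pipeline on synthetic data: pivot features (here standing in for translation pairs supplied by the oracle) are predicted from the non-pivot features using unlabeled documents only, and an SVD of the stacked predictor weights yields a low-dimensional cross-lingual projection. All dimensions and data are illustrative assumptions, and plain least squares replaces the modified Huber loss of the original SCL formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unlabeled documents from both languages in one feature
# space: `pivots` are oracle-matched word pairs, `nonpivots` the rest.
# A shared latent topic structure links the two groups of features.
n_docs, n_pivots, n_nonpivots = 200, 5, 20
Z = rng.normal(size=(n_docs, 3))                        # latent topics
pivots = Z @ rng.normal(size=(3, n_pivots))
nonpivots = (Z @ rng.normal(size=(3, n_nonpivots))
             + 0.1 * rng.normal(size=(n_docs, n_nonpivots)))

# Step 1: for every pivot, learn a linear predictor from the non-pivot
# features (least squares here, not the paper's modified Huber loss).
W = np.linalg.lstsq(nonpivots, pivots, rcond=None)[0]   # (n_nonpivots, n_pivots)

# Step 2: the SVD of the predictor matrix gives the projection theta
# onto a shared low-dimensional space of feature correspondences.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
h = 3
theta = U[:, :h].T                                      # (h, n_nonpivots)

# Step 3: augment each document with its projected representation; a
# classifier trained on labeled source-language documents then sees
# features that are meaningful in the target language too.
augmented = np.hstack([nonpivots, nonpivots @ theta.T])
print(augmented.shape)
```

    A downstream linear classifier would be trained on `augmented` rows of the source language and applied to `augmented` rows of the target language; the projection block is what carries the transfer.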

    Theoretical Interpretations and Applications of Radial Basis Function Networks

    Medical applications have usually treated Radial Basis Function Networks (RBFNs) simply as Artificial Neural Networks. However, RBFNs are knowledge-based networks that can be interpreted in several ways: as Artificial Neural Networks, Regularization Networks, Support Vector Machines, Wavelet Networks, Fuzzy Controllers, Kernel Estimators, or Instance-Based Learners. A survey of these interpretations and of their corresponding learning algorithms is provided, as well as a brief survey of dynamic learning algorithms. The interpretations of RBFNs can suggest applications that are particularly interesting in medical domains.
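    One of the interpretations listed above can be made concrete in a few lines: read as a Regularization Network with an instance-based flavor, an RBFN places Gaussian units on (a subset of) the training points and fits only the linear output layer by ridge-regularized least squares. Sizes, bandwidth, and the toy regression task below are all illustrative choices, not taken from the survey.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D regression: noisy samples of a sine wave.
X = np.linspace(0.0, 1.0, 40)[:, None]
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=40)

centers = X[::4]        # every 4th training point becomes a centre
width = 0.1             # shared Gaussian bandwidth (assumed, not tuned)

def design(X, centers, width):
    """Hidden-layer activations: one Gaussian bump per centre."""
    d2 = (X - centers[:, 0][None, :]) ** 2
    return np.exp(-d2 / (2.0 * width ** 2))

Phi = design(X, centers, width)              # (40, 10) hidden activations
lam = 1e-3                                   # the "regularization" in the name
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)

rmse = np.sqrt(np.mean((Phi @ w - y) ** 2))
```

    Swapping how `centers`, `width`, and `lam` are chosen is precisely what moves this same architecture between the kernel-estimator, instance-based, and regularization-network readings the abstract enumerates.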

    Enriching Rare Word Representations in Neural Language Models by Embedding Matrix Augmentation

    Neural language models (NLMs) achieve strong generalization capability by learning dense representations of words and using them to estimate probability distributions. However, learning the representations of rare words is a challenging problem that causes the NLM to produce unreliable probability estimates. To address this problem, we propose a method to enrich the representations of rare words in a pre-trained NLM and consequently improve its probability estimation performance. The proposed method augments the word embedding matrices of the pre-trained NLM while keeping other parameters unchanged. Specifically, our method updates the embedding vectors of rare words using the embedding vectors of other semantically and syntactically similar words. To evaluate the proposed method, we enrich the rare street names in a pre-trained NLM and use it to rescore the 100-best hypotheses output from the Singapore English speech recognition system. The enriched NLM reduces the word error rate by 6% relative and improves the recognition accuracy of the rare words by 16% absolute as compared to the baseline NLM.
    Comment: 5 pages, 2 figures, accepted to INTERSPEECH 201
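    The augmentation step as described can be sketched directly: move a rare word's embedding toward a weighted average of the embeddings of similar, well-trained words, leaving every other row of the matrix untouched. The vocabulary, vectors, similar-word list, and blending weight below are all invented for illustration; the paper's exact similarity measure and update rule may differ.

```python
import numpy as np

# Hypothetical tiny embedding matrix: three well-trained words and one
# rare street name whose vector is essentially untrained.
vocab = {"street": 0, "road": 1, "avenue": 2, "clementi": 3}
E = np.array([
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.10],
    [0.85, 0.15, 0.05],
    [0.00, 0.00, 1.00],   # rare word: poorly trained vector
])

def enrich(E, rare_idx, similar, weights, alpha=0.7):
    """Blend a rare word's vector with its similar words' centroid.

    alpha controls how far the rare vector moves toward the centroid
    (an assumed hyperparameter, not from the paper).
    """
    centroid = np.average(E[similar], axis=0, weights=weights)
    out = E.copy()                      # all other rows stay unchanged
    out[rare_idx] = (1 - alpha) * E[rare_idx] + alpha * centroid
    return out

E2 = enrich(E, vocab["clementi"],
            [vocab["street"], vocab["road"], vocab["avenue"]],
            weights=[1.0, 1.0, 1.0])
```

    After the update, the rare word's vector is measurably closer (e.g. in cosine similarity) to its semantic neighbours, which is what lets the rescoring model assign it more reliable probabilities.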

    Balancing generalization and lexical conservatism : an artificial language study with child learners

    Successful language acquisition involves generalization, but learners must balance this against the acquisition of lexical constraints. Such learning has been considered problematic for theories of acquisition: if learners generalize abstract patterns to new words, how do they learn lexically-based exceptions? One approach claims that learners use distributional statistics to make inferences about when generalization is appropriate, a hypothesis which has recently received support from Artificial Language Learning experiments with adult learners (Wonnacott, Newport, & Tanenhaus, 2008). Since adult and child language learning may be different (Hudson Kam & Newport, 2005), it is essential to extend these results to child learners. In the current work, four groups of children (aged 6 years) were each exposed to one of four semi-artificial languages. The results demonstrate that children are sensitive to linguistic distributions at and above the level of particular lexical items, and that these statistics influence the balance between generalization and lexical conservatism. The data are in line with an approach which models generalization as rational inference, and in particular with the predictions of the domain-general hierarchical Bayesian model developed by Kemp, Perfors, & Tenenbaum (2006). This suggests that such models have relevance for theories of language acquisition.
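    The inference attributed to learners here, using distributional statistics to decide between generalization and lexical conservatism, can be illustrated with a toy hierarchical model. This is not the Kemp, Perfors & Tenenbaum model itself, only a minimal beta-binomial analogue with made-up verb counts: when verbs behave alike, the posterior favors a concentrated shared prior (so generalize to a new verb); when each verb is idiosyncratic, it favors a diffuse prior (so be lexically conservative).

```python
import numpy as np
from math import lgamma

def log_beta_binom(k, n, a, b):
    """log P(k successes in n trials) under a Beta(a, b)-Binomial model."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + lgamma(a + k) + lgamma(b + n - k) - lgamma(a + b + n)
            + lgamma(a + b) - lgamma(a) - lgamma(b))

# Two hypothetical input languages: (uses of construction A, total uses)
# for each of four verbs.
uniform_lang = [(8, 10), (7, 10), (8, 10), (9, 10)]    # verbs behave alike
lexical_lang = [(10, 10), (0, 10), (10, 10), (0, 10)]  # verb-specific

def posterior_concentration(data, grid=(0.5, 1, 2, 5, 10, 50)):
    """Posterior over the Beta concentration (overhypothesis) parameter.

    The mean mu is fixed at the pooled rate for simplicity. A high
    concentration means verbs share one behaviour (generalize); a low
    one means behaviour is item-specific (stay conservative).
    """
    mu = sum(k for k, n in data) / sum(n for k, n in data)
    logp = np.array([sum(log_beta_binom(k, n, c * mu, c * (1 - mu))
                         for k, n in data) for c in grid])
    p = np.exp(logp - logp.max())
    p /= p.sum()                      # uniform prior over the grid
    return dict(zip(grid, p))

post_uniform = posterior_concentration(uniform_lang)
post_lexical = posterior_concentration(lexical_lang)
```

    Comparing the two posteriors shows the qualitative effect the abstract describes: the uniform language pushes belief toward high concentration, the lexically-conditioned language toward low concentration.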
