12 research outputs found

    Zipf's Law and Avoidance of Excessive Synonymy

    Zipf's law states that if the words of a language are ranked in order of decreasing frequency in texts, the frequency of a word is inversely proportional to its rank. It is very robust as an experimental observation, but to date it has escaped satisfactory theoretical explanation. We suggest that Zipf's law may arise from the evolution of word semantics dominated by expansion of meanings and competition of synonyms. Comment: 47 pages; fixed reference list missing in v.
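The rank-frequency relation stated in this abstract can be illustrated with a minimal Python sketch. The toy token list and the helper names `rank_frequency` and `zipf_expected` are assumptions made for illustration; they are not taken from the paper, and the counts are contrived so that the ideal relation f(r) = f(1) / r holds exactly.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, count) tuples, most frequent word first."""
    counts = Counter(tokens)
    # Sort by decreasing count; break ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ranked, start=1)]


def zipf_expected(top_count, rank):
    """Ideal Zipf frequency for a given rank: f(r) = f(1) / r."""
    return top_count / rank


# Hypothetical toy corpus, built so the counts are 12, 6, 4, 3.
tokens = ("the " * 12 + "of " * 6 + "and " * 4 + "to " * 3).split()
table = rank_frequency(tokens)

for rank, word, count in table:
    print(rank, word, count, zipf_expected(table[0][2], rank))
```

On real text the observed counts only approximate the ideal curve, so a comparison like this is usually made on a log-log plot rather than by exact equality.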

    The optimality of attaching unlinked labels to unlinked meanings

    Vocabulary learning by children can be characterized by many biases. When encountering a new word, children, as well as adults, are biased towards assuming that it means something totally different from the words that they already know. To the best of our knowledge, the first mathematical proof of the optimality of this bias is presented here. First, it is shown that this bias is a particular case of the maximization of mutual information between words and meanings. Second, the optimality is proven within a more general information theoretic framework where mutual information maximization competes with other information theoretic principles. The bias is a prediction from modern information theory. The relationship between information theoretic principles and the principles of contrast and mutual exclusivity is also shown. Peer Reviewed. Postprint (published version)
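The link between this bias and mutual information can be sketched in Python under a simplifying assumption that is not taken from the paper: the joint probability of a word and a meaning is uniform over the word-meaning links. The function name `mutual_information` and the labels `w1`, `w2`, `m1`, `m2` are hypothetical.

```python
import math
from collections import defaultdict


def mutual_information(links):
    """I(W;M) in bits, assuming a uniform joint distribution over
    distinct (word, meaning) links. This is an illustrative assumption."""
    p = 1.0 / len(links)
    p_word = defaultdict(float)
    p_meaning = defaultdict(float)
    for word, meaning in links:
        p_word[word] += p
        p_meaning[meaning] += p
    return sum(p * math.log2(p / (p_word[w] * p_meaning[m]))
               for w, m in links)


# The known word w1 is linked to m1; a new word w2 must be attached.
to_unlinked_meaning = [("w1", "m1"), ("w2", "m2")]
to_linked_meaning = [("w1", "m1"), ("w2", "m1")]

print(mutual_information(to_unlinked_meaning))
print(mutual_information(to_linked_meaning))
```

In this toy case, attaching the new word to the unlinked meaning yields more mutual information than attaching it to the already linked meaning, which is the direction of the bias the abstract describes.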

    The communicative function of ambiguity in language

    We present a general information-theoretic argument that all efficient communication systems will be ambiguous, assuming that context is informative about meaning. We also argue that ambiguity allows for greater ease of processing by permitting efficient linguistic units to be re-used. We test predictions of this theory in English, German, and Dutch. Our results and theoretical analysis suggest that ambiguity is a functional property of language that allows for greater communicative efficiency. This provides theoretical and empirical arguments against recent suggestions that core features of linguistic systems are not designed for communication. National Science Foundation (U.S.) (Grant 0844472)

    A Hierarchical Core Reference Ontology for New Technology Insertion Design in Long Life Cycle, Complex Mission Critical Systems

    Organizations, including government, commercial and others, face numerous challenges in maintaining and upgrading long life-cycle, complex, mission critical systems. Maintaining and upgrading these systems requires the insertion and integration of new technology to avoid obsolescence of hardware, software, and human skills, to improve performance, to maintain and improve security, and to extend useful life. This is particularly true of information technology (IT) intensive systems. The lack of a coherent body of knowledge to organize new technology insertion theory and practice is a significant contributor to this difficulty. This research organized the existing design, technology roadmapping, obsolescence, and sustainability literature into an ontology of theory and application. The resulting ontology served as the foundation for a hierarchical core reference ontology for technology design and technology insertion design, and laid the foundation for a body of knowledge that better integrates the new technology insertion problem into the technology design architecture.

    A corpus-driven discourse analysis of transcripts of Hugo Chávez’s television programme ‘Aló Presidente’

    This study proposes a methodology that combines techniques from corpus linguistics with theory from the Discourse-Historical Approach (DHA) to Critical Discourse Analysis (CDA). The methodology is demonstrated using a corpus comprising transcripts of Hugo Chávez’s television programme, Aló Presidente, broadcast between January 2002 and June 2007. In this thesis, I identify a number of criticisms of CDA and suggest that corpus linguistics can be used to reduce the principal risks: over- or under-interpretation of data, and the use of examples that are not representative. I then present a methodology designed to minimise these effects, based upon a hypothesis that semantic fields are used more frequently in periods when they are topical, and therefore one can isolate instances which were produced at times of change. I use the Aló Presidente corpus to present a detailed description of three such semantic fields and then adopt the concept of discourse strategies from the DHA to demonstrate how Chávez’s framing of the topics changes with time. This leads to a set of conclusions which seek to answer the research question: How is life in Venezuela framed as having changed under Chávez’s Presidency by reference to his Aló Presidente television programme during the period 2002-2007?

    Ambiguity and entropy in the process of translation and post-editing

    This thesis analyses the way in which ambiguity is cognitively processed, in translation in general and post-editing in particular, drawing inferences from psycholinguistics, bilingualism, and entropy-based models of translation cognition. Conceptually, it assumes non-selective activation of both languages (source and target) in the translation process, and explores how entropy and entropy reduction can theoretically describe assumed mental states during disambiguation. Empirically, it uses a product-based metric of word translation entropy (HTra), and eye-movement and keystroke data from the CRITT Translation Process Research Database, to shed light on how the conceptual understanding of lexical and structural ambiguity may be manifested by observable behaviour. At the lexical level, examination of behavioural data pertaining to a high-HTra item from 217 participants translating/post-editing from English into multiple languages shows that the item tends to result in pauses in production and regression of eye movements, and that the translators’/post-editors’ corresponding scrutinization of the source text (ST) tends to involve a visual search for lower-HTra words in the co-text and, accordingly, a decrease in the average entropy of the activity unit. Regarding syntax, a Chinese relative clause in the machine translation output, which can involve a garden-path effect, is examined in terms of eye movements from 18 participants. Results show that, contrary to monolingual reading, disruptions of processing tend to occur not in the later part of the sentence where the wrong parse is disconfirmed, but in the earlier regions where the most quickly-built analysis is semantically inconsistent with the ST. Structural disambiguation and re-analysis seem to be bypassed. 
    This suggests that, on the one hand, reading for post-editing receives a strong biasing effect from the ST, and on the other, argument integration is more appropriately explained from an incremental processing perspective rather than a head-driven approach, as thematic roles seem to be assigned immediately in reading for post-editing. While the lexical analysis supports a parallel disambiguation model, the structural analysis seems to support a serial one. In terms of translation models, both emphasize the impact of cross-linguistic priming and the presence of considerable horizontality in the translation process.
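The word translation entropy (HTra) metric mentioned in this abstract can be sketched in Python, assuming it is the Shannon entropy (in bits) of the distribution of translation choices observed for one source word across translators. The function name `word_translation_entropy` and the sample translations are hypothetical and are not data from the thesis.

```python
import math
from collections import Counter


def word_translation_entropy(translations):
    """Shannon entropy in bits of the observed translation choices
    for a single source word (assumed definition of HTra)."""
    counts = Counter(translations)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())


# Made-up example: four translators render the same source word.
all_agree = ["banco", "banco", "banco", "banco"]
all_differ = ["banco", "orilla", "ribera", "margen"]

print(word_translation_entropy(all_agree))
print(word_translation_entropy(all_differ))
```

Under this definition, a word that every translator renders the same way has zero entropy, while a word with many equally likely renderings has high entropy, which matches the abstract's contrast between low-HTra and high-HTra items.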