An Entropy-based Assessment of the Unicode Encoding for Tibetan
This paper presents an analysis of the Unicode encoding scheme for Tibetan from the standpoint of morpheme entropy. We can speak of two levels of entropy in Tibetan: syllable entropy (a measure of the probability of the sequential occurrence of syllables), and morpheme entropy (a measure of the probability of the sequential occurrence of characters or morphemes), the latter being a measure of the redundancy of the language. Syllable entropy is a purely statistical calculation that is a function of the domain of the literature sampled, while morpheme entropy, we show, is relatively domain-independent given a statistically significant sample. Morpheme entropy can be calculated statistically, though a theoretical upper bound can also be postulated based on language-dependent morphology rules. This paper presents both theoretical and statistical estimates of the morpheme entropy for Tibetan, and explores the Tibetan Unicode encoding scheme in relation to data compression and other issues analyzed in light of entropy-based language modeling.
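The entropy measures described above are empirical Shannon entropies over the frequency distributions of syllables or characters in a corpus. A minimal sketch of that calculation, using a toy Tibetan string as the corpus (the sample text and variable names are illustrative, not from the paper):

```python
from collections import Counter
from math import log2

def shannon_entropy(units):
    """Empirical Shannon entropy (bits per unit) of a sequence of units."""
    counts = Counter(units)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Tibetan syllables are delimited by the tsheg mark (U+0F0B).
corpus = "བཀྲ་ཤིས་བདེ་ལེགས་བཀྲ་ཤིས"
syllables = [s for s in corpus.split("\u0f0b") if s]
# Character-level units here are Unicode code points with the tsheg removed.
characters = list(corpus.replace("\u0f0b", ""))

print(f"syllable entropy:  {shannon_entropy(syllables):.3f} bits")
print(f"character entropy: {shannon_entropy(characters):.3f} bits")
```

As the abstract notes, the syllable-level figure depends heavily on the domain of the text sampled, so a toy corpus like this only illustrates the mechanics, not a representative estimate.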
Research on Reasoning and Modeling of Solving Mathematics Situation Word Problems of Primary Schools
This research developed a web-based reasoning system for mathematical situation word problems using natural language processing technology. Our system provided the steps of morphological analysis, syntax analysis, semantic analysis, and rule judgment to infer the semantic structure and operational structure of situation word problems. It also adopted MathML and SVG to provide a web-based illustration of the solving procedure for mathematical situation word problems.
Keywords: situation word problem; natural language processing; MathML; SVG
Probabilistic Modelling of Morphologically Rich Languages
This thesis investigates how the sub-structure of words can be accounted for
in probabilistic models of language. Such models play an important role in
natural language processing tasks such as translation or speech recognition,
but often rely on the simplistic assumption that words are opaque symbols. This
assumption does not fit morphologically complex language well, where words can
have rich internal structure and sub-word elements are shared across distinct
word forms.
Our approach is to encode basic notions of morphology into the assumptions of
three different types of language models, with the intention that leveraging
shared sub-word structure can improve model performance and help overcome data
sparsity that arises from morphological processes.
In the context of n-gram language modelling, we formulate a new Bayesian
model that relies on the decomposition of compound words to attain better
smoothing, and we develop a new distributed language model that learns vector
representations of morphemes and leverages them to link together
morphologically related words. In both cases, we show that accounting for word
sub-structure improves the models' intrinsic performance and provides benefits
when applied to other tasks, including machine translation.
We then shift the focus beyond the modelling of word sequences and consider
models that automatically learn what the sub-word elements of a given language
are, given an unannotated list of words. We formulate a novel model that can
learn discontiguous morphemes in addition to the more conventional contiguous
morphemes that most previous models are limited to. This approach is
demonstrated on Semitic languages, and we find that modelling discontiguous
sub-word structures leads to improvements in the task of segmenting words into
their contiguous morphemes.
Comment: DPhil thesis, University of Oxford, submitted and accepted 2014.
http://ora.ox.ac.uk/objects/uuid:8df7324f-d3b8-47a1-8b0b-3a6feb5f45c
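The distributed model described above learns vector representations of morphemes and links morphologically related words through them. A minimal sketch of the underlying idea, additive composition of morpheme vectors into word vectors, assuming a toy randomly initialised morpheme inventory in place of learned parameters (the names `morpheme_vecs` and `compose` are illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
# Toy morpheme inventory; random vectors stand in for learned embeddings.
morpheme_vecs = {m: rng.standard_normal(dim) for m in ["un", "break", "able", "drink"]}

def compose(morphemes):
    """Additive composition: a word vector is the sum of its morpheme vectors."""
    return np.sum([morpheme_vecs[m] for m in morphemes], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

unbreakable = compose(["un", "break", "able"])
undrinkable = compose(["un", "drink", "able"])

# Words sharing morphemes ("un-", "-able") receive correlated representations,
# which is how parameter sharing across word forms combats data sparsity.
print(cosine(unbreakable, undrinkable))
```

Because rare inflected or compound forms reuse the vectors of their (more frequent) component morphemes, the model can assign them informed representations even when the full word form is unseen in training.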
Introduction to Linguistics for English Language Teaching
We envisaged this book as a main reference for English language teachers. It lays out, in both theoretical and practical terms, why English language teachers should study linguistics for their future professional teaching careers, and why many of our most common views about the study of linguistics are fundamentally important. The book pairs this theoretical grounding with practical assignments and authentic tasks. These are the times that try language teachers' souls on linguistics, and, for that reason, this book offers its own petite contribution to knowledge development.
The Missing Link between Morphemic Assemblies and Behavioral Responses:a Bayesian Information-Theoretical model of lexical processing
We present the Bayesian Information-Theoretical (BIT) model of lexical processing: A mathematical model illustrating a novel approach to the modelling of language processes. The model shows how a neurophysiological theory of lexical processing relying on Hebbian association and neural assemblies can directly account for a variety of effects previously observed in behavioural experiments. We develop two information-theoretical measures of the distribution of usages of a morpheme or word, and use them to predict responses in three visual lexical decision datasets investigating inflectional morphology and polysemy. Our model offers a neurophysiological basis for the effects of
morpho-semantic neighbourhoods. These results demonstrate how distributed patterns of activation naturally give rise to symbolic structures. We conclude by arguing that the modelling framework exemplified here is
a powerful tool for integrating behavioural and neurophysiological results.
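Information-theoretical measures over a distribution of usages, of the general kind the abstract describes, typically reduce to the entropy of that distribution and the relative entropy between two such distributions. A minimal sketch of both quantities, with hypothetical sense-usage distributions (the specific measures in the BIT model are not reproduced here; this only illustrates the building blocks):

```python
from math import log2

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def kl(p, q):
    """Relative entropy D(p || q) in bits; assumes q > 0 wherever p > 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical usage distributions of a word across four senses.
balanced = [0.25, 0.25, 0.25, 0.25]  # highly polysemous usage
skewed   = [0.85, 0.05, 0.05, 0.05]  # one dominant sense

print(entropy(balanced))      # 2.0 bits: maximally uncertain over 4 senses
print(entropy(skewed))        # lower: usage concentrated on one sense
print(kl(skewed, balanced))   # divergence of skewed usage from uniform
```

Measures like these give graded, continuous predictors of behavioural responses (e.g. lexical decision latencies) from a word's distribution of usages, which is the kind of linking hypothesis the abstract's model formalises.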