An Entropy-based Assessment of the Unicode Encoding for Tibetan
This paper presents an analysis of the Unicode encoding scheme for Tibetan from the standpoint of morpheme entropy. We can speak of two levels of entropy in Tibetan: syllable entropy, a measure of the probability of the sequential occurrence of syllables, and morpheme entropy, a measure of the probability of the sequential occurrence of characters or morphemes; the latter is a measure of the redundancy of the language. Syllable entropy is a purely statistical calculation that depends on the domain of the literature sampled, while morpheme entropy, we show, is relatively domain-independent given a statistically significant sample. Morpheme entropy can be calculated statistically, though a theoretical upper bound can also be postulated from language-dependent morphology rules. This paper presents both theoretical and statistical estimates of the morpheme entropy for Tibetan, and examines the Tibetan Unicode encoding scheme with respect to data compression and other issues, analyzed in light of entropy-based language modeling.
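As a rough illustration of the kind of entropy estimate the abstract describes (not the paper's actual procedure), the sketch below computes a zeroth-order empirical Shannon entropy over a symbol sequence; treating tsheg-delimited syllables as symbols gives a toy version of syllable entropy. The romanized sample text is invented for the example.

```python
from collections import Counter
import math

def shannon_entropy(symbols):
    """Empirical Shannon entropy (bits per symbol) of a sequence.

    Illustrative only: the paper's syllable and morpheme entropies
    concern sequential (conditional) occurrence; this unigram
    estimate is the simplest zeroth-order special case.
    """
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Syllable-level toy example: each whitespace-delimited syllable is a symbol
# (invented romanized Tibetan sample, not from the paper).
sample = "bkra shis bde legs bkra shis"
print(round(shannon_entropy(sample.split()), 3))
```

A higher-order (conditional) estimate over a large corpus would be needed to approach the domain-independent morpheme entropy the abstract claims.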
Evaluating Parsing Schemes with Entropy Indicators
This paper introduces an objective metric for evaluating a parsing scheme. It is based on Shannon's original work with letter sequences, which can be extended to part-of-speech tag sequences. It is shown that such a regular language is an inadequate model for natural language, so a representation is used that models language slightly higher in the Chomsky hierarchy. We show how the entropy of parsed and unparsed sentences can be measured. If the entropy of the parsed sentence is lower, this indicates that some of the structure of the language has been captured. We apply this entropy indicator to support one particular parsing scheme that effects a top-down segmentation. This approach could be used to decompose the parsing task into computationally more tractable subtasks. It also lends itself to the extraction of predicate-argument structure.
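A minimal sketch of the comparison the abstract describes, under invented data: estimate a first-order conditional entropy over a part-of-speech tag stream, then over the same stream with hypothesised segment-boundary symbols inserted, and compare the two values. The tag streams, the `|` boundary marker, and the bigram estimator are all assumptions for illustration, not the paper's actual indicator.

```python
from collections import Counter
import math

def conditional_entropy(seq):
    """First-order conditional entropy H(X_n | X_{n-1}) in bits,
    estimated from bigram counts over a symbol sequence
    (symbols here stand in for part-of-speech tags)."""
    bigrams = Counter(zip(seq, seq[1:]))
    prev = Counter(seq[:-1])
    total = sum(bigrams.values())
    h = 0.0
    for (a, b), count in bigrams.items():
        p_ab = count / total          # joint probability of the bigram
        p_b_given_a = count / prev[a] # conditional probability of b after a
        h -= p_ab * math.log2(p_b_given_a)
    return h

# Invented toy streams: an unparsed tag sequence, and the same sequence
# with '|' marking a hypothesised top-down segmentation.
flat = "DT NN VB DT NN DT NN VB DT NN".split()
parsed = "DT NN | VB | DT NN | DT NN | VB | DT NN".split()
print(conditional_entropy(flat), conditional_entropy(parsed))
```

In the paper's terms, a lower value for the segmented stream would indicate that the segmentation has captured some of the language's structure; on a toy sample this small, the estimates are too noisy to be meaningful either way.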
Evaluating parsing schemes with entropy indicators
This paper introduces an objective metric for assessing the effectiveness of a parsing scheme. Information-theoretic indicators can be used to show whether a given scheme captures some of the structure of natural language text. We then use this method to support a proposal to decompose the parsing task into computationally more tractable subtasks.