Approximate Entropy in Canonical and Non-Canonical Fiction
Computational textual aesthetics aims at studying observable differences between aesthetic categories of text. We use Approximate Entropy to measure the (un)predictability in two aesthetic text categories, i.e., canonical fiction (‘classics’) and non-canonical fiction (with lower prestige). Approximate Entropy is computed for series derived from sentence-length values and from the distribution of part-of-speech tags across windows of text. For comparison, we also include a sample of non-fictional texts. Moreover, we use Shannon Entropy to estimate degrees of (un)predictability due to frequency distributions in the entire text. Our results show that Approximate Entropy differentiates canonical from non-canonical texts better than Shannon Entropy does, whereas this advantage does not hold for the classification of fictional vs. expository prose. Canonical and non-canonical texts thus differ in sequential structure, while inter-genre differences are a matter of the overall distribution of local frequencies. We conclude that canonical fictional texts exhibit a higher degree of (sequential) unpredictability compared with non-canonical texts, corresponding to the popular assumption that they are more ‘demanding’ and ‘richer’. By using Approximate Entropy, we propose a new method for text classification in the context of computational textual aesthetics.
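To make the measure concrete, here is a minimal Python sketch of the standard Approximate Entropy algorithm applied to a sentence-length series. The embedding dimension m = 2 and the tolerance r = 0.2 times the standard deviation of the series are common defaults and are assumptions here, not necessarily the parameters used in the study.

    import numpy as np

    def approximate_entropy(series, m=2, r=None):
        """Approximate Entropy (ApEn) of a one-dimensional series."""
        u = np.asarray(series, dtype=float)
        n = len(u)
        if r is None:
            r = 0.2 * u.std()  # common default tolerance (assumption)

        def phi(k):
            # All overlapping sub-sequences of length k
            x = np.array([u[i:i + k] for i in range(n - k + 1)])
            # Chebyshev distance between every pair of sub-sequences
            dists = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=2)
            # Fraction of sub-sequences within tolerance r (self-matches included)
            c = np.mean(dists <= r, axis=1)
            return np.mean(np.log(c))

        return phi(m) - phi(m + 1)

    # Hypothetical example: sentence lengths (in words) of a short text
    sentence_lengths = [12, 7, 23, 9, 15, 31, 8, 14, 22, 11, 19, 6]
    print(approximate_entropy(sentence_lengths, m=2))

Higher values indicate a less predictable sequence; the same routine could in principle be applied to the windowed part-of-speech distributions mentioned above.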
The entropy of words-learnability and expressivity across more than 1000 languages
The choice associated with words is a fundamental property of natural languages. It lies at the heart of quantitative linguistics, computational linguistics and language sciences more generally. Information theory provides the tools to measure precisely the average amount of choice associated with words: the word entropy. Here, we use three parallel corpora, encompassing ca. 450 million words in 1916 texts and 1259 languages, to tackle some of the major conceptual and practical problems of word entropy estimation: dependence on text size, register, style and estimation method, as well as non-independence of words in co-text. We present two main findings. Firstly, word entropies display relatively narrow, unimodal distributions. There is no language in our sample with a unigram entropy of less than six bits/word. We argue that this is in line with information-theoretic models of communication. Languages are held in a narrow range by two fundamental pressures: word learnability and word expressivity, with a potential bias towards expressivity. Secondly, there is a strong linear relationship between unigram entropies and entropy rates. The entropy difference between words with and without co-textual information is narrowly distributed around ca. three bits/word. In other words, knowing the preceding text reduces the uncertainty of words by roughly the same amount across the languages of the world.
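For concreteness, the unigram word entropy estimated in this study can be illustrated with a simple plug-in (maximum-likelihood) estimator in Python. The paper compares several estimators and addresses text-size dependence, so the sketch below only illustrates the quantity itself, not the authors' estimation method; the example text is invented.

    import math
    from collections import Counter

    def unigram_entropy(tokens):
        """Plug-in (maximum-likelihood) estimate of unigram word entropy in bits/word."""
        counts = Counter(tokens)
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    # Hypothetical toy example
    text = "the cat sat on the mat and the dog sat on the rug"
    print(unigram_entropy(text.split()))

The entropy rate mentioned in the abstract is the analogous per-word uncertainty once the preceding co-text is taken into account; the roughly three bits/word difference reported there is the gap between the two quantities.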
Complexity and Universality in the Long-Range Order of Words
As is the case for many signals produced by complex systems, language presents a statistical structure that is balanced between order and disorder. Here we review and extend recent results from quantitative characterisations of the degree of order in linguistic sequences that give insights into two relevant aspects of language: the presence of statistical universals in word ordering, and the link between semantic information and the statistical linguistic structure. We first analyse a measure of relative entropy that assesses how much the ordering of words contributes to the overall statistical structure of language. This measure presents an almost constant value close to 3.5 bits/word across several linguistic families. Then, we show that a direct application of information theory leads to an entropy measure that can quantify and extract semantic structures from linguistic samples, even without prior knowledge of the underlying language.
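A rough way to illustrate the ordering contribution described above is to compare a per-word entropy estimate of a text with the same estimate after its words have been shuffled. The compression-based estimator and the file name sample.txt below are illustrative assumptions, not the estimators used in the reviewed work, and compression lengths only approximate entropy.

    import random
    import zlib

    def bits_per_word(tokens):
        """Crude per-word entropy estimate from the zlib-compressed size of the token stream."""
        data = " ".join(tokens).encode("utf-8")
        return 8 * len(zlib.compress(data, 9)) / len(tokens)

    def ordering_information(tokens, seed=0):
        """Shuffled-minus-original estimate: a rough proxy, in bits/word,
        for how much word ordering contributes to the statistical structure."""
        shuffled = tokens[:]
        random.Random(seed).shuffle(shuffled)
        return bits_per_word(shuffled) - bits_per_word(tokens)

    # sample.txt is a hypothetical placeholder for any plain-text corpus file
    with open("sample.txt", encoding="utf-8") as f:
        tokens = f.read().split()
    print(ordering_information(tokens))

On long texts this difference is expected to be positive, because shuffling destroys the sequential constraints among words; the reviewed studies report that such ordering information sits close to 3.5 bits/word across linguistic families.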