Beyond the Zipf-Mandelbrot law in quantitative linguistics
In this paper the Zipf-Mandelbrot law is revisited in the context of
linguistics. Despite its widespread popularity, the Zipf-Mandelbrot law can
only describe the statistical behaviour of a rather restricted fraction of the
total number of words contained in some given corpus. In particular, we focus
our attention on the important deviations that become statistically relevant as
larger corpora are considered and that ultimately could be understood as
salient features of the underlying complex process of language generation.
Finally, it is shown that all the different observed regimes can be accurately
encompassed within a single mathematical framework recently introduced by C.
Tsallis. Comment: 6 pages and 7 figures; minor changes in text, added reference
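For orientation, here is a minimal Python sketch of the rank-frequency view the abstract refers to. It assumes the usual Zipf-Mandelbrot form f(r) = c / (r + b)^a. The function names, the parameter names `a`, `b`, `c`, and the toy sentence are illustrative assumptions, not taken from the paper, and the sketch does not cover the Tsallis framework mentioned above.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return word frequencies sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_mandelbrot(rank, a, b, c):
    """Zipf-Mandelbrot form f(r) = c / (r + b)**a; b = 0 gives plain Zipf."""
    return c / (rank + b) ** a


# Toy corpus (an assumption for illustration only).
tokens = "the cat saw the dog and the dog saw the cat run".split()
freqs = rank_frequency(tokens)

# Compare observed frequencies with the model at each rank (ranks start at 1).
for rank, observed in enumerate(freqs, start=1):
    predicted = zipf_mandelbrot(rank, a=1.0, b=0.0, c=float(freqs[0]))
    print(rank, observed, round(predicted, 2))
```

On a real corpus the parameters would be fitted rather than fixed by hand, and the deviations the abstract describes would only show up at ranks far beyond what a toy sentence can reach.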
An Observational Framework to the Zipfian Analysis among Different Languages: Studies to Indonesian Ethnic Biblical Texts
The paper introduces the use of Zipfian statistics to study human languages using corpora that share the same meaning but differ in grammar and structure. We used biblical texts because they have been translated widely and carefully into many languages. The idea is to reduce the noise that comes from differences in meaning between texts in distinct languages. The result is that the robustness of the Zipfian law is observable, and some statistical differences are found between English and the widely used national and several ethnic languages in Indonesia. The paper ends by modestly proposing a possible further framework for interdisciplinary approaches to human language evolution.
Regimes in Babel are Confirmed: Report on Findings in Several Indonesian Ethnic Biblical Texts
The paper reports the presence of three statistical regimes in the Zipfian analysis of texts in quantitative linguistics: the Mandelbrot, original Zipf, and Cancho-Solé-Montemurro regimes. The work covers nine languages with texts of the same semantic content: the Bible in Indonesian ethnic languages and the national language. As before, the same analysis is also applied to the English version of the Bible for reference. The existence of the three regimes is confirmed, and the length of the texts also turns out to be an important issue. We outline further work on the parameterization used to analyze the three regimes and on a broader explanation of their robustness, especially in terms of the microstructure of language in human decisions or linguistic effort.
Optimal coding and the origins of Zipfian laws
The problem of compression in standard information theory consists of
assigning codes as short as possible to numbers. Here we consider the problem
of optimal coding -- under an arbitrary coding scheme -- and show that it
predicts Zipf's law of abbreviation, namely a tendency in natural languages for
more frequent words to be shorter. We apply this result to investigate optimal
coding also under so-called non-singular coding, a scheme where unique
segmentation is not warranted but codes stand for a distinct number. Optimal
non-singular coding predicts that the length of a word should grow
approximately as the logarithm of its frequency rank, which is again consistent
with Zipf's law of abbreviation. Optimal non-singular coding in combination
with the maximum entropy principle also predicts Zipf's rank-frequency
distribution. Furthermore, our findings on optimal non-singular coding
challenge common beliefs about random typing. It turns out that random typing
is in fact an optimal coding process, in stark contrast with the common
assumption that it is detached from cost cutting considerations. Finally, we
discuss the implications of optimal coding for the construction of a compact
theory of Zipfian laws and other linguistic laws. Comment: in press in the Journal of Quantitative Linguistics; definition of
concordant pair corrected, proofs polished, references updated
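The claim that word length grows roughly as the logarithm of frequency rank under non-singular coding can be illustrated with a short Python sketch. It assumes that the shortest available strings are handed out in order of decreasing frequency, so that an alphabet of N symbols offers N codes of length 1, N^2 of length 2, and so on. The function name `nonsingular_length` is hypothetical, and the sketch is not the paper's proof.

```python
def nonsingular_length(rank, alphabet_size):
    """Length of the code given to the item at this frequency rank (rank starts at 1).

    Assumes codes are all distinct strings over the alphabet, assigned
    shortest first to the most frequent items.
    """
    length = 1
    capacity = alphabet_size  # number of strings of length <= `length`
    while rank > capacity:
        length += 1
        capacity += alphabet_size ** length
    return length


# With a binary alphabet: 2 codes of length 1, 4 of length 2, 8 of length 3.
lengths = [nonsingular_length(r, 2) for r in range(1, 8)]
print(lengths)
```

Because the number of codes available at each length grows geometrically, the rank at which a given length is first needed also grows geometrically, which is where the logarithmic relationship comes from.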
Decoding least effort and scaling in signal frequency distributions
Here, assuming a general communication model where objects map to signals, a power function for the distribution of signal frequencies is derived. The model relies on the satisfaction of the receiver's (hearer's) communicative needs when the entropy of the number of objects per signal is maximized. Evidence of power distributions in a linguistic context (some of them with exponents clearly different from the typical β ≈ 2 of Zipf's law) is reviewed and expanded. We support the view that Zipf's law reflects some sort of optimization but following a novel realistic approach where signals (e.g. words) are used according to the objects (e.g. meanings) they are linked to. Our results strongly suggest that many systems in nature use non-trivial strategies for easing the interpretation of a signal. Interestingly, constraining just the number of interpretations of signals does not lead to scaling. Peer Reviewed. Postprint (author's final draft)
An Alternative Postulate to see Melody as “Language”
The paper proposes a way to see melodic features in music and songs in terms of “letters” constituting “words”, and in turn investigates whether they satisfy the Zipf-Mandelbrot law. Some interesting findings are reported, including possible conjectures for classifying melodic and musical artifacts in light of several aspects of culture. The paper ends with a discussion of further directions, both for enriching musicology and for possible plans for generative musical art.