Search CORE

80 research outputs found

Stochastic models and graph theory for Zipf's law

Author: Di Natale Anna
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 14/12/2018

In this thesis we studied Zipf's law from both an applied and a theoretical point of view. This empirical law states that the rank-frequency (RF) distribution of the words in a text follows a power law with exponent -1. On the theoretical side, we considered two classes of models capable of producing power laws in their probability distributions: generalizations of Pólya urns and SSR (Sample Space Reducing) processes. For the latter, we gave a formalization in terms of Markov chains. Finally, we proposed a population-dynamics model that unifies and reproduces the results of the three SSR processes found in the literature. We then turned to a quantitative analysis of the RF behaviour of the words in a corpus of texts. In this case the RF does not follow a pure power law but shows two regimes, which can be represented by a power law whose exponent changes. We investigated whether the behaviour of the RF can be linked to the topological properties of a graph. Specifically, starting from a corpus of texts, we built an adjacency network in which each word is linked to the word that follows it. A topological analysis of the graph's structure yielded results that appear to support the hypothesis that this structure is related to the change in slope of the RF. This result may lead to developments in the study of language and of the human mind. Moreover, since the graph's structure appears to contain components that group words by meaning, further study could lead to developments in automatic text understanding (text mining).
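Two empirical ingredients of this abstract, the rank-frequency (RF) curve of a corpus and the adjacency network in which each word is linked to the next, are easy to compute, and the simplest Sample Space Reducing (SSR) process can be simulated in a few lines. The Python below is a minimal sketch under our own assumptions (pre-tokenised input, a directed network stored as nested edge counts, and the basic SSR walk in which a walker starting at state N jumps uniformly to any lower state until it reaches state 1). It is not the thesis's code, and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def rank_frequency(tokens):
    """Word counts sorted by rank: index 0 holds the most frequent word's count."""
    return [count for _, count in Counter(tokens).most_common()]


def adjacency_network(tokens):
    """Directed word adjacency network: an edge w -> v each time v directly follows w.

    Returned as {w: Counter({v: number_of_times_v_follows_w})}.
    """
    graph = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        graph[current][following] += 1
    return graph


def ssr_visit_counts(n_states, n_runs, seed=0):
    """Simulate the basic SSR process and count visits to each state.

    Each run starts at n_states and repeatedly jumps to a state drawn uniformly
    from {1, ..., current - 1}, stopping at state 1. Every state is visited at
    most once per run, and for i < n_states the visit probability is 1/i,
    i.e. a power law with exponent -1.
    """
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(n_runs):
        state = n_states
        visits[state] += 1
        while state > 1:
            state = rng.randint(1, state - 1)  # inclusive on both ends
            visits[state] += 1
    return visits


if __name__ == "__main__":
    tokens = "the cat saw the dog and the dog saw the cat".split()
    print(rank_frequency(tokens))                  # [4, 2, 2, 2, 1]
    print(dict(adjacency_network(tokens)["the"]))  # {'cat': 2, 'dog': 2}
    runs = 20000
    visits = ssr_visit_counts(n_states=100, n_runs=runs)
    # Empirical visit frequencies; expected to be close to 1/i.
    print([round(visits[i] / runs, 3) for i in (1, 2, 4, 10)])
```

The 1/i visit probability follows because state i is reached exactly when the first jump that lands in {1, ..., i} lands on i, and, since each jump is uniform over all lower states, that first landing is uniform over those i states.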

AMS Tesi di Laurea

Letter counting: a stem cell for Cryptology, Quantitative Linguistics, and Statistics

Author: Ycart Bernard
Publication date: 29/11/2012

Counting letters in written texts is a very ancient practice. It has accompanied the development of Cryptology, Quantitative Linguistics, and Statistics. In Cryptology, counting the frequencies of the different characters in an encrypted message is the basis of the so-called frequency analysis method. In Quantitative Linguistics, the proportion of vowels to consonants in different languages was studied long before authorship attribution. In Statistics, vowel-consonant alternation was the only example Markov ever gave of his theory of chained events. A short history of letter counting is presented. The three domains, Cryptology, Quantitative Linguistics, and Statistics, are then examined, focusing on their interactions with the other two fields through letter counting. In conclusion, the eclecticism of scholars in past centuries, their background in the humanities, and their familiarity with cryptograms are identified as factors contributing to the mutual enrichment described here.
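The two kinds of count this abstract refers to, single-letter frequencies (the raw material of frequency analysis) and vowel-consonant transitions (the kind of count behind Markov's chained events), can be illustrated in Python. This is a minimal sketch, assuming lower-cased ASCII letters, "aeiou" as the vowel set, and a Caesar cipher as the simplest target for frequency analysis. The function names, and the choice of "e" as the most frequent plaintext letter, are our assumptions and are not taken from the paper.

```python
from collections import Counter

VOWELS = set("aeiou")  # assumption: 'y' is treated as a consonant


def letters(text):
    """Lower-case ASCII letters of text, in order; everything else is dropped."""
    return [c for c in text.lower() if "a" <= c <= "z"]


def letter_frequencies(text):
    """Relative frequency of each letter, most frequent first."""
    seq = letters(text)
    return {c: n / len(seq) for c, n in Counter(seq).most_common()}


def guess_caesar_shift(ciphertext, expected_top="e"):
    """Frequency analysis of a Caesar cipher: assume the most frequent
    ciphertext letter encrypts expected_top and return the implied shift."""
    top = next(iter(letter_frequencies(ciphertext)))
    return (ord(top) - ord(expected_top)) % 26


def vowel_consonant_chain(text):
    """Estimate the two-state Markov chain over vowel (V) / consonant (C)
    classes of consecutive letters: returns P(next class | current class)."""
    classes = ["V" if c in VOWELS else "C" for c in letters(text)]
    pairs = Counter(zip(classes, classes[1:]))
    chain = {}
    for a in "VC":
        total = pairs[(a, "V")] + pairs[(a, "C")]
        if total:  # skip a class that never precedes another letter
            chain[a] = {b: pairs[(a, b)] / total for b in "VC"}
    return chain


if __name__ == "__main__":
    # "meet me here at seven" shifted by 3; 'h' (encrypting 'e') dominates.
    print(guess_caesar_shift("phhw ph khuh dw vhyhq"))  # 3
    print(vowel_consonant_chain("banana"))
    # {'V': {'V': 0.0, 'C': 1.0}, 'C': {'V': 1.0, 'C': 0.0}}
```

On a short or atypical message the most frequent letter need not encrypt "e", so real frequency analysis compares the whole frequency profile rather than a single letter.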

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

Exploring the law of text geographic information

Author: Ren Ming, Wang Zhenhua, Xu Guang, Zhang Daiyu
Publication date: 31/08/2023

Textual geographic information is indispensable and heavily relied upon in practical applications. The lack of a clear picture of its distribution makes it difficult to exploit geographic information effectively, which motivates our exploration. We contend that geographic information is shaped by human behavior, cognition, expression, and thought processes, and, drawing on our intuitive understanding of natural systems, we hypothesize that it follows a Gamma distribution. Through rigorous experiments on 24 datasets spanning different languages and types, we substantiate this hypothesis and uncover the regularities governing the quantity, length, and distance dimensions of geographic information. Furthermore, theoretical analyses and comparisons with the Gaussian distribution and Zipf's law indicate that these regularities are not coincidental. Significantly, we estimate upper bounds on human use of geographic information, pointing to the existence of uncharted territory. We also provide guidance for geographic information extraction. We hope this work helps lift the veil on geographic information and reveal its true nature.
Comment: IP

arXiv.org e-Print Archive

Going to great lengths in the pursuit of luxury: how longer brand names can enhance the luxury perception of a brand

Author: Aaker D. A., Adi‐Bensaid L., Alleres D., Berman R. A., Coltheart M., Crystal D., Danesi M., Gitt W., Jurafsky D., Spreen O., Zipf G. K.
Publication venue: Wiley
Publication date: 01/01/2019

Brand names are a crucial part of the brand equity and marketing strategy of any company. Research suggests that companies spend considerable time and money creating suitable names for their brands and products. This paper uses Zipf's law (or Principle of Least Effort) to analyze the perceived luxuriousness of brand names. One of the most robust laws in linguistics, this form of Zipf's law (also known as the law of abbreviation) describes the inverse relationship between a word's length and its frequency, i.e., the more frequently a word is used in language, the shorter it tends to be. Zipf's law has been applied in many fields of science, and in this paper we provide evidence that, because polysyllabic words (and brand names) are rare in everyday conversation, they are perceived as more complex, distant, and abstract, and that longer brand names can therefore enhance how luxurious a brand is perceived to be (compared with shorter brand names, which consumers perceive as close, frequent, and concrete). Our results suggest that shorter (monosyllabic) names are better suited to basic brands, whereas longer names (trisyllabic or more) are more appropriate for luxury brands.
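The inverse relationship between a word's length and its frequency can be checked on any tokenised text by rank-correlating each word type's frequency with its length. The Python below is a minimal sketch under our own assumptions: lengths are measured in characters rather than syllables, and Spearman's rank correlation is our choice of statistic, not necessarily the one used in the paper. The function names are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j share the mean rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined: one variable is constant")
    return cov / (sx * sy)


def length_frequency_correlation(tokens):
    """Correlate each word type's frequency with its length in characters.
    A negative value is consistent with Zipf's law of abbreviation."""
    counts = Counter(tokens)
    words = list(counts)
    return spearman([counts[w] for w in words], [len(w) for w in words])


if __name__ == "__main__":
    tokens = "a a a a to to to the the house".split()
    # Frequencies 4, 3, 2, 1 against lengths 1, 2, 3, 5: a perfectly inverse ordering.
    print(round(length_frequency_correlation(tokens), 6))  # -1.0
```

Rank correlation is used here because the law of abbreviation only claims a monotone tendency (more frequent, shorter), not a linear relationship between the two quantities.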

Crossref

University of Dundee Online Publications

NORA - Norwegian Open Research Archives

DR-NTU (Digital Repository of NTU)

Essays on Technology in Presence of Globalization

Author: Madden Joshua
Publication venue: ScholarWorks @ Georgia State University
Publication date: 10/12/2019

Technology has long been known to enable globalization in ways previously thought impossible, with instantaneous communication allowing members of organizations across the globe to share information with little to no delay. However, as the effects of globalization have become more prominent, they have in turn helped shape the very technologies that enable these processes. These three essays analyze examples of how the two processes, globalization and technological development, affect one another. The first works at the level of national policy, seeking to understand how increased opportunities for insider leakers can force governments to consider asylum requests. The second works at the level of corporations, seeking to understand how and why business leaders choose to hire individuals from other countries. The third and final essay works at the most micro level, studying a potential application that could help analyze linguistic factors that have become more prominent in an increasingly globalized society.

ScholarWorks @ Georgia State University

The Impacts of Bibliometrics Measurement in the Scientific Community: A Statistical Analysis of Multiple Case Studies

Author: Basile Vincenzo, Cozzucoli Paolo Carmelo, Giacalone Massimiliano
Publication venue: Canadian Center of Science and Education
Publication date: 01/01/2022

In recent years, statistical methods such as bibliometrics have increasingly been used to analyse books, articles, and other publications. Bibliometric methods, as techniques for measuring models of information distribution, are frequently used in information science and social research. The main purpose of this article is to offer scholars a general framework for comparing the positive and negative aspects of bibliometrics in terms of the methods and tools used. Both the strengths and the critical points are therefore highlighted, to give a complete and detailed overview of the topic. In the methodological part, bibliometric analysis is applied to various case studies, for example using the Generalized Error Distribution, and the data are analysed and discussed using the Bibliometrix software. The results suggest that bibliometrics will become further consolidated in the future, as increasingly advanced technologies create new tools and methods characterized by a high degree of automation and speed.

Archivio della ricerca - Università degli studi di Napoli Federico II

Archivio Istituzionale della Ricerca - Università degli Studi della Campania "Luigi Vanvitelli"