8 research outputs found

    Zipf's law holds for phrases, not words

    With Zipf's law being originally and most famously observed for word frequency, it is surprisingly limited in its applicability to human language, holding over no more than three to four orders of magnitude before hitting a clear break in scaling. Here, building on the simple observation that phrases of one or more words comprise the most coherent units of meaning in language, we show empirically that Zipf's law for phrases extends over as many as nine orders of rank magnitude. In doing so, we develop a principled and scalable statistical mechanical method of random text partitioning, which opens up a rich frontier of rigorous text analysis via a rank ordering of mixed-length phrases.
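The idea of random text partitioning followed by a rank ordering of phrases can be illustrated with a minimal sketch. The abstract does not give the procedure in detail, so everything here is an assumption for illustration: the function names `random_partition` and `rank_frequency`, the cut probability `q`, and the rule that each gap between consecutive words is cut independently with probability `q`.

```python
import random
from collections import Counter


def random_partition(words, q=0.5, rng=None):
    """Split a word sequence into phrases.

    Assumption (not from the source): after each word, the text is cut
    with independent probability q, so phrases have mixed lengths.
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    phrases, current = [], []
    for word in words:
        current.append(word)
        if rng.random() < q:
            phrases.append(" ".join(current))
            current = []
    if current:  # flush the final, uncut phrase
        phrases.append(" ".join(current))
    return phrases


def rank_frequency(items):
    """Return (rank, item, count) tuples, most frequent item first."""
    counts = Counter(items)
    return [
        (rank, item, count)
        for rank, (item, count) in enumerate(counts.most_common(), start=1)
    ]


if __name__ == "__main__":
    text = "the cat sat on the mat and the cat sat on the rug".split()
    for rank, phrase, count in rank_frequency(random_partition(text)):
        print(rank, repr(phrase), count)
```

Under Zipf's law, plotting count against rank on log-log axes would give an approximately straight line; a toy text this small is far too short to show that, so the usage above only demonstrates the mechanics.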

    Human language reveals a universal positivity bias

    Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (i) the words of natural human language possess a universal positivity bias, (ii) the estimated emotional content of words is consistent between languages under translation, and (iii) this positivity bias is strongly independent of frequency of word use. Alongside these general regularities, we describe interlanguage variations in the emotional spectrum of languages that allow us to rank corpora. We also show how our word evaluations can be used to construct physical-like instruments for both real-time and offline measurement of the emotional content of large-scale texts.
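The abstract does not say how per-word evaluations are turned into a measurement for a whole text. One simple possibility, assumed here purely for illustration, is a frequency-weighted average of word scores. The function name `average_score` and the toy score values are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def average_score(tokens, scores):
    """Frequency-weighted mean score of the tokens that have a score.

    Assumption (not from the source): unscored tokens are ignored and
    each scored token contributes in proportion to how often it occurs.
    Returns None when no token has a score.
    """
    counts = Counter(t for t in tokens if t in scores)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(scores[word] * n for word, n in counts.items()) / total


if __name__ == "__main__":
    # Hypothetical toy scores on an arbitrary scale, for illustration only.
    toy_scores = {"happy": 8.0, "sad": 2.0, "the": 5.0}
    print(average_score("the happy happy sad dog".split(), toy_scores))
```

Ignoring unscored tokens keeps the sketch short; a real instrument would need an explicit policy for words outside the evaluated list.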

    Reply to Garcia et al.: Common mistakes in measuring frequency-dependent word characteristics

    We demonstrate that the concerns expressed by Garcia et al. are misplaced, due to (1) a misreading of our findings in [1]; (2) a widespread failure to examine and present words in support of asserted summary quantities based on word usage frequencies; and (3) a range of misconceptions about word usage frequency, word rank, and expert-constructed word lists. In particular, we show that the English component of our study compares well statistically with two related surveys, that no survey design influence is apparent, and that estimates of measurement error do not explain the positivity biases reported in our work and that of others. We further demonstrate that for the frequency dependence of positivity (the nuances of which we explored in great detail in [1]), Garcia et al. did not perform a reanalysis of our data; they instead carried out an analysis of a different, statistically improper data set and introduced a nonlinearity before performing linear regression. Comment: 5 pages, 2 figures, 1 table. Expanded version of reply appearing in PNAS 201

    Quantifying the effects of residential segregation on individual mobility

    Thesis: S.M., Massachusetts Institute of Technology, School of Engineering, Center for Computational Engineering, Computation for Design and Optimization Program, 2015. Title as it appears in MIT Commencement Exercises program, June 5, 2015: Quantifying the effects of residential segregation on individual mobility. Cataloged from PDF version of thesis. Includes bibliographical references (pages 60-64).
    More than half of today's world population lives in cities and that fraction is steadily growing. Models that accurately capture all segments of the population are necessary in order to design effective policies and new technologies to ensure efficient and stable operations of cities. The current sociology literature has a rich foundation in characterizing the demographics of static population distributions; however, these characterizations fail to account for the reality of dynamic movement. Though there has been recent work in developing models of human mobility, these models in turn do not capture demographic differences in the populations of cities. In this work we present a computational approach to reformulating segregation metrics to incorporate dynamic movement patterns and also quantify the effects of introducing demographics into a mobility model. In coupling two fields that are inherently connected but not established as such, we must very carefully consider our experimental setup. The first part of this work deals with understanding our data and its limitations at fine granularities and explicitly measuring segregation metrics at various scales to design a study that will elucidate meaningful aspects of segregation. In the second part of this work we reformulate traditional segregation metrics using topological properties of origin-destination networks as input. These measures are flexible in considering many locations that individuals visit and therefore more accurately capture the environments of individuals that traditional segregation literature seeks to characterize. We utilize two rank-based mobility models that implicitly incorporate geographic properties of population distributions to understand the effects of residential segregation on mobility patterns and examine the effect of demographic considerations on model accuracy. In summary, this thesis will focus on synthesizing the rich body of work on static characterizations of socioeconomic structure in cities with dynamic models to better understand different racial segmentations of Boston's population. This work is both an extension of the static segregation literature and a refinement of current mobility models.
    by Suma Desu. S.M.

    Zipf's law holds for phrases, not words

    With Zipf's law being originally and most famously observed for word frequency, it is surprisingly limited in its applicability to human language, holding over no more than three to four orders of magnitude before hitting a clear break in scaling. Here, building on the simple observation that phrases of one or more words comprise the most coherent units of meaning in language, we show empirically that Zipf's law for phrases extends over as many as nine orders of rank magnitude. In doing so, we develop a principled and scalable statistical mechanical method of random text partitioning, which opens up a rich frontier of rigorous text analysis via a rank ordering of mixed-length phrases. Over the last century, the elements of many disparate systems have been found to approximately follow Zipf's law (that element size is inversely proportional to element size rank) [1,2], from city populations 2-