1,824 research outputs found

Scaling laws and fluctuations in the statistics of word frequencies

Author: Altmann Eduardo G.
Gerlach Martin
Publication venue: 'IOP Publishing'
Publication date: 04/11/2014

In this paper we combine statistical analysis of large text databases and simple stochastic models to explain the appearance of scaling laws in the statistics of word frequencies. Besides the sublinear scaling of the vocabulary size with database size (Heaps' law), here we report a new scaling of the fluctuations around this average (fluctuation scaling analysis). We explain both scaling laws by modeling the usage of words by simple stochastic processes in which the overall distribution of word frequencies is fat-tailed (Zipf's law) and the frequency of a single word is subject to fluctuations across documents (as in topic models). In this framework, the mean and the variance of the vocabulary size can be expressed as quenched averages, implying that: i) the inhomogeneous dissemination of words causes a reduction of the average vocabulary size in comparison to the homogeneous case, and ii) correlations in the co-occurrence of words lead to an increase in the variance, so that the vocabulary size becomes a non-self-averaging quantity. We address the implications of these observations for the measurement of lexical richness. We test our results in three large text databases (Google-ngram, English Wikipedia, and a collection of scientific articles). Comment: 19 pages, 4 figures

arXiv.org e-Print Archive

MPG.PuRe

Stochastic model for the vocabulary growth in natural languages

Author: Altmann Eduardo G.
Gerlach Martin
Publication venue: 'American Physical Society (APS)'
Publication date: 04/04/2013

We propose a stochastic model for the number of different words in a given database which incorporates the dependence on the database size and historical changes. The main feature of our model is the existence of two different classes of words: (i) a finite number of core-words, which have higher frequency and do not affect the probability that a new word is used; and (ii) the remaining, virtually infinite number of noncore-words, which have lower frequency and, once used, reduce the probability that a new word is used in the future. Our model relies on a careful analysis of the Google-ngram database of books published in the last centuries, and its main consequence is the generalization of Zipf's and Heaps' laws to two scaling regimes. We confirm that these generalizations yield the best simple description of the data among generic descriptive models and that the two free parameters depend only on the language but not on the database. From the point of view of our model, the main change on historical time scales is the composition of the specific words included in the finite list of core-words, which we observe to decay exponentially in time with a rate of approximately 30 words per year for English. Comment: corrected typos and errors in reference list; 10 pages text, 15 pages supplemental material; to appear in Physical Review

arXiv.org e-Print Archive

Directory of Open Access Journals

MPG.PuRe

Proper identification of RR Lyrae Stars brighter than 12.5 mag

Author: Altmann
Altmann
Altmann
Beers
Dambis
de Boer
Fernley
G. Maintz
Layden
Layden
Maintz
Martin
Mennessier
Schmidt
Publication venue: 'EDP Sciences'
Publication date: 26/07/2005

RR Lyrae stars are of great importance for investigations of Galactic structure. However, a complete compendium of all RR Lyrae stars in the solar neighbourhood with accurate classifications and coordinates does not exist to this day. Here we present a catalogue of 561 local RR Lyrae stars with V_max ≤ 12.5 mag, according to the magnitudes given in the Combined General Catalogue of Variable Stars (GCVS), and 16 fainter ones. The Tycho2 catalogue contains about 100 RR Lyr stars. However, many objects have inaccurate coordinates in the GCVS, the primary source of variable star information, so that a reliable cross-identification is difficult. We identified RR Lyrae stars from both catalogues based on an intensive literature search. In dubious cases we carried out photometry of fields to identify the variable. Mennessier and Colome (2002) have published a paper with Tyc2-GCVS identifications, but we found that many of their identifications are wrong. Keywords: astrometry -- Stars: RR Lyrae stars -- Catalogues: Tycho-2 catalogue -- Catalogues: The HST Guide Star Catalogue, Version 1.2 -- Catalogues: Combined General Catalogue of Variable Stars. Comment: 5 pages with 2 figures; accepted by A and A; online data are available under http://www.astro.uni-bonn.de/~gmaint

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

CERN Document Server

Searching for merger debris in the Galactic halo: Chemodynamical evidence based on local blue HB stars

Author: Altmann Martin
Catelan Marcio
Zoccali Manuela
Publication venue: 'EDP Sciences'
Publication date: 01/07/2005

We report on the discovery of a group of local A-type blue horizontal-branch (HBA) stars moving in a prograde, comet-like orbit with very similar kinematics and abundances. This serendipitously discovered group contains 5 or 6 local HBA stars venturing very close to the Galactic centre; their [Fe/H] is around -1.7, and they seem to present minimum scatter in at least Mg, Si, Ti, Fe, Al, and Cr abundances. This "Cometary Orbit Group" (COG) was found while we were testing a new method to detect the debris associated with the merger of smaller, specific protogalactic entities into our galaxy. The method is primarily intended to identify field HBA stars with kinematics and detailed, multi-species abundance patterns similar to those seen among members of a surviving remnant (e.g., omega Centauri). Quite possibly, the COG is the remnant, on a highly decayed orbit, of a merging event that took place in the relatively remote past (i.e., at least one revolution ago). Comment: 4 pages and 2 EPS figures, accepted for publication in Astronomy and Astrophysics Letters

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

CERN Document Server

Exceptional sequences of 8 line bundles on (P^1)^3

Author: Altmann Klaus
Altmann Martin
Publication date: 30/12/2021

We investigate maximal exceptional sequences of line bundles on (P^1)^r, i.e., those consisting of 2^r elements. For r = 3 we show that they are always full, meaning that they generate the derived category. Everything is done in the discrete setup: exceptional sequences of line bundles appear as special finite subsets s of the Picard group Z^r of (P^1)^r, and the question of generation is understood as a process of contamination of the whole of Z^r out of an infectious seed s.

arXiv.org e-Print Archive

Institutional Repository of the Freie Universität Berlin