Scaling laws and fluctuations in the statistics of word frequencies
In this paper we combine statistical analysis of large text databases and
simple stochastic models to explain the appearance of scaling laws in the
statistics of word frequencies. Besides the sublinear scaling of the vocabulary
size with database size (Heaps' law), here we report a new scaling of the
fluctuations around this average (fluctuation scaling analysis). We explain
both scaling laws by modeling the usage of words by simple stochastic processes
in which the overall distribution of word-frequencies is fat tailed (Zipf's
law) and the frequency of a single word is subject to fluctuations across
documents (as in topic models). In this framework, the mean and the variance of
the vocabulary size can be expressed as quenched averages, implying that: i)
the inhomogeneous dissemination of words causes a reduction of the average
vocabulary size in comparison to the homogeneous case, and ii) correlations in
the co-occurrence of words lead to an increase in the variance and the
vocabulary size becomes a non-self-averaging quantity. We address the
implications of these observations for the measurement of lexical richness. We
test our results in three large text databases (Google-ngram, English
Wikipedia, and a collection of scientific articles).
Comment: 19 pages, 4 figures
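The sublinear vocabulary growth mentioned in the abstract above can be illustrated with a minimal sketch. It covers only the homogeneous baseline: tokens are drawn independently from a fixed Zipf-like distribution, with no fluctuations across documents, so it is not the authors' model. The function names, the number of word types, and the exponent `alpha` are assumptions chosen for illustration.

```python
import random
from itertools import accumulate


def zipf_weights(n_types, alpha=1.0):
    """Unnormalised Zipf-like weights 1 / rank**alpha (alpha is an assumed value)."""
    return [1.0 / (rank ** alpha) for rank in range(1, n_types + 1)]


def vocabulary_growth(n_tokens, n_types=50_000, alpha=1.0, seed=0):
    """Return V(1), ..., V(n_tokens): distinct word types seen after each token.

    Tokens are sampled independently from a fixed distribution, i.e. the
    homogeneous case without topic-like fluctuations.
    """
    rng = random.Random(seed)
    cum = list(accumulate(zipf_weights(n_types, alpha)))
    tokens = rng.choices(range(n_types), cum_weights=cum, k=n_tokens)
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


growth = vocabulary_growth(20_000)
for n in (100, 1_000, 10_000, 20_000):
    # The ratio V(n) / n shrinks as n grows, the signature of sublinear growth.
    print(n, growth[n - 1], round(growth[n - 1] / n, 3))
```

Repeating the run with different seeds gives a rough sense of the fluctuations around the average in this baseline; the abstract reports larger fluctuations in real text, which it attributes to correlations in word co-occurrence.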
Stochastic model for the vocabulary growth in natural languages
We propose a stochastic model for the number of different words in a given
database which incorporates the dependence on the database size and historical
changes. The main feature of our model is the existence of two different
classes of words: (i) a finite number of core-words which have higher frequency
and do not affect the probability of a new word to be used; and (ii) the
remaining virtually infinite number of noncore-words which have lower frequency
and once used reduce the probability of a new word to be used in the future.
Our model relies on a careful analysis of the google-ngram database of books
published in the last centuries and its main consequence is the generalization
of Zipf's and Heaps' law to two scaling regimes. We confirm that these
generalizations yield the best simple description of the data among generic
descriptive models and that the two free parameters depend only on the language
but not on the database. From the point of view of our model the main change on
historical time scales is the composition of the specific words included in the
finite list of core-words, which we observe to decay exponentially in time with
a rate of approximately 30 words per year for English.
Comment: corrected typos and errors in reference list; 10 pages text, 15 pages
supplemental material; to appear in Physical Review
Proper identification of RR Lyrae Stars brighter than 12.5 mag
RR Lyrae stars are of great importance for investigations of Galactic
structure. However, a complete compendium of all RR-Lyraes in the solar
neighbourhood with accurate classifications and coordinates does not exist to
this day. Here we present a catalogue of 561 local RR Lyrae stars with V_max ≤
12.5 mag, according to the magnitudes given in the Combined General
Catalogue of Variable Stars (GCVS), and 16 fainter ones. The Tycho2 catalogue
contains about 100 RR Lyr stars. However, many objects have inaccurate
coordinates in the GCVS, the primary source of variable star information, so
that a reliable cross-identification is difficult. We identified RR Lyrae from
both catalogues based on an intensive literature search. In dubious cases we
carried out photometry of fields to identify the variable. Mennessier and
Colome (2002) have published a paper with Tyc2-GCVS identifications, but we
found that many of their identifications are wrong.
Keywords: astrometry -- Stars: RR Lyrae stars -- Catalogues: Tycho-2
catalogue -- Catalogues: The HST Guide Star Catalogue, Version 1.2 --
Catalogues: Combined General Catalogue of Variable Stars
Comment: 5 pages with 2 figures; accepted by A and A. Online data are available
under http://www.astro.uni-bonn.de/~gmaint
Searching for merger debris in the Galactic halo: Chemodynamical evidence based on local blue HB stars
We report on the discovery of a group of local A-type blue horizontal-branch
(HBA) stars moving in a prograde, comet-like orbit with very similar kinematics
and abundances. This serendipitously discovered group contains 5 or 6 local HBA
stars venturing very close to the Galactic centre; their [Fe/H] is around -1.7,
and they seem to present minimum scatter in at least Mg, Si, Ti, Fe, Al, and Cr
abundances. This "Cometary Orbit Group" (COG) was found while we were testing
a new method to detect the debris associated with the merger of smaller,
specific protogalactic entities into our galaxy. The method is primarily
intended to identify field HBA stars with similar kinematics and detailed,
multi-species abundance patterns as seen among members of a surviving remnant
(e.g., omega Centauri). Quite possibly, the COG is the remnant, on a highly
decayed orbit, of a merging event that took place in the relatively remote past
(i.e., at least one revolution ago).
Comment: 4 pages and 2 EPS figures, accepted for publication in Astronomy and
Astrophysics Letters
Exceptional sequences of 8 line bundles on (P^1)^3
We investigate maximal exceptional sequences of line bundles on (P^1)^r, i.e., those consisting of 2^r elements. For r=3 we show that they are always full, meaning that they generate the derived category. Everything is done in the discrete setup: exceptional sequences of line bundles appear as special finite subsets s of the Picard group Z^r of (P^1)^r, and the question of generation is understood as a process of contamination of the whole Z^r out of an infectious seed s.
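The discrete setup can be made concrete with a small sketch. It relies on a standard fact that the abstract does not state and that is assumed here: by the Künneth formula, the line bundle O(d_1, ..., d_r) on (P^1)^r has no cohomology in any degree exactly when some d_k equals -1. A sequence of points a_1, ..., a_n in Z^r is then exceptional when a_i - a_j has a coordinate equal to -1 for every i < j. The function names are hypothetical, and the contamination process used to decide fullness is not modelled.

```python
from itertools import product


def is_acyclic(d):
    """True if O(d) on (P^1)^r has vanishing cohomology in all degrees.

    Assumed criterion (Kunneth formula): some coordinate of d equals -1.
    """
    return any(x == -1 for x in d)


def is_exceptional(seq):
    """Check that Ext^*(L_j, L_i) = H^*(O(a_i - a_j)) vanishes for all i < j."""
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            diff = tuple(a - b for a, b in zip(seq[i], seq[j]))
            if not is_acyclic(diff):
                return False
    return True


# The 2^3 = 8 vertices of the unit cube in Z^3, in lexicographic order.
cube = list(product((0, 1), repeat=3))
print(len(cube), is_exceptional(cube))  # 8 True
print(is_exceptional(cube[::-1]))       # False: the order of the elements matters
```

In lexicographic order the first coordinate where two vertices differ always contributes -1 to the difference, which is why this particular ordering passes the check.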