On the Vocabulary of Grammar-Based Codes and the Logical Consistency of Texts
The article presents a new interpretation for Zipf-Mandelbrot's law in
natural language which rests on two areas of information theory. Firstly, we
construct a new class of grammar-based codes and, secondly, we investigate
properties of strongly nonergodic stationary processes. The motivation for the
joint discussion is to prove a proposition with a simple informal statement: If
a text of length n describes n^β independent facts in a repetitive way,
then the text contains at least n^β / log n different words, under
suitable conditions on β. In the formal statement, two modeling postulates
are adopted. Firstly, the words are understood as nonterminal symbols of the
shortest grammar-based encoding of the text. Secondly, the text is assumed to
be emitted by a finite-energy strongly nonergodic source whereas the facts are
binary IID variables predictable in a shift-invariant way.
Comment: 24 pages, no figures
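The first postulate above, words as nonterminals of a grammar-based encoding of the text, can be illustrated with a toy Re-Pair-style encoder. This is only a sketch, not the paper's construction: finding the truly shortest grammar is NP-hard, so the greedy pair-replacement heuristic below stands in for it, and all names are illustrative.

```python
from collections import Counter

def repair_grammar(text):
    """Build a straight-line grammar by repeatedly replacing the most
    frequent adjacent symbol pair with a fresh nonterminal (Re-Pair)."""
    seq = list(text)
    rules = {}                     # nonterminal -> (left, right)
    next_id = 0
    while len(seq) > 1:
        pairs = Counter(zip(seq, seq[1:]))
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:               # nothing repeats any more
            break
        nt = ("N", next_id)
        next_id += 1
        rules[nt] = pair
        # Replace non-overlapping occurrences of the pair, left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

# A highly repetitive text compresses to a short start sequence with
# only a handful of nonterminals ("words" in the paper's sense).
seq, rules = repair_grammar("abcabcabcabc")
print(len(seq), len(rules))
```

For this 12-character repetitive input the grammar needs just three nonterminals; the proposition above concerns how this vocabulary size must grow when the text encodes many independent facts.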
Universal correlations and power-law tails in financial covariance matrices
Signatures of universality are detected by comparing individual eigenvalue distributions and level spacings from financial covariance matrices to random matrix predictions. A chopping procedure is devised in order to produce a statistical ensemble of asset-price covariances from a single instance of financial data sets. Local results for the smallest eigenvalue and individual spacings are very stable upon reshuffling the time windows and assets. They are in good agreement with the universal Tracy-Widom distribution and Wigner surmise, respectively.
This suggests a strong degree of robustness especially in the low-lying sector of the spectra, most relevant for portfolio selections.
Conversely, the global spectral density of a single covariance matrix as well as the average over all unfolded nearest-neighbour spacing distributions deviate from standard Gaussian random matrix predictions. The data are in fair agreement with a recently introduced generalised random matrix model, with correlations showing a power-law decay.
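The chopping-and-comparison pipeline can be sketched as follows, using synthetic Gaussian returns in place of real financial data. The windowing, per-window covariance eigenvalues, and the Wigner surmise P(s) = (π/2) s exp(−πs²/4) follow the abstract; the matrix sizes and the crude mean-spacing unfolding are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a single long financial data set: N assets.
N, T, n_windows = 20, 50, 200
returns = rng.standard_normal((N, T * n_windows))

# "Chopping": slice the one series into disjoint time windows and form
# a sample covariance matrix per window, giving a statistical ensemble.
smallest, spacings = [], []
for w in range(n_windows):
    R = returns[:, w * T:(w + 1) * T]
    C = R @ R.T / T                      # sample covariance (zero mean)
    ev = np.linalg.eigvalsh(C)           # ascending eigenvalues
    smallest.append(ev[0])               # low-lying sector
    s = np.diff(ev)
    spacings.extend(s / s.mean())        # crude local unfolding

# Wigner surmise for nearest-neighbour spacings of real symmetric
# random matrices: P(s) = (pi/2) s exp(-pi s^2 / 4).
s_grid = np.linspace(0.0, 3.0, 50)
wigner = (np.pi / 2) * s_grid * np.exp(-np.pi * s_grid**2 / 4)

print(np.mean(smallest), np.mean(spacings))
```

With uncorrelated returns the spacing histogram tracks the surmise closely; the abstract's point is that real data agree locally but deviate globally, requiring the power-law-correlated model.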
Detecting Repetitions and Periodicities in Proteins by Tiling the Structural Space
The notion of energy landscapes provides conceptual tools for understanding
the complexities of protein folding and function. Energy Landscape Theory
indicates that it is much easier to find sequences that satisfy the "Principle
of Minimal Frustration" when the folded structure is symmetric (Wolynes, P. G.
Symmetry and the Energy Landscapes of Biomolecules. Proc. Natl. Acad. Sci.
U.S.A. 1996, 93, 14249-14255). Similarly, repeats and structural mosaics may be
fundamentally related to landscapes with multiple embedded funnels. Here we
present analytical tools to detect and compare structural repetitions in
protein molecules. By an exhaustive analysis of the distribution of structural
repeats using a robust metric we define those portions of a protein molecule
that best describe the overall structure as a tessellation of basic units. The
patterns produced by such tessellations provide intuitive representations of
the repeating regions and their association towards higher order arrangements.
We find that some protein architectures can be described as nearly periodic,
while in others clear separations between repetitions exist. Since the method
is independent of amino acid sequence information we can identify structural
units that can be encoded by a variety of distinct amino acid sequences.
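The kind of exhaustive, sequence-independent self-comparison such an analysis relies on can be sketched as below. This is a hedged toy version, not the authors' metric: every shifted fragment of a structure is superposed onto a reference fragment with the Kabsch algorithm, and for a periodic structure (here an ideal helix standing in for protein coordinates) the whole RMSD profile stays near zero.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two equal-length coordinate fragments after optimal
    superposition (Kabsch algorithm: centre, then best proper rotation)."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(U @ Vt))   # avoid improper reflections
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    return float(np.sqrt(np.mean(np.sum((P @ R - Q) ** 2, axis=1))))

# Illustrative periodic "structure": 100 points on an ideal helix.
n = 100
t = np.arange(n)
helix = np.stack([2.3 * np.cos(1.75 * t),
                  2.3 * np.sin(1.75 * t),
                  1.5 * t], axis=1)

L = 20  # fragment length
# Self-comparison: RMSD of the first fragment against every shifted copy.
profile = [kabsch_rmsd(helix[:L], helix[k:k + L]) for k in range(n - L)]
print(max(profile))
```

A helix has exact screw symmetry, so every shift superposes perfectly; for real proteins the minima of such a profile mark the repeating units that tile the structure.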
Understanding Zipf's law of word frequencies through sample-space collapse in sentence formation
The formation of sentences is a highly structured and history-dependent
process. The probability of using a specific word in a sentence strongly
depends on the 'history' of word-usage earlier in that sentence. We study a
simple history-dependent model of text generation assuming that the
sample-space of word usage reduces along sentence formation, on average. We
first show that the model explains the approximate Zipf law found in word
frequencies as a direct consequence of sample-space reduction. We then
empirically quantify the amount of sample-space reduction in the sentences of
ten famous English books, by analysis of corresponding word-transition tables
that capture which words can follow any given word in a text. We find a highly
nested structure in these transition tables and show that this 'nestedness' is
tightly related to the power law exponents of the observed word frequency
distributions. With the proposed model it is possible to understand that the
nestedness of a text can be the origin of the actual scaling exponent, and that
deviations from the exact Zipf law can be understood by variations of the
degree of nestedness on a book-by-book basis. On a theoretical level we are
able to show that in the case of weak nesting, Zipf's law breaks down in a fast
transition. Unlike previous attempts to understand Zipf's law in language, the
sample-space reducing model is not based on assumptions of multiplicative,
preferential, or self-organised critical mechanisms behind language formation,
but simply uses the empirically quantifiable parameter 'nestedness' to
understand the statistics of word frequencies.
Comment: 7 pages, 4 figures. Accepted for publication in the Journal of the Royal Society Interface
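The sample-space reducing process is simple enough to simulate directly. The sketch below (illustrative parameters, not the authors' code) uses the fully nested case: from state i the process jumps uniformly to one of the states 1..i−1 and restarts at N after reaching 1; the visit frequencies then fall off as 1/i, i.e. Zipf's law with exponent 1.

```python
import random
from collections import Counter

def ssr_visits(N, n_restarts, rng):
    """Sample-space reducing process: start at state N; from state i jump
    uniformly to a state in 1..i-1; restart at N after reaching state 1."""
    visits = Counter()
    for _ in range(n_restarts):
        i = N
        while i > 1:
            i = rng.randint(1, i - 1)   # sample space shrinks every step
            visits[i] += 1
    return visits

rng = random.Random(42)
visits = ssr_visits(N=1000, n_restarts=20_000, rng=rng)

# Zipf check: visit frequency of state i is proportional to 1/i,
# so state 1 should be visited about 10 times as often as state 10.
print(visits[1] / visits[10])
```

Weakening the nesting (allowing occasional jumps that do not reduce the sample space) is what drives the fast breakdown of Zipf's law mentioned above.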
Collective Phenomena and Non-Finite State Computation in a Human Social System
We investigate the computational structure of a paradigmatic example of
distributed social interaction: that of the open-source Wikipedia community. We
examine the statistical properties of its cooperative behavior, and perform
model selection to determine whether this aspect of the system can be described
by a finite-state process, or whether reference to an effectively unbounded
resource allows for a more parsimonious description. We find strong evidence,
in a majority of the most-edited pages, in favor of a collective-state model,
where the probability of a "revert" action declines as the square root of the
number of non-revert actions seen since the last revert. We provide evidence
that the emergence of this social counter is driven by collective interaction
effects, rather than properties of individual users.
Comment: 23 pages, 4 figures, 3 tables; to appear in PLoS ONE
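The collective-state model described above can be sketched as a toy simulation (a hedged illustration, not the authors' inference code): if the next edit is a revert with probability c/√k, where k counts non-revert actions since the last revert and c is an assumed constant, then measuring the empirical revert rate at each counter value recovers the square-root decline.

```python
import random
from collections import defaultdict

def simulate_edits(n_events, c, rng):
    """Toy collective-state model: revert probability c/sqrt(k) after k
    non-revert edits; the counter resets to 1 after each revert."""
    seen = defaultdict(int)       # how often counter value k occurred
    reverted = defaultdict(int)   # how often a revert followed value k
    k = 1
    for _ in range(n_events):
        seen[k] += 1
        if rng.random() < min(1.0, c / k**0.5):
            reverted[k] += 1
            k = 1                 # the social counter resets
        else:
            k += 1
    return seen, reverted

rng = random.Random(1)
seen, rev = simulate_edits(200_000, c=0.5, rng=rng)

# Empirical revert rate at k = 1, 4, 9 should scale as 1/sqrt(k):
for k in (1, 4, 9):
    print(k, rev[k] / seen[k])
```

The model-selection result in the abstract is precisely that this unbounded counter (a non-finite-state resource) describes the edit histories more parsimoniously than any finite-state process.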
Astrometric calibration and performance of the Dark Energy Camera
We characterize the ability of the Dark Energy Camera (DECam) to perform
relative astrometry across its 500 Mpix, 3 deg^2 science field of view, and
across 4 years of operation. This is done using internal comparisons of ~4x10^7
measurements of high-S/N stellar images obtained in repeat visits to fields of
moderate stellar density, with the telescope dithered to move the sources
around the array. An empirical astrometric model includes terms for: optical
distortions; stray electric fields in the CCD detectors; chromatic terms in the
instrumental and atmospheric optics; shifts in CCD relative positions of up to
~10 um when the DECam temperature cycles; and low-order distortions to each
exposure from changes in atmospheric refraction and telescope alignment. Errors
in this astrometric model are dominated by stochastic variations with typical
amplitudes of 10-30 mas (in a 30 s exposure) and 5-10 arcmin coherence length,
plausibly attributed to Kolmogorov-spectrum atmospheric turbulence. The size of
these atmospheric distortions is not closely related to the seeing. Given an
astrometric reference catalog at density ~0.7 arcmin^{-2}, e.g. from Gaia, the
typical atmospheric distortions can be interpolated to 7 mas RMS accuracy (for
30 s exposures) with 1 arcmin coherence length for residual errors. Remaining
detectable error contributors are 2-4 mas RMS from unmodelled stray electric
fields in the devices, and another 2-4 mas RMS from focal plane shifts between
camera thermal cycles. Thus the astrometric solution for a single DECam
exposure is accurate to 3-6 mas (0.02 pixels, or 300 nm) on the focal plane,
plus the stochastic atmospheric distortion.
Comment: Submitted to PASP
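The quoted 3-6 mas single-exposure accuracy is consistent with adding the two remaining independent error terms in quadrature; a minimal arithmetic check, using only the 2-4 mas ranges given in the text:

```python
import math

def quadrature(*terms):
    """Combine independent RMS error terms in quadrature."""
    return math.sqrt(sum(t * t for t in terms))

# Stray-field residuals (2-4 mas) plus thermal-cycle focal plane shifts
# (2-4 mas), combined at the best and worst ends of both ranges:
low = quadrature(2.0, 2.0)    # ~2.8 mas
high = quadrature(4.0, 4.0)   # ~5.7 mas
print(low, high)
```

Rounding the 2.8-5.7 mas range gives the stated 3-6 mas, on top of which the stochastic atmospheric term (10-30 mas raw, ~7 mas after interpolation against a reference catalog) still dominates.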