Entropy and long-range correlations in literary English
Recently, long-range correlations have been detected in nucleotide sequences and in
human writings by several authors. We undertake here a systematic investigation
of two books, Moby Dick by H. Melville and Grimm's tales, with respect to the
existence of long-range correlations. The analysis is based on the calculation
of entropy-like quantities, such as the mutual information for pairs of letters and
the entropy (the mean uncertainty) per letter. We further estimate the number
of different subwords of a given length $n$. Filtering out the contributions
due to the effects of the finite length of the texts, we find correlations
ranging over a few hundred letters. Scaling laws were found for the mutual information
(power-law decay), for the entropy per letter (decay with the inverse
square root of $n$) and for the number of subwords (stretched exponential growth with
$n$ and power-law growth with the text length). Comment: 8 pages
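A short script can estimate the mutual information between letters separated by $k$ positions and the number of distinct subwords of length $n$. This is a minimal sketch, not the authors' code. The helper names, the reduced alphabet (lowercase letters plus a single space), and the plain plug-in estimator without finite-length corrections are all assumptions.

```python
import math
import re
from collections import Counter


def normalize(text):
    """Reduce text to lowercase letters separated by single spaces (assumed alphabet)."""
    return " ".join(re.sub(r"[^a-z]+", " ", text.lower()).split())


def mutual_information(text, k):
    """Plug-in estimate, in bits, of the mutual information I(k) of letters k apart."""
    pairs = Counter(zip(text, text[k:]))
    total = sum(pairs.values())
    left, right = Counter(), Counter()
    for (a, b), c in pairs.items():
        left[a] += c   # marginal of the first letter of each pair
        right[b] += c  # marginal of the second letter of each pair
    info = 0.0
    for (a, b), c in pairs.items():
        p_ab = c / total
        # p_ab / (p_a * p_b) rewritten in terms of raw counts
        info += p_ab * math.log2(c * total / (left[a] * right[b]))
    return info


def count_subwords(text, n):
    """Number of distinct subwords (overlapping n-grams) of length n in text."""
    return len({text[i:i + n] for i in range(len(text) - n + 1)})


if __name__ == "__main__":
    # In practice `text` would be a whole normalized book; this string only exercises the code.
    text = normalize("Call me Ishmael. Some years ago, never mind how long precisely, "
                     "having little or no money in my purse")
    for k in (1, 2, 5, 10):
        print(k, round(mutual_information(text, k), 3))
    print(count_subwords(text, 3))
```

The marginals are taken from the pairs themselves, so the plug-in estimate is never negative. On short texts it is strongly biased upward, which is why finite-length effects must be filtered out before reading off a decay law.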
Spreading and shortest paths in systems with sparse long-range connections
Spreading according to simple rules (e.g. of fire or diseases) and
shortest-path distances are studied on $d$-dimensional systems with a small
density $p$ per site of long-range connections ("small-world" lattices). The
volume $V(t)$ covered by the spreading quantity on an infinite system is
calculated exactly in all dimensions. We find that $V(t)$ grows initially as $t^d/d$
for $t \ll t^* = (2p \Gamma_d (d-1)!)^{-1/d}$ and as $\exp(t/t^*)$ for $t \gg t^*$,
generalizing a previous result in one dimension. Using the properties of $V(t)$,
the average shortest-path distance $\ell(r)$ can be calculated as a function of the
Euclidean distance $r$. It is found that
$\ell(r) = r$ for $r < r_c = (2p \Gamma_d (d-1)!)^{-1/d} \log(2p \Gamma_d L^d)$, and
$\ell(r) = r_c$ for $r > r_c$.
The characteristic length $r_c$, which governs the behavior of shortest-path
lengths, diverges with system size for all $p > 0$. Therefore the mean separation
$s \sim p^{-1/d}$ between shortcut ends is not a relevant internal length scale for
shortest-path lengths. We note, however, that the globally averaged
shortest-path length, divided by $L$, is a function of $L/s$ only. Comment: 4 pages, 1 eps fig. Uses psfig
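The covered volume $V(t)$ and the mean shortest-path distance $\ell(r)$ can be explored numerically with breadth-first search. The sketch below assumes a one-dimensional ring of $L$ sites with about $pL$ randomly placed shortcuts. The construction and function names are illustrative assumptions, and a finite ring only approximates the infinite system used for the exact calculation of $V(t)$.

```python
import random
from collections import deque


def small_world_ring(size, p, seed=0):
    """Ring of `size` sites plus about p*size random shortcuts (assumed construction)."""
    rng = random.Random(seed)
    adj = [{(i - 1) % size, (i + 1) % size} for i in range(size)]
    for _ in range(round(p * size)):
        a, b = rng.randrange(size), rng.randrange(size)
        if a != b:  # skip self-loops
            adj[a].add(b)
            adj[b].add(a)
    return adj


def bfs_distances(adj, source):
    """Shortest-path (graph) distance from source to every site, by breadth-first search."""
    dist = [-1] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def covered_volume(dist, t_max):
    """V(t) for t = 0..t_max: number of sites reached by the spreading within t steps."""
    counts = [0] * (t_max + 1)
    for d in dist:
        if 0 <= d <= t_max:
            counts[d] += 1
    volume, total = [], 0
    for c in counts:
        total += c
        volume.append(total)
    return volume


def mean_shortest_path_vs_r(adj, source=0):
    """ell(r): mean graph distance to sites at ring (Euclidean) distance r from source."""
    size = len(adj)
    dist = bfs_distances(adj, source)
    sums, counts = {}, {}
    for j, d in enumerate(dist):
        r = min(abs(j - source), size - abs(j - source))
        sums[r] = sums.get(r, 0) + d
        counts[r] = counts.get(r, 0) + 1
    return {r: sums[r] / counts[r] for r in sorted(sums)}


if __name__ == "__main__":
    adj = small_world_ring(1000, 0.01, seed=1)
    print(covered_volume(bfs_distances(adj, 0), 20))
    ell = mean_shortest_path_vs_r(adj)
    print({r: round(ell[r], 1) for r in (10, 50, 100, 200, 400)})
```

With $p = 0$ the ring gives $V(t) = 2t + 1$ and $\ell(r) = r$ exactly. Adding shortcuts makes $\ell(r)$ level off at large $r$.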
Bias Analysis in Entropy Estimation
We consider the problem of finite sample corrections for entropy estimation.
New estimates of the Shannon entropy are proposed and their systematic error
(the bias) is computed analytically. We find that our results cover correction
formulas of current entropy estimates recently discussed in the literature. The
trade-off between bias reduction and the corresponding increase in
statistical error is analyzed. Comment: 5 pages, 3 figures
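A small simulation shows the bias of the naive (plug-in) entropy estimate. The abstract does not give the new estimators, so the sketch below swaps in the classical Miller-Madow correction, $\hat H_{MM} = \hat H_{\text{plug-in}} + (K_{\text{obs}} - 1)/(2M)$, where $K_{\text{obs}}$ is the number of occupied bins and $M$ the sample size. It serves only as an illustration of a first-order bias correction. Entropies are in nats and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def plugin_entropy(counts):
    """Naive (maximum-likelihood) Shannon entropy in nats from bin counts."""
    m = sum(counts)
    return -sum(c / m * math.log(c / m) for c in counts if c > 0)


def miller_madow_entropy(counts):
    """Plug-in estimate plus the first-order Miller-Madow bias correction."""
    m = sum(counts)
    observed_bins = sum(1 for c in counts if c > 0)
    return plugin_entropy(counts) + (observed_bins - 1) / (2 * m)


def average_estimates(n_bins, m, trials=2000, seed=0):
    """Average both estimators over samples of size m from a uniform distribution."""
    rng = random.Random(seed)
    plug, mm = 0.0, 0.0
    for _ in range(trials):
        counts = list(Counter(rng.randrange(n_bins) for _ in range(m)).values())
        plug += plugin_entropy(counts)
        mm += miller_madow_entropy(counts)
    return plug / trials, mm / trials


if __name__ == "__main__":
    plug, mm = average_estimates(16, 32)
    print(f"true {math.log(16):.3f}  plug-in {plug:.3f}  Miller-Madow {mm:.3f}")
```

The correction removes most of the downward bias but adds its own variance. This is the kind of bias-versus-statistical-error trade-off the abstract refers to.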
Statistics of finite-time Lyapunov exponents in the Ulam map
The statistical properties of finite-time Lyapunov exponents at the Ulam
point of the logistic map are investigated. The exact analytical expression for
the autocorrelation function of one-step Lyapunov exponents is obtained,
allowing the calculation of the variance of exponents computed over time
intervals of length $n$. The variance decays anomalously with $n$. The
probability density of finite-time exponents deviates noticeably from the
Gaussian shape, decaying with exponential tails and presenting spikes
that narrow and accumulate close to the mean value with increasing $n$. The
asymptotic expression for this probability distribution function is derived. It
provides an adequate smooth approximation to describe numerical histograms
built for not too small $n$, where the finite bin size trims the sharp
peaks. Comment: 6 pages, 4 figures, to appear in Phys. Rev.
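The finite-time exponents themselves are easy to sample. At the Ulam point the logistic map is $f(x) = 4x(1-x)$ and the one-step exponent is $\ln|f'(x)| = \ln|4 - 8x|$. The finite-time exponent over $n$ steps is the average of $n$ one-step exponents along an orbit. The sketch below draws initial conditions from the map's invariant (arcsine) density, so the mean is $\ln 2$. The sample sizes and function names are illustrative assumptions, and the code estimates the variance numerically rather than reproducing the paper's analytical result.

```python
import math
import random
import statistics


def sample_invariant(rng):
    """Draw x from the arcsine invariant density of x -> 4x(1-x)."""
    return math.sin(0.5 * math.pi * rng.random()) ** 2


def finite_time_exponent(x0, n):
    """Average of the one-step exponents ln|4 - 8x| along an n-step orbit from x0."""
    x, total = x0, 0.0
    for _ in range(n):
        total += math.log(max(abs(4.0 - 8.0 * x), 1e-300))  # guard against x == 0.5
        x = 4.0 * x * (1.0 - x)
    return total / n


def exponent_statistics(n, samples=20000, seed=1):
    """Sample mean and variance of n-step exponents over invariant-density initial conditions."""
    rng = random.Random(seed)
    values = [finite_time_exponent(sample_invariant(rng), n) for _ in range(samples)]
    return statistics.fmean(values), statistics.pvariance(values)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        mean, var = exponent_statistics(n)
        print(f"n={n:2d}  mean={mean:.4f}  variance={var:.5f}")
```

For $n = 1$, the sample mean and variance should be close to the exact values $\ln 2$ and $\pi^2/12$ under the invariant density. Tabulating the variance against $n$ is one way to look for the anomalous decay mentioned in the abstract. A histogram of `values` for moderate $n$ is where the spikes would appear.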
Statistical analysis of the DNA sequence of human chromosome 22
We study statistical patterns in the DNA sequence of human chromosome 22, the first completely sequenced human chromosome. We find that (i) the $33.4 \times 10^6$ nucleotide long human chromosome exhibits long-range power-law correlations over more than four orders of magnitude, (ii) the entropies $H_n$ of the frequency distribution of oligonucleotides of length $n$ ($n$-mers) grow sublinearly with increasing $n$, indicating the presence of higher-order correlations for all of the studied lengths $1 \le n \le 10$, and (iii) the generalized entropies $H_n(q)$ of $n$-mers decrease monotonically with increasing $q$, and the decay of $H_n(q)$ with $q$ becomes steeper with increasing $n \le 10$, indicating that the frequency distribution of oligonucleotides becomes increasingly nonuniform as the length $n$ increases. We investigate to what degree known biological features may explain the observed statistical patterns. We find that (iv) the presence of interspersed repeats may cause the sublinear increase of $H_n$ with $n$, and that (v) the presence of monomeric tandem repeats as well as the suppression of CG dinucleotides may cause the observed decay of $H_n(q)$ with $q$.
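The block entropies $H_n$ and generalized entropies $H_n(q)$ can be computed from overlapping $n$-mer counts. The sketch below uses the Rényi form $H_n(q) = \frac{1}{1-q}\log_2 \sum_i p_i^q$, which reduces to the Shannon entropy as $q \to 1$. The abstract does not state the exact definition or logarithm base used, so both are assumptions here, as are the function names and the toy sequence.

```python
import math
from collections import Counter


def nmer_probabilities(seq, n):
    """Relative frequencies of overlapping n-mers in seq."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def renyi_entropy(probs, q):
    """Generalized (Renyi) entropy in bits; q = 1 gives the Shannon entropy."""
    if q == 1:
        return -sum(p * math.log2(p) for p in probs if p > 0)
    return math.log2(sum(p ** q for p in probs if p > 0)) / (1 - q)


def block_entropy(seq, n, q=1):
    """H_n(q) of the n-mer frequency distribution of seq."""
    return renyi_entropy(nmer_probabilities(seq, n), q)


if __name__ == "__main__":
    seq = "ACGTTGCAACGTAGCTAGCTTTAAACCCGGGTACG" * 20  # hypothetical toy sequence
    for n in (1, 2, 3, 4):
        print(n, [round(block_entropy(seq, n, q), 3) for q in (0, 1, 2, 4)])
```

For a sequence of independent, identically distributed symbols, $H_n = nH_1$ grows linearly with $n$. Sublinear growth therefore signals correlations. For a fixed $n$, $H_n(q)$ is non-increasing in $q$, and a steeper decay indicates a less uniform $n$-mer distribution.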
Guessing probability distributions from small samples
We propose a new method for calculating the statistical properties, such as
the entropy, of unknown generators of symbolic sequences. The probability
distribution of the elements of a population can be approximated by
the frequencies of a sample, provided the sample is long enough that
each element occurs many times. Our method yields an approximation when this
precondition does not hold. For a given sample, we recalculate the Zipf-ordered
probability distribution by optimizing the parameters of a guessed
distribution. We demonstrate that our method yields reliable results. Comment: 10 pages, uuencoded compressed PostScript
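One way to picture the idea is to fit a guessed family to the rank-ordered counts of a small sample. The sketch below assumes a Zipf family $p(i) \propto i^{-a}$ over a known number $N$ of element types. It chooses $a$ from a grid by matching the observed rank-ordered counts to their Monte Carlo expectation. The family, the squared-error criterion, the grid search, and all names are assumptions, not the optimization described in the paper.

```python
import random


def zipf_probs(n_types, a):
    """Guessed family (assumption): p(i) proportional to i**(-a), i = 1..n_types."""
    weights = [(i + 1) ** (-a) for i in range(n_types)]
    z = sum(weights)
    return [w / z for w in weights]


def rank_ordered_counts(sample, n_types):
    """Counts of element types 0..n_types-1 in the sample, sorted in decreasing order."""
    counts = [0] * n_types
    for s in sample:
        counts[s] += 1
    return sorted(counts, reverse=True)


def expected_rank_ordered(probs, m, trials, rng):
    """Monte Carlo mean of the rank-ordered counts of samples of size m drawn from probs."""
    n_types = len(probs)
    acc = [0.0] * n_types
    for _ in range(trials):
        sample = rng.choices(range(n_types), weights=probs, k=m)
        for i, c in enumerate(rank_ordered_counts(sample, n_types)):
            acc[i] += c
    return [v / trials for v in acc]


def guess_zipf_exponent(observed, n_types, grid, trials=200, seed=0):
    """Grid exponent whose expected rank-ordered counts best match the observed ones."""
    rng = random.Random(seed)
    m = sum(observed)
    # Pad with zeros for element types that never occurred in the sample.
    target = sorted(observed, reverse=True) + [0] * (n_types - len(observed))
    best_a, best_err = None, float("inf")
    for a in grid:
        expected = expected_rank_ordered(zipf_probs(n_types, a), m, trials, rng)
        err = sum((e - o) ** 2 for e, o in zip(expected, target))
        if err < best_err:
            best_a, best_err = a, err
    return best_a


if __name__ == "__main__":
    rng = random.Random(7)
    sample = rng.choices(range(30), weights=zipf_probs(30, 1.0), k=50)  # small sample
    grid = [0.25 * i for i in range(11)]
    print(guess_zipf_exponent(rank_ordered_counts(sample, 30), 30, grid))
```

Matching the *expected* rank-ordered counts, instead of the raw frequencies, accounts for the fact that small samples leave many types unobserved and inflate the top ranks.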
Finite-sample frequency distributions originating from an equiprobability distribution
Given an equidistribution of probabilities $p(i) = 1/N$, $i = 1, \dots, N$, what is the
expected corresponding rank-ordered frequency distribution $f(i)$, $i = 1, \dots, N$, if an
ensemble of $M$ events is drawn? Comment: 4 pages, 4 figures
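The question can be explored by simulation before any analytical treatment: draw $M$ events from $N$ equiprobable outcomes, sort the counts in decreasing order, and average over many repetitions. The Monte Carlo sketch below is an illustrative assumption, not the paper's derivation. It returns the averaged rank-ordered relative frequencies $f(i)$, with Python index 0 corresponding to rank 1.

```python
import random


def expected_rank_frequencies(n_outcomes, m_events, trials=1000, seed=0):
    """Monte Carlo estimate of the mean rank-ordered relative frequencies f(1) >= ... >= f(N)."""
    rng = random.Random(seed)
    acc = [0.0] * n_outcomes
    for _ in range(trials):
        counts = [0] * n_outcomes
        for _ in range(m_events):
            counts[rng.randrange(n_outcomes)] += 1  # each outcome has probability 1/N
        counts.sort(reverse=True)  # rank ordering
        for i, c in enumerate(counts):
            acc[i] += c
    return [v / (trials * m_events) for v in acc]


if __name__ == "__main__":
    f = expected_rank_frequencies(10, 20)
    print([round(v, 3) for v in f])  # compare with the flat value 1/N = 0.1
```

Every outcome has probability $1/N$, yet the rank-ordered frequencies are not flat for finite $M$. The top ranks lie above $1/N$ and the bottom ranks below it.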
Are we overestimating the number of cell-cycling genes? The impact of background models for time series data.
Periodic processes play fundamental roles in organisms. Prominent
examples are the cell cycle and the circadian clock. Microarray technology
has enabled us to screen complete sets of transcripts for possible association with
such fundamental periodic processes on a system-wide level. Frequently, quite a
large number of genes have been detected as periodically expressed. However, the
small overlap of identified genes between different studies has cast considerable
doubt on the reliability of the detected periodic expression. In this study, we
show that a major reason for the lack of agreement is the use of an inadequate
background model for the determination of significance. We demonstrate that the
choice of background model has considerable impact on the statistical significance
of periodic expression. For illustration, we reanalyzed two microarray studies of
the yeast cell cycle. Our evaluation strongly indicates that the results of previous
analyses might have been overoptimistic and that the use of more suitable
background models promises to give more realistic results.
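A surrogate-based significance test illustrates the effect of the background model. The sketch below scores periodicity with Fisher's $g$ statistic (the largest periodogram ordinate divided by their sum). It compares two hypothetical null models: random shuffling of the time points, which assumes uncorrelated noise, and first-order autoregressive (AR(1)) surrogates that preserve the lag-1 autocorrelation. The statistic, both null models, and the function names are assumptions for illustration and are not taken from the study.

```python
import cmath
import math
import random


def fisher_g(x):
    """Fisher's g: largest periodogram ordinate divided by the sum of all ordinates."""
    n = len(x)
    mean = sum(x) / n
    y = [v - mean for v in x]
    powers = []
    for k in range(1, (n - 1) // 2 + 1):  # Fourier frequencies k/n, excluding 0 and Nyquist
        s = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(y))
        powers.append(abs(s) ** 2)
    total = sum(powers)
    return max(powers) / total if total > 0 else 0.0


def lag1_autocorrelation(x):
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den if den > 0 else 0.0


def ar1_surrogate(n, phi, rng):
    """Stationary AR(1) series x[t] = phi * x[t-1] + Gaussian noise."""
    x = [rng.gauss(0.0, 1.0) / math.sqrt(1.0 - phi * phi)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def periodicity_p_value(x, background, n_surrogates=500, seed=0):
    """Empirical p-value of fisher_g(x) under the chosen background model."""
    rng = random.Random(seed)
    if background == "permutation":
        def surrogate():
            return rng.sample(x, len(x))
    elif background == "ar1":
        phi = max(-0.95, min(0.95, lag1_autocorrelation(x)))
        def surrogate():
            return ar1_surrogate(len(x), phi, rng)
    else:
        raise ValueError(f"unknown background model: {background}")
    g_obs = fisher_g(x)
    hits = sum(fisher_g(surrogate()) >= g_obs for _ in range(n_surrogates))
    return (hits + 1) / (n_surrogates + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    walk = [0.0]  # hypothetical smooth, non-periodic profile: a random walk
    for _ in range(23):
        walk.append(walk[-1] + rng.gauss(0.0, 1.0))
    for model in ("permutation", "ar1"):
        print(model, periodicity_p_value(walk, model))
```

Shuffling destroys any autocorrelation, so a smooth but non-periodic profile can look more periodic than its shuffled versions. A permutation background therefore tends to give smaller p-values than a background that keeps the serial correlation. This is one concrete way in which the choice of background model can change which genes pass a significance cutoff.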
Search for markers of invasive growth in breast cancer: association with disease prognosis
In the present study, we analyzed the gene expression profiles of various morphological structures of breast cancer (GEO, GSE80754) to identify new markers of invasion and to assess their association with disease prognosis. Nine proteins (KIF14, DSC3, WAVE, etc.) were selected on the basis of a literature analysis of the involvement in cancer invasion of genes up- and down-regulated in solid and trabecular structures, and of the heterogeneity of expression of their proteins in breast tumors. The association of these proteins with patients' survival was assessed.
Transition to Stochastic Synchronization in Spatially Extended Systems
Spatially extended dynamical systems, namely coupled map lattices, driven by
additive spatio-temporal noise are shown to exhibit stochastic synchronization.
In analogy with low-dimensional systems, synchronization can be achieved only
if the maximum Lyapunov exponent becomes negative for sufficiently large noise
amplitude. Moreover, noise can also suppress the non-linear mechanism of
information propagation, which may be present in the spatially extended system.
A first type of phase transition is observed when both the linear and the
non-linear mechanisms of information production disappear at the same critical
value of the noise amplitude. The corresponding critical properties are
hard to identify numerically, but general arguments suggest that they
could be ascribed to the Kardar-Parisi-Zhang universality class. Conversely,
when the non-linear mechanism prevails over the linear one, another type of phase
transition to stochastic synchronization occurs. This one is shown to belong to
the universality class of directed percolation. Comment: 21 pages, LaTeX, 14 EPS figs, to appear in Physical Review
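A simple numerical check of stochastic synchronization drives two replicas of the same lattice, started from different states, with the same noise realization and monitors their distance. The sketch below assumes a diffusively coupled lattice of logistic maps $f(x) = 4x(1-x)$ with periodic boundaries, uniform additive noise of amplitude $\sigma$, and reinjection into $[0, 1)$ by taking the result modulo 1. The map, the coupling form, the reinjection rule, and all names are assumptions rather than the models studied in the paper.

```python
import random


def logistic(u):
    return 4.0 * u * (1.0 - u)


def cml_step(x, eps, noise):
    """One update of a diffusively coupled logistic lattice with additive noise (periodic b.c.)."""
    size = len(x)
    new = []
    for i in range(size):
        coupled = (1.0 - eps) * x[i] + 0.5 * eps * (x[i - 1] + x[(i + 1) % size])
        new.append((logistic(coupled) + noise[i]) % 1.0)  # reinject into [0, 1)
    return new


def circular_distance(a, b):
    """Distance on the unit circle, consistent with the modulo-1 reinjection."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def replica_distance(size, eps, sigma, steps, seed=0):
    """Mean site-wise distance between two replicas driven by the same noise."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(size)]
    y = [rng.random() for _ in range(size)]
    for _ in range(steps):
        noise = [sigma * rng.uniform(-1.0, 1.0) for _ in range(size)]  # shared by both replicas
        x = cml_step(x, eps, noise)
        y = cml_step(y, eps, noise)
    return sum(circular_distance(a, b) for a, b in zip(x, y)) / size


if __name__ == "__main__":
    for sigma in (0.0, 0.2, 0.4):
        print(sigma, round(replica_distance(64, 0.3, sigma, 500), 4))
```

A distance that decays to zero signals synchronization. Locating the transition and its universality class would require scanning $\sigma$ on much larger lattices and over longer runs.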
