Long-Range Correlations in Sentence Series from <i>A Story of the Stone</i>
<div><p>A sentence is the natural unit of language. Patterns embedded in series of sentences can be used to model the formation and evolution of languages and to solve practical problems such as evaluating linguistic ability. In this paper, we apply detrended fluctuation analysis to detect long-range correlations in sentence series from <i>A Story of the Stone</i>, one of the greatest masterpieces of Chinese literature. We identified a weak long-range correlation, with a Hurst exponent of 0.575±0.002 up to a scale of 10<sup>4</sup>. We used a structural stability test to confirm the long-range correlation and found that different parts of the series have almost identical Hurst exponents. We also found that noisy records can lead to false results and conclusions, even if the noise covers only a small proportion of the total records (e.g., less than 1%). The structural stability test is therefore an essential step for confirming the existence of long-range correlations, yet it has been widely neglected in previous studies. Furthermore, a combination of detrended fluctuation analysis and diffusion entropy analysis demonstrated that the sentence series was generated by a fractional Brownian motion.</p></div>
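A minimal sketch of first-order detrended fluctuation analysis (DFA-1: linear detrending in non-overlapping boxes), written in Python with NumPy. The function name `dfa_hurst`, the box sizes, and the detrending order are illustrative assumptions, not the authors' settings. For a stationary, noise-like input such as a sentence-length series, the slope of log F(s) against log s estimates the Hurst exponent H: H ≈ 0.5 means no correlation, and H > 0.5 means persistent long-range correlation.

```python
import numpy as np


def dfa_hurst(x, scales, order=1):
    """Estimate the DFA scaling exponent of a 1-D series.

    x      : sequence, e.g. sentence lengths in reading order
    scales : box sizes s (each comfortably larger than order + 1)
    order  : degree of the polynomial trend removed in each box (1 = DFA-1)
    """
    x = np.asarray(x, dtype=float)
    # Step 1: the profile is the cumulative sum of the mean-removed series
    profile = np.cumsum(x - x.mean())
    fluct = []
    for s in scales:
        n_boxes = len(profile) // s
        # Step 2: cut the profile into non-overlapping boxes of length s
        boxes = profile[: n_boxes * s].reshape(n_boxes, s)
        t = np.arange(s)
        residuals = []
        for box in boxes:
            # Step 3: fit and remove the local polynomial trend in each box
            coeffs = np.polyfit(t, box, order)
            residuals.append(np.mean((box - np.polyval(coeffs, t)) ** 2))
        # Step 4: fluctuation function F(s), the root mean square over all boxes
        fluct.append(np.sqrt(np.mean(residuals)))
    # Step 5: the exponent is the slope of log F(s) versus log s
    slope, _ = np.polyfit(np.log(scales), np.log(fluct), 1)
    return slope
```

With uncorrelated Gaussian noise the slope should come out near 0.5, and with its cumulative sum (a random walk) near 1.5; an estimate such as 0.575 therefore points to weak persistence.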
Long-range correlations in the noisy and cleaned series.
<p>Noisy records can result in incorrect estimates of the Hurst exponent (solid circles), and consequently in false conclusions, even if they cover only a small proportion of the total series. The effect of the noise can be removed with a cleaning procedure (open circles). The X-part and E-part of the text comprise chapters 1 to 80 and chapters 81 to 120, which are currently attributed to Xueqin Cao and E Gao, respectively.</p>
State transfer networks for fGm series.
<p>Segment length is selected to be <i>s</i> = 5. Subplots (a1)-(f1) are the original state transfer networks for the fGm series with <i>H</i> = 0.5, 0.6, 0.65, 0.7, 0.75, and 0.8, respectively. Nodes with self-links are marked in red. The label <i>x</i>(<i>y</i>) means that the state first occurs at position <i>x</i> along the time series (the <i>x</i>th segment) and that its identifier is <i>y</i>. (a2)-(f2) Strong state transfer networks, constructed by filtering out weak links (weights less than 25) from the original state transfer networks. (a3)-(f3) Shuffled networks: each original fGm series is shuffled, and a shuffled network is constructed from the resulting series. Each displayed shuffled network is an average over 1000 realizations, with weak links also filtered out. In all networks except (a1)-(f1), node size indicates the occurrence degree of the state, and edge width indicates the link weight.</p>
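The caption does not spell out how a segment is mapped to a state. Because later captions refer to states as visibility graphs, the sketch below assumes that each state is the natural visibility graph of a non-overlapping segment of length s, and that a directed link joins the states of consecutive segments. The function names are hypothetical; only the weak-link threshold of 25 and the 1000 shuffled realizations come from the caption.

```python
import random
from collections import Counter


def visibility_edges(segment):
    """Natural visibility graph of one segment, as a frozenset of index pairs (a, b)."""
    edges = set()
    n = len(segment)
    for a in range(n):
        for b in range(a + 1, n):
            # a and b are linked if every point between them lies strictly
            # below the straight line joining them
            if all(
                segment[c] < segment[b] + (segment[a] - segment[b]) * (b - c) / (b - a)
                for c in range(a + 1, b)
            ):
                edges.add((a, b))
    return frozenset(edges)


def state_transfer_network(series, s, min_weight=0):
    """Build a state transfer network from non-overlapping segments of length s.

    Returns (labels, occurrence, links):
      labels     : state -> (x, y), first segment position x (1-based) and identifier y
      occurrence : state -> number of segments in that state (node size)
      links      : (state_from, state_to) -> weight, keeping only weights >= min_weight
    """
    states = [visibility_edges(series[k * s:(k + 1) * s]) for k in range(len(series) // s)]
    labels = {}
    for k, state in enumerate(states):
        # Identifiers are assigned in order of first appearance, giving the x(y) labels
        labels.setdefault(state, (k + 1, len(labels) + 1))
    occurrence = Counter(states)
    weights = Counter(zip(states, states[1:]))  # a self-link is the pair (state, state)
    links = {pair: w for pair, w in weights.items() if w >= min_weight}
    return labels, occurrence, links


def shuffled_links(series, s, realizations=1000, min_weight=0, seed=0):
    """Link weights averaged over shuffled copies of the series, then filtered."""
    rng = random.Random(seed)
    total = Counter()
    for _ in range(realizations):
        shuffled = list(series)
        rng.shuffle(shuffled)
        total.update(state_transfer_network(shuffled, s)[2])
    averaged = {pair: w / realizations for pair, w in total.items()}
    return {pair: w for pair, w in averaged.items() if w >= min_weight}
```

Under these assumptions, the strong networks correspond to `state_transfer_network(series, 5, min_weight=25)` and the shuffled networks to `shuffled_links(series, 5, realizations=1000, min_weight=25)`. Links are keyed by the states themselves rather than by identifiers, so weights from different shuffled realizations can be averaged. Generating the fGm input is outside this sketch.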
Statistics of sentence lengths.
<p>(a) Sentence length series. <i>Inset</i>: some sentences with large lengths are clustered together. This cluster arises because the 30th chapter uses the traditional Chinese full stop symbol (a small open circle), which produced noisy data. (b) Sentence lengths follow a right-skewed distribution; (c) this distribution is log-normal.</p>
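Panel (c) identifies the sentence-length distribution as log-normal. A minimal sketch of a maximum-likelihood log-normal fit using only the Python standard library: the fitted parameters are the mean and the population (1/N) standard deviation of the log lengths. The caption does not say how the authors fitted the distribution, and the function names are hypothetical.

```python
import math
import statistics


def fit_lognormal(lengths):
    """Maximum-likelihood log-normal fit; returns (mu, sigma) of ln(length)."""
    logs = [math.log(n) for n in lengths]  # sentence lengths must be positive
    mu = statistics.fmean(logs)
    # The MLE of sigma uses the population standard deviation, not the sample one
    sigma = statistics.pstdev(logs, mu)
    return mu, sigma


def lognormal_pdf(x, mu, sigma):
    """Log-normal probability density at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))
```

Overlaying `lognormal_pdf` on a histogram of the lengths, or plotting the histogram of `ln(length)` against a normal curve, gives the kind of comparison shown in panel (c).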
Structural stability test.
<p>The total polluted (noisy) series was divided into 12 non-overlapping segments, each of length 2881. (a) All curves obey almost perfect power-law relationships (gray open circles), except for the 3rd segment, which deviates significantly (red solid circles). (b) Hurst exponents for the 12 segments. The unreasonably large value of 0.87 for the noisy 3rd segment was corrected to 0.621 (red solid circle) after the cleaning procedure was applied. The exponents show a slight decreasing trend.</p>
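The structural stability test amounts to estimating H separately on equal, non-overlapping segments and checking whether any segment stands out. The sketch below takes the Hurst estimator as a parameter (for example, a DFA routine). The function names and the median/MAD outlier rule with its 3.5 threshold are assumptions added for illustration, not the criterion used in the figure.

```python
import statistics
from typing import Callable, List, Sequence


def segment_exponents(series: Sequence[float], n_segments: int,
                      estimator: Callable[[Sequence[float]], float]) -> List[float]:
    """Estimate an exponent on each of n_segments equal, non-overlapping segments.

    Trailing points that do not fill a whole segment are dropped.
    """
    length = len(series) // n_segments
    return [estimator(series[k * length:(k + 1) * length]) for k in range(n_segments)]


def flag_outliers(exponents: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Indices of segments whose robust z-score (median/MAD) exceeds threshold."""
    med = statistics.median(exponents)
    mad = statistics.median(abs(h - med) for h in exponents)
    if mad == 0:
        # Degenerate case: any value differing from the median is flagged
        return [k for k, h in enumerate(exponents) if h != med]
    # 0.6745 rescales the MAD to the standard deviation of a normal distribution
    return [k for k, h in enumerate(exponents) if 0.6745 * abs(h - med) / mad > threshold]
```

With 12 segments of length 2881, as in the caption, the segments cover 12 × 2881 = 34572 sentences. A flagged segment is a candidate for inspection and cleaning, as happened with the 3rd segment here.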
Diffusion entropy analysis of the cleaned total, cleaned X-part, and E-part series.
<p>The estimated scaling exponents of the three series are almost identical.</p>
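Diffusion entropy analysis (DEA) turns the series into diffusion trajectories by summing it over overlapping windows of length t, and fits the Shannon entropy of the displacement distribution as S(t) = A + δ ln t. In the DEA literature, δ coincides with the DFA exponent H for fractional Brownian motion but differs from it for Lévy-type processes, which is why comparing the two estimates can help identify the generating process. A minimal sketch, assuming a fixed-bin histogram entropy estimate; the bin count, window lengths, and function names are assumptions.

```python
import numpy as np


def diffusion_entropy(x, windows, bins=50):
    """Shannon entropy S(t) of the diffusion displacement for each window length t."""
    x = np.asarray(x, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    entropies = []
    for t in windows:
        # Overlapping windows: displacement for start k is the sum of x[k:k+t]
        disp = csum[t:] - csum[:-t]
        counts, edges = np.histogram(disp, bins=bins)
        width = edges[1] - edges[0]
        p = counts[counts > 0] / counts.sum()
        # Histogram estimate of the differential entropy: -sum p ln(p / width)
        entropies.append(-np.sum(p * np.log(p / width)))
    return np.array(entropies)


def dea_delta(x, windows, bins=50):
    """Scaling exponent delta from the fit S(t) = A + delta * ln t."""
    s = diffusion_entropy(x, windows, bins)
    delta, _ = np.polyfit(np.log(windows), s, 1)
    return delta
```

For uncorrelated Gaussian noise, δ should come out near 0.5, matching the DFA value for the same input; agreement between δ and H is the signature the caption relies on.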
All the states that turn out to be motifs in the fGm series with <i>H</i> = 0.6, 0.65, 0.7, 0.75, and 0.8 (the total number of occurring states is 132).
<p>Segment length is selected to be <i>s</i> = 6.</p>
Degree, degree ratio, and persistent behaviors of motifs for fGm series.
<p>Segment length is selected to be <i>s</i> = 5. (a1)-(e1) show the occurrence degrees of the states in the original and shuffled fGm series with <i>H</i> = 0.6, 0.65, 0.7, 0.75, and 0.8, respectively; (a2)-(e2) present the degree ratios for all the states (visibility graphs) in the series with <i>H</i> = 0.6, 0.65, 0.7, 0.75, and 0.8, respectively; (a3)-(e3) show <i>R</i>/<i>S</i> versus <i>n</i> for the occurrence-position series of the motifs, which reveal persistent behavior in the motifs’ occurrence along the series.</p>
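Panels (a3)-(e3) apply rescaled-range (R/S) analysis to the positions at which each motif occurs. The caption does not say exactly which series is analyzed, so the sketch below assumes it is the sequence of intervals between successive occurrences of a motif; `occurrence_intervals`, `rescaled_range`, and `rs_hurst` are hypothetical names. A slope of log(R/S) versus log n above 0.5 indicates persistent behavior.

```python
import numpy as np


def occurrence_intervals(state_sequence, motif):
    """Intervals between successive positions at which `motif` occurs (assumed input)."""
    positions = [k for k, state in enumerate(state_sequence) if state == motif]
    return np.diff(positions)


def rescaled_range(x, n):
    """Average R/S over non-overlapping windows of length n."""
    x = np.asarray(x, dtype=float)
    values = []
    for k in range(len(x) // n):
        w = x[k * n:(k + 1) * n]
        y = np.cumsum(w - w.mean())  # cumulative deviations from the window mean
        r = y.max() - y.min()        # range R
        s = w.std()                  # standard deviation S
        if s > 0:
            values.append(r / s)
    return float(np.mean(values))


def rs_hurst(x, ns):
    """Hurst exponent as the slope of log(R/S) versus log n."""
    rs = [rescaled_range(x, n) for n in ns]
    slope, _ = np.polyfit(np.log(ns), np.log(rs), 1)
    return slope
```

Classical R/S is biased upward for short windows, so uncorrelated input typically yields a slope somewhat above 0.5; comparing against shuffled input is a simple way to calibrate what counts as persistence.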
Degree, degree ratio, and persistent behaviors of motifs for fGm series.
<p>Segment length is selected to be <i>s</i> = 6. (a1)-(e1) show the occurrence degrees of the states in the original and shuffled fGm series with <i>H</i> = 0.6, 0.65, 0.7, 0.75, and 0.8, respectively; (a2)-(e2) present the degree ratios for all the states (visibility graphs) in the series with <i>H</i> = 0.6, 0.65, 0.7, 0.75, and 0.8, respectively; (a3)-(e3) show <i>R</i>/<i>S</i> versus <i>n</i> for the occurrence-position series of the motifs, which reveal persistent behavior in the motifs’ occurrence along the series.</p>
All the states that turn out to be motifs in the stock market index series.
<p>Segment length is selected to be <i>s</i> = 6.</p>
- …