Statistical methods for detecting periodic fragments in DNA sequence data
Background: Period 10 dinucleotides are structurally and functionally validated factors that influence the ability of DNA to form nucleosomes, histone core octamers. Robust identification of periodic signals in DNA sequences is therefore required to understand nucleosome organisation in genomes. While various techniques for identifying periodic components in genomic sequences have been proposed or adopted, the requirements for such techniques have not been considered in detail and confirmatory testing for a priori specified periods has not been developed.
Results: We compared the estimation accuracy and suitability for confirmatory testing of autocorrelation, discrete Fourier transform (DFT), integer period discrete Fourier transform (IPDFT) and a previously proposed Hybrid measure. A number of different statistical significance procedures were evaluated, but a blockwise bootstrap proved superior. When applied to synthetic data whose period-10 signal had been eroded, or for which the signal was approximately period-10, the Hybrid technique exhibited superior properties during exploratory period estimation. In contrast, confirmatory testing using the blockwise bootstrap procedure identified IPDFT as having the greatest statistical power. These properties were validated on yeast sequences defined from a ChIP-chip study, where the Hybrid metric confirmed the expected dominance of period-10 in nucleosome-associated DNA but IPDFT identified more significant occurrences of period-10. Application to the whole genomes of yeast and mouse identified ~21% and ~19%, respectively, of these genomes as spanned by period-10 nucleosome positioning sequences (NPS).
Conclusions: For estimating the dominant period, we find the Hybrid period estimation method empirically to be the most effective for both eroded and approximate periodicity. The blockwise bootstrap was found to be effective as a significance measure, performing particularly well in the problem of period detection in the presence of eroded periodicity. The autocorrelation method was identified as poorly suited for use with the blockwise bootstrap. Application of our methods to the genomes of two model organisms revealed that a striking proportion of the yeast and mouse genomes is spanned by NPS. Despite their markedly different sizes, roughly equivalent proportions (19-21%) of the genomes lie within period-10 spans of the NPS dinucleotides {AA, TT, TA}. The biological significance of these regions remains to be demonstrated. To facilitate this, the genomic coordinates are available as Additional files 1, 2, and 3 in a format suitable for visualisation as tracks on popular genome browsers.
Reviewers: This article was reviewed by Prof Tomas Radivoyevitch, Dr Vsevolod Makeev (nominated by Dr Mikhail Gelfand), and Dr Rob D Knight.
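The two tasks the abstract above distinguishes, exploratory period estimation and confirmatory testing of an a priori period, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that an integer-period Fourier measure can be approximated by evaluating the Fourier sum of a dinucleotide indicator sequence directly at frequency 1/p, and it uses a generic moving-block bootstrap as the significance procedure. The function names, the block length of 7, and the toy sequence are all hypothetical choices made for illustration.

```python
import cmath
import random

NPS_DINUCLEOTIDES = {"AA", "TT", "TA"}  # dinucleotide set named in the abstract


def indicator(seq, dinucs=NPS_DINUCLEOTIDES):
    """Return 1.0 at each position where a dinucleotide from `dinucs` starts."""
    return [1.0 if seq[i:i + 2] in dinucs else 0.0 for i in range(len(seq) - 1)]


def period_power(x, period):
    """Squared magnitude of the Fourier sum of mean-centred x at frequency 1/period."""
    mean = sum(x) / len(x)
    total = sum((v - mean) * cmath.exp(-2j * cmath.pi * n / period)
                for n, v in enumerate(x))
    return abs(total) ** 2


def dominant_period(x, periods=range(2, 21)):
    """Exploratory estimate: the candidate integer period with the most power."""
    return max(periods, key=lambda p: period_power(x, p))


def block_bootstrap_pvalue(x, period=10, block=7, n_boot=200, seed=0):
    """Confirmatory test: compare observed power with block-resampled copies.

    Assumption: a plain moving-block bootstrap; the paper's exact procedure
    may differ.
    """
    rng = random.Random(seed)
    observed = period_power(x, period)
    n_blocks = -(-len(x) // block)  # ceiling division
    starts = range(len(x) - block + 1)
    hits = 0
    for _ in range(n_boot):
        resampled = []
        for _ in range(n_blocks):
            s = rng.choice(starts)
            resampled.extend(x[s:s + block])
        if period_power(resampled[:len(x)], period) >= observed:
            hits += 1
    return (hits + 1) / (n_boot + 1)


# Hypothetical toy input: a run of A's recurring every 10 bases.
toy = indicator("AAAAACGCGC" * 30)
print(dominant_period(toy), block_bootstrap_pvalue(toy))
```

Resampling short blocks scrambles the phase of any long-range periodicity while keeping local composition, which is why the resampled power serves as a null reference in this sketch.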
Conflict and Computation on Wikipedia: a Finite-State Machine Analysis of Editor Interactions
What is the boundary between a vigorous argument and a breakdown of
relations? What drives a group of individuals across it? Taking Wikipedia as a
test case, we use a hidden Markov model to approximate the computational
structure and social grammar of more than a decade of cooperation and conflict
among its editors. Across a wide range of pages, we discover a bursty war/peace
structure where the systems can become trapped, sometimes for months, in a
computational subspace associated with significantly higher levels of
conflict-tracking "revert" actions. Distinct patterns of behavior characterize
the lower-conflict subspace, including tit-for-tat reversion. While a fraction
of the transitions between these subspaces are associated with top-down actions
taken by administrators, the effects are weak. Surprisingly, we find no
statistical signal that transitions are associated with the appearance of
particularly anti-social users, and only weak association with significant news
events outside the system. These findings are consistent with transitions being
driven by decentralized processes with no clear locus of control. Models of
belief revision in the presence of a common resource for information-sharing
predict the existence of two distinct phases: a disordered high-conflict phase,
and a frozen phase with spontaneously-broken symmetry. The bistability we
observe empirically may be a consequence of editor turn-over, which drives the
system to a critical point between them.
Comment: 23 pages, 3 figures. Matches published version. Code for HMM fitting
available at http://bit.ly/sfihmm ; time series and derived finite state
machines at bit.ly/wiki_hm
Structural Information in Two-Dimensional Patterns: Entropy Convergence and Excess Entropy
We develop information-theoretic measures of spatial structure and pattern in
more than one dimension. As is well known, the entropy density of a
two-dimensional configuration can be efficiently and accurately estimated via a
converging sequence of conditional entropies. We show that the manner in which
these conditional entropies converge to their asymptotic value serves as a
measure of global correlation and structure for spatial systems in any
dimension. We compare and contrast entropy-convergence with mutual-information
and structure-factor techniques for quantifying and detecting spatial
structure.
Comment: 11 pages, 5 figures,
http://www.santafe.edu/projects/CompMech/papers/2dnnn.htm
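For orientation, the quantities named in the abstract above can be written out in their standard one-dimensional form. This is a sketch of the familiar 1D definitions only, with $H(L)$ denoting the Shannon entropy of length-$L$ blocks $s^L$; the two-dimensional constructions developed in the paper are not reproduced here.

```latex
\begin{align}
  H(L) &= -\sum_{s^L} \Pr(s^L)\,\log_2 \Pr(s^L), \\
  h_\mu(L) &= H(L) - H(L-1), \qquad h_\mu = \lim_{L\to\infty} h_\mu(L), \\
  E &= \sum_{L=1}^{\infty} \bigl[\, h_\mu(L) - h_\mu \,\bigr].
\end{align}
```

Here $h_\mu(L)$ is the conditional entropy of one symbol given the previous $L-1$ symbols, $h_\mu$ is the entropy density, and the excess entropy $E$ accumulates how slowly the conditional entropies approach their asymptotic value.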
Collective Phenomena and Non-Finite State Computation in a Human Social System
We investigate the computational structure of a paradigmatic example of
distributed social interaction: that of the open-source Wikipedia community. We
examine the statistical properties of its cooperative behavior, and perform
model selection to determine whether this aspect of the system can be described
by a finite-state process, or whether reference to an effectively unbounded
resource allows for a more parsimonious description. We find strong evidence,
in a majority of the most-edited pages, in favor of a collective-state model,
where the probability of a "revert" action declines as the square root of the
number of non-revert actions seen since the last revert. We provide evidence
that the emergence of this social counter is driven by collective interaction
effects, rather than properties of individual users.
Comment: 23 pages, 4 figures, 3 tables; to appear in PLoS ON
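The collective-state model described in the abstract above, in which the revert probability falls as the square root of the number of non-revert actions since the last revert, can be sketched as a small simulation. This is an illustrative toy, not the authors' model-selection code. The functional form p0 / sqrt(n + 1), the +1 offset (added to avoid division by zero), the default p0, and all names are assumptions.

```python
import math
import random


def revert_probability(n_since_revert, p0=0.5):
    """Revert probability after n non-revert actions: p0 / sqrt(n + 1) (assumed form)."""
    return p0 / math.sqrt(n_since_revert + 1)


def simulate_edits(n_edits, p0=0.5, seed=0):
    """Simulate 'R' (revert) and 'C' (non-revert) actions with a collective counter."""
    rng = random.Random(seed)
    counter = 0  # non-revert actions seen since the last revert
    history = []
    for _ in range(n_edits):
        if rng.random() < revert_probability(counter, p0):
            history.append("R")
            counter = 0  # a revert resets the counter
        else:
            history.append("C")
            counter += 1
    return "".join(history)


print(simulate_edits(60))
```

Because the counter can grow without bound, the process is not described by any fixed finite set of states, which is the contrast with finite-state models that the abstract draws.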