148 research outputs found
Dictionary-based methods for information extraction
In this paper, we present a general method for information extraction that exploits the features of data compression techniques. We first define and focus our attention on the so-called dictionary of a sequence. Dictionaries are intrinsically interesting, and a study of their features can be very useful for investigating the properties of the sequences they have been extracted from, e.g., DNA strings. We then describe a procedure of string comparison between dictionary-created sequences (or artificial texts) that gives very good results in several contexts. We finally present some results on self-consistent classification problems.
On the ground states of the Bernasconi model
The ground states of the Bernasconi model are binary +1/-1 sequences of
length N with low autocorrelations. We introduce the notion of perfect
sequences, binary sequences with one-valued off-peak correlations of minimum
amount. If they exist, they are ground states. Using results from the
mathematical theory of cyclic difference sets, we specify all values of N for
which perfect sequences do exist and how to construct them. For other values of
N, we investigate almost perfect sequences, i.e. sequences with two-valued
off-peak correlations of minimum amount. Numerical and analytical results
support the conjecture that almost perfect sequences do exist for all values of
N, but that they are not always ground states. We present a construction for
low-energy configurations that works if N is the product of two odd primes.
Comment: 12 pages, LaTeX2e; extended content, added references; submitted to
J.Phys.
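The notion of one-valued off-peak correlations can be illustrated with a short sketch. It assumes periodic (cyclic) autocorrelations, which is what the reference to cyclic difference sets suggests, and takes the energy to be the sum of squared off-peak correlations. The function names and the length-7 example sequence are illustrative choices, not taken from the paper.

```python
def cyclic_autocorrelations(s):
    """Off-peak cyclic autocorrelations C_k = sum_i s[i] * s[(i + k) mod N], k = 1..N-1."""
    n = len(s)
    return [sum(s[i] * s[(i + k) % n] for i in range(n)) for k in range(1, n)]


def energy(s):
    """Sum of squared off-peak correlations (assumed energy definition)."""
    return sum(c * c for c in cyclic_autocorrelations(s))


def has_one_valued_offpeak(s):
    """True if every off-peak correlation takes the same value."""
    return len(set(cyclic_autocorrelations(s))) == 1


# Illustrative length-7 sequence (a +1/-1 mapping of a binary m-sequence).
example = [1, 1, 1, -1, 1, -1, -1]
print(cyclic_autocorrelations(example))
print(energy(example), has_one_valued_offpeak(example))
```

For this example every off-peak correlation equals -1, so the sequence is one-valued in the sense described above. Whether that value is the minimum attainable for a given N is a separate check that this sketch does not perform.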
An Analysis of Resting-State Functional Transcranial Doppler Recordings from Middle Cerebral Arteries
Functional transcranial Doppler (fTCD) is used for monitoring the hemodynamic characteristics of major cerebral arteries. Its resting-state characteristics are known only when considering the maximal velocity corresponding to the highest Doppler shift (the so-called envelope signals). Significantly more information about the resting-state fTCD can be gained when considering the raw cerebral blood flow velocity (CBFV) recordings. In this paper, we considered simultaneously acquired envelope and raw CBFV signals. Specifically, we collected bilateral CBFV recordings from left and right middle cerebral arteries from 20 healthy subjects (10 females). The data collection lasted for 15 minutes. The subjects were asked to remain awake, stay silent, and try to remain thought-free during the data collection. Time, frequency and time-frequency features were extracted from both the raw and the envelope CBFV signals. The effects of age, sex and body-mass index on the extracted features were examined. The results showed that the raw CBFV signals had a higher frequency content, and their temporal structures were almost uncorrelated. The information-theoretic features showed that the raw recordings from left and right middle cerebral arteries had higher content of mutual information than the envelope signals. Age and body-mass index did not have statistically significant effects on the extracted features. Sex-based differences were observed in all three domains and for both the envelope signals and the raw CBFV signals. These findings indicate that the raw CBFV signals provide valuable information about the cerebral blood flow which can be utilized in further validation of fTCD as a clinical tool. © 2013 Sejdić et al.
Algorithmic Complexity for Short Binary Strings Applied to Psychology: A Primer
Since human randomness production has been studied and widely used to assess
executive functions (especially inhibition), many measures have been suggested
to assess the degree to which a sequence is random-like. However, each of them
focuses on one feature of randomness, leading authors to have to use multiple
measures. Here we describe and advocate for the use of the accepted universal
measure for randomness based on algorithmic complexity, by means of a novel
previously presented technique using the definition of algorithmic
probability. A re-analysis of the classical Radio Zenith data in the light of
the proposed measure and methodology is provided as a study case of an
application.
Comment: To appear in Behavior Research Method
Structural Information in Two-Dimensional Patterns: Entropy Convergence and Excess Entropy
We develop information-theoretic measures of spatial structure and pattern in
more than one dimension. As is well known, the entropy density of a
two-dimensional configuration can be efficiently and accurately estimated via a
converging sequence of conditional entropies. We show that the manner in which
these conditional entropies converge to their asymptotic value serves as a
measure of global correlation and structure for spatial systems in any
dimension. We compare and contrast entropy-convergence with mutual-information
and structure-factor techniques for quantifying and detecting spatial
structure.
Comment: 11 pages, 5 figures,
http://www.santafe.edu/projects/CompMech/papers/2dnnn.htm
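The idea of estimating an entropy density through a converging sequence of conditional entropies can be sketched in a simplified one-dimensional setting. The paper itself treats two-dimensional configurations. The sketch below assumes a 1D symbol sequence, plug-in block entropies H(L), and the differences h(L) = H(L) - H(L-1) as the conditional entropies. The excess-entropy estimate truncates the sum at the largest block length and uses the last h(L) as a stand-in for the asymptotic value. All names are hypothetical.

```python
from collections import Counter
from math import log2


def block_entropy(seq, length):
    """Plug-in Shannon entropy (bits) of the length-`length` blocks of seq."""
    if length == 0:
        return 0.0
    blocks = [tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)]
    total = len(blocks)
    return -sum((c / total) * log2(c / total) for c in Counter(blocks).values())


def conditional_entropies(seq, max_length):
    """h(L) = H(L) - H(L-1) for L = 1..max_length."""
    return [block_entropy(seq, L) - block_entropy(seq, L - 1)
            for L in range(1, max_length + 1)]


def excess_entropy_estimate(seq, max_length):
    """Truncated sum of h(L) - h(max_length); a rough finite-L estimate only."""
    h = conditional_entropies(seq, max_length)
    return sum(value - h[-1] for value in h)


# A period-2 sequence: h(1) is 1 bit, and h(L) drops to about 0 for L >= 2.
periodic = [0, 1] * 50
print(conditional_entropies(periodic, 3))
print(excess_entropy_estimate(periodic, 3))
```

For the period-2 example the conditional entropies fall from 1 bit to roughly zero after one step, and the truncated excess-entropy estimate is close to 1 bit. Finite-sample effects make the later values only approximately zero.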
Acid sensing by the Drosophila olfactory system.
The odour of acids has a distinct quality that is perceived as sharp, pungent and often irritating. How acidity is sensed and translated into an appropriate behavioural response is poorly understood. Here we describe a functionally segregated population of olfactory sensory neurons in the fruitfly, Drosophila melanogaster, that are highly selective for acidity. These olfactory sensory neurons express IR64a, a member of the recently identified ionotropic receptor (IR) family of putative olfactory receptors. In vivo calcium imaging showed that IR64a+ neurons projecting to the DC4 glomerulus in the antennal lobe are specifically activated by acids. Flies in which the function of IR64a+ neurons or the IR64a gene is disrupted had defects in acid-evoked physiological and behavioural responses, but their responses to non-acidic odorants remained unaffected. Furthermore, artificial stimulation of IR64a+ neurons elicited avoidance responses. Taken together, these results identify cellular and molecular substrates for acid detection in the Drosophila olfactory system and support a labelled-line mode of acidity coding at the periphery
Mixing Bandt-Pompe and Lempel-Ziv approaches: another way to analyze the complexity of continuous-states sequences
In this paper, we propose to mix the approach underlying Bandt-Pompe
permutation entropy with Lempel-Ziv complexity, to design what we call
Lempel-Ziv permutation complexity. The principle consists of two steps: (i)
transformation of a continuous-state series that is intrinsically multivariate
or arises from embedding into a sequence of permutation vectors, where the
components are the positions of the components of the initial vector when
re-arranged; (ii) performing the Lempel-Ziv complexity for this series of
`symbols', as part of a discrete finite-size alphabet. On the one hand, the
permutation entropy of Bandt-Pompe aims at the study of the entropy of such a
sequence; i.e., the entropy of patterns in a sequence (e.g., local increases or
decreases). On the other hand, the Lempel-Ziv complexity of a discrete-state
sequence aims at the study of the temporal organization of the symbols (i.e.,
the rate of compressibility of the sequence). Thus, the Lempel-Ziv permutation
complexity aims to take advantage of both of these methods. The potential from
such a combined approach - of a permutation procedure and a complexity analysis
- is evaluated through the illustration of some simulated data and some real
data. In both cases, we compare the individual approaches and the combined
approach.
Comment: 30 pages, 4 figure
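The two steps described in this abstract can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: step (i) uses a delay embedding with hypothetical parameters `d` and `tau` and encodes each window by its argsort (ties broken by position), and step (ii) counts phrases in a straightforward, unoptimized Lempel-Ziv (1976) style parsing.

```python
def ordinal_patterns(x, d=3, tau=1):
    """Step (i): map a scalar series to permutation symbols via delay embedding."""
    n = len(x) - (d - 1) * tau
    patterns = []
    for t in range(n):
        window = [x[t + k * tau] for k in range(d)]
        # Indices of the window entries in increasing order of value.
        patterns.append(tuple(sorted(range(d), key=lambda k: window[k])))
    return patterns


def lz76_complexity(symbols):
    """Step (ii): number of phrases in an LZ76-style exhaustive parsing."""
    s = list(symbols)
    n = len(s)
    i = 0
    count = 0
    while i < n:
        length = 1
        # Grow the phrase while it can be copied from an earlier start position
        # (overlap with the current phrase is allowed).
        while i + length <= n and any(
            s[j:j + length] == s[i:i + length] for j in range(i)
        ):
            length += 1
        count += 1
        i += length
    return count


def lz_permutation_complexity(x, d=3, tau=1):
    """Lempel-Ziv complexity of the permutation-symbol sequence."""
    return lz76_complexity(ordinal_patterns(x, d, tau))


print(lz76_complexity("0001101001000101"))
print(lz_permutation_complexity(list(range(10)), d=3))
```

A monotone series yields a single repeated permutation symbol and therefore a very low phrase count, whereas an irregular series produces many distinct phrases. The substring search here is cubic in the worst case and is meant only for short sequences.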
Mining, compressing and classifying with extensible motifs
BACKGROUND: Motif patterns of maximal saturation emerged originally in contexts of pattern discovery in biomolecular sequences and have recently proven a valuable notion also in the design of data compression schemes. Informally, a motif is a string of intermittently solid and wild characters that recurs more or less frequently in an input sequence or family of sequences. Motif discovery techniques and tools tend to be computationally imposing; however, special classes of "rigid" motifs have been identified whose discovery is affordable in low polynomial time. RESULTS: In the present work, "extensible" motifs are considered such that each sequence of gaps comes endowed with some elasticity, whereby the same pattern may be stretched to fit segments of the source that match all the solid characters but are otherwise of different lengths. A few applications of this notion are then described. In applications of data compression by textual substitution, extensible motifs are seen to bring savings on the size of the codebook, and hence to improve compression. In germane contexts, in which compressibility is used in its dual role as a basis for structural inference and classification, extensible motifs are seen to support unsupervised classification and phylogeny reconstruction. CONCLUSION: Off-line compression based on extensible motifs can be used advantageously to compress and classify biological sequences.
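The elasticity of gaps in an extensible motif can be illustrated by matching only, using a regular expression. This sketch does not perform motif discovery and is not the method of the paper. It assumes a motif is given as a list of solid substrings plus a (min, max) gap range between consecutive solids, and the helper names are hypothetical.

```python
import re


def motif_to_regex(solids, gaps):
    """Build a regex from solid substrings and (min, max) wildcard gaps between them."""
    if len(gaps) != len(solids) - 1:
        raise ValueError("need exactly one gap range between consecutive solids")
    parts = [re.escape(solids[0])]
    for (low, high), solid in zip(gaps, solids[1:]):
        parts.append(".{%d,%d}" % (low, high))  # elastic run of wild characters
        parts.append(re.escape(solid))
    return re.compile("".join(parts))


def occurrences(solids, gaps, text):
    """Non-overlapping segments of text that match the extensible motif."""
    return [m.group(0) for m in motif_to_regex(solids, gaps).finditer(text)]


# The same motif A.{1,3}T fits segments of different lengths.
print(occurrences(["A", "T"], [(1, 3)], "ACTxxACGGT"))
```

Here the single motif matches both a three-character and a five-character segment, which is the stretching behaviour described above. Regex matching reports non-overlapping occurrences only.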
Integrated information increases with fitness in the evolution of animats
One of the hallmarks of biological organisms is their ability to integrate
disparate information sources to optimize their behavior in complex
environments. How this capability can be quantified and related to the
functional complexity of an organism remains a challenging problem, in
particular since organismal functional complexity is not well-defined. We
present here several candidate measures that quantify information and
integration, and study their dependence on fitness as an artificial agent
("animat") evolves over thousands of generations to solve a navigation task in
a simple, simulated environment. We compare the ability of these measures to
predict high fitness with more conventional information-theoretic processing
measures. As the animat adapts by increasing its "fit" to the world,
information integration and processing increase commensurately along the
evolutionary line of descent. We suggest that the correlation of fitness with
information integration and with processing measures implies that high fitness
requires both information processing as well as integration, but that
information integration may be a better measure when the task requires memory.
A correlation of measures of information integration (but also information
processing) and fitness strongly suggests that these measures reflect the
functional complexity of the animat, and that such measures can be used to
quantify functional complexity even in the absence of fitness data.
Comment: 27 pages, 8 figures, one supplementary figure. Three supplementary
video files available on request. Version commensurate with published text in
PLoS Comput. Bio