
    Dictionary-based methods for information extraction

    In this paper, we present a general method for information extraction that exploits the features of data compression techniques. We first define and focus our attention on the so-called dictionary of a sequence. Dictionaries are intrinsically interesting, and a study of their features can be very useful for investigating the properties of the sequences they have been extracted from, e.g. DNA strings. We then describe a procedure of string comparison between dictionary-created sequences (or artificial texts) that gives very good results in several contexts. We finally present some results on self-consistent classification problems.

    On the ground states of the Bernasconi model

    The ground states of the Bernasconi model are binary +1/-1 sequences of length N with low autocorrelations. We introduce the notion of perfect sequences, binary sequences with one-valued off-peak correlations of minimum amount. If they exist, they are ground states. Using results from the mathematical theory of cyclic difference sets, we specify all values of N for which perfect sequences do exist and how to construct them. For other values of N, we investigate almost perfect sequences, i.e. sequences with two-valued off-peak correlations of minimum amount. Numerical and analytical results support the conjecture that almost perfect sequences do exist for all values of N, but that they are not always ground states. We present a construction for low-energy configurations that works if N is the product of two odd primes. Comment: 12 pages, LaTeX2e; extended content, added references; submitted to J.Phys.
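
    The notion of one-valued off-peak correlations can be illustrated with a minimal sketch. It assumes periodic (cyclic) autocorrelations and an energy equal to the plain sum of squared off-peak correlations, without any normalization; neither convention is stated in the abstract. The function names and the length-7 example sequence are illustrative choices, not taken from the paper.

```python
from typing import List, Sequence


def periodic_autocorrelations(s: Sequence[int]) -> List[int]:
    """Off-peak periodic autocorrelations C_k = sum_i s[i] * s[(i + k) % N], k = 1..N-1."""
    n = len(s)
    return [sum(s[i] * s[(i + k) % n] for i in range(n)) for k in range(1, n)]


def energy(s: Sequence[int]) -> int:
    """Assumed energy: sum of squared off-peak correlations (no normalization)."""
    return sum(c * c for c in periodic_autocorrelations(s))


def is_one_valued(s: Sequence[int]) -> bool:
    """True if all off-peak correlations take a single common value."""
    return len(set(periodic_autocorrelations(s))) == 1


# Hypothetical example: a +1/-1 sequence of length N = 7.
seq = [1, 1, 1, -1, 1, -1, -1]
print(periodic_autocorrelations(seq))
print(energy(seq), is_one_valued(seq))
```

    For this example all six off-peak correlations equal -1, so the sequence is one-valued in the sense used above.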

    An Analysis of Resting-State Functional Transcranial Doppler Recordings from Middle Cerebral Arteries

    Functional transcranial Doppler (fTCD) is used for monitoring the hemodynamic characteristics of major cerebral arteries. Its resting-state characteristics are known only when considering the maximal velocity corresponding to the highest Doppler shift (the so-called envelope signals). Significantly more information about the resting-state fTCD can be gained when considering the raw cerebral blood flow velocity (CBFV) recordings. In this paper, we considered simultaneously acquired envelope and raw CBFV signals. Specifically, we collected bilateral CBFV recordings from left and right middle cerebral arteries using 20 healthy subjects (10 females). The data collection lasted for 15 minutes. The subjects were asked to remain awake, stay silent, and try to remain thought-free during the data collection. Time, frequency and time-frequency features were extracted from both the raw and the envelope CBFV signals. The effects of age, sex and body-mass index on the extracted features were examined. The results showed that the raw CBFV signals had a higher frequency content, and their temporal structures were almost uncorrelated. The information-theoretic features showed that the raw recordings from left and right middle cerebral arteries had higher content of mutual information than the envelope signals. Age and body-mass index did not have statistically significant effects on the extracted features. Sex-based differences were observed in all three domains and for both the envelope signals and the raw CBFV signals. These findings indicate that the raw CBFV signals provide valuable information about the cerebral blood flow which can be utilized in further validation of fTCD as a clinical tool. © 2013 Sejdić et al.

    Algorithmic Complexity for Short Binary Strings Applied to Psychology: A Primer

    Since human randomness production has been studied and widely used to assess executive functions (especially inhibition), many measures have been suggested to assess the degree to which a sequence is random-like. However, each of them focuses on one feature of randomness, leading authors to have to use multiple measures. Here we describe and advocate for the use of the accepted universal measure for randomness based on algorithmic complexity, by means of a novel previously presented technique using the definition of algorithmic probability. A re-analysis of the classical Radio Zenith data in the light of the proposed measure and methodology is provided as a case study of an application. Comment: To appear in Behavior Research Method

    Structural Information in Two-Dimensional Patterns: Entropy Convergence and Excess Entropy

    We develop information-theoretic measures of spatial structure and pattern in more than one dimension. As is well known, the entropy density of a two-dimensional configuration can be efficiently and accurately estimated via a converging sequence of conditional entropies. We show that the manner in which these conditional entropies converge to their asymptotic value serves as a measure of global correlation and structure for spatial systems in any dimension. We compare and contrast entropy-convergence with mutual-information and structure-factor techniques for quantifying and detecting spatial structure. Comment: 11 pages, 5 figures, http://www.santafe.edu/projects/CompMech/papers/2dnnn.htm
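
    The idea of reading structure off the convergence of conditional entropies can be sketched in a simplified setting. The example below is a one-dimensional analogue, not the two-dimensional construction of the paper: it assumes conditional entropies defined as differences of block entropies, h(L) = H(L) - H(L-1), estimated from block frequencies, and it uses the last computed h(L) as a stand-in for the asymptotic entropy density. All function names are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Sequence


def block_entropy(seq: Sequence[Hashable], length: int) -> float:
    """Shannon entropy (bits) of the empirical distribution of blocks of a given length."""
    if length == 0:
        return 0.0
    counts = Counter(tuple(seq[i:i + length]) for i in range(len(seq) - length + 1))
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def conditional_entropies(seq: Sequence[Hashable], max_length: int) -> List[float]:
    """h(L) = H(L) - H(L-1) for L = 1..max_length."""
    H = [block_entropy(seq, L) for L in range(max_length + 1)]
    return [H[L] - H[L - 1] for L in range(1, max_length + 1)]


def excess_entropy_estimate(seq: Sequence[Hashable], max_length: int) -> float:
    """Sum of h(L) - h_inf, with h_inf approximated by the last h(L) (an assumption)."""
    h = conditional_entropies(seq, max_length)
    h_inf = h[-1]
    return sum(x - h_inf for x in h)


# A period-2 sequence: one symbol of context removes essentially all uncertainty.
periodic = [0, 1] * 500
print(conditional_entropies(periodic, 4))
print(excess_entropy_estimate(periodic, 4))
```

    For the period-2 example the conditional entropy drops from about 1 bit to about 0 after a single step, and the summed excess is close to 1 bit, reflecting the one bit of phase information stored in the pattern.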

    Acid sensing by the Drosophila olfactory system.

    The odour of acids has a distinct quality that is perceived as sharp, pungent and often irritating. How acidity is sensed and translated into an appropriate behavioural response is poorly understood. Here we describe a functionally segregated population of olfactory sensory neurons in the fruitfly, Drosophila melanogaster, that are highly selective for acidity. These olfactory sensory neurons express IR64a, a member of the recently identified ionotropic receptor (IR) family of putative olfactory receptors. In vivo calcium imaging showed that IR64a+ neurons projecting to the DC4 glomerulus in the antennal lobe are specifically activated by acids. Flies in which the function of IR64a+ neurons or the IR64a gene is disrupted had defects in acid-evoked physiological and behavioural responses, but their responses to non-acidic odorants remained unaffected. Furthermore, artificial stimulation of IR64a+ neurons elicited avoidance responses. Taken together, these results identify cellular and molecular substrates for acid detection in the Drosophila olfactory system and support a labelled-line mode of acidity coding at the periphery.

    Mixing Bandt-Pompe and Lempel-Ziv approaches: another way to analyze the complexity of continuous-states sequences

    In this paper, we propose to mix the approach underlying Bandt-Pompe permutation entropy with Lempel-Ziv complexity, to design what we call Lempel-Ziv permutation complexity. The principle consists of two steps: (i) transformation of a continuous-state series that is intrinsically multivariate or arises from embedding into a sequence of permutation vectors, where the components are the positions of the components of the initial vector when re-arranged; (ii) computing the Lempel-Ziv complexity of this series of 'symbols', as part of a discrete finite-size alphabet. On the one hand, the permutation entropy of Bandt-Pompe aims at the study of the entropy of such a sequence; i.e., the entropy of patterns in a sequence (e.g., local increases or decreases). On the other hand, the Lempel-Ziv complexity of a discrete-state sequence aims at the study of the temporal organization of the symbols (i.e., the rate of compressibility of the sequence). Thus, the Lempel-Ziv permutation complexity aims to take advantage of both of these methods. The potential of such a combined approach - of a permutation procedure and a complexity analysis - is evaluated through the illustration of some simulated data and some real data. In both cases, we compare the individual approaches and the combined approach. Comment: 30 pages, 4 figure
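
    The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: step (i) uses a scalar series with a time-delay embedding and encodes each window by the indices that sort it (ties broken by position); step (ii) counts phrases in an exhaustive Lempel-Ziv (1976) parsing using a simple, unoptimized search. The function names and the parameters dim and delay are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple


def permutation_symbols(x: Sequence[float], dim: int = 3, delay: int = 1) -> List[Tuple[int, ...]]:
    """Step (i): map each embedded window to the permutation of indices that sorts it."""
    span = (dim - 1) * delay
    symbols = []
    for t in range(len(x) - span):
        window = [x[t + k * delay] for k in range(dim)]
        # sorted() is stable, so equal values keep their original order.
        symbols.append(tuple(sorted(range(dim), key=lambda k: window[k])))
    return symbols


def _occurs(word: list, text: list) -> bool:
    """True if word appears as a contiguous sub-list of text."""
    m = len(word)
    return any(text[j:j + m] == word for j in range(len(text) - m + 1))


def lz76_complexity(seq: Sequence[Hashable]) -> int:
    """Step (ii): number of phrases in the Lempel-Ziv (1976) exhaustive parsing."""
    s = list(seq)
    n = len(s)
    i = phrases = 0
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from the preceding text.
        while i + length <= n and _occurs(s[i:i + length], s[:i + length - 1]):
            length += 1
        phrases += 1
        i += length
    return phrases


def lz_permutation_complexity(x: Sequence[float], dim: int = 3, delay: int = 1) -> int:
    """Lempel-Ziv complexity of the permutation-symbol sequence of x."""
    return lz76_complexity(permutation_symbols(x, dim, delay))


print(lz76_complexity("0001101001000101"))
print(lz_permutation_complexity([0.1, 0.7, 0.3, 0.9, 0.2, 0.8, 0.4, 0.6], dim=3))
```

    Because only the ordering inside each window matters, the resulting count is unchanged by any strictly increasing transformation of the input series. In practice the phrase count is usually normalized by sequence length and alphabet size; that normalization is omitted here.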

    Mining, compressing and classifying with extensible motifs

    BACKGROUND: Motif patterns of maximal saturation emerged originally in contexts of pattern discovery in biomolecular sequences and have recently proven a valuable notion also in the design of data compression schemes. Informally, a motif is a string of intermittently solid and wild characters that recurs more or less frequently in an input sequence or family of sequences. Motif discovery techniques and tools tend to be computationally imposing; however, special classes of "rigid" motifs have been identified whose discovery is affordable in low polynomial time. RESULTS: In the present work, "extensible" motifs are considered such that each sequence of gaps comes endowed with some elasticity, whereby the same pattern may be stretched to fit segments of the source that match all the solid characters but are otherwise of different lengths. A few applications of this notion are then described. In applications of data compression by textual substitution, extensible motifs are seen to bring savings on the size of the codebook, and hence to improve compression. In germane contexts, in which compressibility is used in its dual role as a basis for structural inference and classification, extensible motifs are seen to support unsupervised classification and phylogeny reconstruction. CONCLUSION: Off-line compression based on extensible motifs can be used advantageously to compress and classify biological sequences.

    Integrated information increases with fitness in the evolution of animats

    One of the hallmarks of biological organisms is their ability to integrate disparate information sources to optimize their behavior in complex environments. How this capability can be quantified and related to the functional complexity of an organism remains a challenging problem, in particular since organismal functional complexity is not well-defined. We present here several candidate measures that quantify information and integration, and study their dependence on fitness as an artificial agent ("animat") evolves over thousands of generations to solve a navigation task in a simple, simulated environment. We compare the ability of these measures to predict high fitness with more conventional information-theoretic processing measures. As the animat adapts by increasing its "fit" to the world, information integration and processing increase commensurately along the evolutionary line of descent. We suggest that the correlation of fitness with information integration and with processing measures implies that high fitness requires both information processing and integration, but that information integration may be a better measure when the task requires memory. A correlation of measures of information integration (but also information processing) and fitness strongly suggests that these measures reflect the functional complexity of the animat, and that such measures can be used to quantify functional complexity even in the absence of fitness data. Comment: 27 pages, 8 figures, one supplementary figure. Three supplementary video files available on request. Version commensurate with published text in PLoS Comput. Bio