77 research outputs found

    Handwritten digit recognition by bio-inspired hierarchical networks

    The human brain shows learning and prediction abilities when processing information, but the underlying neuronal mechanisms remain unknown. Recently, many studies have shown that neuronal networks are capable of both generalization and association of sensory inputs. In this paper, following a body of neurophysiological evidence, we propose a learning framework with strong biological plausibility that mimics prominent functions of cortical circuitry. We developed the Inductive Conceptual Network (ICN), a hierarchical bio-inspired network that learns invariant patterns using Variable-order Markov Models implemented in its nodes. The outputs of the top-most node of the ICN hierarchy, representing the highest input generalization, allow for automatic classification of inputs. We found that the ICN clustered MNIST images with an error of 5.73% and USPS images with an error of 12.56%.

    Artificial Sequences and Complexity Measures

    In this paper we exploit concepts of information theory to address the fundamental problem of identifying and defining the most suitable tools to extract, in an automatic and agnostic way, information from a generic string of characters. In particular, we introduce a class of methods that rely crucially on data compression techniques to define a measure of remoteness and distance between pairs of sequences of characters (e.g. texts) based on their relative information content. We also discuss in detail how specific features of data compression techniques can be used to introduce the notions of the dictionary of a given sequence and of Artificial Text, and we show how these new tools can be used for information extraction purposes. We point out the versatility and generality of our method, which applies to any kind of corpus of character strings independently of the type of coding behind it. As a case study we consider linguistically motivated problems, and we present results for automatic language recognition, authorship attribution and self-consistent classification. Comment: Revised version, with major changes, of previous "Data Compression approach to Information Extraction and Classification" by A. Baronchelli and V. Loreto. 15 pages; 5 figures
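
As an illustration of how a compressor can be turned into a distance between character sequences, here is a minimal sketch. It uses the normalized compression distance (NCD) with `zlib`, which is a swapped-in, widely used stand-in and not necessarily the measure defined in the paper; the function names and the two sample sentences are assumptions made for this example.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when x and y share structure.

    NOTE: NCD is an assumption here, used as a generic compression-based
    distance; the abstract does not give the paper's exact formula.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical sample texts, written only for this sketch.
english = (b"In this paper we use data compression to measure how far apart "
           b"two sequences of characters are, without knowing their encoding.")
italian = (b"In questo articolo usiamo la compressione dei dati per misurare "
           b"quanto sono distanti due sequenze di caratteri.")

if __name__ == "__main__":
    print("english vs english:", round(ncd(english, english), 3))
    print("english vs italian:", round(ncd(english, italian), 3))
```

A text compared with itself should score much lower than a text compared with one in another language, because the compressor can reuse almost everything it has already seen. Very short inputs are dominated by compressor overhead, so real experiments would need longer samples.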

    Differences between Human Plasma and Serum Metabolite Profiles

    BACKGROUND: Human plasma and serum are widely used matrices in clinical and biological studies. However, different collecting procedures and the coagulation cascade influence concentrations of both proteins and metabolites in these matrices. The effects on metabolite concentration profiles have not been fully characterized. METHODOLOGY/PRINCIPAL FINDINGS: We analyzed the concentrations of 163 metabolites in plasma and serum samples collected simultaneously from 377 fasting individuals. To ensure data quality, 41 metabolites with low measurement stability were excluded from further analysis. In addition, plasma and corresponding serum samples from 83 individuals were re-measured in the same plates and mean correlation coefficients (r) of all metabolites between the duplicates were 0.83 and 0.80 in plasma and serum, respectively, indicating significantly better stability of plasma compared to serum (p = 0.01). Metabolite profiles from plasma and serum were clearly distinct with 104 metabolites showing significantly higher concentrations in serum. In particular, 9 metabolites showed relative concentration differences larger than 20%. Despite differences in absolute concentration between the two matrices, for most metabolites the overall correlation was high (mean r = 0.81±0.10), which reflects a proportional change in concentration. Furthermore, when two groups of individuals with different phenotypes were compared with each other using both matrices, more metabolites with significantly different concentrations could be identified in serum than in plasma. For example, when 51 type 2 diabetes (T2D) patients were compared with 326 non-T2D individuals, 15 more significantly different metabolites were found in serum, in addition to the 25 common to both matrices. CONCLUSIONS/SIGNIFICANCE: Our study shows that reproducibility was good in both plasma and serum, and better in plasma. Furthermore, as long as the same blood preparation procedure is used, either matrix should generate similar results in clinical and biological studies. The higher metabolite concentrations in serum, however, make it possible to provide more sensitive results in biomarker detection.

    Evaluation and Characterization of Bacterial Metabolic Dynamics with a Novel Profiling Technique, Real-Time Metabolotyping

    BACKGROUND: Environmental processes in ecosystems are dynamically altered by several metabolic responses in microorganisms, including intracellular sensing and pumping, battle for survival, and supply of or competition for nutrients. Notably, intestinal bacteria maintain homeostatic balance in mammals via multiple dynamic biochemical reactions to produce several metabolites from undigested food, and those metabolites exert various effects on mammalian cells in a time-dependent manner. We have established a method for the analysis of bacterial metabolic dynamics in real time and used it in combination with statistical NMR procedures. METHODOLOGY/PRINCIPAL FINDINGS: We developed a novel method called real-time metabolotyping (RT-MT), which performs sequential ¹H-NMR profiling and two-dimensional (2D) ¹H,¹³C-HSQC (heteronuclear single quantum coherence) profiling during bacterial growth in an NMR tube. The profiles were evaluated with statistical methods such as Z-score analysis, principal components analysis, and time series of statistical TOtal Correlation SpectroScopY (TOCSY). In addition, using 2D ¹H,¹³C-HSQC with the stable isotope labeling technique, we observed the metabolic kinetics of specific biochemical reactions based on time-dependent 2D kinetic profiles. Using these methods, we clarified the pathway for linolenic acid hydrogenation by a gastrointestinal bacterium, Butyrivibrio fibrisolvens. We identified trans11, cis13 conjugated linoleic acid as the intermediate of linolenic acid hydrogenation by B. fibrisolvens, based on the results of ¹³C-labeling RT-MT experiments. In addition, we showed that the biohydrogenation of polyunsaturated fatty acids serves as a defense mechanism against their toxic effects. CONCLUSIONS: RT-MT is useful for the characterization of beneficial bacteria that show potential for use as probiotics by producing bioactive compounds.

    Mass-spectrometry-based metabolomics: limitations and recommendations for future progress with particular focus on nutrition research

    Mass spectrometry (MS) techniques, because of their sensitivity and selectivity, have become methods of choice to characterize the human metabolome, and MS-based metabolomics is increasingly used to characterize the complex metabolic effects of nutrients or foods. However, progress is still hampered by many unsolved problems, most notably the lack of well-established and standardized methods or procedures and the difficulties still met in identifying the metabolites influenced by a given nutritional intervention. The purpose of this paper is to review the main obstacles limiting progress and to make recommendations to overcome them. Proposals are made to improve the mode of collection and preparation of biological samples, the coverage and quality of mass spectrometry analyses, the extraction and exploitation of the raw data, the identification of the metabolites and the biological interpretation of the results.

    Universal entropy of word ordering across linguistic families

    Background: The language faculty is probably the most distinctive feature of our species, and endows us with a unique ability to exchange highly structured information. In written language, information is encoded by the concatenation of basic symbols under grammatical and semantic constraints. As is also the case in other natural information carriers, the resulting symbolic sequences show a delicate balance between order and disorder. That balance is determined by the interplay between the diversity of symbols and their specific ordering in the sequences. Here we used entropy to quantify the contribution of different organizational levels to the overall statistical structure of language. Methodology/Principal Findings: We computed a relative entropy measure to quantify the degree of ordering in word sequences from languages belonging to several linguistic families. While a direct estimation of the overall entropy of language yielded values that varied for the different families considered, the relative entropy quantifying word ordering presented an almost constant value for all those families. Conclusions/Significance: Our results indicate that despite the differences in the structure and vocabulary of the languages analyzed, the impact of word ordering in the structure of language is a statistical linguistic universal.
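
To make the idea of "entropy attributable to word ordering" concrete, here is a rough sketch that compares a text with a shuffled copy of itself. The shuffle keeps the vocabulary and word frequencies but destroys the order, so the entropy difference isolates the ordering contribution. The bigram plug-in estimator, the function names, and the toy input are assumptions made for this example; the abstract does not state which estimator the paper uses, and a bigram estimate is far cruder than what a real study would need.

```python
import math
import random
from collections import Counter


def conditional_entropy(words):
    """Plug-in estimate of H(next word | current word), in bits.

    ASSUMPTION: a simple bigram estimator, chosen only for illustration.
    """
    pairs = list(zip(words, words[1:]))
    pair_counts = Counter(pairs)
    first_counts = Counter(w1 for w1, _ in pairs)
    n = len(pairs)
    h = 0.0
    for (w1, _), c in pair_counts.items():
        # p(w1, w2) * log2 p(w2 | w1), summed over observed bigrams
        h -= (c / n) * math.log2(c / first_counts[w1])
    return h


def ordering_information(words, seed=0):
    """Entropy of a shuffled copy minus entropy of the original order."""
    shuffled = list(words)
    random.Random(seed).shuffle(shuffled)  # same words, order destroyed
    return conditional_entropy(shuffled) - conditional_entropy(words)


if __name__ == "__main__":
    # Toy input: a perfectly periodic "text", so the original order
    # is fully predictable from the previous word.
    toy = "a b c d".split() * 50
    print("original:", conditional_entropy(toy))
    print("ordering information:", ordering_information(toy))
```

On the periodic toy input the original order has zero conditional entropy, while the shuffled copy does not, so the difference is positive. On real corpora the estimate depends heavily on text length and on the estimator.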

    An Open Interface for Probabilistic Models of Text

    An Application Program Interface (API) for modelling sequential text is described. The API is intended to shield the user from details of the modelling and probability estimation process. This should enable different implementations of models to be replaced transparently in application programs. The motivation for this API is work on the use of textual models for applications beyond strict data compression, e.g. determining the source of a text, spelling correction, or segmenting text by inserting spaces. The API is probabilistic: that is, it supplies the probability of the next symbol in the sequence. It is general enough to deal accurately with models that include escapes for probabilities. The concepts abstracted by the API are explained together with details of the API calls.
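
The kind of interface the abstract describes, one that hands back the probability of the next symbol and hides how it was estimated, could look roughly like the sketch below. Every name here (`TextModel`, `Order0Model`, `probability`, `update`, `code_length_bits`) is hypothetical and invented for this example; none of them are the calls defined in the paper, and escape handling is not modelled.

```python
import math
from collections import Counter
from typing import Protocol, Sequence


class TextModel(Protocol):
    """Hypothetical interface: callers only see next-symbol probabilities."""

    def probability(self, symbol: str) -> float: ...
    def update(self, symbol: str) -> None: ...


class Order0Model:
    """Adaptive order-0 model with add-one smoothing over a fixed alphabet.

    ASSUMPTION: a deliberately simple model; symbols outside the alphabet
    are not supported and would break normalization.
    """

    def __init__(self, alphabet: Sequence[str]) -> None:
        self.alphabet = list(alphabet)
        self.counts = Counter()
        self.total = 0

    def probability(self, symbol: str) -> float:
        return (self.counts[symbol] + 1) / (self.total + len(self.alphabet))

    def update(self, symbol: str) -> None:
        self.counts[symbol] += 1
        self.total += 1


def code_length_bits(model: TextModel, text: str) -> float:
    """Ideal code length of text under the model, updating as it goes."""
    bits = 0.0
    for ch in text:
        bits -= math.log2(model.probability(ch))
        model.update(ch)
    return bits


if __name__ == "__main__":
    print(code_length_bits(Order0Model("ab"), "aaaa"))
    print(code_length_bits(Order0Model("ab"), "abab"))
```

Because `code_length_bits` depends only on the two-method interface, a different model could be swapped in without touching the calling code, which is the kind of transparency the abstract aims for. Comparing code lengths under models trained on different sources is one way such an interface could support tasks like identifying the source of a text.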