
    Artificial Sequences and Complexity Measures

    In this paper we exploit concepts of information theory to address the fundamental problem of identifying and defining the most suitable tools to extract, in an automatic and agnostic way, information from a generic string of characters. We introduce in particular a class of methods which rely crucially on data compression techniques to define a measure of remoteness and distance between pairs of sequences of characters (e.g. texts) based on their relative information content. We also discuss in detail how specific features of data compression techniques could be used to introduce the notion of dictionary of a given sequence and of Artificial Text and we show how these new tools can be used for information extraction purposes. We point out the versatility and generality of our method that applies to any kind of corpora of character strings independently of the type of coding behind them. We consider as a case study linguistically motivated problems and we present results for automatic language recognition, authorship attribution and self-consistent classification. Comment: Revised version, with major changes, of previous "Data Compression approach to Information Extraction and Classification" by A. Baronchelli and V. Loreto. 15 pages; 5 figures
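
The idea of measuring remoteness between two sequences through compression can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that zlib can stand in for the compressor, and the function names (`compressed_size`, `relative_cost`, `remoteness`) and the `probe_len` parameter are hypothetical. The sketch asks how many extra compressed bytes the tail of sequence `b` costs when appended to `a`, compared with appending it to the rest of `b` itself.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of data after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def relative_cost(reference: bytes, probe: bytes) -> int:
    """Extra compressed bytes needed when probe is appended to reference."""
    return compressed_size(reference + probe) - compressed_size(reference)


def remoteness(a: bytes, b: bytes, probe_len: int = 200) -> float:
    """Excess cost per byte of encoding the tail of b after a,
    relative to encoding the same tail after the head of b.

    Assumption: b is longer than probe_len, so that both the head
    and the tail of b are non-empty.
    """
    if len(b) <= probe_len:
        raise ValueError("b must be longer than probe_len")
    body_b, probe_b = b[:-probe_len], b[-probe_len:]
    cross = relative_cost(a, probe_b)   # tail of b seen through a
    own = relative_cost(body_b, probe_b)  # tail of b seen through b
    return (cross - own) / len(probe_b)


if __name__ == "__main__":
    # Toy inputs, invented for illustration only.
    english = b"the cat sat on the mat and looked at the dog. " * 30
    digits = b"0123456789" * 140
    print("same source:     ", remoteness(english, english))
    print("different source:", remoteness(digits, english))
```

A value near zero suggests the two sequences share most of their structure; larger values suggest the reference sequence is a poor model of the probe. Real texts would call for longer inputs and probably a compressor with a larger window than zlib's 32 KB.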

    Characterisation of Human Embryonic Stem Cells Conditioning Media by 1H-Nuclear Magnetic Resonance Spectroscopy

    BACKGROUND: Cell culture media conditioned by human foreskin fibroblasts (HFFs) provide a complex supplement of protein and metabolic factors that support in vitro proliferation of human embryonic stem cells (hESCs). However, the conditioning process is variable with different media batches often exhibiting differing capacities to maintain hESCs in culture. While recent studies have examined the protein complement of conditioned culture media, detailed information regarding the metabolic component of this media is lacking. METHODOLOGY/PRINCIPAL FINDINGS: Using a ¹H-Nuclear Magnetic Resonance (¹H-NMR) metabonomics approach, 32 metabolites and small compounds were identified and quantified in media conditioned by passage 11 HFFs (CMp11). A number of metabolites were secreted by HFFs with significantly higher concentrations of lactate, alanine, and formate detected in CMp11 compared to non-conditioned media. In contrast, levels of tryptophan, folate and niacinamide were depleted in CMp11 indicating the utilisation of these metabolites by HFFs. Multivariate statistical analysis of the ¹H-NMR data revealed marked age-related differences in the metabolic profile of CMp11 collected from HFFs every 24 h over 72 h. Additionally, the metabolic profile of CMp11 was altered following freezing at -20°C for 2 weeks. CM derived from passage 18 HFFs (CMp18) was found to be ineffective at supporting hESCs in an undifferentiated state beyond 5 days culture. Multivariate statistical comparison of CMp11 and CMp18 metabolic profiles enabled rapid and clear discrimination between the two media with CMp18 containing lower concentrations of lactate and alanine as well as higher concentrations of glucose and glutamine. CONCLUSIONS/SIGNIFICANCE: ¹H-NMR-based metabonomics offers a rapid and accurate method of characterising hESC conditioning media and is a valuable tool for monitoring, controlling and optimising hESC culture media preparation.

    Differences between Human Plasma and Serum Metabolite Profiles

    BACKGROUND: Human plasma and serum are widely used matrices in clinical and biological studies. However, different collecting procedures and the coagulation cascade influence concentrations of both proteins and metabolites in these matrices. The effects on metabolite concentration profiles have not been fully characterized. METHODOLOGY/PRINCIPAL FINDINGS: We analyzed the concentrations of 163 metabolites in plasma and serum samples collected simultaneously from 377 fasting individuals. To ensure data quality, 41 metabolites with low measurement stability were excluded from further analysis. In addition, plasma and corresponding serum samples from 83 individuals were re-measured in the same plates and mean correlation coefficients (r) of all metabolites between the duplicates were 0.83 and 0.80 in plasma and serum, respectively, indicating significantly better stability of plasma compared to serum (p = 0.01). Metabolite profiles from plasma and serum were clearly distinct with 104 metabolites showing significantly higher concentrations in serum. In particular, 9 metabolites showed relative concentration differences larger than 20%. Despite differences in absolute concentration between the two matrices, for most metabolites the overall correlation was high (mean r = 0.81±0.10), which reflects a proportional change in concentration. Furthermore, when two groups of individuals with different phenotypes were compared with each other using both matrices, more metabolites with significantly different concentrations could be identified in serum than in plasma. For example, when 51 type 2 diabetes (T2D) patients were compared with 326 non-T2D individuals, 15 more significantly different metabolites were found in serum, in addition to the 25 common to both matrices. CONCLUSIONS/SIGNIFICANCE: Our study shows that reproducibility was good in both plasma and serum, and better in plasma. Furthermore, as long as the same blood preparation procedure is used, either matrix should generate similar results in clinical and biological studies. The higher metabolite concentrations in serum, however, make it possible to provide more sensitive results in biomarker detection.
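
The observation that a high plasma-serum correlation can coexist with a systematic concentration difference can be illustrated with a small sketch. The numbers below are invented for illustration and are not data from the study; the helper names `pearson_r` and `mean_relative_difference` are hypothetical, and the relative difference is assumed here to be (serum − plasma) / plasma.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)


def mean_relative_difference(plasma, serum):
    """Average of (serum - plasma) / plasma over paired samples (assumed definition)."""
    return sum((s - p) / p for p, s in zip(plasma, serum)) / len(plasma)


if __name__ == "__main__":
    # Hypothetical concentrations of one metabolite in four individuals.
    plasma = [1.0, 2.0, 4.0, 8.0]
    # Serum values that are uniformly 20% higher: a purely proportional shift.
    serum = [1.2 * p for p in plasma]
    print("r =", pearson_r(plasma, serum))
    print("relative difference =", mean_relative_difference(plasma, serum))
```

A purely proportional shift leaves the correlation at its maximum while the relative difference stays at the size of the shift, which is the pattern the abstract describes for most metabolites.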

    Evaluation and Characterization of Bacterial Metabolic Dynamics with a Novel Profiling Technique, Real-Time Metabolotyping

    BACKGROUND: Environmental processes in ecosystems are dynamically altered by several metabolic responses in microorganisms, including intracellular sensing and pumping, battle for survival, and supply of or competition for nutrients. Notably, intestinal bacteria maintain homeostatic balance in mammals via multiple dynamic biochemical reactions to produce several metabolites from undigested food, and those metabolites exert various effects on mammalian cells in a time-dependent manner. We have established a method for the analysis of bacterial metabolic dynamics in real time and used it in combination with statistical NMR procedures. METHODOLOGY/PRINCIPAL FINDINGS: We developed a novel method called real-time metabolotyping (RT-MT), which performs sequential ¹H-NMR profiling and two-dimensional (2D) ¹H, ¹³C-HSQC (heteronuclear single quantum coherence) profiling during bacterial growth in an NMR tube. The profiles were evaluated with statistical methods such as Z-score analysis, principal components analysis, and time series of statistical TOtal Correlation SpectroScopY (TOCSY). In addition, using 2D ¹H, ¹³C-HSQC with the stable isotope labeling technique, we observed the metabolic kinetics of specific biochemical reactions based on time-dependent 2D kinetic profiles. Using these methods, we clarified the pathway for linolenic acid hydrogenation by a gastrointestinal bacterium, Butyrivibrio fibrisolvens. We identified trans11, cis13 conjugated linoleic acid as the intermediate of linolenic acid hydrogenation by B. fibrisolvens, based on the results of ¹³C-labeling RT-MT experiments. In addition, we showed that the biohydrogenation of polyunsaturated fatty acids serves as a defense mechanism against their toxic effects. CONCLUSIONS: RT-MT is useful for the characterization of beneficial bacteria that show potential for use as probiotics by producing bioactive compounds.

    Mass-spectrometry-based metabolomics: limitations and recommendations for future progress with particular focus on nutrition research

    Mass spectrometry (MS) techniques, because of their sensitivity and selectivity, have become methods of choice to characterize the human metabolome, and MS-based metabolomics is increasingly used to characterize the complex metabolic effects of nutrients or foods. However, progress is still hampered by many unsolved problems, most notably the lack of well-established and standardized methods or procedures and the difficulties still met in identifying the metabolites influenced by a given nutritional intervention. The purpose of this paper is to review the main obstacles limiting progress and to make recommendations to overcome them. Propositions are made to improve the mode of collection and preparation of biological samples, the coverage and quality of mass spectrometry analyses, the extraction and exploitation of the raw data, the identification of the metabolites and the biological interpretation of the results.

    Universal entropy of word ordering across linguistic families

    BACKGROUND: The language faculty is probably the most distinctive feature of our species, and endows us with a unique ability to exchange highly structured information. In written language, information is encoded by the concatenation of basic symbols under grammatical and semantic constraints. As is also the case in other natural information carriers, the resulting symbolic sequences show a delicate balance between order and disorder. That balance is determined by the interplay between the diversity of symbols and by their specific ordering in the sequences. Here we used entropy to quantify the contribution of different organizational levels to the overall statistical structure of language. METHODOLOGY/PRINCIPAL FINDINGS: We computed a relative entropy measure to quantify the degree of ordering in word sequences from languages belonging to several linguistic families. While a direct estimation of the overall entropy of language yielded values that varied for the different families considered, the relative entropy quantifying word ordering presented an almost constant value for all those families. CONCLUSIONS/SIGNIFICANCE: Our results indicate that despite the differences in the structure and vocabulary of the languages analyzed, the impact of word ordering in the structure of language is a statistical linguistic universal.
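
One rough way to picture a relative entropy of word ordering is to compare how well a text compresses before and after its words are shuffled, since shuffling keeps the vocabulary and word frequencies but destroys the order. The sketch below is only an illustration under that assumption: it uses zlib as a crude entropy estimator, which is not necessarily the estimator used in the paper, and the names `bits_per_word` and `ordering_information` are hypothetical.

```python
import random
import zlib


def bits_per_word(words):
    """Crude entropy estimate: compressed size in bits divided by word count."""
    data = " ".join(words).encode("utf-8")
    return 8 * len(zlib.compress(data, 9)) / len(words)


def ordering_information(words, seed=0):
    """Estimated bits per word carried by word order.

    Computed as the estimate for a shuffled copy (same words, order destroyed)
    minus the estimate for the original sequence.
    """
    shuffled = list(words)
    random.Random(seed).shuffle(shuffled)
    return bits_per_word(shuffled) - bits_per_word(words)


if __name__ == "__main__":
    # Toy input, invented for illustration; a real study would use long corpora.
    text = "the quick brown fox jumps over the lazy dog".split() * 200
    print("bits per word from ordering:", ordering_information(text))
```

On short or highly repetitive inputs the zlib estimate is dominated by compressor overhead and window size, so the number is only meaningful as a qualitative comparison between texts of similar length.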