Describing the complexity of systems: multi-variable "set complexity" and the information basis of systems biology
Context dependence is central to the description of complexity. Keying on the
pairwise definition of "set complexity" we use an information theory approach
to formulate general measures of systems complexity. We examine the properties
of multi-variable dependency starting with the concept of interaction
information. We then present a new measure for unbiased detection of
multi-variable dependency, "differential interaction information." This
quantity for two variables reduces to the pairwise "set complexity" previously
proposed as a context-dependent measure of information in biological systems.
We generalize it here to an arbitrary number of variables. Critical limiting
properties of the "differential interaction information" are key to the
generalization. This measure extends previous ideas about biological
information and provides a more sophisticated basis for study of complexity.
The properties of "differential interaction information" also suggest new
approaches to data analysis. Given a data set of system measurements,
differential interaction information can provide a measure of collective
dependence, which can be represented in hypergraphs describing complex system
interaction patterns. We investigate this kind of analysis using simulated data
sets. The conjoining of a generalized set complexity measure, multi-variable
dependency analysis, and hypergraphs is our central result. While our focus is
on complex biological systems, our results are applicable to any complex
system.
Comment: 44 pages, 12 figures; made revisions after peer review
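The abstract above starts from the concept of interaction information. As a minimal sketch of that starting point only (not the paper's "differential interaction information", whose definition is not given here), the three-variable quantity can be estimated from discrete samples by inclusion-exclusion over plug-in entropies. The function names and the sign convention I(X;Y;Z) = I(X;Y) - I(X;Y|Z) are assumptions of this sketch; other authors use the opposite sign.

```python
import math
from collections import Counter


def entropy(samples, columns):
    """Plug-in Shannon entropy (in bits) of the selected columns of `samples`."""
    n = len(samples)
    counts = Counter(tuple(row[c] for c in columns) for row in samples)
    return -sum(k / n * math.log2(k / n) for k in counts.values())


def interaction_information(samples):
    """Three-variable interaction information by inclusion-exclusion.

    Assumed convention: I(X;Y;Z) = I(X;Y) - I(X;Y|Z), which expands to
    H(X) + H(Y) + H(Z) - H(X,Y) - H(X,Z) - H(Y,Z) + H(X,Y,Z).
    """
    def h(*cols):
        return entropy(samples, cols)

    return (h(0) + h(1) + h(2)
            - h(0, 1) - h(0, 2) - h(1, 2)
            + h(0, 1, 2))


# Illustrative data sets (hypothetical, exhaustive enumerations):
# Z = X XOR Y: no pairwise dependence, purely three-way dependence.
xor_samples = [(x, y, x ^ y) for x in (0, 1) for y in (0, 1)]
# X = Y = Z: fully redundant variables.
copy_samples = [(x, x, x) for x in (0, 1)]
# Three independent fair bits.
independent_samples = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]

print(interaction_information(xor_samples))
print(interaction_information(copy_samples))
print(interaction_information(independent_samples))
```

Under this convention the XOR triple yields a negative value (synergy), the fully redundant triple a positive one, and independent variables zero, which is why the sign alone cannot be read as "amount of dependence" without further normalization.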
Natural variation of chronological aging in the Saccharomyces cerevisiae species reveals diet-dependent mechanisms of life span control
Aging is a complex trait of broad scientific interest, especially because of its intrinsic link with common human diseases. Pioneering work on aging-related mechanisms has been done in Saccharomyces cerevisiae, mainly through the use of deletion collections isogenic to the S288c reference strain. In this study, using a recently published high-throughput approach, we quantified chronological life span (CLS) within a collection of 58 natural strains across seven different conditions. We observed broad variability in aging, suggesting that diverse genetic and environmental factors are involved in the control of chronological aging. Two major Quantitative Trait Loci (QTLs) were identified within a biparental population obtained by crossing two natural isolates with contrasting aging behavior. Detection of these QTLs depended on the nature and concentration of the carbon sources available for growth. In the first QTL, the RIM15 gene was identified as a major regulator of aging under low-glucose conditions, lending further support to the importance of nutrient-sensing pathways in longevity control under calorie restriction. In the second QTL, we showed that the SER1 gene, encoding a conserved aminotransferase of the serine synthesis pathway not previously linked to aging, is causally associated with CLS regulation, especially under high-glucose conditions. These findings point to a new mechanism of life span control involving a trade-off between serine synthesis and aging, most likely through modulation of acetate and trehalose metabolism. More generally, they show that genetic linkage studies across natural strains represent a promising strategy to further unravel the molecular basis of aging.
Examining the growth and stable isotopes of phytoplankton and periphyton communities exposed to oil sands reclamation strategies
The impacts of oil sands processed materials (OSPM) on phytoplankton and periphyton community growth and stable carbon and nitrogen isotopes were examined. Estimates of plankton and periphyton community growth, measured as chl a and dry weight, were low and similar in reference and OSPM reclamation wetlands. The use of stable isotope analyses revealed higher δ15N of plankton and periphyton in OSPM wetlands than reference wetlands, possibly due to increased TN concentrations in some OSPM wetlands.
In the laboratory, water-soluble fractions (WSF) of two types of OSPM (mature fine tailings, MFT, and consolidated tailings, CT) and an amendment material (peat-mineral mixture), potential fill materials in wetland or end pit lake reclamation, were examined for phytoplankton community growth and stable carbon and nitrogen isotopes. All WSF treatments had higher chl a compared to reference water, and maximum growth was observed at a 50:50 ratio of peat:CT or peat:MFT. In general, WSFs of peat had the highest concentration of total nitrogen (TN) whereas WSFs of MFT had the highest total phosphorus (TP; 3x higher). The results suggested that the addition of peat as an amendment to OSPM (particularly for MFT), contributing additional TN, could improve phytoplankton community growth in oil sands reclamation. At higher percentages of MFT WSF, there was increased turbidity due to fine clay particles that likely contributed to reduced phytoplankton growth. Turbidity could be an important factor limiting phytoplankton growth and thus reducing dietary resources and biological detritus (via sedimentation) in the initial development of an end pit lake. The WSFs also promoted the unfavourable growth of filamentous algae, which was highest at intermediate concentrations of peat and CT WSFs and inhibited in MFT WSFs due to light limitation. Stable N isotopes of plankton and filamentous algae suggest that 15N enrichment of algae could be a useful indicator of nutrient inputs, including OSPM seepage into natural aquatic systems, for oil sands regional monitoring programs.
Functions on Probabilistic Graphical Models
Probabilistic graphical models are tools used to represent the probability distribution of a vector of random variables X = (X1,..., XN). In this paper we introduce functions f(x1,..., xN) defined over the given vector. These functions are themselves random variables. The main result of the paper is an algorithm for finding the expected value and other moments for some classes of f(x1,..., xN). The possible applications of that algorithm are discussed. Specifically, we use it to analyze the entropy of X and to compute the relative entropy of two probability distributions of the same vector X. Finally, open problems and possible topics of future research are discussed.
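The abstract above treats entropy and relative entropy as expected values of functions f(x1,..., xN) over a graphical model. A minimal sketch of that idea follows, assuming a hypothetical three-variable binary chain X1 -> X2 -> X3 with made-up probabilities. It computes the expectations by brute-force enumeration of all states, which is not the paper's algorithm; it only illustrates that entropy is E[-log2 p(X)] and relative entropy is E_p[log2 p(X)/q(X)].

```python
import itertools
import math

# Hypothetical binary chain X1 -> X2 -> X3; all numbers are illustrative.
P_INIT = {0: 0.5, 1: 0.5}
P_TRANS = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # P(next | previous)
Q_INIT = {0: 0.7, 1: 0.3}
Q_TRANS = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.5, 1: 0.5}}


def joint(x, init, trans):
    """Joint probability of the state tuple x under the chain factorization."""
    prob = init[x[0]]
    for prev, cur in zip(x, x[1:]):
        prob *= trans[prev][cur]
    return prob


def expectation(f, init, trans, n=3):
    """E[f(X)] by enumerating all 2**n states (feasible only for small n)."""
    return sum(joint(x, init, trans) * f(x)
               for x in itertools.product((0, 1), repeat=n))


def entropy(init, trans):
    """Entropy in bits, written as the expectation of f(x) = -log2 p(x)."""
    return expectation(lambda x: -math.log2(joint(x, init, trans)), init, trans)


def relative_entropy(init_p, trans_p, init_q, trans_q):
    """KL divergence D(p || q) in bits, as E_p[log2 p(x) / q(x)]."""
    return expectation(
        lambda x: math.log2(joint(x, init_p, trans_p) / joint(x, init_q, trans_q)),
        init_p, trans_p)


print(entropy(P_INIT, P_TRANS))
print(relative_entropy(P_INIT, P_TRANS, Q_INIT, Q_TRANS))
```

Because -log2 p(x) is a sum of one term per factor of the model, the same entropy can be obtained from the chain rule H(X1) + H(X2|X1) + H(X3|X2) without enumerating joint states; exploiting that kind of additive structure is what makes such expectations tractable on larger models.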
Relations between the set-complexity and the structure of graphs and their sub-graphs
We describe some new conceptual tools for the rigorous, mathematical description of the “set-complexity” of graphs. This set-complexity has been shown previously to be a useful measure for analyzing some biological networks, and for discussing biological information quantitatively. The advances described here allow us to define some significant relationships between the set-complexity measure and the structure of graphs, and of their component sub-graphs. We show here that modular graph structures tend to maximize the set-complexity of graphs. We point out the relationship between modularity and redundancy, and discuss the significance of set-complexity in this regard. We specifically discuss the relationship between complexity and entropy in the case of complete-bipartite graphs, and present a new method for constructing highly complex, binary graphs. These results can be extended to the case of ternary graphs, and to other multi-edge graphs, which are fundamentally more relevant to biological structures and systems. Finally, our results lead us to an approach for extracting high complexity modular graphs from large, noisy graphs with low information content. We illustrate this approach with two examples.
Complexity of Networks II: The Set Complexity of Edge-Colored Graphs
We previously introduced the concept of “set-complexity”, based on a context-dependent measure of information, and used this concept to describe the complexity of gene interaction networks. In the previous paper in this series we analyzed the set-complexity of binary graphs. Here we extend this analysis to graphs with multi-colored edges that more closely match biological structures such as gene interaction networks. All highly complex graphs by this measure exhibit a modular structure. A principal result of this work is that for the most complex graphs of a given size the number of edge colors is equal to the number of “modules” of the graph. Complete multipartite graphs (CMGs) are defined and analyzed, and the relation between complexity and structure of these graphs is examined in detail. We establish that the mutual information between any two nodes in a CMG can be fully expressed in terms of entropy, and present an explicit expression for the set complexity of CMGs (Theorem 3). An algorithm for generating highly complex graphs from CMGs is described. We establish several theorems relating these concepts and connecting complex graphs with a variety of practical network properties. In exploring the relation between symmetry and complexity we use the idea of a similarity matrix and its spectrum for highly complex graphs.
New methods for finding associations in large data sets: Generalizing the maximal information coefficient (MIC)
We propose here a natural, but substantive, extension of the MIC. Defined for two variables, MIC has a distinct advantage for detecting potentially complex dependencies. Our extension provides a similar means for detecting dependencies among three variables. This in itself is an important step for practical applications. We show that by merging two concepts, the interaction information, which is a generalization of the mutual information to three variables, and the normalized information distance, which measures informational sharing between two variables, we can extend the fundamental idea of MIC. Our results also exhibit some attractive properties that should be useful for practical applications in data analysis. Finally, the conceptual and mathematical framework presented here can be used to generalize the idea of MIC to the multi-variable case.
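For orientation, the two-variable idea that the abstract above extends can be sketched as follows: bin both variables on a grid, compute the mutual information of the binned data, normalize by log2 of the smaller grid dimension, and keep the best score over grids up to a size bound. This is a simplified stand-in, not the published MIC procedure and not the paper's three-variable extension: it assumes equal-width bins only (MIC also optimizes where the grid lines are placed), and the function names and the n**0.6 cell bound are assumptions of this sketch.

```python
import math
import random
from collections import Counter


def equal_width_bins(values, k):
    """Assign each value to one of k equal-width bins (labels 0..k-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # avoid division by zero for constant data
    return [min(int((v - lo) / width), k - 1) for v in values]


def mutual_information(a, b):
    """Plug-in mutual information (bits) between two label sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log2(c * n / (pa[i] * pb[j]))
               for (i, j), c in pab.items())


def approx_mic(x, y):
    """Best normalized grid mutual information over equal-width grids.

    Simplification: only equal-width grids with nx * ny <= n**0.6 are tried.
    """
    max_cells = len(x) ** 0.6
    best = 0.0
    for nx in range(2, int(max_cells // 2) + 1):
        bx = equal_width_bins(x, nx)
        for ny in range(2, int(max_cells // nx) + 1):
            by = equal_width_bins(y, ny)
            score = mutual_information(bx, by) / math.log2(min(nx, ny))
            best = max(best, score)
    return best


# Illustrative inputs (hypothetical): a line, a parabola, and pure noise.
xs = [i / 499 for i in range(500)]
rng = random.Random(0)
noise = [rng.random() for _ in xs]

print(approx_mic(xs, xs))                            # linear relation
print(approx_mic(xs, [(v - 0.5) ** 2 for v in xs]))  # non-monotonic relation
print(approx_mic(xs, noise))                         # no relation
```

The normalization by log2(min(nx, ny)) keeps every grid's score at or below 1, so scores from grids of different shapes can be compared when taking the maximum.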