466 research outputs found

    Simultaneous learning of instantaneous and time-delayed genetic interactions using novel information theoretic scoring technique

    BACKGROUND: Understanding gene interactions is a fundamental question in systems biology. Currently, modeling of gene regulations using the Bayesian Network (BN) formalism assumes that genes interact either instantaneously or with a certain amount of time delay. However, in reality, biological regulations, both instantaneous and time-delayed, occur simultaneously. A framework that can detect and model both types of interactions simultaneously would represent gene regulatory networks more accurately. RESULTS: In this paper, we introduce a framework based on the Bayesian Network (BN) formalism that can represent both instantaneous and time-delayed interactions between genes simultaneously. A novel scoring metric having firm mathematical underpinnings is also proposed that, unlike other recent methods, can score both interactions concurrently and takes into account the reality that multiple regulators can regulate a gene jointly, rather than in an isolated pairwise manner. Further, a gene regulatory network (GRN) inference method employing an evolutionary search that makes use of the framework and the scoring metric is also presented. CONCLUSION: By taking into consideration the biological fact that both instantaneous and time-delayed regulations can occur among genes, our approach models gene interactions with greater accuracy. The proposed framework is efficient and can be used to infer gene networks having multiple orders of instantaneous and time-delayed regulations simultaneously. Experiments are carried out using three different synthetic networks (with three different mechanisms for generating synthetic data) as well as real-life networks of Saccharomyces cerevisiae, E. coli and cyanobacteria gene expression data. The results show the effectiveness of our approach.
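
The point that several regulators can act on a gene jointly, some instantaneously and some with a delay, can be illustrated with a small sketch. This is not the paper's scoring metric: it uses plain empirical mutual information on discrete data, and the function name `joint_parent_mi`, the genes `a`, `b`, `c`, and the toy series are all hypothetical, constructed only for illustration.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Empirical Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired discrete samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def joint_parent_mi(series, target, instantaneous=(), delayed=(), lag=1):
    """MI between the target at time t and the JOINT state of its parents.

    Instantaneous parents are read at time t, delayed parents at time t - lag.
    `series` maps a gene name to a list of discretized expression levels.
    """
    length = len(series[target])
    child = [series[target][t] for t in range(lag, length)]
    parents = [
        tuple(series[g][t] for g in instantaneous)
        + tuple(series[g][t - lag] for g in delayed)
        for t in range(lag, length)
    ]
    return mutual_information(parents, child)


# Hypothetical toy data: c[t] = a[t] XOR b[t-1], with every combination of
# (a[t], b[t-1]) appearing equally often.
pairs = list(product([0, 1], repeat=2)) * 2
series = {
    "a": [0] + [p[0] for p in pairs],
    "b": [p[1] for p in pairs] + [0],
    "c": [0] + [p[0] ^ p[1] for p in pairs],
}

print(joint_parent_mi(series, "c", instantaneous=("a",)))
print(joint_parent_mi(series, "c", delayed=("b",)))
print(joint_parent_mi(series, "c", instantaneous=("a",), delayed=("b",)))
```

On this toy series each regulator taken alone carries 0 bits about `c`, while the instantaneous and delayed regulators taken together carry 1 bit, which is the kind of dependence a purely pairwise score would miss.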

    MICRAT: A Novel Algorithm for Inferring Gene Regulatory Networks Using Time Series Gene Expression Data

    Background: Reconstruction of gene regulatory networks (GRNs), also known as reverse engineering of GRNs, aims to infer the potential regulation relationships between genes. With the development of biotechnology, such as gene chip microarray and RNA-sequencing, the high-throughput data generated provide us with more opportunities to infer the gene-gene interaction relationships using gene expression data and hence understand the underlying mechanism of biological processes. Gene regulatory networks are known to exhibit a multiplicity of interaction mechanisms which include functional and non-functional, and linear and non-linear relationships. Meanwhile, the regulatory interactions between genes and gene products are not spontaneous since various processes involved in producing fully functional and measurable concentrations of transcriptional factors/proteins lead to a delay in gene regulation. Many different approaches for reconstructing GRNs have been proposed, but the existing GRN inference approaches such as probabilistic Boolean networks and dynamic Bayesian networks have various limitations and relatively low accuracy. Inferring GRNs from time series microarray data or RNA-sequencing data remains a very challenging inverse problem due to its nonlinearity, high dimensionality, sparse and noisy data, and significant computational cost, which motivates us to develop more effective inference methods. Results: We developed a novel algorithm, MICRAT (Maximal Information coefficient with Conditional Relative Average entropy and Time-series mutual information), for inferring GRNs from time series gene expression data. Maximal information coefficient (MIC) is an effective measure of dependence for two-variable relationships. It captures a wide range of associations, both functional and non-functional, and thus has good performance on measuring the dependence between two genes. Our approach mainly includes two procedures. 
Firstly, it employs maximal information coefficient for constructing an undirected graph to represent the underlying relationships between genes. Secondly, it directs the edges in the undirected graph for inferring regulators and their targets. In this procedure, the conditional relative average entropies of each pair of nodes (or genes) are employed to indicate the directions of edges. Since the time delay might exist in the expression of regulators and target genes, time series mutual information is combined to cooperatively direct the edges for inferring the potential regulators and their targets. We evaluated the performance of MICRAT by applying it to synthetic datasets as well as real gene expression data and compared it with other GRN inference methods. We inferred five 10-gene and five 100-gene networks from the DREAM4 challenge that were generated using the gene expression simulator GeneNetWeaver (GNW). MICRAT was also used to reconstruct GRNs on real gene expression data including part of the DNA damage response pathway (SOS DNA repair network) and an experimental dataset in E. coli. The results showed that MICRAT significantly improved the inference accuracy compared with other inference methods such as TDBN. Conclusion: In this work, a novel algorithm, MICRAT, for inferring GRNs from time series gene expression data was proposed by taking into account dependence and time delay of expressions of a regulator and its target genes. This approach employed maximal information coefficients for reconstructing an undirected graph to represent the underlying relationships between genes. The edges were directed by combining conditional relative average entropy with time course mutual information of pairs of genes. The proposed algorithm was evaluated on the benchmark GRNs provided by the DREAM4 challenge and part of the real SOS DNA repair network in E. coli. 
The experimental study showed that our approach was comparable to other methods on 10-gene datasets and outperformed other methods on 100-gene datasets in GRN inference from time series datasets.
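
The idea of using time-series mutual information to orient an edge can be sketched roughly as follows. This is a simplified illustration and not the MICRAT algorithm: it omits the maximal information coefficient and the conditional relative average entropy entirely, and the helper names `lagged_mi` and `direct_edge`, the fixed lag of 1, and the toy series are assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Empirical Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired discrete samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def lagged_mi(source, target, lag=1):
    """MI between source at time t and target at time t + lag (lag >= 1)."""
    return mutual_information(source[:-lag], target[lag:])


def direct_edge(x, y, lag=1):
    """Orient an undirected edge x - y toward the better-predicted gene.

    Returns "x->y" if the past of x is more informative about y than the
    past of y is about x, "y->x" in the opposite case, and "undirected"
    when the two lagged scores tie.
    """
    forward = lagged_mi(x, y, lag)
    backward = lagged_mi(y, x, lag)
    if forward > backward:
        return "x->y"
    if backward > forward:
        return "y->x"
    return "undirected"


# Hypothetical discretized expression levels: y copies x with a one-step delay.
x = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1]
y = [0] + x[:-1]

print(direct_edge(x, y))
```

In practice expression data are continuous and short, so the discretization scheme and the choice of lag would strongly affect such an estimate; the sketch only shows the direction-by-delay reasoning.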

    Big data analytics in computational biology and bioinformatics

    Big data analytics in computational biology and bioinformatics refers to an array of operations including biological pattern discovery, classification, prediction, inference, clustering as well as data mining in the cloud, among others. This dissertation addresses big data analytics by investigating two important operations, namely pattern discovery and network inference. The dissertation starts by focusing on biological pattern discovery at a genomic scale. Research reveals that the secondary structure in non-coding RNA (ncRNA) is more conserved during evolution than its primary nucleotide sequence. Using a covariance model approach, the stems and loops of an ncRNA secondary structure are represented as a statistical image against which an entire genome can be efficiently scanned for matching patterns. The covariance model approach is then further extended, in combination with a structural clustering algorithm and a random forests classifier, to perform genome-wide search for similarities in ncRNA tertiary structures. The dissertation then presents methods for gene network inference. Vast bodies of genomic data containing gene and protein expression patterns are now available for analysis. One challenge is to apply efficient methodologies to uncover more knowledge about the cellular functions. Very little is known concerning how genes regulate cellular activities. A gene regulatory network (GRN) can be represented by a directed graph in which each node is a gene and each edge or link is a regulatory effect that one gene has on another gene. By evaluating gene expression patterns, researchers perform in silico data analyses in systems biology, in particular GRN inference, where the “reverse engineering” is involved in predicting how a system works by looking at the system output alone. Many algorithmic and statistical approaches have been developed to computationally reverse engineer biological systems. 
However, there are no known bioinformatics tools capable of performing perfect GRN inference. Here, extensive experiments are conducted to evaluate and compare recent bioinformatics tools for inferring GRNs from time-series gene expression data. Standard performance metrics for these tools based on both simulated and real data sets are generally low, suggesting that further efforts are needed to develop more reliable GRN inference tools. It is also observed that using multiple tools together can help identify true regulatory interactions between genes, a finding consistent with those reported in the literature. Finally, the dissertation discusses and presents a framework for parallelizing GRN inference methods using Apache Hadoop in a cloud environment.
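
The representation of a GRN as a directed graph, the scoring of a tool against a reference network, and the combination of several tools can be sketched as below. The dissertation does not specify how tools were combined, so simple majority voting is an assumption here, and the gene names, the three "tools", and all edge sets are invented toy data rather than results from the work.

```python
from collections import Counter


def precision_recall(predicted, truth):
    """Precision and recall of a predicted directed edge set against a reference."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def consensus_edges(predictions, min_votes=2):
    """Keep the directed edges reported by at least `min_votes` tools."""
    votes = Counter(edge for edges in predictions for edge in edges)
    return {edge for edge, count in votes.items() if count >= min_votes}


# A GRN as a set of directed edges (regulator, target). Toy reference network.
truth = {("g1", "g2"), ("g2", "g3"), ("g1", "g4")}

# Hypothetical outputs of three inference tools, each with one wrong edge.
tool_a = {("g1", "g2"), ("g3", "g2"), ("g1", "g4")}
tool_b = {("g1", "g2"), ("g2", "g3"), ("g4", "g1")}
tool_c = {("g2", "g3"), ("g1", "g4"), ("g3", "g1")}

for name, edges in [("tool_a", tool_a), ("tool_b", tool_b), ("tool_c", tool_c)]:
    print(name, precision_recall(edges, truth))

combined = consensus_edges([tool_a, tool_b, tool_c], min_votes=2)
print("consensus", precision_recall(combined, truth))
```

The toy data are constructed so that the tools make different mistakes, which is the situation in which voting helps; tools that share the same errors would not benefit in this way.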

    Data based identification and prediction of nonlinear and complex dynamical systems

    We thank Dr. R. Yang (formerly at ASU), Dr. R.-Q. Su (formerly at ASU), and Mr. Zhesi Shen for their contributions to a number of original papers on which this Review is partly based. This work was supported by ARO under Grant No. W911NF-14-1-0504. W.-X. Wang was also supported by NSFC under Grants No. 61573064 and No. 61074116, as well as by the Fundamental Research Funds for the Central Universities, Beijing Nova Programme. Peer reviewed. Postprint.

    Neural Circuit Dynamics and Ensemble Coding in the Locust and Fruit Fly Olfactory System

    Raw sensory information is usually processed and reformatted by an organism’s brain to carry out tasks like identification, discrimination, tracking and storage. The work presented in this dissertation focuses on the processing strategies of neural circuits in the early olfactory system in two insects, the locust and the fruit fly. Projection neurons (PNs) in the antennal lobe (AL) respond to an odor presented to the locust’s antennae by firing in slow information-carrying temporal patterns, consistent across trials. Their downstream targets, the Kenyon cells (KCs) of the mushroom body (MB), receive input from large ensembles of transiently synchronous PNs at a time. The information arrives in slices of time corresponding to cycles of oscillatory activity originating in the AL. In the first part of the thesis, ensemble-level analysis techniques are used to understand how the AL-MB system deals with the problem of identifying odors across different concentrations. Individual PN odor responses can vary dramatically with concentration, but invariant patterns in PN ensemble responses are shown to allow odor identity to be extracted across a wide range of intensities by the KCs. Second, the sensitivity of the early olfactory system to stimulus history is examined. The PN ensemble and the KCs are found capable of tracking an odor in most conditions where it is pulsed or overlapping with another, but they occasionally fail (are masked) or reach intermediate states distinct from those seen for the odors presented alone or in a static mixture. The last part of the thesis focuses on the development of new recording techniques in the fruit fly, an organism with well-studied genetics and behavior. Genetically expressed fluorescent sensors of calcium offer the best available option to study ensemble activity in the fly. 
Here, simultaneous electrophysiology and two-photon imaging are used to estimate the correlation between G-CaMP, a popular genetically expressible calcium sensor, and electrical activity in PNs. The sensor is found to have poor temporal resolution and to miss significant spiking activity. More generally, this combination of electrophysiology and imaging enables explorations of functional connectivity and calibrated imaging of ensemble activity in the fruit fly.

    An information-theoretic approach to understanding the neural coding of relevant tactile features

    Objective: Traditional theories in neuroscience state that tactile afferents present in the glabrous skin of the human hand encode tactile information following a submodality segregation strategy, meaning that each modality (e.g., motion, vibration, shape, ...) is encoded by a different afferent class. Modern theories suggest a submodality convergence instead, in which different afferent classes work together to capture information about the environment through tactile sense. Typically, studies involve electrophysiological recordings of tens of afferents. At the same time, the human hand is filled with around 17,000 afferents. In this thesis, we want to tackle the theoretical gap this poses. Specifically, we aim to address whether the peripheral nervous system relies on population coding to represent tactile information and whether such population coding enables us to disambiguate submodality convergence against the classical segregation. Approach: Understanding the encoding and flow of information in the nervous system is one of the main challenges of modern neuroscience. Neural signals are highly variable and may be non-linear. Moreover, there exist several candidate codes compatible with sensory and behavioral events. For example, they can rely on single cells or populations and also on rate or timing precision. Information-theoretic methods can capture non-linearities while being model independent, statistically robust, and mathematically well-grounded, becoming an ideal candidate to design pipelines for analyzing neural data. Despite information-theoretic methods being powerful for our objective, the vast majority of neural signals we can acquire from living systems makes analyses highly problem-specific. This is so because of the rich variety of biological processes that are involved (continuous, discrete, electrical, chemical, optical, ...). 
Main results: The first step towards solving the aforementioned challenges was to have a solid methodology we could trust and rely on. Consequently, the first deliverable from this thesis is a toolbox that gathers classical and state-of-the-art information-theoretic approaches and blends them with advanced machine learning tools to process and analyze neural data. Moreover, this toolbox also provides specific guidance on calcium imaging and electrophysiology analyses, encompassing both simulated and experimental data. We then designed an information-theoretic pipeline to analyze large-scale simulations of the tactile afferents that overcomes the current limitations of experimental studies in the field of touch and the peripheral nervous system. We dissected the importance of population coding for the different afferent classes, given their spatiotemporal dynamics. We also demonstrated that different afferent classes encode information simultaneously about very simple features, and that combining classes increases information levels, adding support to the submodality convergence theory. Significance: Fundamental knowledge about touch is essential both to design human-like robots exhibiting naturalistic exploration behavior and prostheses that can properly integrate and provide their user with relevant and useful information to interact with their environment. Demonstrating that the peripheral nervous system relies on heterogeneous population coding can change the design paradigm of artificial systems, both in terms of which sensors to choose and which algorithms to use, especially in neuromorphic implementations.

    Information-theoretic Reasoning in Distributed and Autonomous Systems

    The increasing prevalence of distributed and autonomous systems is transforming decision making in industries as diverse as agriculture, environmental monitoring, and healthcare. Despite significant efforts, challenges remain in robustly planning under uncertainty. In this thesis, we present a number of information-theoretic decision rules for improving the analysis and control of complex adaptive systems. We begin with the problem of quantifying the data storage (memory) and transfer (communication) within information processing systems. We develop an information-theoretic framework to study nonlinear interactions within cooperative and adversarial scenarios, solely from observations of each agent's dynamics. This framework is applied to simulations of robotic soccer games, where the measures reveal insights into team performance, including correlations of the information dynamics to the scoreline. We then study the communication between processes with latent nonlinear dynamics that are observed only through a filter. By using methods from differential topology, we show that the information-theoretic measures commonly used to infer communication in observed systems can also be used in certain partially observed systems. For robotic environmental monitoring, the quality of data depends on the placement of sensors. These locations can be improved by either better estimating the quality of future viewpoints or by a team of robots operating concurrently. By robustly handling the uncertainty of sensor model measurements, we are able to present the first end-to-end robotic system for autonomously tracking small dynamic animals, with a performance comparable to human trackers. We then solve the issue of coordinating multi-robot systems through distributed optimisation techniques. These allow us to develop non-myopic robot trajectories for these tasks and, importantly, show that these algorithms provide guarantees for convergence rates to the optimal payoff sequence.

    Social transmission of foraging behaviour in bottlenose dolphins and its interplay with climate change

    Cultural behaviour, i.e., that which is transmitted socially among conspecifics, is found in a variety of taxa, including cetaceans. Different methods have been used to detect social learning in animal populations. ‘Network-based diffusion analysis’ (NBDA), for example, provides a statistical framework with which the importance of social learning on the spread of a behaviour can be quantified. It infers social learning if the diffusion of behaviour follows the social network and therefore relies on accurate association data among individuals. Incomplete association data can lead to uncertainty over the strengths of connections among individuals. Restricting analyses to only include individuals above a certain threshold of sightings can minimize such uncertainty, but at the same time reduce the power of NBDA to detect learning when linking individuals are removed from the network. Following my General Introduction, Chapter 2 of this thesis therefore provides a tool for researchers to select an appropriate threshold for the inclusion of individuals that maximizes the power of NBDA to detect social learning. In the study of the rise and spread of cultural behaviour, ecology and genetics are potentially confounding factors as they too can drive behavioural variation between individuals, communities and populations. I use a multi-network version of NBDA, which can account for these potential confounds by including networks reflecting association patterns, genetic relatedness and habitat use, in Chapters 3 and 4 to investigate the spread of two foraging strategies, ‘shelling’ and ‘sponging’, in a population of Indo-Pacific bottlenose dolphins (Tursiops aduncus) in the western gulf of Shark Bay, Western Australia, between 2007 and 2018. 
Shelling (Chapter 3) appears to spread horizontally among associated individuals, which stands in stark contrast to the predominantly vertically transmitted foraging strategies, from mother to offspring, in Shark Bay dolphins and indeed toothed whales in general. My study provides the first quantitative evidence of horizontal transmission in any toothed whale species and suggests similarities in the cultural nature of cetaceans and great apes, which rely extensively on both vertical and horizontal social learning. Conversely, the findings presented in Chapter 4 suggest vertical social transmission of sponging from mother to primarily female offspring, confirming the results of previous research using different methods. Chapters 3 and 4 illustrate how long-term data sets on individual associations, habitat use and genetics, in combination with new statistical tools like NBDA, provide an ideal framework to assess the spread of behaviour in free-ranging animal populations. In Chapter 5, I investigate the impacts of a marine heatwave, which led to catastrophic losses of habitat-forming seagrass beds and mass mortalities of fish and invertebrates in Shark Bay, on the vital rates of the resident dolphin population. Long-term demographic data and capture-recapture analyses on data collected before and after the heatwave indicate immediate and on-going reductions in both survival and reproductive rates within the dolphin population, presumably due to the cascading effects of the heatwave on lower-trophic organisms combined with a lack of ecosystem recovery. Remarkably, survival rates of sponging dolphins appear less adversely impacted compared to those of non-spongers, suggesting that their foraging niche may have buffered them against more negative impacts. Whether or not culturally different communities within a population may respond differently to environmental change remains an exciting avenue of research in the future. 
Finally, I discuss the broader ramifications of this thesis in the General Discussion and suggest further directions in the study of cultural behaviour in bottlenose dolphins.