167,569 research outputs found

    Big data analytics in computational biology and bioinformatics

    Get PDF
    Big data analytics in computational biology and bioinformatics refers to an array of operations including biological pattern discovery, classification, prediction, inference, clustering as well as data mining in the cloud, among others. This dissertation addresses big data analytics by investigating two important operations, namely pattern discovery and network inference. The dissertation starts by focusing on biological pattern discovery at a genomic scale. Research reveals that the secondary structure in non-coding RNA (ncRNA) is more conserved during evolution than its primary nucleotide sequence. Using a covariance model approach, the stems and loops of an ncRNA secondary structure are represented as a statistical image against which an entire genome can be efficiently scanned for matching patterns. The covariance model approach is then further extended, in combination with a structural clustering algorithm and a random forests classifier, to perform genome-wide search for similarities in ncRNA tertiary structures. The dissertation then presents methods for gene network inference. Vast bodies of genomic data containing gene and protein expression patterns are now available for analysis. One challenge is to apply efficient methodologies to uncover more knowledge about the cellular functions. Very little is known concerning how genes regulate cellular activities. A gene regulatory network (GRN) can be represented by a directed graph in which each node is a gene and each edge or link is a regulatory effect that one gene has on another gene. By evaluating gene expression patterns, researchers perform in silico data analyses in systems biology, in particular GRN inference, where the “reverse engineering” is involved in predicting how a system works by looking at the system output alone. Many algorithmic and statistical approaches have been developed to computationally reverse engineer biological systems. However, there are no known bioin-formatics tools capable of performing perfect GRN inference. Here, extensive experiments are conducted to evaluate and compare recent bioinformatics tools for inferring GRNs from time-series gene expression data. Standard performance metrics for these tools based on both simulated and real data sets are generally low, suggesting that further efforts are needed to develop more reliable GRN inference tools. It is also observed that using multiple tools together can help identify true regulatory interactions between genes, a finding consistent with those reported in the literature. Finally, the dissertation discusses and presents a framework for parallelizing GRN inference methods using Apache Hadoop in a cloud environment

    Algorithms of causal inference for the analysis of effective connectivity among brain regions

    Get PDF
    In recent years, powerful general algorithms of causal inference have been developed. In particular, in the framework of Pearl’s causality, algorithms of inductive causation (IC and IC*) provide a procedure to determine which causal connections among nodes in a network can be inferred from empirical observations even in the presence of latent variables, indicating the limits of what can be learned without active manipulation of the system. These algorithms can in principle become important complements to established techniques such as Granger causality and Dynamic Causal Modeling (DCM) to analyze causal influences (effective connectivity) among brain regions. However, their application to dynamic processes has not been yet examined. Here we study how to apply these algorithms to time-varying signals such as electrophysiological or neuroimaging signals. We propose a new algorithm which combines the basic principles of the previous algorithms with Granger causality to obtain a representation of the causal relations suited to dynamic processes. Furthermore, we use graphical criteria to predict dynamic statistical dependencies between the signals from the causal structure. We show how some problems for causal inference from neural signals (e.g., measurement noise, hemodynamic responses, and time aggregation) can be understood in a general graphical approach. Focusing on the effect of spatial aggregation, we show that when causal inference is performed at a coarser scale than the one at which the neural sources interact, results strongly depend on the degree of integration of the neural sources aggregated in the signals, and thus characterize more the intra-areal properties than the interactions among regions. We finally discuss how the explicit consideration of latent processes contributes to understand Granger causality and DCM as well as to distinguish functional and effective connectivity

    Contamination source inference in water distribution networks

    Get PDF
    We study the inference of the origin and the pattern of contamination in water distribution networks. We assume a simplified model for the dyanmics of the contamination spread inside a water distribution network, and assume that at some random location a sensor detects the presence of contaminants. We transform the source location problem into an optimization problem by considering discrete times and a binary contaminated/not contaminated state for the nodes of the network. The resulting problem is solved by Mixed Integer Linear Programming. We test our results on random networks as well as in the Modena city network

    Really Natural Linear Indexed Type Checking

    Full text link
    Recent works have shown the power of linear indexed type systems for enforcing complex program properties. These systems combine linear types with a language of type-level indices, allowing more fine-grained analyses. Such systems have been fruitfully applied in diverse domains, including implicit complexity and differential privacy. A natural way to enhance the expressiveness of this approach is by allowing the indices to depend on runtime information, in the spirit of dependent types. This approach is used in DFuzz, a language for differential privacy. The DFuzz type system relies on an index language supporting real and natural number arithmetic over constants and variables. Moreover, DFuzz uses a subtyping mechanism to make types more flexible. By themselves, linearity, dependency, and subtyping each require delicate handling when performing type checking or type inference; their combination increases this challenge substantially, as the features can interact in non-trivial ways. In this paper, we study the type-checking problem for DFuzz. We show how we can reduce type checking for (a simple extension of) DFuzz to constraint solving over a first-order theory of naturals and real numbers which, although undecidable, can often be handled in practice by standard numeric solvers

    Towards a navigational logic for graphical structures

    Get PDF
    One of the main advantages of the Logic of Nested Conditions, defined by Habel and Pennemann, for reasoning about graphs, is its generality: this logic can be used in the framework of many classes of graphs and graphical structures. It is enough that the category of these structures satisfies certain basic conditions. In a previous paper [14], we extended this logic to be able to deal with graph properties including paths, but this extension was only defined for the category of untyped directed graphs. In addition it seemed difficult to talk about paths abstractly, that is, independently of the given category of graphical structures. In this paper we approach this problem. In particular, given an arbitrary category of graphical structures, we assume that for every object of this category there is an associated edge relation that can be used to define a path relation. Moreover, we consider that edges have some kind of labels and paths can be specified by associating them to a set of label sequences. Then, after the presentation of that general framework, we show how it can be applied to several classes of graphs. Moreover, we present a set of sound inference rules for reasoning in the logic.Peer ReviewedPostprint (author's final draft
    • …
    corecore