139 research outputs found

    Reconstructing Causal Biological Networks through Active Learning

    <div><p>Reverse-engineering of biological networks is a central problem in systems biology. Intervention data, such as gene knockouts or knockdowns, are typically used to tease apart causal relationships among genes. Under time or resource constraints, one must carefully choose which intervention experiments to carry out. Previous approaches for selecting the most informative interventions have largely focused on discrete Bayesian networks. However, continuous Bayesian networks are of great practical interest, especially in the study of complex biological systems and their quantitative properties. In this work, we present an efficient, information-theoretic active learning algorithm for Gaussian Bayesian networks (GBNs), which serve as important models for gene regulatory networks. In addition to providing linear-algebraic insights unique to GBNs, leading to significant runtime improvements, we demonstrate the effectiveness of our method on data simulated with GBNs and on the DREAM4 network inference challenge data sets. Compared to random selection of intervention experiments, our method generally recovers the underlying network structure faster and converges faster to the final distribution of confidence scores over candidate graph structures obtained with the full data.</p></div>
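The intervention-selection idea in the abstract can be illustrated with a small, hypothetical discrete example (the paper works with continuous GBNs; the graph set, outcome model, and all names below are invented for illustration): each candidate knockout is scored by the mutual information between the current graph posterior and the predicted experimental outcome, and the highest-scoring experiment is chosen next.

```python
import numpy as np

# Current belief over three candidate graph structures (illustrative).
belief = np.array([0.5, 0.3, 0.2])

# outcome_lik[e][g, y]: P(outcome y | graph g) under intervention e.
# "knockout_A" discriminates graph 0 from the rest; "knockout_B" does not.
outcome_lik = {
    "knockout_A": np.array([[0.9, 0.1],
                            [0.2, 0.8],
                            [0.2, 0.8]]),
    "knockout_B": np.array([[0.5, 0.5],
                            [0.5, 0.5],
                            [0.5, 0.5]]),
}

def info_gain(belief, lik):
    """Mutual information I(graph; outcome), i.e. the expected reduction
    in uncertainty about the structure from running this experiment."""
    marginal = belief @ lik          # P(y) under the current belief
    ratio = np.log(lik / marginal)   # assumes strictly positive entries
    return float(np.sum(belief[:, None] * lik * ratio))

# Greedy choice: run the experiment with the largest expected information gain.
best = max(outcome_lik, key=lambda e: info_gain(belief, outcome_lik[e]))
```

The uninformative knockout has identical outcome distributions under every graph, so its information gain is exactly zero and the discriminating knockout is selected.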

    Reconstruction performance on simulated data from a GBN.

    <p>We compared edge prediction performance between active and random learners, summarized over five trials. The dotted lines are drawn at one standard deviation from the mean in each direction. The active learner achieves higher accuracy and faster convergence than the random learner.</p>

    Reconstruction performance on single cell gene expression data.

    <p>We applied our Bayesian structure learning algorithm based on GBNs to uncover the signaling pathway of 11 human proteins from expression data provided by Sachs et al. [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0150611#pone.0150611.ref005" target="_blank">5</a>]. MAP estimates of edge weights, calculated from 1,000 posterior graph samples, are used to generate a ranked list of (directed) edges for evaluating accuracy. The data points for GIES are taken from Hauser and Bühlmann [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0150611#pone.0150611.ref019" target="_blank">19</a>] for comparison. The result suggests that GBNs can uncover causal edges in real biological networks, and that our approach is more effective than GIES.</p>
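A ranked edge list can be derived from posterior graph samples along these lines; the sketch below uses simple edge-inclusion frequencies as a stand-in for the paper's MAP edge-weight estimates, and the sample data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_samples = 4, 1000

# Stand-in posterior DAG samples: edge (0, 1) appears often, others rarely.
base_prob = np.full((n_nodes, n_nodes), 0.1)
base_prob[0, 1] = 0.9
np.fill_diagonal(base_prob, 0.0)          # no self-loops
samples = rng.random((n_samples, n_nodes, n_nodes)) < base_prob

# Posterior inclusion probability of each directed edge (Monte Carlo estimate).
edge_prob = samples.mean(axis=0)

# Ranked list of directed edges, most confident first, for accuracy evaluation.
ranked = sorted(((i, j) for i in range(n_nodes) for j in range(n_nodes) if i != j),
                key=lambda e: edge_prob[e], reverse=True)
```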

    Reconstruction performance on DREAM4 benchmark data.

    <p>The results are summarized over five trials. The dotted lines are drawn at one standard deviation from the mean in each direction. The active learner achieves higher accuracy and faster convergence than the random learner.</p>

    Active learning framework for network reconstruction.

    <p>We first estimate our belief over candidate graph structures based on the initial data set that contains observational and/or intervention samples. Then, we iteratively acquire new data instances by carrying out the optimal intervention experiment predicted to cause the largest change in our belief (in expectation) and updating the belief. The final belief is summarized into a predicted network via Bayesian model averaging.</p>
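A toy end-to-end version of this loop, shrunk to two candidate structures and a Bernoulli outcome model (all quantities below are invented; the paper's belief updates are over GBNs, not coin flips):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two candidate structures for a 2-gene system: edge 0->1 present or absent.
graphs = ["0->1", "none"]
belief = np.array([0.5, 0.5])           # initial belief over structures

# Toy outcome model: knocking out gene 0 perturbs gene 1 only if 0->1 exists.
p_respond = {"0->1": 0.9, "none": 0.1}  # P(gene 1 responds | graph)
true_graph = "0->1"                     # hidden ground truth for the simulation

def bayes_update(belief, responded):
    """Update the belief over structures after one knockout experiment."""
    lik = np.array([p_respond[g] if responded else 1.0 - p_respond[g]
                    for g in graphs])
    post = belief * lik
    return post / post.sum()

for _ in range(10):                     # iterative rounds of experimentation
    responded = rng.random() < p_respond[true_graph]
    belief = bayes_update(belief, responded)

# Model averaging step: posterior probability that edge 0->1 exists.
edge_prob = float(belief[0])
```

After a handful of informative experiments the belief concentrates on the true structure, which is the behavior the framework's convergence plots summarize.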

    Runtime improvement of our method on simulated data.

    <p>The results are summarized over three trials (error bands are not visible due to low variance). Our optimization technique specific to GBNs leads to significant improvement in runtime.</p>

    Performance comparison with PC and GIES on DREAM4 data sets.

    <p>We evaluated the final prediction accuracy of our active learning algorithm in identifying edges in the undirected skeleton of the ground truth network. The resulting precision-recall (PR) curves were compared to PC with different values of <i>α</i> (significance level) in {0.01, 0.05, 0.1, 0.2, 0.3} using only observational data and to GIES using both observational and intervention data. We used the implementations of PC and GIES provided in the pcalg package in R. The dashed lines are drawn at one standard deviation from the mean in each direction based on five random trials. Our performance generally dominates that of PC and GIES, suggesting the effectiveness of our Bayesian learning approach.</p>
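A precision-recall curve over a ranked edge list, scored against an undirected ground-truth skeleton, can be computed as follows (a generic sketch on toy data, not the authors' evaluation code):

```python
def precision_recall(ranked_edges, true_edges):
    """Walk a ranked (directed) edge list and emit (precision, recall)
    at each cutoff, scoring against an undirected ground-truth skeleton."""
    tp = 0
    points = []
    for k, edge in enumerate(ranked_edges, start=1):
        if frozenset(edge) in true_edges:   # direction-agnostic match
            tp += 1
        points.append((tp / k, tp / len(true_edges)))
    return points

# Toy skeleton with two true edges; the prediction ranks one false edge second.
skeleton = {frozenset({0, 1}), frozenset({1, 2})}
curve = precision_recall([(0, 1), (0, 2), (1, 2)], skeleton)
```

Each cutoff k yields one point on the curve; sweeping k from 1 to the full list length traces out the PR curve that the figure compares across methods.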

    HapTree (solid line) and HapCompass (dashed line) on simulated tetraploid genomes: Likelihood of Perfect Solution and Vector Error Rates, 1000 Trials, Block length: 10.


    Data analytics approach for melt-pool geometries in metal additive manufacturing

    <p>Modern data analytics was employed to understand and predict physics-based melt-pool formation by fabricating Ni-alloy single tracks using powder bed fusion. An extensive database of melt-pool geometries was created, with processing parameters and material characteristics as input features. Correlation analysis provided insight into relationships between process parameters and melt-pool geometry, and enabled the development of meaningful machine learning models built on highly correlated features. We demonstrated that data analytics facilitates understanding of the inherent physics and reliable prediction of melt-pool geometries. This approach can serve as a basis for melt-pool control and process optimization.</p>
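The pipeline described here (correlation screening of process parameters followed by a predictive model) can be sketched on synthetic data; the features, coefficients, and units below are invented stand-ins, not values from the study:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Synthetic stand-ins for two process parameters (units illustrative).
power = rng.uniform(100, 400, n)   # laser power, W
speed = rng.uniform(200, 1200, n)  # scan speed, mm/s

# Toy melt-pool depth: grows with power, shrinks with speed, plus noise.
depth = 0.3 * power - 0.05 * speed + rng.normal(0.0, 5.0, n)

# Correlation screening: how strongly each feature tracks the target.
X = np.column_stack([power, speed])
corrs = [float(np.corrcoef(X[:, j], depth)[0, 1]) for j in range(X.shape[1])]

# Simple predictive model on the screened features via least squares.
A = np.column_stack([X, np.ones(n)])             # add an intercept column
coef, *_ = np.linalg.lstsq(A, depth, rcond=None)
```

On this toy data the screening recovers the expected signs (power positively, speed negatively correlated with depth), and the fitted coefficients approximate the generating ones, which is the kind of physics-consistent relationship the correlation analysis is meant to surface.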

    HapTree (solid lines) and HapCompass (dashed lines) on simulated triploid genomes: Likelihood of Perfect Solution and Vector Error Rates, 1000 Trials, Block lengths: 10, 20, and 40.
