15 research outputs found

    Performance comparison with PC and GIES on DREAM4 data sets.

    <p>We evaluated the final prediction accuracy of our active learning algorithm in identifying edges in the undirected skeleton of the ground truth network. The resulting precision-recall (PR) curves were compared to PC with different values of <i>α</i> (significance level) in {0.01, 0.05, 0.1, 0.2, 0.3} using only observational data and to GIES using both observational and intervention data. We used the implementations of PC and GIES provided in the pcalg package in R. The dashed lines are drawn at one standard deviation from the mean in each direction based on five random trials. Our performance generally dominates that of PC and GIES, suggesting the effectiveness of our Bayesian learning approach.</p>
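The skeleton-level evaluation described in this caption can be illustrated with a minimal sketch. It assumes predictions arrive as a dictionary of scored directed edges; the function names, the threshold sweep, and the convention that an empty prediction set has precision 1.0 are illustrative assumptions, not details taken from the paper.

```python
def skeleton(edges):
    """Drop edge direction: (u, v) and (v, u) become the same undirected edge."""
    return {frozenset(edge) for edge in edges}


def precision_recall(scored_edges, true_edges, threshold):
    """Precision and recall on the undirected skeleton at one score threshold.

    scored_edges: dict mapping (u, v) -> confidence score (hypothetical format).
    true_edges:   iterable of (u, v) pairs from the ground truth network.
    """
    truth = skeleton(true_edges)
    predicted = skeleton(e for e, s in scored_edges.items() if s >= threshold)
    true_positives = len(predicted & truth)
    # Assumed convention: no predictions means no false positives.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


# Toy example: sweeping the threshold traces out points on a PR curve.
true_edges = [("A", "B"), ("B", "C")]
scored_edges = {("B", "A"): 0.9, ("A", "C"): 0.6, ("B", "C"): 0.3}
curve = [precision_recall(scored_edges, true_edges, t) for t in (0.8, 0.5, 0.2)]
```

Note that the reversed prediction B→A counts as correct here because only the skeleton is scored.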

    Reconstruction performance on simulated data from a GBN.

    <p>We compared edge prediction performance between active and random learners, summarized over five trials. The dotted lines are drawn at one standard deviation from the mean in each direction. The active learner achieves higher accuracy and faster convergence than the random learner.</p>

    Reconstruction performance on single cell gene expression data.

    <p>We applied our Bayesian structure learning algorithm based on GBNs to uncover the signaling pathway of 11 human proteins from expression data provided by Sachs et al. [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0150611#pone.0150611.ref005" target="_blank">5</a>]. MAP estimates of edge weights calculated using 1,000 posterior graph samples are used to generate a ranked list of (directed) edges for evaluation of accuracy. The data points for GIES are taken from Hauser and Bühlmann [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0150611#pone.0150611.ref019" target="_blank">19</a>] for comparison. The results suggest that GBNs can uncover causal edges in real biological networks and that our approach is more effective than GIES.</p>
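As a rough illustration of turning posterior graph samples into a ranked list of directed edges, the sketch below scores each edge by the fraction of sampled graphs that contain it. This is a simplified stand-in (edge frequency), not the MAP edge-weight estimate the caption describes; the function name and the sample format are assumptions.

```python
from collections import Counter


def rank_edges(graph_samples):
    """Rank directed edges by how often they appear across sampled graphs.

    graph_samples: list of graphs, each an iterable of (parent, child) tuples
    (a hypothetical representation of posterior samples).
    Returns (edge, frequency) pairs, highest frequency first; ties are broken
    alphabetically so the ordering is deterministic.
    """
    counts = Counter(edge for graph in graph_samples for edge in set(graph))
    n = len(graph_samples)
    scores = {edge: count / n for edge, count in counts.items()}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy example with four sampled graphs over nodes A, B, C.
samples = [
    {("A", "B"), ("B", "C")},
    {("A", "B")},
    {("A", "B"), ("C", "B")},
    {("B", "C")},
]
ranking = rank_edges(samples)
```

Unlike the skeleton evaluation, direction matters here: B→C and C→B are scored as separate edges.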

    Runtime improvement of our method on simulated data.

    <p>The results are summarized over three trials (error bands are not visible due to low variance). Our optimization technique specific to GBNs leads to a significant improvement in runtime.</p>

    Reconstruction performance on DREAM4 benchmark data.

    <p>The results are summarized over five trials. The dotted lines are drawn at one standard deviation from the mean in each direction. The active learner achieves higher accuracy and faster convergence than the random learner.</p>

    Active learning framework for network reconstruction.

    <p>We first estimate our belief over candidate graph structures based on the initial data set that contains observational and/or intervention samples. Then, we iteratively acquire new data instances by carrying out the optimal intervention experiment predicted to cause the largest change in our belief (in expectation) and updating the belief. The final belief is summarized into a predicted network via Bayesian model averaging.</p>
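The loop described in this caption can be sketched on a toy problem. The version below assumes a small discrete set of candidate graphs and experiments with discrete outcomes, and it measures "change in belief" as the expected KL divergence from prior to posterior. All names, the likelihood tables, and the choice of KL divergence are illustrative assumptions; the paper's algorithm operates on Gaussian Bayesian networks and is not reproduced here.

```python
import math


def update_belief(belief, likelihood, outcome):
    """Bayes update: belief[g] * P(outcome | g), renormalized over graphs."""
    unnormalized = {g: belief[g] * likelihood[g][outcome] for g in belief}
    total = sum(unnormalized.values())
    return {g: p / total for g, p in unnormalized.items()}


def expected_belief_change(belief, likelihood):
    """Expected KL(posterior || prior) over the outcomes of one experiment."""
    outcomes = next(iter(likelihood.values())).keys()
    expected = 0.0
    for outcome in outcomes:
        p_outcome = sum(belief[g] * likelihood[g][outcome] for g in belief)
        if p_outcome == 0.0:
            continue  # impossible outcome contributes nothing
        post = update_belief(belief, likelihood, outcome)
        kl = sum(q * math.log(q / belief[g]) for g, q in post.items() if q > 0)
        expected += p_outcome * kl
    return expected


def choose_intervention(belief, experiments):
    """Pick the experiment with the largest expected change in belief."""
    return max(experiments, key=lambda e: expected_belief_change(belief, experiments[e]))


# Toy setup: two candidate graphs and two possible experiments (hypothetical numbers).
belief = {"X->Y": 0.5, "Y->X": 0.5}
experiments = {
    # Intervening on X moves Y only if X causes Y.
    "do(X)": {"X->Y": {"responds": 0.9, "flat": 0.1},
              "Y->X": {"responds": 0.1, "flat": 0.9}},
    # Intervening on an unrelated variable Z tells us nothing about X and Y.
    "do(Z)": {"X->Y": {"responds": 0.5, "flat": 0.5},
              "Y->X": {"responds": 0.5, "flat": 0.5}},
}
best = choose_intervention(belief, experiments)
updated = update_belief(belief, experiments[best], "responds")
```

In a full loop, the choose-and-update step would repeat until the experiment budget is exhausted, with the final belief averaged into a predicted network.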

    Reconstructing Causal Biological Networks through Active Learning

    <div><p>Reverse-engineering of biological networks is a central problem in systems biology. Intervention data, such as those from gene knockouts or knockdowns, are typically used to tease apart causal relationships among genes. Under time or resource constraints, one needs to carefully choose which intervention experiments to carry out. Previous approaches for selecting the most informative interventions have largely focused on discrete Bayesian networks. However, continuous Bayesian networks are of great practical interest, especially in the study of complex biological systems and their quantitative properties. In this work, we present an efficient, information-theoretic active learning algorithm for Gaussian Bayesian networks (GBNs), which serve as important models for gene regulatory networks. In addition to providing linear-algebraic insights unique to GBNs, leading to significant runtime improvements, we demonstrate the effectiveness of our method on data simulated with GBNs and the DREAM4 network inference challenge data sets. Compared to random selection of intervention experiments, our method generally leads to faster recovery of the underlying network structure and faster convergence to the final distribution of confidence scores over candidate graph structures obtained using the full data.</p></div>

    Most of the differences in gene quantification arise from poorly mappable genes.

    <p>(a, b) Log scatter plots of the fraction of reads mapped to each gene in L262 versus L75. Plotting only genes that are not pseudogenes and have perfect mappability scores leads to a scatter plot with near-perfect correlation. Scatter plots that show only protein-coding genes can be found in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0108095#pone.0108095.s009" target="_blank">Figure S9</a>. (c) Genes that are observed in both libraries and are at least 500 bp long were divided into five groups according to the mean mappability score, which is obtained by summarizing the 75 bp mappability score track on the UCSC Genome Browser. For each group, we computed the fraction of genes that had a higher proportion of reads in L262 than in L75. The groups corresponding to lower mappability scores showed a higher proportion of genes more represented in L262. The same trend was observed even when we restricted our analysis to protein-coding genes.</p>
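The grouping in panel (c) can be sketched as follows. The caption does not say how the five groups were formed, so this example assumes equal-width bins over mappability scores in [0, 1]; the record layout and function name are also hypothetical.

```python
def fraction_higher_in_long(genes, n_groups=5, min_length_bp=500):
    """Per mappability group, the fraction of genes with a higher read share in L262.

    genes: iterable of (length_bp, mean_mappability, frac_l262, frac_l75) tuples
    (a hypothetical record layout). Returns one value per group, lowest
    mappability first, with None for groups that contain no genes.
    """
    groups = [[] for _ in range(n_groups)]
    for length_bp, mappability, frac_l262, frac_l75 in genes:
        # Keep genes that are long enough and observed in both libraries.
        if length_bp < min_length_bp or frac_l262 <= 0 or frac_l75 <= 0:
            continue
        # Assumed equal-width binning; a score of exactly 1.0 goes in the top bin.
        index = min(int(mappability * n_groups), n_groups - 1)
        groups[index].append(frac_l262 > frac_l75)
    return [sum(g) / len(g) if g else None for g in groups]


# Toy records, not real data.
toy_genes = [
    (1000, 0.10, 0.02, 0.01),
    (800, 0.15, 0.03, 0.01),
    (600, 0.95, 0.01, 0.02),
    (700, 1.00, 0.02, 0.01),
    (300, 0.10, 0.00, 0.50),  # dropped: too short and unobserved in L262
]
fractions = fraction_higher_in_long(toy_genes)
```

Quantile-based groups would be an equally plausible reading of the caption and would only change how `index` is computed.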

    High-Resolution Transcriptome Analysis with Long-Read RNA Sequencing

    <div><p>RNA sequencing (RNA-seq) enables characterization and quantification of individual transcriptomes as well as detection of patterns of allelic expression and alternative splicing. Current RNA-seq protocols depend on high-throughput short-read sequencing of cDNA. However, as ongoing advances are rapidly yielding increasing read lengths, a technical hurdle remains in identifying the degree to which differences in read length influence various transcriptome analyses. In this study, we generated two paired-end RNA-seq datasets of differing read lengths (2×75 bp and 2×262 bp) for lymphoblastoid cell line GM12878 and compared the effect of read length on transcriptome analyses, including read-mapping performance, gene and transcript quantification, and detection of allele-specific expression (ASE) and allele-specific alternative splicing (ASAS) patterns. Our results indicate that, while the current long-read protocol is considerably more expensive than short-read sequencing, there are important benefits that can only be achieved with longer read length, including lower mapping bias and reduced ambiguity in assigning reads to genomic elements, such as mRNA transcripts. We show that these benefits ultimately lead to improved detection of <i>cis</i>-acting regulatory and splicing variation effects within individuals.</p></div>

    The effect of read length on read-mapping performance.

    <p>We compared the percentage of reads uniquely mapped (a) and the average runtime per million reads (b) of Bowtie2, TopHat2, STAR, and GSNAP on L262 and L75. Only the splice-mappers – TopHat2, STAR, and GSNAP – achieved higher alignment rates for L262 compared to L75. The increase in runtime going from L75 to L262 varied greatly across the mappers, with TopHat2 being the most sensitive of the four mappers tested.</p>