11 research outputs found

    Representing high throughput expression profiles via perturbation barcodes reveals compound targets

    <div><p>High throughput mRNA expression profiling can be used to characterize the response of cell culture models to perturbations such as pharmacologic modulators and genetic perturbations. As profiling campaigns expand in scope, it is important to homogenize, summarize, and analyze the resulting data in a manner that captures significant biological signals in spite of various noise sources such as batch effects and stochastic variation. We used the L1000 platform for large-scale profiling of 978 representative genes across thousands of compound treatments. Here, a method is described that uses deep learning techniques to convert the expression changes of the landmark genes into a perturbation barcode that reveals important features of the underlying data, performing better than the raw data in revealing important biological insights. The barcode captures compound structure and target information, and predicts a compound’s high throughput screening promiscuity, to a higher degree than the original data measurements, indicating that the approach uncovers underlying factors of the expression data that are otherwise entangled or masked by noise. Furthermore, we demonstrate that visualizations derived from the perturbation barcode can be used to more sensitively assign functions to unknown compounds through a guilt-by-association approach, which we use to predict and experimentally validate the activity of compounds on the MAPK pathway. The demonstrated application of deep metric learning to large-scale chemical genetics projects highlights the utility of this and related approaches to the extraction of insights and testable hypotheses from big, sometimes noisy data.</p></div>

    Visualizations of the data based on z-scores or perturbation barcodes were examined to select candidate compounds in the phenotypic neighborhood of a series of known MAPK pathway inhibitors.

    <p><b>(A–D)</b> t-SNE maps of the data, z-scores on top, perturbation barcode maps on the bottom. <b>(A,B)</b> The entire dataset is shown with the tested compounds in dark blue. <b>(C,D)</b> The neighborhood of the query MAPK pathway inhibitor compounds (orange) is shown. Common MAPK tools used for nearest neighbor analysis are circled. <b>(E,F)</b> Results of AP-1 reporter assays. Known MAPK actives are distinguished from unknowns predicted to be active in (C,D). <b>(G,H)</b> Rather than selecting neighbors of seed MAPK tool compounds in the t-SNE map, nearest neighbors in the native datasets were selected and tested in the AP-1 reporter assay. Key as in (E,F). See Fig C in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005335#pcbi.1005335.s001" target="_blank">S1 Text</a> for breakdown by categories, including overlaps.</p>

    Experimental setup and architecture of the deep model used.

    <p><b>(A)</b> Cells treated with compounds in 384-well plates. <b>(B)</b> Cell lysate used for ligation mediated PCR with gene-specific probe pairs, and the gene expression measured using an optically addressed bead array technology. <b>(C)</b> Raw intensity is normalized and converted to relative expression changes versus control (z-scores) on a plate-wise basis. Variability is observed between biological replicates.</p>
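The plate-wise conversion to z-scores described in panel (C) can be illustrated with a minimal sketch. It assumes a plain mean and standard deviation over each plate's control wells; the caption does not state the exact statistic used, and the function name `platewise_zscores` and its arguments are hypothetical.

```python
import numpy as np

def platewise_zscores(intensity, plate_ids, is_control):
    """Convert normalized intensities to z-scores against each plate's controls.

    intensity:  (n_wells, n_genes) array of normalized expression values
    plate_ids:  (n_wells,) array of plate labels
    is_control: (n_wells,) boolean array marking control wells
    Assumption: z = (x - control mean) / control standard deviation, per gene.
    """
    intensity = np.asarray(intensity, dtype=float)
    plate_ids = np.asarray(plate_ids)
    is_control = np.asarray(is_control, dtype=bool)

    z = np.empty_like(intensity)
    for plate in np.unique(plate_ids):
        on_plate = plate_ids == plate
        controls = intensity[on_plate & is_control]
        mu = controls.mean(axis=0)
        sigma = controls.std(axis=0, ddof=1)
        # Genes with no spread among controls would divide by zero; use 1.0.
        sigma = np.where(sigma > 0, sigma, 1.0)
        z[on_plate] = (intensity[on_plate] - mu) / sigma
    return z

# Toy data: two plates, three wells each, two genes.
toy_intensity = [[1, 10], [3, 10], [5, 10], [10, 20], [12, 20], [20, 20]]
toy_plates = [0, 0, 0, 1, 1, 1]
toy_controls = [True, True, False, True, True, False]
toy_z = platewise_zscores(toy_intensity, toy_plates, toy_controls)
```

Because each plate is scaled only against its own controls, a constant offset between plates cancels out, which is one reason to normalize plate by plate when batch effects are a concern.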

    Performance of perturbation barcodes on public LINCS data.

    <p>Analyses correspond to Rows 1–3 of <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005335#pcbi.1005335.t001" target="_blank">Table 1</a>.</p>

    Is Multitask Deep Learning Practical for Pharma?

    Multitask deep learning has emerged as a powerful tool for computational drug discovery. However, despite a number of preliminary studies, multitask deep networks have yet to be widely deployed in the pharmaceutical and biotech industries. This lack of acceptance stems from both software difficulties and lack of understanding of the robustness of multitask deep networks. Our work aims to resolve both of these barriers to adoption. We introduce a high-quality open-source implementation of multitask deep networks as part of the DeepChem open-source platform. Our implementation enables simple Python scripts to construct, fit, and evaluate sophisticated deep models. We use our implementation to analyze the performance of multitask deep networks and related deep models on four collections of pharmaceutical data (three of which have not previously been analyzed in the literature). We split these data sets into train/valid/test using time and neighbor splits to test multitask deep learning performance under challenging conditions. Our results demonstrate that multitask deep networks are surprisingly robust and can offer strong improvement over random forests. Our analysis and open-source implementation in DeepChem provide an argument that multitask deep networks are ready for widespread use in commercial drug discovery.
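The time split mentioned in the abstract can be sketched in a few lines of plain Python. This is an illustration only, not DeepChem's API: the function `time_split`, the `date` field, and the 80/10/10 fractions are all assumptions, since the abstract gives no such details.

```python
def time_split(records, frac_train=0.8, frac_valid=0.1):
    """Split records chronologically into train/valid/test.

    records: list of dicts, each with a sortable "date" key (hypothetical
    field name). The oldest records go to train and the newest to test, so
    the model is always evaluated on data recorded after its training data.
    """
    ordered = sorted(records, key=lambda r: r["date"])
    n_train = int(frac_train * len(ordered))
    n_valid = int(frac_valid * len(ordered))
    train = ordered[:n_train]
    valid = ordered[n_train:n_train + n_valid]
    test = ordered[n_train + n_valid:]
    return train, valid, test

# Toy data: ten compounds with ISO-format assay dates, deliberately unordered.
toy_records = [
    {"compound": f"cpd{i}", "date": f"2015-{month:02d}-01"}
    for i, month in enumerate([7, 2, 10, 1, 5, 9, 3, 8, 4, 6])
]
toy_train, toy_valid, toy_test = time_split(toy_records)
```

A chronological split is harder than a random one because later compounds often come from chemical series the model has not seen, which matches the abstract's stated goal of testing under challenging conditions.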

    Both the genomic and antigenomic RNA strands of the arbovirus RVFV generate vsiRNAs.

    <p>(A) RNA species produced during RVFV infection. (−) strand genomic segments and mRNAs are depicted in blue, (+) strand antigenomes and mRNAs in red. (B) RVFV vsiRNA size distribution (control library). (C) Distribution of 21 nt RVFV vsiRNAs across the three viral genomic segments. vsiRNAs mapping to genomic strand are depicted in blue, antigenomic strand in red. (D) RVFV vsiRNA size distribution between libraries depleted of RNase III enzymes. (E) Effect of RNase III enzyme depletion on 21 nt RVFV vsiRNAs. vsiRNAs from control (black), Dcr-1 (orange), Dcr-2 (green) and Drosha (blue) depleted cells are compared. (F) RVFV vsiRNA size distribution between libraries depleted of Argonaute proteins. (G) Effect of Argonaute depletion on 21 nt RVFV vsiRNAs. vsiRNAs from control (black), Ago1 (orange), and Ago2 (green) depleted cells are compared. See also <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0055458#pone.0055458.s001" target="_blank">Figures S1</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0055458#pone.0055458.s002" target="_blank">S2</a>.</p>

    VACV terminal repeat-derived vsiRNAs are derived from long, repeat-containing precursors.

    <p>(A) RNA secondary structure prediction of one of sixty 70-mer repeats located at the genomic termini. The abundant repeat-associated VACV vsiRNA is mapped in red. See also <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0055458#pone.0055458.s004" target="_blank">Figure S4</a>. (B) Expression analysis of VACV terminal repeat-associated transcripts in <i>Drosophila</i> DL1 cells and mouse embryonic fibroblasts (MEFs) by RT-PCR. The forward primer (red) lies within the 70-mer repeat sequence, while the reverse primer (green) binds a unique sequence outside of the repetitive region. The banding pattern of PCR products reflects the amplification of variable numbers of 70-mer repeats, as depicted in the diagram. M = DNA ladder.</p>

    DCV genomic strand RNA is preferentially targeted by antiviral RNAi.

    <p>(A) RNA species produced during DCV infection. (+) strand genome is depicted in blue, (−) strand antigenome in red. (B) DCV vsiRNA size distribution (control library). (C) Distribution of 21 nt DCV-derived vsiRNAs across the viral genome. vsiRNAs mapping to genomic strand are depicted in blue, antigenomic strand in red. (D) DCV vsiRNA size distribution between libraries depleted of RNase III enzymes. (E) Effect of RNase III enzyme depletion on 21 nt DCV vsiRNAs. vsiRNAs from control (black), Dcr-1 (orange), Dcr-2 (green) and Drosha (blue) depleted cells are compared. (F) DCV vsiRNA size distribution between libraries depleted of Argonaute proteins. (G) Effect of Argonaute depletion on 21 nt DCV vsiRNAs. vsiRNAs from control (black), Ago1 (orange), and Ago2 (green) depleted cells are compared. See also <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0055458#pone.0055458.s001" target="_blank">Figures S1</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0055458#pone.0055458.s002" target="_blank">S2</a>.</p>

    RNA transcripts produced by VACV are targeted by the <i>Drosophila</i> RNA silencing pathway.

    <p>(A) The VACV genome is a dsDNA molecule with covalently closed identical termini. (B) VACV vsiRNA size distribution (control library). (C) Distribution of 21 nt VACV vsiRNAs across the viral genome. vsiRNAs mapping to the (+) strand are depicted in blue, (−) strand in red. Black arrows mark genomic termini. (D) VACV vsiRNA size distribution between libraries depleted of RNase III enzymes. (E) Effect of RNase III enzyme depletion on 21 nt VACV vsiRNAs. vsiRNAs from control (black), Dcr-1 (orange), Dcr-2 (green) and Drosha (blue) depleted cells are compared. (F) VACV vsiRNA size distribution between libraries depleted of Argonaute proteins. (G) Effect of Argonaute depletion on 21 nt VACV vsiRNAs. vsiRNAs from control (black), Ago1 (orange), and Ago2 (green) depleted cells are compared. See also <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0055458#pone.0055458.s001" target="_blank">Figures S1</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0055458#pone.0055458.s002" target="_blank">S2</a>.</p>

    A putative hairpin within the RVFV S segment generates abundant vsiRNAs in <i>Drosophila</i> and mosquito cells.

    <p>(A) RNA secondary structure prediction of S segment IGR. The highly abundant vsiRNAs are mapped in red. (B) Northern blot analysis of RVFV-infected <i>Drosophila</i> DL1 cells, <i>Aedes aegypti</i> Aag2 cells, and <i>Aedes albopictus</i> C6/36 cells, probed for the S segment stem loop vsiRNAs and tRNA<sup>val</sup> as a loading control.</p>