4,888 research outputs found

    Representation transfer for differentially private drug sensitivity prediction

    Motivation: Human genomic datasets often contain sensitive information that limits use and sharing of the data. In particular, simple anonymization strategies fail to provide a sufficient level of protection for genomic data, because the data are inherently identifiable. Differentially private machine learning can help by guaranteeing that the published results do not leak too much information about any individual data point. Recent research has reached promising results on differentially private drug sensitivity prediction using gene expression data. Differentially private learning with genomic data is challenging because it is more difficult to guarantee privacy in high dimensions. Dimensionality reduction can help, but if the dimension reduction mapping is learned from the data, then it needs to be differentially private too, which can carry a significant privacy cost. Furthermore, the selection of any hyperparameters (such as the target dimensionality) must also avoid leaking private information. Results: We study an approach that uses a large public dataset of similar type to learn a compact representation for differentially private learning. We compare three representation learning methods: variational autoencoders, principal component analysis and random projection. We solve two machine learning tasks on gene expression of cancer cell lines: cancer type classification and drug sensitivity prediction. The experiments demonstrate significant benefit from all representation learning methods, with variational autoencoders providing the most accurate predictions most often. Our results significantly improve on the previous state of the art in the accuracy of differentially private drug sensitivity prediction. Availability and implementation: Code used in the experiments is available at https://github.com/DPBayes/dp-representation-transfer. Peer reviewed.
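Of the three representation methods named in the abstract above, random projection is the simplest to sketch, because the mapping is drawn without reference to any data and so the mapping itself reveals nothing about the private samples. The following is a minimal illustration only, not the authors' code: the function names, the dimensions (1,000 inputs, 10 outputs) and the synthetic sample are all assumptions made for the example, and any model trained on the projected data would still need its own differentially private mechanism.

```python
import math
import random


def make_projection(in_dim, out_dim, seed=0):
    """Draw a Gaussian random projection matrix without looking at any data."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(matrix, sample):
    """Map one high-dimensional sample to the compact representation."""
    if any(len(row) != len(sample) for row in matrix):
        raise ValueError("sample length must match the projection input dimension")
    return [sum(w * x for w, x in zip(row, sample)) for row in matrix]


# Hypothetical stand-in for one private gene expression profile.
data_rng = random.Random(1)
private_sample = [data_rng.gauss(0.0, 1.0) for _ in range(1000)]

# The projection depends only on the seed, never on private_sample.
projection = make_projection(in_dim=1000, out_dim=10, seed=42)
compact = project(projection, private_sample)
print(len(compact))
```

Fixing the seed makes the mapping reproducible, so the same projection can be applied to public and private data alike.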

    Identification of a selective G1-phase benzimidazolone inhibitor by a senescence-targeted virtual screen using artificial neural networks

    Cellular senescence is a barrier to tumorigenesis in normal cells, and tumour cells undergo senescence responses to genotoxic stimuli, which is a potential target phenotype for cancer therapy. However, in this setting, mixed-mode responses are common, with apoptosis the dominant effect. Hence, more selective senescence inducers are required. Here we report a machine learning-based in silico screen to identify potential senescence agonists. We built profiles of differentially affected biological process networks from expression data obtained under induced telomere dysfunction conditions in colorectal cancer cells and matched these to a panel of 17 protein targets with confirmatory screening data in PubChem. We trained a neural network using 3517 compounds identified as active or inactive against these targets. The resulting classification model was used to screen a virtual library of ~2M lead-like compounds. 147 virtual hits were acquired for validation in growth inhibition and senescence-associated β-galactosidase (SA-β-gal) assays. Among the hits, a benzimidazolone compound, CB-20903630, had low micromolar IC50 for growth inhibition of HCT116 cells and selectively induced SA-β-gal activity in the entire treated cell population without cytotoxicity or apoptosis induction. Growth suppression was mediated by G1 blockade involving increased p21 expression and suppressed cyclin B1, CDK1 and CDC25C. Additionally, the compound inhibited growth of multicellular spheroids and caused severe retardation of population kinetics in long-term treatments. Preliminary structure-activity and structure clustering analyses are reported, and expression analysis of CB-20903630 against other cell cycle suppressor compounds suggested a PI3K/AKT-inhibitor-like profile in normal cells, with different pathways affected in cancer cells.

    Genomic introgression mapping of field-derived multiple-anthelmintic resistance in Teladorsagia circumcincta

    Preventive chemotherapy has long been practiced against nematode parasites of livestock, leading to widespread drug resistance, and is increasingly being adopted for eradication of human parasitic nematodes even though it is similarly likely to lead to drug resistance. Given that the genetic architecture of resistance is poorly understood for any nematode, we have analyzed multidrug-resistant Teladorsagia circumcincta, a major parasite of sheep, as a model for analysis of resistance selection. We introgressed a field-derived multiresistant genotype into a partially inbred susceptible genetic background (through repeated backcrossing and drug selection) and performed genome-wide scans in the backcross progeny and drug-selected F2 populations to identify the major genes responsible for the multidrug resistance. We identified variation linking candidate resistance genes to each drug class. Putative mechanisms included target site polymorphism, changes in likely regulatory regions and copy number variation in efflux transporters. This work elucidates the genetic architecture of multiple anthelmintic resistance in a parasitic nematode for the first time and establishes a framework for future studies of anthelmintic resistance in nematode parasites of humans.

    ProGAP: Progressive Graph Neural Networks with Differential Privacy Guarantees

    Graph Neural Networks (GNNs) have become a popular tool for learning on graphs, but their widespread use raises privacy concerns as graph data can contain personal or sensitive information. Differentially private GNN models have recently been proposed to preserve privacy while still allowing for effective learning over graph-structured datasets. However, achieving an ideal balance between accuracy and privacy in GNNs remains challenging due to the intrinsic structural connectivity of graphs. In this paper, we propose a new differentially private GNN called ProGAP that uses a progressive training scheme to improve such accuracy-privacy trade-offs. Combined with the aggregation perturbation technique to ensure differential privacy, ProGAP splits a GNN into a sequence of overlapping submodels that are trained progressively, expanding from the first submodel to the complete model. Specifically, each submodel is trained over the privately aggregated node embeddings learned and cached by the previous submodels, leading to an increased expressive power compared to previous approaches while limiting the incurred privacy costs. We formally prove that ProGAP ensures edge-level and node-level privacy guarantees for both training and inference stages, and evaluate its performance on benchmark graph datasets. Experimental results demonstrate that ProGAP can achieve up to 5%-10% higher accuracy than existing state-of-the-art differentially private GNNs.
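The aggregation perturbation idea mentioned in the abstract above can be illustrated generically: each node's embedding is scaled to unit norm, which bounds how much any single neighbor can contribute to a sum, and Gaussian noise is added to each aggregated vector. The sketch below is an assumption-laden illustration, not ProGAP's implementation: the function names, the toy graph and the noise level `sigma` are hypothetical, and calibrating `sigma` to a concrete privacy budget is not shown.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit L2 norm so each node's contribution is bounded."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def noisy_aggregate(embeddings, neighbors, sigma, seed=0):
    """Sum normalized neighbor embeddings per node, then add Gaussian noise."""
    rng = random.Random(seed)
    unit = {node: normalize(vec) for node, vec in embeddings.items()}
    dim = len(next(iter(embeddings.values())))
    result = {}
    for node in embeddings:
        total = [0.0] * dim
        for nb in neighbors.get(node, []):
            total = [t + u for t, u in zip(total, unit[nb])]
        # sigma is a placeholder; a real system derives it from the privacy budget.
        result[node] = [t + rng.gauss(0.0, sigma) for t in total]
    return result


# Toy graph with three nodes and two-dimensional embeddings (hypothetical data).
embeddings = {"a": [3.0, 4.0], "b": [0.0, 2.0], "c": [1.0, 0.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
noisy = noisy_aggregate(embeddings, neighbors, sigma=1.0, seed=7)
print(sorted(noisy))
```

In a progressive scheme like the one the abstract describes, such noisy aggregates would be computed once and cached, so that later training stages reuse them rather than querying the graph again.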

    A new landscape of host–protozoa interactions involving the extracellular vesicles world

    This version is free to view and download for private research and study only. Not for re-distribution, re-sale or use in derivative works. © Cambridge University Press 2018. Extracellular vesicles (EVs) are released by a wide range of cells, including blood cells, immune system cells, tumour cells, and adult and embryonic stem cells. EVs are a heterogeneous group of vesicles (~30–1000 nm) including microvesicles and exosomes. The physiological release of EVs represents a normal state of the cell, raising a metabolic equilibrium between catabolic and anabolic processes. Moreover, when cells are subjected to stress with different inducers or in pathological situations (malignancies, chronic diseases, infectious diseases), they respond with an intense and dynamic release of EVs. The EVs released from stimulated cells and those released constitutively may themselves differ, both physically and in their cargo. EVs contain proteins, lipids, nucleic acids and biomolecules that can alter cell phenotypes or modulate neighbouring cells. In this review, we have summarized findings involving EVs in certain protozoan diseases. We have commented on strategies to study the communicative roles of EVs during host–pathogen interaction and hypothesized on the use of EVs for diagnostic, preventative and therapeutic purposes in infectious diseases. This kind of communication could modulate the innate immune system and reformulate concepts in parasitism. Moreover, the information provided within EVs could produce alternatives in translational medicine. Peer reviewed. Final Accepted Version.

    Trustworthy machine learning through the lens of privacy and security

    Machine learning (ML) has become ubiquitous and is transforming society. However, ML-based systems still cause many incidents when deployed in real-world scenarios. Therefore, to allow wide adoption of ML in the real world, especially in critical applications such as healthcare and finance, it is crucial to develop ML models that are not only accurate but also trustworthy (e.g., explainable, privacy-preserving, secure, and robust). Achieving trustworthy ML across different machine learning paradigms (e.g., deep learning, centralized learning, federated learning) and application domains (e.g., computer vision, natural language, human study, malware systems) is challenging, given the complicated trade-off among utility, scalability, privacy, explainability, and security. To bring trustworthy ML to real-world adoption with the trust of communities, this study introduces a series of novel privacy-preserving mechanisms in which the trade-off between model utility and trustworthiness is optimized in different application domains, including natural language models, federated learning with human and mobile sensing applications, image classification, and explainable AI. The proposed mechanisms reach deployment levels of commercialized systems in real-world trials while providing trustworthiness with marginal utility drops and rigorous theoretical guarantees. The developed solutions enable safe, efficient, and practical analyses of rich and diverse user-generated data in many application domains.

    Bioinformatics and Machine Learning for Cancer Biology

    Cancer is a leading cause of death worldwide, claiming millions of lives each year. Cancer biology is an essential research field for understanding how cancer develops, evolves, and responds to therapy. By taking advantage of a series of “omics” technologies (e.g., genomics, transcriptomics, and epigenomics), computational methods in bioinformatics and machine learning can help scientists and researchers to decipher the complexity of cancer heterogeneity, tumorigenesis, and anticancer drug discovery. In particular, bioinformatics enables the systematic interrogation and analysis of cancer from various perspectives, including genetics, epigenetics, signaling networks, cellular behavior, clinical manifestation, and epidemiology. Moreover, thanks to the influx of next-generation sequencing (NGS) data in the postgenomic era and multiple landmark cancer-focused projects, such as The Cancer Genome Atlas (TCGA) and Clinical Proteomic Tumor Analysis Consortium (CPTAC), machine learning has a uniquely advantageous role in boosting data-driven cancer research and unraveling novel methods for the prognosis, prediction, and treatment of cancer.