
    On the Differential Privacy of Bayesian Inference

    We study how to communicate findings of Bayesian inference to third parties while preserving the strong guarantee of differential privacy. Our main contributions are four different algorithms for private Bayesian inference on probabilistic graphical models. These include two mechanisms for adding noise to the Bayesian updates: either directly to the posterior parameters, or to their Fourier transform so as to preserve update consistency. We also utilise a recently introduced posterior sampling mechanism, for which we prove bounds for the specific but general case of discrete Bayesian networks; and we introduce a maximum-a-posteriori private mechanism. Our analysis includes utility and privacy bounds, with a novel focus on the influence of graph structure on privacy. Worked examples and experiments with Bayesian naïve Bayes and Bayesian linear regression illustrate the application of our mechanisms.
    Comment: AAAI 2016, Feb 2016, Phoenix, Arizona, United States
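The first mechanism above, adding noise directly to the posterior parameters, can be sketched for the simplest case of a Beta-Bernoulli model. This is a generic Laplace-mechanism sketch under standard assumptions, not the paper's exact construction; all names and defaults are illustrative:

```python
import numpy as np

def private_beta_posterior(data, alpha0=1.0, beta0=1.0, epsilon=1.0, rng=None):
    """Beta-Bernoulli posterior with Laplace noise on the sufficient statistics.

    Substituting one record changes each of (successes, failures) by at
    most 1, so the L1 sensitivity of the count vector is 2; Laplace noise
    with scale 2/epsilon on each count yields epsilon-differential privacy
    by the standard Laplace-mechanism argument.
    """
    rng = rng or np.random.default_rng()
    data = np.asarray(data)
    successes = data.sum()
    failures = len(data) - successes
    scale = 2.0 / epsilon
    noisy_s = successes + rng.laplace(0.0, scale)
    noisy_f = failures + rng.laplace(0.0, scale)
    # Clamp so the released parameters still define a valid Beta posterior.
    return (max(alpha0 + noisy_s, 1e-3), max(beta0 + noisy_f, 1e-3))

alpha, beta = private_beta_posterior([1, 0, 1, 1, 0, 1], epsilon=1.0)
```

Releasing the noisy parameters (rather than raw counts) is what lets a third party continue doing Bayesian updates; preserving that consistency across updates is the motivation for the paper's Fourier-domain variant.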

    Addressing practical challenges of Bayesian optimisation

    This thesis focuses on addressing several challenges in applying Bayesian optimisation to real-world problems. Its contributions are new Bayesian optimisation algorithms for three practical problems: finding stable solutions, optimising cascaded processes, and privacy-aware optimisation.
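As background to the above, the core Bayesian optimisation loop, a Gaussian-process surrogate queried through an acquisition function, can be sketched as follows. This is a generic 1-D minimisation loop with invented kernel and loop parameters, not any of the thesis's algorithms:

```python
import numpy as np
from scipy.special import erf

def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D point sets."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test, jitter=1e-6):
    """Posterior mean and std of a zero-mean GP at x_test."""
    K = rbf(x_train, x_train) + jitter * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    sol = np.linalg.solve(K, Ks)
    mu = sol.T @ y_train
    var = 1.0 - np.sum(Ks * sol, axis=0)   # rbf(x, x) == 1
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """EI acquisition for minimisation."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + erf(z / np.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return (best - mu) * cdf + sigma * pdf

def bayes_opt(f, bounds=(0.0, 1.0), n_init=3, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(*bounds, n_init)
    y = np.array([f(xi) for xi in x])
    grid = np.linspace(*bounds, 200)
    for _ in range(n_iter):
        mu, sigma = gp_posterior(x, y, grid)
        # Evaluate wherever expected improvement over the incumbent is largest.
        x_next = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]
        x = np.append(x, x_next)
        y = np.append(y, f(x_next))
    return x[np.argmin(y)], y.min()

x_best, y_best = bayes_opt(lambda x: (x - 0.3) ** 2)
```

The three practical problems the thesis names (stability, cascaded processes, privacy) each amount to changing some component of this loop, typically the acquisition step or what information it is allowed to use.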

    Towards private and robust machine learning for information security

    Many problems in information security are pattern recognition problems. For example, determining if a digital communication can be trusted amounts to certifying that the communication does not carry malicious or secret content, which can be distilled into the problem of recognising the difference between benign and malicious content. At a high level, machine learning is the study of how patterns are formed within data, and how learning these patterns generalises beyond the potentially limited data pool at a practitioner’s disposal, and so has become a powerful tool in information security. In this work, we study the benefits machine learning can bring to two problems in information security. Firstly, we show that machine learning can be used to detect which websites are visited by an internet user over an encrypted connection. By analysing timing and packet size information of encrypted network traffic, we train a machine learning model that predicts the target website given a stream of encrypted network traffic, even if browsing is performed over an anonymous communication network. Secondly, in addition to studying how machine learning can be used to design attacks, we study how it can be used to solve the problem of hiding information within a cover medium, such as an image or an audio recording, which is commonly referred to as steganography. How well an algorithm can hide information within a cover medium amounts to how well the algorithm models and exploits areas of redundancy. This can again be reduced to a pattern recognition problem, and so we apply machine learning to design a steganographic algorithm that efficiently hides a secret message within an image. Following this, we proceed with discussions surrounding why machine learning is not a panacea for information security, and can be an attack vector in and of itself.
We show that machine learning can leak private and sensitive information about the data it used to learn, and how malicious actors can exploit vulnerabilities in these learning algorithms to compel them to exhibit adversarial behaviours. Finally, we examine the disconnect between how humans and machine learning models recognise images. While human classification of an image is relatively robust to noise, machine learning models do not possess this property. We show how an attacker can cause targeted misclassifications against an entire data distribution by exploiting this property, and go on to introduce a mitigation that ameliorates this undesirable trait of machine learning.
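The website-fingerprinting attack described above reduces to supervised classification over features of encrypted traffic. A minimal sketch with synthetic packet traces and a nearest-centroid classifier; the site names, feature choices, and traffic statistics are all invented for illustration and are not the thesis's model:

```python
import numpy as np

def featurize(trace):
    """Features visible to an eavesdropper: packet counts, sizes, directions."""
    sizes = np.abs(trace)                 # magnitude encodes packet size
    n_outgoing = int((trace > 0).sum())   # sign encodes direction
    return np.array([len(trace), sizes.sum(), sizes.mean(),
                     sizes.std(), n_outgoing])

def synth_trace(rng, mean_size, mean_packets):
    """Synthetic encrypted trace for one page load."""
    n = max(int(rng.poisson(mean_packets)), 2)
    sizes = rng.normal(mean_size, 50.0, n).clip(60, 1500)
    direction = rng.choice([-1.0, 1.0], n, p=[0.8, 0.2])
    return sizes * direction

rng = np.random.default_rng(0)
sites = {"site_a": (300, 40), "site_b": (900, 120), "site_c": (600, 25)}
X, y = [], []
for label, (mean_size, mean_packets) in sites.items():
    for _ in range(30):
        X.append(featurize(synth_trace(rng, mean_size, mean_packets)))
        y.append(label)
X, y = np.array(X), np.array(y)

# Nearest-centroid classifier on standardised features.
Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)
centroids = {lab: Xs[y == lab].mean(axis=0) for lab in sites}
pred = np.array([min(centroids, key=lambda c: np.linalg.norm(row - centroids[c]))
                 for row in Xs])
accuracy = float((pred == y).mean())
```

The key point is that encryption hides content but not these side-channel features, which is why the attack survives even anonymous communication networks.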

    Novel Approaches to Preserving Utility in Privacy Enhancing Technologies

    Significant amounts of individual information are being collected and analyzed today through a wide variety of applications across different industries. While pursuing better utility by discovering knowledge from the data, an individual’s privacy may be compromised during an analysis: corporate networks monitor their online behavior, advertising companies collect and share their private information, and cybercriminals cause financial damages through security breaches. To this end, the data typically undergoes certain anonymization techniques, e.g., CryptoPAn [Computer Networks’04], which replaces real IP addresses with prefix-preserving pseudonyms, or Differentially Private (DP) [ICALP’06] techniques, which modify the answer to a query by adding a zero-mean noise distributed according to, e.g., a Laplace distribution. Unfortunately, most such techniques either are vulnerable to adversaries with prior knowledge, e.g., some network flows in the data, or require heavy data sanitization or perturbation, both of which may result in a significant loss of data utility. Therefore, the fundamental trade-off between privacy and utility (i.e., analysis accuracy) has attracted significant attention in various settings [ICALP’06, ACM CCS’14]. In line with this track of research, in this dissertation we aim to build utility-maximized and privacy-preserving tools for Internet communications. Such tools can be employed not only by dissidents and whistleblowers, but also by ordinary Internet users on a daily basis. To this end, we combine the development of practical systems with rigorous theoretical analysis, and incorporate techniques from various disciplines such as computer networking, cryptography, and statistical analysis. In the course of this research, we propose three different frameworks in well-known settings, outlined in the following.
First, we propose the Multi-view Approach, which preserves both privacy and utility in network trace anonymization; second, the R2DP Approach, a novel technique for differentially private mechanism design with maximized utility; and third, the DPOD Approach, a novel framework for privacy-preserving anomaly detection in the outsourcing setting.
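The prefix-preserving property of CryptoPAn mentioned above can be illustrated with a toy scheme: bit i of the pseudonym is the original bit XORed with a pseudorandom function of the preceding i bits, so addresses sharing a k-bit prefix map to pseudonyms sharing a k-bit prefix. Here HMAC-SHA256 stands in for the AES-based PRF of the real scheme; this is illustrative only, not the actual CryptoPAn algorithm:

```python
import hashlib
import hmac
import ipaddress

def toy_cryptopan(ip, key):
    """Toy prefix-preserving IPv4 pseudonymisation (illustrative only)."""
    bits = format(int(ipaddress.IPv4Address(ip)), "032b")
    out = []
    for i in range(32):
        # A PRF of the i-bit prefix decides whether to flip bit i.
        flip = hmac.new(key, bits[:i].encode(), hashlib.sha256).digest()[0] & 1
        out.append(str(int(bits[i]) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))

anon_a = toy_cryptopan("10.0.0.1", b"secret-key")
anon_b = toy_cryptopan("10.0.0.2", b"secret-key")
```

Because 10.0.0.1 and 10.0.0.2 share a 30-bit prefix, their pseudonyms also share a 30-bit prefix. This retained structure is exactly what keeps anonymized traces useful for analysis, and also what an adversary with prior knowledge of some flows can exploit, as the abstract notes.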

    Identifying Novel Targetable Genes and Pathways in Cancer by Integrating Diverse Omics Data.

    Omics technologies for high-throughput profiling of the human genome, transcriptome and proteome are revolutionizing cancer research and driving a paradigm shift in clinical care, from “one size fits all” treatments to molecularly informed therapies. The success of this new precision medicine paradigm will depend on our ability to combine diverse omics-based measurements to distill clinically relevant information that can be acted upon. This thesis developed bioinformatics approaches to integrate multi-omics datasets and applied these approaches in three distinct studies that identified novel actionable genes and pathways in cancer. In the first study, we aim at finding alternative targetable proteins in non-small cell lung cancers (NSCLC) with activating mutations in KRAS, a well-known but undruggable oncogene, by profiling their transcriptome, proteome and phosphoproteome. By reconstructing targetable networks associated with KRAS dependency, we nominate lymphocyte-specific protein tyrosine kinase (LCK) as a critical gene for cell proliferation in these samples, suggesting LCK as a novel druggable protein in KRAS-dependent NSCLC. In the second study, we aim at identifying oncogenic gene fusions in NSCLC patients with no known driver gene. By characterizing the highly heterogeneous fusion landscape in NSCLC, we show that gene fusion incidence is an independent prognostic factor for poor outcome, and we discover novel Neuregulin 1 (NRG1) fusions present exclusively in patients with no known driver, resembling previously reported kinase fusions. This warrants further studies of the therapeutic opportunities for patients with NRG1 rearrangements. Finally, in the third study, we aim at characterizing cancer-related genes that overlap with, and could be regulated by, natural antisense transcripts.
By determining the extent of antisense gene expression across human cancers and comparing it with well-documented sense-antisense pairs, our results raise the possibility that antisense transcripts could modulate the expression of well-known tumor suppressors and oncogenes. This study provides a resource, oncoNATdb, a catalogue of cancer-related genes with significant antisense transcription, which will enable researchers to investigate the mechanisms of sense-antisense regulation and their role in cancer. We anticipate that the computational methods developed and the results found in this thesis will assist others with similar tasks and inspire further studies of the therapeutic opportunities provided by these novel targets.
    PHD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/107215/1/oabalbin_1.pd

    Bioinspired Materials Design: A Text Mining Approach to Determining Design Principles of Biological Materials

    Biological materials are often more efficient and tend to have a wider range and combination of properties than present-day engineered materials. Despite the limited set of components, biological materials are able to achieve great diversity in their material properties by the arrangements of the material components, which form unique structures. The structure-property relationships are known as structural design principles. With the utilization of these design principles, materials designers can develop bioinspired engineered materials with similarly improved effectiveness. While considerable research has been conducted on biological materials, identifying beneficial structural design principles can be time-intensive. To aid materials designers, the research in this dissertation focuses on the development of a text mining algorithm that can quickly identify potential structural design principles of biological materials with respect to a chosen material property or combination of properties. The development of the text mining tool involves four separate stages. The first stage centers on the creation of a basic information retrieval algorithm to extract passages describing property-specific structural design principles from a corpus of materials journal articles. Although the Stage 1 tool identifies over 90% of the principles (recall), only 32% of the returned passages are relevant (precision). The second stage investigates text classification techniques to refine the program in order to improve precision. The classic techniques of machine learning classifiers, statistical features, and part-of-speech analyses are evaluated for effectiveness in sorting passages into relevant and irrelevant classes. In the third stage, manual identification of patterns in the returned passages is employed to create a rule-based method. The resulting Stage 3 algorithm’s precision increases to 45%.
In the final stage of algorithm development, the manual rule-based classification method is revisited to identify stricter rules to further emphasize precision. The Stage 4 algorithm successfully improves overall precision to 65% and reduces the number of returned passages by 74%, which allows a materials designer to more quickly identify useful principles. Finally, the research concludes with a validation that the text mining tool effectively identifies structural design principles and that the principles can be used in the development of bioinspired materials.
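The precision and recall figures quoted above are the standard retrieval metrics. A minimal sketch of how they are computed for a keyword-matching retriever; the corpus, keyword list, and relevance labels are invented for illustration:

```python
def retrieve(passages, keywords):
    """Return indices of passages containing any design-principle keyword."""
    return {i for i, p in enumerate(passages)
            if any(k in p.lower() for k in keywords)}

def precision_recall(retrieved, relevant):
    """Precision = fraction of retrieved that are relevant;
    recall = fraction of relevant that were retrieved."""
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

passages = [
    "The layered structure of nacre increases fracture toughness.",    # relevant
    "Samples were stored at room temperature before testing.",         # irrelevant
    "Helical fibre arrangement in lobster cuticle resists cracking.",  # relevant
    "The structure of the grant application is described below.",      # irrelevant
]
relevant = {0, 2}
retrieved = retrieve(passages, ["structure", "arrangement"])
p, r = precision_recall(retrieved, relevant)
```

Here the keyword retriever finds both relevant passages (perfect recall) but also a false positive, giving precision 2/3, which mirrors the dissertation's Stage 1 pattern of high recall with low precision and motivates its later rule-based filtering stages.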