
    Privacy-Friendly Collaboration for Cyber Threat Mitigation

    Sharing of security data across organizational boundaries has often been advocated as a promising way to enhance cyber threat mitigation. However, collaborative security faces a number of important challenges, including privacy, trust, and liability concerns over the potential disclosure of sensitive data. In this paper, we focus on data sharing for predictive blacklisting, i.e., forecasting attack sources based on past attack information. We propose a novel privacy-enhanced data sharing approach in which organizations estimate collaboration benefits without disclosing their datasets, organize into coalitions of allied organizations, and securely share data within these coalitions. We study how different partner selection strategies affect prediction accuracy by experimenting on a real-world dataset of 2 billion IP addresses and observe up to a 105% prediction improvement.
    Comment: This paper has been withdrawn as it has been superseded by arXiv:1502.0533
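    As a minimal illustration of the benefit-estimation step, the overlap between two organizations' past attacker sets can serve as a proxy for how useful a partnership would be. The plain-set Jaccard computation below is a hypothetical, non-private baseline (function and argument names are illustrative); the paper's contribution is estimating such overlaps without the parties disclosing their sets.

```python
def collaboration_benefit(own_attackers, partner_attackers):
    """Jaccard similarity between two organizations' past attacker sets,
    a simple proxy for how much a partner's logs could improve one's own
    predictive blacklist. Plain-set version for illustration only; a
    privacy-enhanced scheme would compute this without revealing the sets."""
    own, other = set(own_attackers), set(partner_attackers)
    if not own and not other:
        return 0.0
    return len(own & other) / len(own | other)
```

    A coalition-formation strategy could then greedily group organizations whose pairwise benefit exceeds a threshold.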

    The BioRef Infrastructure, a Framework for Real-Time, Federated, Privacy-Preserving, and Personalized Reference Intervals: Design, Development, and Application.

    BACKGROUND: Reference intervals (RIs) for patient test results are in standard use across many medical disciplines, allowing physicians to identify measurements that indicate potentially pathological states with relative ease. The process of inferring cohort-specific RIs is, however, often neglected because of the high costs and cumbersome effort involved. Sophisticated analysis tools are required to infer relevant, locally specific RIs automatically and directly from routine laboratory data. Such tools would effectively connect clinical laboratory databases to physicians and provide personalized target ranges for the respective cohort population.
    OBJECTIVE: This study aims to describe the BioRef infrastructure, a multicentric governance and IT framework for the estimation and assessment of patient group-specific RIs from routine clinical laboratory data, using an innovative decentralized data-sharing approach and a sophisticated, clinically oriented graphical user interface for data analysis.
    METHODS: A common governance agreement and interoperability standards have been established, allowing the harmonization of multidimensional laboratory measurements from multiple clinical databases into a unified "big data" resource. International coding systems, such as the International Classification of Diseases, Tenth Revision (ICD-10); unique identifiers for medical devices from the Global Unique Device Identification Database; type identifiers from the Global Medical Device Nomenclature; and a universal transfer logic, such as the Resource Description Framework (RDF), are used to align the routine laboratory data of each data provider for use within the BioRef framework. With a decentralized data-sharing approach, the BioRef data can be evaluated by end users from each cohort site following a strict "no copy, no move" principle, that is, only data aggregates for the intercohort analysis of target ranges are exchanged.
    RESULTS: The TI4Health distributed and secure analytics system was used to implement the proposed federated, privacy-preserving approach and to comply with the restrictions that apply to sensitive patient data. Under the BioRef interoperability consensus, clinical partners enable the computation of RIs via the TI4Health graphical user interface without exposing the underlying raw data. The interface was developed for physicians and clinical laboratory specialists and allows intuitive, interactive data stratification by patient factors (age, sex, and personal medical history) as well as laboratory analysis determinants (device, analyzer, and test kit identifier). This consolidated effort enables extremely detailed, patient group-specific queries, allowing individualized, covariate-adjusted RIs to be generated on the fly.
    CONCLUSIONS: With the BioRef-TI4Health infrastructure, a framework has been implemented that lets clinical physicians and researchers define precise RIs immediately, in a convenient, privacy-preserving, and reproducible manner, promoting a vital part of practicing precision medicine while streamlining compliance and avoiding transfers of raw patient data. This new approach can provide a crucial update to RIs and improve patient care in personalized medicine.
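    One way the "no copy, no move" principle can work in practice is for each site to exchange only histogram counts over agreed bin edges; the central analysis then reads a reference interval off the pooled empirical CDF. The sketch below illustrates this aggregate-only flow under assumed bin edges and the conventional 2.5th/97.5th percentile cut-offs; function names and details are illustrative, not the BioRef/TI4Health API.

```python
import numpy as np

def reference_interval(bin_edges, site_counts, lower=0.025, upper=0.975):
    """Pool per-site histogram counts (the only aggregates exchanged under
    a 'no copy, no move' scheme) and read a reference interval off the
    pooled empirical CDF. Returns the edges of the bins containing the
    lower and upper quantiles."""
    total = np.sum(site_counts, axis=0).astype(float)  # elementwise sum across sites
    cdf = np.cumsum(total) / total.sum()
    lo_bin = int(np.searchsorted(cdf, lower))  # first bin reaching the lower quantile
    hi_bin = int(np.searchsorted(cdf, upper))  # first bin reaching the upper quantile
    return float(bin_edges[lo_bin]), float(bin_edges[hi_bin + 1])
```

    The resolution of the recovered interval is limited by the bin width, which is the usual accuracy/disclosure trade-off in aggregate-only schemes.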

    MVG Mechanism: Differential Privacy under Matrix-Valued Query

    Differential privacy mechanism design has traditionally been tailored to scalar-valued query functions. Although many mechanisms, such as the Laplace and Gaussian mechanisms, can be extended to a matrix-valued query function by adding i.i.d. noise to each element of the matrix, this approach is often suboptimal because it forfeits the opportunity to exploit the structural characteristics typically associated with matrix analysis. To address this challenge, we propose a novel differential privacy mechanism called the Matrix-Variate Gaussian (MVG) mechanism, which adds matrix-valued noise drawn from a matrix-variate Gaussian distribution, and we rigorously prove that the MVG mechanism preserves (ε,δ)-differential privacy. Furthermore, we introduce the concept of directional noise, made possible by the design of the MVG mechanism. Directional noise allows the impact of the noise on the utility of the matrix-valued query function to be moderated. Finally, we experimentally demonstrate the performance of our mechanism using three matrix-valued queries on three privacy-sensitive datasets. We find that the MVG mechanism notably outperforms four previous state-of-the-art approaches and provides utility comparable to the non-private baseline.
    Comment: Appeared in CCS'1
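    The sampling step behind such a mechanism can be sketched from the standard matrix-variate Gaussian construction: if Z has i.i.d. standard normal entries and the row covariance Σ and column covariance Ψ factor as Σ = A Aᵀ and Ψ = Bᵀ B, then A Z B is matrix-variate Gaussian with those covariances. Calibrating Σ and Ψ to the query's sensitivity and the (ε,δ) budget (the core of the paper) is omitted here; this is only the noise-drawing step.

```python
import numpy as np

def mvg_noise(Sigma, Psi, rng=None):
    """Draw one sample from MN(0, Sigma, Psi): row covariance Sigma (n x n),
    column covariance Psi (m x m). Uses the affine construction A @ Z @ B
    with A A^T = Sigma, B^T B = Psi, and Z i.i.d. standard normal."""
    rng = np.random.default_rng(rng)
    A = np.linalg.cholesky(Sigma)        # A @ A.T == Sigma
    B = np.linalg.cholesky(Psi).T        # B.T @ B == Psi
    Z = rng.standard_normal((Sigma.shape[0], Psi.shape[0]))
    return A @ Z @ B
```

    Structured (non-identity) Σ and Ψ are what make "directional" noise possible: variance can be concentrated along directions where the query's utility is least sensitive.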

    Scalable Privacy-Compliant Virality Prediction on Twitter

    The digital town hall of Twitter has become a preferred medium of communication for individuals and organizations across the globe. Some of them reach audiences of millions, while others struggle to get noticed. Given the impact of social media, the question remains more relevant than ever: how can the dynamics of attention on Twitter be modeled? Researchers around the world turn to machine learning to predict the most influential tweets and authors, navigating the volume, velocity, and variety of social big data, with many compromises. In this paper, we revisit content popularity prediction on Twitter. We argue that strict alignment of data acquisition, storage, and analysis algorithms is necessary to avoid the common trade-offs between scalability, accuracy, and privacy compliance. We propose a new framework for the rapid acquisition of large-scale datasets, a high-accuracy supervisory signal, and multilanguage sentiment prediction, while honoring every applicable privacy request. We then apply a novel gradient boosting framework to achieve state-of-the-art results in virality ranking, even before including a tweet's visual or propagation features. Our Gradient Boosted Regression Tree is the first to offer explainable, strong ranking performance on benchmark datasets. Since the analysis focuses on features available early, the model is immediately applicable to incoming tweets in 18 languages.
    Comment: AffCon@AAAI-19 Best Paper Award; presented at AAAI-19 W1: Affective Content Analysis
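    "Features available early" here means content and author attributes known the moment a tweet arrives, before any propagation signal accumulates. A hypothetical extractor of such features might look as follows; the field names and the specific features are assumptions for illustration, not the paper's schema.

```python
def early_features(tweet):
    """Features available the moment a tweet is posted, i.e. before any
    retweet/propagation signal. Field names are illustrative assumptions."""
    text = tweet["text"]
    return {
        "n_chars": len(text),
        "n_words": len(text.split()),
        "n_hashtags": text.count("#"),
        "n_mentions": text.count("@"),
        "has_url": int("http" in text),
        "followers": tweet["followers"],
    }
```

    Vectors like these would then feed the gradient-boosted ranking model, which is why the trained model can score incoming tweets immediately.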

    SoK: Chasing Accuracy and Privacy, and Catching Both in Differentially Private Histogram Publication

    Histograms and synthetic data are of key importance in data analysis. However, researchers have shown that even aggregated data such as histograms, containing no obvious sensitive attributes, can leak private information. To enable data analysis, a strong notion of privacy is required to avoid unintended privacy violations. Such a strong notion is differential privacy, a statistical notion of privacy that makes privacy leakage quantifiable. The caveat of differential privacy is that while it offers strong privacy guarantees, privacy comes at a cost in accuracy. Despite this trade-off being a central and important issue in the adoption of differential privacy, the literature lacks a clear account of the trade-off and of how to address it appropriately. Through a systematic literature review (SLR), we investigate the state of the art in accuracy-improving differentially private algorithms for histogram and synthetic data publishing. Our contribution is two-fold: 1) we identify trends and connections in the contributions to the field of differential privacy for histograms and synthetic data, and 2) we provide an understanding of the privacy/accuracy trade-off challenge by crystallizing the different dimensions of accuracy improvement. Accordingly, we position and visualize the ideas in relation to each other and to external work, and we deconstruct each algorithm to examine its building blocks separately, with the aim of pinpointing which dimension of accuracy improvement each technique or approach targets. Hence, this systematization of knowledge (SoK) provides an understanding of the dimensions in which, and how, accuracy improvement can be pursued without sacrificing privacy.
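    The baseline that the surveyed algorithms improve upon is the Laplace mechanism applied per bin: adding or removing one individual's record changes at most one bin count by one, so the histogram query has L1 sensitivity 1, and Laplace noise with scale 1/ε per bin yields ε-differential privacy. A minimal sketch:

```python
import numpy as np

def dp_histogram(data, bins, epsilon, rng=None):
    """Release a histogram with epsilon-differential privacy via the Laplace
    mechanism. Assumes each individual contributes one record to one bin,
    so the L1 sensitivity is 1 under add/remove-one-record adjacency."""
    rng = np.random.default_rng(rng)
    counts, edges = np.histogram(data, bins=bins)
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    return counts + noise, edges
```

    Accuracy-improving techniques then attack the error of this baseline from different directions, e.g. adaptive binning, post-processing for consistency, or restricting the query workload.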

    Whole-brain radiomics for clustered federated personalization in brain tumor segmentation

    Federated learning and its application to medical image segmentation have recently become a popular research topic. This training paradigm suffers from statistical heterogeneity between participating institutions' local datasets, incurring convergence slowdown as well as potential accuracy loss compared to classical training. To mitigate this effect, federated personalization emerged as the federated optimization of one model per institution. We propose a novel personalization algorithm tailored to the feature shift induced by the use of different scanners and acquisition parameters across institutions. This method is the first to account for both inter- and intra-institution feature shift (multiple scanners used within a single institution). It is based on the computation, within each centre, of a series of radiomic features capturing the global texture of each 3D image volume, followed by a clustering analysis pooling all feature vectors transferred from the local institutions to the central server. Each resulting clustered decentralized dataset (potentially including data from different institutions) then serves to finetune a global model obtained through classical federated learning. We validate our approach on the Federated Brain Tumor Segmentation 2022 Challenge dataset (FeTS2022). Our code is available at https://github.com/MatthisManthe/radiomics_CFFL.
    Comment: Accepted at the Medical Imaging with Deep Learning (MIDL) 2023 conference
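    The server-side clustering step can be sketched with a toy k-means over the pooled radiomic feature vectors (one vector per 3D volume); each resulting cluster then defines a decentralized dataset on which the global model is finetuned. This is a plain k-means stand-in under assumed inputs, not the paper's implementation:

```python
import numpy as np

def cluster_volumes(features, k, iters=20):
    """Toy k-means over pooled radiomic feature vectors (one vector per 3D
    image volume). Deterministic init (evenly spaced points) for brevity;
    a real system would use k-means++ or similar."""
    X = np.asarray(features, dtype=float)
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # assign each volume to its nearest center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # recompute centers from current assignments
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers
```

    Because clustering operates on texture features rather than institution identity, volumes from different institutions acquired on similar scanners can land in the same cluster, which is exactly the intra-institution shift the method targets.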

    Towards Private Medical Data Donations by Using Privacy Preserving Technologies
