
    Balancing Privacy and Accuracy in IoT using Domain-Specific Features for Time Series Classification

    ε-Differential Privacy (DP) is widely used to anonymize data, protecting sensitive information while still supporting machine learning (ML) tasks. However, there is a trade-off between privacy and ML accuracy, since ε-DP reduces a model's accuracy on classification tasks. Moreover, few studies have applied DP to time series from sensors and Internet-of-Things (IoT) devices. In this work, we aim to bring the accuracy of ML models trained on ε-DP data as close as possible to that of ML models trained on non-anonymized data, for two different physiological time series. We propose transforming time series into domain-specific 2D (image) representations, such as scalograms, recurrence plots (RP), and their joint representation, as inputs for training classifiers. These image transformations are irreversible, which makes our approach secure against data leaks. The images also allow us to apply state-of-the-art image classifiers, which exploit additional information such as textured patterns to obtain accuracy comparable to classifiers trained on non-anonymized data. To achieve classifier performance on anonymized data close to that on non-anonymized data, it is important to identify a suitable value of ε and input feature. Experimental results demonstrate that ML models trained on scalograms and RPs performed comparably to ML models trained on their non-anonymized versions. Motivated by these promising results, we design an end-to-end IoT ML edge-cloud architecture, capable of detecting input drift, that employs our technique to train ML models on ε-DP physiological data. Our classification approach ensures the privacy of individuals while processing and analyzing the data at the edge securely and efficiently.
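    The two ingredients above can be sketched minimally, assuming the standard Laplace mechanism for ε-DP and a thresholded recurrence plot (signal, ε, and threshold values are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(x, epsilon, sensitivity=1.0):
    """Basic epsilon-DP Laplace mechanism: noise scale = sensitivity / epsilon."""
    return x + rng.laplace(0.0, sensitivity / epsilon, size=x.shape)

def recurrence_plot(x, threshold=0.2):
    """Binary recurrence plot: R[i, j] = 1 iff |x_i - x_j| < threshold."""
    dist = np.abs(x[:, None] - x[None, :])
    return (dist < threshold).astype(np.uint8)

signal = np.sin(np.linspace(0, 4 * np.pi, 128))  # toy stand-in for a physiological signal
private = laplace_mechanism(signal, epsilon=5.0, sensitivity=0.1)
rp = recurrence_plot(private)  # 128x128 binary image, usable as CNN input
print(rp.shape)
```

    Because the plot is a thresholded pairwise-distance matrix, the original series cannot be recovered from it, which is the irreversibility the approach relies on.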

    Lacunar infarcts, depression and anxiety symptoms one year after stroke

    Background: Mood disorders are frequent after stroke and are associated with poorer quality of life. Previous studies have reported conflicting results on the role of stroke subtype in the incidence of poststroke mood disorders. We explored the relationship between the subcortical ischemic (lacunar) stroke subtype and the presence of such symptoms at 1 year after stroke. Methods: Anonymized data were accessed from the Virtual International Stroke Trials Archive. Stroke subtypes were classified according to the Trial of Org 10172 in Acute Stroke Treatment classification. Depression and anxiety symptoms were assessed using the Hospital Anxiety and Depression Scale. We investigated independent predictors of depression and anxiety symptoms using a logistic regression model. Results: Data were available for 2160 patients. Almost one fifth of the patients developed both anxiety and depression at 1-year follow-up. After adjusting for confounders, the lacunar subtype was least associated with both anxiety (odds ratio [OR] = .61; 95% confidence interval [CI] = .46-.80) and depression symptoms (OR = .71; CI = .55-.93) versus other stroke subtypes. Conclusions: Lacunar strokes have a weaker association with the presence of anxiety and depression symptoms than other subtypes.
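    The reported effect sizes follow directly from the fitted logistic-regression coefficients. The sketch below shows the standard conversion of a coefficient and its standard error into an odds ratio with a 95% CI; the coefficient and standard error are illustrative values back-computed from the published anxiety OR of .61, not the study's actual fitted model:

```python
import numpy as np

def odds_ratio_ci(beta, se, z=1.96):
    """Exponentiate a logistic-regression coefficient and its
    Wald interval to get an odds ratio with a 95% CI."""
    return np.exp(beta), (np.exp(beta - z * se), np.exp(beta + z * se))

# Illustrative values only: a negative coefficient on the lacunar-subtype
# indicator yields an OR below 1, i.e. lower odds of anxiety symptoms.
beta_lacunar, se = -0.494, 0.141
or_, (lo, hi) = odds_ratio_ci(beta_lacunar, se)
print(round(or_, 2), round(lo, 2), round(hi, 2))  # 0.61 0.46 0.8
```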

    An Automated Social Graph De-anonymization Technique

    We present a generic and automated approach to re-identifying nodes in anonymized social networks, which enables novel anonymization techniques to be evaluated quickly. It uses machine learning (decision forests) to match pairs of nodes in disparate anonymized sub-graphs. The technique uncovers artefacts and invariants of any black-box anonymization scheme from a small set of examples. Despite a high degree of automation, classification succeeds with significant true positive rates even when small false positive rates are sought. Our evaluation uses publicly available real-world datasets to study the performance of our approach against real-world anonymization strategies, namely the schemes used to protect datasets of the Data for Development (D4D) Challenge. We show that the technique is effective even when only small numbers of samples are used for training. Further, since it detects weaknesses in the black-box anonymization scheme, it can re-identify nodes in one social network when trained on another. Comment: 12 pages
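    The classification step can be sketched with a decision forest scoring candidate node pairs. The features and labels below are synthetic stand-ins (the paper's actual pairwise feature set is not reproduced here):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in features for candidate node pairs across two
# anonymized sub-graphs (e.g. degree difference, common-neighbour
# count); labels mark true matches.
n = 400
X = rng.normal(size=(n, 3))
y = (X[:, 0] + 0.5 * X[:, 2] + 0.3 * rng.normal(size=n) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Ranking pairs by match probability lets the decision threshold trade
# true-positive rate against false-positive rate, as in the evaluation.
scores = forest.predict_proba(X)[:, 1]
print(scores.shape)
```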

    Name Disambiguation from link data in a collaboration graph using temporal and topological features

    In a social community, multiple persons may share the same name, phone number, or other identifying attributes. This, along with phenomena such as name abbreviation, name misspelling, and human error, leads to erroneous aggregation of records of multiple persons under a single reference. Such mistakes affect the performance of document retrieval, web search, and database integration, and, more importantly, lead to improper attribution of credit (or blame). The task of entity disambiguation partitions the records belonging to multiple persons so that each partition contains the records of a unique person. Existing solutions to this task use either biographical attributes or auxiliary features collected from external sources, such as Wikipedia. However, in many scenarios such auxiliary features are unavailable or costly to obtain. Besides, collecting biographical or external data carries a risk of privacy violation. In this work, we propose a method for solving the entity disambiguation task from link information obtained from a collaboration network. Our method is non-intrusive of privacy, as it uses only the time-stamped graph topology of an anonymized network. Experimental results on two real-life academic collaboration networks show that the proposed method performs satisfactorily. Comment: The short version of this paper has been accepted to ASONAM 201
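    One simple way to realize such a partition from link data alone is to connect records whose collaborator sets overlap and take connected components. This is a toy sketch of the idea, not the paper's method, which additionally exploits temporal features:

```python
from collections import defaultdict
from itertools import combinations

# Each record is the set of (anonymized) collaborator ids on one paper
# attributed to the ambiguous name; overlapping collaborator sets are
# taken as evidence that two records belong to the same person.
records = [{1, 2}, {2, 3}, {7, 8}, {8, 9}, {4}]

parent = list(range(len(records)))

def find(i):
    """Union-find root lookup with path compression."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

# Union records that share at least one collaborator.
for i, j in combinations(range(len(records)), 2):
    if records[i] & records[j]:
        parent[find(i)] = find(j)

groups = defaultdict(list)
for i in range(len(records)):
    groups[find(i)].append(i)

partition = sorted(sorted(g) for g in groups.values())
print(partition)  # [[0, 1], [2, 3], [4]] — three distinct persons
```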

    Privacy- and Utility-Preserving NLP with Anonymized Data: A case study of Pseudonymization

    This work investigates the effectiveness of different pseudonymization techniques, ranging from rule-based substitutions to pre-trained Large Language Models (LLMs), on a variety of datasets and models for two widely used NLP tasks: text classification and summarization. Our work provides crucial insights into the gap in model quality between original and anonymized data (focusing on pseudonymization) and fosters future research into higher-quality anonymization techniques that better balance the trade-off between data protection and utility preservation. We make our code, pseudonymized datasets, and downstream models publicly available. Comment: 10 pages. Accepted for TrustNLP workshop at ACL202
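    A minimal rule-based substitution of the kind compared here can be sketched as follows. Entity spans are assumed to be already detected (e.g. by an NER step); names and labels are illustrative:

```python
import re

def pseudonymize(text, entities):
    """Replace each detected entity span with a consistent placeholder,
    so repeated mentions map to the same pseudonym."""
    mapping, counter = {}, {}
    for surface, label in entities:
        if surface not in mapping:
            counter[label] = counter.get(label, 0) + 1
            mapping[surface] = f"[{label}_{counter[label]}]"
        text = re.sub(re.escape(surface), mapping[surface], text)
    return text

doc = "Alice met Bob in Paris. Alice liked Paris."
out = pseudonymize(doc, [("Alice", "PER"), ("Bob", "PER"), ("Paris", "LOC")])
print(out)  # [PER_1] met [PER_2] in [LOC_1]. [PER_1] liked [LOC_1].
```

    Consistency of the mapping matters for downstream utility: a classifier or summarizer still sees that the same entity recurs, even though its surface form is hidden.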

    CIAGAN: Conditional Identity Anonymization Generative Adversarial Networks

    The unprecedented increase in the use of computer vision technology in society goes hand in hand with increased concern about data privacy. In many real-world scenarios, such as people tracking or action recognition, it is important to be able to process data while taking care to protect people's identities. We propose and develop CIAGAN, a model for image and video anonymization based on conditional generative adversarial networks. Our model removes the identifying characteristics of faces and bodies while producing high-quality images and videos that can be used for any computer vision task, such as detection or tracking. Unlike previous methods, we have full control over the de-identification (anonymization) procedure, ensuring both anonymization and diversity. We compare our method to several baselines and achieve state-of-the-art results. Comment: CVPR 202