Balancing Privacy and Accuracy in IoT using Domain-Specific Features for Time Series Classification
ε-Differential Privacy (DP) is widely used to anonymize data, protecting sensitive information in machine learning (ML) tasks. However, privacy comes at a cost: ε-DP reduces model accuracy in classification tasks. Moreover, few studies have applied DP to time series from sensors and Internet-of-Things (IoT) devices. In this work, we aim to bring the accuracy of ML models trained on ε-DP data as close as possible to that of models trained on non-anonymized data, for two different physiological time series. We propose transforming the time series into domain-specific 2D (image) representations, such as scalograms, recurrence plots (RP), and their joint representation, as inputs for training classifiers. These image transformations are irreversible, which renders our approach secure by preventing data leaks. They also allow us to apply state-of-the-art image classifiers, which exploit additional information such as textured patterns in the images to reach accuracy comparable to classifiers trained on non-anonymized data. To bring classifier performance on anonymized data close to that on non-anonymized data, it is important to identify both the value of ε and the input feature. Experimental results demonstrate that models trained on scalograms and RPs performed comparably to models trained on their non-anonymized versions. Motivated by these promising results, we design an end-to-end IoT ML edge-cloud architecture capable of detecting input drift, which employs our technique to train ML models on ε-DP physiological data. Our classification approach ensures the privacy of individuals while processing and analyzing the data at the edge securely and efficiently.
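The two building blocks named above can be sketched briefly. This is a minimal illustration, not the paper's pipeline: it assumes the Laplace mechanism for ε-DP noise on each sample (with an assumed sensitivity) and a simple thresholded recurrence plot; all function names and parameter values are illustrative.

```python
import numpy as np

def laplace_dp(series, epsilon, sensitivity=1.0, seed=0):
    """Add Laplace noise with scale sensitivity/epsilon (the Laplace
    mechanism for epsilon-DP) to each sample of a time series."""
    rng = np.random.default_rng(seed)
    scale = sensitivity / epsilon
    return series + rng.laplace(0.0, scale, size=series.shape)

def recurrence_plot(series, threshold=0.1):
    """Binary recurrence plot: R[i, j] = 1 iff |x_i - x_j| < threshold."""
    dist = np.abs(series[:, None] - series[None, :])
    return (dist < threshold).astype(np.uint8)

# Anonymize a toy signal, then build the 2D image used for classification.
signal = np.sin(np.linspace(0, 4 * np.pi, 128))
private = laplace_dp(signal, epsilon=1.0)
rp = recurrence_plot(private, threshold=0.25)
```

The resulting `rp` matrix is the kind of 2D input an image classifier would consume; the irreversibility argument rests on the thresholding discarding the raw amplitudes.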
Lacunar infarcts, depression and anxiety symptoms one year after stroke
Background:
Mood disorders are frequent after stroke and are associated with poorer quality of life. Previous studies have reported conflicting results on the role of stroke subtype in the incidence of poststroke mood disorders. We explored the relationship between the subcortical ischemic stroke subtype (lacunar) and the presence of such symptoms at 1 year after stroke.
Methods:
Anonymized data were accessed from the Virtual International Stroke Trials Archive. Stroke subtypes were classified according to the Trial of Org 10172 in Acute Stroke Treatment classification. Depression and anxiety symptoms were assessed using the Hospital Anxiety and Depression Scale. We investigated independent predictors of depression and anxiety symptoms using a logistic regression model.
Results:
Data were available for 2160 patients. Almost one fifth of the patients developed both anxiety and depression at 1-year follow-up. After adjusting for confounders, the lacunar subtype was least associated with both anxiety (odds ratio [OR] = .61; 95% confidence interval [CI] = .46-.80) and depression symptoms (OR = .71; CI = .55-.93) versus other stroke subtypes.
Conclusions:
Lacunar strokes have a weaker association with the presence of anxiety and depression symptoms compared with other subtypes.
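The odds ratios reported above are exponentiated logistic-regression coefficients. A minimal sketch of that conversion follows; the coefficient and standard error below are illustrative values chosen to reproduce the reported anxiety OR, not the study's actual fit.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Convert a logistic-regression coefficient and its standard error
    into an odds ratio with a 95% Wald confidence interval."""
    or_ = math.exp(beta)
    lo = math.exp(beta - z * se)
    hi = math.exp(beta + z * se)
    return or_, lo, hi

# Illustrative: beta chosen so OR = 0.61, matching the anxiety result.
or_, lo, hi = odds_ratio_ci(beta=math.log(0.61), se=0.14)
```

An OR below 1 with a confidence interval excluding 1, as for the lacunar subtype here, indicates lower odds of symptoms relative to the reference subtypes.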
An Automated Social Graph De-anonymization Technique
We present a generic and automated approach to re-identifying nodes in
anonymized social networks which enables novel anonymization techniques to be
quickly evaluated. It uses machine learning (decision forests) to match
pairs of nodes in disparate anonymized sub-graphs. The technique uncovers
artefacts and invariants of any black-box anonymization scheme from a small set
of examples. Despite a high degree of automation, classification succeeds with
significant true positive rates even when small false positive rates are
sought. Our evaluation uses publicly available real world datasets to study the
performance of our approach against real-world anonymization strategies, namely
the schemes used to protect datasets of The Data for Development (D4D)
Challenge. We show that the technique is effective even when only small numbers
of samples are used for training. Further, since it detects weaknesses in the
black-box anonymization scheme it can re-identify nodes in one social network
when trained on another.
Comment: 12 pages
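The matching step rests on structural invariants that survive black-box anonymization. A minimal sketch of feature extraction for a candidate node pair follows; the paper trains decision forests on such features, but the particular invariants, the adjacency-dict graph encoding, and the function names here are illustrative assumptions.

```python
def node_features(graph, node):
    """Structural invariants for one node in an adjacency-dict graph."""
    nbrs = graph[node]
    degs = sorted(len(graph[n]) for n in nbrs)
    return {
        "degree": len(nbrs),
        "max_nbr_degree": degs[-1] if degs else 0,
        "min_nbr_degree": degs[0] if degs else 0,
    }

def pair_features(g1, u, g2, v):
    """Feature vector for a candidate match (u in g1, v in g2):
    absolute differences of the per-node invariants. A classifier
    (e.g. a decision forest) would be trained on these vectors."""
    f1, f2 = node_features(g1, u), node_features(g2, v)
    return [abs(f1[k] - f2[k]) for k in sorted(f1)]

# Two anonymized sub-graphs of the same toy network.
g1 = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
g2 = {"x": {"y", "z"}, "y": {"x"}, "z": {"x"}}
feats = pair_features(g1, "a", g2, "x")
```

Matching node pairs yield small difference vectors (here all zeros), which is the signal the classifier learns to separate from random pairs.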
Name Disambiguation from link data in a collaboration graph using temporal and topological features
In a social community, multiple persons may share the same name, phone number
or some other identifying attributes. This, along with other phenomena such as
name abbreviation, name misspelling, and human error, leads to erroneous
aggregation of records of multiple persons under a single reference. Such
mistakes affect the performance of document retrieval, web search, database
integration, and more importantly, improper attribution of credit (or blame).
The task of entity disambiguation partitions the records belonging to multiple
persons so that each resulting partition contains the records of a single
person. Existing solutions to this task use either
biographical attributes, or auxiliary features that are collected from external
sources, such as Wikipedia. However, for many scenarios, such auxiliary
features are not available, or they are costly to obtain. Besides, collecting
biographical or external data carries the risk of privacy violation. In this
work, we propose a method for solving the entity disambiguation task using link
information from a collaboration network. Our method preserves privacy, as it
uses only the time-stamped graph topology of an anonymized network.
Experimental results on two real-life academic collaboration networks show that
the proposed method performs satisfactorily.
Comment: The short version of this paper has been accepted to ASONAM 201
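The core idea, partitioning an ambiguous name's records by topological overlap, can be sketched with a greedy clustering over co-author sets. This is an assumed simplification, ignoring the temporal features the paper also uses, and all names and the threshold are illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def partition_records(records, threshold=0.3):
    """Greedily merge records of the same ambiguous name into one
    person-cluster when their co-author sets overlap enough."""
    clusters = []  # each cluster: merged co-author set + record ids
    for rid, coauthors in records:
        for cl in clusters:
            if jaccard(cl["coauthors"], coauthors) >= threshold:
                cl["coauthors"] |= coauthors
                cl["ids"].append(rid)
                break
        else:
            clusters.append({"coauthors": set(coauthors), "ids": [rid]})
    return clusters

# Three records all filed under the same name, e.g. "J. Smith".
records = [
    (0, {"alice", "bob"}),   # person 1
    (1, {"bob", "carol"}),   # person 1 (shares co-author "bob")
    (2, {"dave", "erin"}),   # person 2 (disjoint collaborators)
]
clusters = partition_records(records)
```

Only the anonymized graph topology enters the computation, which is what makes the approach non-intrusive of privacy.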
Privacy- and Utility-Preserving NLP with Anonymized Data: A case study of Pseudonymization
This work investigates the effectiveness of different pseudonymization
techniques, ranging from rule-based substitutions to using pre-trained Large
Language Models (LLMs), on a variety of datasets and models used for two widely
used NLP tasks: text classification and summarization. Our work provides
crucial insights into the gap between original and anonymized data (focusing
on the pseudonymization technique) and its effect on model quality, and
fosters future research into higher-quality anonymization techniques that
better balance the trade-off between data protection and utility preservation.
We make our code, pseudonymized datasets, and downstream models publicly
available.
Comment: 10 pages. Accepted for TrustNLP workshop at ACL202
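The simplest end of the spectrum studied here, rule-based substitution, can be sketched in a few lines. This assumes entity spans are already known (in practice an NER step would find them); the placeholder format and function name are illustrative.

```python
import re

def pseudonymize(text, entities):
    """Rule-based pseudonymization: replace each known entity string
    with a stable placeholder, keeping a mapping for de-pseudonymization."""
    mapping = {}
    for i, ent in enumerate(entities, start=1):
        placeholder = f"[PERSON_{i}]"
        mapping[placeholder] = ent
        text = re.sub(re.escape(ent), placeholder, text)
    return text, mapping

anon, mapping = pseudonymize(
    "Ada Lovelace met Charles Babbage.",
    ["Ada Lovelace", "Charles Babbage"],
)
```

The utility question the abstract raises is whether downstream classifiers and summarizers trained on text like `anon` match models trained on the original; LLM-based variants replace the placeholders with plausible surrogate names instead.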
CIAGAN: Conditional Identity Anonymization Generative Adversarial Networks
The unprecedented increase in the usage of computer vision technology in
society goes hand in hand with an increased concern in data privacy. In many
real-world scenarios like people tracking or action recognition, it is
important to be able to process the data while carefully protecting people's
identities. We propose and develop CIAGAN, a model for image
and video anonymization based on conditional generative adversarial networks.
Our model is able to remove the identifying characteristics of faces and bodies
while producing high-quality images and videos that can be used for any
computer vision task, such as detection or tracking. Unlike previous methods,
we have full control over the de-identification (anonymization) procedure,
ensuring both anonymization as well as diversity. We compare our method to
several baselines and achieve state-of-the-art results.
Comment: CVPR 202
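The "conditional" control over the anonymization procedure comes from feeding the generator an identity label alongside its noise input. A minimal sketch of that conditioning step only, in numpy; the full CIAGAN architecture is not reproduced, and the dimensions and function name are illustrative assumptions.

```python
import numpy as np

def conditioned_input(noise_dim, identity_id, num_identities, seed=0):
    """Concatenate generator noise with a one-hot identity label, the
    standard conditioning mechanism of conditional GANs. Varying the
    label while fixing the noise steers which surrogate identity the
    generator renders, giving controlled anonymization and diversity."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(noise_dim)
    label = np.zeros(num_identities)
    label[identity_id] = 1.0
    return np.concatenate([z, label])

x = conditioned_input(noise_dim=64, identity_id=3, num_identities=10)
```

Sweeping `identity_id` over different values with the same noise is what yields diverse, controllable de-identified outputs rather than a single fixed replacement face.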