Local Differential Privacy In Smart Manufacturing: Application Scenario, Mechanisms and Tools

Author: Gärtner Sascha
Herberger David
Hübner Marco
Oberle Michael
Publication venue: Hannover : publish-Ing.
Publication date: 01/01/2022

To utilize the potential of machine learning and deep learning, enormous amounts of data are required. To find the optimal solution, it is beneficial to share and publish datasets. Because publicly released datasets have leaked private information and attackers have exposed sensitive information about individuals, the research field of differential privacy develops solutions to prevent such exposure. Compared to other domains, the application of differential privacy in the manufacturing context is very challenging. Manufacturing data contains sensitive information about the companies and their process knowledge, products, and orders. Furthermore, data of individuals operating machines could be exposed and thus their performance evaluated. This paper describes scenarios of how differential privacy can be used in the manufacturing context. In particular, it addresses the potential threats that arise when sharing manufacturing data, which are described by identifying different manufacturing parameters and their variable types. Simplified examples show how differentially private mechanisms can be applied to binary, numeric, and categorical variables as well as time series. Finally, libraries are presented which enable the productive use of differential privacy.
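The abstract above does not name the mechanisms the paper applies. As a minimal sketch, assuming two standard local differential privacy mechanisms (randomized response for a binary variable and the Laplace mechanism for a bounded numeric variable), the idea could look as follows. All function names, parameters, and bounds are illustrative assumptions, not taken from the paper.

```python
import math
import random


def randomized_response(bit: bool, epsilon: float, rng: random.Random) -> bool:
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else not bit


def estimate_true_rate(reports, epsilon: float) -> float:
    """Debias the observed share of True reports to estimate the real share."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    # observed = (1 - p) + rate * (2p - 1), solved for rate
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def privatize_numeric(value: float, lower: float, upper: float,
                      epsilon: float, rng: random.Random) -> float:
    """Clip the value to [lower, upper], then add Laplace noise.

    In the local setting the sensitivity is the full range (upper - lower),
    so the noise scale is (upper - lower) / epsilon.
    """
    clipped = min(max(value, lower), upper)
    return clipped + laplace_noise((upper - lower) / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical binary parameter: did a machine raise an alarm in a shift?
    truth = [i < 3000 for i in range(10000)]
    reports = [randomized_response(b, 1.0, rng) for b in truth]
    print("estimated alarm rate:", estimate_true_rate(reports, 1.0))
    # Hypothetical numeric parameter: spindle temperature bounded to 0..100.
    print("noisy temperature:", privatize_numeric(63.5, 0.0, 100.0, 1.0, rng))
```

Each individual report is noisy, but the aggregate estimate stays useful when many reports are combined; smaller values of `epsilon` give stronger privacy and noisier results.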

Institutionelles Repositorium der Leibniz Universität Hannover

Going Beyond Obscurity: Organizational Approaches to Data Anonymization

Author: Hargitai Viktor
Shklovski Irina
Wasowski Andrzej
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

The IT University of Copenhagen's Repository

Balancing Data Protection and Model Accuracy : An Investigation of Protection Methods on Machine Learning Model Performance for a Bank Marketing Dataset

Author: Du Jiahuan
Guruprasad Shravya
Publication date: 01/01/2023

The practice of sharing customer data among companies for marketing purposes is becoming increasingly common. However, sharing customer-level data poses potential risks and serious problems for businesses, such as substantial declines in brand value, erosion of customer trust, loss of competitive advantage, and the imposition of legal penalties (Schneider et al. 2017). These may eventually lead to financial loss and reputation damage for the companies. With the growing awareness of the value of personal information, more companies and customers are concerned about protecting data privacy. In this paper, we used marketing data from a Portuguese bank to explore methods for balancing prediction accuracy and customer data privacy using various machine learning and data privacy techniques. The dataset includes observations from 45,211 respondents, and the observation period is from May 2008 to November 2010. Our goal is to find a method that enables third parties to share data with the bank while safeguarding customer privacy and maintaining accuracy in predicting customer behaviour. We tested several machine learning models (Logistic Regression, Random Forest, and a feedforward Neural Network) on the original data and then chose Random Forest, which gave the best prediction performance, as the model for further exploration. After applying two different data privacy methods (Sampling and Random Noise) to the original data, we found that the Random Forest model gives accuracy levels very close to the accuracy before applying the privacy methods. By doing this, we demonstrated a method for companies to protect customer data privacy without sacrificing predictive accuracy. The results of this study will have significant implications for companies that seek to share customer data while maintaining high levels of privacy and accuracy.
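The abstract names Sampling and Random Noise as protection methods but gives no details. A minimal sketch, assuming row sampling without replacement and zero-mean Gaussian noise on numeric columns (the function name, parameters, and noise distribution are all assumptions made for illustration), might look like this:

```python
import random


def protect_rows(rows, numeric_keys, sample_fraction, noise_sd, seed=0):
    """Return a sampled, noise-perturbed copy of the rows.

    rows            list of dicts, one per customer record
    numeric_keys    names of the numeric fields to perturb
    sample_fraction share of rows to release (assumed, e.g. 0.5)
    noise_sd        standard deviation of the Gaussian noise (assumed)
    """
    rng = random.Random(seed)
    k = max(1, round(sample_fraction * len(rows)))
    sampled = rng.sample(rows, k)  # sampling without replacement
    protected = []
    for row in sampled:
        noisy = dict(row)  # copy so the original record is not modified
        for key in numeric_keys:
            noisy[key] = row[key] + rng.gauss(0.0, noise_sd)
        protected.append(noisy)
    return protected


if __name__ == "__main__":
    # Hypothetical records; the field names are illustrative only.
    records = [{"age": 25 + i % 40, "balance": 100.0 * i, "job": "admin"}
               for i in range(10)]
    for row in protect_rows(records, ["age", "balance"], 0.5, 2.0):
        print(row)
```

A model would then be trained on the protected rows and its accuracy compared with a model trained on the original rows.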

NHH Brage

Explaining Data-Driven Decisions made by AI Systems: The Counterfactual Approach

Author: Fernández-Loría Carlos
Han Xintian
Provost Foster
Publication date: 13/10/2021

We examine counterfactual explanations for explaining the decisions made by model-based AI systems. The counterfactual approach we consider defines an explanation as a set of the system's data inputs that causally drives the decision (i.e., changing the inputs in the set changes the decision) and is irreducible (i.e., changing any subset of the inputs does not change the decision). We (1) demonstrate how this framework may be used to provide explanations for decisions made by general, data-driven AI systems that may incorporate features with arbitrary data types and multiple predictive models, and (2) propose a heuristic procedure to find the most useful explanations depending on the context. We then contrast counterfactual explanations with methods that explain model predictions by weighting features according to their importance (e.g., SHAP, LIME) and present two fundamental reasons why we should carefully consider whether importance-weight explanations are well-suited to explain system decisions. Specifically, we show that (i) features that have a large importance weight for a model prediction may not affect the corresponding decision, and (ii) importance weights are insufficient to communicate whether and how features influence decisions. We demonstrate this with several concise examples and three detailed case studies that compare the counterfactual approach with SHAP to illustrate various conditions under which counterfactual explanations explain data-driven decisions better than importance weights.
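The definition in this abstract (a set of inputs that changes the decision when changed, with no smaller subset doing so) can be illustrated with a brute-force search. This is a sketch under stated assumptions, not the heuristic procedure the authors propose: "changing" an input is assumed to mean replacing it with a default value, and the decision rule, feature names, and defaults are hypothetical.

```python
from itertools import combinations


def counterfactual_explanations(instance, decide, defaults, max_size=None):
    """Find all irreducible feature sets whose change flips the decision.

    Subsets are tried in order of increasing size. Any set that contains an
    already-found explanation is skipped, so every returned set is
    irreducible: none of its proper subsets flips the decision.
    """
    original = decide(instance)
    features = sorted(instance)
    limit = len(features) if max_size is None else max_size
    found = []
    for size in range(1, limit + 1):
        for subset in combinations(features, size):
            chosen = set(subset)
            if any(prev <= chosen for prev in found):
                continue  # reducible: a smaller explanation is inside
            changed = dict(instance)
            for name in subset:
                changed[name] = defaults[name]
            if decide(changed) != original:
                found.append(chosen)
    return found


def deny_credit(applicant):
    """Hypothetical decision rule: deny when the risk score reaches 2."""
    score = (2 * applicant["late_payments"]
             + applicant["high_debt"]
             + applicant["short_history"])
    return score >= 2


if __name__ == "__main__":
    applicant = {"late_payments": 1, "high_debt": 1, "short_history": 1}
    baseline = {"late_payments": 0, "high_debt": 0, "short_history": 0}
    for explanation in counterfactual_explanations(applicant, deny_credit, baseline):
        print(sorted(explanation))
```

In this toy case no single feature flips the decision on its own, which is the kind of situation where a per-feature importance weight says little about what actually drives the decision. The exhaustive search is exponential in the number of features, so it only suits small examples.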

arXiv.org e-Print Archive

AIS Electronic Library (AISeL)

Interviewer Effects on Nonresponse

Author: Annelies G. Blom
Edith D. de Leeuw
Joop J. Hox

In face-to-face surveys interviewers play a crucial role in making contact with and gaining cooperation from sample units. While some analyses investigate the influence of interviewers on nonresponse, they are typically restricted to single-country studies. However, interviewer training, contacting and cooperation strategies as well as survey climates may differ across countries. Combining call-record data from the European Social Survey (ESS) with data from a detailed interviewer questionnaire on attitudes and doorstep behavior we find systematic country differences in nonresponse processes, which can in part be explained by differences in interviewer characteristics, such as contacting strategies and avowed doorstep behavior.

Research Papers in Economics

Privacy Measurement in Tabular Synthetic Data: State of the Art and Future Research Directions

Author: Boudewijn Alexander
Chauvenet Carlo Rossi
Cocca Vanessa
De Schepper Karel
Ferraris Andrea Filippo
Panfilo Daniele
Zinutti Sabrina
Publication date: 29/11/2023

Synthetic data (SD) have garnered attention as a privacy enhancing technology. Unfortunately, there is no standard for quantifying their degree of privacy protection. In this paper, we discuss proposed quantification approaches. This contributes to the development of SD privacy standards; stimulates multi-disciplinary discussion; and helps SD researchers make informed modeling and evaluation decisions. Comment: 20 pages, 4 tables, 8 figures; NeurIPS 2023 Workshop on Synthetic Data Generation with Generative A

arXiv.org e-Print Archive

Application Of Blockchain Technology And Integration Of Differential Privacy: Issues In E-Health Domains

Author: Isie David
Publication venue: UND Scholarly Commons
Publication date: 01/01/2023

This paper presents a systematic and comprehensive review of critical applications of Blockchain Technology integrated with Differential Privacy for privacy and security enhancement. It aims to highlight the research issues in the e-Health domain (e.g., EMR) and to review the current research directions in Differential Privacy integration with Blockchain Technology. Firstly, the current concerns in the e-Health domain are identified as follows: (a) healthcare information raises a high level of security and privacy concerns due to its sensitivity; (b) due to vulnerabilities surrounding the healthcare system, data breaches are common and pose a risk of attack by an adversary; and (c) the current privacy and security apparatus needs further fortification. Secondly, Blockchain Technology (BT) is one of the approaches to address these privacy and security issues. The alternative solution is the integration of Differential Privacy (DP) with Blockchain Technology. Thirdly, collections of scientific journals and research papers, published between 2015 and 2022, from IEEE, Science Direct, Google Scholar, ACM, and PubMed on the e-Health domain approach are summarized in terms of security and privacy. The methodology uses a systematic mapping study (SMS) to identify and select relevant research papers and academic journals regarding DP and BT. With this understanding of the current privacy issues in EMR, this paper focuses on three categories: (a) e-Health Record Privacy, (b) Real-Time Health Data, and (c) Health Survey Data Protection. The study finds evidence of inherent issues and technical challenges associated with the integration of Differential Privacy and Blockchain Technology.

UND Scholarly Commons (University of North Dakota)