
    Enforcing privacy via access control and data perturbation.

    With the increasing availability of large collections of personal and sensitive information to a wide range of user communities, services should take more responsibility for data privacy when disseminating information, which requires control over data sharing. In most cases, data are stored in a repository at the site of the domain server, which takes full responsibility for their management. The data can be provided to known recipients or published without restriction on recipients. To ensure that such data are used without breaching privacy, proper access control models and privacy protection methods are needed. This thesis presents an approach to protecting personal and sensitive information that is stored on one or more data servers. Three main privacy requirements need to be considered when designing a system for privacy-preserving data access. The first is privacy-aware access control. In traditional privacy-aware contexts, built-in conditions or granular access control are used to assign user privileges at a fine-grained level. Users and their privileges are frequently diverse, so proper access control must be deployed on both subject and object servers to impose conditions on user operations. This thesis defines a dual privacy-aware access control model, consisting of a subject server that manages user privileges and an object server that deals with granular data. Both servers extract user operations and server conditions from the original requests and convert them into privacy labels that contain access control attributes. In cross-domain cases, traditional solutions adopt roaming tables to support multiple-domain access. However, building roaming tables for all domains is costly, maintaining them can become an issue, and when roaming occurs, the party responsible for multi-domain data management has to be clearly identified. This thesis therefore presents a roaming adjustment mechanism for both subject and object servers; by defining such a dual-server control model and request process flow, the responsibility for data administration can be properly managed. The second requirement is consideration of the access purpose, namely why the subject requests access to the object and how the subject is going to use it. Existing solutions overlook the different interpretations of purposes in distinct domains. This thesis proposes a privilege-oriented, purpose-based method that enhances the privacy-aware access control model above: one component interprets the subject's intention and the conditions the servers impose on operations, and another caters for object types and the object owner's intention. The third requirement is maintaining data utility while protecting privacy when data are shared without restriction on recipients. Most existing approaches achieve a high level of privacy at the expense of data usability; to the best of our knowledge, no solution preserves both. This thesis combines data privacy protection with data utility in a framework that defines a privacy protection process flow, together with two data privacy protection algorithms based on Chebyshev polynomials and fractal sequences, respectively. Experiments show that both algorithms resist two main data privacy attacks with little loss of accuracy.
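    The abstract names Chebyshev polynomials as the basis of one perturbation algorithm without giving details. As a rough, hedged sketch of the general idea rather than the thesis's actual algorithm, values can be scaled into [-1, 1] and passed through T_n(x) = cos(n·arccos x); the function name and the degree parameter below are illustrative assumptions:

```python
import numpy as np

def chebyshev_perturb(values, degree=5):
    """Illustrative perturbation via the Chebyshev polynomial T_n(x) = cos(n*arccos(x)).

    Values are min-max scaled into [-1, 1], where T_n is defined. The degree acts
    as a simple secret parameter: T_n is many-to-one for n > 1, so inverting the
    published values back to the originals is ambiguous without it.
    """
    v = np.asarray(values, dtype=float)
    lo, hi = v.min(), v.max()
    x = 2 * (v - lo) / (hi - lo) - 1          # scale into [-1, 1]
    return np.cos(degree * np.arccos(x))      # T_degree(x)

original = np.array([12.0, 47.5, 33.1, 80.2, 55.9])
print(chebyshev_perturb(original, degree=5))
```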

    Privacy Preservation Intrusion Detection Technique for SCADA Systems

    Supervisory Control and Data Acquisition (SCADA) systems lack a protection technique that can counter different types of intrusions and keep data from disclosure while that data is handled by other applications, specifically an Intrusion Detection System (IDS). SCADA systems manage the critical infrastructure of industrial control environments, and protecting sensitive information is difficult to achieve in practice when physical and digital systems are connected. Privacy preservation techniques have therefore become an effective means of protecting sensitive or private information and detecting malicious activities, but they remain inaccurate in terms of detection error and the percentage of data disclosure. In this paper, we propose a new Privacy Preservation Intrusion Detection (PPID) technique based on the correlation coefficient and Expectation Maximisation (EM) clustering mechanisms for selecting important portions of data and recognizing intrusive events. The technique is evaluated on power system datasets with multiclass attacks to measure its reliability in detecting suspicious activities. In the experiments it outperforms three existing techniques on the above measures, showing the efficiency and effectiveness of the proposed technique for current SCADA systems.
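    The paper's exact PPID procedure is not given in the abstract; the sketch below only illustrates the two named ingredients in a generic way, combining correlation-coefficient feature selection with scikit-learn's GaussianMixture, which is fitted via Expectation Maximisation. The synthetic data and all names are assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture  # fitted internally via EM

def select_by_correlation(X, y, k=8):
    """Keep the k features most correlated (in absolute value) with the label."""
    corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    idx = np.argsort(corr)[::-1][:k]
    return X[:, idx], idx

# Synthetic stand-in for SCADA/power-system measurements.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 3] + X[:, 7] > 1).astype(int)   # pretend two features drive the attacks

X_sel, kept = select_by_correlation(X, y, k=8)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X_sel)
clusters = gmm.predict(X_sel)             # candidate normal/intrusive partitions
print(kept, np.bincount(clusters))
```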

    The Interplay Between Privacy and Fairness in Learning and Decision Making Problems

    The availability of large datasets and computational resources has driven significant progress in Artificial Intelligence (AI) and, especially, Machine Learning (ML). These advances have rendered AI systems instrumental in many decision making and policy operations involving individuals: they include assistance in legal decisions, lending, and hiring, as well as determinations of resources and benefits, all of which have profound social and economic impacts. While data-driven systems have been successful in an increasing number of tasks, the use of rich datasets, combined with the adoption of black-box algorithms, has sparked concerns about how these systems operate. How much information these systems leak about the individuals whose data they use as input, and how they handle biases and fairness issues, are two such critical concerns. While some argue that privacy and fairness are in alignment, the majority instead believe they are contrasting objectives. This thesis first studies the interaction between privacy and fairness in machine learning and decision problems. It focuses on the scenario in which fairness and privacy are at odds and investigates the factors that can explain this behavior. It then proposes effective and efficient mitigation solutions to improve fairness under privacy constraints. In the second part, it analyzes the connection between fairness and other machine learning concepts such as model compression and adversarial robustness. Finally, it introduces a novel privacy concept and an initial implementation to protect users' privacy at inference time.
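    One common illustration of the tension studied here, not taken from the thesis itself, is that differentially private noise of a fixed scale distorts statistics of small groups proportionally more than those of large ones, so privacy noise can degrade accuracy unevenly across groups. A minimal sketch with the Laplace mechanism, all numbers assumed:

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 0.5                      # privacy budget; smaller = more private, noisier
groups = {"majority": 10000, "minority": 200}

for name, n in groups.items():
    positives = int(0.3 * n)                         # true positive count in the group
    noisy = positives + rng.laplace(scale=1 / eps)   # Laplace mechanism, sensitivity 1
    print(f"{name}: true rate {positives / n:.4f}, private estimate {noisy / n:.4f}")
```

    The same absolute noise translates into a much larger relative error for the minority group, which is one driver of the privacy/fairness trade-off.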

    Artificial Intelligence Security (์ธ๊ณต์ง€๋Šฅ ๋ณด์•ˆ)

    Doctoral dissertation, Seoul National University Graduate School, Interdisciplinary Program in Bioinformatics, College of Natural Sciences, February 2021. Advisor: Sungroh Yoon. With the development of machine learning (ML), expectations for artificial intelligence (AI) technologies have increased daily. In particular, deep neural networks have demonstrated outstanding performance in many fields. However, malicious external influences can cause a deep-learning (DL) model to mispredict or misclassify, with serious consequences. This dissertation discusses DL security and privacy issues and proposes methodologies for security and privacy attacks. First, we review security attacks and defenses from two aspects: evasion attacks use adversarial examples to disrupt the classification process, and poisoning attacks corrupt training by tampering with the training data. Next, we review attacks on privacy that can exploit exposed training data, and defenses including differential privacy and encryption. For adversarial DL, we study the problem of finding adversarial examples against ML-based portable document format (PDF) malware classifiers. We believe this problem is more challenging than attacking ML models for image processing, owing to the highly complex data structure of PDFs compared with traditional image datasets, and the requirement that an infected PDF exhibit malicious behavior without being detected. We propose an attack using generative adversarial networks that effectively generates evasive PDFs using a variational autoencoder robust against adversarial examples. For privacy in DL, we study the problem of preventing sensitive data from being misused and propose a privacy-preserving framework for deep neural networks. Our methods are based on generative models that preserve the privacy of sensitive data while maintaining high prediction performance. Finally, we study the security aspect in biological domains: detecting maliciousness in deoxyribonucleic acid (DNA) sequences and watermarking to protect intellectual property. In summary, the proposed DL models for security and privacy embrace a diversity of research by attempting actual attacks and defenses in various fields. Using AI models requires collecting individual data, yet if an individual's sensitive data is leaked, privacy can be violated. Work that keeps collected data from leaking externally, or that applies security techniques such as anonymization and encryption to AI models, can be classified as Private AI. Threats to the AI model itself, such as the loss of intellectual property when a model is exposed, or the malfunctioning of an AI system induced by malicious training data, can be classified as Secure AI. This dissertation demonstrates weaknesses of neural networks through attacks on training data. Whereas most existing work on adversarial examples targets images, we extend the setting to the more complex, heterogeneous structure of PDF data and propose a generative model that produces attack samples. We then propose a DNA steganalysis defense model that detects samples exhibiting anomalous patterns. Finally, we propose generative-model-based anonymization techniques to protect personal information. To summarize, this dissertation proposes a series of methodologies based on machine learning algorithms that address attack and defense using AI models and resolve the privacy issues arising from the use of neural networks.
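    The dissertation's GAN/VAE-based PDF attack cannot be reproduced from this abstract alone; as a generic illustration of the evasion attacks it surveys, the sketch below applies one-step FGSM to a toy logistic-regression "malware score". All names and values are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fgsm(x, y, w, b, eps=0.1):
    """One-step FGSM: perturb x along the sign of the loss gradient.

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to the input is (p - y) * w, so this step increases
    the loss and pushes the score away from the true label y.
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(2)
w, b = rng.normal(size=5), 0.0
x, y = rng.normal(size=5), 1.0      # a sample the classifier should call "malicious"
x_adv = fgsm(x, y, w, b, eps=0.3)
print(sigmoid(x @ w + b), sigmoid(x_adv @ w + b))   # score drops after the attack
```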

    SoK: Chasing Accuracy and Privacy, and Catching Both in Differentially Private Histogram Publication

    Histograms and synthetic data are of key importance in data analysis. However, researchers have shown that even aggregated data such as histograms, containing no obvious sensitive attributes, can result in privacy leakage. To enable data analysis, a strong notion of privacy is required to avoid risking unintended privacy violations. Such a notion is differential privacy, a statistical definition of privacy that makes privacy leakage quantifiable. The caveat of differential privacy is that, while it gives strong privacy guarantees, the privacy comes at a cost in accuracy. Although this trade-off is a central issue in the adoption of differential privacy, the literature offers little understanding of the trade-off and of how to address it appropriately. Through a systematic literature review (SLR), we investigate the state of the art in accuracy-improving differentially private algorithms for histogram and synthetic data publishing. Our contribution is two-fold: 1) we identify trends and connections in the contributions to the field of differential privacy for histograms and synthetic data, and 2) we provide an understanding of the privacy/accuracy trade-off by crystallizing the different dimensions of accuracy improvement. Accordingly, we position and visualize the ideas in relation to each other and to external work, and deconstruct each algorithm to examine its building blocks separately, aiming to pinpoint which dimension of accuracy improvement each technique or approach targets. Hence, this systematization of knowledge (SoK) provides an understanding of the dimensions in which, and the ways in which, accuracy improvement can be pursued without sacrificing privacy.
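    As a baseline for the accuracy/privacy trade-off the survey systematizes (not any specific algorithm it reviews), a histogram can be released under differential privacy with the Laplace mechanism. A minimal sketch, with the data and parameters assumed:

```python
import numpy as np

def dp_histogram(data, bins, eps=1.0, rng=None):
    """Release a histogram under eps-differential privacy via the Laplace mechanism.

    A histogram of counts has L1 sensitivity 1 (one person changes one bin
    by one), so adding Laplace(1/eps) noise per bin satisfies eps-DP.
    """
    rng = rng or np.random.default_rng()
    counts, edges = np.histogram(data, bins=bins)
    noisy = counts + rng.laplace(scale=1 / eps, size=counts.shape)
    # Rounding/clipping is post-processing, which does not weaken eps-DP,
    # but it already illustrates the accuracy cost: noise grows as eps shrinks.
    return np.clip(np.round(noisy), 0, None), edges

ages = np.random.default_rng(3).integers(18, 90, size=5000)
noisy_counts, edges = dp_histogram(ages, bins=8, eps=0.5)
print(noisy_counts)
```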

    Protection of data privacy based on artificial intelligence in Cyber-Physical Systems

    With the rapid evolution of cyber attack techniques, the security and privacy of Cyber-Physical Systems (CPSs) have become key challenges. CPS environments have several properties that make them unique in efforts to secure them appropriately, compared with the processes and techniques that have evolved for traditional IT networks and platforms. CPS ecosystems are comprised of heterogeneous systems, each with a long lifespan. They use multitudes of operating systems and communication protocols and are often designed without security as a consideration. From a privacy perspective, there are additional challenges. It is hard to capture and filter the heterogeneous data sources of CPSs, especially power systems, as their data include both network traffic and sensor readings. Protecting such data during collection, analysis, and publication still leaves open the possibility of new cyber threats disrupting the operational loops of power systems. Moreover, while the original data of CPSs are being protected, identifying cyber attacks requires intrusion detection, which tends to produce high false alarm rates. This thesis contributes to the protection of heterogeneous data sources and to the high-performance discovery of cyber attacks in CPSs, especially smart power networks (i.e., power systems and their networks). To achieve high data privacy, innovative privacy-preserving techniques based on Artificial Intelligence (AI) are proposed to protect the original and sensitive data generated by CPSs and their networks. For cyber-attack discovery, new anomaly detection algorithms are developed that, while privacy-preserving techniques are applied, ensure high performance in terms of data utility and detection accuracy. The first main contribution of this dissertation is a privacy preservation intrusion detection methodology that uses the correlation coefficient, independent component analysis, and Expectation Maximisation (EM) clustering algorithms to select significant data portions and discover cyber attacks against power networks. Before and after applying this technique, machine learning algorithms are used to assess their capability to classify normal and suspicious vectors. The second core contribution is a new privacy-preserving anomaly detection technique that protects the confidential information of CPSs and discovers malicious observations. First, a data pre-processing technique filters and transforms the data into a new format that preserves privacy. Second, an anomaly detection technique is employed that uses a Gaussian mixture model fitted to selected features and a Kalman filter that accurately computes the posterior probabilities of legitimate and anomalous events. The third significant contribution is a novel privacy-preserving framework for achieving the privacy and security criteria of smart power networks. Its first module is a two-level privacy module, comprising a blockchain based on an enhanced proof-of-work technique for ensuring data integrity and a variational autoencoder that transforms the data into an encoded format to prevent inference attacks. In the second module, a long short-term memory deep learning algorithm is employed for anomaly detection, trained and validated on the outputs of the two-level privacy module.
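    The Kalman-filter stage of the second contribution is beyond a short sketch, but the Gaussian-mixture part can be illustrated generically: fit a mixture to normal observations and flag low-likelihood samples as anomalous. The synthetic data and the 1% threshold below are assumptions, not the thesis's settings:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
normal = rng.normal(0, 1, size=(2000, 4))    # stand-in for benign sensor features
attacks = rng.normal(4, 1, size=(40, 4))     # shifted, anomalous observations

# Fit the mixture to normal traffic only; score_samples gives per-sample
# log-likelihood under the fitted model.
gmm = GaussianMixture(n_components=3, random_state=0).fit(normal)
threshold = np.quantile(gmm.score_samples(normal), 0.01)   # 1% lowest densities

test = np.vstack([normal[:100], attacks])
flags = gmm.score_samples(test) < threshold   # True = flagged as anomalous
print(flags[:100].mean(), flags[100:].mean()) # false-alarm vs. detection rate
```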

    Effective and Secure Healthcare Machine Learning System with Explanations Based on High Quality Crowdsourcing Data

    Affordable cloud computing technologies allow users to efficiently outsource, store, and manage their Personal Health Records (PHRs) and share them with their caregivers or physicians. With this exponential growth of stored large-scale clinical data and the growing need for personalized care, researchers are keen on developing data mining methodologies to learn efficient hidden patterns in such data. While studies have shown that these advances can significantly improve the performance of various healthcare applications for clinical decision making and personalized medicine, the collected medical datasets are highly ambiguous and noisy. It is therefore essential to develop better tools for disease progression and survival rate prediction, where the dataset must be cleaned before it is used and useful feature selection techniques must be employed before prediction models can be constructed. In addition, predictions without explanations prevent medical personnel and patients from adopting such healthcare deep learning models, so any prediction model must come with explanations. Finally, despite the efficiency of machine learning systems and their outstanding prediction performance, reusing pre-trained models remains risky, since most machine learning modules contributed and maintained by third parties lack proper checks that they are robust to adversarial attacks; mechanisms for detecting such attacks are needed. In this thesis, we address all of the above issues: (i) Privacy Preserving Disease Treatment & Complication Prediction System (PDTCPS): a privacy-preserving disease treatment and complication prediction scheme is proposed, which allows authorized users to conduct searches for disease diagnosis, personalized treatments, and prediction of potential complications. (ii) Incentivizing High Quality Crowdsourcing Data for Disease Prediction: a new incentive model with individual rationality and platform profitability features is developed to encourage different hospitals to share high quality data so that better prediction models can be constructed; we also explore how data cleaning and feature selection techniques affect the performance of the prediction models. (iii) Explainable Deep Learning Based Medical Diagnostic System: a deep learning based medical diagnosis system (DL-MDS) is presented which integrates heterogeneous medical data sources to produce better disease diagnoses, with explanations, for authorized users who submit personalized health-related queries. (iv) Attacks on RNN-based Healthcare Learning Systems and Their Detection & Defense Mechanisms: potential attacks on Recurrent Neural Network (RNN) based ML systems are identified, and low-cost detection and defense schemes are designed to prevent such adversarial attacks. Finally, we conduct extensive experiments using both synthetic and real-world datasets to validate the feasibility and practicality of our proposed systems.
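    A minimal sketch of the data-cleaning and feature-selection step discussed in (ii), built from standard scikit-learn components rather than the thesis's own pipeline; the synthetic dataset and parameter choices are assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Noisy clinical stand-in: 30 features, few informative, with missing values.
X, y = make_classification(n_samples=600, n_features=30, n_informative=5,
                           random_state=0)
X[np.random.default_rng(5).random(X.shape) < 0.05] = np.nan  # 5% missing entries

model = make_pipeline(
    SimpleImputer(strategy="median"),   # "cleaning": fill missing measurements
    SelectKBest(f_classif, k=5),        # keep features most predictive of outcome
    LogisticRegression(max_iter=1000),
)
print(cross_val_score(model, X, y, cv=5).mean())   # accuracy after clean+select
```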
    • โ€ฆ
    corecore