761 research outputs found

    Enhancing Accuracy-Privacy Trade-off in Differentially Private Split Learning

    Split learning (SL) aims to protect user data privacy by distributing deep models between client and server and keeping private data local. Only processed or 'smashed' data can be transmitted from the clients to the server during the SL process. However, recently proposed model inversion attacks can recover the original data from the smashed data. To enhance privacy protection against such attacks, one strategy is to adopt differential privacy (DP), which safeguards the smashed data at the expense of some accuracy loss. This paper presents the first investigation into the impact on accuracy when training multiple clients in SL with various privacy requirements. Subsequently, we propose an approach that reviews the DP noise distributions of other clients during client training to address the identified accuracy degradation. We also examine the application of DP to the local model of SL to gain insights into the trade-off between accuracy and privacy. Specifically, findings reveal that introducing noise in the later local layers offers the most favorable balance between accuracy and privacy. Drawing from our insights in the shallower layers, we propose an approach to reduce the size of smashed data to minimize data leakage while maintaining higher accuracy, optimizing the accuracy-privacy trade-off. Additionally, a smaller size of smashed data reduces communication overhead on the client side, mitigating one of the notable drawbacks of SL. Experiments with popular datasets demonstrate that our proposed approaches provide an optimal trade-off for incorporating DP into SL, ultimately enhancing training accuracy for multi-client SL with varying privacy requirements.
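
    As a rough illustration of the DP step this abstract describes, the sketch below clips a smashed-data vector and adds Laplace noise before it would leave the client. The function names, the L1 clipping bound, and the epsilon values are assumptions made for illustration; the abstract does not specify the paper's actual mechanism or noise distribution.

```python
import random


def clip_l1(vec, bound):
    """Scale vec down so that its L1 norm is at most `bound`."""
    norm = sum(abs(v) for v in vec)
    if norm <= bound:
        return list(vec)
    return [v * bound / norm for v in vec]


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_smashed(vec, epsilon, bound=1.0, rng=None):
    """Hypothetical helper: Laplace mechanism on a clipped smashed vector.

    After clipping, any two vectors differ by at most 2 * bound in L1 norm,
    so the noise scale is 2 * bound / epsilon.
    """
    if rng is None:
        rng = random.Random()
    clipped = clip_l1(vec, bound)
    scale = 2.0 * bound / epsilon
    return [v + laplace_noise(scale, rng) for v in clipped]


if __name__ == "__main__":
    smashed = [0.4, -1.3, 2.2]  # made-up activations from a client's cut layer
    for eps in (0.5, 5.0):
        print(eps, privatize_smashed(smashed, eps, rng=random.Random(0)))
```

    A smaller epsilon gives a larger noise scale, which is the accuracy cost the abstract refers to.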

    Performance Analysis Of Data-Driven Algorithms In Detecting Intrusions On Smart Grid

    The traditional power grid is no longer a practical solution for power delivery due to several shortcomings, including chronic blackouts, energy storage issues, high cost of assets, and high carbon emissions. Therefore, there is a serious need for better, cheaper, and cleaner power grid technology that addresses the limitations of traditional power grids. A smart grid is a holistic solution to these issues that consists of a variety of operations and energy measures. This technology can deliver energy to end-users through a two-way flow of communication. It is expected to generate reliable, efficient, and clean power by integrating multiple technologies. It promises reliability, improved functionality, and economical means of power transmission and distribution. This technology also decreases greenhouse emissions by transferring clean, affordable, and efficient energy to users. Smart grid provides several benefits, such as increasing grid resilience, self-healing, and improving system performance. Despite these benefits, this network has been the target of a number of cyber-attacks that violate the availability, integrity, confidentiality, and accountability of the network. For instance, in 2021, a cyber-attack targeted a U.S. power system that shut down the power grid, leaving approximately 100,000 people without power. Another threat on U.S. Smart Grids happened in March 2018 which targeted multiple nuclear power plants and water equipment. These instances represent the obvious reasons why a high level of security approaches is needed in Smart Grids to detect and mitigate sophisticated cyber-attacks. For this purpose, the US National Electric Sector Cybersecurity Organization and the Department of Energy have joined their efforts with other federal agencies, including the Cybersecurity for Energy Delivery Systems and the Federal Energy Regulatory Commission, to investigate the security risks of smart grid networks. 
    Their investigation shows that the smart grid requires reliable solutions to defend against and prevent cyber-attacks and vulnerability issues. It also shows that with emerging technologies, including 5G and 6G, the smart grid may become more vulnerable to multistage cyber-attacks. A number of studies have been done to identify, detect, and investigate the vulnerabilities of smart grid networks. However, the existing techniques have fundamental limitations, such as low detection rates, high rates of false positives, high rates of misdetection, data poisoning, data quality and processing, lack of scalability, and issues regarding handling huge volumes of data. Consequently, these techniques cannot ensure safe, efficient, and dependable communication for smart grid networks. The goal of this dissertation is therefore to investigate the efficiency of machine learning in detecting cyber-attacks on smart grids. The proposed methods are based on supervised and unsupervised machine and deep learning, reinforcement learning, and online learning models. These models have to be trained, tested, and validated using a reliable dataset. In this dissertation, CICDDoS 2019 was used to train, test, and validate the efficiency of the proposed models. The results show that, for supervised machine learning models, the ensemble models outperform other traditional models. Among the deep learning models, the dense neural network family provides satisfactory results for detecting and classifying intrusions on the smart grid. Among unsupervised models, the variational auto-encoder provides the highest performance. In reinforcement learning, the proposed Capsule Q-learning provides higher detection and lower misdetection rates compared to the other model in the literature. In online learning, the Online Sequential Euclidean Distance Routing Capsule Network model provides significantly better results in detecting intrusion attacks on the smart grid than the other deep online models.
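
    The detection, false-positive, and misdetection rates mentioned in this abstract can be computed from binary predictions roughly as follows. This is a minimal sketch: the function name is hypothetical, labels are assumed to be 1 for attack and 0 for benign, and "misdetection rate" is assumed here to mean the fraction of attacks that go undetected, which the abstract does not define.

```python
def detection_metrics(y_true, y_pred):
    """Return detection, false-positive, and misdetection rates.

    Assumes labels are 1 (attack) and 0 (benign).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed

    attacks = tp + fn
    benign = fp + tn
    return {
        "detection_rate": tp / attacks if attacks else 0.0,
        "false_positive_rate": fp / benign if benign else 0.0,
        "misdetection_rate": fn / attacks if attacks else 0.0,
    }


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0, 0, 1]      # made-up ground truth
    predicted = [1, 0, 1, 0, 1, 0, 0, 1]  # made-up classifier output
    print(detection_metrics(truth, predicted))
```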

    Towards Data Privacy and Utility in the Applications of Graph Neural Networks

    Graph Neural Networks (GNNs) are essential for handling graph-structured data, often containing sensitive information. It’s vital to maintain a balance between data privacy and usability. To address this, this dissertation introduces three studies aimed at enhancing privacy and utility in GNN applications, particularly in node classification, link prediction, and graph classification. The first work tackles celebrity privacy in social networks. We develop a novel framework using adversarial learning for link-privacy preserved graph embedding, which effectively safeguards sensitive links without compromising the graph’s structure and node attributes. This approach is validated using real social network data. In the second work, we confront challenges in federated graph learning with non-independent and identically distributed (non-IID) data. We introduce PPFL-GNN, a privacy-preserving federated graph neural network framework that mitigates overfitting on the client side and inefficient aggregation on the server side. It leverages local graph data for embeddings and employs embedding alignment techniques for enhanced privacy, addressing the hurdles in federated learning on non-IID graph data. The third work explores Few-Shot graph classification, which aims to classify novel graph types with limited labeled data. We propose a unique framework combining Meta-learning and contrastive learning to better utilize graph structures in molecular and social network datasets. Additionally, we offer benchmark graph datasets with extensive node-attribute dimensions for future research. These studies collectively advance the field of graph-based machine learning by addressing critical issues of data privacy and utility in GNN applications.

    HashVFL: Defending Against Data Reconstruction Attacks in Vertical Federated Learning

    Vertical Federated Learning (VFL) is a trending collaborative machine learning model training solution. Existing industrial frameworks employ secure multi-party computation techniques such as homomorphic encryption to ensure data security and privacy. Despite these efforts, studies have revealed that data leakage remains a risk in VFL due to the correlations between intermediate representations and raw data. Neural networks can accurately capture these correlations, allowing an adversary to reconstruct the data. This emphasizes the need for continued research into securing VFL systems. Our work shows that hashing is a promising solution to counter data reconstruction attacks. The one-way nature of hashing makes it difficult for an adversary to recover data from hash codes. However, implementing hashing in VFL presents new challenges, including vanishing gradients and information loss. To address these issues, we propose HashVFL, which integrates hashing and simultaneously achieves learnability, bit balance, and consistency. Experimental results indicate that HashVFL effectively maintains task performance while defending against data reconstruction attacks. It also brings additional benefits in reducing the degree of label leakage, mitigating adversarial attacks, and detecting abnormal inputs. We hope our work will inspire further research into the potential applications of HashVFL.

    Information Leakage Attacks and Countermeasures

    The scientific community has been consistently working on the pervasive problem of information leakage, uncovering numerous attack vectors, and proposing various countermeasures. Despite these efforts, leakage incidents remain prevalent, as the complexity of systems and protocols increases, and sophisticated modeling methods become more accessible to adversaries. This work studies how information leakages manifest in and impact interconnected systems and their users. We first focus on online communications and investigate leakages in the Transport Layer Security (TLS) protocol. Using modern machine learning models, we show that an eavesdropping adversary can efficiently exploit meta-information (e.g., packet size) not protected by TLS encryption to launch fingerprinting attacks at an unprecedented scale even under non-optimal conditions. We then turn our attention to ultrasonic communications, and discuss their security shortcomings and how adversaries could exploit them to compromise anonymity network users (even though they aim to offer a greater level of privacy compared to TLS). Following up on these, we delve into physical layer leakages that concern a wide array of (networked) systems such as servers, embedded nodes, Tor relays, and hardware cryptocurrency wallets. We revisit location-based side-channel attacks and develop an exploitation neural network. Our model demonstrates the capabilities of a modern adversary but also presents an inexpensive tool to be used by auditors for detecting such leakages early on during the development cycle. Subsequently, we investigate techniques that further minimize the impact of leakages found in production components. Our proposed system design distributes both the custody of secrets and the cryptographic operation execution across several components, thus making the exploitation of leaks difficult.

    Privacy-preserving machine learning system at the edge

    Data privacy in machine learning has become an urgent problem to be solved, along with machine learning's rapid development and the large attack surface being explored. Pre-trained deep neural networks are increasingly deployed in smartphones and other edge devices for a variety of applications, leading to potential disclosures of private information. In collaborative learning, participants keep private data locally and communicate deep neural networks updated on their local data, but still, the private information encoded in the networks' gradients can be explored by adversaries. This dissertation aims to perform dedicated investigations on privacy leakage from neural networks and to propose privacy-preserving machine learning systems for edge devices. Firstly, the systematization of knowledge is conducted to identify the key challenges and existing/adaptable solutions. Then a framework is proposed to measure the amount of sensitive information memorized in each layer's weights of a neural network based on the generalization error. Results show that, when considered individually, the last layers encode a larger amount of information from the training data compared to the first layers. To protect such sensitive information in weights, DarkneTZ is proposed as a framework that uses an edge device's Trusted Execution Environment (TEE) in conjunction with model partitioning to limit the attack surface against neural networks. The performance of DarkneTZ is evaluated, including CPU execution time, memory usage, and accurate power consumption, using two small and six large image classification models. Due to the limited memory of the edge device's TEE, model layers are partitioned into more sensitive layers (to be executed inside the device TEE), and a set of layers to be executed in the untrusted part of the operating system. 
    Results show that even if a single layer is hidden, one can provide reliable model privacy and defend against state-of-the-art membership inference attacks, with only a 3% performance overhead. This thesis further extends the investigations from neural network weights (in on-device machine learning deployment) to gradients (in collaborative learning). An information-theoretical framework is proposed, by adapting usable information theory and considering the attack outcome as a probability measure, to quantify private information leakage from network gradients. The private original information and latent information are localized in a layer-wise manner. After that, this work performs sensitivity analysis over the gradients with respect to private information to further explore the underlying cause of information leakage. Numerical evaluations are conducted on six benchmark datasets and four well-known networks, and further measure the impact of training hyper-parameters and defense mechanisms. Last but not least, to limit the privacy leakages in gradients, I propose and implement a Privacy-preserving Federated Learning (PPFL) framework for mobile systems. TEEs are utilized on clients for local training, and on servers for secure aggregation, so that model/gradient updates are hidden from adversaries. This work leverages greedy layer-wise training to train each model's layer inside the trusted area until its convergence. The performance evaluation of the implementation shows that PPFL significantly improves privacy by defending against data reconstruction, property inference, and membership inference attacks while incurring small communication overhead and client-side system overheads. This thesis offers a better understanding of the sources of private information in machine learning and provides frameworks to fully guarantee privacy and achieve ML model utility and system overhead comparable to regular machine learning frameworks. Open Access.

    Adversarial Deep Learning and Security with a Hardware Perspective

    Adversarial deep learning is the field of study which analyzes deep learning in the presence of adversarial entities. This entails understanding the capabilities, objectives, and attack scenarios available to the adversary to develop defensive mechanisms and avenues of robustness available to the benign parties. Understanding this facet of deep learning helps us improve the safety of the deep learning systems against external threats from adversaries. However, of equal importance, this perspective also helps the industry understand and respond to critical failures in the technology. The expectation of future success has driven significant interest in developing this technology broadly. Adversarial deep learning stands as a balancing force to ensure these developments remain grounded in the real world and proceed along a responsible trajectory. Recently, the growth of deep learning has begun intersecting with the computer hardware domain to improve performance and efficiency for resource-constrained application domains. The works investigated in this dissertation constitute our pioneering efforts in migrating adversarial deep learning into the hardware domain alongside its parent field of research.

    Recoverable Privacy-Preserving Image Classification through Noise-like Adversarial Examples

    With the increasing prevalence of cloud computing platforms, ensuring data privacy during cloud-based image-related services such as classification has become crucial. In this study, we propose a novel privacy-preserving image classification scheme that enables the direct application of classifiers trained in the plaintext domain to classify encrypted images, without the need to retrain a dedicated classifier. Moreover, encrypted images can be decrypted back into their original form with high fidelity (recoverable) using a secret key. Specifically, our proposed scheme involves utilizing a feature extractor and an encoder to mask the plaintext image through a newly designed Noise-like Adversarial Example (NAE). Such an NAE not only introduces a noise-like visual appearance to the encrypted image but also compels the target classifier to predict the ciphertext as the same label as the original plaintext image. At the decoding phase, we adopt a Symmetric Residual Learning (SRL) framework for restoring the plaintext image with minimal degradation. Extensive experiments demonstrate that 1) the classification accuracy of the classifier trained in the plaintext domain remains the same in both the ciphertext and plaintext domains; 2) the encrypted images can be recovered into their original form with an average PSNR of up to 51+ dB for the SVHN dataset and 48+ dB for the VGGFace2 dataset; 3) our system exhibits satisfactory generalization capability on the encryption, decryption and classification tasks across datasets that are different from the training one; and 4) a high level of security is achieved against three potential threat models. The code is available at https://github.com/csjunjun/RIC.git. Comment: 23 pages, 9 figures.
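
    For readers unfamiliar with the PSNR figures quoted in this abstract, the standard definition is 10 * log10(MAX^2 / MSE). The sketch below computes it over flat lists of pixel values; the function name and the 8-bit peak value of 255 are assumptions for illustration, not taken from the paper's code.

```python
import math


def psnr(original, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(original) != len(restored) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    plain = [10, 20, 30, 40]      # made-up pixel values
    recovered = [11, 19, 31, 39]  # every pixel off by one gray level
    print(round(psnr(plain, recovered), 2))
```

    Under these assumptions, an average error of one gray level per pixel (MSE of 1) corresponds to roughly 48.13 dB, which gives a sense of scale for the reported values.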

    A Modified LeNet CNN for Breast Cancer Diagnosis in Ultrasound Images

    Convolutional neural networks (CNNs) have been extensively utilized in medical image processing to automatically extract meaningful features and classify various medical conditions, enabling faster and more accurate diagnoses. In this paper, LeNet, a classic CNN architecture, has been successfully applied to breast cancer data analysis. It demonstrates its ability to extract discriminative features and classify malignant and benign tumors with high accuracy, thereby supporting early detection and diagnosis of breast cancer. LeNet with corrected Rectified Linear Unit (ReLU), a modification of the traditional ReLU activation function, has been found to improve the performance of LeNet in breast cancer data analysis tasks by addressing the “dying ReLU” problem and enhancing the discriminative power of the extracted features. This has led to more accurate, reliable breast cancer detection and diagnosis and improved patient outcomes. Batch normalization improves the performance and training stability of small and shallow CNN architectures like LeNet. It helps to mitigate the effects of internal covariate shift, which refers to the change in the distribution of network activations during training. This classifier will lessen the overfitting problem and reduce the running time. The designed classifier is evaluated against the benchmarking deep learning models, showing that it produces a higher recognition rate. The accuracy of the breast image recognition rate is 89.91%. This model will achieve better performance in segmentation, feature extraction, classification, and breast cancer tumor detection.
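
    The batch normalization step mentioned in this abstract can be sketched for a single feature as follows. This is a simplified, training-time illustration over a list of scalar activations; the function name and the default `gamma`, `beta`, and `eps` values are assumptions, and the paper's modified LeNet is not reproduced here.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased batch variance
    # eps keeps the division stable when the variance is close to zero.
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


if __name__ == "__main__":
    activations = [1.0, 2.0, 3.0, 4.0]  # made-up activations for one feature
    print(batch_norm(activations))
```

    The output has approximately zero mean and unit variance regardless of how the input distribution drifts during training, which is the stabilizing effect the abstract attributes to batch normalization.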