64 research outputs found

    Dynamic adversarial mining - effectively applying machine learning in adversarial non-stationary environments.

    Get PDF
    While the understanding of machine learning and data mining is still in its early stages, their engineering applications have found immense acceptance and success. Cybersecurity applications such as intrusion detection systems, spam filtering, and CAPTCHA authentication have all begun adopting machine learning as a viable technique to deal with large-scale adversarial activity. However, the naive usage of machine learning in an adversarial setting is prone to reverse engineering and evasion attacks, as most of these techniques were designed primarily for a static setting. The security domain is a dynamic landscape, with an ongoing, never-ending arms race between the system designer and the attackers. Any solution designed for such a domain needs to take an active adversary into account and evolve over time in the face of emerging threats. We term this the ‘Dynamic Adversarial Mining’ problem, and the presented work provides the foundation for this new interdisciplinary area of research, at the crossroads of Machine Learning, Cybersecurity, and Streaming Data Mining. We start with a white-hat analysis of the vulnerabilities of classification systems to exploratory attacks. The proposed ‘Seed-Explore-Exploit’ framework provides characterization and modeling of attacks, ranging from simple random evasion attacks to sophisticated reverse engineering. It is observed that even systems with prediction accuracy close to 100% can be easily evaded with more than 90% precision. This evasion can be performed without any information about the underlying classifier, the training dataset, or the domain of application. Attacks on machine learning systems cause the data to exhibit non-stationarity (i.e., the training and the testing data have different distributions). It is necessary to detect these changes in distribution, called concept drift, as they could cause the prediction performance of the model to degrade over time. However, the detection cannot rely heavily on labeled data to compute performance explicitly and monitor a drop, as labeling is expensive and time-consuming, and at times may not be possible at all. As such, we propose the ‘Margin Density Drift Detection (MD3)’ algorithm, which can reliably detect concept drift from unlabeled data only. MD3 provides high detection accuracy with a low false alarm rate, making it suitable for cybersecurity applications, where excessive false alarms are expensive and can lead to loss of trust in the warning system. Additionally, MD3 is designed as a classifier-independent, streaming algorithm for use in a variety of continuous, never-ending learning systems. We then propose a ‘Dynamic Adversarial Mining’-based learning framework for learning in non-stationary and adversarial environments, which provides ‘security by design’. The proposed ‘Predict-Detect’ classifier framework aims to provide robustness against attacks, ease of attack detection using unlabeled data, and swift recovery from attacks. Ideas of feature hiding and obfuscation of feature importance are proposed as strategies to enhance the learning framework's security. Metrics for evaluating the dynamic security of a system and its recoverability after an attack are introduced to provide a practical way of measuring the efficacy of dynamic security strategies. The framework is developed as a streaming data methodology, capable of continually functioning with limited supervision and effectively responding to adversarial dynamics. The developed ideas, methodology, algorithms, and experimental analysis aim to provide a foundation for future work in the area of ‘Dynamic Adversarial Mining’, wherein a holistic approach to machine learning-based security is motivated.
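    The margin-density idea behind MD3 can be conveyed with a short sketch: track the fraction of unlabeled samples that fall inside the classifier's margin and flag drift when that fraction shifts. The linear SVM, the chunk size, and the sensitivity threshold theta below are illustrative assumptions, not the algorithm's published settings.

```python
# Illustrative margin-density drift check (not the authors' exact MD3 algorithm).
import numpy as np
from sklearn.svm import LinearSVC

def margin_density(clf, X, margin_width=1.0):
    """Fraction of unlabeled samples that fall inside the classifier's margin."""
    scores = clf.decision_function(X)
    return float(np.mean(np.abs(scores) <= margin_width))

# Train on an initial labeled batch and record the reference margin density.
rng = np.random.default_rng(0)
X_ref = rng.normal(size=(500, 5))
y_ref = (X_ref[:, 0] > 0).astype(int)
clf = LinearSVC(dual=False).fit(X_ref, y_ref)
md_ref = margin_density(clf, X_ref)

# Monitor unlabeled stream chunks; flag drift when the density shifts too much.
theta = 0.10  # illustrative sensitivity threshold
X_chunk = rng.normal(loc=0.8, size=(500, 5))  # simulated drifted chunk
if abs(margin_density(clf, X_chunk) - md_ref) > theta:
    print("possible concept drift: request labels and retrain")
```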

    Robust Representation Learning for Privacy-Preserving Machine Learning: A Multi-Objective Autoencoder Approach

    Full text link
    Several domains increasingly rely on machine learning in their applications. The resulting heavy dependence on data has led to the emergence of various laws and regulations around data ethics and privacy, and to growing awareness of the need for privacy-preserving machine learning (ppML). Current ppML techniques utilize methods that are either purely based on cryptography, such as homomorphic encryption, or that introduce noise into the input, such as differential privacy. The main criticism of those techniques is that they are either too slow or they trade off a model's performance for improved confidentiality. To address this performance reduction, we aim to leverage robust representation learning as a way of encoding our data while optimizing the privacy-utility trade-off. Our method centers on training autoencoders in a multi-objective manner and then concatenating the latent and learned features from the encoding part as the encoded form of our data. Such a deep learning-powered encoding can then safely be sent to a third party for intensive training and hyperparameter tuning. With our proposed framework, we can share our data and use third-party tools without being under the threat of revealing its original form. We empirically validate our results in unimodal and multimodal settings, the latter following a vertical splitting scheme, and show improved performance over the state of the art.
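    A minimal sketch of the described encoding step follows, assuming a joint reconstruction-plus-utility objective and an encoder output formed by concatenating hidden and latent features; the architecture, loss weights, and dimensions are illustrative guesses rather than the paper's exact model.

```python
# Rough sketch of a jointly trained (multi-objective) autoencoder encoder;
# layer sizes, the 0.5 loss weight, and the "concatenate hidden + latent"
# step are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class MultiObjectiveAE(nn.Module):
    def __init__(self, d_in, d_hidden=64, d_latent=16, n_classes=2):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.to_latent = nn.Linear(d_hidden, d_latent)
        self.decoder = nn.Sequential(nn.Linear(d_latent, d_hidden), nn.ReLU(),
                                     nn.Linear(d_hidden, d_in))
        self.classifier = nn.Linear(d_latent, n_classes)  # utility objective

    def forward(self, x):
        h = self.hidden(x)
        z = self.to_latent(h)
        return self.decoder(z), self.classifier(z), torch.cat([h, z], dim=1)

model = MultiObjectiveAE(d_in=30)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(128, 30); y = torch.randint(0, 2, (128,))
recon, logits, encoded = model(x)
loss = nn.functional.mse_loss(recon, x) + 0.5 * nn.functional.cross_entropy(logits, y)
opt.zero_grad(); loss.backward(); opt.step()
# `encoded` (hidden + latent features) is what would be shared with a third party.
```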

    A Unified Framework for Analyzing and Detecting Malicious Examples of DNN Models

    Full text link
    Deep Neural Networks are well known to be vulnerable to adversarial attacks and backdoor attacks, where minor modifications of the input can mislead the models into giving wrong results. Although defenses against adversarial attacks have been widely studied, research on mitigating backdoor attacks is still at an early stage. It is unknown whether there are any connections or common characteristics between the defenses against these two attacks. In this paper, we present a unified framework for detecting malicious examples and protecting the inference results of Deep Learning models. This framework is based on our observation that both adversarial examples and backdoor examples exhibit anomalies during the inference process that make them highly distinguishable from benign samples. As a result, we repurpose and revise four existing adversarial defense methods for detecting backdoor examples. Extensive evaluations indicate that these approaches provide reliable protection against backdoor attacks, with higher accuracy than detecting adversarial examples. These solutions also reveal the relations among adversarial examples, backdoor examples, and normal samples in terms of model sensitivity, activation space, and feature space. This can enhance our understanding of the inherent features of these two attacks, as well as of the defense opportunities.
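    To convey the "anomalies during inference" observation, the sketch below scores an input by its Mahalanobis distance to class-conditional feature statistics; this is an illustrative stand-in, not one of the four repurposed defenses, whose exact procedures are not detailed in the abstract.

```python
# Illustrative activation-space outlier check: suspicious inputs tend to sit
# far from every class centroid in feature space. A stand-in example only.
import numpy as np

def fit_class_stats(feats, labels):
    """Per-class means and shared inverse covariance of penultimate-layer features."""
    means = {c: feats[labels == c].mean(axis=0) for c in np.unique(labels)}
    cov = np.cov(feats.T) + 1e-6 * np.eye(feats.shape[1])
    return means, np.linalg.inv(cov)

def anomaly_score(f, means, cov_inv):
    """Minimum Mahalanobis distance to any class centroid; high => suspicious."""
    return min((f - m) @ cov_inv @ (f - m) for m in means.values())

rng = np.random.default_rng(1)
feats = rng.normal(size=(1000, 32))        # stand-in for benign feature vectors
labels = rng.integers(0, 10, 1000)
means, cov_inv = fit_class_stats(feats, labels)
print(anomaly_score(rng.normal(size=32), means, cov_inv))
```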

    Threshold KNN-Shapley: A Linear-Time and Privacy-Friendly Approach to Data Valuation

    Full text link
    Data valuation aims to quantify the usefulness of individual data sources in training machine learning (ML) models, and is a critical aspect of data-centric ML research. However, data valuation faces significant yet frequently overlooked privacy challenges despite its importance. This paper studies these challenges with a focus on KNN-Shapley, one of the most practical data valuation methods today. We first emphasize the inherent privacy risks of KNN-Shapley, and demonstrate the significant technical difficulties in adapting KNN-Shapley to accommodate differential privacy (DP). To overcome these challenges, we introduce TKNN-Shapley, a refined variant of KNN-Shapley that is privacy-friendly, allowing for straightforward modifications to incorporate a DP guarantee (DP-TKNN-Shapley). We show that DP-TKNN-Shapley has several advantages and offers a superior privacy-utility tradeoff compared to naively privatized KNN-Shapley in discerning data quality. Moreover, even non-private TKNN-Shapley achieves performance comparable to KNN-Shapley. Overall, our findings suggest that TKNN-Shapley is a promising alternative to KNN-Shapley, particularly for real-world applications involving sensitive data. (NeurIPS 2023 Spotlight.)
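    The flavor of a threshold-based, linear-time valuation can be sketched as follows: a training point is rewarded when it lies within a fixed radius of a validation query and shares its label. The scoring rule and the radius tau here are toy assumptions; the paper's actual TKNN-Shapley formula and its DP variant differ.

```python
# Toy threshold-neighbor data valuation conveying the flavor of TKNN-Shapley;
# not the paper's formula. Runs in time linear in the training set per query.
import numpy as np

def toy_threshold_values(X_train, y_train, X_val, y_val, tau=1.0):
    values = np.zeros(len(X_train))
    for xq, yq in zip(X_val, y_val):
        dist = np.linalg.norm(X_train - xq, axis=1)
        inside = dist <= tau                          # threshold neighborhood
        if inside.any():
            vote = np.where(y_train == yq, 1.0, -1.0)
            values += inside * vote / inside.sum()    # reward helpful neighbors
    return values / len(X_val)

rng = np.random.default_rng(2)
X_tr = rng.normal(size=(200, 4)); y_tr = (X_tr[:, 0] > 0).astype(int)
X_va = rng.normal(size=(50, 4));  y_va = (X_va[:, 0] > 0).astype(int)
print(toy_threshold_values(X_tr, y_tr, X_va, y_va)[:5])
```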

    Machine learning techniques for identification using mobile and social media data

    Get PDF
    Networked access and mobile devices provide near-constant data generation and collection. Users, environments, and applications each generate different types of data; from the voluntarily provided data posted on social networks to data collected by sensors on mobile devices, it is becoming trivial to access big data caches. Processing sufficiently large amounts of data results in inferences that can be characterized as privacy-invasive. In order to address privacy risks, we must understand the limits of the data, exploring relationships between variables and how the user is reflected in them. In this dissertation we look at data collected from social networks and sensors to identify some aspect of the user or their surroundings. In particular, we find that from social media metadata we can identify individual user accounts, and from magnetic field readings we can identify both the (unique) cellphone device owned by the user and their coarse-grained location. In each project we collect real-world datasets and apply supervised learning techniques, particularly multi-class classification algorithms, to test our hypotheses. We use both leave-one-out and k-fold cross-validation to reduce any bias in the results. Throughout the dissertation we find that unprotected data reveals sensitive information about users. Each chapter also contains a discussion of possible obfuscation techniques or countermeasures and their effectiveness with regard to the conclusions we present. Overall, our results show that deriving information about users is attainable and, with each of these results, users would have limited, if any, indication that any type of analysis was taking place.
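    The evaluation protocol described above (multi-class classification scored with both k-fold and leave-one-out cross-validation) can be sketched as follows; the random forest classifier and the synthetic data are stand-ins for the dissertation's models and datasets.

```python
# Minimal sketch of multi-class classification evaluated with k-fold and
# leave-one-out cross-validation; classifier choice and data are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0)

kfold_acc = cross_val_score(clf, X, y, cv=KFold(n_splits=10, shuffle=True,
                                                random_state=0))
loo_acc = cross_val_score(clf, X, y, cv=LeaveOneOut())  # one fold per sample

print(f"10-fold accuracy: {kfold_acc.mean():.3f}")
print(f"leave-one-out accuracy: {loo_acc.mean():.3f}")
```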

    Towards privacy-aware mobile-based continuous authentication systems

    Get PDF
    User authentication is used to verify the identity of individuals attempting to gain access to a certain system. It traditionally refers to initial authentication using knowledge factors (e.g., passwords) or ownership factors (e.g., smart cards). However, initial authentication cannot protect the computer (or smartphone), if left unattended, after the initial login. Thus, continuous authentication was proposed to complement initial authentication by transparently and continuously testing the users' behavior against the stored profile (machine learning model). Since continuous authentication utilizes users' behavioral data to build machine learning models, certain privacy and security concerns have to be addressed before these systems can be widely deployed. In almost all of the continuous authentication research, non-privacy-preserving classification methods were used (such as SVM or KNN). The motivation of this work is twofold: (1) studying the implications of such an assumption on continuous authentication security and users' privacy, and (2) proposing privacy-aware solutions to address the threats introduced by these assumptions. First, we study and propose reconstruction attacks and model inversion attacks in relation to continuous authentication systems, and we implement solutions that can be effective against our proposed attacks. We conduct this research assuming that a certain cloud service (which relies on continuous authentication) was compromised, and that the adversary is trying to utilize this compromised system to access a user's account on another cloud service. We identify two types of adversaries based on how their knowledge is obtained: (1) a full-profile adversary that has access to the victim's profile, and (2) a decision-value adversary, an active adversary that only has access to the cloud service's mobile app (which is used to obtain a feature vector). Eventually, both adversaries use the user's compromised feature vectors to generate raw data based on our proposed reconstruction methods: a numerical method that is tied to a single attacked system (set of features), and a randomized algorithm that is not restricted to a single set of features. We conducted experiments using a public dataset where we evaluated the attacks performed by our two types of adversaries and two types of reconstruction algorithms, and we have shown that our attacks are feasible. Finally, we analyzed the results and provided recommendations to resist our attacks. Our remedies directly limit the effectiveness of model inversion attacks, thus dealing with decision-value adversaries. Second, we study privacy-enhancing technologies for machine learning that can potentially prevent full-profile adversaries from utilizing the stored profiles to obtain the original feature vectors. We also study the problem of restricting undesired inference on users' private data within the context of continuous authentication. We propose a gesture-based continuous authentication framework that utilizes supervised dimensionality reduction (S-DR) techniques to protect against undesired inference attacks and meets the non-invertibility (security) requirement of cancelable biometrics. These S-DR methods are Discriminant Component Analysis (DCA) and Multiclass Discriminant Ratio (MDR). Using experiments on a public dataset, our results show that DCA and MDR provide better privacy/utility performance than random projection, which has been extensively utilized in cancelable biometrics.
Third, since using DCA (or MDR) requires computing the projection matrix from data distributed across multiple data owners, we propose privacy-preserving PCA/DCA protocols that enable a data user (cloud server) to compute the projection matrices without compromising the privacy of the individual data owners. To achieve this, we propose new protocols for computing the scatter matrices using additive homomorphic encryption and performing the eigendecomposition using garbled circuits. We implemented our protocols using Java and Obliv-C, and conducted experiments on public datasets. We show that our protocols are efficient and preserve privacy while maintaining accuracy.
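    The template-protection comparison above can be sketched as storing projected feature vectors instead of raw ones. Here sklearn's linear discriminant analysis stands in for DCA/MDR (which are not available in common libraries), random projection serves as the baseline, and the data and dimensions are illustrative; the cryptographic protocols are not reproduced.

```python
# Sketch of the template-protection idea: store projections, not raw features.
# LDA is a stand-in for DCA/MDR; data and dimensions are illustrative only.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 50))            # gesture feature vectors (stand-in)
y = rng.integers(0, 8, size=400)          # user identities

supervised = LinearDiscriminantAnalysis(n_components=7).fit(X, y)
template_sdr = supervised.transform(X)    # supervised, discriminative projection

baseline = GaussianRandomProjection(n_components=7, random_state=0).fit(X)
template_rp = baseline.transform(X)       # unsupervised random projection

print(template_sdr.shape, template_rp.shape)   # only the projections are stored
```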

    Inferences from Interactions with Smart Devices: Security Leaks and Defenses

    Get PDF
    We unlock our smart devices, such as smartphones, several times every day using a PIN, password, or graphical pattern if the device is secured by one. The scope and usage of smart devices are expanding day by day in our everyday life, and hence so is the need to make them more secure. In the near future, we may need to authenticate ourselves on emerging smart devices such as electronic doors, exercise equipment, power tools, medical devices, and smart TV remote controls. While recent research focuses on developing new behavior-based methods to authenticate these smart devices, PINs and passwords still remain the primary methods of authenticating a user on a device. Although recent research exposes observation-based vulnerabilities, the popular belief is that direct observation attacks can be thwarted by simple methods that obscure the attacker's view of the input console (or screen). In this dissertation, we study users' hand movement patterns while they type on their smart devices. The study concentrates on the following two factors: (1) finding security leaks from the observed hand movement patterns (we showcase that the user's hand movement on its own reveals the user's sensitive information), and (2) developing methods to build a lightweight, easy-to-use, and more secure authentication system. The users' hand movement patterns were captured through a video camcorder and the inbuilt motion sensors, such as the gyroscope and accelerometer, in the user's device.

    A Review on Cybersecurity based on Machine Learning and Deep Learning Algorithms

    Get PDF
    Machine learning (ML) and deep learning (DL) techniques have been widely applied to areas such as image processing and speech recognition. Likewise, ML and DL play a critical role in detecting and preventing attacks in the field of cybersecurity. In this review, we focus on recent ML and DL algorithms that have been proposed for cybersecurity tasks such as network intrusion detection and malware detection. We also discuss key elements of cybersecurity, the main principles of information security, and the most common methods used to threaten cybersecurity. Finally, concluding remarks are given, including possible research topics that can be considered to enhance various cybersecurity applications using DL and ML algorithms.

    Learn2Perturb: an End-to-end Feature Perturbation Learning to Improve Adversarial Robustness

    Get PDF
    Deep neural networks have been achieving state-of-the-art performance across a wide variety of applications, and due to their outstanding performance, they are being deployed in safety- and security-critical systems. However, in recent years, deep neural networks have been shown to be very vulnerable to optimally crafted input samples called adversarial examples. Although the adversarial perturbations are imperceptible to humans, especially in the domain of computer vision, they have been very successful in fooling strong deep models. The vulnerability of deep models to adversarial attacks limits their widespread deployment for safety-critical applications. As a result, adversarial attack and defense algorithms have drawn great attention in the literature. Many defense algorithms have been proposed to overcome the threat of adversarial attacks, and many of these algorithms use adversarial training (adding perturbations during the training stage). Alongside other adversarial defense approaches being investigated, there has been very recent interest in improving adversarial robustness in deep neural networks through the introduction of perturbations during the training process. However, such methods leverage fixed, pre-defined perturbations and require significant hyper-parameter tuning, which makes them very difficult to leverage in a general fashion. In this work, we introduce Learn2Perturb, an end-to-end feature perturbation learning approach for improving the adversarial robustness of deep neural networks. More specifically, we introduce novel perturbation-injection modules that are incorporated at each layer to perturb the feature space and increase uncertainty in the network. This feature perturbation is performed at both the training and the inference stages. Furthermore, inspired by the Expectation-Maximization approach, an alternating back-propagation training algorithm is introduced to train the network and noise parameters consecutively. Experimental results on the CIFAR-10 and CIFAR-100 datasets show that the proposed Learn2Perturb method can result in deep neural networks that are 4–7 percent more robust against L∞ FGSM and PGD adversarial attacks, and significantly outperforms the state of the art against the L2 C&W attack and a wide range of well-known black-box attacks.
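    A rough sketch of a learnable noise-injection layer and an alternating update schedule in the spirit of Learn2Perturb is given below; the module design, regularization, and training schedule in the paper are more involved than this illustration.

```python
# Illustrative learnable noise-injection layer (in the spirit of Learn2Perturb,
# not the paper's exact module) with an alternating weight/noise update loop.
import torch
import torch.nn as nn

class PerturbInject(nn.Module):
    """Adds Gaussian noise with a learnable per-channel scale to the features."""
    def __init__(self, channels):
        super().__init__()
        self.log_sigma = nn.Parameter(torch.full((channels,), -2.0))

    def forward(self, x):
        sigma = self.log_sigma.exp().view(1, -1, 1, 1)
        return x + sigma * torch.randn_like(x)   # applied at train AND inference

net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), PerturbInject(16), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))

weight_params = [p for n, p in net.named_parameters() if "log_sigma" not in n]
noise_params  = [p for n, p in net.named_parameters() if "log_sigma" in n]
opt_w = torch.optim.SGD(weight_params, lr=0.1)
opt_n = torch.optim.SGD(noise_params, lr=0.01)

x = torch.randn(8, 3, 32, 32); y = torch.randint(0, 10, (8,))
for step in range(2):                            # alternate the two updates
    opt = opt_w if step % 2 == 0 else opt_n
    net.zero_grad()
    loss = nn.functional.cross_entropy(net(x), y)
    loss.backward()
    opt.step()
```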