46 research outputs found
A Survey on Malware Detection with Graph Representation Learning
Malware detection has become a major concern due to the increasing number and
complexity of malware. Traditional detection methods based on signatures and
heuristics are used for malware detection, but unfortunately, they suffer from
poor generalization to unknown attacks and can be easily circumvented using
obfuscation techniques. In recent years, Machine Learning (ML) and notably Deep
Learning (DL) achieved impressive results in malware detection by learning
useful representations from data and have become a solution preferred over
traditional methods. More recently, the application of such techniques on
graph-structured data has achieved state-of-the-art performance in various
domains and demonstrates promising results in learning more robust
representations from malware. Yet, no literature review focusing on graph-based
deep learning for malware detection exists. In this survey, we provide an
in-depth literature review to summarize and unify existing works under the
common approaches and architectures. We notably demonstrate that Graph Neural
Networks (GNNs) reach competitive results in learning robust embeddings from
malware represented as expressive graph structures, leading to an efficient
detection by downstream classifiers. This paper also reviews adversarial
attacks that are utilized to fool graph-based detection methods. Challenges and
future research directions are discussed at the end of the paper.Comment: Preprint, submitted to ACM Computing Surveys on March 2023. For any
suggestions or improvements, please contact me directly by e-mai
Developing Robust Models, Algorithms, Databases and Tools With Applications to Cybersecurity and Healthcare
As society and technology becomes increasingly interconnected, so does the threat landscape. Once isolated threats now pose serious concerns to highly interdependent systems, highlighting the fundamental need for robust machine learning. This dissertation contributes novel tools, algorithms, databases, and modelsâthrough the lens of robust machine learningâin a research effort to solve large-scale societal problems affecting millions of people in the areas of cybersecurity and healthcare.
(1) Tools: We develop TIGER, the first comprehensive graph robustness toolbox; and our ROBUSTNESS SURVEY identifies critical yet missing areas of graph robustness research.
(2) Algorithms: Our survey and toolbox reveal existing work has overlooked lateral attacks on computer authentication networks. We develop D2M, the first algorithmic framework to quantify and mitigate network vulnerability to lateral attacks by modeling lateral attack movement from a graph theoretic perspective.
(3) Databases: To prevent lateral attacks altogether, we develop MALNET-GRAPH, the worldâs largest cybersecurity graph databaseâcontaining over 1.2M graphs across 696 classesâand show the first large-scale results demonstrating the effectiveness of malware detection through a graph medium. We extend MALNET-GRAPH by constructing the largest binary-image cybersecurity databaseâcontaining 1.2M images, 133Ămore images than the only other public databaseâenabling new discoveries in malware detection and classification research restricted to a few industry labs (MALNET-IMAGE).
(4) Models: To protect systems from adversarial attacks, we develop UNMASK, the first model that flags semantic incoherence in computer vision systems, which detects up to 96.75% of attacks, and defends the model by correctly classifying up to 93% of attacks. Inspired by UNMASKâs ability to protect computer visions systems from adversarial attack, we develop REST, which creates noise robust models through a novel combination of adversarial training, spectral regularization, and sparsity regularization. In the presence of noise, our method improves state-of-the-art sleep stage scoring by 71%âallowing us to diagnose sleep disorders earlier on and in the home environmentâwhile using 19Ă less parameters and 15Ăless MFLOPS. Our work has made significant impact to industry and society: the UNMASK framework laid the foundation for a multi-million dollar DARPA GARD award; the TIGER toolbox for graph robustness analysis is a part of the Nvidia Data Science Teaching Kit, available to educators around the world; we released MALNET, the worldâs largest graph classification database with 1.2M graphs; and the D2M framework has had major impact to Microsoft products, inspiring changes to the productâs approach to lateral attack detection.Ph.D
Building Secure and Reliable Deep Learning Systems from a Systems Security Perspective
As deep learning (DL) is becoming a key component in many business and safety-critical systems, such as self-driving cars or AI-assisted robotic surgery, adversaries have started placing them on their radar. To understand their potential threats, recent work studied the worst-case behaviors of deep neural networks (DNNs), such as mispredictions caused by adversarial examples or models altered by data poisoning attacks. However, most of the prior work narrowly considers DNNs as an isolated mathematical concept, and this perspective overlooks a holistic pictureâleaving out the security threats that involve vulnerable interactions between DNNs and hardware or system-level components.
In this dissertation, on three separate projects, I conduct a study on how DL systems, owing to the computational properties of DNNs, become particularly vulnerable to existing well-studied attacks. First, I study how over-parameterization hurts a systemâs resilience to fault-injection attacks. Even with a single bit-flip, when chosen carefully, an attacker can inflict an accuracy drop up to 100%, and half of a DNNâs parameters have at least one bit that degrades its accuracy over 10%. An adversary who wields Rowhammer, a fault attack that flips random or targeted bits in the physical memory (DRAM), can exploit this graceless degradation in practice. Second, I study how computational regularities compromise the confidentiality of a system. Leveraging the information leaked by a DNN processing a single sample, an adversary can steal the DNNâs often proprietary architecture. An attacker armed with Flush+Reload, a remote side-channel attack, can accurately perform this reconstruction against a DNN deployed in the cloud. Third, I will show how input-adaptive DNNs, e.g., multi-exit networks, fail to promise computational efficiency in an adversarial setting. By adding imperceptible input perturbations, an attacker can significantly increase a multi-exit networkâs computations to have predictions on an input. This vulnerability also leads to exploitation in resource-constrained settings such as an IoT scenario, where input-adaptive networks are gaining traction. Finally, building on the lessons learned from my projects, I conclude my dissertation by outlining future research directions for designing secure and reliable DL systems
Cyber-Physical Threat Intelligence for Critical Infrastructures Security
Modern critical infrastructures can be considered as large scale Cyber Physical Systems (CPS). Therefore, when designing, implementing, and operating systems for Critical Infrastructure Protection (CIP), the boundaries between physical security and cybersecurity are blurred. Emerging systems for Critical Infrastructures Security and Protection must therefore consider integrated approaches that emphasize the interplay between cybersecurity and physical security techniques. Hence, there is a need for a new type of integrated security intelligence i.e., Cyber-Physical Threat Intelligence (CPTI). This book presents novel solutions for integrated Cyber-Physical Threat Intelligence for infrastructures in various sectors, such as Industrial Sites and Plants, Air Transport, Gas, Healthcare, and Finance. The solutions rely on novel methods and technologies, such as integrated modelling for cyber-physical systems, novel reliance indicators, and data driven approaches including BigData analytics and Artificial Intelligence (AI). Some of the presented approaches are sector agnostic i.e., applicable to different sectors with a fair customization effort. Nevertheless, the book presents also peculiar challenges of specific sectors and how they can be addressed. The presented solutions consider the European policy context for Security, Cyber security, and Critical Infrastructure protection, as laid out by the European Commission (EC) to support its Member States to protect and ensure the resilience of their critical infrastructures. Most of the co-authors and contributors are from European Research and Technology Organizations, as well as from European Critical Infrastructure Operators. Hence, the presented solutions respect the European approach to CIP, as reflected in the pillars of the European policy framework. The latter includes for example the Directive on security of network and information systems (NIS Directive), the Directive on protecting European Critical Infrastructures, the General Data Protection Regulation (GDPR), and the Cybersecurity Act Regulation. The sector specific solutions that are described in the book have been developed and validated in the scope of several European Commission (EC) co-funded projects on Critical Infrastructure Protection (CIP), which focus on the listed sectors. Overall, the book illustrates a rich set of systems, technologies, and applications that critical infrastructure operators could consult to shape their future strategies. It also provides a catalogue of CPTI case studies in different sectors, which could be useful for security consultants and practitioners as well
Privacy-Preserving Data Collection and Sharing in Modern Mobile Internet Systems
With the ubiquity and widespread use of mobile devices such as laptops, smartphones, smartwatches, and IoT devices, large volumes of user data are generated and recorded. While there is great value in collecting, analyzing and sharing this data for improving products and services, data privacy poses a major concern.
This dissertation research addresses the problem of privacy-preserving data collection and sharing in the context of both mobile trajectory data and mobile Internet access data. The first contribution of this dissertation research is the design and development of a system for utility-aware synthesis of differentially private and attack-resilient location traces, called AdaTrace. Given a set of real location traces, AdaTrace executes a four-phase process consisting of feature extraction, synopsis construction, noise injection, and generation of synthetic location traces. Compared to representative prior approaches, the location traces generated by AdaTrace offer up to 3-fold improvement in utility, measured using a variety of utility metrics and datasets, while preserving both differential privacy and attack resilience.
The second contribution of this dissertation research is the design and development of locally private protocols for privacy-sensitive collection of mobile and Web user data. Motivated by the excessive utility loss of existing Local Differential Privacy (LDP) protocols under small user populations, this dissertation introduces the notion of Condensed Local Differential Privacy (CLDP) and a suite of protocols satisfying CLDP to enable the collection of various types of user data, ranging from ordinal data types in finite metric spaces (malware infection statistics), to non-ordinal items (OS versions and transaction categories), and to sequences of ordinal or non-ordinal items. Using cybersecurity data and case studies from Symantec, a major cybersecurity vendor, we show that proposed CLDP protocols are practical for key tasks including malware outbreak detection, OS vulnerability analysis, and inspecting suspicious activities on infected machines.
The third contribution of this dissertation research is the development of a framework and a prototype system for evaluating privacy-utility tradeoffs of different LDP protocols, called LDPLens. LDPLens introduces metrics to evaluate protocol tradeoffs based on factors such as the utility metric, the data collection scenario, and the user-specified adversary metric. We develop a common Bayesian adversary model to analyze LDP protocols, and we formally and experimentally analyze Adversarial Success Rate (ASR) under each protocol. Motivated by the findings that numerous factors impact the ASR and utility behaviors of LDP protocols, we develop LDPLens to provide effective recommendations for finding the most suitable protocol in a given setting. Our three case studies with real-world datasets demonstrate that using the protocol recommended by LDPLens can offer substantial reduction in utility loss or in ASR, compared to using a randomly chosen protocol.Ph.D
Cyber-Physical Threat Intelligence for Critical Infrastructures Security
Modern critical infrastructures can be considered as large scale Cyber Physical Systems (CPS). Therefore, when designing, implementing, and operating systems for Critical Infrastructure Protection (CIP), the boundaries between physical security and cybersecurity are blurred. Emerging systems for Critical Infrastructures Security and Protection must therefore consider integrated approaches that emphasize the interplay between cybersecurity and physical security techniques. Hence, there is a need for a new type of integrated security intelligence i.e., Cyber-Physical Threat Intelligence (CPTI). This book presents novel solutions for integrated Cyber-Physical Threat Intelligence for infrastructures in various sectors, such as Industrial Sites and Plants, Air Transport, Gas, Healthcare, and Finance. The solutions rely on novel methods and technologies, such as integrated modelling for cyber-physical systems, novel reliance indicators, and data driven approaches including BigData analytics and Artificial Intelligence (AI). Some of the presented approaches are sector agnostic i.e., applicable to different sectors with a fair customization effort. Nevertheless, the book presents also peculiar challenges of specific sectors and how they can be addressed. The presented solutions consider the European policy context for Security, Cyber security, and Critical Infrastructure protection, as laid out by the European Commission (EC) to support its Member States to protect and ensure the resilience of their critical infrastructures. Most of the co-authors and contributors are from European Research and Technology Organizations, as well as from European Critical Infrastructure Operators. Hence, the presented solutions respect the European approach to CIP, as reflected in the pillars of the European policy framework. The latter includes for example the Directive on security of network and information systems (NIS Directive), the Directive on protecting European Critical Infrastructures, the General Data Protection Regulation (GDPR), and the Cybersecurity Act Regulation. The sector specific solutions that are described in the book have been developed and validated in the scope of several European Commission (EC) co-funded projects on Critical Infrastructure Protection (CIP), which focus on the listed sectors. Overall, the book illustrates a rich set of systems, technologies, and applications that critical infrastructure operators could consult to shape their future strategies. It also provides a catalogue of CPTI case studies in different sectors, which could be useful for security consultants and practitioners as well