7 research outputs found
Modélisation formelle des systèmes de détection d'intrusions
L’écosystème de la cybersécurité évolue en permanence en termes du nombre, de la diversité, et de la complexité des attaques. De ce fait, les outils de détection deviennent inefficaces face à certaines attaques. On distingue généralement trois types de systèmes de détection d’intrusions : détection par anomalies, détection par signatures et détection hybride. La détection par anomalies est fondée sur la caractérisation du comportement habituel du système, typiquement de manière statistique. Elle permet de détecter des attaques connues ou inconnues, mais génère aussi un très grand nombre de faux positifs. La détection par signatures permet de détecter des attaques connues en définissant des règles qui décrivent le comportement connu d’un attaquant. Cela demande une bonne connaissance du comportement de l’attaquant. La détection hybride repose sur plusieurs méthodes de détection incluant celles sus-citées. Elle présente l’avantage d’être plus précise pendant la détection. Des outils tels que Snort et Zeek offrent des langages de bas niveau pour l’expression de règles de reconnaissance d’attaques. Le nombre d’attaques potentielles étant très grand, ces bases de règles deviennent rapidement difficiles à gérer et à maintenir. De plus, l’expression de règles avec état dit stateful est particulièrement ardue pour reconnaître une séquence d’événements. Dans cette thèse, nous proposons une approche stateful basée sur les diagrammes d’état-transition algébriques (ASTDs) afin d’identifier des attaques complexes. Les ASTDs permettent de représenter de façon graphique et modulaire une spécification, ce qui facilite la maintenance et la compréhension des règles. Nous étendons la notation ASTD avec de nouvelles fonctionnalités pour représenter des attaques complexes. Ensuite, nous spécifions plusieurs attaques avec la notation étendue et exécutons les spécifications obtenues sur des flots d’événements à l’aide d’un interpréteur pour identifier des attaques. Nous évaluons aussi les performances de l’interpréteur avec des outils industriels tels que Snort et Zeek. Puis, nous réalisons un compilateur afin de générer du code exécutable à partir d’une spécification ASTD, capable d’identifier de façon efficiente les séquences d’événements.Abstract : The cybersecurity ecosystem continuously evolves with the number, the diversity,
and the complexity of cyber attacks. Generally, we have three types of Intrusion
Detection System (IDS) : anomaly-based detection, signature-based detection, and
hybrid detection. Anomaly detection is based on the usual behavior description of
the system, typically in a static manner. It enables detecting known or unknown attacks
but also generating a large number of false positives. Signature based detection
enables detecting known attacks by defining rules that describe known attacker’s behavior.
It needs a good knowledge of attacker behavior. Hybrid detection relies on
several detection methods including the previous ones. It has the advantage of being
more precise during detection. Tools like Snort and Zeek offer low level languages to
represent rules for detecting attacks. The number of potential attacks being large,
these rule bases become quickly hard to manage and maintain. Moreover, the representation
of stateful rules to recognize a sequence of events is particularly arduous. In this thesis, we propose a stateful approach based on algebraic state-transition
diagrams (ASTDs) to identify complex attacks. ASTDs allow a graphical and modular
representation of a specification, that facilitates maintenance and understanding of
rules. We extend the ASTD notation with new features to represent complex attacks.
Next, we specify several attacks with the extended notation and run the resulting specifications
on event streams using an interpreter to identify attacks. We also evaluate
the performance of the interpreter with industrial tools such as Snort and Zeek. Then,
we build a compiler in order to generate executable code from an ASTD specification,
able to efficiently identify sequences of events
Reliable Malware Analysis and Detection using Topology Data Analysis
Increasingly, malwares are becoming complex and they are spreading on
networks targeting different infrastructures and personal-end devices to
collect, modify, and destroy victim information. Malware behaviors are
polymorphic, metamorphic, persistent, able to hide to bypass detectors and
adapt to new environments, and even leverage machine learning techniques to
better damage targets. Thus, it makes them difficult to analyze and detect with
traditional endpoint detection and response, intrusion detection and prevention
systems. To defend against malwares, recent work has proposed different
techniques based on signatures and machine learning. In this paper, we propose
to use an algebraic topological approach called topological-based data analysis
(TDA) to efficiently analyze and detect complex malware patterns. Next, we
compare the different TDA techniques (i.e., persistence homology, tomato, TDA
Mapper) and existing techniques (i.e., PCA, UMAP, t-SNE) using different
classifiers including random forest, decision tree, xgboost, and lightgbm. We
also propose some recommendations to deploy the best-identified models for
malware detection at scale. Results show that TDA Mapper (combined with PCA) is
better for clustering and for identifying hidden relationships between malware
clusters compared to PCA. Persistent diagrams are better to identify
overlapping malware clusters with low execution time compared to UMAP and
t-SNE. For malware detection, malware analysts can use Random Forest and
Decision Tree with t-SNE and Persistent Diagram to achieve better performance
and robustness on noised data
Responsible Design Patterns for Machine Learning Pipelines
Integrating ethical practices into the AI development process for artificial
intelligence (AI) is essential to ensure safe, fair, and responsible operation.
AI ethics involves applying ethical principles to the entire life cycle of AI
systems. This is essential to mitigate potential risks and harms associated
with AI, such as algorithm biases. To achieve this goal, responsible design
patterns (RDPs) are critical for Machine Learning (ML) pipelines to guarantee
ethical and fair outcomes. In this paper, we propose a comprehensive framework
incorporating RDPs into ML pipelines to mitigate risks and ensure the ethical
development of AI systems. Our framework comprises new responsible AI design
patterns for ML pipelines identified through a survey of AI ethics and data
management experts and validated through real-world scenarios with expert
feedback. The framework guides AI developers, data scientists, and
policy-makers to implement ethical practices in AI development and deploy
responsible AI systems in production.Comment: 20 pages, 4 figures, 5 table
Formal modeling of intrusion detection systems
L'écosystème de la cybersécurité évolue en permanence en termes du nombre, de la diversité, et de la complexité des attaques. De ce fait, les outils de détection deviennent inefficaces face à certaines attaques. On distingue généralement trois types de système de détection d'intrusions: détection par anomalies, détection par signatures et détection hybride. La détection par anomalies est fondée sur la caractérisation du comportement habituel du système, typiquement de manière statistique. Elle permet de détecter des attaques connues ou inconnues, mais génère aussi un très grand nombre de faux positifs. La détection par signatures permet de détecter des attaques connues en définissant des règles qui décrivent le comportement connu d'un attaquant. Cela demande une bonne connaissance du comportement de l'attaquant. La détection hybride repose sur plusieurs méthodes de détection incluant celles sus-citées. Elle présente l'avantage d'être plus précise pendant la détection. Des outils tels que Snort et Zeek offrent des langages de bas niveau pour l'expression de règles de reconnaissance d'attaques. Le nombre d'attaques potentielles étant très grand, ces bases de règles deviennent rapidement difficiles à gérer et à maintenir. De plus, l'expression de règles avec état dit stateful est particulièrement ardue pour reconnaître une séquence d'événements. Dans cette thèse, nous proposons une approche stateful afin d'identifier des attaques complexes. Nous considérons l'approche diagramme état-transition hiérarchique, en utilisant les ASTDs. Les ASTDs permettent de représenter de façon graphique et modulaire une spécification, ce qui facilite la maintenance et la compréhension des règles. Nous étendons la notation ASTD avec de nouvelles fonctionnalités pour représenter des attaques complexes. Ensuite, nous spécifions plusieurs attaques avec la notation étendue et exécutons les spécifications obtenues sur des flots d'événements à l'aide d'un interpréteur pour identifier des attaques. Nous évaluons aussi les performances de l'interpréteur avec des outils industriels tels que Snort et Zeek. Puis, nous réalisons un compilateur afin de générer du code exécutable à partir d'une spécification ASTD, capable d'identifier efficacement les séquences d'événements.The cybersecurity ecosystem continuously evolves with the number, the diversity, and the complexity of cyber attacks. Generally, we have three IDS types: anomaly-based detection, signature-based detection, and hybrid detection. Anomaly detection is based on the usual behavior description of the system, typically in a static manner. It enables detecting known or unknown attacks, but generating also a large number of false positives. Signature based detection enables detecting known attacks by defining rules that describe known attacker's behavior. It needs a good knowledge of attacker behavior. Hybrid detection relies on several detection methods including the previous ones. It has the advantage of being more precise during detection. Tools like Snort and Zeek offer low level languages to represent rules for detecting attacks. The number of potential attacks being large, these rule bases become quickly hard to manage and maintain. Moreover, the representation of stateful rules to recognize a sequence of events is particularly arduous. In this thesis, we propose a stateful approach to identify complex attacks. We consider the hierarchical state-transition diagram approach, using the ASTDs. ASTDs allow a graphical and modular representation of a specification that facilitates maintenance and understanding of rules. We extend the ASTD notation with new features to represent complex attacks. Next, we specify several attacks with the extended notation and run the resulting specifications on event streams using an interpreter to identify attacks. We also evaluate the performance of the interpreter with industrial tools such as Snort and Zeek. Then, we build a compiler in order to generate executable code from an ASTD specification, able to efficiently identify sequences of events
Threat Assessment in Machine Learning based Systems
Machine learning is a field of artificial intelligence (AI) that is becoming
essential for several critical systems, making it a good target for threat
actors. Threat actors exploit different Tactics, Techniques, and Procedures
(TTPs) against the confidentiality, integrity, and availability of Machine
Learning (ML) systems. During the ML cycle, they exploit adversarial TTPs to
poison data and fool ML-based systems. In recent years, multiple security
practices have been proposed for traditional systems but they are not enough to
cope with the nature of ML-based systems. In this paper, we conduct an
empirical study of threats reported against ML-based systems with the aim to
understand and characterize the nature of ML threats and identify common
mitigation strategies. The study is based on 89 real-world ML attack scenarios
from the MITRE's ATLAS database, the AI Incident Database, and the literature;
854 ML repositories from the GitHub search and the Python Packaging Advisory
database, selected based on their reputation. Attacks from the AI Incident
Database and the literature are used to identify vulnerabilities and new types
of threats that were not documented in ATLAS. Results show that convolutional
neural networks were one of the most targeted models among the attack
scenarios. ML repositories with the largest vulnerability prominence include
TensorFlow, OpenCV, and Notebook. In this paper, we also report the most
frequent vulnerabilities in the studied ML repositories, the most targeted ML
phases and models, the most used TTPs in ML phases and attack scenarios. This
information is particularly important for red/blue teams to better conduct
attacks/defenses, for practitioners to prevent threats during ML development,
and for researchers to develop efficient defense mechanisms
The Different Faces of AI Ethics Across the World: A Principle-Implementation Gap Analysis
Artificial Intelligence (AI) is transforming our daily life with several
applications in healthcare, space exploration, banking and finance. These rapid
progresses in AI have brought increasing attention to the potential impacts of
AI technologies on society, with ethically questionable consequences. In recent
years, several ethical principles have been released by governments, national
and international organisations. These principles outline high-level precepts
to guide the ethical development, deployment, and governance of AI. However,
the abstract nature, diversity, and context-dependency of these principles make
them difficult to implement and operationalize, resulting in gaps between
principles and their execution. Most recent work analysed and summarized
existing AI principles and guidelines but they did not provide findings on
principle-implementation gaps and how to mitigate them. These findings are
particularly important to ensure that AI implementations are aligned with
ethical principles and values. In this paper, we provide a contextual and
global evaluation of current ethical AI principles for all continents, with the
aim to identify potential principle characteristics tailored to specific
countries or applicable across countries. Next, we analyze the current level of
AI readiness and current implementations of ethical AI principles in different
countries, to identify gaps in the implementation of AI principles and their
causes. Finally, we propose recommendations to mitigate the
principle-implementation gaps