
8 research outputs found

Classification de Logiciels Malveillants Dirigée par les Données et Assistée par des Méthodes d’Apprentissage Automatique

Author: Puodzius Cassius
Publication venue: HAL CCSD
Publication date: 19/12/2022

Historically, malware (MW) analysis has relied heavily on human expertise for the manual creation of signatures to detect and classify MW. This procedure is very costly and time consuming, and thus unable to cope with modern cyber threat scenarios. The solution is to widely automate MW analysis. Toward this goal, MW classification allows optimizing the handling of large MW corpora by identifying resemblances across similar instances. Consequently, MW classification is a key activity related to MW analysis, which is paramount in the operation of computer security as a whole. This thesis addresses the problem of MW classification by taking an approach in which human intervention is spared as much as possible. Furthermore, we steer clear of the subjectivity inherent to human analysis by designing MW classification solely on data directly extracted from MW analysis, thus taking a data-driven approach. Our objective is to improve the automation of malware analysis and to combine it with machine learning methods that are able to autonomously spot and reveal unforeseen commonalities within data. We phased our work in three stages. Initially, we focused on improving MW analysis and its automation, studying new ways of leveraging symbolic execution in MW analysis and developing a distributed framework to scale up our computational power. Then we concentrated on the representation of MW behavior, with painstaking attention to its accuracy and robustness. Finally, we turned our attention to MW clustering, devising a methodology that places no restriction on the combination of syntactical and behavioral features and remains scalable in practice. As for our main contributions, we revamp the use of symbolic execution for MW analysis, with special attention to the optimal use of SMT solver tactics and hyperparameter settings; we conceive a new evaluation paradigm for MW analysis systems; we formulate a compact graph representation of behavior, along with a corresponding function for pairwise similarity computation, which is accurate and robust; and we elaborate a new MW clustering strategy based on ensemble clustering that is flexible with respect to the combination of syntactical and behavioral features.
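The abstract mentions ensemble clustering over syntactical and behavioral features but does not describe the procedure. As a rough illustration only, the sketch below uses one common ensemble technique, a co-association matrix thresholded into consensus clusters; the function names, the threshold, and the choice of technique are assumptions and not the thesis's actual method.

```python
def co_association(partitions):
    """Fraction of base clusterings in which each pair of samples co-occurs."""
    n = len(partitions[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1 / len(partitions)
    return matrix


def consensus(partitions, threshold=0.5):
    """Merge samples whose co-association exceeds `threshold` (assumed value)."""
    n = len(partitions[0])
    matrix = co_association(partitions)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] > threshold:
                parent[find(i)] = find(j)

    # Relabel the union-find roots as consecutive cluster ids.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical base clusterings of four samples, e.g. one per feature type.
base_partitions = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
labels = consensus(base_partitions)
```

Each base partition can come from a different feature family, which is one way a clustering can stay agnostic to how syntactical and behavioral features are combined.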

HAL-CentraleSupelec

Thèses en Ligne

INRIA a CCSD electronic archive server

Theses.fr

HAL-Rennes 1

Accurate and Robust Malware Analysis through Similarity of External Calls Dependency Graphs (ECDG)

Author: Heuser Annelie
Noureddine Lamine
Puodzius Cassius
Zendra Olivier
Publication venue: Association for Computing Machinery (ACM)
Publication date: 17/08/2021

The authors received the Best Paper Award at IWCC 2021 for this presentation, given at the workshop. Malware is a primary concern in cybersecurity, being one of the attacker's favorite cyberweapons. Over time, malware evolves not only in complexity but also in diversity and quantity. Malware analysis automation is thus crucial. In this paper we present ECDGs, a shorter call graph representation, and a new similarity function that is accurate and robust. Toward this goal, we revisit some principles of malware analysis research to define basic primitives and an evaluation paradigm aimed at setting up more reliable experiments. Our benchmark shows that our similarity function is very efficient in practice, achieving speedups of 3.30x and 354.11x with respect to radiff2 for the standard and the cache-enhanced implementations, respectively. Our evaluations generate clusters that produce almost unerring results (homogeneity score of 0.983 for the accuracy phase) and marginal information loss for a highly polluted dataset (NMI score of 0.974 between the initial and final clusters of the robustness phase). Overall, ECDGs and our similarity function enable autonomous frameworks for malware search and clustering that can assist human-based analysis or improve classification models for malware analysis.
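The abstract reports a homogeneity score and an NMI score without defining them. The sketch below computes both from their standard information-theoretic definitions, assuming homogeneity is I(truth; pred) / H(truth) and NMI uses arithmetic-mean normalization; the paper may use a different normalization, and the sample labels are invented for illustration.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same samples."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log((c * n) / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def homogeneity(truth, pred):
    """1.0 when every predicted cluster contains a single true class."""
    h_truth = entropy(truth)
    if h_truth == 0:
        return 1.0
    # 1 - H(truth | pred) / H(truth) equals I(truth; pred) / H(truth).
    return mutual_information(truth, pred) / h_truth


def nmi(a, b):
    """Mutual information normalized by the mean of the two entropies."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        return 1.0
    return 2 * mutual_information(a, b) / (h_a + h_b)


# Invented labels: two families, with the second split across two clusters.
truth = ["a", "a", "b", "b"]
pred = [0, 0, 1, 2]
h_score = homogeneity(truth, pred)
nmi_score = nmi(truth, pred)
```

Splitting a family across clusters leaves homogeneity at its maximum but lowers NMI, which is why the two scores are informative when reported together.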

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL Descartes

HAL-Rennes 1

SE-PAC: A Self-Evolving PAcker Classifier against rapid packers evolution

Author: Heuser Annelie
Noureddine Lamine
Puodzius Cassius
Zendra Olivier
Publication venue: Association for Computing Machinery (ACM)
Publication date: 26/04/2021

Packers are widespread tools used by malware authors to hinder static malware detection and analysis. Identifying the packer used to pack a malware sample is essential to properly unpack and analyze it, be it manually or automatically. While many well-known packers are used, there is a growing trend toward new custom packers that make malware analysis and detection harder. Research works have been very effective in identifying known packers or their variants, with signature-based, supervised machine learning, or similarity-based techniques. However, identifying new packer classes remains an open problem. This paper presents a self-evolving packer classifier that provides an effective, incremental, and robust solution to cope with the rapid evolution of packers. We propose a composite pairwise distance metric combining different types of packer features. We derive an incremental clustering approach able to identify both (variants of) known packer classes and new ones, as well as to update clusters automatically and efficiently. Our system thus continuously enhances, integrates, adapts, and evolves packer knowledge. Moreover, to optimize post-clustering packer processing costs, we introduce a new post-clustering strategy for selecting small subsets of relevant samples from the clusters. The effectiveness and time-resilience of our approach are assessed with: 1) a real-world malware feed dataset composed of 16k packed binaries, comprising 29 unique packers, and 2) a synthetic dataset composed of 19k manually crafted packed binaries, comprising 31 unique packers (including custom ones).
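The abstract names a composite pairwise distance and an incremental clustering step but gives no formulas. The following is a minimal sketch of the general idea only: the feature names, weights, per-feature distances, threshold, and the nearest-member assignment rule are all hypothetical and do not reproduce SE-PAC itself.

```python
def composite_distance(a, b, weights):
    """Weighted mean of per-feature distances, each assumed to lie in [0, 1].

    Set-valued features use Jaccard distance; numeric features (assumed
    non-negative) use a relative difference.
    """
    total = 0.0
    for name, weight in weights.items():
        x, y = a[name], b[name]
        if isinstance(x, (set, frozenset)):
            union = x | y
            component = 1 - len(x & y) / len(union) if union else 0.0
        else:
            component = abs(x - y) / max(abs(x), abs(y), 1e-12)
        total += weight * component
    return total / sum(weights.values())


def assign(sample, clusters, weights, threshold):
    """Add `sample` to the closest cluster, or open a new one if none is near."""
    best, best_distance = None, threshold
    for index, members in enumerate(clusters):
        distance = min(composite_distance(sample, m, weights) for m in members)
        if distance <= best_distance:
            best, best_distance = index, distance
    if best is None:
        clusters.append([sample])  # treated as a previously unseen class
        return len(clusters) - 1
    clusters[best].append(sample)
    return best


# Hypothetical samples and parameters, for illustration only.
weights = {"entropy": 1.0, "imports": 1.0}
samples = [
    {"entropy": 7.9, "imports": {"LoadLibraryA", "GetProcAddress"}},
    {"entropy": 7.8, "imports": {"LoadLibraryA", "GetProcAddress"}},
    {"entropy": 5.0, "imports": {"VirtualAlloc"}},
]
clusters = []
assignments = [assign(s, clusters, weights, threshold=0.2) for s in samples]
```

Because each sample is placed as it arrives, clusters are updated without re-clustering the whole corpus, and a sample far from every cluster starts a new one.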

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

HAL-Rennes 1

Optimizing symbolic execution for malware behavior classification

Author: Baranov Eduard
Biondi Fabrizio
Decourbe Olivier
Given-Wilson Thomas
Legay Axel
Puodzius Cassius
Quilbeuf Jean
Sebastio Stefano
Publication venue: Elsevier BV
Publication date: 01/01/2020

Increasingly, software correctness, reliability, and security are being analyzed using tools that combine various formal and heuristic approaches. Such analysis often becomes expensive in terms of time, as the cost of obtaining high-quality results. In this experience report we explore the tuning and optimization of the tools underlying binary malware detection and classification. We identify heuristics and SMT solver tactics for the effective symbolic execution of binary files. We combine these with effective heuristics for the construction of behavioral signatures of programs that can be used for a supervised learning multi-class malware classifier. Further, a set of experiments following a full-factorial design allowed us to identify the correlations between heuristics and the overall performance of the classifier.
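A full-factorial design runs one experiment for every combination of factor levels. The sketch below enumerates such a design; the factor names and levels are hypothetical placeholders, since the abstract does not list the heuristics or tactics that were actually varied.

```python
from itertools import product

# Hypothetical factors and levels, for illustration only.
factors = {
    "solver_tactic": ["default", "simplify_first"],
    "loop_bound": [1, 4, 16],
    "timeout_s": [60, 600],
}


def full_factorial(factors):
    """Return one configuration dict per combination of factor levels."""
    names = list(factors)
    return [
        dict(zip(names, levels))
        for levels in product(*(factors[name] for name in names))
    ]


runs = full_factorial(factors)
```

The number of runs is the product of the level counts (here 2 x 3 x 2), so the design grows quickly as factors are added.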

DIAL UCLouvain

Shorter hash-based signatures

Author: Buchmann
Busold
Cassius Puodzius
Dahmen
Eisenbarth
Geovandro C.C.F. Pereira
Halevi
Hoffstein
Hülsing
Mateus
Misoczki
Naor
Paulo S.L.M. Barreto
Rev
Rohde
Schnorr
Yuval
Publication venue: Elsevier BV
Publication date

Crossref