A Review on Malware Analysis by using an Approach of Machine Learning Techniques
In the Internet age, malware (such as viruses, trojans, ransomware, and bots) has posed serious and evolving security threats to Internet users. To protect legitimate users from these threats, anti-malware software products from different companies, including Comodo, Kaspersky, Kingsoft, and Symantec, provide the major defense against malware. Unfortunately, driven by economic benefits, the number of new malware samples has increased explosively: anti-malware vendors are now confronted with millions of potential malware samples per year. To keep pace with this increase, there is an urgent need to develop intelligent methods for effective and efficient malware detection from the large, real-world daily sample collection. One of the most common approaches in the literature is to use machine learning techniques to automatically learn models and patterns behind such complexity, and to develop technologies that keep pace with malware evolution. This survey aims to provide an overview of how machine learning has been used so far in the context of malware analysis in Windows environments. The paper surveys the features extracted from malware files or documents and the machine learning techniques employed (i.e., which algorithm is used to process the input and produce the output). Open issues and challenges are also discussed.
Malware Analysis with Machine Learning
Master's thesis, Information Security, Universidade de Lisboa, Faculdade de Ciências, 2022
Malware attacks have been one of the most serious cyber risks in recent years. The number of
vulnerability reports in the security community grows almost every week. One of the key causes of this
exponential growth is that malware authors have started introducing mutations to avoid detection.
This means that malicious files from the same malware family, with the same malicious behaviour, are
constantly modified or obfuscated using a variety of techniques to make them appear different.
Existing machine learning-based malware categorization algorithms use characteristics retrieved from
raw binary files or disassembled code. The variety of such attributes has made it difficult to
develop generic malware categorization methods that operate well across a range of operating scenarios.
To evaluate and categorize such enormous volumes of data effectively, it is necessary
to divide the samples into groups and identify their respective families based on their behaviour. Malicious
software is converted to a greyscale image representation, because such a representation can capture subtle
changes while preserving the global structure, which helps to detect variations. Motivated by the machine
learning results achieved in the ImageNet challenge, this dissertation proposes an agnostic deep learning
solution for efficiently classifying malware into families based on a collection of discriminant patterns
retrieved from its visualization as images.
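As an illustration only, the greyscale conversion mentioned above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the thesis's implementation: it assumes each byte of the file maps to one pixel intensity (0 to 255) and that rows have a fixed width padded with zeros. The function name `bytes_to_grayscale` and the `width` parameter are hypothetical.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Lay out raw bytes as rows of pixel intensities (0-255).

    Assumption (not from the source): one byte per pixel, fixed row
    width, and zero padding for the final partial row.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Pad with zero bytes so the length is a multiple of the row width.
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    # Slice the padded byte string into consecutive rows.
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


# Usage with a few made-up bytes (not a real malware sample).
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x03])
image = bytes_to_grayscale(sample, width=4)
print(image)  # [[77, 90, 144, 0], [3, 0, 0, 0]]
```

Under this layout, a small change in the binary alters only a few pixels while the overall arrangement of the image stays the same, which is the property the paragraph above relies on.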
In this thesis, we present Malwizard, an adaptable Python solution suited for companies or end users that allows them to automatically obtain a fast malware analysis. The solution was implemented
as an Outlook add-in and an API service for SOAR platforms, as email is the main vector for this
type of attack, with companies being the most attractive targets.
The Microsoft Classification Challenge dataset was used in the evaluation of the novel
approach. Its image representation was also ciphered, generating the corresponding ciphered
image, to evaluate whether the same patterns could be identified using traditional machine learning techniques.
This allows privacy concerns to be addressed by keeping the data analysed by neural networks
secure from unauthorized parties.
Experimental comparison demonstrates that the novel approach performed close to the best analysed
model on the plaintext dataset while completing the task in one-third of the time. Regarding the encrypted
dataset, classical techniques need to be adapted in order to be efficient.
Convolutional neural networks for malware classification
According to AV vendors, malicious software has been growing exponentially
in recent years. One of the main reasons for these high volumes is that, in order
to evade detection, malware authors started using polymorphic and metamorphic
techniques. As a result, traditional signature-based approaches to
malware detection are insufficient against new malware, and the categorization
of malware samples has become essential to understand the basis of
malware behavior and to fight back against cybercriminals.
During the last decade, solutions that fight against malicious software have
begun using machine learning approaches. Unfortunately, there are few open-source
datasets available to the academic community. One of the biggest
datasets available was released last year in a competition hosted on Kaggle
with data provided by Microsoft for the Big Data Innovators Gathering
(BIG 2015). This thesis presents two novel and scalable approaches using
Convolutional Neural Networks (CNNs) to assign malware to its corresponding
family. The first approach uses CNNs to learn a
feature hierarchy to discriminate among samples of malware represented as
gray-scale images. The second approach uses the CNN
architecture introduced by Yoon Kim [12] to classify malware samples according
to their x86 instructions. The proposed methods achieved an improvement
of 93.86% and 98.56%, respectively, with respect to the equal probability benchmark.
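To make the comparison against an equal probability benchmark concrete, the following is a minimal sketch of how such a relative improvement could be computed. It rests on assumptions the abstract does not state: that the metric is multi-class logarithmic loss and that there are 9 malware families. The function names and the example loss value are hypothetical and do not come from the thesis.

```python
import math


def equal_probability_log_loss(num_classes: int) -> float:
    """Log loss of a predictor that gives every class probability 1/K.

    For any true label the loss is -log(1/K) = log(K).
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    return math.log(num_classes)


def relative_improvement(model_loss: float, benchmark_loss: float) -> float:
    """Percentage reduction in loss relative to the benchmark."""
    if benchmark_loss <= 0:
        raise ValueError("benchmark loss must be positive")
    return 100.0 * (benchmark_loss - model_loss) / benchmark_loss


# Assumption: 9 classes; the model loss below is a made-up value.
benchmark = equal_probability_log_loss(9)
hypothetical_model_loss = 0.5
print(f"benchmark log loss: {benchmark:.4f}")
print(f"improvement: {relative_improvement(hypothetical_model_loss, benchmark):.2f}%")
```

Read this way, an improvement close to 100% means the model's loss is a small fraction of the loss obtained by guessing every family with equal probability.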