518 research outputs found
Machine Learning Aided Static Malware Analysis: A Survey and Tutorial
Malware analysis and detection techniques have been evolving during the last
decade as a reflection to development of different malware techniques to evade
network-based and host-based security protections. The fast growth in variety
and number of malware species made it very difficult for forensics
investigators to provide an on time response. Therefore, Machine Learning (ML)
aided malware analysis became a necessity to automate different aspects of
static and dynamic malware investigation. We believe that machine learning
aided static analysis can be used as a methodological approach in technical
Cyber Threats Intelligence (CTI) rather than resource-consuming dynamic malware
analysis that has been thoroughly studied before. In this paper, we address
this research gap by conducting an in-depth survey of different machine
learning methods for classification of static characteristics of 32-bit
malicious Portable Executable (PE32) Windows files and develop taxonomy for
better understanding of these techniques. Afterwards, we offer a tutorial on
how different machine learning techniques can be utilized in extraction and
analysis of a variety of static characteristic of PE binaries and evaluate
accuracy and practical generalization of these techniques. Finally, the results
of experimental study of all the method using common data was given to
demonstrate the accuracy and complexity. This paper may serve as a stepping
stone for future researchers in cross-disciplinary field of machine learning
aided malware forensics.Comment: 37 Page
Malware detection based on dynamic analysis features
The widespread usage of mobile devices and their seamless adaptation to each users' needs by the means of useful applications (Apps), makes them a prime target for malware developers to get access to sensitive user data, such as banking details, or to hold data hostage and block user access. These apps are distributed in marketplaces that host millions and therefore have their own forms of automated malware detection in place in order to deter malware developers and keep their app store (and reputation) trustworthy, but there are still a number of apps that are able to bypass these detectors and remain available in the marketplace for any user to download. Current malware detection strategies rely mostly on using features extracted statically, dynamically or a conjunction of both, and making them suitable for machine learning applications, in order to scale detection to cover the number of apps that are submited to the marketplace. In this article, the main focus is the study of the effectiveness of these automated malware detection methods and their ability to keep up with the proliferation of new malware and its ever-shifting trends. By analising the performance of ML algorithms trained, with real world data, on diferent time periods and time scales with features extracted statically, dynamically and from user-feedback, we are able to identify the optimal setup to maximise malware detection.O uso generalizado de dispositivos móveis e sua adaptação perfeita à s necessidades de cada utilizador por meio de aplicativos úteis (Apps) tornam-os um alvo principal para que criadores de malware obtenham acesso a dados confidenciais do usuário, como detalhes bancários, ou para reter dados e bloquear o acesso do utilizador. Estas apps são distribuÃdas em mercados que alojam milhões, e portanto, têm as suas próprias formas de detecção automatizada de malware, a fim de dissuadir os desenvolvedores de malware e manter sua loja de apps (e reputação) confiável, mas ainda existem várias apps capazes de ignorar esses detectores e permanecerem disponÃveis no mercado para qualquer utilizador fazer o download. As estratégias atuais de detecção de malware dependem principalmente do uso de recursos extraÃdos estaticamente, dinamicamente ou de uma conjunção de ambos, e de torná-los adequados para aplicações de aprendizagem automática, a fim de dimensionar a detecção para cobrir o número de apps que são enviadas ao mercado. Neste artigo, o foco principal é o estudo da eficácia dos métodos automáticos de detecção de malware e as suas capacidades de acompanhar a popularidade de novo malware, bem como as suas tendências em constante mudança. Analisando o desempenho de algoritmos de ML treinados, com dados do mundo real, em diferentes perÃodos e escalas de tempo com recursos extraÃdos estaticamente, dinamicamente e com feedback do utilizador, é possÃvel identificar a configuração ideal para maximizar a detecção de malware
Neural malware detection
At the heart of today’s malware problem lies theoretically infinite diversity created by metamorphism. The majority of conventional machine learning techniques tackle the problem with the assumptions that a sufficiently large number of training samples exist and that the training set is independent and identically distributed. However, the lack of semantic features combined with the models under these wrong assumptions result largely in overfitting with many false positives against real world samples, resulting in systems being left vulnerable to various adversarial attacks. A key observation is that modern malware authors write a script that automatically generates an arbitrarily large number of diverse samples that share similar characteristics in program logic, which is a very cost-effective way to evade detection with minimum effort. Given that many malware campaigns follow this paradigm of economic malware manufacturing model, the samples within a campaign are likely to share coherent semantic characteristics. This opens up a possibility of one-to-many detection. Therefore, it is crucial to capture this non-linear metamorphic pattern unique to the campaign in order to detect these seemingly diverse but identically rooted variants. To address these issues, this dissertation proposes novel deep learning models, including generative static malware outbreak detection model, generative dynamic malware detection model using spatio-temporal isomorphic dynamic features, and instruction cognitive malware detection. A comparative study on metamorphic threats is also conducted as part of the thesis. Generative adversarial autoencoder (AAE) over convolutional network with global average pooling is introduced as a fundamental deep learning framework for malware detection, which captures highly complex non-linear metamorphism through translation invariancy and local variation insensitivity. Generative Adversarial Network (GAN) used as a part of the framework enables oneshot training where semantically isomorphic malware campaigns are identified by a single malware instance sampled from the very initial outbreak. This is a major innovation because, to the best of our knowledge, no approach has been found to this challenging training objective against the malware distribution that consists of a large number of very sparse groups artificially driven by arms race between attackers and defenders. In addition, we propose a novel method that extracts instruction cognitive representation from uninterpreted raw binary executables, which can be used for oneto- many malware detection via one-shot training against frequency spectrum of the Transformer’s encoded latent representation. The method works regardless of the presence of diverse malware variations while remaining resilient to adversarial attacks that mostly use random perturbation against raw binaries. Comprehensive performance analyses including mathematical formulations and experimental evaluations are provided, with the proposed deep learning framework for malware detection exhibiting a superior performance over conventional machine learning methods. The methods proposed in this thesis are applicable to a variety of threat environments here artificially formed sparse distributions arise at the cyber battle fronts.Doctor of Philosoph
Performance of Malware Classification on Machine Learning using Feature Selection
The exponential growth of malware has created a significant threat in our daily lives, which heavily rely on computers running all kinds of software. Malware writers create malicious software by creating new variants, new innovations, new infections and more obfuscated malware by using techniques such as packing and encrypting techniques. Malicious software classification and detection play an important role and a big challenge for cyber security research. Due to the increasing rate of false alarm, the accurate classification and detection of malware is a big necessity issue to be solved. In this research, eight malware family have been classifying according to their family the research provides four feature selection algorithms to select best feature for multiclass classification problem. Comparing. Then find these algorithms top 100 features are selected to performance evaluations. Five machine learning algorithms is compared to find best models. Then frequency distribution of features are find by feature ranking of best model. At last it is said that frequency distribution of every character of API call sequence can be used to classify malware family
Signal processing for malware analysis
This Project is an experimental analysis of Android malware through images. The analysis
is based on classifying the malware into families or differentiating between goodware and
malware. This analysis has been done considering two approaches. These two
approaches have a common starting point, which is the transformation of Android
applications into PNG images. After this conversion, the first approach was subtracting
each image from the testing set with the images of the training set, in order to establish
which unknown malware belongs to a specific family or to distinguish between goodware
and malware. Although the accuracy was higher than the one defined in the
requirements, this approach was a time consuming task, so we consider another
approach to reduce the time and get the same or better accuracy. The second approach
was extracting features from all the images and then using a machine learning classifier
to get a precise differentiation. After this second approach, the resulting time for 100,000
samples was less than 4 hours and the accuracy 83.04%, which fulfill the requirements
specified.
To perform the analysis, we have used two heterogeneous datasets. The Malgenome
dataset which contains 49 kinds of malware Android applications (49 malware families). It
was used to perform the measurements and the different tests. The M0droid dataset,
which contains goodware and malware Android applications. It was used to corroborate
the previous analysis.Este proyecto es un análisis experimental de aplicaciones de Android mediante
imágenes. Este análisis se basa en clasificar las imágenes en familias o en diferenciarlas
entre goodware o malware. Para ello, se han considerado dos enfoques. Estas dos
aproximaciones tienen como punto en común la transformación de las aplicaciones de
Android en imágenes de tipo PNG. Después de este proceso de transformación a
imágenes, la primera aproximación se basó en restar cada imagen perteneciente al
grupo de pruebas con las imágenes del grupo de entrenamiento, de esta forma se pudo
saber la familia a la que pertenecÃa cada malware desconocido o distinguir entre
aplicaciones goodware y malware. Sin embargo, a pesar de que la precisión de acierto
era más alta que la definida en los requisitos, este enfoque era una tarea que consumÃa
mucho tiempo, asà que consideramos otra aproximación para reducir el tiempo y
conseguir una precisión parecida o mejor que la anterior. Este segundo enfoque fue
extraer las caracterÃsticas de las imágenes para después usar un clasificador y asÃ
obtener una diferenciación precisa. Con esta segunda aproximación, conseguimos un
tiempo total menor a las 4 horas para 100000 muestras con una precisión del 83.04%,
cumpliendo y superando de esta forma los requisitos que habÃan sido especificados.
Este análisis se ha llevado a cabo usando dos sets de datos heterogéneos. Uno de ellos
fue el perteneciente a un proyecto llamado Malgenome, éste contiene 49 tipos de
familias de malware en Android. El set de datos de Malgenome se usó para realizar los
diferentes ensayos o pruebas y sobre el que se realizaron las medidas de tiempo y
precisión. El set de datos de M0droid se usó para corroborar el análisis previo y asÃ
establecer una clasificación final.IngenierÃa Informátic
Computing Adaptive Feature Weights with PSO to Improve Android Malware Detection
© 2017 Yanping Xu et al. Android malware detection is a complex and crucial issue. In this paper, we propose a malware detection model using a support vector machine (SVM) method based on feature weights that are computed by information gain (IG) and particle swarm optimization (PSO) algorithms. The IG weights are evaluated based on the relevance between features and class labels, and the PSO weights are adaptively calculated to result in the best fitness (the performance of the SVM classification model). Moreover, to overcome the defects of basic PSO, we propose a new adaptive inertia weight method called fitness-based and chaotic adaptive inertia weight-PSO (FCAIW-PSO) that improves on basic PSO and is based on the fitness and a chaotic term. The goal is to assign suitable weights to the features to ensure the best Android malware detection performance. The results of experiments indicate that the IG weights and PSO weights both improve the performance of SVM and that the performance of the PSO weights is better than that of the IG weights
Novel Attacks and Defenses in the Userland of Android
In the last decade, mobile devices have spread rapidly, becoming more and more part of our everyday lives; this is due to their feature-richness, mobility, and affordable price. At the time of writing, Android is the leader of the market among operating systems, with a share of 76% and two and a half billion active Android devices around the world. Given that such small devices contain a massive amount of our private and sensitive information, the economic interests in the mobile ecosystem skyrocketed. For this reason, not only legitimate apps running on mobile environments have increased dramatically, but also malicious apps have also been on a steady rise. On the one hand, developers of mobile operating systems learned from security mistakes of the past, and they made significant strides in blocking those threats right from the start. On the other hand, these high-security levels did not deter attackers. In this thesis, I present my research contribution about the most meaningful attack and defense scenarios in the userland of the modern Android operating system. I have emphasized "userland'' because attack and defense solutions presented in this thesis are executing in the userspace of the operating system, due to the fact that Android is slightly different from traditional operating systems. After the necessary technical background, I show my solution, RmPerm, in order to enable Android users to better protect their privacy by selectively removing permissions from any app on any Android version. This operation does not require any modification to the underlying operating system because we repack the original application. Then, using again repackaging, I have developed Obfuscapk; it is a black-box obfuscation tool that can work with every Android app and offers a free solution with advanced state of the art obfuscation techniques -- especially the ones used by malware authors. Subsequently, I present a machine learning-based technique that focuses on the identification of malware in resource-constrained devices such as Android smartphones. This technique has a very low resource footprint and does not rely on resources outside the protected device. Afterward, I show how it is possible to mount a phishing attack -- the historically preferred attack vector -- by exploiting two recent Android features, initially introduced in the name of convenience. Although a technical solution to this problem certainly exists, it is not solvable from a single entity, and there is the need for a push from the entire community. But sometimes, even though there exists a solution to a well-known vulnerability, developers do not take proper precautions. In the end, I discuss the Frame Confusion vulnerability; it is often present in hybrid apps, and it was discovered some years ago, but I show how it is still widespread. I proposed a methodology, implemented in the FCDroid tool, for systematically detecting the Frame Confusion vulnerability in hybrid Android apps. The results of an extensive analysis carried out through FCDroid on a set of the most downloaded apps from the Google Play Store prove that 6.63% (i.e., 1637/24675) of hybrid apps are potentially vulnerable to Frame Confusion. The impact of such results on the Android users' community is estimated in 250.000.000 installations of vulnerable apps
pDroid
When an end user attempts to download an app on the Google Play Store they receive two related items that can be used to assess the potential threats of an application, the list of permissions used by the application and the textual description of the application. However, this raises several concerns. First, applications tend to use more permissions than they need and end users are not tech-savvy enough to fully understand the security risks. Therefore, it is challenging to assess the threats of an application fully by only seeing the permissions. On the other hand, most textual descriptions do not clearly define why they need a particular permission. These two issues conjoined make it difficult for end users to accurately assess the security threats of an application. This has lead to a demand for a framework that can accurately determine if a textual description adequately describes the actual behavior of an application. In this Master Thesis, we present pDroid (short for privateDroid), a market-independent framework that can compare an Android application’s textual description to its internal behavior. We evaluated pDroid using 1562 benign apps and 243 malware samples, and pDroid correctly classified 91.4% of malware with a false positive rate of 4.9%
- …