Search CORE

247 research outputs found

Exploring the Usage of Topic Modeling for Android Malware Static Analysis

Author: Medvet Eric
Mercaldo Francesco
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

The rapid growth in smartphone and tablet usage over the last years has led to the inevitable rise in targeting of these devices by cyber-criminals. The exponential growth of Android devices, and the buoyant and largely unregulated Android app market, produced a sharp rise in malware targeting that platform. Furthermore, malware writers have been developing detection-evasion techniques which rapidly make anti-malware technologies ineffective. It is hence advisable that security expert are provided with tools which can aid them in the analysis of existing and new Android malware. In this paper, we explore the use of topic modeling as a technique which can assist experts to analyse malware applications in order to discover their characteristic. We apply Latend Dirichlet Allocation (LDA) to mobile applications represented as opcode sequences, hence considering a topic as a discrete distribution of opcode. Our experiments on a dataset of 900 malware applications of different families show that the information provided by topic modeling may help in better understanding malware characteristics and similarities

Archivio istituzionale della ricerca - Università di Trieste

Università degli Studi del Molise: IRIS

Crossref

Machine Learning Aided Static Malware Analysis: A Survey and Tutorial

Author: Andrii Shalaginov
D Krishna Sandeep Reddy
Farid Daryabar
Igor Santos
Reinaldo Jose Mangialardo
Smita Naval
Steve Watson
Teuvo Kohonen
Yanfang Ye
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/08/2018
Field of study

Malware analysis and detection techniques have been evolving during the last decade as a reflection to development of different malware techniques to evade network-based and host-based security protections. The fast growth in variety and number of malware species made it very difficult for forensics investigators to provide an on time response. Therefore, Machine Learning (ML) aided malware analysis became a necessity to automate different aspects of static and dynamic malware investigation. We believe that machine learning aided static analysis can be used as a methodological approach in technical Cyber Threats Intelligence (CTI) rather than resource-consuming dynamic malware analysis that has been thoroughly studied before. In this paper, we address this research gap by conducting an in-depth survey of different machine learning methods for classification of static characteristics of 32-bit malicious Portable Executable (PE32) Windows files and develop taxonomy for better understanding of these techniques. Afterwards, we offer a tutorial on how different machine learning techniques can be utilized in extraction and analysis of a variety of static characteristic of PE binaries and evaluate accuracy and practical generalization of these techniques. Finally, the results of experimental study of all the method using common data was given to demonstrate the accuracy and complexity. This paper may serve as a stepping stone for future researchers in cross-disciplinary field of machine learning aided malware forensics.Comment: 37 Page

arXiv.org e-Print Archive

Crossref

N-opcode Analysis for Android Malware Classification and Categorization

Author: Kang B.
McLaughlin K.
Sezer Sakir
Yerima Suleiman
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link.Malware detection is a growing problem particularly on the Android mobile platform due to its increasing popularity and accessibility to numerous third party app markets. This has also been made worse by the increasingly sophisticated detection avoidance techniques employed by emerging malware families. This calls for more effective techniques for detection and classification of Android malware. Hence, in this paper we present an n-opcode analysis based approach that utilizes machine learning to classify and categorize Android malware. This approach enables automated feature discovery that eliminates the need for applying expert or domain knowledge to define the needed features. Our experiments on 2520 samples that were performed using up to 10-gram opcode features showed that an f-measure of 98% is achievable using this approach

arXiv.org e-Print Archive

Queen's University Belfast Research Portal

Crossref

De Montfort University Open Research Archive

Effectiveness of Opcode ngrams for Detection of Multi Family Android Malware

Author: Canfora Gerardo
De Lorenzo Andrea
Medvet Eric
Mercaldo Francesco
Visaggio Corrado Aaron
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015
Field of study

With the wide diffusion of smartphones and their usage in a plethora of processes and activities, these devices have been handling an increasing variety of sensitive resources. Attackers are hence producing a large number of malware applications for Android (the most spread mobile platform), often by slightly modifying existing applications, which results in malware being organized in families. Some works in the literature showed that opcodes are informative for detecting malware, not only in the Android platform. In this paper, we investigate if frequencies of ngrams of opcodes are effective in detecting Android malware and if there is some significant malware family for which they are more or less effective. To this end, we designed a method based on state-of-the-art classifiers applied to frequencies of opcodes ngrams. Then, we experimentally evaluated it on a recent dataset composed of 11120 applications, 5560 of which are malware belonging to several different families. Results show that an accuracy of 97% can be obtained on the average, whereas perfect detection rate is achieved for more than one malware family

Archivio istituzionale della ricerca - Università di Trieste

Crossref

Op2Vec: An Opcode Embedding Technique and Dataset Design for End-to-End Detection of Android Malware

Author: Ali Sikandar
Ghani Anwar
Khan Kaleem Nawaz
Khan Muhammad Salman
Nauman Mohammad
Ullah Najeeb
Publication venue: 'Hindawi Limited'
Publication date: 01/03/2022
Field of study

Android is one of the leading operating systems for smart phones in terms of market share and usage. Unfortunately, it is also an appealing target for attackers to compromise its security through malicious applications. To tackle this issue, domain experts and researchers are trying different techniques to stop such attacks. All the attempts of securing Android platform are somewhat successful. However, existing detection techniques have severe shortcomings, including the cumbersome process of feature engineering. Designing representative features require expert domain knowledge. There is a need for minimizing human experts' intervention by circumventing handcrafted feature engineering. Deep learning could be exploited by extracting deep features automatically. Previous work has shown that operational codes (opcodes) of executables provide key information to be used with deep learning models for detection process of malicious applications. The only challenge is to feed opcodes information to deep learning models. Existing techniques use one-hot encoding to tackle the challenge. However, the one-hot encoding scheme has severe limitations. In this paper, we introduce; (1) a novel technique for opcodes embedding, which we name Op2Vec, (2) based on the learned Op2Vec we have developed a dataset for end-to-end detection of android malware. Introducing the end-to-end Android malware detection technique avoids expert-intensive handcrafted features extraction, and ensures automation. Some of the recent deep learning-based techniques showed significantly improved results when tested with the proposed approach and achieved an average detection accuracy of 97.47%, precision of 0.976 and F1 score of 0.979

arXiv.org e-Print Archive

Malware Detection Approaches based on Operational Codes (OpCodes) of Executable Programs: A Review

Author: Saleh Mohammed A.
Publication venue: IAES Indonesia Section
Publication date: 30/06/2023
Field of study

A malicious software, or Malware for a short, poses a threat to computer systems, which need to be analyzed, detected, and eliminated. Generally, malware is analyzed in two ways: dynamic malware analysis and static malware analysis. The former collects features dataset during running of the malware, and involves malware APIs, registry activities, file activities, process activities, and network activities based features. The latter collects features dataset prior and without running the malware, and involves Operational Codes (OpCodes) and text based (Bytecodes) features. However, several previous researchers addressed and reviewed malware detection approaches based on various aspects, but none of them addressed and reviewed the approaches merely based on malware OpCodes. Therefore, this paper aims to review Malware Detection Approaches based on OpCodes. The review explores, demonstrates, and compares the existing approaches for detecting malware according to their OpCodes only, and finally presents a comprehensive comparable envisage about them

Indonesian Journal of Electrical Engineering and Informatics (IJEEI)

Recommended from our members

Robust behavioral malware detection

Author: Kazdagli Mikhail
Publication venue
Publication date: 13/08/2018
Field of study

Computer security attacks evolve to evade deployed defenses. Recent attacks have ranged from exploiting generic software vulnerabilities in memory-unsafe languages such as buffer overflows and format string vulnerabilities to exploiting logic errors in web applications, through means such as SQL injection and cross-site scripting. Furthermore, recent attacks have focused on escalating privileges and stealing sensitive information by exploiting new hardware or operating system (OS) interfaces. Computer security attacks are also now relying on social engineering techniques to run malicious programs on victims' machines; instances of such abuse include phishing and watering hole attacks, both of which trick people into running malicious code or divulging confidential information. Thus, traditional computer security methods, such as OS confinement and program analysis, will not prevent new attacks that do not violate OS confinement or present illegal program behaviors. Another challenge is that traditional security approaches have large trusted code bases (TCBs), which include hardware, OSs, and other software components that implement authentication and authorization logic across a distributed system. This is a vulnerable area because these components are complex and often contain vulnerabilities that undermine the overall system's integrity or confidentiality. Evasive attacks on vulnerable systems -- especially in instances where trusted components turn malicious -- inspire the creation of defenses that can augment formally specified mechanisms against known threats. Specifically, this thesis advances the state of the art in behavioral malware detection -- detecting previously unknown malware in the very early stages of infection within an enterprise network. Here we assess three fundamental insights of modern-day attacks and then describe a cross-layer defense against such attacks. First, we make a low-level machine state visible to behavioral analysis, significantly minimizing the TCB and its associated vulnerabilities. Specifically, our behavioral detector utilizes an executable code's dynamic properties, with architectural and micro-architectural states as input. Second, we evaluate behavioral detectors against adaptive adversaries. For this purpose, we introduce a new metric to determine a detector's robustness against malware modifications, which serves as a step toward explainability of machine learning-based malware detectors. Finally, we exploit the fact that attacks spread through only a limited number of vectors and propose new techniques to analyze the resulting dynamic correlations created among machines. These insights show that behavioral detectors can efficiently protect both individual devices and end hosts within enterprise networks. We present three types of such behavioral detectors. Sherlock protects resource-constrained devices, such as mobile phones and Internet-of-things (IoT) devices, without modifying the software/hardware stack. Sherlock's supervised and unsupervised versions outperform prior work by 24.7% and 12.5% (area under the curve (AUC) metric), respectively, and detects stealthy malware that often evades static analysis tools. The second behavioral detector, Shape-GD, protects devices within an enterprise network. It monitors devices on the network, aggregates data from weak local detectors, overlays that with network-level information, and then makes early, robust predictions regarding malicious activity. Shape-GD achieves its goals by exploiting latent attack semantics. Specifically, it analyzes communication patterns across multiple devices, partitioning them into neighborhoods. Devices within the same neighborhood are likely to be exposed to the same attack vector. Furthermore, we hypothesize that the conditional distribution of false positives is different from that of true positives; i.e., given a neighborhood of nodes, we can compute the aggregate distributional shape of alert feature vectors from the neighborhood itself and provide robust labels. We evaluate Shape-GD by emulating a large community of Windows systems using the system call traces from a few thousand malicious and benign applications; we simulate both a phishing attack in a corporate email network as well as a watering hole attack through a popular website. In both scenarios, Shape-GD identifies malware early on (~100 infected nodes in a ~100K-node system for watering hole attacks, and ~10 of ~1,000 for phishing attacks) and robustly (with ~100% global true-positive and ~1% global false-positive rates). The third behavioral detector, Centurion, detects malware across machines monitored by an anti-virus company. It is able to analyze behavior from 5 million Symantec client machines in real time and discovers malware by correlating file downloads across multiple machines. Compared with a recent local detector that analyzes metadata from file downloads, Centurion reduced the number of false positives from ~1M to ~110K and increased the true-positive rate by a factor of ~2.5. In addition, on average, Centurion detects malware 345 days earlier than commercial anti-virus products.Electrical and Computer Engineerin

Texas ScholarWorks

The Dark Side(-Channel) of Mobile Devices: A Survey on Network Traffic Analysis

Author: Conti Mauro
Li QianQian
Maragno Alberto
Spolaor Riccardo
Publication venue
Publication date: 01/01/2018
Field of study

In recent years, mobile devices (e.g., smartphones and tablets) have met an increasing commercial success and have become a fundamental element of the everyday life for billions of people all around the world. Mobile devices are used not only for traditional communication activities (e.g., voice calls and messages) but also for more advanced tasks made possible by an enormous amount of multi-purpose applications (e.g., finance, gaming, and shopping). As a result, those devices generate a significant network traffic (a consistent part of the overall Internet traffic). For this reason, the research community has been investigating security and privacy issues that are related to the network traffic generated by mobile devices, which could be analyzed to obtain information useful for a variety of goals (ranging from device security and network optimization, to fine-grained user profiling). In this paper, we review the works that contributed to the state of the art of network traffic analysis targeting mobile devices. In particular, we present a systematic classification of the works in the literature according to three criteria: (i) the goal of the analysis; (ii) the point where the network traffic is captured; and (iii) the targeted mobile platforms. In this survey, we consider points of capturing such as Wi-Fi Access Points, software simulation, and inside real mobile devices or emulators. For the surveyed works, we review and compare analysis techniques, validation methods, and achieved results. We also discuss possible countermeasures, challenges and possible directions for future research on mobile traffic analysis and other emerging domains (e.g., Internet of Things). We believe our survey will be a reference work for researchers and practitioners in this research field.Comment: 55 page

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Padova