Search CORE

373 research outputs found

A Multi-view Context-aware Approach to Android Malware Detection and Malicious Code Localization

Author: Chandramohan Mahinthan
Chen Lihui
Liu Yang
Narayanan Annamalai
Publication venue
Publication date: 01/01/2017
Field of study

Existing Android malware detection approaches use a variety of features such as security sensitive APIs, system calls, control-flow structures and information flows in conjunction with Machine Learning classifiers to achieve accurate detection. Each of these feature sets provides a unique semantic perspective (or view) of apps' behaviours with inherent strengths and limitations. Meaning, some views are more amenable to detect certain attacks but may not be suitable to characterise several other attacks. Most of the existing malware detection approaches use only one (or a selected few) of the aforementioned feature sets which prevent them from detecting a vast majority of attacks. Addressing this limitation, we propose MKLDroid, a unified framework that systematically integrates multiple views of apps for performing comprehensive malware detection and malicious code localisation. The rationale is that, while a malware app can disguise itself in some views, disguising in every view while maintaining malicious intent will be much harder. MKLDroid uses a graph kernel to capture structural and contextual information from apps' dependency graphs and identify malice code patterns in each view. Subsequently, it employs Multiple Kernel Learning (MKL) to find a weighted combination of the views which yields the best detection accuracy. Besides multi-view learning, MKLDroid's unique and salient trait is its ability to locate fine-grained malice code portions in dependency graphs (e.g., methods/classes). Through our large-scale experiments on several datasets (incl. wild apps), we demonstrate that MKLDroid outperforms three state-of-the-art techniques consistently, in terms of accuracy while maintaining comparable efficiency. In our malicious code localisation experiments on a dataset of repackaged malware, MKLDroid was able to identify all the malice classes with 94% average recall

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

A Pre-Trained BERT Model for Android Applications

Author: Allix Kevin
Bissyandé Tegawendé F.
Kim Dongsun
Kim Kisub
Klein Jacques
Lo David
Sun Tiezhu
Zhou Xin
Publication venue
Publication date: 12/12/2022
Field of study

The automation of an increasingly large number of software engineering tasks is becoming possible thanks to Machine Learning (ML). One foundational building block in the application of ML to software artifacts is the representation of these artifacts (e.g., source code or executable code) into a form that is suitable for learning. Many studies have leveraged representation learning, delegating to ML itself the job of automatically devising suitable representations. Yet, in the context of Android problems, existing models are either limited to coarse-grained whole-app level (e.g., apk2vec) or conducted for one specific downstream task (e.g., smali2vec). Our work is part of a new line of research that investigates effective, task-agnostic, and fine-grained universal representations of bytecode to mitigate both of these two limitations. Such representations aim to capture information relevant to various low-level downstream tasks (e.g., at the class-level). We are inspired by the field of Natural Language Processing, where the problem of universal representation was addressed by building Universal Language Models, such as BERT, whose goal is to capture abstract semantic information about sentences, in a way that is reusable for a variety of tasks. We propose DexBERT, a BERT-like Language Model dedicated to representing chunks of DEX bytecode, the main binary format used in Android applications. We empirically assess whether DexBERT is able to model the DEX language and evaluate the suitability of our model in two distinct class-level software engineering tasks: Malicious Code Localization and Defect Prediction. We also experiment with strategies to deal with the problem of catering to apps having vastly different sizes, and we demonstrate one example of using our technique to investigate what information is relevant to a given task

arXiv.org e-Print Archive

apk2vec: Semi-supervised multi-view representation learning for profiling Android applications

Author: Chen Lihui
Liu Yang
Narayanan Annamalai
Soh Charlie
Wang Lipo
Publication venue
Publication date: 01/01/2018
Field of study

Building behavior profiles of Android applications (apps) with holistic, rich and multi-view information (e.g., incorporating several semantic views of an app such as API sequences, system calls, etc.) would help catering downstream analytics tasks such as app categorization, recommendation and malware analysis significantly better. Towards this goal, we design a semi-supervised Representation Learning (RL) framework named apk2vec to automatically generate a compact representation (aka profile/embedding) for a given app. More specifically, apk2vec has the three following unique characteristics which make it an excellent choice for largescale app profiling: (1) it encompasses information from multiple semantic views such as API sequences, permissions, etc., (2) being a semi-supervised embedding technique, it can make use of labels associated with apps (e.g., malware family or app category labels) to build high quality app profiles, and (3) it combines RL and feature hashing which allows it to efficiently build profiles of apps that stream over time (i.e., online learning). The resulting semi-supervised multi-view hash embeddings of apps could then be used for a wide variety of downstream tasks such as the ones mentioned above. Our extensive evaluations with more than 42,000 apps demonstrate that apk2vec's app profiles could significantly outperform state-of-the-art techniques in four app analytics tasks namely, malware detection, familial clustering, app clone detection and app recommendation.Comment: International Conference on Data Mining, 201

arXiv.org e-Print Archive

Crossref

DR-NTU (Digital Repository of NTU)

DexBERT: Effective, Task-Agnostic and Fine-Grained Representation Learning of Android Bytecode

Author: ALLIX Kevin
Bissyande Tegawende F.
KIM Dongsun
KIM Kisub
KLEIN Jacques
Lo David
SUN Tiezhu
Zhou Xin
Publication venue: Institute of Electrical and Electronics Engineers Inc.
Publication date: 01/09/2023
Field of study

peer reviewedThe automation of an increasingly large number of software engineering tasks is becoming possible thanks to Machine Learning (ML). One foundational building block in the application of ML to software artifacts is the representation of these artifacts (e.g., source code or executable code) into a form that is suitable for learning. Traditionally, researchers and practitioners have relied on manually selected features, based on expert knowledge, for the task at hand. Such knowledge is sometimes imprecise and generally incomplete. To overcome this limitation, many studies have leveraged representation learning, delegating to ML itself the job of automatically devising suitable representations and selections of the most relevant features. Yet, in the context of Android problems, existing models are either limited to coarse-grained whole-app level (e.g., apk2vec) or conducted for one specific downstream task (e.g., smali2vec). Thus, the produced representation may turn out to be unsuitable for fine-grained tasks or cannot generalize beyond the task that they have been trained on. Our work is part of a new line of research that investigates effective, task-agnostic, and fine-grained universal representations of bytecode to mitigate both of these two limitations. Such representations aim to capture information relevant to various low-level downstream tasks (e.g., at the class-level). We are inspired by the field of Natural Language Processing, where the problem of universal representation was addressed by building Universal Language Models, such as BERT, whose goal is to capture abstract semantic information about sentences, in a way that is reusable for a variety of tasks. We propose DexBERT, a BERT-like Language Model dedicated to representing chunks of DEX bytecode, the main binary format used in Android applications. We empirically assess whether DexBERT is able to model the DEX language and evaluate the suitability of our model in three distinct class-level software engineering tasks: Malicious Code Localization, Defect Prediction, and Component Type Classification. We also experiment with strategies to deal with the problem of catering to apps having vastly different sizes, and we demonstrate one example of using our technique to investigate what information is relevant to a given task

Institutional Knowledge at Singapore Management University

Open Repository and Bibliography - Luxembourg

Visualizing the outcome of dynamic analysis of Android malware with VizMal

Author: De Lorenzo Andrea
Martinelli Fabio
Medvet Eric
Mercaldo Francesco
Santone Antonella
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020
Field of study

Malware detection techniques based on signature extraction require security analysts to manually inspect samples to find evidences of malicious behavior. This time-consuming task received little attention by researchers and practitioners, as most of the effort is on the identification as malware or non-malware of an entire sample. There are no tools for supporting the analyst in identifying when the malicious behavior occurs, given a sample. In this paper we propose VizMal, a tool able to visualize the execution traces of Android applications and to highlight which portions of the traces correspond to a potentially malicious behavior. The aim of VizMal is twofold: assisting the malware analyst during the inspection of an application and pushing the research community to organize and focus its effort on the malicious behavior localization. VizMal is able to discern if the application behavior during each second of execution are legitimate or malicious and to show this information in a simple and understandable way. We validate VizMal experimentally and by means of a user study: the results are promising and confirm that the tool can be useful

Archivio istituzionale della ricerca - Università di Trieste

Malicious code detection in android : the role of sequence characteristics and disassembling methods

Author: Acartürk Cengiz
Balikcioglu Pinar G.
Kucuk Ozge A.
Sirlanci Melih
Turkmen Ramazan K.
Ulukapi Bulut
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2023
Field of study

The acceptance and widespread use of the Android operating system drew the attention of both legitimate developers and malware authors, which resulted in a significant number of benign and malicious applications available on various online markets. Since the signature-based methods fall short for detecting malicious software effectively considering the vast number of applications, machine learning techniques in this field have also become widespread. In this context, stating the acquired accuracy values in the contingency tables in malware detection studies has become a popular and efficient method and enabled researchers to evaluate their methodologies comparatively. In this study, we wanted to investigate and emphasize the factors that may affect the accuracy values of the models managed by researchers, particularly the disassembly method and the input data characteristics. Firstly, we developed a model that tackles the malware detection problem from a Natural Language Processing (NLP) perspective using Long Short-Term Memory (LSTM). Then, we experimented with different base units (instruction, basic block, method, and class) and representations of source code obtained from three commonly used disassembling tools (JEB, IDA, and Apktool) and examined the results. Our findings exhibit that the disassembly method and different input representations affect the model results. More specifically, the datasets collected by the Apktool achieved better results compared to the other two disassemblers

Jagiellonian Univeristy Repository

Defending Your Mobile Fortress: An In-Depth Look at on-Device Trojan Detection in Machine Learning: Systematic Literature Review

Author: Ezrafel Amadeuz
Indahsari Ayu Nur
Monica Sella
Novelianto Koo Tito
Poerwati Alinda Endang
Roestam Rusdianto
Saragih Yuliarman
Setiyani Lila
Publication venue: Postgraduate, University of Mataram
Publication date: 25/07/2023
Field of study

Mobile app trojans are becoming an increasingly serious threat to personal information security. They can cause severe damage by exposing sensitive and personally-identifying information to malicious actors. This paper’s contribution is a comprehensive review of the attack vectors for trojan attacks, and ways to eliminate the risks posed by attack vectors and generate settlement automatically. As such, such attacks must be prevented. In this study, we explore to find how to detect the trojan attack in detail, and the way that we know in machine learning. A review is conducted on the state-of-the-art methods using the preferred reporting items for reviews and meta-analyses (PRISMA) guidelines. We review literature from several publications and analyze the use of machine learning for on-device trojan detection. This review provides evidence for the effectiveness of machine learning in detecting such threats. The current trend shows that signature-based analysis using various metadata, such as permission, intent, API and system calls, and network analysis, are capable of detecting trojan attacks before and after the initial infectio

Jurnal Penelitian Pendidikan IPA (Science Education Graduate Program University of Mataram)

Understanding Android App Piggybacking:A Systematic Study of Malicious Code Grafting

Author: Bissyande Tegawendé François D Assise
Cavallaro Lorenzo
Klein Jacques
Le Traon Yves
Li Daoyuan
Li Li
Lo David
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

The Android packaging model offers ample opportunities for malware writers to piggyback malicious code in popular apps, which can then be easily spread to a large user base. Although recent research has produced approaches and tools to identify piggybacked apps, the literature lacks a comprehensive investigation into such phenomenon. We fill this gap by 1) systematically building a large set of piggybacked and benign apps pairs, which we release to the community, 2) empirically studying the characteristics of malicious piggybacked apps in comparison with their benign counterparts, and 3) providing insights on piggybacking processes. Among several findings providing insights, analysis techniques should build upon to improve the overall detection and classification accuracy of piggybacked apps, we show that piggybacking operations not only concern app code but also extensively manipulates app resource files, largely contradicting common beliefs. We also find that piggybacking is done with little sophistication, in many cases automatically, and often via library code

Royal Holloway - Pure

Institutional Knowledge at Singapore Management University

King's Research Portal

Open Repository and Bibliography - Luxembourg