Search CORE

46,165 research outputs found

A Multi-view Context-aware Approach to Android Malware Detection and Malicious Code Localization

Author: Chandramohan Mahinthan
Chen Lihui
Liu Yang
Narayanan Annamalai
Publication venue
Publication date: 01/01/2017
Field of study

Existing Android malware detection approaches use a variety of features such as security sensitive APIs, system calls, control-flow structures and information flows in conjunction with Machine Learning classifiers to achieve accurate detection. Each of these feature sets provides a unique semantic perspective (or view) of apps' behaviours with inherent strengths and limitations. Meaning, some views are more amenable to detect certain attacks but may not be suitable to characterise several other attacks. Most of the existing malware detection approaches use only one (or a selected few) of the aforementioned feature sets which prevent them from detecting a vast majority of attacks. Addressing this limitation, we propose MKLDroid, a unified framework that systematically integrates multiple views of apps for performing comprehensive malware detection and malicious code localisation. The rationale is that, while a malware app can disguise itself in some views, disguising in every view while maintaining malicious intent will be much harder. MKLDroid uses a graph kernel to capture structural and contextual information from apps' dependency graphs and identify malice code patterns in each view. Subsequently, it employs Multiple Kernel Learning (MKL) to find a weighted combination of the views which yields the best detection accuracy. Besides multi-view learning, MKLDroid's unique and salient trait is its ability to locate fine-grained malice code portions in dependency graphs (e.g., methods/classes). Through our large-scale experiments on several datasets (incl. wild apps), we demonstrate that MKLDroid outperforms three state-of-the-art techniques consistently, in terms of accuracy while maintaining comparable efficiency. In our malicious code localisation experiments on a dataset of repackaged malware, MKLDroid was able to identify all the malice classes with 94% average recall

arXiv.org e-Print Archive

BCFA: Bespoke Control Flow Analysis for CFA at Scale

Author: Benjamin Livshits V.
Bourdoncle François
Cobleigh Jamieson M.
Dyer Robert
Dyer Robert
Lam Patrick
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/05/2020
Field of study

Many data-driven software engineering tasks such as discovering programming patterns, mining API specifications, etc., perform source code analysis over control flow graphs (CFGs) at scale. Analyzing millions of CFGs can be expensive and performance of the analysis heavily depends on the underlying CFG traversal strategy. State-of-the-art analysis frameworks use a fixed traversal strategy. We argue that a single traversal strategy does not fit all kinds of analyses and CFGs and propose bespoke control flow analysis (BCFA). Given a control flow analysis (CFA) and a large number of CFGs, BCFA selects the most efficient traversal strategy for each CFG. BCFA extracts a set of properties of the CFA by analyzing the code of the CFA and combines it with properties of the CFG, such as branching factor and cyclicity, for selecting the optimal traversal strategy. We have implemented BCFA in Boa, and evaluated BCFA using a set of representative static analyses that mainly involve traversing CFGs and two large datasets containing 287 thousand and 162 million CFGs. Our results show that BCFA can speedup the large scale analyses by 1%-28%. Further, BCFA has low overheads; less than 0.2%, and low misprediction rate; less than 0.01%.Comment: 12 page

arXiv.org e-Print Archive

An Empirical Study on Android-related Vulnerabilities

Author: Bavota Gabriele
Escobar-Velasquez Camilo
Linares-Vasquez Mario
Publication venue
Publication date: 11/04/2017
Field of study

Mobile devices are used more and more in everyday life. They are our cameras, wallets, and keys. Basically, they embed most of our private information in our pocket. For this and other reasons, mobile devices, and in particular the software that runs on them, are considered first-class citizens in the software-vulnerabilities landscape. Several studies investigated the software-vulnerabilities phenomenon in the context of mobile apps and, more in general, mobile devices. Most of these studies focused on vulnerabilities that could affect mobile apps, while just few investigated vulnerabilities affecting the underlying platform on which mobile apps run: the Operating System (OS). Also, these studies have been run on a very limited set of vulnerabilities. In this paper we present the largest study at date investigating Android-related vulnerabilities, with a specific focus on the ones affecting the Android OS. In particular, we (i) define a detailed taxonomy of the types of Android-related vulnerability; (ii) investigate the layers and subsystems from the Android OS affected by vulnerabilities; and (iii) study the survivability of vulnerabilities (i.e., the number of days between the vulnerability introduction and its fixing). Our findings could help OS and apps developers in focusing their verification & validation activities, and researchers in building vulnerability detection tools tailored for the mobile world

arXiv.org e-Print Archive

apk2vec: Semi-supervised multi-view representation learning for profiling Android applications

Author: Chen Lihui
Liu Yang
Narayanan Annamalai
Soh Charlie
Wang Lipo
Publication venue
Publication date: 01/01/2018
Field of study

Building behavior profiles of Android applications (apps) with holistic, rich and multi-view information (e.g., incorporating several semantic views of an app such as API sequences, system calls, etc.) would help catering downstream analytics tasks such as app categorization, recommendation and malware analysis significantly better. Towards this goal, we design a semi-supervised Representation Learning (RL) framework named apk2vec to automatically generate a compact representation (aka profile/embedding) for a given app. More specifically, apk2vec has the three following unique characteristics which make it an excellent choice for largescale app profiling: (1) it encompasses information from multiple semantic views such as API sequences, permissions, etc., (2) being a semi-supervised embedding technique, it can make use of labels associated with apps (e.g., malware family or app category labels) to build high quality app profiles, and (3) it combines RL and feature hashing which allows it to efficiently build profiles of apps that stream over time (i.e., online learning). The resulting semi-supervised multi-view hash embeddings of apps could then be used for a wide variety of downstream tasks such as the ones mentioned above. Our extensive evaluations with more than 42,000 apps demonstrate that apk2vec's app profiles could significantly outperform state-of-the-art techniques in four app analytics tasks namely, malware detection, familial clustering, app clone detection and app recommendation.Comment: International Conference on Data Mining, 201

arXiv.org e-Print Archive

A Security Monitoring Framework For Virtualization Based HEP Infrastructures

Author: Betev L.
Grigoras C.
Kebschull U.
Lara C.
Pedreira M. Martinez
Ramirez A. Gomez
Publication venue: 'IOP Publishing'
Publication date: 16/04/2017
Field of study

High Energy Physics (HEP) distributed computing infrastructures require automatic tools to monitor, analyze and react to potential security incidents. These tools should collect and inspect data such as resource consumption, logs and sequence of system calls for detecting anomalies that indicate the presence of a malicious agent. They should also be able to perform automated reactions to attacks without administrator intervention. We describe a novel framework that accomplishes these requirements, with a proof of concept implementation for the ALICE experiment at CERN. We show how we achieve a fully virtualized environment that improves the security by isolating services and Jobs without a significant performance impact. We also describe a collected dataset for Machine Learning based Intrusion Prevention and Detection Systems on Grid computing. This dataset is composed of resource consumption measurements (such as CPU, RAM and network traffic), logfiles from operating system services, and system call data collected from production Jobs running in an ALICE Grid test site and a big set of malware. This malware was collected from security research sites. Based on this dataset, we will proceed to develop Machine Learning algorithms able to detect malicious Jobs.Comment: Proceedings of the 22nd International Conference on Computing in High Energy and Nuclear Physics, CHEP 2016, 10-14 October 2016, San Francisco. Submitted to Journal of Physics: Conference Series (JPCS

arXiv.org e-Print Archive