
    Malicious Software Detection and Classification utilizing Temporal-Graphs of System-call Group Relations

    In this work we propose a graph-based model that, utilizing relations between groups of System-calls, distinguishes malicious from benign software samples and classifies the detected malicious samples into one of a set of known malware families. More precisely, given a System-call Dependency Graph (ScDG) that depicts the malware's behavior, we first transform it into a more abstract representation by mapping System-calls to a set of groups of similar functionality, thus constructing an abstract and mutation-tolerant graph that we call the Group Relation Graph (GrG); we then construct another graph representation, which we call the Coverage Graph (CvG), that depicts the dominating relations between the nodes of a GrG graph. Our review of prior research in the field showed that behavior-based graph representations had not leveraged the temporal evolution of the graph. Hence, the novelty of our work is that, while preserving the initial GrG and CvG representations, we augment these graphs with further features that enhance their ability to detect an unknown malware sample and to classify it into a known malware family. To that end, we construct periodic instances of the graph that capture its temporal evolution in terms of structural modifications, yielding another graph representation that we call Temporal Graphs. In this paper, we present the theoretical background behind our approach, discuss the current state of the art in malware detection and classification, and present the overall architecture of our proposed detection and classification model along with its main underlying principles and key structural components. Comment: 23 pages, 15 figures, 1 table
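    The ScDG-to-GrG abstraction and the periodic snapshots can be pictured with a minimal sketch. It assumes a hypothetical syscall-to-group table (SYSCALL_GROUP), represents ScDG edges as (source call, target call, timestamp) triples, and treats each temporal instance as the cumulative GrG up to a cutoff; none of these names or choices come from the paper, and the CvG construction is not shown.

```python
from collections import Counter

# Hypothetical mapping from system calls to functional groups (illustrative only).
SYSCALL_GROUP = {
    "open": "file", "read": "file", "write": "file",
    "socket": "network", "connect": "network", "sendto": "network",
    "fork": "process", "execve": "process",
}

def group_relation_graph(scdg_edges):
    """Collapse system-call dependency edges into weighted group-to-group edges."""
    grg = Counter()
    for src, dst, _t in scdg_edges:
        grg[(SYSCALL_GROUP.get(src, "other"), SYSCALL_GROUP.get(dst, "other"))] += 1
    return dict(grg)

def temporal_snapshots(scdg_edges, period):
    """Build one cumulative GrG instance every `period` time units (assumed scheme)."""
    if period <= 0:
        raise ValueError("period must be positive")
    if not scdg_edges:
        return []
    end = max(t for _, _, t in scdg_edges)
    snapshots, cutoff = [], period
    while True:
        snapshots.append(group_relation_graph([e for e in scdg_edges if e[2] <= cutoff]))
        if cutoff >= end:
            break
        cutoff += period
    return snapshots

edges = [("open", "read", 1), ("socket", "connect", 3), ("read", "sendto", 5)]
print(group_relation_graph(edges))
print(len(temporal_snapshots(edges, 2)))
```

    Because several calls map to one group, renaming or swapping calls within a group leaves the GrG unchanged, which is the intuition behind calling it mutation-tolerant.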

    Virtual Machine Introspection Based Malware Behavior Profiling and Family Grouping

    The proliferation of malware has been attributed to alterations of a handful of original malware source codes. Malware variants derived from the same origin share some intrinsic behaviors and form a malware family. Promptly identifying a malware sample's family when it is first seen on the Internet can provide useful clues for mitigating the threat. In this paper, a malware profiler (VMP) is proposed that profiles the execution behavior of a malware sample by leveraging the virtual machine introspection (VMI) technique. VMP inserts plug-ins into the virtual machine monitor (VMM) to record the invoked API calls, with their input parameters and return values, as the profile of the malware. The Jaccard distance, a popular similarity measure, and a phylogenetic tree construction method are adopted to discover malware families. The study of malware profiles shows that samples from the same malware family are very similar to each other and distinct from other malware families as well as from benign software. This paper also evaluates VMP against existing anti-malware detection engines and some well-known malware grouping methods to compare the quality of their malware family constructions. A peer voting approach is proposed, and the results show that VMP is better than almost all of the compared anti-malware engines and comparable to the fine-tuned text-mining approach and high-order N-gram approaches. We also established a malware profiling website based on VMP for malware research. Comment: 13 pages, 9 figures, 5 tables
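    The Jaccard distance over API-call profiles is simple to state: 1 - |A ∩ B| / |A ∪ B|. The sketch below assumes each profile is a set of (API name, parameter, return value) tuples with made-up values, and adds a hypothetical nearest-family lookup; it does not reproduce the phylogenetic tree construction or the peer voting approach.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def distance_matrix(profiles):
    """Pairwise distances between named profiles (input to any tree-building step)."""
    names = sorted(profiles)
    return {(x, y): jaccard_distance(profiles[x], profiles[y]) for x in names for y in names}

def nearest_family(sample, families):
    """Hypothetical helper: family whose closest member profile is nearest to `sample`."""
    return min(families, key=lambda f: min(jaccard_distance(sample, p) for p in families[f]))

# Illustrative profiles: (API, input parameter, return value) tuples.
p1 = {("CreateFileW", "a.txt", "ok"), ("RegSetValueExW", "Run", "ok")}
p2 = {("CreateFileW", "a.txt", "ok"), ("InternetOpenA", "", "ok")}
print(jaccard_distance(p1, p2))  # 1 shared entry out of 3 distinct entries
```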

    AndroVault: Constructing Knowledge Graph from Millions of Android Apps for Automated Analysis

    Data-driven research on Android has gained great momentum in recent years. The abundance of data facilitates knowledge learning but also increases the difficulty of data preprocessing, so preparing an accurate dataset that meets the demands of a given research problem is non-trivial. In this work, we put forward AndroVault, a framework for Android research consisting of data collection, knowledge representation, and knowledge extraction. It starts with a long-running web crawler that has been collecting data (both apps and their descriptions) since 2013, which keeps the data timely. Using static and dynamic analysis of the collected data, we compute a variety of attributes that characterize Android apps. We then employ a knowledge graph to connect all these apps by computing their correlations in terms of attributes. Finally, we leverage multiple techniques, such as logical inference, machine learning, and correlation analysis, to extract facts (more accurate and task-relevant data, whether high-level or not) that are beneficial for a specific research problem. With the resulting high-quality data, we have successfully conducted many research works, including malware detection, code generation, and Android testing. We would like to release our data to the research community in an authenticated manner and encourage researchers to conduct productive research with it.
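    One simple way to connect apps through shared attributes is sketched below. It assumes each app is described by a set of attribute strings (the "perm:" and "lib:" labels are hypothetical) and links two apps when they share at least `min_shared` attributes; AndroVault's actual correlation computation is not described here, so this is only an illustration of the idea.

```python
from itertools import combinations

def build_app_graph(app_attrs, min_shared=2):
    """Link every pair of apps sharing at least `min_shared` attributes; edges keep the shared set."""
    edges = {}
    for a, b in combinations(sorted(app_attrs), 2):
        shared = app_attrs[a] & app_attrs[b]
        if len(shared) >= min_shared:
            edges[(a, b)] = shared
    return edges

def neighbors(edges, app):
    """Apps directly connected to `app` in the graph."""
    return sorted({y if x == app else x for (x, y) in edges if app in (x, y)})

apps = {
    "com.a": {"perm:SEND_SMS", "lib:okhttp", "dev:X"},
    "com.b": {"perm:SEND_SMS", "lib:okhttp"},
    "com.c": {"perm:CAMERA"},
}
graph = build_app_graph(apps)
print(neighbors(graph, "com.a"))
```

    Keeping the shared attributes on each edge lets later queries ask why two apps are related, not just whether they are.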

    Context-aware, Adaptive and Scalable Android Malware Detection through Online Learning (extended version)

    It is well known that Android malware constantly evolves to evade detection. This makes the entire malware population non-stationary. Despite this, most prior work on machine-learning-based Android malware detection has assumed that the distribution of the observed malware characteristics (i.e., features) does not change over time. In this work, we address the problem of malware population drift and propose a novel online-learning-based framework for detecting malware, named CASANDRA (Context-aware, Adaptive and Scalable ANDRoid mAlware detector). To perform accurate detection, we propose a novel graph kernel that captures apps' security-sensitive behaviors, along with their context information, from dependency graphs. Besides being accurate and scalable, CASANDRA has two specific advantages: (i) it adapts to the evolution of malware features over time, and (ii) it explains the significant features that led to an app's classification as malicious or benign. In a large-scale comparative analysis, CASANDRA outperforms two state-of-the-art techniques on a benchmark dataset, achieving a 99.23% F-measure. When evaluated on more than 87,000 apps collected in the wild, CASANDRA achieves 89.92% accuracy, outperforming existing techniques by more than 25% in their typical batch learning setting and by more than 7% when they are continuously retrained, while maintaining comparable efficiency.
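    The contrast between batch and online learning can be illustrated with a plain perceptron that updates its weights one labeled app at a time. This is a swapped-in generic technique, not CASANDRA's learner or its graph kernel: sparse feature dictionaries stand in for kernel features, and the feature names are hypothetical. The learned weights also give a crude form of the explanation the abstract mentions.

```python
from collections import defaultdict

class OnlinePerceptron:
    """Binary linear classifier updated per sample (label +1 = malware, -1 = benign)."""

    def __init__(self):
        self.w = defaultdict(float)

    def score(self, x):
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def predict(self, x):
        return 1 if self.score(x) > 0 else -1

    def update(self, x, y):
        """Perceptron rule: change weights only when the sample is misclassified."""
        if y * self.score(x) <= 0:
            for f, v in x.items():
                self.w[f] += y * v

    def top_features(self, k=3):
        """Features pushing hardest toward 'malware' (a simple explanation)."""
        return sorted(self.w, key=self.w.get, reverse=True)[:k]

clf = OnlinePerceptron()
stream = [({"sendSMS": 1.0, "getDeviceId": 1.0}, 1), ({"openCamera": 1.0}, -1), ({"sendSMS": 1.0}, 1)]
for features, label in stream:  # apps arrive over time; no full retraining
    clf.update(features, label)
print(clf.predict({"sendSMS": 1.0}))
```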

    Detecting Malicious Code by Exploiting Dependencies of System-call Groups

    In this paper we present an elaborate graph-based algorithmic technique for efficient malware detection. More precisely, we use system-call dependency graphs (ScD graphs for short), obtained by capturing taint analysis traces, together with a set of similarity metrics to detect whether an unknown test sample is malicious or benign. For the sake of generalization, we strengthen our model against strong mutations by applying our detection technique to a weighted directed graph that results from the ScD graph after grouping disjoint subsets of its vertices. Additionally, we have developed a similarity metric, which we call NP-similarity, that combines qualitative, quantitative, and relational characteristics spread among the members of known malware families to achieve a clear distinction between the graph representations of malware and those of benign software. Finally, we evaluate our detection model and compare our results against those achieved by a variety of techniques, demonstrating the potential of our model. Comment: 21 pages, 4 figures
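    Grouping disjoint vertex subsets turns parallel edges between members of two groups into a single weighted edge. The sketch below performs that contraction under an assumed partition and, because NP-similarity is not defined in this abstract, swaps in a weighted Jaccard similarity over edge weights purely for illustration.

```python
from collections import Counter

def contract(edges, partition):
    """Merge vertices into their (disjoint) groups; repeated group-level edges become weights."""
    group_of = {v: g for g, members in partition.items() for v in members}
    weighted = Counter()
    for u, v in edges:
        weighted[(group_of[u], group_of[v])] += 1
    return weighted

def weighted_jaccard(g1, g2):
    """Stand-in similarity (NOT NP-similarity): sum of min weights / sum of max weights."""
    keys = set(g1) | set(g2)
    num = sum(min(g1.get(k, 0), g2.get(k, 0)) for k in keys)
    den = sum(max(g1.get(k, 0), g2.get(k, 0)) for k in keys)
    return num / den if den else 1.0

# Hypothetical partition of system calls into two groups.
partition = {"F": {"open", "write"}, "N": {"connect", "send"}}
sample = contract([("open", "write"), ("write", "connect"), ("connect", "send")], partition)
known = contract([("open", "write"), ("open", "write"), ("write", "connect")], partition)
print(weighted_jaccard(sample, known))
```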

    Robust and Effective Malware Detection through Quantitative Data Flow Graph Metrics

    We present a novel malware detection approach based on metrics over quantitative data flow graphs. Quantitative data flow graphs (QDFGs) model process behavior by interpreting issued system calls as aggregations of quantifiable data flows. Due to this high level of abstraction, we consider QDFG-metric-based detection more robust against typical behavior obfuscation, such as bogus call injection or call reordering, than other common behavioral models that are based on raw system calls. We support this claim with experiments on obfuscated malware logs and demonstrate superior obfuscation robustness compared with detection using n-grams. Our evaluations on a large and diverse data set consisting of about 7000 malware and 500 goodware samples show an average detection rate of 98.01% and a false positive rate of 0.48%. Moreover, we show that our approach is able to detect new malware (i.e., samples from malware families not included in the training set) and that the consideration of quantities in itself significantly improves detection precision.
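    A QDFG aggregates individual system-call flows into weighted edges, so metrics are computed over totals rather than over call sequences. The sketch assumes each event is a (source node, destination node, bytes) triple, for example a file read as file to process; the three metrics shown are generic examples, not the paper's metric set.

```python
from collections import defaultdict

def build_qdfg(events):
    """Aggregate (src, dst, nbytes) flows into a weighted directed graph."""
    g = defaultdict(int)
    for src, dst, nbytes in events:
        g[(src, dst)] += nbytes
    return dict(g)

def out_flow(g, node):
    return sum(w for (s, _), w in g.items() if s == node)

def in_flow(g, node):
    return sum(w for (_, d), w in g.items() if d == node)

def fan_out(g, node):
    return len({d for (s, d) in g if s == node})

events = [
    ("file:a.doc", "proc:x", 100), ("proc:x", "file:a.doc.enc", 100),
    ("file:b.doc", "proc:x", 50), ("proc:x", "file:b.doc.enc", 50),
    ("proc:x", "file:a.doc.enc", 20),
]
g = build_qdfg(events)
print(out_flow(g, "proc:x"), in_flow(g, "proc:x"), fan_out(g, "proc:x"))
```

    Since edges are summed, reordering the underlying calls yields the same graph, which illustrates why such metrics resist call reordering.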

    Computer activity learning from system call time series

    Using a previously introduced similarity function for the stream of system calls generated by a computer, we engineer a program-in-execution classifier using deep learning methods. Tested on malware classification, it significantly outperforms the current state of the art. We provide a series of performance measures and tests to demonstrate its capabilities, including measurements from production use. We show that the system scales linearly with the number of endpoints. Using the system, we estimate the total number of malware families created over the last 10 years at 3450, in line with reasonable economic constraints. This rate of new malware families, more limited than previously acknowledged, implies that machine learning malware classifiers risk being tested on their training set; we achieve F1 = 0.995 in a test carefully designed to mitigate this risk. Comment: 27 pages, 6 figures
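    For reference, the F1 score reported above is the harmonic mean of precision P and recall R, written here with the usual true-positive (TP), false-positive (FP), and false-negative (FN) counts. This is the standard definition, not a detail taken from the paper.

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
```

    Because it is a harmonic mean, F1 = 0.995 requires both P and R to be high; for instance, P = R = 0.995 gives exactly F1 = 0.995.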

    Ransomware in Windows and Android Platforms

    Malware proliferation and sophistication have increased drastically and continue to evolve. Recent indiscriminate ransomware attacks have created a critical need for effective detection techniques to prevent damage. Consequently, ransomware has drawn attention from cybersecurity researchers. This paper contributes a comprehensive overview of ransomware attacks and summarizes existing detection and prevention techniques on both Windows and Android platforms. Moreover, it highlights the strengths and shortcomings of those techniques and compares them. Furthermore, it gives recommendations to users and system administrators. Comment: 21 pages, 7 figures, 5 tables

    Automated Poisoning Attacks and Defenses in Malware Detection Systems: An Adversarial Machine Learning Approach

    The evolution of mobile malware poses a serious threat to smartphone security. Today, sophisticated attackers can adapt by sabotaging machine-learning classifiers to the greatest possible extent through polluted training data, rendering most recent machine-learning-based malware detection tools (such as Drebin, DroidAPIMiner, and MaMaDroid) ineffective. In this paper, we explore the feasibility of constructing crafted malware samples, examine how machine-learning classifiers can be misled under three different threat models, and conclude that injecting carefully crafted data into the training data can significantly reduce detection accuracy. To tackle the problem, we propose KuafuDet, a two-phase learning enhancement approach that learns mobile malware by adversarial detection. KuafuDet includes an offline training phase that selects and extracts features from the training set, and an online detection phase that uses the classifier trained in the first phase. To further address the adversarial environment, these two phases are intertwined through a self-adaptive learning scheme, in which an automated camouflage detector is introduced to filter suspicious false negatives and feed them back into the training phase. We finally show that KuafuDet can significantly reduce false negatives and boost detection accuracy by at least 15%. Experiments on more than 250,000 mobile applications demonstrate that KuafuDet is scalable and can be highly effective as a standalone system.
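    The feedback loop between the two phases can be sketched as follows: samples predicted benign are compared with known malware, near matches are flagged as suspicious false negatives, and they are relabeled and appended to the training data. The Jaccard-similarity test, the 0.6 threshold, and the helper names are all hypothetical choices for illustration; the abstract does not specify how KuafuDet's camouflage detector works.

```python
def jaccard(a, b):
    """Set similarity |A & B| / |A | B| (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def camouflage_filter(predicted_benign, known_malware, threshold=0.6):
    """Flag 'benign' predictions that closely resemble some known malware sample."""
    suspicious = []
    for name, feats in predicted_benign.items():
        best = max((jaccard(feats, m) for m in known_malware), default=0.0)
        if best >= threshold:
            suspicious.append(name)
    return sorted(suspicious)

def feed_back(training_set, predicted_benign, suspicious):
    """Relabel suspicious samples as malware (1) and append them to the training data."""
    return training_set + [(predicted_benign[n], 1) for n in suspicious]

known = [{"sendSMS", "getDeviceId", "loadDex"}]
predicted_benign = {
    "app1": {"sendSMS", "getDeviceId", "loadDex", "openCamera"},
    "app2": {"openCamera"},
}
flagged = camouflage_filter(predicted_benign, known)
print(flagged)
```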

    Malware triage for early identification of Advanced Persistent Threat activities

    In the last decade, a new class of cyber-threats has emerged. This new cybersecurity adversary is known as the "Advanced Persistent Threat" (APT) and refers to organizations that in recent years have been in the spotlight due to multiple dangerous and effective attacks targeting finance and politics, news headlines, embassies, critical infrastructure, TV programs, etc. To identify APT-related malware early, a semi-automatic approach to malware sample analysis is needed. In our previous work we introduced a "malware triage" step for a semi-automatic malware analysis architecture. This step analyzes new incoming samples as fast as possible and immediately dispatches, among all the malware delivered each day in cyberspace, those that deserve deeper analysis, i.e., those really worth further examination by analysts. Our paper focuses on malware developed by APTs, and we build the knowledge base used in the triage on known APTs obtained from publicly available reports. To keep the triage as fast as possible, we rely only on static malware features, which can be extracted with negligible delay, and use machine learning techniques for the identification. In this work we move from multiclass classification to a group of one-class classifiers, which simplifies training and allows greater modularity. The results of the proposed framework show high performance, reaching a precision of 100% and an accuracy over 95%.
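    The modularity argument for one-class classifiers is easy to see in code: each APT group gets its own model trained only on that group's samples, and adding a group means training one new model. As a hypothetical stand-in for whatever one-class learner the paper uses, the sketch accepts a sample when it lies within a radius of the group's centroid in a numeric static-feature space; the group names and feature vectors are made up.

```python
import math

class CentroidOneClass:
    """One-class model: accept a sample within `radius` of the training centroid."""

    def __init__(self, samples, slack=1.0):
        dim = len(samples[0])
        self.centroid = [sum(s[i] for s in samples) / len(samples) for i in range(dim)]
        self.radius = slack * max(math.dist(s, self.centroid) for s in samples)

    def accepts(self, x):
        return math.dist(x, self.centroid) <= self.radius

def triage(models, x):
    """Every APT group whose model accepts the sample; empty means no known APT matched."""
    return sorted(name for name, m in models.items() if m.accepts(x))

models = {
    "APT-A": CentroidOneClass([[0, 0], [0, 2], [2, 0], [2, 2]]),
    "APT-B": CentroidOneClass([[10, 10], [10, 12]]),
}
print(triage(models, [1, 1]), triage(models, [5, 5]))
```

    An empty result is itself useful for triage: the sample matches no known group and can be routed differently.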