592 research outputs found

    A multilabel fuzzy relevance clustering system for malware attack attribution in the edge layer of cyber-physical networks

    Get PDF
    The rapid increase in the number of malicious programs has made malware forensics a daunting task and caused users’ systems to become in danger. Timely identification of malware characteristics including its origin and the malware sample family would significantly limit the potential damage of malware. This is a more profound risk in Cyber-Physical Systems (CPSs), where a malware attack may cause significant physical damage to the infrastructure. Due to limited on-device available memory and processing power in CPS devices, most of the efforts for protecting CPS networks are focused on the edge layer, where the majority of security mechanisms are deployed. Since the majority of advanced and sophisticated malware programs are combining features from different families, these malicious programs are not similar enough to any existing malware family and easily evade binary classifier detection. Therefore, in this article, we propose a novel multilabel fuzzy clustering system for malware attack attribution. Our system is deployed on the edge layer to provide insight into applicable malware threats to the CPS network. We leverage static analysis by utilizing Opcode frequencies as the feature space to classify malware families. We observed that a multilabel classifier does not classify a part of samples. We named this problem the instance coverage problem. To overcome this problem, we developed an ensemble-based multilabel fuzzy classification method to suggest the relevance of a malware instance to the stricken families. This classifier identified samples of VirusShare, RansomwareTracker, and BIG2015 with an accuracy of 94.66%, 94.26%, and 97.56%, respectively

    Hidden Markov Models for Malware Classification

    Get PDF
    Malware is a software which is developed for malicious intent. Malware is a rapidly evolving threat to the computing community. Although many techniques for malware classification have been proposed, there is still the lack of a comprehensible and useful taxonomy to classify malware samples. Previous research has shown that hidden Markov model (HMM) analysis is useful for detecting certain types of malware. In this research, we consider the related problem of malware classification based on HMMs. We train HMMs for a variety of malware generators and a variety of compilers. More than 9000 malware samples are then scored against each of these models and the malware samples are separated into clusters based on the resulting scores. We analyze the clusters and show that they correspond to certain characteristics of malware. These results indicate that HMMs are an effective tool for the challenging task of automatically classifying malware

    Behaviour based anomaly detection system for smartphones using machine learning algorithm

    Get PDF
    In this research, we propose a novel, platform independent behaviour-based anomaly detection system for smartphones. The fundamental premise of this system is that every smartphone user has unique usage patterns. By modelling these patterns into a profile we can uniquely identify users. To evaluate this hypothesis, we conducted an experiment in which a data collection application was developed to accumulate real-life dataset consisting of application usage statistics, various system metrics and contextual information from smartphones. Descriptive statistical analysis was performed on our dataset to identify patterns of dissimilarity in smartphone usage of the participants of our experiment. Following this analysis, a Machine Learning algorithm was applied on the dataset to create a baseline usage profile for each participant. These profiles were compared to monitor deviations from baseline in a series of tests that we conducted, to determine the profiling accuracy. In the first test, seven day smartphone usage data consisting of eight features and an observation interval of one hour was used and an accuracy range of 73.41% to 100% was achieved. In this test, 8 out 10 user profiles were more than 95% accurate. The second test, utilised the entire dataset and achieved average accuracy of 44.50% to 95.48%. Not only these results are very promising in differentiating participants based on their usage, the implications of this research are far reaching as our system can also be extended to provide transparent, continuous user authentication on smartphones or work as a risk scoring engine for other Intrusion Detection System

    Identifying meaningful clusters in malware data

    Get PDF
    Finding meaningful clusters in drive-by-download malware data is a particularly difficult task. Malware data tends to contain overlapping clusters with wide variations of cardinality. This happens because there can be considerable similarity between malware samples (some are even said to belong to the same family), and these tend to appear in bursts. Clustering algorithms are usually applied to normalised data sets. However, the process of normalisation aims at setting features with different range values to have a similar contribution to the clustering. It does not favour more meaningful features over those that are less meaningful, an effect one should perhaps expect of the data pre-processing stage. In this paper we introduce a method to deal precisely with the problem above. This is an iterative data pre-processing method capable of aiding to increase the separation between clusters. It does so by calculating the within-cluster degree of relevance of each feature, and then it uses these as a data rescaling factor. By repeating this until convergence our malware data was separated in clear clusters, leading to a higher average silhouette width

    Anomalous pattern based clustering of mental tasks with subject independent learning – some preliminary results

    Get PDF
    In this paper we describe a new method for EEG signal classification in which the classification of one subject’s EEG signals is based on features learnt from another subject. This method applies to the power spectrum density data and assigns class-dependent information weights to individual features. The informative features appear to be rather similar among different subjects, thus supporting the view that there are subject independent general brain patterns for the same mental task. Classification is done via clustering using the intelligent k-means algorithm with the most informative features from a different subject. We experimentally compare our method with others.</jats:p
    • …
    corecore