22 research outputs found

    Differentiating malware from cleanware using behavioural analysis

    Full text link
    This paper proposes a scalable approach for distinguishing malicious files from clean files by investigating the behavioural features using logs of various API calls. We also propose, as an alternative to the traditional method of manually identifying malware files, an automated classification system using runtime features of malware files. For both projects, we use an automated tool running in a virtual environment to extract API call features from executables and apply pattern recognition algorithms and statistical methods to differentiate between files. Our experimental results, based on a dataset of 1368 malware and 456 cleanware files, provide an accuracy of over 97% in distinguishing malware from cleanware. Our techniques provide a similar accuracy for classifying malware into families. In both cases, our results outperform comparable previously published techniques

    Classification of Polymorphic Virus Based on Integrated Features

    Get PDF
    Standard virus classification relies on the use of virus function, which is a small number of bytes written in assembly language. The addressable problem with current malware intrusion detection and prevention system is having difficulties in detecting unknown and multipath polymorphic computer virus solely based on either static or dynamic features. Thus, this paper presents an effective and efficient polymorphic classification technique based on integrated features. The integrated feature is selected based on Information Gain (IG) rank value between static and dynamic features. Then, all datasets are tested on Naïve Bayes and Random Forest classifiers. We extracted 49 features from 700 polymorphic computer virus samples from Netherland Net Lab and VXHeaven, which includes benign and polymorphic virus function. We spilt the dataset based on 60:40 split ratio sizes for training and testing respectively. Our proposed integrated features manage to achieve 98.9% of accuracy value

    Review of Contemporary Literature on Machine Learning based Malware Analysis and Detection Strategies

    Get PDF
    Abstract: malicious software also known as malware are the critical security threat experienced by the current ear of internet and computer system users. The malwares can morph to access or control the system level operations in multiple dimensions. The traditional malware detection strategies detects by signatures, which are not capable to notify the unknown malwares. The machine learning models learns from the behavioral patterns of the existing malwares and attempts to notify the malwares with similar behavioral patterns, hence these strategies often succeeds to notify even about unknown malwares. This manuscript explored the detailed review of machine learning based malware detection strategies found in contemporary literature

    A comparison of the classification of disparate malware collected in different time periods

    Full text link
    It has been argued that an anti-virus strategy based on malware collected at a certain date, will not work at a later date because malware evolves rapidly and an anti-virus engine is then faced with a completely new type of executable not as amenable to detection as the first was.In this paper, we test this idea by collecting two sets of malware, the first from 2002 to 2007, the second from 2009 to 2010 to determine how well the anti-virus strategy we developed based on the earlier set [18] will do on the later set. This anti-virus strategy integrates dynamic and static features extracted from the executables to classify malware by distinguishing between families. We also perform another test, to investigate the same idea whereby we accumulate all the malware executables in the old and new dataset, separately, and apply a malware versus cleanware classification.The resulting classification accuracies are very close for both datasets, with a difference of approximately 5.4% for both experiments, the older malware being more accurately classified than the newer malware. This leads us to conjecture that current anti-virus strategies can indeed be modified to deal effectively with new malware.<br /

    Analyzing malware log files for internet access investigation using Hadoop

    Get PDF
    On the Internet, malicious software (malware) is one of the most serious threats to system security. Major complex issues and problems on any software systems are frequently caused by malware. Malware can infect any computer software that has connection to Internet infrastructure. There are many types of malware and some of the popular malwares are botnet, trojans, viruses, spyware and adware. Internet users with lesser knowledge on the malware threats are susceptible to this issue. To protect and prevent the computer and internet users from exposing themselves towards malware attacks, identifying the attacks through investigating malware log file is an essential step to curb this threat. The log file exposes crucial information in identifying the malware, such as algorithm and functional characteristic, the network interaction between the source and the destination, and type of malware. By nature, the log file size is humongous and requires the investigation process to be executed on faster and stable platform such as big data environment. In this study, the authors had adopted Hadoop, an open source software framework to process and extract the information from the malware log files that obtains from university’s security equipment. The Python program was used for data transformation then analysis it in Hadoop simulation environment. The analysis includes assessing reduction of log files size, performance of execution time and data visualization using Microsoft Power BI (Business Intelligence). The results of log processing have reduced 50% of the original log file size, while the total execution time would not increase linearly with the size of the data. The information will be used for further prevention and protection from malware threats in university’s network

    Mal-Netminer: Malware Classification Approach based on Social Network Analysis of System Call Graph

    Get PDF
    As the security landscape evolves over time, where thousands of species of malicious codes are seen every day, antivirus vendors strive to detect and classify malware families for efficient and effective responses against malware campaigns. To enrich this effort, and by capitalizing on ideas from the social network analysis domain, we build a tool that can help classify malware families using features driven from the graph structure of their system calls. To achieve that, we first construct a system call graph that consists of system calls found in the execution of the individual malware families. To explore distinguishing features of various malware species, we study social network properties as applied to the call graph, including the degree distribution, degree centrality, average distance, clustering coefficient, network density, and component ratio. We utilize features driven from those properties to build a classifier for malware families. Our experimental results show that influence-based graph metrics such as the degree centrality are effective for classifying malware, whereas the general structural metrics of malware are less effective for classifying malware. Our experiments demonstrate that the proposed system performs well in detecting and classifying malware families within each malware class with accuracy greater than 96%.Comment: Mathematical Problems in Engineering, Vol 201

    Malware Pattern of Life Analysis

    Get PDF
    Many malware classifications include viruses, worms, trojans, ransomware, bots, adware, spyware, rootkits, file-less downloaders, malvertising, and many more. Each type may share unique behavioral characteristics with its methods of operations (MO), a pattern of behavior so distinctive that it could be recognized as having the same creator. The research shows the extraction of malware methods of operation using the step-by-step process of Artificial-Based Intelligence (ABI) with built-in Density-based spatial clustering of applications with noise (DBSCAN) machine learning to quantify the actions for their similarities, differences, baseline behaviors, and anomalies. The collected data of the research is from the ransomware sample repositories of Malware Bazaar and Virus Share, totaling 1300 live malicious codes ingested into the CAPEv2 malware sandbox, allowing the capture of traces of static, dynamic, and network behavior features. The ransomware features have shown significant activity of varying identified functions used in encryption, file application programming interface (API), and network function calls. During the machine learning categorization phase, there are eight identified clusters that have similar and different features regarding function-call sequencing events and file access manipulation for dropping file notes and writing encryption. Having compared all the clusters using a “supervenn” pictorial diagram, the characteristics of the static and dynamic behavior of the ransomware give the initial baselines for comparison with other variants that may have been added to the collected data for intelligence gathering. The findings provide a novel practical approach for intelligence gathering to address ransomware or any other malware variants’ activity patterns to discern similarities, anomalies, and differences between malware actions under study

    Malware classification using self organising feature maps and machine activity data

    Get PDF
    In this article we use machine activity metrics to automatically distinguish between malicious and trusted portable executable software samples. The motivation stems from the growth of cyber attacks using techniques that have been employed to surreptitiously deploy Advanced Persistent Threats (APTs). APTs are becoming more sophisticated and able to obfuscate much of their identifiable features through encryption, custom code bases and in-memory execution. Our hypothesis is that we can produce a high degree of accuracy in distinguishing malicious from trusted samples using Machine Learning with features derived from the inescapable footprint left behind on a computer system during execution. This includes CPU, RAM, Swap use and network traffic at a count level of bytes and packets. These features are continuous and allow us to be more flexible with the classification of samples than discrete features such as API calls (which can also be obfuscated) that form the main feature of the extant literature. We use these continuous data and develop a novel classification method using Self Organizing Feature Maps to reduce over fitting during training through the ability to create unsupervised clusters of similar ‘behaviour’ that are subsequently used as features for classification, rather than using the raw data. We compare our method to a set of machine classification methods that have been applied in previous research and demonstrate an increase of between 7.24% and 25.68% in classification accuracy using our method and an unseen dataset over the range of other machine classification methods that have been applied in previous research
    corecore