Classification of malware based on string and function feature selection
Anti-malware software producers are continually challenged to identify and counter new malware as it is released into the wild. A dramatic increase in malware production in recent years has rendered the conventional method of manually determining a signature for each new malware sample untenable. This paper presents a scalable, automated approach for detecting and classifying malware by using pattern recognition algorithms and statistical methods at various stages of the malware analysis life cycle. Our framework combines the static features of function length and printable string information extracted from malware samples into a single test which gives classification results better than those achieved by using either feature individually. In our testing we input feature information from close to 1400 unpacked malware samples to a number of different classification algorithms. Using k-fold cross validation on the malware, which includes Trojans and viruses, along with 151 clean files, we achieve an overall classification accuracy of over 98%.
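As an illustration of the idea in this abstract, the sketch below concatenates a function-length summary with a printable-string presence vector and scores a classifier with k-fold cross validation. It is a minimal sketch under stated assumptions, not the paper's method: the nearest-centroid classifier, the helper names (`combine_features`, `k_fold_accuracy`), the vocabulary, and the toy samples are all invented for illustration, and the abstract does not say which classifiers or feature encodings were used.

```python
import random
from collections import Counter


def combine_features(func_lengths, strings, vocabulary):
    """Concatenate function-length statistics with a string presence vector."""
    n = len(func_lengths)
    mean_len = sum(func_lengths) / n if n else 0.0
    present = set(strings)
    string_part = [1.0 if s in present else 0.0 for s in vocabulary]
    return [float(n), mean_len] + string_part


def nearest_centroid_predict(train_x, train_y, x):
    """Assumed stand-in classifier: pick the label with the closest class mean."""
    sums, counts = {}, Counter(train_y)
    for vec, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [v / counts[label] for v in acc]
        dist = sum((a - b) ** 2 for a, b in zip(centroid, x))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def k_fold_accuracy(xs, ys, k=5, seed=0):
    """Hold out each of k folds in turn and return the overall accuracy."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_x = [xs[i] for i in idx if i not in held_out]
        train_y = [ys[i] for i in idx if i not in held_out]
        correct += sum(
            nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]
            for i in fold
        )
    return correct / len(xs)


# Toy data, invented for illustration only (not from the paper).
VOCAB = ["CreateRemoteThread", "cmd.exe", "Copyright"]
samples = []
for i in range(10):
    samples.append(([20 + i, 35, 50], ["cmd.exe", "CreateRemoteThread"], "malware"))
    samples.append(([200 + i, 340, 410, 90], ["Copyright"], "clean"))

xs = [combine_features(f, s, VOCAB) for f, s, _ in samples]
ys = [label for _, _, label in samples]
accuracy = k_fold_accuracy(xs, ys, k=5)
print(f"5-fold accuracy on toy data: {accuracy:.2f}")
```

The features here are unscaled, so the mean function length dominates the distance. A real pipeline would likely normalise each feature before combining them.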
Malware Pattern of Life Analysis
Malware classifications include viruses, worms, trojans, ransomware, bots, adware, spyware, rootkits, file-less downloaders, malvertising, and many more. Each type may exhibit characteristic behavior in its methods of operation (MO), a pattern so distinctive that samples can be recognized as having the same creator. The research demonstrates the extraction of malware methods of operation using the step-by-step process of Artificial-Based Intelligence (ABI) with built-in density-based spatial clustering of applications with noise (DBSCAN) machine learning to quantify the actions in terms of their similarities, differences, baseline behaviors, and anomalies. The data were collected from the ransomware sample repositories of Malware Bazaar and Virus Share, totaling 1300 live malicious samples ingested into the CAPEv2 malware sandbox, which captures traces of static, dynamic, and network behavior features. The ransomware features show significant activity across the identified functions used in encryption, file application programming interface (API), and network function calls. The machine learning categorization phase identified eight clusters, which share some features and differ in others with respect to function-call sequencing events and file access manipulation for dropping ransom notes and writing encrypted data. After all the clusters were compared using a "supervenn" pictorial diagram, the characteristics of the static and dynamic behavior of the ransomware give the initial baselines for comparison with other variants that may be added to the collected data for intelligence gathering. The findings provide a novel, practical approach to intelligence gathering that addresses the activity patterns of ransomware or any other malware variants, discerning similarities, anomalies, and differences between the malware actions under study.
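The clustering step named in this abstract can be sketched as follows. This is a minimal pure-Python DBSCAN run on invented behavior vectors, and it is an illustration under stated assumptions rather than the authors' pipeline: the three feature columns (counts of encryption, file API, and network calls), the `eps` and `min_pts` values, and the sample points are all hypothetical.

```python
NOISE = -1


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [
        j
        for j, q in enumerate(points)
        if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps * eps
    ]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or NOISE if it joins no cluster."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # core point: keep growing
    return labels


# Hypothetical per-sample counts: (encryption calls, file API calls, network calls).
points = [
    (50, 200, 2), (52, 198, 3), (49, 205, 2),  # file-encryption-heavy behavior
    (5, 20, 40), (6, 22, 41), (4, 19, 39),     # network-heavy behavior
    (300, 5, 300),                             # outlier
]
labels = dbscan(points, eps=10, min_pts=3)
print(labels)
```

Points that fall in no dense region are labelled `-1`, which is how DBSCAN separates anomalies from the baseline clusters without needing the number of clusters in advance.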
A Framework for the Systematic Evaluation of Malware Forensic Tools
Following a series of high profile miscarriages of justice linked to questionable expert evidence, the post of the Forensic Science Regulator was created in 2008 with a remit to improve the standard of practitioner competences and forensic procedures. It has since moved to incorporate a greater level of scientific practice in these areas, as used in the production of expert evidence submitted to the UK Criminal Justice System. Accreditation to their codes of practice and conduct will become mandatory for all forensic practitioners by October 2017. A variety of challenges with expert evidence are explored and linked to a lack of a scientific methodology underpinning the processes followed. In particular, the research focuses upon investigations where malicious software ("malware") has been identified.
A framework, called the "Malware Analysis Tool Evaluation Framework" (MATEF), has been developed to address this lack of methodology to evaluate software tools used during investigations involving malware. A prototype implementation of the framework was used to evaluate two tools against a population of over 350,000 samples of malware. Analysis of the findings indicated that the choice of tool could impact on the number of artefacts observed in malware forensic investigations as well as identifying the optimal execution time for a given tool when observing malware artefacts.
Three different measures were used to evaluate the framework. The first of these evaluated the framework against the requirements and determined that these were largely met. Where the requirements were not met these are attributed to matters either outside scope or the fledgling nature of the research. Another measure used to evaluate the framework was to consider its performance in terms of speed and resource utilisation. This identified scope for improvement in terms of the time to complete a test and the need for more economical use of disk space. Finally, the framework provides a scientific means to evaluate malware analysis tools, hence addressing the Research Question subject to the level at which ground truth is established.
A number of contributions are produced as the output of this work. The first is confirmation of the case that there is a lack of trusted practice in the field of malware forensics. The second is the MATEF itself, as it facilitates the production of empirical evidence of a tool's ability to detect malware artefacts. The third is a set of requirements for establishing trusted practice in the use of malware artefact detection tools. The last is empirical evidence that the choice of tool can impact on the number of artefacts observed in malware forensic investigations, together with identification of the optimal execution time for a given tool when observing malware artefacts.