1,665 research outputs found

    Malware Pattern of Life Analysis

    Get PDF
    Many malware classifications include viruses, worms, trojans, ransomware, bots, adware, spyware, rootkits, file-less downloaders, malvertising, and many more. Each type may share unique behavioral characteristics with its methods of operations (MO), a pattern of behavior so distinctive that it could be recognized as having the same creator. The research shows the extraction of malware methods of operation using the step-by-step process of Artificial-Based Intelligence (ABI) with built-in Density-based spatial clustering of applications with noise (DBSCAN) machine learning to quantify the actions for their similarities, differences, baseline behaviors, and anomalies. The collected data of the research is from the ransomware sample repositories of Malware Bazaar and Virus Share, totaling 1300 live malicious codes ingested into the CAPEv2 malware sandbox, allowing the capture of traces of static, dynamic, and network behavior features. The ransomware features have shown significant activity of varying identified functions used in encryption, file application programming interface (API), and network function calls. During the machine learning categorization phase, there are eight identified clusters that have similar and different features regarding function-call sequencing events and file access manipulation for dropping file notes and writing encryption. Having compared all the clusters using a “supervenn” pictorial diagram, the characteristics of the static and dynamic behavior of the ransomware give the initial baselines for comparison with other variants that may have been added to the collected data for intelligence gathering. The findings provide a novel practical approach for intelligence gathering to address ransomware or any other malware variants’ activity patterns to discern similarities, anomalies, and differences between malware actions under study

    Detecting Malicious Software By Dynamicexecution

    Get PDF
    Traditional way to detect malicious software is based on signature matching. However, signature matching only detects known malicious software. In order to detect unknown malicious software, it is necessary to analyze the software for its impact on the system when the software is executed. In one approach, the software code can be statically analyzed for any malicious patterns. Another approach is to execute the program and determine the nature of the program dynamically. Since the execution of malicious code may have negative impact on the system, the code must be executed in a controlled environment. For that purpose, we have developed a sandbox to protect the system. Potential malicious behavior is intercepted by hooking Win32 system calls. Using the developed sandbox, we detect unknown virus using dynamic instruction sequences mining techniques. By collecting runtime instruction sequences in basic blocks, we extract instruction sequence patterns based on instruction associations. We build classification models with these patterns. By applying this classification model, we predict the nature of an unknown program. We compare our approach with several other approaches such as simple heuristics, NGram and static instruction sequences. We have also developed a method to identify a family of malicious software utilizing the system call trace. We construct a structural system call diagram from captured dynamic system call traces. We generate smart system call signature using profile hidden Markov model (PHMM) based on modularized system call block. Smart system call signature weakly identifies a family of malicious software

    Latent Representation and Sampling in Network: Application in Text Mining and Biology.

    Get PDF
    In classical machine learning, hand-designed features are used for learning a mapping from raw data. However, human involvement in feature design makes the process expensive. Representation learning aims to learn abstract features directly from data without direct human involvement. Raw data can be of various forms. Network is one form of data that encodes relational structure in many real-world domains. Therefore, learning abstract features for network units is an important task. In this dissertation, we propose models for incorporating temporal information given as a collection of networks from subsequent time-stamps. The primary objective of our models is to learn a better abstract feature representation of nodes and edges in an evolving network. We show that the temporal information in the abstract feature improves the performance of link prediction task substantially. Besides applying to the network data, we also employ our models to incorporate extra-sentential information in the text domain for learning better representation of sentences. We build a context network of sentences to capture extra-sentential information. This information in abstract feature representation of sentences improves various text-mining tasks substantially over a set of baseline methods. A problem with the abstract features that we learn is that they lack interpretability. In real-life applications on network data, for some tasks, it is crucial to learn interpretable features in the form of graphical structures. For this we need to mine important graphical structures along with their frequency statistics from the input dataset. However, exact algorithms for these tasks are computationally expensive, so scalable algorithms are of urgent need. To overcome this challenge, we provide efficient sampling algorithms for mining higher-order structures from network(s). We show that our sampling-based algorithms are scalable. They are also superior to a set of baseline algorithms in terms of retrieving important graphical sub-structures, and collecting their frequency statistics. Finally, we show that we can use these frequent subgraph statistics and structures as features in various real-life applications. We show one application in biology and another in security. In both cases, we show that the structures and their statistics significantly improve the performance of knowledge discovery tasks in these domains

    Multi-level analysis of Malware using Machine Learning

    Get PDF
    Multi-level analysis of Malware using Machine Learnin
    • …
    corecore