5 research outputs found

    Mining domain-specific edit operations from model repositories with applications to semantic lifting of model differences and change profiling

    Get PDF
    Model transformations are central to model-driven software development. Applications of model transformations include creating models, handling model co-evolution, model merging, and understanding model evolution. In the past, various (semi-) automatic approaches to derive model transformations from meta-models or from examples have been proposed. These approaches require time-consuming handcrafting or the recording of concrete examples, or they are unable to derive complex transformations. We propose a novel unsupervised approach, called Ockham, which is able to learn edit operations from model histories in model repositories. Ockham is based on the idea that meaningful domain-specifc edit operations are the ones that compress the model diferences. It employs frequent subgraph mining to discover frequent structures in model diference graphs. We evaluate our approach in two controlled experiments and one real-world case study of a large-scale industrial model-driven architecture project in the railway domain. We found that our approach is able to discover frequent edit operations that have actually been applied before. Furthermore, Ockham is able to extract edit operations that are meaningful—in the sense of explaining model diferences through the edit operations they comprise—to practitioners in an industrial setting. We also discuss use cases (i.e., semantic lifting of model diferences and change profles) for the discovered edit operations in this industrial setting. We fnd that the edit operations discovered by Ockham can be used to better understand and simulate the evolution of models

    Techniques for the reverse engineering of banking malware

    Get PDF
    Malware attacks are a significant and frequently reported problem, adversely affecting the productivity of organisations and governments worldwide. The well-documented consequences of malware attacks include financial loss, data loss, reputation damage, infrastructure damage, theft of intellectual property, compromise of commercial negotiations, and national security risks. Mitiga-tion activities involve a significant amount of manual analysis. Therefore, there is a need for automated techniques for malware analysis to identify malicious behaviours. Research into automated techniques for malware analysis covers a wide range of activities. This thesis consists of a series of studies: an anal-ysis of banking malware families and their common behaviours, an emulated command and control environment for dynamic malware analysis, a technique to identify similar malware functions, and a technique for the detection of ransomware. An analysis of the nature of banking malware, its major malware families, behaviours, variants, and inter-relationships are provided in this thesis. In doing this, this research takes a broad view of malware analysis, starting with the implementation of the malicious behaviours through to detailed analysis using machine learning. The broad approach taken in this thesis differs from some other studies that approach malware research in a more abstract sense. A disadvantage of approaching malware research without domain knowledge, is that important methodology questions may not be considered. Large datasets of historical malware samples are available for countermea-sures research. However, due to the age of these samples, the original malware infrastructure is no longer available, often restricting malware operations to initialisation functions only. To address this absence, an emulated command and control environment is provided. This emulated environment provides full control of the malware, enabling the capabilities of the original in-the-wild operation, while enabling feature extraction for research purposes. A major focus of this thesis has been the development of a machine learn-ing function similarity method with a novel feature encoding that increases feature strength. This research develops techniques to demonstrate that the machine learning model trained on similarity features from one program can find similar functions in another, unrelated program. This finding can lead to the development of generic similar function classifiers that can be packaged and distributed in reverse engineering tools such as IDA Pro and Ghidra. Further, this research examines the use of API call features for the identi-fication of ransomware and shows that a failure to consider malware analysis domain knowledge can lead to weaknesses in experimental design. In this case, we show that existing research has difficulty in discriminating between ransomware and benign cryptographic software. This thesis by publication, has developed techniques to advance the disci-pline of malware reverse engineering, in order to minimize harm due to cyber-attacks on critical infrastructure, government institutions, and industry.Doctor of Philosoph
    corecore