Mining domain-specific edit operations from model repositories with applications to semantic lifting of model differences and change profiling
Model transformations are central to model-driven software development. Applications of model transformations include creating models, handling model co-evolution, model merging, and understanding model evolution. In the past, various (semi-)automatic approaches to derive model transformations from meta-models or from examples have been proposed. These approaches require time-consuming handcrafting or the recording of concrete examples, or they are unable to derive complex transformations. We propose a novel unsupervised approach, called Ockham, which is able to learn edit operations from model histories in model repositories. Ockham is based on the idea that meaningful domain-specific edit operations are the ones that compress the model differences. It employs frequent subgraph mining to discover frequent structures in model difference graphs. We evaluate our approach in two controlled experiments and one real-world case study of a large-scale industrial model-driven architecture project in the railway domain. We found that our approach is able to discover frequent edit operations that have actually been applied before. Furthermore, Ockham is able to extract edit operations that are meaningful, in the sense of explaining model differences through the edit operations they comprise, to practitioners in an industrial setting. We also discuss use cases (i.e., semantic lifting of model differences and change profiles) for the discovered edit operations in this industrial setting. We find that the edit operations discovered by Ockham can be used to better understand and simulate the evolution of models.
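The compression idea in the abstract above can be illustrated with a toy sketch. This is not Ockham's implementation: itemset-style counting of change labels stands in for frequent subgraph mining on difference graphs, and the names `mine_patterns`, `compression_score`, `rank_patterns`, the change labels, and the scoring formula are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_patterns(diffs, max_size=3, min_support=2):
    """Count co-occurring change labels across diffs.

    Assumption: each diff is a flat list of change labels. A real difference
    graph also has edges, which this stand-in ignores.
    """
    support = Counter()
    for diff in diffs:
        labels = sorted(set(diff))
        for size in range(1, max_size + 1):
            for combo in combinations(labels, size):
                support[combo] += 1
    return {p: s for p, s in support.items() if s >= min_support}


def compression_score(pattern, support):
    # Toy proxy: replacing each occurrence of a pattern of n changes by a
    # single edit operation saves (n - 1) elements per occurrence.
    return support * (len(pattern) - 1)


def rank_patterns(diffs, **kwargs):
    """Return frequent patterns, best-compressing first (ties broken by label)."""
    frequent = mine_patterns(diffs, **kwargs)
    return sorted(frequent, key=lambda p: (-compression_score(p, frequent[p]), p))


# Hypothetical model differences, one list of atomic changes per commit.
diffs = [
    ["add:Signal", "add:Port", "add:Connector"],
    ["add:Signal", "add:Port", "add:Connector", "del:Comment"],
    ["add:Signal", "rename:Block"],
]

best = rank_patterns(diffs)[0]
print(best)
```

Under these assumptions the three changes that recur together rank above any single frequent change, which is the intuition behind preferring patterns that compress the differences over patterns that are merely frequent.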
Techniques for the reverse engineering of banking malware
Malware attacks are a significant and frequently reported problem, adversely affecting the productivity of organisations and governments worldwide. The well-documented consequences of malware attacks include financial loss, data loss, reputation damage, infrastructure damage, theft of intellectual property, compromise of commercial negotiations, and national security risks. Mitigation activities involve a significant amount of manual analysis. Therefore, there is a need for automated techniques for malware analysis to identify malicious behaviours. Research into automated techniques for malware analysis covers a wide range of activities. This thesis consists of a series of studies: an analysis of banking malware families and their common behaviours, an emulated command and control environment for dynamic malware analysis, a technique to identify similar malware functions, and a technique for the detection of ransomware. An analysis of the nature of banking malware, its major malware families, behaviours, variants, and inter-relationships is provided in this thesis. In doing this, this research takes a broad view of malware analysis, starting with the implementation of the malicious behaviours through to detailed analysis using machine learning. The broad approach taken in this thesis differs from some other studies that approach malware research in a more abstract sense. A disadvantage of approaching malware research without domain knowledge is that important methodology questions may not be considered. Large datasets of historical malware samples are available for countermeasures research. However, due to the age of these samples, the original malware infrastructure is no longer available, often restricting malware operations to initialisation functions only. To address this absence, an emulated command and control environment is provided.
This emulated environment provides full control of the malware, enabling the capabilities of the original in-the-wild operation, while enabling feature extraction for research purposes. A major focus of this thesis has been the development of a machine learning function similarity method with a novel feature encoding that increases feature strength. This research develops techniques to demonstrate that the machine learning model trained on similarity features from one program can find similar functions in another, unrelated program. This finding can lead to the development of generic similar function classifiers that can be packaged and distributed in reverse engineering tools such as IDA Pro and Ghidra. Further, this research examines the use of API call features for the identification of ransomware and shows that a failure to consider malware analysis domain knowledge can lead to weaknesses in experimental design. In this case, we show that existing research has difficulty in discriminating between ransomware and benign cryptographic software. This thesis by publication has developed techniques to advance the discipline of malware reverse engineering, in order to minimize harm due to cyber-attacks on critical infrastructure, government institutions, and industry. Doctor of Philosoph
Representing Program Edits with the Choice Calculus
The problem of supporting more advanced selective undo operations has received a lot of attention. However, selective undo is generally missing in commonly used editors. Moreover, partial selective undo, the ability to undo just part of an edit so that other edits may be undone, is not supported at all. We observe that a fundamental obstacle is the lack of a more flexible and compositional edit model. This project addresses this issue and proposes the choice edit model, which is based on the representation provided by the choice calculus. The central idea is to represent an edit through a choice that contains the old and the new code as alternatives. Edits inherit properties from choices and can thus be composed, nested, and transformed so that dependent edits may be untangled and undone partially. The choice representation is an internal representation, not meant to be exposed to programmers directly. To communicate the structure and dependencies of edits we introduce program edit graphs as an alternative, more abstract representation.
Program edit graphs (PEGs) explicitly represent program variants and their relations. We also discuss the scalability of PEGs.
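The central idea of the choice edit model, an edit as a choice between the old and the new code, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: the `Choice` class, the `select` function, and the edit names `A` and `B` are hypothetical, and the sketch covers only nesting and selection, not the transformations used to untangle dependent edits.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Choice:
    """An edit, represented as a choice between old and new code."""
    name: str    # hypothetical edit identifier
    old: "Term"  # code before the edit
    new: "Term"  # code after the edit


# A term is plain text, a choice, or a tuple of terms in sequence.
Term = Union[str, Choice, tuple]


def select(term, undone):
    """Resolve every choice: take `old` for edits in `undone`, else `new`."""
    if isinstance(term, str):
        return term
    if isinstance(term, Choice):
        return select(term.old if term.name in undone else term.new, undone)
    return "".join(select(part, undone) for part in term)


# Edit A replaced "1" with "f(2)"; edit B later replaced "2" with "3".
# B is nested inside the new alternative of A, so it depends on A.
program = ("x = ", Choice("A", "1", ("f(", Choice("B", "2", "3"), ")")))

print(select(program, set()))   # all edits applied
print(select(program, {"B"}))   # only B undone
print(select(program, {"A"}))   # A undone, which also discards B
```

Because the later edit lives inside the new alternative of the earlier one, undoing `B` alone leaves `A` in place, while undoing `A` removes both. That dependency structure is what a more abstract view such as a program edit graph would make visible.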