Variability Anomalies in Software Product Lines
Software Product Lines (SPLs) allow variants of a software system to be generated based on the configuration selected by the user. In this thesis, we focus on C-based software systems with build-time variability implemented using a build system and the C preprocessor. Such systems usually consist of a configuration space, a code space, and a build space. The configuration space describes the features that the user can select and any configuration constraints between them. The features and the constraints between them are commonly documented in a variability model. The code and build spaces contain the actual implementation of the system: the former contains the C code files with conditional compilation directives (e.g., #ifdefs), and the latter contains the build scripts with conditionally compiled files. We study the relationship between the three spaces as follows: (1) we detect variability anomalies which arise due to inconsistencies among the three spaces, and (2) we use anomaly detection techniques to automatically extract configuration constraints from the implementation.
For (1), we complement previous research which mainly focused on the relationship between the configuration space and code space. We additionally analyze the build space to ensure that the constraints in all three spaces are consistent. We detect inconsistencies, which we call variability anomalies, in particular dead and undead artifacts. Dead artifacts are conditional artifacts which are not included in any valid configuration, while undead artifacts are those which are always included. We look for such anomalies at both the code block and source file levels using the Linux kernel as a case study. Our work shows that almost half the configurable features are only used to control source file compilation in Linux's build system, KBUILD. We analyze KBUILD to extract file presence conditions, which determine under which feature combinations each file is compiled. We show that by considering the build system, we can detect an additional 20% of variability anomalies on the code block level when compared to only using the configuration and code spaces. Our work also shows that file level anomalies occur less frequently than block level ones. We analyze the evolution of the detected anomalies and identify some of their causes and fixes.
For (2), we develop novel analyses to automatically extract configuration constraints from implementation and compare them to those in existing variability models. We rely on two means of detecting variability anomalies: (a) conditional build-time errors and (b) detecting under which conditions a feature has an effect on the compiled code (to avoid duplicate variants). We apply this to four real-world systems: uClibc, BusyBox, eCos, and the Linux kernel. We show that our extraction is 93% and 77% accurate respectively for the two means we use and that we can recover 19% of the existing variability-model constraints using our approach. We qualitatively investigate the non-recovered constraints and find that many of them stem from domain knowledge. For systems with existing variability models, understanding where each constraint comes from can aid in traceability between the code and the model, which can help in debugging conflicts.
More importantly, in systems which do not have a formal variability model, automatically extracting constraints from code provides the basis for reverse engineering a variability model.
Overall, we provide tools and techniques to help maintain and create software product lines. Our work helps to ensure the consistency of variability constraints scattered across SPLs and provides tools to help reverse engineer variability models.
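The notions of dead and undead artifacts from the abstract above can be illustrated with a minimal sketch. It assumes a hypothetical three-feature model: the features `A`, `B`, `C` and the constraints in `is_valid` are invented for illustration, and configurations are enumerated by brute force, which would not scale to a system the size of Linux. It is not the thesis's tooling.

```python
from itertools import product

# Hypothetical feature names, not taken from any real variability model.
FEATURES = ["A", "B", "C"]


def is_valid(cfg):
    """Assumed configuration constraints: B requires A; A and C exclude each other."""
    return (not cfg["B"] or cfg["A"]) and not (cfg["A"] and cfg["C"])


def classify(presence_condition):
    """Classify a conditional artifact by checking it against every valid configuration."""
    all_cfgs = (dict(zip(FEATURES, bits))
                for bits in product([False, True], repeat=len(FEATURES)))
    included = [bool(presence_condition(c)) for c in all_cfgs if is_valid(c)]
    if not any(included):
        return "dead"      # never included in a valid configuration
    if all(included):
        return "undead"    # included in every valid configuration
    return "ok"


# A code block guarded by "#ifdef A" looks fine on its own ...
print(classify(lambda c: c["A"]))
# ... but if the build scripts compile its file only when C is selected,
# the combined condition (A and C) can never hold.
print(classify(lambda c: c["A"] and c["C"]))
# A block guarded by "!B || A" is always included under the assumed constraints.
print(classify(lambda c: (not c["B"]) or c["A"]))
```

The second call shows why conjoining a block's condition with its file's presence condition can expose anomalies that the code and configuration spaces alone would miss.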
A Multi-view Context-aware Approach to Android Malware Detection and Malicious Code Localization
Existing Android malware detection approaches use a variety of features such
as security sensitive APIs, system calls, control-flow structures and
information flows in conjunction with Machine Learning classifiers to achieve
accurate detection. Each of these feature sets provides a unique semantic
perspective (or view) of apps' behaviours with inherent strengths and
limitations. That is, some views are more amenable to detecting certain attacks
but may not be suitable for characterising several others. Most of the
existing malware detection approaches use only one (or a selected few) of the
aforementioned feature sets, which prevents them from detecting a vast majority
of attacks. Addressing this limitation, we propose MKLDroid, a unified
framework that systematically integrates multiple views of apps for performing
comprehensive malware detection and malicious code localisation. The rationale
is that, while a malware app can disguise itself in some views, disguising in
every view while maintaining malicious intent will be much harder.
MKLDroid uses a graph kernel to capture structural and contextual information
from apps' dependency graphs and identify malicious code patterns in each view.
Subsequently, it employs Multiple Kernel Learning (MKL) to find a weighted
combination of the views which yields the best detection accuracy. Besides
multi-view learning, MKLDroid's unique and salient trait is its ability to
locate fine-grained malicious code portions in dependency graphs (e.g.,
methods/classes). Through our large-scale experiments on several datasets
(incl. wild apps), we demonstrate that MKLDroid outperforms three
state-of-the-art techniques consistently in terms of accuracy while
maintaining comparable efficiency. In our malicious code localisation
experiments on a dataset of repackaged malware, MKLDroid was able to identify
all the malicious classes with 94% average recall.
Equivalence Partitioning as a Basis for Dynamic Conditional Invariant Detection
Program invariants are statements asserting properties of programs at
certain points. They can assist developers and testers in understanding the
program, and can be used for automated formal verification of the program.
However, despite their usefulness they are often omitted from code. Dynamic
invariant detection is a technique that discovers program invariants
by observing execution of the program. One type of invariant that presents a
challenge to this technique is the conditional invariant, which is considered
computationally infeasible to compute exhaustively. We present
a new approach to assist conditional invariant detection, by analysing test
suites used to drive the execution of the programs for their use of equivalence
partitioning – a very common testing technique – and inferring conditional
invariants from this information. A prototype implementation, named Yacon,
is developed to work in conjunction with a mature dynamic invariant
detection tool, Daikon. Given a set of splitting conditions, Daikon can use
them to infer conditional invariants. Yacon attempts to recover partitioning
information from a given test suite, producing splitting conditions as a result.
We introduced two strategies to recover partitioning information, one
based on the presence of the boundary value analysis testing technique; the other
based on invariants within the test suite itself. We evaluated the effectiveness
of each recovery strategy and the approach as a whole, and found that our
approach can help make Daikon perform significantly better. However, the
two recovery strategies only work well in limited circumstances, suggesting
room for improvement in finding more effective recovery strategies.
Static optimization in PHP 7
PHP is a dynamically typed programming language commonly used for the server-side implementation of web applications. Approachability and ease of deployment have made PHP one of the most widely used scripting languages for the web, powering important web applications such as WordPress, Wikipedia, and Facebook. PHP's highly dynamic nature, while providing useful language features, also makes it hard to optimize statically.
This paper reports on the implementation of purely static bytecode optimizations for PHP 7, the last major version of PHP. We discuss the challenge of integrating classical compiler optimizations, which have been developed in the context of statically-typed languages, into a programming language that is dynamically and weakly typed, and supports a plethora of dynamic language features. Based on a careful analysis of language semantics, we adapt static single assignment (SSA) form for use in PHP. Combined with type inference, this allows type-based specialization of instructions, as well as the application of various classical SSA-enabled compiler optimizations such as constant propagation or dead code elimination.
We evaluate the impact of the proposed static optimizations on a wide collection of programs, including micro-benchmarks, libraries and web frameworks. Despite the dynamic nature of PHP, our approach achieves an average speedup of 50% on micro-benchmarks, 13% on computationally intensive libraries, as well as 1.1% (MediaWiki) and 3.5% (WordPress) on web applications.
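The two classical SSA-enabled optimizations named in the abstract above, constant propagation and dead code elimination, can be sketched on a toy instruction list. This is a minimal illustration in Python under strong assumptions: straight-line code only, no phi nodes, no types, and an invented `(dest, op, args)` instruction format. It does not reflect PHP's actual bytecode or the paper's implementation.

```python
import operator

# Assumed toy operations; the names are illustrative only.
OPS = {"add": operator.add, "mul": operator.mul}


def fold_constants(program):
    """Replace operands with known constants and fold fully constant operations."""
    known = {}   # SSA name -> constant value (each name is defined exactly once)
    folded = []
    for dest, op, args in program:
        args = tuple(known.get(a, a) for a in args)
        if op == "const":
            known[dest] = args[0]
        elif op in OPS and all(isinstance(a, int) for a in args):
            known[dest] = OPS[op](*args)
            op, args = "const", (known[dest],)
        folded.append((dest, op, args))
    return folded


def eliminate_dead_code(program, live_out):
    """Walk backwards, keeping only definitions that reach a live value."""
    live = set(live_out)
    kept = []
    for dest, op, args in reversed(program):
        if dest in live:
            kept.append((dest, op, args))
            live.update(a for a in args if isinstance(a, str))
    kept.reverse()
    return kept


program = [
    ("a", "const", (2,)),
    ("b", "const", (3,)),
    ("c", "add", ("a", "b")),   # folds to the constant 5
    ("d", "mul", ("c", "x")),   # "x" is an unknown input, so this stays a mul
    ("e", "add", ("a", "a")),   # folds to 4 but is never used
]
optimized = eliminate_dead_code(fold_constants(program), live_out={"d"})
print(optimized)  # [('d', 'mul', (5, 'x'))]
```

Because every SSA name has a single definition, a plain dictionary suffices to track constants here; handling branches and loops would need phi nodes and a fixed-point analysis.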
A multiprocessor implementation of a contextual image processing algorithm
There are no author-identified significant results in this report
Machine Learning Aided Static Malware Analysis: A Survey and Tutorial
Malware analysis and detection techniques have been evolving during the last
decade in response to the development of different malware techniques to evade
network-based and host-based security protections. The fast growth in the variety
and number of malware species has made it very difficult for forensics
investigators to provide a timely response. Therefore, Machine Learning (ML)
aided malware analysis has become a necessity to automate different aspects of
static and dynamic malware investigation. We believe that machine learning
aided static analysis can be used as a methodological approach in technical
Cyber Threats Intelligence (CTI) rather than resource-consuming dynamic malware
analysis that has been thoroughly studied before. In this paper, we address
this research gap by conducting an in-depth survey of different machine
learning methods for classification of static characteristics of 32-bit
malicious Portable Executable (PE32) Windows files and develop a taxonomy for
better understanding of these techniques. Afterwards, we offer a tutorial on
how different machine learning techniques can be utilized in the extraction and
analysis of a variety of static characteristics of PE binaries and evaluate
the accuracy and practical generalization of these techniques. Finally, the results
of an experimental study of all the methods using common data are given to
demonstrate their accuracy and complexity. This paper may serve as a stepping
stone for future researchers in the cross-disciplinary field of machine learning
aided malware forensics.
Comment: 37 pages