3,563 research outputs found

    Variability Anomalies in Software Product Lines

    Get PDF
    Software Product Lines (SPLs) allow variants of a software system to be generated based on the configuration selected by the user. In this thesis, we focus on C based software systems with build-time variability using a build system and C preprocessor. Such systems usually consist of a configuration space, a code space, and a build space. The configuration space describes the features that the user can select and any configuration constraints between them. The features and the constraints between them are commonly documented in a variability model. The code and build spaces contain the actual implementation of the system where the former contains the C code files with conditional compilation directives (e.g., #ifdefs), and the latter contains the build scripts with conditionally compiled files. We study the relationship between the three spaces as follows: (1) we detect variability anomalies which arise due to inconsistencies among the three spaces, and (2) we use anomaly detection techniques to automatically extract configuration constraints from the implementation. For (1), we complement previous research which mainly focused on the relationship between the configuration space and code space. We additionally analyze the build space to ensure that the constraints in all three spaces are consistent. We detect inconsistencies, which we call variability anomalies, in particular dead and undead artifacts. Dead artifacts are conditional artifacts which are not included in any valid configuration while undead artifacts are those which are always included. We look for such anomalies at both the code block and source file levels using the Linux kernel as a case study. Our work shows that almost half the configurable features are only used to control source file compilation in Linux’s build system, KBUILD . We analyze KBUILD to extract file presence conditions which determine under which feature combinations is each file compiled. We show that by considering the build system, we can detect an additional 20% variability anomalies on the code block level when compared to only using the configuration and code spaces. Our work also shows that file level anomalies occur less frequently than block level ones. We analyze the evolution of the detected anomalies and identify some of their causes and fixes. For (2), we develop novel analyses to automatically extract configuration constraints from implementation and compare them to those in existing variability models. We rely on two means of detecting variability anomalies: (a) conditional build-time errors and (b) detecting under which conditions a feature has an effect on the compiled code (to avoid duplicate variants). We apply this to four real-world systems: uClibc, BusyBox, eCos, and the Linux kernel. We show that our extraction is 93% and 77% accurate respectively for the two means we use and that we can recover 19 % of the existing variability-model constraints using our approach. We qualitatively investigate the non-recovered constraints and find that many of them stem from domain knowledge. For systems with existing variability models, understanding where each constraint comes from can aid in traceability between the code and the model which can help in debugging conflicts. More importantly, in systems which do not have a formal variability model, automatically extracting constraints from code provides the basis for reverse engineering a variability model. Overall, we provide tools and techniques to help maintain and create software product lines. Our work helps to ensure the consistency of variability constraints scattered across SPLs and provides tools to help reverse engineer variability models

    A Multi-view Context-aware Approach to Android Malware Detection and Malicious Code Localization

    Full text link
    Existing Android malware detection approaches use a variety of features such as security sensitive APIs, system calls, control-flow structures and information flows in conjunction with Machine Learning classifiers to achieve accurate detection. Each of these feature sets provides a unique semantic perspective (or view) of apps' behaviours with inherent strengths and limitations. Meaning, some views are more amenable to detect certain attacks but may not be suitable to characterise several other attacks. Most of the existing malware detection approaches use only one (or a selected few) of the aforementioned feature sets which prevent them from detecting a vast majority of attacks. Addressing this limitation, we propose MKLDroid, a unified framework that systematically integrates multiple views of apps for performing comprehensive malware detection and malicious code localisation. The rationale is that, while a malware app can disguise itself in some views, disguising in every view while maintaining malicious intent will be much harder. MKLDroid uses a graph kernel to capture structural and contextual information from apps' dependency graphs and identify malice code patterns in each view. Subsequently, it employs Multiple Kernel Learning (MKL) to find a weighted combination of the views which yields the best detection accuracy. Besides multi-view learning, MKLDroid's unique and salient trait is its ability to locate fine-grained malice code portions in dependency graphs (e.g., methods/classes). Through our large-scale experiments on several datasets (incl. wild apps), we demonstrate that MKLDroid outperforms three state-of-the-art techniques consistently, in terms of accuracy while maintaining comparable efficiency. In our malicious code localisation experiments on a dataset of repackaged malware, MKLDroid was able to identify all the malice classes with 94% average recall

    Equivalence Partitioning as a Basis for Dynamic Conditional Invariant Detection

    Get PDF
    Program invariants are statements asserting properties of programs at certain points. They can assist developers and testers in understanding the program, and can be used for automated formal verification of the program. However, despite their usefulness they are often omitted from code. Dynamic invariant detection is a technique that discovers program invariants by observing execution of the program. One type of invariants that presents challenge to this technique is conditional invariants, which are considered to be computationally infeasible to be computed exhaustively. We present a new approach to assist conditional invariants detection, by analysing test suites used to drive the execution of the programs for their use of equivalence partitioning – a very common testing technique – and inferring conditional invariants from this information. A prototype implementation, named Yacon, is developed to work in conjunction with a mature dynamic invariant detection tool Daikon. Given a set of splitting conditions, Daikon can use them to infer conditional invariants. Yacon attempts to recover partitioning information from a given test suite, producing splitting conditions as a result. We introduced two strategies to recover partitioning information, one based on the presence of boundary value analysis testing technique; the other based on invariants within the test suite itself. We evaluated the effectiveness of each recovery strategy and the approach as a whole, and found that our approach can help make Daikon perform significantly better. However, the two recovery strategies only work well in limited circumstances, suggesting possible improvement in finding more effective recovery strategies

    Equivalence Partitioning as a Basis for Dynamic Conditional Invariant Detection

    Get PDF
    Program invariants are statements asserting properties of programs at certain points. They can assist developers and testers in understanding the program, and can be used for automated formal verification of the program. However, despite their usefulness they are often omitted from code. Dynamic invariant detection is a technique that discovers program invariants by observing execution of the program. One type of invariants that presents challenge to this technique is conditional invariants, which are considered to be computationally infeasible to be computed exhaustively. We present a new approach to assist conditional invariants detection, by analysing test suites used to drive the execution of the programs for their use of equivalence partitioning – a very common testing technique – and inferring conditional invariants from this information. A prototype implementation, named Yacon, is developed to work in conjunction with a mature dynamic invariant detection tool Daikon. Given a set of splitting conditions, Daikon can use them to infer conditional invariants. Yacon attempts to recover partitioning information from a given test suite, producing splitting conditions as a result. We introduced two strategies to recover partitioning information, one based on the presence of boundary value analysis testing technique; the other based on invariants within the test suite itself. We evaluated the effectiveness of each recovery strategy and the approach as a whole, and found that our approach can help make Daikon perform significantly better. However, the two recovery strategies only work well in limited circumstances, suggesting possible improvement in finding more effective recovery strategies

    Static optimization in PHP 7

    Get PDF
    PHP is a dynamically typed programming language commonly used for the server-side implementation of web applications. Approachability and ease of deployment have made PHP one of the most widely used scripting languages for the web, powering important web applications such as WordPress, Wikipedia, and Facebook. PHP's highly dynamic nature, while providing useful language features, also makes it hard to optimize statically. This paper reports on the implementation of purely static bytecode optimizations for PHP 7, the last major version of PHP. We discuss the challenge of integrating classical compiler optimizations, which have been developed in the context of statically-typed languages, into a programming language that is dynamically and weakly typed, and supports a plethora of dynamic language features. Based on a careful analysis of language semantics, we adapt static single assignment (SSA) form for use in PHP. Combined with type inference, this allows type-based specialization of instructions, as well as the application of various classical SSA-enabled compiler optimizations such as constant propagation or dead code elimination. We evaluate the impact of the proposed static optimizations on a wide collection of programs, including micro-benchmarks, libraries and web frameworks. Despite the dynamic nature of PHP, our approach achieves an average speedup of 50% on micro-benchmarks, 13% on computationally intensive libraries, as well as 1.1% (MediaWiki) and 3.5% (WordPress) on web applications

    A multiprocessor implementation of a contextual image processing algorithm

    Get PDF
    There are no author-identified significant results in this report

    Machine Learning Aided Static Malware Analysis: A Survey and Tutorial

    Full text link
    Malware analysis and detection techniques have been evolving during the last decade as a reflection to development of different malware techniques to evade network-based and host-based security protections. The fast growth in variety and number of malware species made it very difficult for forensics investigators to provide an on time response. Therefore, Machine Learning (ML) aided malware analysis became a necessity to automate different aspects of static and dynamic malware investigation. We believe that machine learning aided static analysis can be used as a methodological approach in technical Cyber Threats Intelligence (CTI) rather than resource-consuming dynamic malware analysis that has been thoroughly studied before. In this paper, we address this research gap by conducting an in-depth survey of different machine learning methods for classification of static characteristics of 32-bit malicious Portable Executable (PE32) Windows files and develop taxonomy for better understanding of these techniques. Afterwards, we offer a tutorial on how different machine learning techniques can be utilized in extraction and analysis of a variety of static characteristic of PE binaries and evaluate accuracy and practical generalization of these techniques. Finally, the results of experimental study of all the method using common data was given to demonstrate the accuracy and complexity. This paper may serve as a stepping stone for future researchers in cross-disciplinary field of machine learning aided malware forensics.Comment: 37 Page
    • …
    corecore