441 research outputs found

    Coverage-Based Debloating for Java Bytecode

    Full text link
    Software bloat is code that is packaged in an application but is actually not necessary to run the application. The presence of software bloat is an issue for security, for performance, and for maintenance. In this paper, we introduce a novel technique for debloating Java bytecode, which we call coverage-based debloating. We leverage a combination of state-of-the-art Java bytecode coverage tools to precisely capture what parts of a project and its dependencies are used at runtime. Then, we automatically remove the parts that are not covered to generate a debloated version of the compiled project. We successfully generate debloated versions of 220 open-source Java libraries, which are syntactically correct and preserve their original behavior according to the workload. Our results indicate that 68.3% of the libraries' bytecode and 20.5% of their total dependencies can be removed through coverage-based debloating. Meanwhile, we present the first experiment that assesses the utility of debloated libraries with respect to client applications that reuse them. We show that 80.9% of the clients with at least one test that uses the library successfully compile and pass their test suite when the original library is replaced by its debloated version

    Code Coverage Measurement and Fault Localization Approaches

    Get PDF
    Code coverage measurement plays an important role in white-box testing, both in industrial practice and academic research. Several areas are highly dependent on code coverage as well, including test case generation, test prioritization, fault localization, and others. Out of these areas, this dissertation focuses on two main topics, and the thesis points are divided into two parts accordingly. The first part consists of one thesis point that discusses the differences between methods for measuring code coverage in Java and the effects of these differences. The second part focuses on a fault localization technique called spectrum-based fault localization that utilizes code coverage to estimate the risk of each program element being faulty. More specifically, the corresponding two thesis points are discussing the improvement of the efficiency of spectrum-based approaches by incorporating external information, e.g., users’ knowledge, and context data extracted from call chains

    Workload characterization of JVM languages

    Get PDF
    Being developed with a single language in mind, namely Java, the Java Virtual Machine (JVM) nowadays is targeted by numerous programming languages. Automatic memory management, Just-In-Time (JIT) compilation, and adaptive optimizations provided by the JVM make it an attractive target for different language implementations. Even though being targeted by so many languages, the JVM has been tuned with respect to characteristics of Java programs only -- different heuristics for the garbage collector or compiler optimizations are focused more on Java programs. In this dissertation, we aim at contributing to the understanding of the workloads imposed on the JVM by both dynamically-typed and statically-typed JVM languages. We introduce a new set of dynamic metrics and an easy-to-use toolchain for collecting the latter. We apply our toolchain to applications written in six JVM languages -- Java, Scala, Clojure, Jython, JRuby, and JavaScript. We identify differences and commonalities between the examined languages and discuss their implications. Moreover, we have a close look at one of the most efficient compiler optimizations - method inlining. We present the decision tree of the HotSpot JVM's JIT compiler and analyze how well the JVM performs in inlining the workloads written in different JVM languages

    Bytecode-Based Multiple Condition Coverage: An Initial Investigation

    Get PDF
    Masking occurs when one condition prevents another from influencing the output of a Boolean expression. Adequacy criteria such as Multiple Condition Coverage (MCC) overcome masking within one expression, but offer no guarantees about subsequent expressions. As a result, a Boolean expression written as a single complex statement will yield more effective test cases than when written as a series of simple expressions. Many approaches to automated test case generation for Java operate not on the source code, but on bytecode. The transformation to bytecode simplifies complex expressions into multiple expressions, introducing masking. We propose Bytecode-MCC, a new adequacy criterion designed to group bytecode expressions and reformulate them into complex expressions. Bytecode-MCC should produce test obligations that are more likely to reveal faults in program logic than tests covering the simplified bytecode.A preliminary study shows potential improvements from attaining Bytecode-MCC coverage. However, Bytecode-MCC is difficult to optimize, and means of increasing coverage are needed before the technique can make a difference in practice. We propose potential methods to improve coverage

    Quantifying and Predicting the Influence of Execution Platform on Software Component Performance

    Get PDF
    The performance of software components depends on several factors, including the execution platform on which the software components run. To simplify cross-platform performance prediction in relocation and sizing scenarios, a novel approach is introduced in this thesis which separates the application performance profile from the platform performance profile. The approach is evaluated using transparent instrumentation of Java applications and with automated benchmarks for Java Virtual Machines

    Efficient branch and node testing

    Get PDF
    Software testing evaluates the correctness of a program’s implementation through a test suite. The quality of a test case or suite is assessed with a coverage metric indicating what percentage of a program’s structure was exercised (covered) during execution. Coverage of every execution path is impossible due to infeasible paths and loops that result in an exponential or infinite number of paths. Instead, metrics such as the number of statements (nodes) or control-flow branches covered are used. Node and branch coverage require instrumentation probes to be present during program runtime. Traditionally, probes were statically inserted during compilation. These static probes remain even after coverage is recorded, incurring unnecessary overhead, reducing the number of tests that can be run, or requiring large amounts of memory In this dissertation, I present three novel techniques for improving branch and node coverage performance for the Java runtime. First, Demand-driven Structural Testing (DDST) uses dynamic insertion and removal of probes so they can be removed after recording coverage, avoiding the unnecessary overhead of static instrumentation. DDST is built on a new framework for developing and researching coverage techniques, Jazz. DDST for node coverage averages 19.7% faster than statically-inserted instrumentation on an industry-standard benchmark suite, SPECjvm98. Due to DDST’s higher-cost probes, no single branch coverage technique performs best on all programs or methods. To address this, I developed Hybrid Structural Testing (HST). HST combines different test techniques, including static and DDST, into one run. HST uses a cost model for analysis, reducing the cost of branch coverage testing an average of 48% versus Static and 56% versus DDST on SPECjvm98. HST never chooses certain techniques due to expensive analysis. I developed a third technique, Test Plan Caching (TPC), that exploits the inherent repetition in testing over a suite. TPC saves analysis results to avoid recomputation. Combined with HST, TPC produces a mix of techniques that record coverage quickly and efficiently. My three techniques reduce the average cost of branch coverage by 51.6–90.8% over previous approaches on SPECjvm98, allowing twice as many test cases in a given time budget

    Malware Analysis and Privacy Policy Enforcement Techniques for Android Applications

    Get PDF
    The rapid increase in mobile malware and deployment of over-privileged applications over the years has been of great concern to the security community. Encroaching on user’s privacy, mobile applications (apps) increasingly exploit various sensitive data on mobile devices. The information gathered by these applications is sufficient to uniquely and accurately profile users and can cause tremendous personal and financial damage. On Android specifically, the security and privacy holes in the operating system and framework code has created a whole new dynamic for malware and privacy exploitation. This research work seeks to develop novel analysis techniques that monitor Android applications for possible unwanted behaviors and then suggest various ways to deal with the privacy leaks associated with them. Current state-of-the-art static malware analysis techniques on Android-focused mainly on detecting known variants without factoring any kind of software obfuscation. The dynamic analysis systems, on the other hand, are heavily dependent on extending the Android OS and/or runtime virtual machine. These methodologies often tied the system to a single Android version and/or kernel making it very difficult to port to a new device. In privacy, accesses to the database system’s objects are not controlled by any security check beyond overly-broad read/write permissions. This flawed model exposes the database contents to abuse by privacy-agnostic apps and malware. This research addresses the problems above in three ways. First, we developed a novel static analysis technique that fingerprints known malware based on three-level similarity matching. It scores similarity as a function of normalized opcode sequences found in sensitive functional modules and application permission requests. Our system has an improved detection ratio over current research tools and top COTS anti-virus products while maintaining a high level of resiliency to both simple and complex obfuscation. Next, we augment the signature-related weaknesses of our static classifier with a hybrid analysis system which incorporates bytecode instrumentation and dynamic runtime monitoring to examine unknown malware samples. Using the concept of Aspect-oriented programming, this technique involves recompiling security checking code into an unknown binary for data flow analysis, resource abuse tracing, and analytics of other suspicious behaviors. Our system logs all the intercepted activities dynamically at runtime without the need for building custom kernels. Finally, we designed a user-level privacy policy enforcement system that gives users more control over their personal data saved in the SQLite database. Using bytecode weaving for query re-writing and enforcing access control, our system forces new policies at the schema, column, and entity levels of databases without rooting or voiding device warranty

    Enhancing Search-based Testing with Testability Transformations for Existing APIs

    Get PDF
    Search-based software testing (SBST) has been shown to be an effective technique to generate test cases automatically. Its effectiveness strongly depends on the guidance of the fitness function. Unfortunately, a common issue in SBST is the so-called flag problem, where the fitness landscape presents a plateau that provides no guidance to the search. In this paper, we provide a series of novel testability transformations aimed at providing guidance in the context of commonly used API calls (e.g., strings that need to be converted into valid date/time objects). We also provide specific transformations aimed at helping the testing of REST Web Services. We implemented our novel techniques as an extension to EvoMaster, a SBST tool that generates system level test cases. Experiments on nine open-source REST web services, as well as an industrial web service, show that our novel techniques improve performance significantlyacceptedVersio
    • …
    corecore