
    Automatic Root Cause Quantification for Missing Edges in JavaScript Call Graphs (Extended Version)

    Building sound and precise static call graphs for real-world JavaScript applications poses an enormous challenge, due to many hard-to-analyze language features. Further, the relative importance of these features may vary depending on the call graph algorithm being used and the class of applications being analyzed. In this paper, we present a technique to automatically quantify the relative importance of different root causes of call graph unsoundness for a set of target applications. The technique works by identifying the dynamic function data flows relevant to each call edge missed by the static analysis, correctly handling cases with multiple root causes and inter-dependent calls. We apply our approach to perform a detailed study of the recall of a state-of-the-art call graph construction technique on a set of framework-based web applications. The study yielded a number of useful insights. We found that while dynamic property accesses were the most common root cause of missed edges across the benchmarks, other root causes varied in importance depending on the benchmark, potentially useful information for an analysis designer. Further, with our approach, we could quickly identify and fix a recall issue in the call graph builder we studied, and also quickly assess whether a recent analysis technique for Node.js-based applications would be helpful for browser-based code. All of our code and data is publicly available, and many components of our technique can be re-used to facilitate future studies. Comment: Extended version of ECOOP'22 paper (with appendix).

    Identifying Bugs in Make and JVM-Oriented Builds

    Incremental and parallel builds are crucial features of modern build systems. Parallelism enables fast builds by running independent tasks simultaneously, while incrementality saves time and computing resources by processing only the build operations that were affected by a particular code change. Writing build definitions that lead to error-free incremental and parallel builds is a challenging task. This is mainly because developers are often unable to predict the effects of build operations on the file system and how different build operations interact with each other. Faulty build scripts may seriously degrade the reliability of automated builds, as they cause build failures and non-deterministic or incorrect build results. To reason about arbitrary build executions, we present buildfs, a generally applicable model that takes into account the specification (as declared in build scripts) and the actual behavior (low-level file system operations) of build operations. We then formally define different types of faults related to incremental and parallel builds in terms of the conditions under which a file system operation violates the specification of a build operation. Our testing approach, which relies on the proposed model, analyzes the execution of a single full build, translates it into buildfs, and uncovers faults by checking for corresponding violations. We evaluate the effectiveness, efficiency, and applicability of our approach by examining hundreds of Make and Gradle projects. Notably, our method is the first to handle Java-oriented build systems. The results indicate that our approach is (1) able to uncover several important issues (245 issues found in 45 open-source projects have been confirmed and fixed by the upstream developers), and (2) orders of magnitude faster than a state-of-the-art tool for Make builds.

    H-CFA: a Simplified Approach for Pushdown Control Flow Analysis

    In control flow analysis (CFA), call/return mismatch is a problem that reduces analysis precision. So-called k-CFA uses bounded call-strings to obtain limited call/return matching, but it has a serious performance problem due to its coupling of call/return matching with context-sensitivity of values. CFA2 and PDCFA are the first two algorithms to bring the pushdown (context-free reachability) approach to CFA, providing perfect call/return matching. However, CFA2 and PDCFA both need significant engineering effort to implement. The abstracting abstract machine (AAM), a configurable framework for constructing abstract interpreters, introduces store-allocated continuations that make the soundness of abstract interpreters easily obtainable. Recently, two related approaches (AAC and P4F) have provided call/return matching using AAM by modeling the call stack as a pushdown system. However, AAC incurs high overhead and is hard to understand, while P4F cannot compute monovariant analysis. To overcome these shortcomings, we developed a new method, h-CFA, to address the call/return mismatch problem. h-CFA records the program execution history during abstract interpretation and uses it to avoid the control flow merging that causes call/return mismatch. Our method uses AAM and is very easy to implement for programs in ANF style. ANF is a popular intermediate representation that converts all complex intra-procedural control flow to linear let-bindings and assigns a syntactic variable to each sub-expression. In addition, our method reveals an essential property of any pushdown CFA, which we exploited in the development of a static analyzer for JavaScript named JsCFA. This application of the essential property avoids recording the program execution history, so source programs are no longer required to be in ANF. Meanwhile, JsCFA adopts a technique to solve the environment problem, or fake rebinding, which eliminates more defects of monovariant analysis. This, together with exact call/return matching, yields more precise analysis and better performance. Moreover, JsCFA supports a configurable interface for adding context-sensitivity to selected areas of programs. JsCFA applies the interface to improve the analysis precision for runtime object extensions. Finally, we quantitatively evaluated the performance of JsCFA.

    Automatic Creation of SQL Injection and Cross-Site Scripting Attacks

    We present a technique for finding security vulnerabilities in Web applications. SQL Injection (SQLI) and cross-site scripting (XSS) attacks are widespread forms of attack in which the attacker crafts the input to the application to access or modify user data and execute malicious code. In the most serious attacks (called second-order, or persistent, XSS), an attacker can corrupt a database so as to cause subsequent users to execute malicious code. This paper presents an automatic technique for creating inputs that expose SQLI and XSS vulnerabilities. The technique generates sample inputs, symbolically tracks taints through execution (including through database accesses), and mutates the inputs to produce concrete exploits. Ours is the first analysis of which we are aware that precisely addresses second-order XSS attacks. Our technique creates real attack vectors, has few false positives, incurs no runtime overhead for the deployed application, works without requiring modification of application code, and handles dynamic programming-language constructs. We implemented the technique for PHP, in a tool Ardilla. We evaluated Ardilla on five PHP applications and found 68 previously unknown vulnerabilities (23 SQLI, 33 first-order XSS, and 12 second-order XSS).

    Fuzzy Logic Based Software Product Quality Model for Execution Tracing

    This report presents the research carried out in the area of software product quality modelling. Its main endeavour is to consider software product quality with regard to maintainability. Supporting this aim, execution tracing quality, which is at present a neglected property of software product quality in the quality frameworks under investigation, needs to be described by a model that can be linked to the overall software product quality frameworks. The report includes a concise description of the research objectives: (1) the thorough investigation of software product quality frameworks from the point of view of the quality property analysability with regard to execution tracing; (2) extension possibilities of software product quality frameworks; and (3) a pilot quality model developed for execution tracing quality, which is capable of capturing the subjective uncertainty associated with software quality measurement. The report closes with concluding remarks: (1) the present software quality frameworks do not exhibit any property to describe execution tracing quality, (2) execution tracing has a significant impact on the analysability of software systems that increases with the complexity, and (3) the uncertainty associated with execution tracing quality can adequately be expressed by type-1 fuzzy logic. The section on potential future work outlines directions in which the research could be continued. Findings of the research were summarized in two research reports, which were also incorporated in the thesis and submitted for publication: 1. Tamas Galli, Francisco Chiclana, Jenny Carter, Helge Janicke, “Towards Introducing Execution Tracing to Software Product Quality Frameworks,” Acta Polytechnica Hungarica, vol. 11, no. 3, pp. 5-24, 2014. doi: 10.12700/APH.11.03.2014.03.1 2. Tamas Galli, Francisco Chiclana, Jenny Carter, Helge Janicke, “Modelling Execution Tracing Quality by Means of Type-1 Fuzzy Logic,” Acta Polytechnica Hungarica, vol. 10, no. 8, pp. 49-67, 2013. doi: 10.12700/APH.10.08.2013.8.

    Mitigating the Uncertainty and Imprecision of Log-Based Code Coverage Without Requiring Additional Logging Statements

    Understanding code coverage is an important precursor to software maintenance activities (e.g., better testing). Although modern code coverage tools provide key insights, they typically rely on code instrumentation, resulting in significant performance overhead. An alternative approach to code instrumentation is to process an application’s source code and the associated log traces in tandem. This so-called “log-based code coverage” approach does not impose the same performance overhead as code instrumentation. Previous work has introduced LogCoCo, a tool that implements log-based code coverage for Java. While LogCoCo breaks important new ground, it has fundamental limitations, namely uncertainty due to the lack of logging statements in conditional branches, and imprecision caused by dependency injection. In this thesis, we propose Log2Cov, a tool that generates log-based code coverage for programs written in Python and addresses uncertainty and imprecision issues. We evaluate Log2Cov on three large and active open-source systems. More specifically, we compare the performance of Log2Cov to that of Coverage.py, an instrumentation-based coverage tool for Python. Our results indicate that 1) Log2Cov achieves high precision without introducing runtime overhead; and 2) uncertainty and imprecision can be reduced by up to 11% by statically analyzing the program’s source code and execution logs, without requiring additional logging instrumentation from developers. While our enhancements make substantial improvements, we find that future work is needed to handle conditional statements and exception handling blocks to achieve parity with instrumentation-based approaches. We conclude the thesis by drawing attention to these promising directions for future work.