1,362 research outputs found

    Value Partitioning: A Lightweight Approach to Relational Static Analysis for JavaScript

    Get PDF
    In static analysis of modern JavaScript libraries, relational analysis at key locations is critical to provide sound and useful results. Prior work addresses this challenge by the use of various forms of trace partitioning and syntactic patterns, which is fragile and does not scale well, or by incorporating complex backwards analysis. In this paper, we propose a new lightweight variant of trace partitioning named value partitioning that refines individual abstract values instead of entire abstract states. We describe how this approach can effectively capture important relational properties involving dynamic property accesses, functions with free variables, and predicate functions. Furthermore, we extend an existing JavaScript analyzer with value partitioning and demonstrate experimentally that it is a simple, precise, and efficient alternative to the existing approaches for analyzing widely used JavaScript libraries

    Static program analysis for string manipulation languages

    Get PDF
    In recent years, dynamic languages, such as JavaScript or Python, have been increasingly used in a wide range of fields and applications. Their tricky and misunderstood behaviors pose a hard challenge for static analysis of these programming languages. A key aspect of any dynamic language program is the multiple usage of strings, since they can be implicitly converted to another type value, transformed by string-to-code primitives or used to access an object-property. Unfortunately, string analyses for dynamic languages still lack precision and do not take into account some important string features. Moreover, string obfuscation is very popular in the context of dynamic language malicious code, for example, to hide code information inside strings and then to dynamically transform strings into executable code. In this scenario, more precise string analyses become a necessity. This paper is placed in the context of static string analysis by abstract interpretation and proposes a new semantics for string analysis, placing a first step for handling dynamic languages string features

    Static Analysis for Asynchronous JavaScript Programs

    Get PDF
    Asynchrony has become an inherent element of JavaScript, as an effort to improve the scalability and performance of modern web applications. To this end, JavaScript provides programmers with a wide range of constructs and features for developing code that performs asynchronous computations, including but not limited to timers, promises, and non-blocking I/O. However, the data flow imposed by asynchrony is implicit, and not always well-understood by the developers who introduce many asynchrony-related bugs to their programs. Worse, there are few tools and techniques available for analyzing and reasoning about such asynchronous applications. In this work, we address this issue by designing and implementing one of the first static analysis schemes capable of dealing with almost all the asynchronous primitives of JavaScript up to the 7th edition of the ECMAScript specification. Specifically, we introduce the callback graph, a representation for capturing data flow between asynchronous code. We exploit the callback graph for designing a more precise analysis that respects the execution order between different asynchronous functions. We parameterize our analysis with one novel context-sensitivity flavor, and we end up with multiple analysis variations for building callback graph. We performed a number of experiments on a set of hand-written and real-world JavaScript programs. Our results show that our analysis can be applied to medium-sized programs achieving 79% precision, on average. The findings further suggest that analysis sensitivity is beneficial for the vast majority of the benchmarks. Specifically, it is able to improve precision by up to 28.5%, while it achieves an 88% precision on average without highly sacrificing performance

    Static Program Analysis for String Manipulation Languages

    Get PDF
    In recent years, dynamic languages, such as JavaScript or Python, have been increasingly used in a wide range of fields and applications. Their tricky and misunderstood behaviors pose a hard challenge for static analysis of these programming languages. A key aspect of any dynamic language program is the multiple usage of strings, since they can be implicitly converted to another type value, transformed by string-to-code primitives or used to access an object-property. Unfortunately, string analyses for dynamic languages still lack precision and do not take into account some important string features. Moreover, string obfuscation is very popular in the context of dynamic language malicious code, for example, to hide code information inside strings and then to dynamically transform strings into executable code. In this scenario, more precise string analyses become a necessity. This paper is placed in the context of static string analysis by abstract interpretation and proposes a new semantics for string analysis, placing a first step for handling dynamic languages string features.Comment: In Proceedings VPT 2019, arXiv:1908.0672

    Reviewer Integration and Performance Measurement for Malware Detection

    Full text link
    We present and evaluate a large-scale malware detection system integrating machine learning with expert reviewers, treating reviewers as a limited labeling resource. We demonstrate that even in small numbers, reviewers can vastly improve the system's ability to keep pace with evolving threats. We conduct our evaluation on a sample of VirusTotal submissions spanning 2.5 years and containing 1.1 million binaries with 778GB of raw feature data. Without reviewer assistance, we achieve 72% detection at a 0.5% false positive rate, performing comparable to the best vendors on VirusTotal. Given a budget of 80 accurate reviews daily, we improve detection to 89% and are able to detect 42% of malicious binaries undetected upon initial submission to VirusTotal. Additionally, we identify a previously unnoticed temporal inconsistency in the labeling of training datasets. We compare the impact of training labels obtained at the same time training data is first seen with training labels obtained months later. We find that using training labels obtained well after samples appear, and thus unavailable in practice for current training data, inflates measured detection by almost 20 percentage points. We release our cluster-based implementation, as well as a list of all hashes in our evaluation and 3% of our entire dataset.Comment: 20 papers, 11 figures, accepted at the 13th Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA 2016

    Static Analysis for ECMAScript String Manipulation Programs

    Get PDF
    In recent years, dynamic languages, such as JavaScript or Python, have been increasingly used in a wide range of fields and applications. Their tricky and misunderstood behaviors pose a great challenge for static analysis of these languages. A key aspect of any dynamic language program is the multiple usage of strings, since they can be implicitly converted to another type value, transformed by string-to-code primitives or used to access an object-property. Unfortunately, string analyses for dynamic languages still lack precision and do not take into account some important string features. In this scenario, more precise string analyses become a necessity. The goal of this paper is to place a first step for precisely handling dynamic language string features. In particular, we propose a new abstract domain approximating strings as finite state automata and an abstract interpretation-based static analysis for the most common string manipulating operations provided by the ECMAScript specification. The proposed analysis comes with a prototype static analyzer implementation for an imperative string manipulating language, allowing us to show and evaluate the improved precision of the proposed analysis

    Security analyses for detecting deserialisation vulnerabilities : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Palmerston North, New Zealand

    Get PDF
    An important task in software security is to identify potential vulnerabilities. Attackers exploit security vulnerabilities in systems to obtain confidential information, to breach system integrity, and to make systems unavailable to legitimate users. In recent years, particularly 2012, there has been a rise in reported Java vulnerabilities. One type of vulnerability involves (de)serialisation, a commonly used feature to store objects or data structures to an external format and restore them. In 2015, a deserialisation vulnerability was reported involving Apache Commons Collections, a popular Java library, which affected numerous Java applications. Another major deserialisation-related vulnerability that affected 55\% of Android devices was reported in 2015. Both of these vulnerabilities allowed arbitrary code execution on vulnerable systems by malicious users, a serious risk, and this came as a call for the Java community to issue patches to fix serialisation related vulnerabilities in both the Java Development Kit and libraries. Despite attention to coding guidelines and defensive strategies, deserialisation remains a risky feature and a potential weakness in object-oriented applications. In fact, deserialisation related vulnerabilities (both denial-of-service and remote code execution) continue to be reported for Java applications. Further, deserialisation is a case of parsing where external data is parsed from their external representation to a program's internal data structures and hence, potentially similar vulnerabilities can be present in parsers for file formats and serialisation languages. The problem is, given a software package, to detect either injection or denial-of-service vulnerabilities and propose strategies to prevent attacks that exploit them. The research reported in this thesis casts detecting deserialisation related vulnerabilities as a program analysis task. The goal is to automatically discover this class of vulnerabilities using program analysis techniques, and to experimentally evaluate the efficiency and effectiveness of the proposed methods on real-world software. We use multiple techniques to detect reachability to sensitive methods and taint analysis to detect if untrusted user-input can result in security violations. Challenges in using program analysis for detecting deserialisation vulnerabilities include addressing soundness issues in analysing dynamic features in Java (e.g., native code). Another hurdle is that available techniques mostly target the analysis of applications rather than library code. In this thesis, we develop techniques to address soundness issues related to analysing Java code that uses serialisation, and we adapt dynamic techniques such as fuzzing to address precision issues in the results of our analysis. We also use the results from our analysis to study libraries in other languages, and check if they are vulnerable to deserialisation-type attacks. We then provide a discussion on mitigation measures for engineers to protect their software against such vulnerabilities. In our experiments, we show that we can find unreported vulnerabilities in Java code; and how these vulnerabilities are also present in widely-used serialisers for popular languages such as JavaScript, PHP and Rust. In our study, we discovered previously unknown denial-of-service security bugs in applications/libraries that parse external data formats such as YAML, PDF and SVG
    • …
    corecore