6,851 research outputs found

    Static Type Analysis by Abstract Interpretation of Python Programs

    Get PDF

    Improving Quality of Software with Foreign Function Interfaces using Static Analysis

    Get PDF
    A Foreign Function Interface (FFI) is a mechanism that allows software written in one host programming language to directly use another foreign programming language by invoking function calls across language boundaries. Today\u27s software development often utilizes FFIs to reuse software components. Examples of such systems are the Java Development Kit (JDK), Android mobile OS, and Python packages in the Fedora LINUX operating systems. The use of FFIs, however, requires extreme care and can introduce undesired side effects that degrade software quality. In this thesis, we aim to improve several quality aspects of software composed of FFIs by applying static analysis. The thesis investigates several particular characteristics of FFIs and studies software bugs caused by the misuse of FFIs. We choose two FFIs, the Java Native Interface (JNI) and the Python/C interface, as the main subjects of this dissertation. To reduce software security vulnerabilities introduced by the JNI, we first propose definitions of new patterns of bugs caused by the improper exception handlings between Java and C. We then present the design and implement a bug finding system to uncover these bugs. To ensure software safety and reliability in multithreaded environment, we present a novel and efficient system that ensures atomicity in the JNI. Finally, to improve software performance and reliability, we design and develop a framework for finding errors in memory management in programs written with the Python/C interface. The framework is built by applying affine abstraction and affine analysis of reference-counts of Python objects. This dissertation offers a comprehensive study of FFIs and software composed of FFIs. The research findings make several contributions to the studies of static analysis and to the improvement of software quality

    Context-Sensitive Abstract Interpretation of Dynamic Languages

    Full text link
    There is a vast gap in the quality of IDE tooling between static languages like Java and dynamic languages like Python or JavaScript. Modern frameworks and libraries in these languages heavily use their dynamic capabilities to achieve the best ergonomics and readability. This has a side effect of making the current generation of IDEs blind to control flow and data flow, which often breaks navigation, autocompletion and refactoring. In this thesis we propose an algorithm that can bridge this gap between tooling for dynamic and static languages by statically analyzing dynamic metaprogramming and runtime reflection in programs. We use a technique called abstract interpretation to partially execute programs and extract information that is usually only available at runtime. Our algorithm has been implemented in a prototype analyzer that can analyze programs written in a subset of JavaScript

    Data-flow Analysis of Programs with Associative Arrays

    Full text link
    Dynamic programming languages, such as PHP, JavaScript, and Python, provide built-in data structures including associative arrays and objects with similar semantics-object properties can be created at run-time and accessed via arbitrary expressions. While a high level of security and safety of applications written in these languages can be of a particular importance (consider a web application storing sensitive data and providing its functionality worldwide), dynamic data structures pose significant challenges for data-flow analysis making traditional static verification methods both unsound and imprecise. In this paper, we propose a sound and precise approach for value and points-to analysis of programs with associative arrays-like data structures, upon which data-flow analyses can be built. We implemented our approach in a web-application domain-in an analyzer of PHP code.Comment: In Proceedings ESSS 2014, arXiv:1405.055

    A machine learning approach to portfolio pricing and risk management for high-dimensional problems

    Full text link
    We present a general framework for portfolio risk management in discrete time, based on a replicating martingale. This martingale is learned from a finite sample in a supervised setting. The model learns the features necessary for an effective low-dimensional representation, overcoming the curse of dimensionality common to function approximation in high-dimensional spaces. We show results based on polynomial and neural network bases. Both offer superior results to naive Monte Carlo methods and other existing methods like least-squares Monte Carlo and replicating portfolios.Comment: 30 pages (main), 10 pages (appendix), 3 figures, 22 table

    Security analyses for detecting deserialisation vulnerabilities : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Palmerston North, New Zealand

    Get PDF
    An important task in software security is to identify potential vulnerabilities. Attackers exploit security vulnerabilities in systems to obtain confidential information, to breach system integrity, and to make systems unavailable to legitimate users. In recent years, particularly 2012, there has been a rise in reported Java vulnerabilities. One type of vulnerability involves (de)serialisation, a commonly used feature to store objects or data structures to an external format and restore them. In 2015, a deserialisation vulnerability was reported involving Apache Commons Collections, a popular Java library, which affected numerous Java applications. Another major deserialisation-related vulnerability that affected 55\% of Android devices was reported in 2015. Both of these vulnerabilities allowed arbitrary code execution on vulnerable systems by malicious users, a serious risk, and this came as a call for the Java community to issue patches to fix serialisation related vulnerabilities in both the Java Development Kit and libraries. Despite attention to coding guidelines and defensive strategies, deserialisation remains a risky feature and a potential weakness in object-oriented applications. In fact, deserialisation related vulnerabilities (both denial-of-service and remote code execution) continue to be reported for Java applications. Further, deserialisation is a case of parsing where external data is parsed from their external representation to a program's internal data structures and hence, potentially similar vulnerabilities can be present in parsers for file formats and serialisation languages. The problem is, given a software package, to detect either injection or denial-of-service vulnerabilities and propose strategies to prevent attacks that exploit them. The research reported in this thesis casts detecting deserialisation related vulnerabilities as a program analysis task. The goal is to automatically discover this class of vulnerabilities using program analysis techniques, and to experimentally evaluate the efficiency and effectiveness of the proposed methods on real-world software. We use multiple techniques to detect reachability to sensitive methods and taint analysis to detect if untrusted user-input can result in security violations. Challenges in using program analysis for detecting deserialisation vulnerabilities include addressing soundness issues in analysing dynamic features in Java (e.g., native code). Another hurdle is that available techniques mostly target the analysis of applications rather than library code. In this thesis, we develop techniques to address soundness issues related to analysing Java code that uses serialisation, and we adapt dynamic techniques such as fuzzing to address precision issues in the results of our analysis. We also use the results from our analysis to study libraries in other languages, and check if they are vulnerable to deserialisation-type attacks. We then provide a discussion on mitigation measures for engineers to protect their software against such vulnerabilities. In our experiments, we show that we can find unreported vulnerabilities in Java code; and how these vulnerabilities are also present in widely-used serialisers for popular languages such as JavaScript, PHP and Rust. In our study, we discovered previously unknown denial-of-service security bugs in applications/libraries that parse external data formats such as YAML, PDF and SVG

    Doctor of Philosophy

    Get PDF
    dissertationToday's smartphones house private and confidential data ubiquitously. Mobile apps running on the devices can leak sensitive information by accident or intentionally. To understand application behaviors before running a program, we need to statically analyze it, tracking what data are accessed, where sensitive data ow, and what operations are performed with the data. However, automated identification of malicious behaviors in Android apps is challenging: First, there is a primary challenge in analyzing object-oriented programs precisely, soundly and efficiently, especially in the presence of exceptions. Second, there is an Android-specific challenge|asynchronous execution of multiple entry points. Third, the maliciousness of any given behavior is application-dependent and subject to human judgment. In this work, I develop a generic, highly precise static analysis of object-oriented code with multiple entry points, on which I construct an eective malware identification system with a human in the loop. Specically, I develop a new analysis-pushdown exception-ow analysis, to generalize the analysis of normal control flows and exceptional flows in object-oriented programs. To rene points-to information, I generalize abstract garbage collection to object-oriented programs and enhance it with liveness analysis for even better precision. To tackle Android-specic challenges, I develop multientry point saturation to approximate the eect of arbitrary asynchronous events. To apply the analysis techniques to security, I develop a static taint- ow analysis to track and propagate tainted sensitive data in the push-down exception-flow framework. To accelerate the speed of static analysis, I develop a compact and ecient encoding scheme, called G odel hashes, and integrate it into the analysis framework. All the techniques are realized and evaluated in a system, named AnaDroid. AnaDroid is designed with a human in the loop to specify analysis conguration, properties of interest and then to make the nal judgment and identify where the maliciousness is, based on analysis results. The analysis results include control- ow graphs highlighting suspiciousness, permission and risk-ranking reports. The experiments show that AnaDroid can lead to precise and fast identication of common classes of Android malware
    • …
    corecore