5 research outputs found

    Polymorphic Type Inference for the JNI

    Get PDF
    We present a multi-lingual type inference system for checking type safety of programs that use the Java Native Interface (JNI). The JNI uses specially-formatted strings to represent class and field names as well as method signatures, and so our type system tracks the flow of string constants through the program. Our system embeds string variables in types, and as those variables are resolved to string constants during inference they are replaced with the structured types the constants represent. This restricted form of dependent types allows us to directly assign type signatures to each of the more than 200 functions in the JNI. Moreover, it allows us to infer types for user-defined functions that are parameterized by Java type strings, which we have found to be common practice. Our inference system allows such functions to be treated polymorphically by using instantiation constraints, solved with semi-unification, at function calls. Finally, we have implemented our system and applied it to a small set of benchmarks. Although semi-unification is undecidable, we found our system to be scalable and effective in practice. We discovered 155 errors 36 cases of suspicious programming practices in our benchmarks

    Improving Quality of Software with Foreign Function Interfaces using Static Analysis

    Get PDF
    A Foreign Function Interface (FFI) is a mechanism that allows software written in one host programming language to directly use another foreign programming language by invoking function calls across language boundaries. Today\u27s software development often utilizes FFIs to reuse software components. Examples of such systems are the Java Development Kit (JDK), Android mobile OS, and Python packages in the Fedora LINUX operating systems. The use of FFIs, however, requires extreme care and can introduce undesired side effects that degrade software quality. In this thesis, we aim to improve several quality aspects of software composed of FFIs by applying static analysis. The thesis investigates several particular characteristics of FFIs and studies software bugs caused by the misuse of FFIs. We choose two FFIs, the Java Native Interface (JNI) and the Python/C interface, as the main subjects of this dissertation. To reduce software security vulnerabilities introduced by the JNI, we first propose definitions of new patterns of bugs caused by the improper exception handlings between Java and C. We then present the design and implement a bug finding system to uncover these bugs. To ensure software safety and reliability in multithreaded environment, we present a novel and efficient system that ensures atomicity in the JNI. Finally, to improve software performance and reliability, we design and develop a framework for finding errors in memory management in programs written with the Python/C interface. The framework is built by applying affine abstraction and affine analysis of reference-counts of Python objects. This dissertation offers a comprehensive study of FFIs and software composed of FFIs. The research findings make several contributions to the studies of static analysis and to the improvement of software quality

    Towards Better Static Analysis Security Testing Methodologies

    Get PDF
    Software vulnerabilities have been a significant attack surface used in cyberattacks, which have been escalating recently. Software vulnerabilities have caused substantial damage, and thus there are many techniques to guard against them. Nevertheless, detecting and eliminating software vulnerabilities from the source code is the best and most effective solution in terms of protection and cost. Static Analysis Security Testing (SAST) tools spot vulnerabilities and help programmers to remove the vulnerabilities. The fundamental problem is that modern software continues to evolve and shift, making detecting vulnerabilities more difficult. Hence, this thesis takes a step toward highlighting the features required to be present in the SAST tools to address software vulnerabilities in modern software. The thesis’s end goal is to introduce SAST methods and tools to detect the dominant type of software vulnerabilities in modern software. The investigation first focuses on state-of-theart SAST tools when working with large-scale modern software. The research examines how different state-of-the-art SAST tools react to different types of warnings over time, and measures SAST tools precision of different types of warnings. The study presumption is that the SAST tools’ precision can be obtained from studying real-world projects’ history and SAST tools that generated warnings over time. The empirical analysis in this study then takes a further step to look at the problem from a different angle, starting at the real-world vulnerabilities detected by individuals and published in well-known vulnerabilities databases. Android application vulnerabilities are used as an example of modern software vulnerabilities. This study aims to measure the recall of SAST tools when they work with modern software vulnerabilities and understand how software vulnerabilities manifest in the real world. We find that buffer errors that belong to the input validation and representation class of vulnerability dominate modern software. Also, we find that studied state-of-the-art SAST tools failed to identify real-world vulnerabilities. To address the issue of detecting vulnerabilities in modern software, we introduce two methodologies. The first methodology is a coarse-grain method that targets helping taint static analysis methods to tackle two aspects of the complexity of modern software. One aspect is that one vulnerability can be scattered across different languages in a single application making the analysis harder to achieve. The second aspect is that the number of sources and sinks is high and increasing over time, which can be hard for taint analysis to cover such a high number of sources and sinks. We implement the proposed methodology in a tool called Source Sink (SoS) that filters out the source and sink pairs that do not have feasible paths. Then, another fine-grain methodology focuses on discovering buffer errors that occur in modern software. The method performs taint analysis to examine the reachability between sources and sinks and looks for "validators" that validates the untrusted input. We implemented methodology in a tool called Buffer Error Finder (BEFinder)

    Building a Typed Scripting Language

    Get PDF
    Since the 1990s, scripting languages (e.g. Python, Ruby, JavaScript, and many others) have gained widespread popularity. Features such as ad-hoc data manipulation, dynamic structural typing, and terse syntax permit rapid engineering and improve developer productivity. Unfortunately, programs written in scripting languages execute slower and are less scalable than those written in traditional languages (such as C or Java) due to the challenge of statically analyzing scripting languages' semantics. Although various research projects have made progress on this front, corner cases in the semantics of existing scripting languages continue to defy static analysis and software engineers must generally still choose between program performance and programmer performance when selecting a language. We address that dichotomy in this dissertation by designing a scripting language with the intent of statically analyzing it. We select a set of core primitives in which common language features such as object-orientation and case analysis can be encoded and give a sound and decidable type inference system for it. Our type theory is based on subtype constraint systems but is also closely related to abstract interpretation; we use this connection to guide development of the type system and to employ a novel type soundness proof strategy based on simulation. At the heart of our approach is a type indexed record we call the onion which supports asymmetric concatenation and dispatch; we use onions to formally encode a variety of features, including records, operator overloading, objects, and mixins. An optimistic call-site polymorphism model defined herein captures the ad-hoc, case-analysis-based reasoning often used in scripting languages. Although the language in this dissertation uses a particular set of core primitives, the strategy we use to design it is general: we demonstrate a simple, formulaic process for adding features such as integers and state
    corecore