
    Large Language Models for Software Engineering: Survey and Open Problems

    This paper provides a survey of the emerging area of Large Language Models (LLMs) for Software Engineering (SE). It also sets out open research challenges for the application of LLMs to technical problems faced by software engineers. The emergent properties of LLMs bring novelty and creativity, with applications right across the spectrum of Software Engineering activities, including coding, design, requirements, repair, refactoring, performance improvement, documentation and analytics. However, these same emergent properties pose significant technical challenges; we need techniques that can reliably weed out incorrect solutions, such as hallucinations. Our survey reveals the pivotal role that hybrid techniques (traditional SE plus LLMs) have to play in the development and deployment of reliable, efficient and effective LLM-based SE.
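
    One concrete reading of the hybrid-technique argument: the LLM generates candidates, and traditional SE machinery acts as the filter that weeds out incorrect solutions. The sketch below is a hypothetical generate-and-validate loop, not any system from the survey; using `pytest` as the oracle and all function names are assumptions.

```python
# Hypothetical sketch: filter LLM-proposed candidate patches with a
# traditional SE oracle (the project's test suite). Only candidates
# that pass all tests survive. Assumes pytest is installed.

import shutil
import subprocess
import tempfile
from pathlib import Path

def passes_tests(repo: Path) -> bool:
    """Run the project's test suite; non-zero exit means failure."""
    result = subprocess.run(["pytest", "-q"], cwd=repo, capture_output=True)
    return result.returncode == 0

def filter_candidates(repo: Path, candidates: list[str], target: str) -> list[str]:
    """Keep only LLM-proposed file contents for `target` that pass the tests."""
    validated = []
    for patch in candidates:
        with tempfile.TemporaryDirectory() as tmp:
            work = Path(tmp) / "work"
            shutil.copytree(repo, work)        # test each candidate in isolation
            (work / target).write_text(patch)  # apply the candidate
            if passes_tests(work):
                validated.append(patch)
    return validated
```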

    Protecting Systems From Exploits Using Language-Theoretic Security

    Any computer program processing input from the user or network must validate that input. Input-handling vulnerabilities occur when the software component responsible for filtering malicious input, the parser, does not perform validation adequately. Consequently, parsers are among the most targeted components, since they defend the rest of the program from malicious input. This thesis adopts the Language-Theoretic Security (LangSec) principle to understand what tools and research are needed to prevent exploits that target parsers. LangSec proposes specifying the syntactic structure of the input format as a formal grammar; we then build a recognizer for this formal grammar to validate any input before the rest of the program acts on it. To ensure that these recognizers faithfully represent the data format, programmers often rely on parser generator or parser combinator tools to build the parsers. This thesis propels several sub-fields in LangSec by proposing new techniques to find bugs in implementations, novel categorizations of vulnerabilities, and new parsing algorithms and tools to handle practical data formats. To this end, the thesis comprises five parts that tackle various tenets of LangSec.

    First, I categorize various input-handling vulnerabilities and exploits using two frameworks. The mismorphisms framework helps us reason about the root causes leading to various vulnerabilities. We then built a categorization framework using various LangSec anti-patterns, such as parser differentials and insufficient input validation, and a catalog of more than 30 popular vulnerabilities to demonstrate the categorization frameworks.

    Second, I built parsers for various Internet of Things and power grid network protocols, and for the iccMAX file format, using parser combinator libraries. The parsers I built for power grid protocols were deployed and tested on power grid substation networks as an intrusion detection tool. The parser I built for the iccMAX file format led to several corrections and modifications to the iccMAX specifications and reference implementations.

    Third, I present SPARTA, a novel tool I built that generates Rust code to type-check Portable Document Format (PDF) files. The type checker I helped build strictly enforces the constraints in the PDF specification to find deviations. Our checker has contributed to at least four significant clarifications and corrections to the PDF 2.0 specification and various open-source PDF tools. In addition to the checker, we built a practical tool, PDFFixer, to dynamically patch type errors in PDF files.

    Fourth, I present ParseSmith, a tool for building verified parsers for real-world data formats. Most parsing tools available for data formats are insufficient to handle practical formats or have not been verified for correctness. I built a verified parsing tool in Dafny that combines ideas from attribute grammars, data-dependent grammars, and parsing expression grammars to tackle constructs commonly seen in network formats. I prove that our parsers run in linear time and always terminate for well-formed grammars.

    Finally, I provide the earliest systematic comparison of various data description languages (DDLs) and their parser generation tools. DDLs are used to describe and parse commonly used data formats, such as image formats. I conducted an expert-elicitation qualitative study to derive metrics for comparing the DDLs, and I also systematically compare the DDLs based on the sample data descriptions available with them, checking for correctness and resilience.
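
    The recognize-before-process discipline is easiest to see in a small parser combinator. The sketch below is a minimal illustration in Python, not the thesis's actual libraries: it builds a recognizer for a toy header format and rejects input before any other code touches it. All names and the toy format are illustrative.

```python
# Minimal parser-combinator sketch: a parser maps (input, position) to the
# new position on success, or None on failure. Combinators build larger
# recognizers from smaller ones, mirroring the grammar's structure.

from typing import Callable, Optional

Parser = Callable[[bytes, int], Optional[int]]

def lit(b: bytes) -> Parser:
    """Match an exact byte string."""
    return lambda s, i: i + len(b) if s.startswith(b, i) else None

def seq(*ps: Parser) -> Parser:
    """Match each sub-parser in order."""
    def p(s, i):
        for q in ps:
            i = q(s, i)
            if i is None:
                return None
        return i
    return p

def many(q: Parser) -> Parser:
    """Match q zero or more times."""
    def p(s, i):
        while True:
            j = q(s, i)
            if j is None:
                return i
            i = j
    return p

def digit(s: bytes, i: int) -> Optional[int]:
    return i + 1 if i < len(s) and 0x30 <= s[i] <= 0x39 else None

# Recognizer for a toy "LEN:<digits>\n" header; reject before processing.
header = seq(lit(b"LEN:"), digit, many(digit), lit(b"\n"))

def is_valid(msg: bytes) -> bool:
    return header(msg, 0) == len(msg)

assert is_valid(b"LEN:42\n")
assert not is_valid(b"LEN:\n")   # at least one digit is required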

    Security Testing of Embedded TLS Implementations

    The Transport Layer Security (TLS) protocol is by far the most popular cryptographic protocol used to secure data exchanged on the Internet. The latest version, TLS 1.3, provides a set of security properties on top of those already included in previous versions, such as channel binding, downgrade protection, and non-replayability, along with improved performance and the exclusion of less secure algorithms that were used before. However, vulnerabilities are constantly found and reported in implementations of TLS, especially those developed specifically for embedded devices, as these require code optimizations that sometimes leave necessary checks out, giving rise to security flaws. To improve the quality and security of these embedded implementations, it is recommended to apply software testing throughout the development process. Fuzz testing, or simply fuzzing, is an effective testing technique that has been successfully used in the past to find bugs and vulnerabilities in many kinds of applications. In fuzzing tests, semi-valid test cases are generated randomly, either by modifying valid seeds or by following a specification or model, and are fed as input to the target program while its behavior is monitored. Unfortunately, most existing fuzzing tools focus on file-based or standard-input applications, and they are not very effective for testing cryptographic protocols, where input messages are subject to integrity checks and need to be encrypted and decrypted constantly. Additionally, to the best of our knowledge, there is no comprehensive tool or set of guidelines specifying how to test embedded TLS implementations. In this thesis, we develop a testing framework that can be used to measure the security of embedded TLS implementations by combining fuzzing techniques with hand-crafted test cases. We then use this framework to test HTLS, a TLS library developed by Huawei.
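
    As an illustration of the seed-mutation style of fuzzing described above, the sketch below mutates a valid message and feeds it to a target. It is a generic outline, not the thesis framework; for an encrypted protocol like TLS, the `send` callback would have to apply encryption and MAC computation after mutation, or the target rejects the message at the integrity check instead of exercising its parser. All names are illustrative.

```python
# Generic mutation-based fuzzing sketch: corrupt a few bytes of a valid
# seed message and observe the target's reaction.

import random

def mutate(seed: bytes, n_mutations: int = 3) -> bytes:
    """Produce a semi-valid message by corrupting a few bytes of a valid seed."""
    data = bytearray(seed)
    for _ in range(n_mutations):
        pos = random.randrange(len(data))
        data[pos] ^= random.randrange(1, 256)   # flip at least one bit
    return bytes(data)

def fuzz(seed: bytes, send, iterations: int = 1000):
    """send() wraps (and, for TLS, encrypts) a message and returns a status."""
    for _ in range(iterations):
        case = mutate(seed)
        if send(case) == "crash":
            yield case                          # keep crashing inputs for triage

# Toy stand-in target: "crashes" whenever the record-type byte is corrupted.
crashes = list(fuzz(b"\x16\x03\x03\x00\x05hello",
                    send=lambda m: "crash" if m[0] != 0x16 else "ok"))
print(len(crashes))
```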

    On the Caching Schemes to Speed Up Program Reduction

    Program reduction is a highly practical, widely demanded technique for debugging language tools such as compilers, interpreters and debuggers. Given a program P that exhibits a property ψ, program reduction conceptually applies various program transformations iteratively, generating a vast number of variants from P by deleting certain tokens, and returns the minimal variant preserving ψ as the result. A program reduction process inevitably generates duplicate variants, and their number can be significant. Our study reveals that on average 62.3% of the generated variants in HDD, a state-of-the-art program reducer, are duplicates. Checking them against ψ is thus redundant and unnecessary, wasting time and computation resources. Although simply caching the generated variants would seem to avoid redundant property tests, such a trivial method is impractical in the real world due to its significant memory footprint. A memory-efficient caching scheme for program reduction is therefore in great demand. This thesis is the first effort to conduct a systematic, extensive analysis of memory-efficient caching schemes for program reduction. We first propose to use two well-known methods, ZIP compression and SHA hashing, to compact the generated variants before they are stored in the cache. Furthermore, our understanding of the program reduction process motivates a novel, domain-specific caching scheme that is both memory- and computation-efficient: Refreshable Compact Caching (RCC). Our key insight is two-fold: 1) by leveraging the correlation between variants and the original program P, we losslessly encode each variant into an equivalent, compact, canonical representation; 2) we periodically remove stale cache entries to minimize the memory footprint over time. Our evaluation on 20 real-world C compiler bugs demonstrates that caching avoids the 62.3% of property-test queries that are redundant; correspondingly, runtime performance is boosted by a notable 15.6%. With regard to memory efficiency, all three methods use less memory than STR, the state-of-the-art string-based scheme. ZIP and SHA cut the memory footprint by 73.99% and 99.74% respectively, compared to STR; more importantly, the highly scalable, domain-specific RCC dominates its peers, outperforming the second-best, SHA, by 89.0%.
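
    A minimal sketch of the SHA-based caching idea, with illustrative function names: fingerprint each generated variant and skip the property test for any variant whose digest has already been seen. RCC, as described above, goes further by encoding variants relative to the original program and periodically evicting stale entries.

```python
# Hash-based duplicate caching in a simplified reduction loop. Storing a
# fixed-size digest instead of the full variant text keeps the cache small
# while still detecting duplicates exactly (up to hash collisions).

import hashlib

def reduce_program(program: str, variants_of, holds_property) -> str:
    """variants_of yields smaller candidates; holds_property is the ψ test."""
    seen: set[bytes] = set()           # digests of variants already tested
    current = program
    changed = True
    while changed:
        changed = False
        for variant in variants_of(current):
            key = hashlib.sha256(variant.encode()).digest()
            if key in seen:
                continue               # duplicate: skip the redundant test
            seen.add(key)
            if holds_property(variant):
                current = variant      # keep the smaller ψ-preserving variant
                changed = True
                break
    return current
```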

    Programming Languages and Systems

    This open access book constitutes the proceedings of the 31st European Symposium on Programming, ESOP 2022, which was held during April 5-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 21 regular papers presented in this volume were carefully reviewed and selected from 64 submissions. They deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems.

    Metamorphic Testing for Software Libraries and Graphics Compilers

    Metamorphic testing is a technique which mutates existing test cases into semantically equivalent forms by making use of metamorphic relations, thereby avoiding the oracle problem. However, the required relations are not readily available for a given system under test. Defining effective metamorphic relations is difficult, and is arguably the main obstacle to the adoption of metamorphic testing in production-level software development. One example application is testing graphics compilers, where the approximate and under-specified nature of the domain makes it hard to apply more traditional techniques. We propose an approach with a lower barrier to entry for applying metamorphic testing to a software library. The user must still identify relations that hold over their particular library, but can do so within a development-like environment. We apply methods from the domains of metamorphic testing and fuzzing to produce complex test cases. We consider the user interaction a bonus, as users can control which parts of the target codebase are tested, potentially focusing on less-tested or critical sections. We implement our proposed approach in a tool, MF++, which synthesises C++ test cases for a C++ library from user-provided ingredients. We applied MF++ to 7 libraries in the domains of satisfiability modulo theories and Presburger arithmetic, and our evaluation of MF++ identified 21 bugs in these tools. We additionally provide an automatic reducer for tests generated by MF++, named MF++R. In addition to minimising tests that expose issues, MF++R can be used to identify incorrect user-provided relations. Additionally, we investigate the combined use of MF++ and MF++R to augment the code coverage of library test suites, and assess the utility of this application by contributing 21 tests aimed at improving coverage across 3 libraries.
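
    For readers unfamiliar with metamorphic relations, the sketch below shows the idea on a toy target rather than MF++'s C++ libraries: `math.exp` has no easy reference oracle, but the identity exp(a + b) = exp(a) * exp(b) lets randomly generated inputs test it anyway, sidestepping the oracle problem.

```python
# A metamorphic relation as an executable check: no expected output is
# needed, only an identity that must hold between related executions.

import math
import random

def relation_holds(a: float, b: float, rel_tol: float = 1e-9) -> bool:
    """exp(a + b) should equal exp(a) * exp(b), up to float tolerance."""
    return math.isclose(math.exp(a + b), math.exp(a) * math.exp(b),
                        rel_tol=rel_tol)

def metamorphic_test(trials: int = 10_000) -> None:
    for _ in range(trials):
        a = random.uniform(-10, 10)   # fuzz the inputs, analogous to how
        b = random.uniform(-10, 10)   # MF++ fuzzes user-provided ingredients
        assert relation_holds(a, b), (a, b)

metamorphic_test()
```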

    Hashing fuzzing: introducing input diversity to improve crash detection

    The utility of a test set of program inputs is strongly influenced by its diversity and its size. Syntax coverage has become a standard proxy for diversity. Although more sophisticated measures exist, such as the proximity of a sample to a uniform distribution, methods to use them tend to be type-dependent. We use r-wise hash functions to create a novel, semantics-preserving testability transformation for C programs that we call HashFuzz. Use of HashFuzz improves the diversity of the test sets produced by instrumentation-based fuzzers. We evaluate the effect of the HashFuzz transformation on eight programs from the Google Fuzzer Test Suite using four state-of-the-art fuzzers that have been widely used in previous research. We demonstrate pronounced improvements in the performance of the test sets for the transformed programs across all the fuzzers we used: strong improvements in diversity in every case; maintained or slightly improved branch coverage, with up to a 4.8% improvement in the best case; and significant improvements in unique crash detection, with increases of between 28% and 97% compared to test sets for the untransformed programs.
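
    A conceptual Python sketch of the idea behind such a testability transformation (the real HashFuzz rewrites C programs, and its details may differ): branch on the bits of a hash of the input, so that a coverage-guided fuzzer, which rewards newly taken branches, is steered toward inputs that are diverse under the hash. The hash family and names here are illustrative.

```python
# Each taken/untaken hash bit appears to the fuzzer as a distinct branch,
# so covering more branches forces the fuzzer to find hash-diverse inputs.

coverage: set[tuple[int, bool]] = set()   # stand-in for the fuzzer's edge map

def toy_hash(data: bytes, a: int, b: int, m: int = 1 << 16) -> int:
    """A toy (a*x + b) mod m hash over the whole input."""
    x = int.from_bytes(data, "little") if data else 0
    return (a * x + b) % m

def hash_branches(data: bytes) -> None:
    """The instrumentation inserted by the transformation, in spirit."""
    h = toy_hash(data, a=0x9E3779B1, b=0x85EBCA77)
    for bit in range(16):
        taken = bool(h & (1 << bit))
        coverage.add((bit, taken))        # recorded as branch coverage

hash_branches(b"example input")
print(len(coverage))                      # distinct hash-bit branches seen
```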

    Automated analysis of security protocol implementations

    Security protocols, or cryptographic protocols, are crucial to the functioning of today's technology-dependent society. They are a fundamental innovation without which much of our online activity, mobile communication and even transport signalling would not be possible. The reason for their importance is simple: communication over shared or publicly accessible networks is vulnerable to interception, manipulation, and impersonation. It is the role of security protocols to prevent this, allowing for safe and secure communication. Our reliance on these protocols for such critical tasks means it is essential to engineer them with great care, just as we do with bridges or a safety-critical aircraft engine control system, for example. As with all types of engineering, there are two key elements to this process: design and implementation. In this thesis we produce techniques to analyse the latter. In particular, we develop automated tooling which helps to identify incorrect or vulnerable behaviour in the implementations of security protocols. The techniques we present follow a theme of inferring as much as we can about the protocol logic implemented in a system with as little access to its inner workings as possible. In general, we do this through observations of protocol messages on the network, executing the system but treating it as a black box. Within this framework, we design two new techniques: one which identifies a specific vulnerability in TLS/SSL, and another, more general approach which systematically extracts a protocol behaviour model from protocols like the WiFi security handshakes. We then argue that this framework limits the potential of model extraction, and proceed to develop a solution to this problem by utilising grey-box insights. Our proposed approach, which we test on a variety of security protocols, represents a paradigm shift in the well-established field of model learning. Throughout this thesis, as well as presenting general results on the efficacy of our tools, we also present a number of vulnerabilities we discovered in the process, ranging from major banking apps vulnerable to Man-In-The-Middle attacks to CVE-assigned ciphersuite downgrades in popular WiFi routers.
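
    To make the black-box model-extraction setting concrete, the sketch below shows a much-simplified, bounded-depth variant of model learning: replay every short sequence of abstract inputs against a resettable target and record the output traces from which states can later be distinguished. This is an outline of the setting, not the thesis tooling; the input alphabet and toy target are illustrative assumptions.

```python
# Bounded-depth observation of a black-box protocol implementation:
# the table of (input sequence -> output trace) is the raw material
# from which model-learning algorithms infer a state machine.

from itertools import product

INPUTS = ["ClientHello", "Finished", "ApplicationData"]   # abstract alphabet

def query(seq, reset, step):
    """Reset the target, replay seq, and return the output trace."""
    reset()
    return tuple(step(msg) for msg in seq)

def extract_traces(reset, step, depth: int = 3):
    """Map every input sequence up to `depth` to the observed outputs."""
    table = {}
    for n in range(1, depth + 1):
        for seq in product(INPUTS, repeat=n):
            table[seq] = query(seq, reset, step)
    return table

# Toy stand-in target: only accepts ApplicationData after a ClientHello.
state = {"hello": False}

def reset():
    state["hello"] = False

def step(msg):
    if msg == "ClientHello":
        state["hello"] = True
        return "ServerHello"
    if msg == "ApplicationData" and state["hello"]:
        return "Data"
    return "Alert"

traces = extract_traces(reset, step, depth=2)
print(traces[("ClientHello", "ApplicationData")])   # ('ServerHello', 'Data')
```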