7 research outputs found

    Automated Software Transplantation

    Get PDF
    Automated program repair has excited researchers for more than a decade, yet it has yet to find full scale deployment in industry. We report our experience with SAPFIX: the first deployment of automated end-to-end fault fixing, from test case design through to deployed repairs in production code. We have used SAPFIX at Facebook to repair 6 production systems, each consisting of tens of millions of lines of code, and which are collectively used by hundreds of millions of people worldwide. In its first three months of operation, SAPFIX produced 55 repair candidates for 57 crashes reported to SAPFIX, of which 27 have been deem as correct by developers and 14 have been landed into production automatically by SAPFIX. SAPFIX has thus demonstrated the potential of the search-based repair research agenda by deploying, to hundreds of millions of users worldwide, software systems that have been automatically tested and repaired. Automated software transplantation (autotransplantation) is a form of automated software engineering, where we use search based software engineering to be able to automatically move a functionality of interest from a ‘donor‘ program that implements it into a ‘host‘ program that lacks it. Autotransplantation is a kind of automated program repair where we repair the ‘host‘ program by augmenting it with the missing functionality. Automated software transplantation would open many exciting avenues for software development: suppose we could autotransplant code from one system into another, entirely unrelated, system, potentially written in a different programming language. Being able to do so might greatly enhance the software engineering practice, while reducing the costs. Automated software transplantation manifests in two different flavors: monolingual, when the languages of the host and donor programs is the same, or multilingual when the languages differ. This thesis introduces a theory of automated software transplantation, and two algorithms implemented in two tools that achieve this: µSCALPEL for monolingual software transplantation and τSCALPEL for multilingual software transplantation. Leveraging lightweight annotation, program analysis identifies an organ (interesting behavior to transplant); testing validates that the organ exhibits the desired behavior during its extraction and after its implantation into a host. We report encouraging results: in 14 of 17 monolingual transplantation experiments involving 6 donors and 4 hosts, popular real-world systems, we successfully autotransplanted 6 new functionalities; and in 10 out of 10 multlingual transplantation experiments involving 10 donors and 10 hosts, popular real-world systems written in 4 different programming languages, we successfully autotransplanted 10 new functionalities. That is, we have passed all the test suites that validates the new functionalities behaviour and the fact that the initial program behaviour is preserved. Additionally, we have manually checked the behaviour exercised by the organ. Autotransplantation is also very useful: in just 26 hours computation time we successfully autotransplanted the H.264 video encoding functionality from the x264 system to the VLC media player, a task that is currently done manually by the developers of VLC, since 12 years ago. We autotransplanted call graph generation and indentation for C programs into Kate, (a popular KDE based test editor used as an IDE by a lot of C developers) two features currently missing from Kate, but requested by the users of Kate. Autotransplantation is also efficient: the total runtime across 15 monolingual transplants is 5 hours and a half; the total runtime across 10 multilingual transplants is 33 hours

    Computer Aided Verification

    Get PDF
    This open access two-volume set LNCS 13371 and 13372 constitutes the refereed proceedings of the 34rd International Conference on Computer Aided Verification, CAV 2022, which was held in Haifa, Israel, in August 2022. The 40 full papers presented together with 9 tool papers and 2 case studies were carefully reviewed and selected from 209 submissions. The papers were organized in the following topical sections: Part I: Invited papers; formal methods for probabilistic programs; formal methods for neural networks; software Verification and model checking; hyperproperties and security; formal methods for hardware, cyber-physical, and hybrid systems. Part II: Probabilistic techniques; automata and logic; deductive verification and decision procedures; machine learning; synthesis and concurrency. This is an open access book

    Automatic Detection and Repair of Input Validation and Sanitization Bugs

    Get PDF
    A crucial problem in developing dependable web applications is thecorrectness of the input validation and sanitization. Bugs in stringmanipulation operations used for validation and sanitization are common,resulting in erroneous application behavior and vulnerabilities that areexploitable by malicious users. In this dissertation, we investigate theproblem of automatic detection and repair of validation and sanitization bugsboth at the client-side (JavaScript) and the server-side (PHP or Java) code.We first present a formal model for input validation and sanitizationfunctions along with a new domain specific intermediate languageto represent them. Then, we show how to extract input validation andsanitization functions in our intermediate language from both client andserver-side code in web applications. After the extraction phase, we useautomata-based static string-analysis techniques to automatically verifyand fix the extracted functions. One of our contributions is the developmentof efficient automata-based string analysis techniques for frequently used,complex string operations.We developed two basic approaches to bug detection and repair: 1)policy-based, and 2) differential. In the policy-based approach, inputvalidation and sanitization policies are expressed using two regularexpressions, one specifying the maximum policy (the upper bound for theset of strings that should be allowed) and the other specifying the minimumpolicy (the lower bound for the set of strings that should be allowed). Usingour string analysis techniques we can identify two types of errors inan input validation and sanitization function: 1) it accepts a set of strings thatis not permitted by the maximum policy (i.e., it is under-constrained),or 2) it rejects a set of strings that is permitted by the minimum policy(i.e., it is over-constrained).Our differential bug detection and repair approach does not require anypolicy specifications. It exploits the fact that, in web applications,developers typically perform redundant input validation and sanitizationin both the client and the server-side since client-side checks canbe by-passed. Using automata-based string analysis, we compare theinput validation and sanitization functions extracted from the client- andserver-side code, and identify and report the inconsistencies between them.Finally, we present an automated differential repair technique that canrepair client and server-side code with respect to each other, or acrossapplications in order to strengthen the validation and sanitizationchecks. Given a reference and a target function, our differential repairtechnique strengthens the validation and sanitization operations in thetarget function based on the reference function by automatically generatinga set of patches.We experimented with a number of real world web applications and found manybugs and vulnerabilities. Our analysis generates counter-example behaviorsdemonstrating the detected bugs and vulnerabilities to help the developerswith the debugging process. Moreover, we automatically generate patchesthat can be used to mitigate the detected bugs and vulnerabilities untildevelopers write their own patches

    A pattern-driven corpus to predictive analytics in mitigating SQL injection attack

    Get PDF
    The back-end database provides accessible and structured storage for each web application’s big data internet web traffic exchanges stemming from cloud-hosted web applications to the Internet of Things (IoT) smart devices in emerging computing. Structured Query Language Injection Attack (SQLIA) remains an intruder’s exploit of choice to steal confidential information from the database of vulnerable front-end web applications with potentially damaging security ramifications.Existing solutions to SQLIA still follows the on-premise web applications server hosting concept which were primarily developed before the recent challenges of the big data mining and as such lack the functionality and ability to cope with new attack signatures concealed in a large volume of web requests. Also, most organisations’ databases and services infrastructure no longer reside on-premise as internet cloud-hosted applications and services are increasingly used which limit existing Structured Query Language Injection (SQLI) detection and prevention approaches that rely on source code scanning. A bio-inspired approach such as Machine Learning (ML) predictive analytics provides functional and scalable mining for big data in the detection and prevention of SQLI in intercepting large volumes of web requests. Unfortunately, lack of availability of robust ready-made data set with patterns and historical data items to train a classifier are issues well known in SQLIA research applying ML in the field of Artificial Intelligence (AI). The purpose-built competition-driven test case data sets are antiquated and not pattern-driven to train a classifier for real-world application. Also, the web application types are so diverse to have an all-purpose generic data set for ML SQLIA mitigation.This thesis addresses the lack of pattern-driven data set by deriving one to predict SQLIA of any size and proposing a technique to obtain a data set on the fly and break the circle of relying on few outdated competitions-driven data sets which exist are not meant to benchmark real-world SQLIA mitigation. The thesis in its contributions derived pattern-driven data set of related member strings that are used in training a supervised learning model with validation through Receiver Operating Characteristic (ROC) curve and Confusion Matrix (CM) with results of low false positives and negatives. We further the evaluations with cross-validation to have obtained a low variance in accuracy that indicates of a successful trained model using the derived pattern-driven data set capable of generalisation of unknown data in the real-world with reduced biases. Also, we demonstrated a proof of concept with a test application by implementing an ML Predictive Analytics to SQLIA detection and prevention using this pattern-driven data set in a test web application. We observed in the experiments carried out in the course of this thesis, a data set of related member strings can be generated from a web expected input data and SQL tokens, including known SQLI signatures. The data set extraction ontology proposed in this thesis for applied ML in SQLIA mitigation in the context of emerging computing of big data internet, and cloud-hosted services set our proposal apart from existing approaches that were mostly on-premise source code scanning and queries structure comparisons of some sort

    Other things besides number: Abstraction, constraint propagation, and string variable types

    Full text link

    Symbolic automata constraint solving

    No full text
    Abstract. Constraints over regular and context-free languages are common in the context of string-manipulating programs. Efficient solving of such constraints, often in combination with arithmetic and other theories, has many useful applications in program analysis and testing. We introduce and evaluate a method for symbolically expressing and solving constraints over automata, including subset constraints. Our method uses techniques present in the state-of-the-art SMT solver Z3.
    corecore