
    Combining Static and Dynamic Analysis for Vulnerability Detection

    In this paper, we present a hybrid approach for buffer overflow detection in C code. The approach combines static and dynamic analysis of the application under investigation. The static part consists of calculating taint dependency sequences (TDS) between user-controlled inputs and vulnerable statements. This process is akin to computing a program slice of interest: it extracts the tainted data- and control-flow paths that capture the dependence between tainted program inputs and vulnerable statements in the code. The dynamic part consists of executing the program along the TDSs, generating suitable inputs to trigger the vulnerability. We use a genetic algorithm to generate inputs and propose a fitness function that approximates the program behavior (control flow) based on the frequencies of the statements along the TDSs. This runtime aspect makes the approach faster and more accurate. We provide experimental results on the Verisec benchmark to validate our approach. Comment: 15 pages, 1 figure
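    A minimal sketch of the dynamic part, assuming a hypothetical trace() instrumentation hook that returns the statement IDs executed for a given input; the genetic algorithm's fitness rewards inputs whose executions progress further along the TDS. The statement IDs, mutation operators, and weighting below are illustrative, not the paper's exact formulation.

```python
import random

# Hypothetical TDS: ordered statement IDs from the tainted input to the vulnerable sink.
TDS = [3, 7, 12, 18, 25]

def trace(program_input: str):
    """Stand-in for instrumented execution: returns the executed statement IDs.
    Faked here so the sketch runs without a real instrumented C program."""
    return [3, 7, 12] if "A" in program_input else [3]

def fitness(program_input: str) -> int:
    """Reward executions that cover TDS statements, weighting later ones (nearer the sink) more."""
    executed = set(trace(program_input))
    return sum(i + 1 for i, s in enumerate(TDS) if s in executed)

def mutate(s: str) -> str:
    i = random.randrange(len(s))
    return s[:i] + random.choice("ABCxyz") + s[i + 1:]

def crossover(a: str, b: str) -> str:
    cut = random.randrange(1, min(len(a), len(b)))
    return a[:cut] + b[cut:]

def evolve(population, generations=50):
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: len(population) // 2]
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(len(population) - len(parents))]
        population = parents + children
    return max(population, key=fitness)

if __name__ == "__main__":
    print("best input:", evolve(["xxxx"] * 20))
```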

    Leveraging Automated Unit Tests for Unsupervised Code Translation

    With little to no parallel data available for programming languages, unsupervised methods are well suited to source code translation. However, the majority of unsupervised machine translation approaches rely on back-translation, a method developed in the context of natural language translation that inherently involves training on noisy inputs. Unfortunately, source code is highly sensitive to small changes; a single incorrect token can cause compilation failures or erroneous programs, unlike natural languages, where small inaccuracies may not change the meaning of a sentence. To address this issue, we propose to leverage an automated unit-testing system to filter out invalid translations, thereby creating a fully tested parallel corpus. We found that fine-tuning an unsupervised model with this filtered data set significantly reduces the noise in the generated translations, comfortably outperforming the state of the art for all language pairs studied. In particular, for Java→Python and Python→C++ we outperform the best previous methods by more than 16% and 24% respectively, reducing the error rate by more than 35%.
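    A rough sketch of the filtering step, assuming a hypothetical translate() stand-in for the unsupervised model and auto-generated unit tests supplied per source function; only translations that pass their tests enter the parallel corpus used for fine-tuning.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

def translate(java_function: str) -> str:
    """Stand-in for the unsupervised translation model (hypothetical)."""
    return "def add(a, b):\n    return a + b\n"

def passes_unit_tests(python_code: str, test_code: str) -> bool:
    """Run a candidate translation against its automatically generated unit tests."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(python_code + "\n" + test_code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=10)
        return result.returncode == 0
    finally:
        os.unlink(path)

def build_parallel_corpus(java_functions, test_generator):
    """Keep only (source, translation) pairs whose translation passes the unit tests."""
    corpus = []
    for src in java_functions:
        candidate = translate(src)
        if passes_unit_tests(candidate, test_generator(src)):
            corpus.append((src, candidate))
    return corpus

if __name__ == "__main__":
    java_src = "int add(int a, int b) { return a + b; }"
    tests = textwrap.dedent("""
        assert add(1, 2) == 3
        assert add(-1, 1) == 0
    """)
    print(build_parallel_corpus([java_src], lambda _: tests))
```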

    Relay: A New IR for Machine Learning Frameworks

    Machine learning powers diverse services in industry including search, translation, recommendation systems, and security. The scale and importance of these models require that they be efficient, expressive, and portable across an array of heterogeneous hardware devices. These constraints are often at odds; in order to better accommodate them, we propose a new high-level intermediate representation (IR) called Relay. Relay is being designed as a purely functional, statically typed language with the goal of balancing efficient compilation, expressiveness, and portability. We discuss the goals of Relay and highlight its important design constraints. Our prototype is part of the open source NNVM compiler framework, which powers Amazon's deep learning framework MXNet.
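    A toy illustration, in Python, of what a purely functional, statically typed tensor IR node and its type check might look like; this is not Relay's actual API, only a sketch of the design idea described above.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class TensorType:
    shape: tuple
    dtype: str

@dataclass(frozen=True)
class Var:
    name: str
    type: TensorType

@dataclass(frozen=True)
class Add:
    lhs: "Expr"
    rhs: "Expr"

Expr = Union[Var, Add]

def infer_type(expr: Expr) -> TensorType:
    """Static type inference: element-wise add requires matching operand types."""
    if isinstance(expr, Var):
        return expr.type
    if isinstance(expr, Add):
        lt, rt = infer_type(expr.lhs), infer_type(expr.rhs)
        if lt != rt:
            raise TypeError(f"shape/dtype mismatch: {lt} vs {rt}")
        return lt
    raise TypeError(f"unknown expression: {expr!r}")

if __name__ == "__main__":
    t = TensorType((32, 32), "float32")
    x, y = Var("x", t), Var("y", t)
    print(infer_type(Add(x, y)))
```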

    Multi-objective engineering shape optimization using differential evolution interfaced to the Nimrod/O tool

    This paper presents an enhancement of the Nimrod/O optimization tool by interfacing DEMO, an external multiobjective optimization algorithm. DEMO is a variant of differential evolution, an algorithm that has attained much popularity in the research community, and this work represents the first time that true multiobjective optimizations have been performed with Nimrod/O. A modification to the DEMO code enables multiple objectives to be evaluated concurrently. Combined with Nimrod/O's support for parallelism, this can significantly reduce wall-clock time for compute-intensive objective function evaluations. We describe the usage and implementation of the interface and present two optimizations. The first is a two-objective mathematical function for which the Pareto front is successfully found after only 30 generations. The second test case is the three-objective shape optimization of a rib-reinforced wall bracket using the finite element software Code_Aster. The interfacing of the already successful packages Nimrod/O and DEMO yields a solution that we believe can benefit a wide community, both industrial and academic.
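    An illustrative sketch of the DEMO idea: differential-evolution variation with Pareto-dominance selection, evaluating the objectives of all trial vectors concurrently. The two-objective test function, the parameters, and the simplified replacement rule are assumptions, not the Nimrod/O or DEMO implementation.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def objectives(x):
    """Simple two-objective test problem (illustrative only)."""
    return (x[0] ** 2 + x[1] ** 2, (x[0] - 1) ** 2 + (x[1] - 1) ** 2)

def dominates(a, b):
    """Pareto dominance: a is no worse in every objective and strictly better in one."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))

def demo_generation(population, scores, F=0.5, CR=0.9):
    """One simplified DEMO generation: DE variation, keep the trial if it dominates its parent."""
    trials = []
    for i, parent in enumerate(population):
        a, b, c = random.sample([p for j, p in enumerate(population) if j != i], 3)
        trials.append([a[d] + F * (b[d] - c[d]) if random.random() < CR else parent[d]
                       for d in range(len(parent))])
    with ThreadPoolExecutor() as pool:              # evaluate all trial vectors concurrently
        trial_scores = list(pool.map(objectives, trials))
    next_pop, next_scores = [], []
    for parent, ps, trial, ts in zip(population, scores, trials, trial_scores):
        keep_trial = dominates(ts, ps)
        next_pop.append(trial if keep_trial else parent)
        next_scores.append(ts if keep_trial else ps)
    return next_pop, next_scores

if __name__ == "__main__":
    population = [[random.uniform(-2, 2), random.uniform(-2, 2)] for _ in range(20)]
    scores = [objectives(p) for p in population]
    for _ in range(30):                             # 30 generations, as in the first test case
        population, scores = demo_generation(population, scores)
    front = [s for s in scores if not any(dominates(o, s) for o in scores)]
    print("approximate Pareto front size:", len(front))
```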

    Amortising the Cost of Mutation Based Fault Localisation using Statistical Inference

    Mutation analysis can effectively capture the dependency between source code and test results. This has been exploited by Mutation Based Fault Localisation (MBFL) techniques. However, MBFL techniques must pay the high cost of mutation analysis after a failure has been observed, which presents a challenge for their practical adoption. We introduce SIMFL (Statistical Inference for Mutation-based Fault Localisation), an MBFL technique that allows users to perform the mutation analysis in advance, against an earlier version of the system. SIMFL uses mutants as artificial faults and aims to learn the failure patterns of test cases against different mutation locations. Once a failure is observed, SIMFL requires little or no additional analysis cost, depending on the inference model used. An empirical evaluation of SIMFL using 355 faults in Defects4J shows that SIMFL can successfully localise up to 103 faults at the top rank, and 152 faults within the top five, on par with state-of-the-art alternatives. The cost of mutation analysis can be further reduced by mutation sampling: SIMFL retains over 80% of its localisation accuracy at the top rank when using only 10% of the generated mutants, compared to results obtained without sampling.
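    A simplified sketch of the idea behind SIMFL: mutation results recorded in advance map each mutated location to the set of tests that failed, and when a real failure is observed, its failing tests are matched against those records to rank suspicious locations. The Jaccard-style scoring and the toy data below are illustrative surrogates, not SIMFL's actual inference models or Defects4J results.

```python
from collections import defaultdict

# Pre-computed offline: for each mutated location, the sets of tests that failed
# when a mutant was planted there (illustrative data only).
mutation_records = {
    "Foo.java:42": [{"t1", "t3"}, {"t1"}],
    "Foo.java:77": [{"t2"}, {"t2", "t4"}],
    "Bar.java:10": [{"t1", "t2", "t3"}],
}

def rank_locations(observed_failures: set):
    """Score each location by how closely its mutants reproduced the observed failure pattern."""
    scores = defaultdict(float)
    for location, failure_sets in mutation_records.items():
        for failed in failure_sets:
            overlap = len(observed_failures & failed)
            union = len(observed_failures | failed) or 1
            scores[location] += overlap / union          # Jaccard similarity per mutant
        scores[location] /= len(failure_sets)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    # A new failure is observed: tests t1 and t3 fail.
    for location, score in rank_locations({"t1", "t3"}):
        print(f"{score:.2f}  {location}")
```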

    Fuzz testing containerized digital forensics and incident response tools

    Open source digital forensics and incident response tools are increasingly important in detecting, responding to, and documenting hostile actions against organisations and systems. Programming errors in these tools can, at a minimum, slow down incident response and, at worst, constitute a major security issue potentially leading to arbitrary code execution. Many of these tools are developed by a single individual or a small team of developers and have not been comprehensively tested. The goal of this thesis was to find a way to fuzz test a large number of containerized open source digital forensics and incident response tools. A framework was designed and implemented that allows fuzz testing of any containerized command-line application. The framework was tested against 43 popular containerized open source digital forensics and incident response tools; 24 of the 43 tested tools showed potential issues. The most critical issues were disclosed to the respective tools' authors via their preferred communication method. The results show that many open source digital forensics and incident response tools currently have robustness issues, and they reaffirm that fuzzing is an efficient software testing method.
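    A minimal sketch of the kind of harness described, assuming Docker is available; the image name, tool command line, seed file, and byte-flip mutation are placeholders, and issues are flagged by signal-terminated exits or timeouts.

```python
import os
import random
import subprocess
import tempfile

IMAGE = "example/forensics-tool"   # placeholder container image
TOOL_CMD = ["tool", "--input"]     # placeholder command line inside the container
SEED_FILE = "seed.bin"             # placeholder corpus seed

def mutate(data: bytes) -> bytes:
    """Naive byte-flip mutation of the seed input."""
    data = bytearray(data)
    for _ in range(random.randint(1, 8)):
        data[random.randrange(len(data))] ^= random.randrange(1, 256)
    return bytes(data)

def run_once(payload: bytes) -> None:
    """Run the containerized tool on one mutated input and report abnormal terminations."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
        host_path = f.name
    try:
        result = subprocess.run(
            ["docker", "run", "--rm",
             "-v", f"{host_path}:/fuzz/input:ro",
             IMAGE, *TOOL_CMD, "/fuzz/input"],
            capture_output=True, timeout=30)
        # docker run exits with 128+N when the tool inside the container dies from signal N.
        if result.returncode >= 128:
            print(f"potential crash (signal {result.returncode - 128}); kept input {host_path}")
            return
    except subprocess.TimeoutExpired:
        # A real harness would also stop the still-running container here.
        print(f"hang; kept input {host_path}")
        return
    os.unlink(host_path)           # discard inputs that triggered no issue

if __name__ == "__main__":
    seed = open(SEED_FILE, "rb").read() if os.path.exists(SEED_FILE) else b"\x00" * 64
    for _ in range(100):
        run_once(mutate(seed))
```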

    Lightweight Multilingual Software Analysis

    Developer preferences, language capabilities, and the persistence of older languages contribute to the trend that large software codebases are often multilingual, that is, written in more than one computer language. While developers can leverage monolingual software development tools to build software components, companies are faced with the problem of managing the resulting large, multilingual codebases to address issues of security, efficiency, and quality metrics. The key challenge is the opaque nature of the language interoperability interface: one language calls procedures in a second (which may call a third, or even call back into the first), resulting in a potentially tangled, inefficient, and insecure codebase. An architecture is proposed for lightweight static analysis of large multilingual codebases: the MLSA architecture. Its modular and table-oriented structure addresses the open-ended nature of multiple languages and language interoperability APIs. As an application, we focus here on the construction of call graphs that capture both inter-language and intra-language calls. The algorithms for extracting multilingual call graphs from codebases are presented, and several examples of multilingual software engineering analysis are discussed. The state of the implementation and testing of MLSA is presented, and the implications for future work are discussed. Comment: 15 pages
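    A toy illustration of the table-oriented idea: each single-language analysis emits a flat table of (caller, callee) rows, and a merge step combines them into one call graph whose edges may cross the language boundary. The function names and table format are hypothetical, not MLSA's actual schema.

```python
from collections import defaultdict

# Hypothetical per-language call tables emitted by independent single-language analyses.
# The Python table already contains the cross-language edge report.render -> native.stats,
# as resolved through the language interoperability API.
python_calls = [("report.render", "native.stats"), ("report.render", "report.format")]
c_calls = [("native.stats", "native.mean"), ("native.mean", "libm.sqrt")]

def build_call_graph(*tables):
    """Merge flat (caller, callee) tables from several languages into one adjacency map."""
    graph = defaultdict(set)
    for table in tables:
        for caller, callee in table:
            graph[caller].add(callee)
    return graph

def reachable(graph, entry):
    """Transitive closure from an entry point, freely crossing language boundaries."""
    seen, stack = set(), [entry]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen

if __name__ == "__main__":
    graph = build_call_graph(python_calls, c_calls)
    print(sorted(reachable(graph, "report.render")))
```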

    Regression Test Selection Tool for Python in Continuous Integration Process

    In this paper, we present a coverage-based regression test selection (RTS) approach and a tool developed for Python. The tool can be used either on a developer's machine or on build servers. A special characteristic of the tool is its attention to easy integration with continuous integration and deployment. To evaluate the performance of the proposed approach, mutation testing is applied to three open-source projects, and the results of executing the full test suites are compared to executing the set of tests selected by the tool. The missed-fault rate of the test selection varies between 0% and 2% at file-level granularity and between 16% and 24% at line-level granularity. The high missed-fault rate at line-level granularity is related to the basic mutation approach selected, and the result could be improved with advanced mutation techniques. Depending on the target optimization metric (time or precision) in the DevOps/MLOps process, the error rate may be acceptable, or it can be improved further by using test selection at file-level granularity. Peer reviewed
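    A minimal sketch of file-level coverage-based selection, assuming a coverage map (test id → covered files) recorded on a previous run and a changed-file list taken from git; the names and map format are illustrative, not the tool's actual interface.

```python
import subprocess

def changed_files(base_ref: str = "origin/main") -> set:
    """Files changed since the given ref, taken from git (file-level granularity)."""
    out = subprocess.run(["git", "diff", "--name-only", base_ref],
                         capture_output=True, text=True, check=True)
    return set(out.stdout.split())

def select_tests(coverage_map: dict, changed: set) -> set:
    """Select every test whose recorded coverage touches at least one changed file."""
    return {test for test, files in coverage_map.items() if files & changed}

if __name__ == "__main__":
    # Hypothetical coverage map recorded during a previous CI run.
    coverage_map = {
        "tests/test_parser.py::test_basic": {"pkg/parser.py", "pkg/utils.py"},
        "tests/test_cli.py::test_help": {"pkg/cli.py"},
    }
    # In CI this would be select_tests(coverage_map, changed_files()).
    print("selected:", sorted(select_tests(coverage_map, {"pkg/parser.py"})))
```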