148 research outputs found

    Automatic translation of assembly shellcodes to printable byte codes

    Get PDF
    The generation of printable shellcode is an important computer security research area. The original idea of the printable shellcode generation was to write a binary, executable code in a way that the generated byte code contains only bytes that are represented by the English letters, numbers and punctuation characters. In this way unfortunately only a limited number of CPU instructions can be used. In the originally published paper a small decoder is written with instructions represented by printable characters and the shellcode is decoded on the stack to be executed later. This paper, however describes a proof of concept project, which converts the source code of a full assembly program or shellcode to a new source code, whose compiled binary code contains only printable characters. The paper also presents new, printable character implementation of some CPU instructions

    EVIL: Exploiting Software via Natural Language

    Get PDF
    Writing exploits for security assessment is a challenging task. The writer needs to master programming and obfuscation techniques to develop a successful exploit. To make the task easier, we propose an approach (EVIL) to automatically generate exploits in assembly/Python language from descriptions in natural language. The approach leverages Neural Machine Translation (NMT) techniques and a dataset that we developed for this work. We present an extensive experimental study to evaluate the feasibility of EVIL, using both automatic and manual analysis, and both at generating individual statements and entire exploits. The generated code achieved high accuracy in terms of syntactic and semantic correctness

    Enhancing Robustness of AI Offensive Code Generators via Data Augmentation

    Full text link
    In this work, we present a method to add perturbations to the code descriptions, i.e., new inputs in natural language (NL) from well-intentioned developers, in the context of security-oriented code, and analyze how and to what extent perturbations affect the performance of AI offensive code generators. Our experiments show that the performance of the code generators is highly affected by perturbations in the NL descriptions. To enhance the robustness of the code generators, we use the method to perform data augmentation, i.e., to increase the variability and diversity of the training data, proving its effectiveness against both perturbed and non-perturbed code descriptions

    Can NMT Understand Me? Towards Perturbation-based Evaluation of NMT Models for Code Generation

    Get PDF
    Neural Machine Translation (NMT) has reached a level of maturity to be recognized as the premier method for the translation between different languages and aroused interest in different research areas, including software engineering. A key step to validate the robustness of the NMT models consists in evaluating the performance of the models on adversarial inputs, i.e., inputs obtained from the original ones by adding small amounts of perturbation. However, when dealing with the specific task of the code generation (i.e., the generation of code starting from a description in natural language), it has not yet been defined an approach to validate the robustness of the NMT models. In this work, we address the problem by identifying a set of perturbations and metrics tailored for the robustness assessment of such models. We present a preliminary experimental evaluation, showing what type of perturbations affect the model the most and deriving useful insights for future directions.Comment: Paper accepted for publication in the proceedings of The 1st Intl. Workshop on Natural Language-based Software Engineering (NLBSE) to be held with ICSE 202

    06. Computer Science

    Get PDF

    Automating the Correctness Assessment of AI-generated Code for Security Contexts

    Full text link
    In this paper, we propose a fully automated method, named ACCA, to evaluate the correctness of AI-generated code for security purposes. The method uses symbolic execution to assess whether the AI-generated code behaves as a reference implementation. We use ACCA to assess four state-of-the-art models trained to generate security-oriented assembly code and compare the results of the evaluation with different baseline solutions, including output similarity metrics, widely used in the field, and the well-known ChatGPT, the AI-powered language model developed by OpenAI. Our experiments show that our method outperforms the baseline solutions and assesses the correctness of the AI-generated code similar to the human-based evaluation, which is considered the ground truth for the assessment in the field. Moreover, ACCA has a very strong correlation with human evaluation (Pearson's correlation coefficient r=0.84 on average). Finally, since it is a fully automated solution that does not require any human intervention, the proposed method performs the assessment of every code snippet in ~0.17s on average, which is definitely lower than the average time required by human analysts to manually inspect the code, based on our experience

    Combatting Advanced Persistent Threat via Causality Inference and Program Analysis

    Get PDF
    Cyber attackers are becoming more and more sophisticated. In particular, Advanced Persistent Threat (APT) is a new class of attack that targets a specifc organization and compromises systems over a long time without being detected. Over the years, we have seen notorious examples of APTs including Stuxnet which disrupted Iranian nuclear centrifuges and data breaches affecting millions of users. Investigating APT is challenging as it occurs over an extended period of time and the attack process is highly sophisticated and stealthy. Also, preventing APTs is diffcult due to ever-expanding attack vectors. In this dissertation, we present proposals for dealing with challenges in attack investigation. Specifcally, we present LDX which conducts precise counter-factual causality inference to determine dependencies between system calls (e.g., between input and output system calls) and allows investigators to determine the origin of an attack (e.g., receiving a spam email) and the propagation path of the attack, and assess the consequences of the attack. LDX is four times more accurate and two orders of magnitude faster than state-of-the-art taint analysis techniques. Moreover, we then present a practical model-based causality inference system, MCI, which achieves precise and accurate causality inference without requiring any modifcation or instrumentation in end-user systems. Second, we show a general protection system against a wide spectrum of attack vectors and methods. Specifcally, we present A2C that prevents a wide range of attacks by randomizing inputs such that any malicious payloads contained in the inputs are corrupted. The protection provided by A2C is both general (e.g., against various attack vectors) and practical (7% runtime overhead)

    A novel intrusion detection system for internet of things devices and data

    Get PDF
    As we enter the new age of the Internet of Things (IoT) and wearable gadgets, sensors, and embedded devices are extensively used for data aggregation and its transmission. The extent of the data processed by IoT networks makes it vulnerable to outside attacks. Therefore, it is important to design an intrusion detection system (IDS) that ensures the security, integrity, and confidentiality of IoT networks and their data. State-of-the-art IDSs have poor detection capabilities and incur high communication and device overhead, which is not ideal for IoT applications requiring secured and real-time processing. This research presents a teaching-learning-based optimization enabled intrusion detection system (TLBO-IDS) which effectively protects IoT networks from intrusion attacks and also ensures low overhead at the same time. The proposed TLBO-IDS can detect analysis attacks, fuzzing attacks, shellcode attacks, worms, denial of service (Dos) attacks, exploits, and backdoor intrusion attacks. TLBO-IDS is extensively tested and its performance is compared with state-of-the-art algorithms. In particular, TLBO-IDS outperforms the bat algorithm and genetic algorithm (GA)
    • …
    corecore