Automatic translation of assembly shellcodes to printable byte codes
The generation of printable shellcode is an important computer security research area. The original idea was to write binary, executable code so that the generated byte code contains only bytes that represent English letters, numbers and punctuation characters. Unfortunately, only a limited number of CPU instructions can be used in this way. In the originally published paper, a small decoder is written with instructions represented by printable characters, and the shellcode is decoded on the stack to be executed later. This paper, however, describes a proof-of-concept project that converts the source code of a full assembly program or shellcode to new source code whose compiled binary contains only printable characters. The paper also presents new printable-character implementations of some CPU instructions.
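As a rough illustration of the constraint described above, the following sketch checks whether a byte string stays within the printable set. It assumes "printable" means ASCII letters, digits and punctuation, as the abstract words it; the function names are hypothetical and are not taken from the paper.

```python
import string
from typing import List

# Assumption: the printable set is ASCII letters, digits and punctuation.
# Whitespace is deliberately excluded, following the abstract's wording.
PRINTABLE = frozenset(
    (string.ascii_letters + string.digits + string.punctuation).encode("ascii")
)


def non_printable_offsets(code: bytes) -> List[int]:
    """Return the offsets of all bytes that fall outside the printable set."""
    return [i for i, b in enumerate(code) if b not in PRINTABLE]


def is_printable(code: bytes) -> bool:
    """True if every byte of `code` is a letter, digit or punctuation mark."""
    return not non_printable_offsets(code)
```

A checker like this could be run over a compiled binary to confirm that a conversion left no offending bytes; for instance, the x86 NOP opcode 0x90 would be reported because it lies outside the ASCII range.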
EVIL: Exploiting Software via Natural Language
Writing exploits for security assessment is a challenging task. The writer needs to master programming and obfuscation techniques to develop a successful exploit. To make the task easier, we propose an approach (EVIL) to automatically generate exploits in assembly/Python language from descriptions in natural language. The approach leverages Neural Machine Translation (NMT) techniques and a dataset that we developed for this work. We present an extensive experimental study to evaluate the feasibility of EVIL, using both automatic and manual analysis, and considering both individual statements and entire exploits. The generated code achieved high accuracy in terms of syntactic and semantic correctness.
Enhancing Robustness of AI Offensive Code Generators via Data Augmentation
In this work, we present a method to add perturbations to the code
descriptions, i.e., new inputs in natural language (NL) from well-intentioned
developers, in the context of security-oriented code, and analyze how and to
what extent perturbations affect the performance of AI offensive code
generators. Our experiments show that the performance of the code generators is
highly affected by perturbations in the NL descriptions. To enhance the
robustness of the code generators, we use the method to perform data
augmentation, i.e., to increase the variability and diversity of the training
data, proving its effectiveness against both perturbed and non-perturbed code
descriptions.
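The augmentation idea described above can be sketched as follows. This is a minimal illustration that assumes two simple perturbation types, word substitution and word omission; the synonym table, the function names and the sample pair are hypothetical and do not come from the paper.

```python
import random

# Hypothetical synonym table, for illustration only.
SYNONYMS = {"move": "copy", "register": "reg", "jump": "go", "clear": "zero out"}


def substitute_words(description: str) -> str:
    """Replace every word found in SYNONYMS with its alternative."""
    return " ".join(SYNONYMS.get(word, word) for word in description.split())


def omit_word(description: str, rng: random.Random) -> str:
    """Drop one randomly chosen word, always keeping at least one word."""
    words = description.split()
    if len(words) < 2:
        return description
    del words[rng.randrange(len(words))]
    return " ".join(words)


def augment(pairs, seed=0):
    """Return the original (description, code) pairs plus perturbed copies.

    Each perturbed description keeps the code of the pair it was derived from.
    """
    rng = random.Random(seed)
    augmented = list(pairs)
    for description, code in pairs:
        for variant in (substitute_words(description), omit_word(description, rng)):
            if variant != description:
                augmented.append((variant, code))
    return augmented
```

Calling `augment([("clear the eax register", "xor eax, eax")])` would return the original pair together with perturbed descriptions that map to the same code, which is the kind of added variability the abstract refers to.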
Smashing the Stack with Hydra: The Many Heads of Advanced Polymorphic Shellcode
Recent work on the analysis of polymorphic shellcode engines suggests that modern obfuscation methods will soon eliminate the usefulness of signature-based network intrusion detection methods. It also supports the growing view that the new generation of shellcode cannot be accurately and efficiently represented by the string signatures on which current IDS and AV scanners rely. In this paper, we expand on this area of study by demonstrating never-before-seen concepts in advanced shellcode polymorphism with a proof-of-concept engine which we call Hydra. Hydra distinguishes itself by integrating an array of obfuscation techniques, such as recursive NOP sleds and multi-layer ciphering, into one system while offering multiple improvements upon existing strategies. We also introduce never-before-seen attack methods such as byte-splicing statistical mimicry, safe-returns with forking shellcode and syscall-time-locking. In total, Hydra simultaneously attacks signature, statistical, disassembly, behavioral and emulation-based sensors, and also frustrates offline forensics. This engine was developed to present an updated view of the frontier of modern polymorphic shellcode and to provide an effective tool for the evaluation of IDS systems, cyber test ranges and other related security technologies.
Can NMT Understand Me? Towards Perturbation-based Evaluation of NMT Models for Code Generation
Neural Machine Translation (NMT) has reached a level of maturity to be
recognized as the premier method for the translation between different
languages and aroused interest in different research areas, including software
engineering. A key step to validate the robustness of the NMT models consists
in evaluating the performance of the models on adversarial inputs, i.e., inputs
obtained from the original ones by adding small amounts of perturbation.
However, for the specific task of code generation (i.e., the
generation of code starting from a description in natural language), no
approach has yet been defined to validate the robustness of NMT models. In
this work, we address the problem by identifying a set of perturbations and
metrics tailored for the robustness assessment of such models. We present a
preliminary experimental evaluation, showing what type of perturbations affect
the model the most and deriving useful insights for future directions.
Comment: Paper accepted for publication in the proceedings of The 1st Intl.
Workshop on Natural Language-based Software Engineering (NLBSE) to be held
with ICSE 202
Automating the Correctness Assessment of AI-generated Code for Security Contexts
In this paper, we propose a fully automated method, named ACCA, to evaluate
the correctness of AI-generated code for security purposes. The method uses
symbolic execution to assess whether the AI-generated code behaves as a
reference implementation. We use ACCA to assess four state-of-the-art models
trained to generate security-oriented assembly code and compare the results of
the evaluation with different baseline solutions, including output similarity
metrics, widely used in the field, and the well-known ChatGPT, the AI-powered
language model developed by OpenAI. Our experiments show that our method
outperforms the baseline solutions and assesses the correctness of the
AI-generated code similarly to the human-based evaluation, which is considered
the ground truth for the assessment in the field. Moreover, ACCA has a very
strong correlation with human evaluation (Pearson's correlation coefficient
r=0.84 on average). Finally, since it is a fully automated solution that does
not require any human intervention, the proposed method performs the assessment
of every code snippet in ~0.17s on average, which is definitely lower than the
average time required by human analysts to manually inspect the code, based on
our experience.
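For readers unfamiliar with the statistic quoted above, the following sketch shows how a Pearson correlation coefficient between automated and human correctness scores could be computed. The function name is hypothetical, and the scores in the usage note are invented for illustration; they are not data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations.
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (dev_x * dev_y)
```

With, say, hypothetical automated verdicts `[1, 0, 1, 1, 0]` and human verdicts `[1, 0, 1, 0, 0]` for five snippets, `pearson_r` returns a value between -1 and 1; values close to 1 indicate that the automated assessment tracks the human one closely.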
Combatting Advanced Persistent Threat via Causality Inference and Program Analysis
Cyber attackers are becoming more and more sophisticated. In particular, Advanced Persistent Threat (APT) is a new class of attack that targets a specific organization and compromises systems over a long time without being detected. Over the years, we have seen notorious examples of APTs including Stuxnet which disrupted Iranian nuclear centrifuges and data breaches affecting millions of users. Investigating APT is challenging as it occurs over an extended period of time and the attack process is highly sophisticated and stealthy. Also, preventing APTs is difficult due to ever-expanding attack vectors.
In this dissertation, we present proposals for dealing with challenges in attack investigation. Specifically, we present LDX, which conducts precise counter-factual causality inference to determine dependencies between system calls (e.g., between input and output system calls). It allows investigators to determine the origin of an attack (e.g., receiving a spam email) and the propagation path of the attack, and to assess the consequences of the attack. LDX is four times more accurate and two orders of magnitude faster than state-of-the-art taint analysis techniques. We then present a practical model-based causality inference system, MCI, which achieves precise and accurate causality inference without requiring any modification or instrumentation in end-user systems.
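The counter-factual reasoning mentioned above can be illustrated with a toy sketch: run a program twice, once on the original inputs and once with a single input mutated, and treat every output that changes as causally dependent on that input. This is only an analogy over Python dictionaries; the abstract describes LDX as operating on system calls, and all names below are hypothetical.

```python
def causally_dependent_outputs(program, inputs, source, mutate):
    """Return the names of outputs that change when only `source` is mutated.

    `program` maps a dict of named inputs to a dict of named outputs.
    """
    baseline = program(dict(inputs))
    # Build the counter-factual inputs: identical except for one source.
    altered = dict(inputs)
    altered[source] = mutate(inputs[source])
    counterfactual = program(altered)
    return {
        name
        for name in baseline
        if baseline[name] != counterfactual.get(name)
    }


def toy_program(inputs):
    """Hypothetical program with one output per input plus a constant one."""
    return {
        "log": "started",
        "reply": inputs["email"].upper(),
        "size": len(inputs["config"]),
    }
```

For example, mutating the `email` input of `toy_program` changes only the `reply` output, so `reply` is reported as dependent on `email` while `log` and `size` are not.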
Second, we show a general protection system against a wide spectrum of attack vectors and methods. Specifically, we present A2C that prevents a wide range of attacks by randomizing inputs such that any malicious payloads contained in the inputs are corrupted. The protection provided by A2C is both general (e.g., against various attack vectors) and practical (7% runtime overhead).
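The input-randomization idea attributed to A2C above can be sketched as follows. The sketch assumes a simple XOR encoding with a repeating key; the abstract does not state which encoding A2C uses, and the function name and sample key are hypothetical.

```python
def perturb(data: bytes, key: bytes) -> bytes:
    """XOR each byte with a repeating key.

    XOR is its own inverse, so applying the function twice with the same
    key restores the original bytes.
    """
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Assumption: untrusted input is encoded as soon as it enters the program
# and decoded only at the point where it is legitimately consumed as data.
key = b"\x5a\xa5"                  # fixed here for illustration; random in practice
received = b"\x90\x90\x90\x90"     # stand-in for an embedded payload
stored = perturb(received, key)    # what sits in memory: corrupted bytes
restored = perturb(stored, key)    # what legitimate data processing sees
```

Under these assumptions, a payload that is reached in its stored form no longer has the byte values the attacker intended, while code that decodes the input before using it as data is unaffected.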
A novel intrusion detection system for internet of things devices and data
As we enter the new age of the Internet of Things (IoT), wearable gadgets, sensors, and embedded devices are extensively used for data aggregation and transmission. The extent of the data processed by IoT networks makes them vulnerable to outside attacks. Therefore, it is important to design an intrusion detection system (IDS) that ensures the security, integrity, and confidentiality of IoT networks and their data. State-of-the-art IDSs have poor detection capabilities and incur high communication and device overhead, which is not ideal for IoT applications requiring secure, real-time processing. This research presents a teaching-learning-based optimization enabled intrusion detection system (TLBO-IDS), which effectively protects IoT networks from intrusion attacks while keeping overhead low. The proposed TLBO-IDS can detect analysis attacks, fuzzing attacks, shellcode attacks, worms, denial of service (DoS) attacks, exploits, and backdoor intrusion attacks. TLBO-IDS is extensively tested and its performance is compared with state-of-the-art algorithms. In particular, TLBO-IDS outperforms the bat algorithm and genetic algorithm (GA).