89,576 research outputs found

    Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs

    Full text link
    Binary code analysis allows analyzing binary code without having access to the corresponding source code. A binary, after disassembly, is expressed in an assembly language. This inspires us to approach binary analysis by leveraging ideas and techniques from Natural Language Processing (NLP), a rich area focused on processing text of various natural languages. We notice that binary code analysis and NLP share a lot of analogical topics, such as semantics extraction, summarization, and classification. This work utilizes these ideas to address two important code similarity comparison problems. (I) Given a pair of basic blocks for different instruction set architectures (ISAs), determining whether their semantics is similar or not; and (II) given a piece of code of interest, determining if it is contained in another piece of assembly code for a different ISA. The solutions to these two problems have many applications, such as cross-architecture vulnerability discovery and code plagiarism detection. We implement a prototype system INNEREYE and perform a comprehensive evaluation. A comparison between our approach and existing approaches to Problem I shows that our system outperforms them in terms of accuracy, efficiency and scalability. And the case studies utilizing the system demonstrate that our solution to Problem II is effective. Moreover, this research showcases how to apply ideas and techniques from NLP to large-scale binary code analysis.Comment: Accepted by Network and Distributed Systems Security (NDSS) Symposium 201

    Debugging Memory Issues In Embedded Linux: A Case Study

    Full text link
    Debugging denotes the process of detecting root causes of unexpected observable behaviors in programs, such as a program crash, an unexpected output value being produced or an assertion violation. Debugging of program errors is a difficult task and often takes a significant amount of time in the software development life cycle. In the context of embedded software, the probability of bugs is quite high. Due to requirements of low code size and less resource consumption, embedded softwares typically do away with a lot of sanity checks during development time. This leads to high chance of errors being uncovered in the production code at run time. In this paper we propose a methodology for debugging errors in BusyBox, a de-facto standard for Linux in embedded systems. Our methodology works on top of Valgrind, a popular memory error detector and Daikon, an invariant analyzer. We have experimented with two published errors in BusyBox and report our findings in this paper.Comment: In proceedings of IEEE TechSym 2011, 14-16 January, 2011, IIT kharagpur, Indi

    Automated Dynamic Firmware Analysis at Scale: A Case Study on Embedded Web Interfaces

    Full text link
    Embedded devices are becoming more widespread, interconnected, and web-enabled than ever. However, recent studies showed that these devices are far from being secure. Moreover, many embedded systems rely on web interfaces for user interaction or administration. Unfortunately, web security is known to be difficult, and therefore the web interfaces of embedded systems represent a considerable attack surface. In this paper, we present the first fully automated framework that applies dynamic firmware analysis techniques to achieve, in a scalable manner, automated vulnerability discovery within embedded firmware images. We apply our framework to study the security of embedded web interfaces running in Commercial Off-The-Shelf (COTS) embedded devices, such as routers, DSL/cable modems, VoIP phones, IP/CCTV cameras. We introduce a methodology and implement a scalable framework for discovery of vulnerabilities in embedded web interfaces regardless of the vendor, device, or architecture. To achieve this goal, our framework performs full system emulation to achieve the execution of firmware images in a software-only environment, i.e., without involving any physical embedded devices. Then, we analyze the web interfaces within the firmware using both static and dynamic tools. We also present some interesting case-studies, and discuss the main challenges associated with the dynamic analysis of firmware images and their web interfaces and network services. The observations we make in this paper shed light on an important aspect of embedded devices which was not previously studied at a large scale. We validate our framework by testing it on 1925 firmware images from 54 different vendors. We discover important vulnerabilities in 185 firmware images, affecting nearly a quarter of vendors in our dataset. These experimental results demonstrate the effectiveness of our approach

    Certifying cost annotations in compilers

    Get PDF
    We discuss the problem of building a compiler which can lift in a provably correct way pieces of information on the execution cost of the object code to cost annotations on the source code. To this end, we need a clear and flexible picture of: (i) the meaning of cost annotations, (ii) the method to prove them sound and precise, and (iii) the way such proofs can be composed. We propose a so-called labelling approach to these three questions. As a first step, we examine its application to a toy compiler. This formal study suggests that the labelling approach has good compositionality and scalability properties. In order to provide further evidence for this claim, we report our successful experience in implementing and testing the labelling approach on top of a prototype compiler written in OCAML for (a large fragment of) the C language
    • …
    corecore