89,576 research outputs found
Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs
Binary code analysis allows analyzing binary code without having access to
the corresponding source code. A binary, after disassembly, is expressed in an
assembly language. This inspires us to approach binary analysis by leveraging
ideas and techniques from Natural Language Processing (NLP), a rich area
focused on processing text of various natural languages. We notice that binary
code analysis and NLP share a lot of analogical topics, such as semantics
extraction, summarization, and classification. This work utilizes these ideas
to address two important code similarity comparison problems. (I) Given a pair
of basic blocks for different instruction set architectures (ISAs), determining
whether their semantics is similar or not; and (II) given a piece of code of
interest, determining if it is contained in another piece of assembly code for
a different ISA. The solutions to these two problems have many applications,
such as cross-architecture vulnerability discovery and code plagiarism
detection. We implement a prototype system INNEREYE and perform a comprehensive
evaluation. A comparison between our approach and existing approaches to
Problem I shows that our system outperforms them in terms of accuracy,
efficiency and scalability. And the case studies utilizing the system
demonstrate that our solution to Problem II is effective. Moreover, this
research showcases how to apply ideas and techniques from NLP to large-scale
binary code analysis.Comment: Accepted by Network and Distributed Systems Security (NDSS) Symposium
201
Debugging Memory Issues In Embedded Linux: A Case Study
Debugging denotes the process of detecting root causes of unexpected
observable behaviors in programs, such as a program crash, an unexpected output
value being produced or an assertion violation. Debugging of program errors is
a difficult task and often takes a significant amount of time in the software
development life cycle. In the context of embedded software, the probability of
bugs is quite high. Due to requirements of low code size and less resource
consumption, embedded softwares typically do away with a lot of sanity checks
during development time. This leads to high chance of errors being uncovered in
the production code at run time. In this paper we propose a methodology for
debugging errors in BusyBox, a de-facto standard for Linux in embedded systems.
Our methodology works on top of Valgrind, a popular memory error detector and
Daikon, an invariant analyzer. We have experimented with two published errors
in BusyBox and report our findings in this paper.Comment: In proceedings of IEEE TechSym 2011, 14-16 January, 2011, IIT
kharagpur, Indi
Automated Dynamic Firmware Analysis at Scale: A Case Study on Embedded Web Interfaces
Embedded devices are becoming more widespread, interconnected, and
web-enabled than ever. However, recent studies showed that these devices are
far from being secure. Moreover, many embedded systems rely on web interfaces
for user interaction or administration. Unfortunately, web security is known to
be difficult, and therefore the web interfaces of embedded systems represent a
considerable attack surface.
In this paper, we present the first fully automated framework that applies
dynamic firmware analysis techniques to achieve, in a scalable manner,
automated vulnerability discovery within embedded firmware images. We apply our
framework to study the security of embedded web interfaces running in
Commercial Off-The-Shelf (COTS) embedded devices, such as routers, DSL/cable
modems, VoIP phones, IP/CCTV cameras. We introduce a methodology and implement
a scalable framework for discovery of vulnerabilities in embedded web
interfaces regardless of the vendor, device, or architecture. To achieve this
goal, our framework performs full system emulation to achieve the execution of
firmware images in a software-only environment, i.e., without involving any
physical embedded devices. Then, we analyze the web interfaces within the
firmware using both static and dynamic tools. We also present some interesting
case-studies, and discuss the main challenges associated with the dynamic
analysis of firmware images and their web interfaces and network services. The
observations we make in this paper shed light on an important aspect of
embedded devices which was not previously studied at a large scale.
We validate our framework by testing it on 1925 firmware images from 54
different vendors. We discover important vulnerabilities in 185 firmware
images, affecting nearly a quarter of vendors in our dataset. These
experimental results demonstrate the effectiveness of our approach
Certifying cost annotations in compilers
We discuss the problem of building a compiler which can lift in a provably
correct way pieces of information on the execution cost of the object code to
cost annotations on the source code. To this end, we need a clear and flexible
picture of: (i) the meaning of cost annotations, (ii) the method to prove them
sound and precise, and (iii) the way such proofs can be composed. We propose a
so-called labelling approach to these three questions. As a first step, we
examine its application to a toy compiler. This formal study suggests that the
labelling approach has good compositionality and scalability properties. In
order to provide further evidence for this claim, we report our successful
experience in implementing and testing the labelling approach on top of a
prototype compiler written in OCAML for (a large fragment of) the C language
- …