SAFE: Self-Attentive Function Embeddings for Binary Similarity
The binary similarity problem consists in determining if two functions are
similar by only considering their compiled form. Advanced techniques for binary
similarity recently gained momentum as they can be applied in several fields,
such as copyright disputes, malware analysis, vulnerability detection, etc.,
and thus have an immediate practical impact. Current solutions compare
functions by first transforming their binary code into multi-dimensional vector
representations (embeddings), and then comparing vectors through simple and
efficient geometric operations. However, embeddings are usually derived from
binary code using manual feature extraction, which may fail to consider
important function characteristics, or may consider features that are not
important for the binary similarity problem. In this paper we propose SAFE, a
novel architecture for the embedding of functions based on a self-attentive
neural network. SAFE works directly on disassembled binary functions, does not
require manual feature extraction, is computationally more efficient than
existing solutions (i.e., it does not incur the computational overhead of
building or manipulating control flow graphs), and is more general as it works
on stripped binaries and on multiple architectures. We report the results from
a quantitative and qualitative analysis that show how SAFE provides a
noticeable performance improvement with respect to previous solutions.
Furthermore, we show how clusters of our embedding vectors are closely related
to the semantics of the implemented algorithms, paving the way for further
interesting applications (e.g. semantic-based binary function search).
Comment: Published in International Conference on Detection of Intrusions and
Malware, and Vulnerability Assessment (DIMVA) 201
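The abstract above says that embeddings are compared "through simple and efficient geometric operations". A minimal sketch of one such operation, cosine similarity, is shown below. The function names, the toy three-dimensional vectors, and the choice of cosine similarity itself are assumptions made for illustration; the listing does not state which metric SAFE uses or how large its embeddings are.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(query, candidates):
    """Return the name of the candidate embedding closest to `query`.

    `candidates` maps a (hypothetical) function name to its embedding.
    """
    return max(candidates, key=lambda name: cosine_similarity(query, candidates[name]))


if __name__ == "__main__":
    # Toy embeddings, invented for this example only.
    known = {
        "memcpy_like": [0.9, 0.1, 0.0],
        "sha_round_like": [0.0, 0.2, 0.9],
    }
    query = [0.8, 0.2, 0.1]
    print(most_similar(query, known))
```

Once every function is reduced to a fixed-length vector, a similarity query costs one dot product per candidate, which is why embedding-based approaches scale to large function collections.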
A new mass-ratio for the X-ray Binary X2127+119 in M15?
The luminous low-mass X-ray binary X2127+119 in the core of the globular
cluster M15 (NGC 7078), which has an orbital period of 17 hours, has long been
assumed to contain a donor star evolving off the main sequence, with a mass of
0.8 solar masses (the main-sequence turn-off mass for M15). We present
orbital-phase-resolved spectroscopy of X2127+119 in the H-alpha and He I 6678
spectral region, obtained with the Hubble Space Telescope. We show that these
data are incompatible with the assumed masses of X2127+119's component stars.
The continuum eclipse is too shallow, indicating that much of the accretion
disc remains visible during eclipse, and therefore that the size of the donor
star relative to the disc is much smaller in this high-inclination system than
the assumed mass-ratio allows. Furthermore, the flux of X2127+119's He I 6678
emission, which has a velocity that implies an association with the stream-disc
impact region, remains unchanged through eclipse, implying that material from
the impact region is always visible. This should not be possible if the
previously-assumed mass ratio is correct. In addition, we do not detect any
spectral features from the donor star, which is unexpected for a 0.8 solar-mass
sub-giant in a system with a 17-hour period.
Comment: 6 pages, 4 figures, accepted by A&
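The argument above rests on how the donor's size relative to the binary depends on the mass ratio. A rough sketch of that dependence, using Eggleton's (1983) standard approximation for the Roche-lobe radius in units of the orbital separation, is given below. The accretor mass of 1.4 solar masses and the alternative donor masses are assumptions chosen for illustration only; the abstract gives neither, and this is not the authors' analysis.

```python
import math


def roche_lobe_ratio(q):
    """Eggleton (1983) approximation to R_L / a.

    q is the mass of the star whose lobe is wanted divided by the mass of
    its companion (here: donor mass / accretor mass).
    """
    if q <= 0:
        raise ValueError("mass ratio must be positive")
    q13 = q ** (1.0 / 3.0)
    q23 = q13 * q13
    return 0.49 * q23 / (0.6 * q23 + math.log(1.0 + q13))


if __name__ == "__main__":
    # ASSUMPTION: a 1.4 solar-mass accretor; not stated in the abstract.
    assumed_accretor_mass = 1.4
    # 0.8 is the donor mass quoted in the abstract; the others are hypothetical.
    for donor_mass in (0.8, 0.4, 0.2):
        q = donor_mass / assumed_accretor_mass
        print(f"donor {donor_mass} Msun, q = {q:.2f}, R_L/a = {roche_lobe_ratio(q):.3f}")
```

A lower-mass donor fills a smaller fraction of the orbit, so it would hide less of the disc at mid-eclipse, which is the direction of the discrepancy the abstract describes.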
Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs
Binary code analysis allows analyzing binary code without having access to
the corresponding source code. A binary, after disassembly, is expressed in an
assembly language. This inspires us to approach binary analysis by leveraging
ideas and techniques from Natural Language Processing (NLP), a rich area
focused on processing text of various natural languages. We notice that binary
code analysis and NLP share many analogous topics, such as semantics
extraction, summarization, and classification. This work utilizes these ideas
to address two important code similarity comparison problems: (I) given a pair
of basic blocks for different instruction set architectures (ISAs), determining
whether their semantics is similar or not; and (II) given a piece of code of
interest, determining if it is contained in another piece of assembly code for
a different ISA. The solutions to these two problems have many applications,
such as cross-architecture vulnerability discovery and code plagiarism
detection. We implement a prototype system, INNEREYE, and perform a comprehensive
evaluation. A comparison between our approach and existing approaches to
Problem I shows that our system outperforms them in terms of accuracy,
efficiency, and scalability. The case studies utilizing the system
demonstrate that our solution to Problem II is effective. Moreover, this
research showcases how to apply ideas and techniques from NLP to large-scale
binary code analysis.
Comment: Accepted by Network and Distributed Systems Security (NDSS) Symposium
201
MicroWalk: A Framework for Finding Side Channels in Binaries
Microarchitectural side channels expose unprotected software to information
leakage attacks where a software adversary is able to track runtime behavior of
a benign process and steal secrets such as cryptographic keys. As suggested by
incremental software patches for the RSA algorithm against variants of
side-channel attacks within different versions of cryptographic libraries,
protecting security-critical algorithms against side channels is an intricate
task. Software protections avoid leakages by operating in constant time with a
uniform resource usage pattern independent of the processed secret. In this
respect, automated testing and verification of software binaries for
leakage-free behavior is of importance, particularly when the source code is
not available. In this work, we propose a novel technique based on Dynamic
Binary Instrumentation and Mutual Information Analysis to efficiently locate
and quantify memory-based and control-flow-based microarchitectural leakages.
We develop a software framework named MicroWalk for side-channel analysis of
binaries which can be extended to support new classes of leakage. For the first
time, by utilizing MicroWalk, we perform rigorous leakage analysis of two
widely-used closed-source cryptographic libraries: Intel IPP and
Microsoft CNG. We analyze different cryptographic implementations
consisting of million instructions in about minutes of CPU time. By
locating previously unknown leakages in hardened implementations, our results
suggest that MicroWalk can efficiently find microarchitectural leakages in software
binaries.
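The abstract above names Mutual Information Analysis as the way leakage is quantified. A minimal sketch of the underlying quantity, a plug-in estimate of the mutual information between secret inputs and the traces observed while processing them, might look as follows. The function name and the toy traces are hypothetical, and this is not MicroWalk's actual implementation or data format.

```python
import math
from collections import Counter


def mutual_information(secrets, traces):
    """Plug-in estimate of I(secret; trace) in bits.

    `secrets` and `traces` are equal-length sequences of hashable values,
    e.g. a key bit and a tuple of accessed addresses for each test run.
    """
    if len(secrets) != len(traces) or not secrets:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(secrets)
    joint = Counter(zip(secrets, traces))
    secret_counts = Counter(secrets)
    trace_counts = Counter(traces)
    mi = 0.0
    for (s, t), count in joint.items():
        # p(s,t) / (p(s) * p(t)) simplifies to count * n / (count_s * count_t).
        mi += (count / n) * math.log2(count * n / (secret_counts[s] * trace_counts[t]))
    return mi


if __name__ == "__main__":
    key_bits = [0, 1, 0, 1]
    # Hypothetical traces: the accessed table index depends on the key bit.
    leaky = [("load", 0x10), ("load", 0x50), ("load", 0x10), ("load", 0x50)]
    # Hypothetical constant-time variant: same trace regardless of the key bit.
    constant = [("load", 0x10)] * 4
    print(mutual_information(key_bits, leaky))
    print(mutual_information(key_bits, constant))
```

An estimate near zero means the observed traces reveal nothing about the secret under this model, while a value approaching the entropy of the secret means the trace determines it; with few samples the plug-in estimator is biased, so real analyses need many test runs.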
- …