SAFE: Self-Attentive Function Embeddings for Binary Similarity
The binary similarity problem consists in determining if two functions are
similar by only considering their compiled form. Advanced techniques for binary
similarity have recently gained momentum as they can be applied in several fields,
such as copyright disputes, malware analysis, vulnerability detection, etc.,
and thus have an immediate practical impact. Current solutions compare
functions by first transforming their binary code into multi-dimensional vector
representations (embeddings), and then comparing vectors through simple and
efficient geometric operations. However, embeddings are usually derived from
binary code using manual feature extraction, which may fail to consider
important function characteristics, or may consider features that are not
important for the binary similarity problem. In this paper we propose SAFE, a
novel architecture for the embedding of functions based on a self-attentive
neural network. SAFE works directly on disassembled binary functions, does not
require manual feature extraction, is computationally more efficient than
existing solutions (i.e., it does not incur the computational overhead of
building or manipulating control flow graphs), and is more general as it works
on stripped binaries and on multiple architectures. We report the results from
a quantitative and qualitative analysis that show how SAFE provides a
noticeable performance improvement with respect to previous solutions.
Furthermore, we show how clusters of our embedding vectors are closely related
to the semantics of the implemented algorithms, paving the way for further
interesting applications (e.g., semantic-based binary function search).
Comment: Published in International Conference on Detection of Intrusions and
Malware, and Vulnerability Assessment (DIMVA) 201
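The abstract says embeddings are compared "through simple and efficient geometric operations" without naming one. A minimal sketch, assuming cosine similarity as that operation (a common choice for embedding search, not confirmed by this abstract); the vectors are hypothetical toy embeddings, not SAFE output, and `cosine_similarity` and `rank_by_similarity` are illustrative names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings: 1.0 = same direction, 0.0 = orthogonal."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cannot compare a zero-length embedding")
    return dot / (norm_u * norm_v)


def rank_by_similarity(
    query: Sequence[float], corpus: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Rank named function embeddings by cosine similarity to the query, best match first."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in corpus.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical 3-dimensional embeddings of functions from a stripped binary.
corpus = {
    "sub_401000": [0.9, 0.1, 0.0],
    "sub_402000": [0.0, 0.2, 0.9],
}
print(rank_by_similarity([1.0, 0.0, 0.0], corpus)[0][0])  # sub_401000
```

Because comparison reduces to a dot product and two norms over fixed-length vectors, the costly step is producing each embedding, not comparing them.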
Improved Binary Similarity Measures for Software Modularization
Various binary similarity measures have been employed in clustering approaches to organize similar entities in the data into homogeneous groups. These similarity measures are mostly based only on the presence and absence of features. Binary similarity measures have also been explored with different clustering approaches (e.g., agglomerative hierarchical clustering) for software modularization, to make software systems understandable and manageable. Each similarity measure has its own strengths and weaknesses, which respectively improve and degrade the clustering results. This paper highlights the strengths of some well-known existing binary similarity measures for software modularization. Furthermore, building on these existing measures, this paper introduces new, improved binary similarity measures. Proofs of correctness, with illustrations, and a series of experiments are presented to evaluate the effectiveness of our new binary similarity measures.
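Presence/absence measures are usually written in terms of four counts over two feature vectors: a (feature present in both), b (only in the first), c (only in the second) and d (absent from both). The sketch below computes these counts and four well-known measures; these are standard textbook measures, not the improved measures this paper introduces, and the "classes described by the globals they access" example is hypothetical.

```python
from typing import Sequence, Tuple


def match_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) for two presence/absence vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if not p and q)
    d = len(x) - a - b - c
    return a, b, c, d


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    # Ignores shared absences (d). Convention (assumption): featureless pairs score 0.0.
    a, b, c, _ = match_counts(x, y)
    return a / (a + b + c) if a + b + c else 0.0


def sorensen_dice(x: Sequence[int], y: Sequence[int]) -> float:
    # Like Jaccard, but gives double weight to shared presences.
    a, b, c, _ = match_counts(x, y)
    return 2 * a / (2 * a + b + c) if a + b + c else 0.0


def russell_rao(x: Sequence[int], y: Sequence[int]) -> float:
    # Shared absences enter the denominator only.
    a, b, c, d = match_counts(x, y)
    n = a + b + c + d
    return a / n if n else 0.0


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    # Counts shared absences (d) as agreement.
    a, b, c, d = match_counts(x, y)
    n = a + b + c + d
    return (a + d) / n if n else 0.0


# Hypothetical example: two classes described by which of five globals they access.
class_a = [1, 1, 0, 0, 1]
class_b = [1, 0, 0, 1, 1]  # a=2, b=1, c=1, d=1
for measure in (jaccard, sorensen_dice, russell_rao, simple_matching):
    print(measure.__name__, round(measure(class_a, class_b), 3))
```

In sparse feature vectors most positions are shared absences, so a measure such as simple matching can rate unrelated entities as similar; this is one reason measures that ignore d are often considered for modularization.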
Function Representations for Binary Similarity
The binary similarity problem consists in determining if two functions are similar considering only their compiled form. Advanced techniques for binary similarity have recently gained momentum as they can be applied in several fields, such as copyright disputes, malware analysis, vulnerability detection, etc. In this paper we describe SAFE, a novel architecture for function representation based on a self-attentive neural network. SAFE works directly on disassembled binary functions, does not require manual feature extraction, is computationally more efficient than existing solutions, and is more general as it works on stripped binaries and on multiple architectures. Results from our experimental evaluation show how SAFE provides a performance improvement with respect to previous solutions. Furthermore, we show how SAFE can be used in widely different use cases, thus providing a general solution for several application scenarios.
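The abstract names a self-attentive neural network but not its layers. As a hedged illustration, the sketch below implements a generic structured self-attention pooling step: each instruction vector h_i receives r scores W2·tanh(W1·h_i), a softmax over instructions turns each hop's scores into weights, and each hop yields a weighted sum of the instruction vectors. Whether SAFE uses exactly this form is an assumption, as are all weights and dimensions; in a trained model W1 and W2 would be learned.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(mat: Matrix, vec: Sequence[float]) -> Vector:
    """Matrix-vector product for row-major nested lists."""
    return [sum(w * x for w, x in zip(row, vec)) for row in mat]


def self_attentive_pool(states: Sequence[Sequence[float]], w1: Matrix, w2: Matrix) -> Vector:
    """Pool n per-instruction vectors (each of size d) into one fixed-size vector.

    w1 has shape d_a x d and w2 has shape r x d_a, where r is the number of
    attention hops. The result has length r * d regardless of n, so functions
    with different instruction counts map to vectors of the same size.
    """
    # scores[i][j]: how strongly hop j attends to instruction i.
    scores = [matvec(w2, [math.tanh(z) for z in matvec(w1, h)]) for h in states]
    d = len(states[0])
    pooled: Vector = []
    for hop in range(len(w2)):
        weights = softmax([row[hop] for row in scores])  # sums to 1 over instructions
        pooled.extend(sum(a * h[k] for a, h in zip(weights, states)) for k in range(d))
    return pooled
```

Each hop's output is a convex combination of the instruction vectors, so the attention weights decide which instructions dominate the function embedding.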
Adversarial Attacks against Binary Similarity Systems
In recent years, binary analysis has gained traction as a fundamental approach to
inspect software and guarantee its security. Due to the exponential increase in the
number of devices running software, much research is now moving towards new autonomous
solutions based on deep learning models, as they have been showing
state-of-the-art performance in solving binary analysis problems. One of the
hot topics in this context is binary similarity, which consists in determining
if two functions in assembly code are compiled from the same source code.
However, it is unclear how deep learning models for binary similarity behave in
an adversarial context. In this paper, we study the resilience of binary
similarity models against adversarial examples, showing that they are
susceptible to both targeted and untargeted attacks (w.r.t. similarity goals)
performed by black-box and white-box attackers. In more detail, we extensively
test three current state-of-the-art solutions for binary similarity against two
black-box greedy attacks, including a new technique that we call Spatial
Greedy, and one white-box attack in which we repurpose a gradient-guided
strategy used in attacks on image classifiers.
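To make the black-box greedy idea concrete, here is a minimal sketch of a generic greedy insertion attack that only queries a similarity score. It is not the paper's Spatial Greedy technique; `toy_similarity` (Jaccard over instruction sets) is a hypothetical stand-in for a deep learning model, and the candidate instructions are assumed to be semantics-preserving filler.

```python
from typing import Callable, List, Optional, Sequence

Function = List[str]  # a function as a list of assembly instruction strings
Similarity = Callable[[Sequence[str], Sequence[str]], float]


def greedy_insertion_attack(
    source: Sequence[str],
    target: Sequence[str],
    similarity: Similarity,
    candidates: Sequence[str],
    budget: int,
    targeted: bool,
) -> Function:
    """Greedily insert instructions into `source` using only black-box similarity queries.

    targeted=True pushes similarity(source', target) up (make it look like target);
    targeted=False pushes it down (hide source from its own twin `target`).
    Each step tries every candidate at every position and keeps the single best
    insertion; the attack stops early when no insertion improves the objective.
    """
    sign = 1.0 if targeted else -1.0
    current: Function = list(source)
    best = sign * similarity(current, target)
    for _ in range(budget):
        best_trial: Optional[Function] = None
        for pos in range(len(current) + 1):
            for instr in candidates:
                trial = current[:pos] + [instr] + current[pos:]
                score = sign * similarity(trial, target)
                if score > best:
                    best, best_trial = score, trial
        if best_trial is None:
            break
        current = best_trial
    return current


def toy_similarity(f: Sequence[str], g: Sequence[str]) -> float:
    """Stand-in model: Jaccard similarity of the instruction sets (not a real DL model)."""
    s, t = set(f), set(g)
    return len(s & t) / len(s | t) if s | t else 1.0


fn = ["push rbp", "mov rbp, rsp", "pop rbp", "ret"]
adv = greedy_insertion_attack(fn, fn, toy_similarity, ["nop", "mov rax, rax"], 2, targeted=False)
print(adv, toy_similarity(adv, fn))
```

The query cost grows with positions × candidates per step, which is why practical black-box attacks restrict where and what they insert.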
The Power of Asymmetry in Binary Hashing
When approximating binary similarity using the hamming distance between short
binary hashes, we show that even if the similarity is symmetric, we can have
shorter and more accurate hashes by using two distinct code maps, i.e., by
approximating the similarity between $x$ and $x'$ as the hamming distance
between $f(x)$ and $g(x')$, for two distinct binary codes $f, g$, rather than as
the hamming distance between $f(x)$ and $f(x')$.
Comment: Accepted to NIPS 2013, 9 pages, 5 figures
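A minimal sketch of the symmetric versus asymmetric distinction, assuming linear threshold code maps f(x) = [W_f x > 0] and g(x') = [W_g x' > 0] with hand-picked, hypothetical weights. How the paper learns f and g is not shown; passing W_g = W_f recovers the symmetric scheme.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def sign_code(weights: Matrix, x: Sequence[float]) -> List[int]:
    """Linear threshold code map: bit i is 1 iff <weights[i], x> > 0."""
    return [1 if sum(w * v for w, v in zip(row, x)) > 0 else 0 for row in weights]


def hamming(p: Sequence[int], q: Sequence[int]) -> int:
    """Number of positions where two equal-length binary codes differ."""
    return sum(1 for a, b in zip(p, q) if a != b)


def symmetric_distance(w_f: Matrix, x: Sequence[float], x2: Sequence[float]) -> int:
    """Hamming distance between f(x) and f(x'): one code map for both sides."""
    return hamming(sign_code(w_f, x), sign_code(w_f, x2))


def asymmetric_distance(
    w_f: Matrix, w_g: Matrix, x: Sequence[float], x2: Sequence[float]
) -> int:
    """Hamming distance between f(x) and g(x'): a different code map on each side."""
    return hamming(sign_code(w_f, x), sign_code(w_g, x2))


# Hypothetical 2-bit codes for 2-dimensional inputs.
w_f = [[1.0, 0.0], [0.0, 1.0]]
w_g = [[1.0, 0.0], [0.0, -1.0]]
x = [1.0, 1.0]
print(symmetric_distance(w_f, x, x), asymmetric_distance(w_f, w_g, x, x))
```

The toy weights also show that, unlike the symmetric case, the asymmetric distance between an item and itself need not be zero.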
Binary indices at various densities
Binary similarity indices are numerical analysis methods used to compare data involving two binary vectors (lists). The scope of this project involved comparing 54 binary similarity index methods in relation to binary vector density, using the R programming language. Matrices of various vector data were created. The matrices were then scrambled to represent random data. Finally, the data was analyzed and plotted. Variation in vector density can result in large differences, both in the rate of change relative to density and in magnitude. Awareness of these differences is important when selecting an analysis method and when interpreting the effects of changing vector density on the analysis of results.
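The density effect can be reproduced on a small scale. The project used R and 54 indices; this Python sketch uses only two common indices, with seeded random vectors standing in for the scrambled matrices. The vector length, number of pairs and seed are arbitrary assumptions.

```python
import random
from typing import Callable, Dict, List, Sequence

Index = Callable[[Sequence[int], Sequence[int]], float]


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Share of positions where the vectors agree, counting shared zeros."""
    return sum(1 for p, q in zip(x, y) if p == q) / len(x)


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Shared ones over positions where either vector has a one (shared zeros ignored)."""
    union = sum(1 for p, q in zip(x, y) if p or q)
    both = sum(1 for p, q in zip(x, y) if p and q)
    return both / union if union else 0.0


def random_vector(n: int, density: float, rng: random.Random) -> List[int]:
    """Binary vector whose entries are 1 with probability `density`."""
    return [1 if rng.random() < density else 0 for _ in range(n)]


def mean_index_by_density(
    index: Index, densities: Sequence[float], n: int = 200, pairs: int = 100, seed: int = 0
) -> Dict[float, float]:
    """Average an index over `pairs` random vector pairs at each density."""
    rng = random.Random(seed)
    means: Dict[float, float] = {}
    for p in densities:
        scores = [index(random_vector(n, p, rng), random_vector(n, p, rng)) for _ in range(pairs)]
        means[p] = sum(scores) / pairs
    return means


densities = [0.1, 0.3, 0.5, 0.7, 0.9]
for name, index in [("simple matching", simple_matching), ("Jaccard", jaccard)]:
    means = mean_index_by_density(index, densities)
    print(name, {p: round(m, 3) for p, m in means.items()})
```

For independent vectors of density p, simple matching has expectation p² + (1 − p)², which is U-shaped with its minimum at p = 0.5, while Jaccard is roughly p / (2 − p) and rises monotonically, so the two indices differ in both rate of change and magnitude.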
Discrimination Ability of Assessors in Check-All-That-Apply Tests: Method and Product Development
Binary similarity measures have been used in several research fields, but their application in sensory data analysis is still limited. Since check-all-that-apply (CATA) data consist of binary answers from the participants, binary similarity measures seem a natural choice for their evaluation. This work aims to define the discrimination ability of CATA participants by calculating the consensus values of 44 binary similarity measures. The proposed methodology consists of three steps: (i) calculating the binary similarity values of the assessors pairwise over the samples; (ii) clustering participants into good and poor discriminators based on their binary similarity values; (iii) performing correspondence analysis on the CATA data of the two clusters. Results of three case studies are presented, showing that a simple clustering based on the computed binary similarity measures yields a higher-quality correspondence analysis with more significant attributes, as well as better sample discrimination (even according to overall liking).
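One hedged reading of steps (i) and (ii): for each assessor, compare that assessor's answer vectors for every pair of samples; an assessor who ticks nearly the same attributes for every sample scores high and discriminates poorly. The sketch averages three well-known measures as a stand-in for the paper's consensus over 44 measures, and splits assessors at the median score instead of the paper's clustering. The measure choice, the averaging and the median split are all assumptions, and step (iii) is not shown.

```python
from itertools import combinations
from statistics import mean, median
from typing import Dict, List, Sequence, Set, Tuple

Answers = Dict[str, List[int]]  # sample name -> 0/1 vector over CATA attributes


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    union = sum(1 for p, q in zip(x, y) if p or q)
    both = sum(1 for p, q in zip(x, y) if p and q)
    # Two empty answer sheets are identical, so treat them as fully similar.
    return both / union if union else 1.0


def dice(x: Sequence[int], y: Sequence[int]) -> float:
    both = sum(1 for p, q in zip(x, y) if p and q)
    ticked = sum(x) + sum(y)
    return 2 * both / ticked if ticked else 1.0


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    return sum(1 for p, q in zip(x, y) if p == q) / len(x)


MEASURES = (jaccard, dice, simple_matching)


def assessor_score(answers: Answers) -> float:
    """Mean similarity between one assessor's answers for every pair of samples,
    averaged over MEASURES. Lower means the assessor separates the samples more.
    Requires at least two samples."""
    pairs = list(combinations(sorted(answers), 2))
    return mean([mean([m(answers[s], answers[t]) for s, t in pairs]) for m in MEASURES])


def split_discriminators(panel: Dict[str, Answers]) -> Tuple[Set[str], Set[str]]:
    """Split assessors at the median score: (good discriminators, poor discriminators)."""
    scores = {name: assessor_score(answers) for name, answers in panel.items()}
    cut = median(scores.values())
    good = {name for name, s in scores.items() if s < cut}
    return good, set(scores) - good


# Hypothetical panel: A ticks a different attribute per sample, B ticks the same ones.
panel = {
    "A": {"s1": [1, 0, 0], "s2": [0, 1, 0], "s3": [0, 0, 1]},
    "B": {"s1": [1, 1, 0], "s2": [1, 1, 0], "s3": [1, 1, 0]},
}
print(split_discriminators(panel))
```

With real panels, a proper clustering method would replace the median split, which can leave one group empty when many assessors tie.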
- …