1,085 research outputs found

SAFE: Self-Attentive Function Embeddings for Binary Similarity

Authors: Baldoni Roberto; Di Luna Giuseppe Antonio; Massarelli Luca; Petroni Fabio; Querzoni Leonardo
Publication date: 01/01/2019

The binary similarity problem consists in determining if two functions are similar by only considering their compiled form. Advanced techniques for binary similarity have recently gained momentum as they can be applied in several fields, such as copyright disputes, malware analysis, and vulnerability detection, and thus have an immediate practical impact. Current solutions compare functions by first transforming their binary code into multi-dimensional vector representations (embeddings), and then comparing vectors through simple and efficient geometric operations. However, embeddings are usually derived from binary code using manual feature extraction, which may fail to capture important function characteristics, or may consider features that are not important for the binary similarity problem. In this paper we propose SAFE, a novel architecture for the embedding of functions based on a self-attentive neural network. SAFE works directly on disassembled binary functions, does not require manual feature extraction, is computationally more efficient than existing solutions (i.e., it does not incur the computational overhead of building or manipulating control flow graphs), and is more general as it works on stripped binaries and on multiple architectures. We report the results from a quantitative and qualitative analysis that show how SAFE provides a noticeable performance improvement with respect to previous solutions. Furthermore, we show how clusters of our embedding vectors are closely related to the semantics of the implemented algorithms, paving the way for further interesting applications (e.g. semantic-based binary function search). Comment: Published in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA) 201
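The abstract says embeddings are compared "through simple and efficient geometric operations". A common such operation is cosine similarity; the sketch below assumes that choice and uses made-up 4-dimensional vectors in place of real SAFE embeddings (which come from the self-attentive network and are not reproduced here). The names `emb_f_x86`, `emb_f_arm` and `emb_g_x86` are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_u * norm_v)


# Hypothetical embeddings: the same function f compiled for two architectures,
# and an unrelated function g. Real embeddings would come from a trained model.
emb_f_x86 = [0.9, 0.1, 0.3, 0.0]
emb_f_arm = [0.8, 0.2, 0.35, 0.05]
emb_g_x86 = [0.0, 0.7, -0.2, 0.6]

same_function = cosine_similarity(emb_f_x86, emb_f_arm)
different_function = cosine_similarity(emb_f_x86, emb_g_x86)
print(f"f(x86) vs f(ARM): {same_function:.3f}")
print(f"f(x86) vs g(x86): {different_function:.3f}")
```

Once embeddings exist, each comparison is a single dot product and two norms, which is the efficiency argument the abstract makes against approaches that manipulate control flow graphs at comparison time.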

arXiv.org e-Print Archive

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Improved Binary Similarity Measures for Software Modularization

Authors: Deris Mustafa Bin Mat; Li Jingpeng; Maqbool Onaiza; Naseem Rashid; Shah Habib; Shahzad Sarah
Publication venue: 'Zhejiang University Press'
Publication date: 01/08/2017

Various binary similarity measures have been employed in clustering approaches to form homogeneous groups of similar entities in the data. These similarity measures are mostly based only on the presence and absence of features. Binary similarity measures have also been explored with different clustering approaches (e.g., agglomerative hierarchical clustering) for software modularization, to make software systems understandable and manageable. Each similarity measure has its own strengths and weaknesses, which respectively improve and deteriorate the clustering results. This paper highlights the strengths of some well-known existing binary similarity measures for software modularization. Furthermore, building on these existing measures, this paper introduces new, improved binary similarity measures. Proofs of correctness, with illustrations, and a series of experiments are presented to evaluate the effectiveness of our new binary similarity measures.
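Measures "based only on the presence and absence of features" are usually written in terms of four counts for a pair of 0/1 feature vectors: a (present in both), b (only in the first), c (only in the second) and d (absent in both). The sketch below computes three well-known existing measures from those counts (Jaccard, simple matching, Russell-Rao); it does not implement the paper's new improved measures, whose definitions are not given here. The feature vectors `e1` and `e2` are hypothetical, and returning 0.0 for Jaccard when no feature is present in either vector is our own convention.

```python
from typing import Sequence, Tuple


def contingency(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return the (a, b, c, d) counts for two 0/1 feature vectors.

    a: present in both, b: only in x, c: only in y, d: absent in both.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if not p and q)
    d = len(x) - a - b - c
    return a, b, c, d


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """a / (a + b + c): ignores features absent from both entities."""
    a, b, c, _ = contingency(x, y)
    return a / (a + b + c) if (a + b + c) else 0.0


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """(a + d) / n: shared absences count as agreement."""
    a, b, c, d = contingency(x, y)
    return (a + d) / (a + b + c + d)


def russell_rao(x: Sequence[int], y: Sequence[int]) -> float:
    """a / n: only shared presences count, but n includes shared absences."""
    a, b, c, d = contingency(x, y)
    return a / (a + b + c + d)


# Hypothetical software entities (e.g. source files), each described by
# which of 8 features (e.g. global variables or called functions) it uses.
e1 = [1, 1, 0, 0, 0, 0, 0, 1]
e2 = [1, 0, 0, 0, 0, 0, 0, 1]

print("counts (a, b, c, d):", contingency(e1, e2))
print("Jaccard:", jaccard(e1, e2))
print("simple matching:", simple_matching(e1, e2))
print("Russell-Rao:", russell_rao(e1, e2))
```

In this example the five shared absences push simple matching well above Jaccard for the same pair. In agglomerative clustering such differences can change which pair of entities is merged first, which is one way a measure's weaknesses show up in the clustering results.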

Crossref

Stirling Online Research Repository (RIOXX)

Stirling Online Research Repository

Function Representations for Binary Similarity

Authors: Baldoni Roberto; Di Luna Giuseppe Antonio; Massarelli Luca; Petroni Fabio; Querzoni Leonardo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021

The binary similarity problem consists in determining if two functions are similar considering only their compiled form. Advanced techniques for binary similarity have recently gained momentum as they can be applied in several fields, such as copyright disputes, malware analysis, and vulnerability detection. In this paper we describe SAFE, a novel architecture for function representation based on a self-attentive neural network. SAFE works directly on disassembled binary functions, does not require manual feature extraction, is computationally more efficient than existing solutions, and is more general as it works on stripped binaries and on multiple architectures. Results from our experimental evaluation show how SAFE provides a performance improvement with respect to previous solutions. Furthermore, we show how SAFE can be used in widely different use cases, thus providing a general solution for several application scenarios.

Archivio della ricerca- Università di Roma La Sapienza

Adversarial Attacks against Binary Similarity Systems

Authors: Capozzi Gianluca; D'Elia Daniele Cono; Di Luna Giuseppe Antonio; Querzoni Leonardo
Publication date: 03/11/2023

In recent years, binary analysis has gained traction as a fundamental approach to inspect software and guarantee its security. Due to the exponential increase in the number of devices running software, much research is now moving towards new autonomous solutions based on deep learning models, as they have shown state-of-the-art performance in solving binary analysis problems. One of the hot topics in this context is binary similarity, which consists in determining if two functions in assembly code are compiled from the same source code. However, it is unclear how deep learning models for binary similarity behave in an adversarial context. In this paper, we study the resilience of binary similarity models against adversarial examples, showing that they are susceptible to both targeted and untargeted attacks (w.r.t. similarity goals) performed by black-box and white-box attackers. In more detail, we extensively test three current state-of-the-art solutions for binary similarity against two black-box greedy attacks, including a new technique that we call Spatial Greedy, and one white-box attack in which we repurpose a gradient-guided strategy used in attacks on image classifiers.
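To make "black-box greedy attack" concrete, here is a minimal sketch of a generic greedy untargeted attack: at each step the attacker queries the similarity model once per candidate instruction and keeps the insertion that lowers the similarity to the original function the most. This is not Spatial Greedy or any of the paper's attacks, and the model under attack is a hypothetical toy (cosine similarity of opcode counts), not one of the three state-of-the-art solutions. The opcode strings `nop`, `mov_r_r` and `lea_r_r` are hypothetical labels for semantics-preserving instructions; appending them at the end is a simplification.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Sequence


def bag_of_opcodes_similarity(f1: Sequence[str], f2: Sequence[str]) -> float:
    """Toy stand-in for a similarity model: cosine similarity of opcode counts."""
    c1, c2 = Counter(f1), Counter(f2)
    dot = sum(c1[op] * c2[op] for op in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def greedy_untargeted_attack(
    original: List[str],
    candidates: Sequence[str],
    similarity: Callable[[Sequence[str], Sequence[str]], float],
    budget: int,
) -> List[str]:
    """Greedily add candidate instructions that most reduce similarity to `original`.

    Black-box: the attacker only queries `similarity`, never its internals.
    """
    adversarial = list(original)
    for _ in range(budget):
        best_op: Optional[str] = None
        best_score = similarity(original, adversarial)
        for op in candidates:
            score = similarity(original, adversarial + [op])
            if score < best_score:  # keep only strict improvements
                best_op, best_score = op, score
        if best_op is None:  # no candidate lowers similarity any further
            break
        adversarial.append(best_op)
    return adversarial


# Hypothetical function body as a list of opcodes.
original = ["push", "mov", "add", "mov", "pop", "ret"]
candidates = ["nop", "mov_r_r", "lea_r_r"]

adv = greedy_untargeted_attack(original, candidates, bag_of_opcodes_similarity, budget=3)
print(adv)
print(f"similarity after attack: {bag_of_opcodes_similarity(original, adv):.3f}")
```

A targeted variant would instead maximize similarity to a chosen target function; the loop structure stays the same, only the objective and the comparison flip.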

arXiv.org e-Print Archive

The Power of Asymmetry in Binary Hashing

Authors: Makarychev Yury; Neyshabur Behnam; Salakhutdinov Ruslan; Srebro Nathan; Yadollahpour Payman
Publication date: 29/11/2013

When approximating binary similarity using the Hamming distance between short binary hashes, we show that even if the similarity is symmetric, we can have shorter and more accurate hashes by using two distinct code maps, i.e. by approximating the similarity between x and x' as the Hamming distance between f(x) and g(x'), for two distinct binary codes f, g, rather than as the Hamming distance between f(x) and f(x'). Comment: Accepted to NIPS 2013, 9 pages, 5 figure
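The difference between the symmetric scheme (Hamming distance between f(x) and f(x')) and the asymmetric one (between f(x) and g(x')) can be shown with a small sketch. It assumes linear-threshold code maps, where bit k is 1 if the k-th linear projection is positive; the weight matrices `W_f` and `W_g` are hypothetical and fixed by hand, whereas in the paper's setting the maps are learned. The sketch only shows how the two distances are computed, not how f and g are chosen.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def threshold_code(weights: Matrix, x: Vector) -> List[int]:
    """Binary code: bit k is 1 iff the k-th linear projection of x is positive."""
    return [1 if sum(w * xi for w, xi in zip(row, x)) > 0 else 0 for row in weights]


def hamming(b1: Sequence[int], b2: Sequence[int]) -> int:
    """Number of positions at which two equal-length bit vectors differ."""
    if len(b1) != len(b2):
        raise ValueError("codes must have equal length")
    return sum(p != q for p, q in zip(b1, b2))


# Hypothetical 3-bit code maps over 2-D inputs (hand-picked, not learned).
W_f = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_g = [[1.0, 0.2], [0.2, 1.0], [1.0, -1.0]]


def symmetric_distance(x: Vector, x2: Vector) -> int:
    """Hamming distance between f(x) and f(x')."""
    return hamming(threshold_code(W_f, x), threshold_code(W_f, x2))


def asymmetric_distance(x: Vector, x2: Vector) -> int:
    """Hamming distance between f(x) and g(x')."""
    return hamming(threshold_code(W_f, x), threshold_code(W_g, x2))


x = [1.0, 2.0]
print("f(x) =", threshold_code(W_f, x), " g(x) =", threshold_code(W_g, x))
print("symmetric d(x, x) =", symmetric_distance(x, x))
print("asymmetric d(x, x) =", asymmetric_distance(x, x))
```

Note that with two distinct maps the distance from x to itself need not be zero. The asymmetric scheme only asks that distances between f(x) and g(x') track the target similarity, which gives it more freedom than forcing a single map to serve both sides.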

arXiv.org e-Print Archive

CiteSeerX

Binary indices at various densities

Authors: Price Joseph A., III; Turner Skylar
Publication venue: Oklahoma State University Center for Health Sciences
Publication date: 22/02/2019

Binary similarity indices are numerical analysis methods used to compare data involving two binary vectors (lists). The scope of this project involved comparing 54 binary similarity index methods in relation to binary vector density using the R programming language. Matrices were created from various vector data. The matrices were then scrambled to represent random data. Finally, the data was analyzed and plotted. Variation in vector density can result in large differences, both in the rate of change relative to density and in magnitude. Awareness of these differences is important when selecting an analysis method and when interpreting how changing vector density affects the results.
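The density effect described above can be reproduced in a few lines. The project used R and 54 indices; this sketch instead uses Python and only two indices (Jaccard and simple matching), comparing pairs of independent random binary vectors at several densities. Vector length, trial count and seed are arbitrary illustrative choices, not the project's settings.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def random_binary_vector(length: int, density: float, rng: random.Random) -> List[int]:
    """Each position is 1 with probability `density`, independently."""
    return [1 if rng.random() < density else 0 for _ in range(length)]


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Shared ones divided by positions where either vector has a one."""
    both = sum(p & q for p, q in zip(x, y))
    either = sum(p | q for p, q in zip(x, y))
    return both / either if either else 0.0


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Fraction of positions where the two vectors agree (ones or zeros)."""
    return sum(p == q for p, q in zip(x, y)) / len(x)


def mean_scores(density: float, length: int = 500, trials: int = 100,
                seed: int = 0) -> Tuple[float, float]:
    """Mean (Jaccard, simple matching) over `trials` random vector pairs."""
    rng = random.Random(seed)
    j_scores, s_scores = [], []
    for _ in range(trials):
        x = random_binary_vector(length, density, rng)
        y = random_binary_vector(length, density, rng)
        j_scores.append(jaccard(x, y))
        s_scores.append(simple_matching(x, y))
    return mean(j_scores), mean(s_scores)


for density in (0.1, 0.3, 0.5, 0.7, 0.9):
    j, s = mean_scores(density)
    print(f"density={density:.1f}  Jaccard={j:.3f}  simple matching={s:.3f}")
```

For independent vectors with density p, simple matching is expected to be p^2 + (1 - p)^2, which is U-shaped in p, while Jaccard is roughly p / (2 - p), which rises steadily with p. Even on purely random data, the two indices therefore respond to density in opposite ways at low densities, which is the kind of difference the project warns about.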

SHAREOK repository

Discrimination Ability of Assessors in Check-All-That-Apply Tests: Method and Product Development

Authors: Bajusz Dávid; Biró Barbara; Gere Attila; Rácz Anita
Publication venue: 'MDPI AG'
Publication date: 01/01/2021

Binary similarity measures have been used in several research fields, but their application in sensory data analysis is so far limited. Since check-all-that-apply (CATA) data consist of binary answers from the participants, binary similarity measures seem a natural choice for their evaluation. This work aims to define the discrimination ability of CATA participants by calculating the consensus values of 44 binary similarity measures. The proposed methodology consists of three steps: (i) calculating the binary similarity values of the assessors pairwise, for each sample; (ii) clustering participants into good and poor discriminators based on their binary similarity values; (iii) performing correspondence analysis on the CATA data of the two clusters. Results of three case studies are presented, highlighting that a simple clustering based on the computed binary similarity measures results in higher-quality correspondence analysis with more significant attributes, as well as better sample discrimination (even according to overall liking).
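Steps (i) and (ii) can be sketched as follows, under several assumptions that are ours rather than the paper's: a single measure (Jaccard) instead of 44, each assessor's score taken as their mean pairwise similarity to the other assessors across all samples, and a median split used as a simple stand-in for the clustering step. The groups are deliberately named `high_agreement` and `low_agreement` because this sketch does not establish which group the paper would call good discriminators. The assessor data are hypothetical, and step (iii), correspondence analysis, is not shown.

```python
from itertools import combinations
from statistics import mean, median
from typing import Dict, List, Sequence, Tuple


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Shared ticked attributes over attributes ticked by either assessor.

    Two empty answers score 0.0 (our convention, so ticking nothing is not rewarded).
    """
    both = sum(1 for p, q in zip(x, y) if p and q)
    either = sum(1 for p, q in zip(x, y) if p or q)
    return both / either if either else 0.0


def assessor_scores(answers: Dict[str, List[Sequence[int]]]) -> Dict[str, float]:
    """Step (i): mean pairwise similarity of each assessor to all others.

    answers[assessor][s] is the 0/1 attribute vector ticked for sample s.
    """
    per_assessor: Dict[str, List[float]] = {name: [] for name in answers}
    n_samples = len(next(iter(answers.values())))
    for s in range(n_samples):
        for a1, a2 in combinations(answers, 2):  # every pair of assessors
            sim = jaccard(answers[a1][s], answers[a2][s])
            per_assessor[a1].append(sim)
            per_assessor[a2].append(sim)
    return {name: mean(sims) for name, sims in per_assessor.items()}


def split_assessors(scores: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Step (ii), simplified: median split as a stand-in for clustering."""
    cut = median(scores.values())
    high = sorted(n for n, v in scores.items() if v >= cut)
    low = sorted(n for n, v in scores.items() if v < cut)
    return high, low


# Hypothetical CATA answers: 4 assessors, 2 samples, 4 attributes each.
answers = {
    "A": [[1, 1, 0, 0], [0, 0, 1, 1]],
    "B": [[1, 1, 0, 0], [0, 0, 1, 1]],
    "C": [[1, 1, 1, 0], [0, 0, 1, 1]],
    "D": [[0, 0, 1, 1], [1, 1, 0, 0]],
}

scores = assessor_scores(answers)
high_agreement, low_agreement = split_assessors(scores)
print({name: round(v, 3) for name, v in scores.items()})
print("high agreement:", high_agreement, " low agreement:", low_agreement)
```

Swapping in another measure only requires replacing `jaccard`; computing all 44 measures and combining them into consensus values, as the paper does, would wrap this loop rather than change it.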

Multidisciplinary Digital Publishing Institute

Repository of the Academy's Library