2,542 research outputs found

    SAFE: Self-Attentive Function Embeddings for Binary Similarity

    Get PDF
    The binary similarity problem consists in determining if two functions are similar by only considering their compiled form. Advanced techniques for binary similarity recently gained momentum as they can be applied in several fields, such as copyright disputes, malware analysis, vulnerability detection, etc., and thus have an immediate practical impact. Current solutions compare functions by first transforming their binary code in multi-dimensional vector representations (embeddings), and then comparing vectors through simple and efficient geometric operations. However, embeddings are usually derived from binary code using manual feature extraction, that may fail in considering important function characteristics, or may consider features that are not important for the binary similarity problem. In this paper we propose SAFE, a novel architecture for the embedding of functions based on a self-attentive neural network. SAFE works directly on disassembled binary functions, does not require manual feature extraction, is computationally more efficient than existing solutions (i.e., it does not incur in the computational overhead of building or manipulating control flow graphs), and is more general as it works on stripped binaries and on multiple architectures. We report the results from a quantitative and qualitative analysis that show how SAFE provides a noticeable performance improvement with respect to previous solutions. Furthermore, we show how clusters of our embedding vectors are closely related to the semantic of the implemented algorithms, paving the way for further interesting applications (e.g. semantic-based binary function search).Comment: Published in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA) 201

    A new mass-ratio for the X-ray Binary X2127+119 in M15?

    Full text link
    The luminous low-mass X-ray binary X2127+119 in the core of the globular cluster M15 (NGC 7078), which has an orbital period of 17 hours, has long been assumed to contain a donor star evolving off the main sequence, with a mass of 0.8 solar masses (the main-sequence turn-off mass for M15). We present orbital-phase-resolved spectroscopy of X2127+119 in the H-alpha and He I 6678 spectral region, obtained with the Hubble Space Telescope. We show that these data are incompatible with the assumed masses of X2127+119's component stars. The continuum eclipse is too shallow, indicating that much of the accretion disc remains visible during eclipse, and therefore that the size of the donor star relative to the disc is much smaller in this high-inclination system than the assumed mass-ratio allows. Furthermore, the flux of X2127+119's He I 6678 emission, which has a velocity that implies an association with the stream-disc impact region, remains unchanged through eclipse, implying that material from the impact region is always visible. This should not be possible if the previously-assumed mass ratio is correct. In addition, we do not detect any spectral features from the donor star, which is unexpected for a 0.8 solar-mass sub-giant in a system with a 17-hour period.Comment: 6 pages, 4 figures, accepted by A&

    Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs

    Full text link
    Binary code analysis allows analyzing binary code without having access to the corresponding source code. A binary, after disassembly, is expressed in an assembly language. This inspires us to approach binary analysis by leveraging ideas and techniques from Natural Language Processing (NLP), a rich area focused on processing text of various natural languages. We notice that binary code analysis and NLP share a lot of analogical topics, such as semantics extraction, summarization, and classification. This work utilizes these ideas to address two important code similarity comparison problems. (I) Given a pair of basic blocks for different instruction set architectures (ISAs), determining whether their semantics is similar or not; and (II) given a piece of code of interest, determining if it is contained in another piece of assembly code for a different ISA. The solutions to these two problems have many applications, such as cross-architecture vulnerability discovery and code plagiarism detection. We implement a prototype system INNEREYE and perform a comprehensive evaluation. A comparison between our approach and existing approaches to Problem I shows that our system outperforms them in terms of accuracy, efficiency and scalability. And the case studies utilizing the system demonstrate that our solution to Problem II is effective. Moreover, this research showcases how to apply ideas and techniques from NLP to large-scale binary code analysis.Comment: Accepted by Network and Distributed Systems Security (NDSS) Symposium 201

    MicroWalk: A Framework for Finding Side Channels in Binaries

    Full text link
    Microarchitectural side channels expose unprotected software to information leakage attacks where a software adversary is able to track runtime behavior of a benign process and steal secrets such as cryptographic keys. As suggested by incremental software patches for the RSA algorithm against variants of side-channel attacks within different versions of cryptographic libraries, protecting security-critical algorithms against side channels is an intricate task. Software protections avoid leakages by operating in constant time with a uniform resource usage pattern independent of the processed secret. In this respect, automated testing and verification of software binaries for leakage-free behavior is of importance, particularly when the source code is not available. In this work, we propose a novel technique based on Dynamic Binary Instrumentation and Mutual Information Analysis to efficiently locate and quantify memory based and control-flow based microarchitectural leakages. We develop a software framework named \tool~for side-channel analysis of binaries which can be extended to support new classes of leakage. For the first time, by utilizing \tool, we perform rigorous leakage analysis of two widely-used closed-source cryptographic libraries: \emph{Intel IPP} and \emph{Microsoft CNG}. We analyze 1515 different cryptographic implementations consisting of 112112 million instructions in about 105105 minutes of CPU time. By locating previously unknown leakages in hardened implementations, our results suggest that \tool~can efficiently find microarchitectural leakages in software binaries
    • …
    corecore