Search CORE

2,045 research outputs found

Fast location of similar code fragments using semantic 'juice'

Author: Arun Lakhotia
DALLA PREDA Mila
Giacobazzi Roberto
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013

Abstraction of semantics of blocks of a binary is termed as ‘juice.’ Whereas the denotational semantics summarizes the computation performed by a block, its juice presents a template of the relationships established by the block. BinJuice is a tool for extracting the ‘juice’ of a binary. It symbolically interprets individual blocks of a binary to extract their semantics: the effect of the block on the program state. The semantics is generalized to juice by replacing register names and literal constants by typed, logical variables. The juice also maintains algebraic constraints between the numeric variables. Thus, this juice forms a semantic template that is expected to be identical regardless of code variations due to register renaming, memory address allocation, and constant replacement. The terms in juice can be canonically ordered using a linear order presented. Thus semantically equivalent (rather, similar) code fragments can be identified by simple structural comparison of their juice, or by comparing their hashes. While BinJuice cannot find all equivalent constructs, for that would solve the Halting Problem, it does significantly improve the state-of-the-art in both the computational complexity as well as the set of equivalences it can establish. Preliminary results show that juice is effective in pairing code variants created by post-compile obfuscating transformations.
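
The generalization step described in this abstract can be illustrated with a minimal Python sketch. It is not BinJuice itself: the string encoding of block semantics, the register pattern, and the names `juice` and `juice_hash` are all assumptions made for illustration, and the sketch skips the symbolic interpretation, the algebraic constraints between constants, and the canonical ordering of terms.

```python
import hashlib
import re

# Assumed toy encoding: block semantics are given as strings such as
# "eax = eax + 4". The register list below is illustrative, not complete.
REG = re.compile(r"\b(e[abcd]x|e[sd]i|e[sb]p)\b")
NUM = re.compile(r"\b(0x[0-9a-fA-F]+|\d+)\b")


def juice(terms):
    """Replace registers and constants with logical variables (R0, N0, ...).

    Variables are numbered in order of first appearance, so two blocks that
    differ only in register choice or constant values yield the same tuple.
    """
    names = {}

    def rename(prefix):
        def substitute(match):
            key = match.group(0)
            if key not in names:
                used = sum(1 for v in names.values() if v.startswith(prefix))
                names[key] = f"{prefix}{used}"
            return names[key]
        return substitute

    generalized = []
    for term in terms:
        term = REG.sub(rename("R"), term)  # registers -> R0, R1, ...
        term = NUM.sub(rename("N"), term)  # constants -> N0, N1, ...
        generalized.append(term)
    return tuple(generalized)


def juice_hash(terms):
    """Hash the generalized terms so variants can be compared by digest."""
    return hashlib.sha256("\n".join(juice(terms)).encode()).hexdigest()


variant_a = ["eax = eax + 4", "ebx = mem[eax + 8]"]
variant_b = ["ecx = ecx + 12", "edx = mem[ecx + 16]"]  # renamed registers, new constants
different = ["eax = eax * 4", "ebx = mem[eax + 8]"]    # different operation

print(juice(variant_a))
print(juice_hash(variant_a) == juice_hash(variant_b))
print(juice_hash(variant_a) == juice_hash(different))
```

In this toy setting the two variants compare equal by hash while the block with a different operation does not, which is the property the abstract attributes to structural comparison of juice.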

Crossref

Catalogo dei prodotti della ricerca

Analyzing program dependences for malware detection.

Author: DALLA PREDA Mila
Giacobazzi Roberto
Mastroeni Isabella
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

Metamorphic malware continuously modifies its code, while preserving its functionality, in order to foil misuse detection. The key to defeating metamorphism lies in a semantic characterization of the embedding of the malware into the target program. Indeed, a behavioral model of program infection that does not rely on syntactic program features should be able to defeat metamorphism. Moreover, a general model of infection should be able to express dependences and interactions between the malicious code and the target program. ANI is a general theory for the analysis of dependences of data in a program. We propose a higher-order theory for ANI, later called HOANI, that allows the study of program dependencies. Our idea is then to formalize and study the malware detection problem in terms of HOANI.

Crossref

Catalogo dei prodotti della ricerca

BinGold: Towards robust binary analysis by extracting the semantics of binary code as semantic flow graphs (SFGs)

Author: Alrabaee Saed
Debbabi Mourad
Wang Lingyu
Publication venue: The Author(s). Published by Elsevier Ltd.
Publication date: 07/08/2016

Binary analysis is useful in many practical applications, such as the detection of malware or vulnerable software components. However, our survey of the literature shows that most existing binary analysis tools and frameworks rely on assumptions about specific compilers and compilation settings. It is well known that techniques such as refactoring and light obfuscation can significantly alter the structure of code, even for simple programs. Applying such techniques or changing the compiler and compilation settings can significantly affect the accuracy of available binary analysis tools, which severely limits their practicability, especially when applied to malware. To address these issues, we propose a novel technique that extracts the semantics of binary code in terms of both data and control flow. Our technique allows more robust binary analysis because the extracted semantics of the binary code is generally immune from light obfuscation, refactoring, and varying the compilers or compilation settings. Specifically, we apply data-flow analysis to extract the semantic flow of the registers as well as the semantic components of the control flow graph, which are then synthesized into a novel representation called the semantic flow graph (SFG). Subsequently, various properties, such as reflexive, symmetric, antisymmetric, and transitive relations, are extracted from the SFG and applied to binary analysis. We implement our system in a tool called BinGold and evaluate it against thirty binary code applications. Our evaluation shows that BinGold successfully determines the similarity between binaries, yielding results that are highly robust against light obfuscation and refactoring. In addition, we demonstrate the application of BinGold to two important binary analysis tasks: binary code authorship attribution, and the detection of clone components across program executables.
The promising results suggest that BinGold can be used to enhance existing techniques, making them more robust and practical.
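
The relation properties named in this abstract can be illustrated with a short Python sketch that checks them on a graph given as a set of directed edges. How BinGold defines these relations on an SFG is not stated here, so the edge-set encoding, the register node names, and the function `relation_properties` are assumptions for illustration only.

```python
def relation_properties(nodes, edges):
    """Check the standard properties of a binary relation given as edge pairs."""
    edges = set(edges)
    return {
        # every node is related to itself
        "reflexive": all((n, n) in edges for n in nodes),
        # every edge has its reverse
        "symmetric": all((b, a) in edges for a, b in edges),
        # no pair of distinct nodes is connected in both directions
        "antisymmetric": all(a == b or (b, a) not in edges for a, b in edges),
        # a -> b and b -> d imply a -> d
        "transitive": all(
            (a, d) in edges
            for a, b in edges
            for c, d in edges
            if b == c
        ),
    }


# Hypothetical flow between registers: eax -> ebx -> ecx, plus eax -> ecx.
nodes = {"eax", "ebx", "ecx"}
edges = {("eax", "ebx"), ("ebx", "ecx"), ("eax", "ecx")}

print(relation_properties(nodes, edges))
```

For this example the relation is antisymmetric and transitive but neither reflexive nor symmetric.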

Elsevier - Publisher Connector

Deep Joint Entity Disambiguation with Local Neural Attention

Author: Ganea Octavian-Eugen
Hofmann Thomas
Publication date: 31/07/2017

We propose a novel deep learning model for joint document-level entity disambiguation, which leverages learned neural representations. Key components are entity embeddings, a neural attention mechanism over local context windows, and a differentiable joint inference stage for disambiguation. Our approach thereby combines benefits of deep learning with more traditional approaches such as graphical models and probabilistic mention-entity maps. Extensive experiments show that we are able to obtain competitive or state-of-the-art accuracy at moderate computational costs. Comment: Conference on Empirical Methods in Natural Language Processing (EMNLP) 2017 long paper

arXiv.org e-Print Archive

Repository for Publications and Research Data

Names as Language and Capital.

Author: Boerrigter R.
Nijboer H.T.
Publication venue: Meertens Instituut, Amsterdam
Publication date: 01/01/2012

KNAW Repository

International Migration, Integration and Social Cohesion online publications

Function similarity using family context

Author: Black Paul
Gondal Iqbal
Lakhotia Arun
Vamplew Peter
Publication venue: 'MDPI AG'
Publication date: 01/01/2020

Finding changed and similar functions between a pair of binaries is an important problem in malware attribution and for the identification of new malware capabilities. This paper presents a new technique called Function Similarity using Family Context (FSFC) for this problem. FSFC trains a Support Vector Machine (SVM) model using pairs of similar functions from two program variants. This method improves upon previous research called Cross Version Contextual Function Similarity (CVCFS) by representing a function using features extracted not just from the function itself, but also from other functions with which it has a caller and callee relationship. We present the results of an initial experiment that shows that the use of additional features from the context of a function significantly decreases the false positive rate, obviating the need for a separate pass for cleaning false positives. The more surprising and unexpected finding is that the SVM model produced by FSFC can abstract function similarity features from one pair of program variants to find similar functions in an unrelated pair of program variants. If validated by a larger study, this new property leads to the possibility of creating generic similar function classifiers that can be packaged and distributed in reverse engineering tools such as IDA Pro and Ghidra. This research was performed in the Internet Commerce Security Lab (ICSL), which is a joint venture with research partners Westpac, IBM, and Federation University Australia
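
The idea of describing a function by its own features plus those of its callers and callees can be sketched in Python as follows. This is not the FSFC feature encoding: the two features per function (assumed here to be an instruction count and an API call count), the summing of context features, and the name `context_features` are hypothetical, and SVM training is omitted.

```python
def context_features(name, features, calls):
    """Concatenate a function's features with summed caller and callee features.

    features: function name -> list of numeric features (same length for all)
    calls:    caller name -> list of callee names
    """
    width = len(features[name])
    callees = calls.get(name, [])
    callers = [f for f, targets in calls.items() if name in targets]

    def total(group):
        # Element-wise sum over a group of functions; zeros for an empty group.
        return [sum(features[f][i] for f in group) for i in range(width)]

    return features[name] + total(callers) + total(callees)


# Hypothetical per-function features: [instruction count, API call count].
features = {"main": [10, 2], "parse": [4, 1], "send": [6, 3]}
calls = {"main": ["parse", "send"], "parse": ["send"]}

print(context_features("parse", features, calls))
print(context_features("send", features, calls))
```

Vectors built this way for a pair of functions could then be combined and passed to a classifier; that step is left out because the abstract does not describe it.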

Federation ResearchOnline

Techniques for the reverse engineering of banking malware

Author: Black Paul
Publication venue: Federation University of Australia
Publication date: 01/01/2020

Malware attacks are a significant and frequently reported problem, adversely affecting the productivity of organisations and governments worldwide. The well-documented consequences of malware attacks include financial loss, data loss, reputation damage, infrastructure damage, theft of intellectual property, compromise of commercial negotiations, and national security risks. Mitigation activities involve a significant amount of manual analysis. Therefore, there is a need for automated techniques for malware analysis to identify malicious behaviours. Research into automated techniques for malware analysis covers a wide range of activities. This thesis consists of a series of studies: an analysis of banking malware families and their common behaviours, an emulated command and control environment for dynamic malware analysis, a technique to identify similar malware functions, and a technique for the detection of ransomware. An analysis of the nature of banking malware, its major malware families, behaviours, variants, and inter-relationships is provided in this thesis. In doing this, this research takes a broad view of malware analysis, starting with the implementation of the malicious behaviours through to detailed analysis using machine learning. The broad approach taken in this thesis differs from some other studies that approach malware research in a more abstract sense. A disadvantage of approaching malware research without domain knowledge is that important methodology questions may not be considered. Large datasets of historical malware samples are available for countermeasures research. However, due to the age of these samples, the original malware infrastructure is no longer available, often restricting malware operations to initialisation functions only. To address this absence, an emulated command and control environment is provided.
This emulated environment provides full control of the malware, enabling the capabilities of the original in-the-wild operation, while enabling feature extraction for research purposes. A major focus of this thesis has been the development of a machine learning function similarity method with a novel feature encoding that increases feature strength. This research develops techniques to demonstrate that the machine learning model trained on similarity features from one program can find similar functions in another, unrelated program. This finding can lead to the development of generic similar function classifiers that can be packaged and distributed in reverse engineering tools such as IDA Pro and Ghidra. Further, this research examines the use of API call features for the identification of ransomware and shows that a failure to consider malware analysis domain knowledge can lead to weaknesses in experimental design. In this case, we show that existing research has difficulty in discriminating between ransomware and benign cryptographic software. This thesis by publication has developed techniques to advance the discipline of malware reverse engineering, in order to minimize harm due to cyber-attacks on critical infrastructure, government institutions, and industry. Doctor of Philosophy

Federation ResearchOnline

Program variation for software security

Author: Coppens Bart
Publication venue: Ghent University. Faculty of Engineering and Architecture
Publication date: 01/01/2013

Ghent University Academic Bibliography