2,050 research outputs found

    Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs

    Full text link
    Binary code analysis allows analyzing binary code without having access to the corresponding source code. A binary, after disassembly, is expressed in an assembly language. This inspires us to approach binary analysis by leveraging ideas and techniques from Natural Language Processing (NLP), a rich area focused on processing text of various natural languages. We notice that binary code analysis and NLP share a lot of analogical topics, such as semantics extraction, summarization, and classification. This work utilizes these ideas to address two important code similarity comparison problems. (I) Given a pair of basic blocks for different instruction set architectures (ISAs), determining whether their semantics is similar or not; and (II) given a piece of code of interest, determining if it is contained in another piece of assembly code for a different ISA. The solutions to these two problems have many applications, such as cross-architecture vulnerability discovery and code plagiarism detection. We implement a prototype system INNEREYE and perform a comprehensive evaluation. A comparison between our approach and existing approaches to Problem I shows that our system outperforms them in terms of accuracy, efficiency and scalability. And the case studies utilizing the system demonstrate that our solution to Problem II is effective. Moreover, this research showcases how to apply ideas and techniques from NLP to large-scale binary code analysis.Comment: Accepted by Network and Distributed Systems Security (NDSS) Symposium 201

    Malware Detection Using Dynamic Analysis

    Get PDF
    In this research, we explore the field of dynamic analysis which has shown promis- ing results in the field of malware detection. Here, we extract dynamic software birth- marks during malware execution and apply machine learning based detection tech- niques to the resulting feature set. Specifically, we consider Hidden Markov Models and Profile Hidden Markov Models. To determine the effectiveness of this dynamic analysis approach, we compare our detection results to the results obtained by using static analysis. We show that in some cases, significantly stronger results can be obtained using our dynamic approach

    Distributed detection of anomalous internet sessions

    Get PDF
    Financial service providers are moving many services online reducing their costs and facilitating customers¿ interaction. Unfortunately criminals have quickly found several ways to avoid most security measures applied to browsers and banking sites. The use of highly dangerous malware has become the most significant threat and traditional signature-detection methods are nowadays easily circumvented due to the amount of new samples and the use of sophisticated evasion techniques. Antivirus vendors and malware experts are pushed to seek for new methodologies to improve the identification and understanding of malicious applications behavior and their targets. Financial institutions are now playing an important role by deploying their own detection tools against malware that specifically affect their customers. However, most detection approaches tend to base on sequence of bytes in order to create new signatures. This thesis approach is based on new sources of information: the web logs generated from each banking session, the normal browser execution and customers mobile phone behavior. The thesis can be divided in four parts: The first part involves the introduction of the thesis along with the presentation of the problems and the methodology used to perform the experimentation. The second part describes our contributions to the research, which are based in two areas: *Server side: Weblogs analysis. We first focus on the real time detection of anomalies through the analysis of web logs and the challenges introduced due to the amount of information generated daily. We propose different techniques to detect multiple threats by deploying per user and global models in a graph based environment that will allow increase performance of a set of highly related data. *Customer side: Browser analysis. We deal with the detection of malicious behaviors from the other side of a banking session: the browser. Malware samples must interact with the browser in order to retrieve or add information. Such relation interferes with the normal behavior of the browser. We propose to develop models capable of detecting unusual patterns of function calls in order to detect if a given sample is targeting an specific financial entity. In the third part, we propose to adapt our approaches to mobile phones and Critical Infrastructures environments. The latest online banking attack techniques circumvent protection schemes such password verification systems send via SMS. Man in the Mobile attacks are capable of compromising mobile devices and gaining access to SMS traffic. Once the Transaction Authentication Number is obtained, criminals are free to make fraudulent transfers. We propose to model the behavior of the applications related messaging services to automatically detect suspicious actions. Real time detection of unwanted SMS forwarding can improve the effectiveness of second channel authentication and build on detection techniques applied to browsers and Web servers. Finally, we describe possible adaptations of our techniques to another area outside the scope of online banking: critical infrastructures, an environment with similar features since the applications involved can also be profiled. Just as financial entities, critical infrastructures are experiencing an increase in the number of cyber attacks, but the sophistication of the malware samples utilized forces to new detection approaches. The aim of the last proposal is to demonstrate the validity of out approach in different scenarios. Conclusions. Finally, we conclude with a summary of our findings and the directions for future work

    Detection of app collusion potential using logic programming

    Get PDF
    Mobile devices pose a particular security risk because they hold personal details (accounts, locations, contacts, photos) and have capabilities potentially exploitable for eavesdropping (cameras/microphone, wireless connections). The Android operating system is designed with a number of built-in security features such as application sandboxing and permission-based access control. Unfortunately, these restrictions can be bypassed, without the user noticing, by colluding apps whose combined permissions allow them to carry out attacks that neither app is able to execute by itself. While the possibility of app collusion was first warned in 2011, it has been unclear if collusion is used by malware in the wild due to a lack of suitable detection methods and tools. This paper describes how we found the first collusion in the wild. We also present a strategy for detecting collusions and its implementation in Prolog that allowed us to make this discovery. Our detection strategy is grounded in concise definitions of collusion and the concept of ASR (Access-Send-Receive) signatures. The methodology is supported by statistical evidence. Our approach scales and is applicable to inclusion into professional malware detection systems: we applied it to a set of more than 50,000 apps collected in the wild. Code samples of our tool as well as of the detected malware are available

    Malware Resistant Data Protection in Hyper-connected Networks: A survey

    Full text link
    Data protection is the process of securing sensitive information from being corrupted, compromised, or lost. A hyperconnected network, on the other hand, is a computer networking trend in which communication occurs over a network. However, what about malware. Malware is malicious software meant to penetrate private data, threaten a computer system, or gain unauthorised network access without the users consent. Due to the increasing applications of computers and dependency on electronically saved private data, malware attacks on sensitive information have become a dangerous issue for individuals and organizations across the world. Hence, malware defense is critical for keeping our computer systems and data protected. Many recent survey articles have focused on either malware detection systems or single attacking strategies variously. To the best of our knowledge, no survey paper demonstrates malware attack patterns and defense strategies combinedly. Through this survey, this paper aims to address this issue by merging diverse malicious attack patterns and machine learning (ML) based detection models for modern and sophisticated malware. In doing so, we focus on the taxonomy of malware attack patterns based on four fundamental dimensions the primary goal of the attack, method of attack, targeted exposure and execution process, and types of malware that perform each attack. Detailed information on malware analysis approaches is also investigated. In addition, existing malware detection techniques employing feature extraction and ML algorithms are discussed extensively. Finally, it discusses research difficulties and unsolved problems, including future research directions.Comment: 30 pages, 9 figures, 7 tables, no where submitted ye

    Neural-Augmented Static Analysis of Android Communication

    Full text link
    We address the problem of discovering communication links between applications in the popular Android mobile operating system, an important problem for security and privacy in Android. Any scalable static analysis in this complex setting is bound to produce an excessive amount of false-positives, rendering it impractical. To improve precision, we propose to augment static analysis with a trained neural-network model that estimates the probability that a communication link truly exists. We describe a neural-network architecture that encodes abstractions of communicating objects in two applications and estimates the probability with which a link indeed exists. At the heart of our architecture are type-directed encoders (TDE), a general framework for elegantly constructing encoders of a compound data type by recursively composing encoders for its constituent types. We evaluate our approach on a large corpus of Android applications, and demonstrate that it achieves very high accuracy. Further, we conduct thorough interpretability studies to understand the internals of the learned neural networks.Comment: Appears in Proceedings of the 2018 ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE
    • …
    corecore