670 research outputs found

    An Efficient Platform for the Automatic Extraction of Patterns in Native Code

    Get PDF
    Different software tools, such as decompilers, code quality analyzers, recognizers of packed executable files, authorship analyzers, and malware detectors, search for patterns in binary code. The use of machine learning algorithms, trained with programs taken from the huge number of applications in the existing open source code repositories, allows finding patterns not detected with the manual approach. To this end, we have created a versatile platform for the automatic extraction of patterns from native code, capable of processing big binary files. Its implementation has been parallelized, providing important runtime performance benefits for multicore architectures. Compared to the single-processor execution, the average performance improvement obtained with the best configuration is 3.5 factors over the maximum theoretical gain of 4 factors

    Automated Dynamic Firmware Analysis at Scale: A Case Study on Embedded Web Interfaces

    Full text link
    Embedded devices are becoming more widespread, interconnected, and web-enabled than ever. However, recent studies showed that these devices are far from being secure. Moreover, many embedded systems rely on web interfaces for user interaction or administration. Unfortunately, web security is known to be difficult, and therefore the web interfaces of embedded systems represent a considerable attack surface. In this paper, we present the first fully automated framework that applies dynamic firmware analysis techniques to achieve, in a scalable manner, automated vulnerability discovery within embedded firmware images. We apply our framework to study the security of embedded web interfaces running in Commercial Off-The-Shelf (COTS) embedded devices, such as routers, DSL/cable modems, VoIP phones, IP/CCTV cameras. We introduce a methodology and implement a scalable framework for discovery of vulnerabilities in embedded web interfaces regardless of the vendor, device, or architecture. To achieve this goal, our framework performs full system emulation to achieve the execution of firmware images in a software-only environment, i.e., without involving any physical embedded devices. Then, we analyze the web interfaces within the firmware using both static and dynamic tools. We also present some interesting case-studies, and discuss the main challenges associated with the dynamic analysis of firmware images and their web interfaces and network services. The observations we make in this paper shed light on an important aspect of embedded devices which was not previously studied at a large scale. We validate our framework by testing it on 1925 firmware images from 54 different vendors. We discover important vulnerabilities in 185 firmware images, affecting nearly a quarter of vendors in our dataset. These experimental results demonstrate the effectiveness of our approach

    CryptoKnight:generating and modelling compiled cryptographic primitives

    Get PDF
    Cryptovirological augmentations present an immediate, incomparable threat. Over the last decade, the substantial proliferation of crypto-ransomware has had widespread consequences for consumers and organisations alike. Established preventive measures perform well, however, the problem has not ceased. Reverse engineering potentially malicious software is a cumbersome task due to platform eccentricities and obfuscated transmutation mechanisms, hence requiring smarter, more efficient detection strategies. The following manuscript presents a novel approach for the classification of cryptographic primitives in compiled binary executables using deep learning. The model blueprint, a Dynamic Convolutional Neural Network (DCNN), is fittingly configured to learn from variable-length control flow diagnostics output from a dynamic trace. To rival the size and variability of equivalent datasets, and to adequately train our model without risking adverse exposure, a methodology for the procedural generation of synthetic cryptographic binaries is defined, using core primitives from OpenSSL with multivariate obfuscation, to draw a vastly scalable distribution. The library, CryptoKnight, rendered an algorithmic pool of AES, RC4, Blowfish, MD5 and RSA to synthesise combinable variants which automatically fed into its core model. Converging at 96% accuracy, CryptoKnight was successfully able to classify the sample pool with minimal loss and correctly identified the algorithm in a real-world crypto-ransomware applicatio

    The Effect of Code Obfuscation on Authorship Attribution of Binary Computer Files

    Get PDF
    In many forensic investigations, questions linger regarding the identity of the authors of the software specimen. Research has identified methods for the attribution of binary files that have not been obfuscated, but a significant percentage of malicious software has been obfuscated in an effort to hide both the details of its origin and its true intent. Little research has been done around analyzing obfuscated code for attribution. In part, the reason for this gap in the research is that deobfuscation of an unknown program is a challenging task. Further, the additional transformation of the executable file introduced by the obfuscator modifies or removes features from the original executable that would have been used in the author attribution process. Existing research has demonstrated good success in attributing the authorship of an executable file of unknown provenance using methods based on static analysis of the specimen file. With the addition of file obfuscation, static analysis of files becomes difficult, time consuming, and in some cases, may lead to inaccurate findings. This paper presents a novel process for authorship attribution using dynamic analysis methods. A software emulated system was fully instrumented to become a test harness for a specimen of unknown provenance, allowing for supervised control, monitoring, and trace data collection during execution. This trace data was used as input into a supervised machine learning algorithm trained to identify stylometric differences in the specimen under test and provide predictions on who wrote the specimen. The specimen files were also analyzed for authorship using static analysis methods to compare prediction accuracies with prediction accuracies gathered from this new, dynamic analysis based method. Experiments indicate that this new method can provide better accuracy of author attribution for files of unknown provenance, especially in the case where the specimen file has been obfuscated

    Malware detection and analysis via layered annotative execution

    Get PDF
    Malicious software (i.e., malware) has become a severe threat to interconnected computer systems for decades and has caused billions of dollars damages each year. A large volume of new malware samples are discovered daily. Even worse, malware is rapidly evolving to be more sophisticated and evasive to strike against current malware analysis and defense systems. This dissertation takes a root-cause oriented approach to the problem of automatic malware detection and analysis. In this approach, we aim to capture the intrinsic natures of malicious behaviors, rather than the external symptoms of existing attacks. We propose a new architecture for binary code analysis, which is called whole-system out-of-the-box fine-grained dynamic binary analysis, to address the common challenges in malware detection and analysis. to realize this architecture, we build a unified and extensible analysis platform, codenamed TEMU. We propose a core technique for fine-grained dynamic binary analysis, called layered annotative execution, and implement this technique in TEMU. Then on the basis of TEMU, we have proposed and built a series of novel techniques for automatic malware detection and analysis. For postmortem malware analysis, we have developed Renovo, Panorama, HookFinder, and MineSweeper, for detecting and analyzing various aspects of malware. For proactive malware detection, we have built HookScout as a proactive hook detection system. These techniques capture intrinsic characteristics of malware and thus are well suited for dealing with new malware samples and attack mechanisms

    Automating the application data placement in hybrid memory systems

    Get PDF
    Multi-tiered memory systems, such as those based on Intel® Xeon Phi™processors, are equipped with several memory tiers with different characteristics including, among others, capacity, access latency, bandwidth, energy consumption, and volatility. The proper distribution of the application data objects into the available memory layers is key to shorten the time– to–solution, but the way developers and end-users determine the most appropriate memory tier to place the application data objects has not been properly addressed to date.In this paper we present a novel methodology to build an extensible framework to automatically identify and place the application’s most relevant memory objects into the Intel Xeon Phi fast on-package memory. Our proposal works on top of inproduction binaries by first exploring the application behavior and then substituting the dynamic memory allocations. This makes this proposal valuable even for end-users who do not have the possibility of modifying the application source code. We demonstrate the value of a framework based in our methodology for several relevant HPC applications using different allocation strategies to help end-users improve performance with minimal intervention. The results of our evaluation reveal that our proposal is able to identify the key objects to be promoted into fast on-package memory in order to optimize performance, leading to even surpassing hardware-based solutions.This work has been performed in the Intel-BSC Exascale Lab. Antonio J. Peña is cofinanced by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva fellowship number IJCI-2015-23266. We would like to thank the Intel’s DCG HEAT team for allowing us to access their computational resources. We also want to acknowledge this team, especially Larry Meadows and Jason Sewall, as well as Pardo Keppel for the productive discussions. We thank Raphaël Léger for allowing us to access the MAXW-DGTD application and its input.Peer ReviewedPostprint (author's final draft

    Robust Low-Overhead Binary Rewriting: Design, Extensibility, And Customizability

    Get PDF
    Binary rewriting is the foundation of a wide range of binary analysis tools and techniques, including securing untrusted code, enforcing control-flow integrity, dynamic optimization, profiling, race detection, and taint tracking to prevent data leaks. There are two equally important and necessary criteria that a binary rewriter must have: it must be robust and incur low overhead. First, a binary rewriter must work for different binaries, including those produced by commercial compilers from a wide variety of languages, and possibly modified by obfuscation tools. Second, the binary rewriter must be low overhead. Although the off-line use of programs, such as testing and profiling, can tolerate large overheads, the use of binary rewriters in deployed programs must not introduce significant overheads; typically, it should not be more than a few percent. Existing binary rewriters have their challenges: static rewriters do not reliably work for stripped binaries (i.e., those without relocation information), and dynamic rewriters suffer from high base overhead. Because of this high overhead, existing dynamic rewriters are limited to off-line testing and cannot be practically used in deployment. In the first part, we have designed and implemented a dynamic binary rewriter called RL-Bin, a robust binary rewriter that can instrument binaries reliably with very low overhead. Unlike existing static rewriters, RL-Bin works for all benign binaries, including stripped binaries that do not contain relocation information. In addition, RL-Bin does not suffer from high overhead because its design is not based on the code-cache, which is the primary mechanism for other dynamic rewriters such as Pin, DynamoRIO, and Dyninst. RL-Bin's design and optimization methods have empowered RL-Bin to rewrite binaries with very low overhead (1.04x on average for SPECrate 2017) and very low memory overhead (1.69x for SPECrate 2017). In comparison, existing dynamic rewriters have a high runtime overhead (1.16x for DynamoRIO, 1.29x for Pin, and 1.20x for Dyninst) and have a bigger memory footprint (2.5x for DynamoRIO, 2.73x for Pin, and 2.3x for Dyninst). RL-Bin differentiates itself from other rewriters by having negligible overhead, which is proportional to the added instrumentation. This low overhead is achieved by utilizing an in-place design and applying multiple novel optimization methods. As a result, lightweight instrumentation can be added to applications deployed in live systems for monitoring and analysis purposes. In the second part, we present RL-Bin++, an improved version of RL-Bin, that handles various problematic real-world features commonly found in obfuscated binaries. We demonstrate the effectiveness of RL-Bin++ for the SPECrate 2017 benchmark obfuscated with UPX, PECompact, and ASProtect obfuscation tools. RL-Bin++ can efficiently instrument heavily obfuscated binaries (overhead averaging 2.76x, compared to 4.11x, 4.72x, and 5.31x overhead respectively caused by DynamoRIO, Dyninst, and Pin). However, the major accomplishment is that we achieved this while maintaining the low overhead of RL-Bin for unobfuscated binaries (only 1.04x). The extra level of robustness is achieved by employing dynamic deobfuscation techniques and using a novel hybrid in-place and code-cache design. Finally, to show the efficacy of RL-Bin in the development of sophisticated and efficient analysis tools, we have designed, implemented, and tested two novel applications of RL-Bin; An application-level file access permission system and a security tool for enforcing secure execution of applications. Using RL-Bin's system call instrumentation capability, we developed a fine-grained file access permission system that enables the user to define separate file access policies for each application. The overhead is very low, only 6%, making this tool practical to be used in live systems. Secondly, we designed a security enforcement tool that instruments indirect control transfer instructions to ensure that the program execution follows the predetermined anticipated path. Hence, it would protect the application from being hijacked. Our implementation showed effectiveness in detecting exploits in real-world programs while being practical with a low overhead of only 9%
    • …
    corecore