Search CORE

7 research outputs found

A compiler level intermediate representation based binary analysis system and its applications

Author: Anand Kapil
Publication venue
Publication date: 01/01/2013
Field of study

Analyzing and optimizing programs from their executables has received a lot of attention recently in the research community. There has been a tremendous amount of activity in executable-level research targeting varied applications such as security vulnerability analysis, untrusted code analysis, malware analysis, program testing, and binary optimizations. The vision of this dissertation is to advance the field of static analysis of executables and bridge the gap between source-level analysis and executable analysis. The main thesis of this work is scalable static binary rewriting and analysis using compiler-level intermediate representation without relying on the presence of metadata information such as debug or symbolic information. In spite of a significant overlap in the overall goals of several source-code methods and executables-level techniques, several sophisticated transformations that are well-understood and implemented in source-level infrastructures have yet to become available in executable frameworks. It is a well known fact that a standalone executable without any meta data is less amenable to analysis than the source code. Nonetheless, we believe that one of the prime reasons behind the limitations of existing executable frameworks is that current executable frameworks define their own intermediate representations (IR) which are significantly more constrained than an IR used in a compiler. Intermediate representations used in existing binary frameworks lack high level features like abstract stack, variables, and symbols and are even machine dependent in some cases. This severely limits the application of well-understood compiler transformations to executables and necessitates new research to make them applicable. In the first part of this dissertation, we present techniques to convert the binaries to the same high-level intermediate representation that compilers use. We propose methods to segment the flat address space in an executable containing undifferentiated blocks of memory. We demonstrate the inadequacy of existing variable identification methods for their promotion to symbols and present our methods for symbol promotion. We also present methods to convert the physically addressed stack in an executable to an abstract stack. The proposed methods are practical since they do not employ symbolic, relocation, or debug information which are usually absent in deployed executables. We have integrated our techniques with a prototype x86 binary framework called \emph{SecondWrite} that uses LLVM as the IR. The robustness of the framework is demonstrated by handling executables totaling more than a million lines of source-code, including several real world programs. In the next part of this work, we demonstrate that several well-known source-level analysis frameworks such as symbolic analysis have limited effectiveness in the executable domain since executables typically lack higher-level semantics such as program variables. The IR should have a precise memory abstraction for an analysis to effectively reason about memory operations. Our first work of recovering a compiler-level representation addresses this limitation by recovering several higher-level semantics information from executables. In the next part of this work, we propose methods to handle the scenarios when such semantics cannot be recovered. First, we propose a hybrid static-dynamic mechanism for recovering a precise and correct memory model in executables in presence of executable-specific artifacts such as indirect control transfers. Next, the enhanced memory model is employed to define a novel symbolic analysis framework for executables that can perform the same types of program analysis as source-level tools. Frameworks hitherto fail to simultaneously maintain the properties of correct representation and precise memory model and ignore memory-allocated variables while defining symbolic analysis mechanisms. We exemplify that our framework is robust, efficient and it significantly improves the performance of various traditional analyses like global value numbering, alias analysis and dependence analysis for executables. Finally, the underlying representation and analysis framework is employed for two separate applications. First, the framework is extended to define a novel static analysis framework, \emph{DemandFlow}, for identifying information flow security violations in program executables. Unlike existing static vulnerability detection methods for executables, DemandFlow analyzes memory locations in addition to symbols, thus improving the precision of the analysis. DemandFlow proposes a novel demand-driven mechanism to identify and precisely analyze only those program locations and memory accesses which are relevant to a vulnerability, thus enhancing scalability. DemandFlow uncovers six previously undiscovered format string and directory traversal vulnerabilities in popular ftp and internet relay chat clients. Next, the framework is extended to implement a platform-specific optimization for embedded processors. Several embedded systems provide the facility of locking one or more lines in the cache. We devise the first method in literature that employs instruction cache locking as a mechanism for improving the average-case run-time of general embedded applications. We demonstrate that the optimal solution for instruction cache locking can be obtained in polynomial time. Since our scheme is implemented inside a binary framework, it successfully addresses the portability concern by enabling the implementation of cache locking at the time of deployment when all the details of the memory hierarchy are available

CiteSeerX

Digital Repository at the University of Maryland

Low-overhead Online Code Transformations.

Author: Laurenzano Michael A.
Publication venue
Publication date
Field of study

The ability to perform online code transformations - to dynamically change the implementation of running native programs - has been shown to be useful in domains as diverse as optimization, security, debugging, resilience and portability. However, conventional techniques for performing online code transformations carry significant runtime overhead, limiting their applicability for performance-sensitive applications. This dissertation proposes and investigates a novel low-overhead online code transformation technique that works by running the dynamic compiler asynchronously and in parallel to the running program. As a consequence, this technique allows programs to execute with the online code transformation capability at near-native speed, unlocking a host of additional opportunities that can take advantage of the ability to re-visit compilation choices as the program runs. This dissertation builds on the low-overhead online code transformation mechanism, describing three novel runtime systems that represent in best-in-class solutions to three challenging problems facing modern computer scientists. First, I leverage online code transformations to significantly increase the utilization of multicore datacenter servers by dynamically managing program cache contention. Compared to state-of-the-art prior work that mitigate contention by throttling application execution, the proposed technique achieves a 1.3-1.5x improvement in application performance. Second, I build a technique to automatically configure and parameterize approximate computing techniques for each program input. This technique results in the ability to configure approximate computing to achieve an average performance improvement of 10.2x while maintaining 90% result accuracy, which significantly improves over oracle versions of prior techniques. Third, I build an operating system designed to secure running applications from dynamic return oriented programming attacks by efficiently, transparently and continuously re-randomizing the code of running programs. The technique is able to re-randomize program code at a frequency of 300ms with an average overhead of 9%, a frequency fast enough to resist state-of-the-art return oriented programming attacks based on memory disclosures and side channels.PhDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/120775/1/mlaurenz_1.pd

Deep Blue Documents at the University of Michigan

Improving systems software security through program analysis and instrumentation

Author: Kuznetsov Volodymyr
Publication venue: Lausanne, EPFL
Publication date: 25/05/2016
Field of study

Security and reliability bugs are prevalent in systems software. Systems code is often written in low-level languages like C/C++, which offer many benefits but also delegate memory management and type safety to programmers. This invites bugs that cause crashes or can be exploited by attackers to take control of the program. This thesis presents techniques to detect and fix security and reliability issues in systems software without burdening the software developers. First, we present code-pointer integrity (CPI), a technique that combines static analysis with compile-time instrumentation to guarantee the integrity of all code pointers in a program and thereby prevent all control-flow hijack attacks. We also present code-pointer separation (CPS), a relaxation of CPI with better performance properties. CPI and CPS offer substantially better security-to-overhead ratios than the state of the art in control flow hijack defense mechanisms, they are practical (we protect a complete FreeBSD system and over 100 packages like apache and postgresql), effective (prevent all attacks in the RIPE benchmark), and efficient: on SPEC CPU2006, CPS averages 1.2% overhead for C and 1.9% for C/C++, while CPIâs overhead is 2.9% for C and 8.4% for C/C++. Second, we present DDT, a tool for testing closed-source device drivers to automatically find bugs like memory errors or race conditions. DDT showcases a combination of a form of program analysis called selective symbolic execution with virtualization to thoroughly exercise tested drivers and produce detailed, executable traces for every path that leads to a failure. We applied DDT to several closed-source Microsoft-certified Windows device drivers and discovered 14 serious new bugs that can cause crashes or compromise security of the entire system. Third, we present a technique for increasing the scalability of symbolic execution by merging states obtained on different execution paths. State merging reduces the number of states to analyze, but the merged states can be more complex and harder to analyze than their individual components. We introduce query count estimation, a technique to reason about the analysis time of merged states and decide which states to merge in order to achieve optimal net performance of symbolic execution. We also introduce dynamic state merging, a technique for merging states that interacts favorably with search strategies employed by practical bug finding tools, such as DDT and KLEE. Experiments on the 96 GNU Coreutils show that our approach consistently achieves several orders of magnitude speedup over previously published results

Infoscience - École polytechnique fédérale de Lausanne

Backdoor detection systems for embedded devices

Author: Thomas Sam Lloyd
Publication venue
Publication date: 01/12/2018
Field of study

A system is said to contain a backdoor when it intentionally includes a means to trigger the execution of functionality that serves to subvert its expected security. Unfortunately, such constructs are pervasive in software and systems today, particularly in the firmware of commodity embedded systems and “Internet of Things” devices. The work presented in this thesis concerns itself with the problem of detecting backdoor-like constructs, specifically those present in embedded device firmware, which, as we show, presents additional challenges in devising detection methodologies. The term “backdoor”, while used throughout the academic literature, by industry, and in the media, lacks a rigorous definition, which exacerbates the challenges in their detection. To this end, we present such a definition, as well as a framework, which serves as a basis for their discovery, devising new detection techniques and evaluating the current state-of-the-art. Further, we present two backdoor detection methodologies, as well as corresponding tools which implement those approaches. Both of these methods serve to automate many of the currently manual aspects of backdoor identification and discovery. And, in both cases, we demonstrate that our approaches are capable of analysing device firmware at scale and can be used to discover previously undocumented real-world backdoors

University of Birmingham Research Archive, E-theses Repository

Vulnerability detection in device drivers

Author: Mendonça Manuel José Ferreira Carneiro
Publication venue
Publication date: 01/01/2017
Field of study

Tese de doutoramento, Informática (Ciência da Computação), Universidade de Lisboa, Faculdade de Ciências, 2017The constant evolution in electronics lets new equipment/devices to be regularly made available on the market, which has led to the situation where common operating systems (OS) include many device drivers(DD) produced by very diverse manufactures. Experience has shown that the development of DD is error prone, as a majority of the OS crashes can be attributed to flaws in their implementation. This thesis addresses the challenge of designing methodologies and tools to facilitate the detection of flaws in DD, contributing to decrease the errors in this kind of software, their impact in the OS stability, and the security threats caused by them. This is especially relevant because it can help developers to improve the quality of drivers during their implementation or when they are integrated into a system. The thesis work started by assessing how DD flaws can impact the correct execution of the Windows OS. The employed approach used a statistical analysis to obtain the list of kernel functions most used by the DD, and then automatically generated synthetic drivers that introduce parameter errors when calling a kernel function, thus mimicking a faulty interaction. The experimental results showed that most targeted functions were ineffective in the defence of the incorrect parameters. A reasonable number of crashes and a small number of hangs were observed suggesting a poor error containment capability of these OS functions. Then, we produced an architecture and a tool that supported the automatic injection of network attacks in mobile equipment (e.g., phone), with the objective of finding security flaws (or vulnerabilities) in Wi-Fi drivers. These DD were selected because they are of easy access to an external adversary, which simply needs to create malicious traffic to exploit them, and therefore the flaws in their implementation could have an important impact. Experiments with the tool uncovered a previously unknown vulnerability that causes OS hangs, when a specific value was assigned to the TIM element in the Beacon frame. The experiments also revealed a potential implementation problem of the TCP-IP stack by the use of disassociation frames when the target device was associated and authenticated with a Wi-Fi access point. Next, we developed a tool capable of registering and instrumenting the interactions between a DD and the OS. The solution used a wrapper DD around the binary of the driver under test, enabling full control over the function calls and parameters involved in the OS-DD interface. This tool can support very diverse testing operations, including the log of system activity and to reverse engineer the driver behaviour. Some experiments were performed with the tool, allowing to record the insights of the behaviour of the interactions between the DD and the OS, the parameter values and return values. Results also showed the ability to identify bugs in drivers, by executing tests based on the knowledge obtained from the driver’s dynamics. Our final contribution is a methodology and framework for the discovery of errors and vulnerabilities in Windows DD by resorting to the execution of the drivers in a fully emulated environment. This approach is capable of testing the drivers without requiring access to the associated hardware or the DD source code, and has a granular control over each machine instruction. Experiments performed with Off the Shelf DD confirmed a high dependency of the correctness of the parameters passed by the OS, identified the precise location and the motive of memory leaks, the existence of dormant and vulnerable code.A constante evolução da eletrónica tem como consequência a disponibilização regular no mercado de novos equipamentos/dispositivos, levando a uma situação em que os sistemas operativos (SO) mais comuns incluem uma grande quantidade de gestores de dispositivos (GD) produzidos por diversos fabricantes. A experiência tem mostrado que o desenvolvimento dos GD é sujeito a erros uma vez que a causa da maioria das paragens do SO pode ser atribuída a falhas na sua implementação. Esta tese centra-se no desafio da criação de metodologias e ferramentas que facilitam a deteção de falhas nos GD, contribuindo para uma diminuição nos erros neste tipo de software, o seu impacto na estabilidade do SO, e as ameaças de segurança por eles causadas. Isto é especialmente relevante porque pode ajudar a melhorar a qualidade dos GD tanto na sua implementação como quando estes são integrados em sistemas. Este trabalho inicia-se com uma avaliação de como as falhas nos GD podem levar a um funcionamento incorreto do SO Windows. A metodologia empregue usa uma análise estatística para obter a lista das funções do SO que são mais utilizadas pelos GD, e posteriormente constrói GD sintéticos que introduzem erros nos parâmetros passados durante a chamada às funções do SO, e desta forma, imita a integração duma falta. Os resultados das experiências mostraram que a maioria das funções testadas não se protege eficazmente dos parâmetros incorretos. Observou-se a ocorrência de um número razoável de paragens e um pequeno número de bloqueios, o que sugere uma pobre capacidade das funções do SO na contenção de erros. Posteriormente, produzimos uma arquitetura e uma ferramenta que suporta a injeção automática de ataques em equipamentos móveis (e.g., telemóveis), com o objetivo de encontrar falhas de segurança (ou vulnerabilidades) em GD de placas de rede Wi-Fi. Estes GD foram selecionados porque são de fácil acesso a um atacante remoto, o qual apenas necessita de criar tráfego malicioso para explorar falhas na sua implementação podendo ter um impacto importante. As experiências realizadas com a ferramenta revelaram uma vulnerabilidade anteriormente desconhecida que provoca um bloqueio no SO quando é atribuído um valor específico ao campo TIM da mensagem de Beacon. As experiências também revelaram um potencial problema na implementação do protocolo TCP-IP no uso das mensagens de desassociação quando o dispositivo alvo estava associado e autenticado com o ponto de acesso Wi-Fi. A seguir, desenvolvemos uma ferramenta com a capacidade de registar e instrumentar as interações entre os GD e o SO. A solução usa um GD que envolve o código binário do GD em teste, permitindo um controlo total sobre as chamadas a funções e aos parâmetros envolvidos na interface SO-GD. Esta ferramenta suporta diversas operações de teste, incluindo o registo da atividade do sistema e compreensão do comportamento do GD. Foram realizadas algumas experiências com esta ferramenta, permitindo o registo das interações entre o GD e o SO, os valores dos parâmetros e os valores de retorno das funções. Os resultados mostraram a capacidade de identificação de erros nos GD, através da execução de testes baseados no conhecimento da dinâmica do GD. A nossa contribuição final é uma metodologia e uma ferramenta para a descoberta de erros e vulnerabilidades em GD Windows recorrendo à execução do GD num ambiente totalmente emulado. Esta abordagem permite testar GD sem a necessidade do respetivo hardware ou o código fonte, e possuí controlo granular sobre a execução de cada instrução máquina. As experiências realizadas com GD disponíveis comercialmente confirmaram a grande dependência que os GD têm nos parâmetros das funções do SO, e identificaram o motivo e a localização precisa de fugas de memória, a existência de código não usado e vulnerável

Universidade de Lisboa: Repositório.UL

Techniques for Detection, Root Cause Diagnosis, and Classification of In-Production Concurrency Bugs

Author: Kasikci Baris Can Cengiz
Publication venue: Lausanne, EPFL
Publication date: 16/12/2015
Field of study

Concurrency bugs are at the heart of some of the worst bugs that plague software. Concurrency bugs slow down software development because it can take weeks or even months before developers can identify and fix them. In-production detection, root cause diagnosis, and classification of concurrency bugs is challenging. This is because these activities require heavyweight analyses such as exploring program paths and determining failing program inputs and schedules, all of which are not suited for software running in production. This dissertation develops practical techniques for the detection, root cause diagnosis, and classification of concurrency bugs for inproduction software. Furthermore, we develop ways for developers to better reason about concurrent programs. This dissertation builds upon the following principles: — The approach in this dissertation spans multiple layers of the system stack, because concurrency spans many layers of the system stack. — It performs most of the heavyweight analyses in-house and resorts to minimal in-production analysis in order to move the heavy lifting to where it is least disruptive. — It eschews custom hardware solutions that may be infeasible to implement in the real world. Relying on the aforementioned principles, this dissertation introduces: 1. Techniques to automatically detect concurrency bugs (data races and atomicity violations) in-production by combining in-house static analysis and in-production dynamic analysis. 2. A technique to automatically identify the root causes of in-production failures, with a particular emphasis on failures caused by concurrency bugs. 3. A technique that given a data race, automatically classifies it based on its potential consequence, allowing developers to answer questions such as “can the data race cause a crash or a hang?”, or “does the data race have any observable effect?”. We build a toolchain that implements all the aforementioned techniques. We show that the tools we develop in this dissertation are effective, incur low runtime performance overhead, and have high accuracy and precision

Infoscience - École polytechnique fédérale de Lausanne