Code Clone Discovery Based on Concolic Analysis
Software is often large, complicated, and expensive to build and maintain. Redundant code can make these applications even more costly and difficult to maintain. Duplicated code is introduced into these systems for a variety of reasons, including developer churn, deficient developer comprehension of the application, and a lack of adherence to proper development practices.
Code redundancy has several adverse effects on a software application, including an increased size of the codebase and inconsistent developer changes due to elevated program comprehension needs. A code clone is defined as multiple code fragments that produce similar results when given the same input. Four types of clones are generally recognized, ranging from the simple type-1 and type-2 clones to the more complicated type-3 and type-4 clones. Numerous clone detection mechanisms can identify the simpler types of code clone candidates, but far fewer claim the ability to find the more difficult type-3 clones. Before CCCD, MeCC and FCD were the only clone detection techniques capable of finding type-4 clones. A drawback of MeCC is the excessive time required to detect clones and the likely exploration of an unreasonably large number of possible paths. FCD requires extensive amounts of random data and a significant amount of time to discover clones.
This dissertation presents a new process for discovering code clones known as Concolic Code Clone Discovery (CCCD). This technique discovers code clone candidates based on the functionality of the application rather than its syntax, so naming conventions and comments in the source code have no effect on the proposed clone detection process. CCCD finds clones by first performing concolic analysis on the targeted source code. Concolic analysis combines concrete and symbolic execution to traverse all possible paths of the targeted program, and these paths are represented by the generated concolic output. A diff tool is then used to determine whether the concolic output for one method is identical to the output produced for another method; duplicated output is indicative of a code clone.
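A minimal sketch of this comparison step, written in Python purely for illustration, might look like the following. The concolic engine itself is represented by a placeholder function (`concolic_output`), which in CCCD would be backed by a real concolic execution tool; only the diff-based comparison is shown concretely.

```python
import difflib

def concolic_output(method_source: str) -> list[str]:
    """Placeholder: run concolic execution on one method and return its
    textual path/constraint output, one line per explored path."""
    raise NotImplementedError("backed by a real concolic engine in CCCD")

def are_clone_candidates(method_a: str, method_b: str) -> bool:
    # An empty diff between the two concolic outputs marks the pair as a
    # clone candidate; identifier names and comments never reach this step.
    diff = difflib.unified_diff(concolic_output(method_a),
                                concolic_output(method_b))
    return not any(diff)
```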
CCCD was validated against several open source applications and against clones of all four types as defined by previous research. The results demonstrate that CCCD was able to detect all types of clone candidates with a high level of accuracy.
In the future, CCCD will be used to examine how software developers work with type-3
and type-4 clones. CCCD will also be applied to various areas of security research,
including intrusion detection mechanisms.
Implementation and testing of a blackbox and a whitebox fuzzer for file compression routines
Fuzz testing is a software testing technique that has risen to prominence over the past two decades. The unifying feature of all fuzz testers (fuzzers) is their ability to automatically produce random test cases for software. Fuzzers can generally be placed in one of two classes: black-box or white-box. Black-box fuzzers do not derive information from a program's source or binary in order to restrict the domain of their generated input, while white-box fuzzers do. A tradeoff involved in the choice between black-box and white-box fuzzing is the rate at which inputs can be produced; since black-box fuzzers need not reason about the software under test to generate inputs, they can generate more inputs per unit time if all other factors are equal. The question of how black-box and white-box fuzzing should be used together for ideal economy of software testing has been posed and even speculated about; however, to my knowledge, no publicly available study with the intent of characterizing an answer exists. The purpose of this thesis is to provide an initial exploration of the bug-finding characteristics of black-box and white-box fuzzers. A black-box fuzzer is implemented and then extended with a concolic execution program to make it white-box. Both versions of the fuzzer are used to run tests on some small programs and on parts of a file compression library.
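To make the black-box idea concrete, a minimal sketch of such a fuzzer is given below. It is not the thesis implementation; it simply mutates a compressed input at random and feeds it to Python's standard zlib module, which stands in for the file compression routines under test, recording any input that triggers an unexpected exception.

```python
import random
import zlib

def mutate(seed: bytes, max_flips: int = 8) -> bytes:
    """Flip a handful of random bits in the seed input."""
    data = bytearray(seed)
    for _ in range(random.randint(1, max_flips)):
        pos = random.randrange(len(data))
        data[pos] ^= 1 << random.randrange(8)
    return bytes(data)

def fuzz_decompress(seed: bytes, iterations: int = 10_000) -> list[bytes]:
    """Blindly mutate a valid compressed stream and watch for misbehavior."""
    crashes = []
    compressed = zlib.compress(seed)
    for _ in range(iterations):
        candidate = mutate(compressed)
        try:
            zlib.decompress(candidate)
        except zlib.error:
            pass              # graceful rejection of malformed input is expected
        except Exception:     # anything else is recorded as a potential bug
            crashes.append(candidate)
    return crashes

if __name__ == "__main__":
    print(len(fuzz_decompress(b"the quick brown fox jumps over the lazy dog")))
```

A white-box variant would replace the random `mutate` step with inputs derived from concolic execution of the target, which is the extension explored in the thesis.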
SymFusion: Hybrid Instrumentation for Concolic Execution
Concolic execution is a dynamic twist of symbolic execution designed with scalability in mind. Recent concolic executors heavily rely on program instrumentation to achieve such scalability. The instrumentation code can be added at compilation time (e.g., using an LLVM pass) or directly at execution time with the help of a dynamic binary translator. The former approach results in more efficient code but requires recompilation. Unfortunately, recompiling the entire code of a program is not always feasible or practical (e.g., in the presence of third-party components). In contrast, the latter approach does not require recompilation but incurs significantly higher execution time overhead.
In this paper, we investigate a hybrid instrumentation approach for concolic execution, called SymFusion. In particular, this hybrid instrumentation approach allows the user to recompile the core components of an application, thus minimizing the analysis overhead on them, while still being able to dynamically instrument the rest of the application components at execution time. Our experimental evaluation shows that our design can achieve a nice balance between efficiency and efficacy on several real-world applications.
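The hybrid idea can be pictured with a loose Python analogy (this is not SymFusion's implementation or API): functions in components we control receive their tracing hooks at definition time, playing the role of the LLVM-pass route, while components we cannot rebuild are wrapped only when invoked, playing the role of the dynamic-binary-translation route.

```python
import functools

TRACE = []  # stand-in for the concolic engine's symbolic-state updates

def compile_time_instrument(func):
    """Applied where we control the build: hooks baked in ahead of time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        TRACE.append(("static", func.__name__, args))
        return func(*args, **kwargs)
    return wrapper

def runtime_instrument(func):
    """Applied to code we cannot recompile: hooks attached on the fly."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        TRACE.append(("dynamic", func.__name__, args))
        return func(*args, **kwargs)
    return wrapper

@compile_time_instrument          # core component: recompiled with hooks
def parse_header(data: bytes) -> int:
    return data[0] if data else 0

def third_party_checksum(data: bytes) -> int:  # component we cannot rebuild
    return sum(data) % 256

third_party_checksum = runtime_instrument(third_party_checksum)

parse_header(b"\x02abc")
third_party_checksum(b"abc")
print(TRACE)  # both calls are traced, via the two different routes
```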
Taming the Static Analysis Beast
While industrial-strength static analysis over large, real-world codebases has become commonplace, so too have difficult-to-analyze language constructs, large libraries, and popular frameworks. These features make constructing and evaluating a novel, sound analysis painful, error-prone, and tedious. We motivate the need for research to address these issues by highlighting some of the many challenges faced by static analysis developers in today's software ecosystem. We then propose our short- and long-term research agenda to make static analysis over modern software less burdensome.
Combining High-Level and Low-Level Approaches to Evaluate Software Implementations Robustness Against Multiple Fault Injection Attacks
Physical fault injections break the security functionalities of algorithms by targeting their implementations. Software techniques strengthen such implementations to enhance their robustness against fault attacks. Exhaustively testing physical fault injections is time-consuming and requires complex platforms, so simulation solutions have been developed for this specific purpose. We chose two independent tools presented in 2014, the Laser Attack Robustness tool (Lazart) and the Embedded Fault Simulator (EFS), in order to evaluate software implementations against multiple fault injection attacks. Lazart and the EFS share the common goal of detecting vulnerabilities in the code; however, they operate with different techniques, fault models, and abstraction levels. This paper aims at exhibiting the specific advantages of both approaches and proposes a combining scheme that emphasizes their complementary nature.
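As a simplified, self-contained illustration of what a multiple-fault simulation explores (this is not Lazart or EFS code, and the PIN-check target and fault model are assumptions made for the example), the sketch below inverts the outcome of conditional tests in a toy verification routine and enumerates every pair of fault sites, reporting which double faults let a wrong PIN pass.

```python
from itertools import combinations

def verify_pin(candidate, reference, faults=frozenset()):
    """Toy PIN check; `faults` lists the indices of tests to invert."""
    def test(site, value):
        return (not value) if site in faults else value

    if test(0, len(candidate) != len(reference)):   # site 0: length check
        return False
    equal = True
    for i, (c, r) in enumerate(zip(candidate, reference), start=1):
        if test(i, c != r):                          # sites 1..n: digit checks
            equal = False
    return test(len(reference) + 1, equal)           # final decision site

def double_fault_attacks(candidate, reference):
    """Enumerate all pairs of fault sites that make a wrong PIN accepted."""
    sites = range(len(reference) + 2)
    return [pair for pair in combinations(sites, 2)
            if verify_pin(candidate, reference, frozenset(pair))]

print(double_fault_attacks("0000", "1234"))
```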