176 research outputs found
Sciduction: Combining Induction, Deduction, and Structure for Verification and Synthesis
Even with impressive advances in automated formal methods, certain problems
in system verification and synthesis remain challenging. Examples include the
verification of quantitative properties of software involving constraints on
timing and energy consumption, and the automatic synthesis of systems from
specifications. The major challenges include environment modeling,
incompleteness in specifications, and the complexity of underlying decision
problems.
This position paper proposes sciduction, an approach to tackle these
challenges by integrating inductive inference, deductive reasoning, and
structure hypotheses. Deductive reasoning, which leads from general rules or
concepts to conclusions about specific problem instances, includes techniques
such as logical inference and constraint solving. Inductive inference, which
generalizes from specific instances to yield a concept, includes algorithmic
learning from examples. Structure hypotheses are used to define the class of
artifacts, such as invariants or program fragments, generated during
verification or synthesis. Sciduction constrains inductive and deductive
reasoning using structure hypotheses, and actively combines inductive and
deductive reasoning: for instance, deductive techniques generate examples for
learning, and inductive reasoning is used to guide the deductive engines.
We illustrate this approach with three applications: (i) timing analysis of
software; (ii) synthesis of loop-free programs, and (iii) controller synthesis
for hybrid systems. Some future applications are also discussed
Evaluation Methodologies in Software Protection Research
Man-at-the-end (MATE) attackers have full control over the system on which
the attacked software runs, and try to break the confidentiality or integrity
of assets embedded in the software. Both companies and malware authors want to
prevent such attacks. This has driven an arms race between attackers and
defenders, resulting in a plethora of different protection and analysis
methods. However, it remains difficult to measure the strength of protections
because MATE attackers can reach their goals in many different ways and a
universally accepted evaluation methodology does not exist. This survey
systematically reviews the evaluation methodologies of papers on obfuscation, a
major class of protections against MATE attacks. For 572 papers, we collected
113 aspects of their evaluation methodologies, ranging from sample set types
and sizes, over sample treatment, to performed measurements. We provide
detailed insights into how the academic state of the art evaluates both the
protections and analyses thereon. In summary, there is a clear need for better
evaluation methodologies. We identify nine challenges for software protection
evaluations, which represent threats to the validity, reproducibility, and
interpretation of research results in the context of MATE attacks
PowerDrive: Accurate De-Obfuscation and Analysis of PowerShell Malware
PowerShell is nowadays a widely-used technology to administrate and manage
Windows-based operating systems. However, it is also extensively used by
malware vectors to execute payloads or drop additional malicious contents.
Similarly to other scripting languages used by malware, PowerShell attacks are
challenging to analyze due to the extensive use of multiple obfuscation layers,
which make the real malicious code hard to be unveiled. To the best of our
knowledge, a comprehensive solution for properly de-obfuscating such attacks is
currently missing. In this paper, we present PowerDrive, an open-source, static
and dynamic multi-stage de-obfuscator for PowerShell attacks. PowerDrive
instruments the PowerShell code to progressively de-obfuscate it by showing the
analyst the employed obfuscation steps. We used PowerDrive to successfully
analyze thousands of PowerShell attacks extracted from various malware vectors
and executables. The attained results show interesting patterns used by
attackers to devise their malicious scripts. Moreover, we provide a taxonomy of
behavioral models adopted by the analyzed codes and a comprehensive list of the
malicious domains contacted during the analysis
Simplification of General Mixed Boolean-Arithmetic Expressions: GAMBA
Malware code often resorts to various self-protection techniques to
complicate analysis. One such technique is applying Mixed-Boolean Arithmetic
(MBA) expressions as a way to create opaque predicates and diversify and
obfuscate the data flow.
In this work we aim to provide tools for the simplification of nonlinear MBA
expressions in a very practical context to compete in the arms race between the
generation of hard, diverse MBAs and their analysis. The proposed algorithm
GAMBA employs algebraic rewriting at its core and extends SiMBA. It achieves
efficient deobfuscation of MBA expressions from the most widely tested public
datasets and simplifies expressions to their ground truths in most cases,
surpassing peer tools
Understanding Android Obfuscation Techniques: A Large-Scale Investigation in the Wild
In this paper, we seek to better understand Android obfuscation and depict a
holistic view of the usage of obfuscation through a large-scale investigation
in the wild. In particular, we focus on four popular obfuscation approaches:
identifier renaming, string encryption, Java reflection, and packing. To obtain
the meaningful statistical results, we designed efficient and lightweight
detection models for each obfuscation technique and applied them to our massive
APK datasets (collected from Google Play, multiple third-party markets, and
malware databases). We have learned several interesting facts from the result.
For example, malware authors use string encryption more frequently, and more
apps on third-party markets than Google Play are packed. We are also interested
in the explanation of each finding. Therefore we carry out in-depth code
analysis on some Android apps after sampling. We believe our study will help
developers select the most suitable obfuscation approach, and in the meantime
help researchers improve code analysis systems in the right direction
Recommended from our members
Uncovering Features in Behaviorally Similar Programs
The detection of similar code can support many so ware engineering tasks such as program understanding and program classification. Many excellent approaches have been proposed to detect programs having similar syntactic features. However, these approaches are unable to identify programs dynamically or statistically close to each other, which we call behaviorally similar programs. We believe the detection of behaviorally similar programs can enhance or even automate the tasks relevant to program classification. In this thesis, we will discuss our current approaches to identify programs having similar behavioral features in multiple perspectives.
We first discuss how to detect programs having similar functionality. While the definition of a program’s functionality is undecidable, we use inputs and outputs (I/Os) of programs as the proxy of their functionality. We then use I/Os of programs as a behavioral feature to detect which programs are functionally similar: two programs are functionally similar if they share similar inputs and outputs. This approach has been studied and developed in the C language to detect functionally equivalent programs having equivalent I/Os. Nevertheless, some natural problems in Object Oriented languages, such as input generation and comparisons between application-specific data types, hinder the development of this approach. We propose a new technique, in-vivo detection, which uses existing and meaningful inputs to drive applications systematically and then applies a novel similarity model considering both inputs and outputs of programs, to detect functionally similar programs. We develop the tool, HitoshiIO, based on our in-vivo detection. In the subjects that we study, HitoshiIO correctly detect 68.4% of functionally similar programs, where its false positive rate is only 16.6%.
In addition to functional I/Os of programs, we attempt to discover programs having similar execution behavior. Again, the execution behavior of a program can be undecidable, so we use instructions executed at run-time as a behavioral feature of a program. We create DyCLINK, which observes program executions and encodes them in dynamic instruction graphs. A vertex in a dynamic instruction graph is an instruction and an edge is a type of dependency between two instructions. The problem to detect which programs have similar executions can then be reduced to a problem of solving inexact graph isomorphism. We propose a link analysis based algorithm, LinkSub, which vectorizes each dynamic instruction graph by the importance of every instruction, to solve this graph isomorphism problem efficiently. In a K Nearest Neighbor (KNN) based program classification experiment, DyCLINK achieves 90 + % precision.
Because HitoshiIO and DyCLINK both rely on dynamic analysis to expose program behavior, they have better capability to locate and search for behaviorally similar programs than traditional static analysis tools. However, they suffer from some common problems of dynamic analysis, such as input generation and run-time overhead. These problems may make our approaches challenging to scale. Thus, we create the system, Macneto, which integrates static analysis with machine topic modeling and deep learning to approximate program behaviors from their binaries without truly executing programs. In our deobfuscation experiments considering two commercial obfuscators that alter lexical information and syntax in programs, Macneto achieves 90 + % precision, where the groundtruth is that the behavior of a program before and after obfuscation should be the same.
In this thesis, we offer a more extensive view of similar programs than the traditional definitions. While the traditional definitions of similar programs mostly use static features, such as syntax and lexical information, we propose to leverage the power of dynamic analysis and machine learning models to trace/collect behavioral features of pro- grams. These behavioral features of programs can then apply to detect behaviorally similar programs. We believe the techniques we invented in this thesis to detect behaviorally similar programs can improve the development of software engineering and security applications, such as code search and deobfuscation
- …