Search CORE

950 research outputs found

Succinct Representations for Abstract Interpretation

Author: A. Miné
B. Dutertre
B. Jeannet
B. Jeannet
C. Lattner
D. Gopan
D. Monniaux
D. Monniaux
L. Gonnord
L. Moura de
N. Halbwachs
P. Cousot
R. Bagnara
R. Bagnara
R. Bagnara
R. Sharma
T. Gawlitza
X. Rival
Publication venue
Publication date: 01/01/2012
Field of study

Abstract interpretation techniques can be made more precise by distinguishing paths inside loops, at the expense of possibly exponential complexity. SMT-solving techniques and sparse representations of paths and sets of paths avoid this pitfall. We improve previously proposed techniques for guided static analysis and the generation of disjunctive invariants by combining them with techniques for succinct representations of paths and symbolic representations for transitions based on static single assignment. Because of the non-monotonicity of the results of abstract interpretation with widening operators, it is difficult to conclude that some abstraction is more precise than another based on theoretical local precision results. We thus conducted extensive comparisons between our new techniques and previous ones, on a variety of open-source packages.Comment: Static analysis symposium (SAS), Deauville : France (2012

arXiv.org e-Print Archive

CiteSeerX

Crossref

Hal - Université Grenoble Alpes

Lightweight Multilingual Software Analysis

Author: Baird David
Bogar Anne Marie
Lyons Damian M.
Publication venue
Publication date: 01/01/2017
Field of study

Developer preferences, language capabilities and the persistence of older languages contribute to the trend that large software codebases are often multilingual, that is, written in more than one computer language. While developers can leverage monolingual software development tools to build software components, companies are faced with the problem of managing the resultant large, multilingual codebases to address issues with security, efficiency, and quality metrics. The key challenge is to address the opaque nature of the language interoperability interface: one language calling procedures in a second (which may call a third, or even back to the first), resulting in a potentially tangled, inefficient and insecure codebase. An architecture is proposed for lightweight static analysis of large multilingual codebases: the MLSA architecture. Its modular and table-oriented structure addresses the open-ended nature of multiple languages and language interoperability APIs. We focus here as an application on the construction of call-graphs that capture both inter-language and intra-language calls. The algorithms for extracting multilingual call-graphs from codebases are presented, and several examples of multilingual software engineering analysis are discussed. The state of the implementation and testing of MLSA is presented, and the implications for future work are discussed.Comment: 15 page

arXiv.org e-Print Archive

Crossref

Fordham University: DigitalResearch@Fordham

A Tree Locality-Sensitive Hash for Secure Software Testing

Author: Cady Camdon J.
Publication venue: AFIT Scholar
Publication date: 14/09/2017
Field of study

Bugs in software that make it through testing can cost tens of millions of dollars each year, and in some cases can even result in the loss of human life. In order to eliminate bugs, developers may use symbolic execution to search through possible program states looking for anomalous states. Most of the computational effort to search through these states is spent solving path constraints in order to determine the feasibility of entering each state. State merging can make this search more efficient by combining program states, allowing multiple execution paths to be analyzed at the same time. However, a merge with dissimilar path constraints dramatically increases the time necessary to solve the path constraint. Currently, there are no distance measures for path constraints, and pairwise comparison of program states is not scalable. A hashing method is presented that clusters constraints in such a way that similar constraints are placed in the same cluster without requiring pairwise comparisons between queries. When combined with other state-of-the-art state merging techniques, the hashing method allows the symbolic executor to execute more instructions per second and find more terminal execution states than the other techniques alone, without decreasing the high path coverage achieved by merging many states together

AFTI Scholar (Air Force Institute of Technology)

Recommended from our members

Uncovering Features in Behaviorally Similar Programs

Author: Su Fang-Hsiang
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2018
Field of study

The detection of similar code can support many so ware engineering tasks such as program understanding and program classification. Many excellent approaches have been proposed to detect programs having similar syntactic features. However, these approaches are unable to identify programs dynamically or statistically close to each other, which we call behaviorally similar programs. We believe the detection of behaviorally similar programs can enhance or even automate the tasks relevant to program classification. In this thesis, we will discuss our current approaches to identify programs having similar behavioral features in multiple perspectives. We first discuss how to detect programs having similar functionality. While the definition of a program’s functionality is undecidable, we use inputs and outputs (I/Os) of programs as the proxy of their functionality. We then use I/Os of programs as a behavioral feature to detect which programs are functionally similar: two programs are functionally similar if they share similar inputs and outputs. This approach has been studied and developed in the C language to detect functionally equivalent programs having equivalent I/Os. Nevertheless, some natural problems in Object Oriented languages, such as input generation and comparisons between application-specific data types, hinder the development of this approach. We propose a new technique, in-vivo detection, which uses existing and meaningful inputs to drive applications systematically and then applies a novel similarity model considering both inputs and outputs of programs, to detect functionally similar programs. We develop the tool, HitoshiIO, based on our in-vivo detection. In the subjects that we study, HitoshiIO correctly detect 68.4% of functionally similar programs, where its false positive rate is only 16.6%. In addition to functional I/Os of programs, we attempt to discover programs having similar execution behavior. Again, the execution behavior of a program can be undecidable, so we use instructions executed at run-time as a behavioral feature of a program. We create DyCLINK, which observes program executions and encodes them in dynamic instruction graphs. A vertex in a dynamic instruction graph is an instruction and an edge is a type of dependency between two instructions. The problem to detect which programs have similar executions can then be reduced to a problem of solving inexact graph isomorphism. We propose a link analysis based algorithm, LinkSub, which vectorizes each dynamic instruction graph by the importance of every instruction, to solve this graph isomorphism problem efficiently. In a K Nearest Neighbor (KNN) based program classification experiment, DyCLINK achieves 90 + % precision. Because HitoshiIO and DyCLINK both rely on dynamic analysis to expose program behavior, they have better capability to locate and search for behaviorally similar programs than traditional static analysis tools. However, they suffer from some common problems of dynamic analysis, such as input generation and run-time overhead. These problems may make our approaches challenging to scale. Thus, we create the system, Macneto, which integrates static analysis with machine topic modeling and deep learning to approximate program behaviors from their binaries without truly executing programs. In our deobfuscation experiments considering two commercial obfuscators that alter lexical information and syntax in programs, Macneto achieves 90 + % precision, where the groundtruth is that the behavior of a program before and after obfuscation should be the same. In this thesis, we offer a more extensive view of similar programs than the traditional definitions. While the traditional definitions of similar programs mostly use static features, such as syntax and lexical information, we propose to leverage the power of dynamic analysis and machine learning models to trace/collect behavioral features of pro- grams. These behavioral features of programs can then apply to detect behaviorally similar programs. We believe the techniques we invented in this thesis to detect behaviorally similar programs can improve the development of software engineering and security applications, such as code search and deobfuscation

Columbia University Academic Commons

Where's Crypto?: Automated Identification and Classification of Proprietary Cryptographic Primitives in Binary Code

Author: Meijer Carlo
Moonsamy Veelasha
Wetzels Jos
Publication venue
Publication date: 09/09/2020
Field of study

The continuing use of proprietary cryptography in embedded systems across many industry verticals, from physical access control systems and telecommunications to machine-to-machine authentication, presents a significant obstacle to black-box security-evaluation efforts. In-depth security analysis requires locating and classifying the algorithm in often very large binary images, thus rendering manual inspection, even when aided by heuristics, time consuming. In this paper, we present a novel approach to automate the identification and classification of (proprietary) cryptographic primitives within binary code. Our approach is based on Data Flow Graph (DFG) isomorphism, previously proposed by Lestringant et al. Unfortunately, their DFG isomorphism approach is limited to known primitives only, and relies on heuristics for selecting code fragments for analysis. By combining the said approach with symbolic execution, we overcome all limitations of their work, and are able to extend the analysis into the domain of unknown, proprietary cryptographic primitives. To demonstrate that our proposal is practical, we develop various signatures, each targeted at a distinct class of cryptographic primitives, and present experimental evaluations for each of them on a set of binaries, both publicly available (and thus providing reproducible results), and proprietary ones. Lastly, we provide a free and open-source implementation of our approach, called Where's Crypto?, in the form of a plug-in for the popular IDA disassembler.Comment: A proof-of-concept implementation can be found at https://github.com/wheres-crypto/wheres-crypt

arXiv.org e-Print Archive

Identifying Multi-Binary Vulnerabilities in Embedded Firmware at Scale

Author: Continella Andrea
Kruegel Christopher
Machiry Aravind
Redini Nilo
Shoshitaishvili Yan
Spensky Chad
Vigna Giovanni
Wang Ruoyu
Publication venue
Publication date: 01/04/2020
Field of study

University of Twente Research Information

Identifying reusable functions in code using specification driven techniques

Author: De Lucia Andrea
Publication venue
Publication date: 01/01/1995
Field of study

The work described in this thesis addresses the field of software reuse. Software reuse is widely considered as a way to increase the productivity and improve the quality and reliability of new software systems. Identifying, extracting and reengineering software. components which implement abstractions within existing systems is a promising cost-effective way to create reusable assets. Such a process is referred to as reuse reengineering. A reference paradigm has been defined within the RE(^2) project which decomposes a reuse reengineering process in five sequential phases. In particular, the first phase of the reference paradigm, called Candidature phase, is concerned with the analysis of source code for the identification of software components implementing abstractions and which are therefore candidate to be reused. Different candidature criteria exist for the identification of reuse-candidate software components. They can be classified in structural methods (based on structural properties of the software) and specification driven methods (that search for software components implementing a given specification).In this thesis a new specification driven candidature criterion for the identification and the extraction of code fragments implementing functional abstractions is presented. The method is driven by a formal specification of the function to be isolated (given in terms of a precondition and a post condition) and is based on the theoretical frameworks of program slicing and symbolic execution. Symbolic execution and theorem proving techniques are used to map the specification of the functional abstractions onto a slicing criterion. Once the slicing criterion has been identified the slice is isolated using algorithms based on dependence graphs. The method has been specialised for programs written in the C language. Both symbolic execution and program slicing are performed by exploiting the Combined C Graph (CCG), a fine-grained dependence based program representation that can be used for several software maintenance tasks

Durham e-Theses

Software Model Checking with Explicit Scheduler and Symbolic Threads

Author: A. Cimatti F. Giunchiglia, G. Mongardi,
A. Groce and W. Visser
A. Lal and T. W. Reps
Alessandro Cimatti
C. B. Jones
D. A. Peled
D. Beyer T. A. Henzinger, R. Jhala, and
E. M. Clarke H. Jain, and D. Kroening
E. M. Clarke O. Grumberg, S. Jha, Y. Lu
F. Boussinot
G. J. Holzmann
Iman Narasamdya
L. Waszniowski and Z. Hanzálek
Marco Roveri
P. Godefroid
Parosh Abdulla
R. Alur R. K. Brayton, T. A. Henzinger,
R. Cytron J. Ferrante, B. K. Rosen, M.
S. Chaki J. Ouaknine, K. Yorav, and E.
S. S. Owicki and D. Gries
W. Craig
Publication venue: 'Logical Methods in Computer Science e.V.'
Publication date: 01/01/2011
Field of study

In many practical application domains, the software is organized into a set of threads, whose activation is exclusive and controlled by a cooperative scheduling policy: threads execute, without any interruption, until they either terminate or yield the control explicitly to the scheduler. The formal verification of such software poses significant challenges. On the one side, each thread may have infinite state space, and might call for abstraction. On the other side, the scheduling policy is often important for correctness, and an approach based on abstracting the scheduler may result in loss of precision and false positives. Unfortunately, the translation of the problem into a purely sequential software model checking problem turns out to be highly inefficient for the available technologies. We propose a software model checking technique that exploits the intrinsic structure of these programs. Each thread is translated into a separate sequential program and explored symbolically with lazy abstraction, while the overall verification is orchestrated by the direct execution of the scheduler. The approach is optimized by filtering the exploration of the scheduler with the integration of partial-order reduction. The technique, called ESST (Explicit Scheduler, Symbolic Threads) has been implemented and experimentally evaluated on a significant set of benchmarks. The results demonstrate that ESST technique is way more effective than software model checking applied to the sequentialized programs, and that partial-order reduction can lead to further performance improvements.Comment: 40 pages, 10 figures, accepted for publication in journal of logical methods in computer scienc

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

Using Graph Transformations and Graph Abstractions for Software Verification

Author: Rensink Arend
Zambon Eduardo
Publication venue: European Association of Software Science and Technology
Publication date: 03/05/2011
Field of study

In this paper we describe our intended approach for the verification of software written in imperative programming languages. We base our approach on model checking of graph transition systems, where each state is a graph and the transitions are specified by graph transformation rules. We believe that graph transformation is a very suitable technique to model the execution semantics of languages with dynamic memory allocation. Furthermore, such representation allows us to investigate the use of graph abstractions, which can mitigate the combinatorial explosion inherent to model checking. In addition to presenting our planned approach, we reason about its feasibility, and, by providing a brief comparison to other existing methods, we highlight the benefits and drawbacks that are expected

Electronic Communications of the EASST (European Association of Software Science and Technology)