3,930 research outputs found
PAC Learning-Based Verification and Model Synthesis
We introduce a novel technique for verification and model synthesis of
sequential programs. Our technique is based on learning a regular model of the
set of feasible paths in a program, and testing whether this model contains an
incorrect behavior. Exact learning algorithms require checking equivalence
between the model and the program, a difficult problem that is undecidable in
general. Our learning procedure is therefore based on the framework of
probably approximately correct (PAC) learning, which uses sampling instead and
provides correctness guarantees expressed in terms of error probability and
confidence. Besides the verification result, our procedure also outputs the
model with the stated correctness guarantees. Preliminary experiments show
encouraging results, in some cases even outperforming mature software
verifiers. Comment: 11 pages
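As a minimal sketch of the sampling-based equivalence query that PAC learning substitutes for an exact check, consider the following. The helper names hypothesis_accepts, program_has_path, and sample_word are hypothetical, and the sample-size bound is the standard Angluin-style one rather than necessarily the exact bound used in the paper.

    import math

    def pac_equivalence_query(hypothesis_accepts, program_has_path, sample_word,
                              epsilon, delta, query_index):
        # Sampling-based (PAC) equivalence query: instead of deciding exact
        # equivalence between the learned regular model and the program
        # (undecidable in general), draw enough random samples so that, with
        # confidence at least 1 - delta, the model errs on at most an epsilon
        # fraction of the sampling distribution.
        m = math.ceil((1.0 / epsilon) *
                      (math.log(1.0 / delta) + query_index * math.log(2.0)))
        for _ in range(m):
            w = sample_word()                      # draw a path/word at random
            if hypothesis_accepts(w) != program_has_path(w):
                return w                           # counterexample: refine the model
        return None                                # PAC-equivalent: report the model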
Convolutional Neural Networks over Control Flow Graphs for Software Defect Prediction
Defects in software components are unavoidable and lead not only to
wasted time and money but also to serious consequences. To build
predictive models, previous studies focus on manually extracting features or
using tree representations of programs, and on exploiting different machine
learning algorithms. However, the performance of these models is limited
because the existing features and tree structures often fail to capture the
semantics of programs. To explore programs' semantics more deeply, this paper proposes to
leverage precise graphs representing program execution flows, and deep neural
networks for automatically learning defect features. Firstly, control flow
graphs are constructed from the assembly instructions obtained by compiling
source code; we thereafter apply multi-view multi-layer directed graph-based
convolutional neural networks (DGCNNs) to learn semantic features. The
experiments on four real-world datasets show that our method significantly
outperforms the baselines, including several other deep learning approaches. Comment: presented at ICTAI 201
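As a rough, single-view sketch of how one graph-convolution layer might propagate information along control-flow edges (a simplification for illustration, not the paper's multi-view multi-layer DGCNN; the function name and adjacency convention are assumptions), consider:

    import numpy as np

    def cfg_graph_conv(node_features, adjacency, weight):
        # One directed graph-convolution step over a control-flow graph.
        # adjacency[i, j] = 1 iff basic block j is a control-flow
        # predecessor of block i; each block averages its predecessors'
        # features and mixes them through a learned weight matrix,
        # followed by a ReLU non-linearity.
        in_degree = adjacency.sum(axis=1, keepdims=True)
        in_degree[in_degree == 0] = 1.0            # avoid division by zero
        propagated = (adjacency / in_degree) @ node_features
        return np.maximum(propagated @ weight, 0.0)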
Control-Flow Integrity: Precision, Security, and Performance
Memory corruption errors in C/C++ programs remain the most common source of
security vulnerabilities in today's systems. Control-flow hijacking attacks
exploit memory corruption vulnerabilities to divert program execution away from
the intended control flow. Researchers have spent more than a decade studying
and refining defenses based on Control-Flow Integrity (CFI), and this technique
is now integrated into several production compilers. However, so far no study
has systematically compared the various proposed CFI mechanisms, nor is there
any protocol on how to compare such mechanisms.
We compare a broad range of CFI mechanisms using a unified nomenclature based
on (i) a qualitative discussion of the conceptual security guarantees, (ii) a
quantitative security evaluation, and (iii) an empirical evaluation of their
performance in the same test environment. For each mechanism, we evaluate (i)
the protected types of control-flow transfers and (ii) the precision of the
protection for forward and backward edges. For open-source compiler-based
implementations, we additionally evaluate (iii) the generated equivalence
classes and target sets, and (iv) the runtime performance. Comment: Version submitted to ACM CSUR 01/27/1
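As an abstract illustration of the forward-edge check that such mechanisms enforce (a conceptual sketch in which the per-call-site target sets are kept in a plain dictionary, not how any real CFI implementation stores them), consider:

    def cfi_forward_edge_check(call_site, target, target_sets):
        # Abstract forward-edge CFI check: an indirect call or jump is
        # allowed only if its target lies in the equivalence class (target
        # set) statically assigned to this call site.  Smaller target sets
        # correspond to higher precision.
        allowed = target_sets[call_site]
        if target not in allowed:
            raise RuntimeError("CFI violation: unexpected indirect-call target")
        return target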
BENCHIP: Benchmarking Intelligence Processors
The increasing attention on deep learning has tremendously spurred the design
of intelligence processing hardware. The variety of emerging intelligence
processors requires standard benchmarks for fair comparison and system
optimization (in both software and hardware). However, existing benchmarks are
unsuitable for benchmarking intelligence processors because they lack
diversity and representativeness. The lack of a standard benchmarking
methodology further exacerbates this problem. In this paper, we propose
BENCHIP, a benchmark suite and benchmarking methodology for intelligence
processors. The benchmark suite in BENCHIP consists of two sets of benchmarks:
microbenchmarks and macrobenchmarks. The microbenchmarks consist of
single-layer networks. They are mainly designed for bottleneck analysis and
system optimization. The macrobenchmarks contain state-of-the-art industrial
networks, so as to offer a realistic comparison of different platforms. We also
propose a standard benchmarking methodology built upon an industrial software
stack and evaluation metrics that comprehensively reflect the various
characteristics of the evaluated intelligence processors. BENCHIP is utilized
for evaluating various hardware platforms, including CPUs, GPUs, and
accelerators. BENCHIP will be open-sourced soon. Comment: 37 pages, 14 figures
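To illustrate the role a microbenchmark plays in bottleneck analysis, here is a generic timing harness sketched only for illustration; it is not BENCHIP's software stack, and the layer and shapes are invented.

    import time
    import numpy as np

    def microbenchmark_layer(layer_fn, input_shape, repeats=50):
        # Run a single-layer workload many times on fixed random input and
        # report the median latency, so per-layer bottlenecks can be
        # compared across platforms.
        x = np.random.rand(*input_shape).astype(np.float32)
        layer_fn(x)                                   # warm-up run
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            layer_fn(x)
            timings.append(time.perf_counter() - start)
        return float(np.median(timings))

    # Example single-layer workload: a plain fully connected layer.
    weights = np.random.rand(1024, 1024).astype(np.float32)
    print(microbenchmark_layer(lambda x: x @ weights, (64, 1024)))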
Automatic Library Version Identification, an Exploration of Techniques
This paper is the result of a two-month research internship on the topic of
library version identification. In this paper, ideas and techniques from
literature in the area of binary comparison and fingerprinting are outlined and
applied to the problem of (version) identification of shared libraries and of
libraries within statically linked binary executables. Six comparison
techniques are chosen and implemented in an open-source tool which in turn
makes use of the open-source radare2 framework for signature generation. The
effectiveness of the techniques is empirically analyzed by comparing both
artificial and real sample files against a reference dataset of multiple
versions of dozens of libraries. The results show that, of these techniques,
readable-string-based techniques perform best and that one of them
correctly identifies multiple libraries contained in a stripped,
statically linked executable file. Comment: 9 pages, short technical report
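As a rough illustration of a readable-string-based comparison technique (a toy sketch based on set similarity; the actual tool builds its signatures with radare2, and the function names here are hypothetical), consider:

    import re

    def readable_strings(blob, min_len=6):
        # Extract printable ASCII runs from raw binary data, in the spirit
        # of the Unix `strings` utility.
        pattern = rb"[\x20-\x7e]{%d,}" % min_len
        return {m.group().decode("ascii") for m in re.finditer(pattern, blob)}

    def jaccard(a, b):
        return len(a & b) / len(a | b) if (a | b) else 0.0

    def best_version_match(unknown_strings, reference_index):
        # reference_index maps "libname-version" to its reference string
        # set; the closest reference by string-set similarity is reported.
        return max(reference_index.items(),
                   key=lambda item: jaccard(unknown_strings, item[1]))[0]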
A Searchable Compressed Edit-Sensitive Parsing
Practical data structures for edit-sensitive parsing (ESP) are proposed.
Given a string S, its ESP tree is equivalent to a context-free grammar G
generating just S, which is represented by a DAG. Using the succinct data
structures for trees and permutations, G is decomposed into two LOUDS bit
strings and a single array using (1+\epsilon)n\log n+4n+o(n) bits for any
0<\epsilon<1, where n is the number of variables in G. The time to count the
occurrences of a pattern P in S is O(\frac{1}{\epsilon}(m\log n+occ_c(\log m\log u))),
where m = |P|, u = |S|, and occ_c is the number of occurrences of a maximal
common subtree in the ESPs of P and S. The efficiency of the proposed index is
evaluated in experiments on several benchmarks, in comparison with other
compressed indexes. Comment: 16 pages, 14 figures
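For intuition about the LOUDS representation mentioned above, here is a generic level-order unary degree sequence encoder, sketched for illustration rather than taken from the paper's implementation:

    from collections import deque

    def louds_bits(root, children):
        # LOUDS (Level-Order Unary Degree Sequence): visit the nodes in
        # breadth-first order and, for each node, emit one '1' per child
        # followed by a single '0'.  A tree with n nodes takes about 2n
        # bits, with navigation supported by rank/select over the bits.
        bits = ["10"]                      # conventional super-root
        queue = deque([root])
        while queue:
            node = queue.popleft()
            kids = children.get(node, [])
            bits.append("1" * len(kids) + "0")
            queue.extend(kids)
        return "".join(bits)

    # Example: a -> (b, c), b -> (d)
    print(louds_bits("a", {"a": ["b", "c"], "b": ["d"]}))   # 101101000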
SymPas: Symbolic Program Slicing
Program slicing is a technique for simplifying programs by focusing on
selected aspects of their behaviour. Current mainstream static slicing methods
operate on the PDG (program dependence graph) or SDG (system dependence graph),
but these graph representations can be expensive and error-prone to
construct. In this paper we study a lightweight approach to static
program slicing, called Symbolic Program Slicing (SymPas), which works as a
dataflow analysis on LLVM (Low-Level Virtual Machine). In our SymPas approach,
slices are stored symbolically rather than re-analysing procedures (cf.
procedure summaries). Instead of re-analysing a procedure multiple times to
find its slices for each calling context, SymPas calculates a single symbolic
(or parameterized) slice which can be instantiated at call sites, avoiding
re-analysis; it is implemented in LLVM to perform slicing on its intermediate
representation (IR). For comparison, we systematically adapt IFDS
(Interprocedural Finite Distributive Subset) analysis and the SDG-based slicing
method (SDG-IFDS) to statically slice programs at the IR level. Evaluated on
open-source and benchmark programs, our backward SymPas shows a factor-of-6
reduction in time cost and a factor-of-4 reduction in space cost compared to
backward SDG-IFDS, and is thus more efficient. In addition, on 66 programs
ranging up to 336,800 IR instructions in size, SymPas proves highly scalable
with program size. Comment: 29 pages, 11 figures
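As a highly simplified illustration of the parameterized-slice idea (the data representation and helper below are hypothetical, chosen only to show how a stored symbolic slice could be instantiated at a call site without re-analysis), consider:

    def instantiate_slice(symbolic_slice, actuals):
        # A symbolic (parameterized) slice is stored once per procedure as
        # a set of variables expressed over its formal parameters and
        # globals.  At a call site, each formal is replaced by the
        # corresponding actual, so the body is not re-analysed per context.
        return {actuals.get(var, var) for var in symbolic_slice}

    # Hypothetical example: the slice of a procedure's return value depends
    # on its formal parameter 'p' and on a global 'g'.
    symbolic = {"p", "g"}
    print(instantiate_slice(symbolic, {"p": "x", "q": "y"}))   # {'x', 'g'}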
Symbolic Computation of the Worst-Case Execution Time of a Program
Parametric worst-case execution time (WCET) analysis of a sequential program
produces a formula that represents the worst-case execution time of the
program, where the parameters of the formula are user-defined parameters of the
program (such as loop bounds, values of inputs, or internal variables).
In this paper we propose a novel methodology to compute the parametric WCET
of a program. Unlike other algorithms in the literature, our method is not
based on Integer Linear Programming (ILP). Instead, we follow an approach based
on the notion of symbolic computation of WCET formulae. After explaining our
methodology and proving its correctness, we present a set of experiments to
compare our method against the state of the art. We show that our approach
dominates other parametric analyses and produces results that are very close
to those produced by non-parametric ILP-based approaches, while keeping
computation times low.
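For intuition about what a symbolic WCET formula looks like, here is a toy structural composition (not the paper's algorithm; it assumes the sympy library and invented per-construct costs) in which sequences add, branches take the costlier arm, and loops multiply by a possibly symbolic bound:

    import sympy as sp

    def wcet_seq(*costs):                       # sequential composition
        return sum(costs)

    def wcet_if(guard_cost, then_cost, else_cost):
        return guard_cost + sp.Max(then_cost, else_cost)   # worst branch

    def wcet_loop(bound, body_cost):
        return bound * body_cost                # bound may be a parameter

    n = sp.Symbol("n", nonnegative=True)        # user-defined loop bound
    formula = wcet_seq(5, wcet_loop(n, wcet_if(2, 7, 3)))
    print(sp.simplify(formula))                 # 9*n + 5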
Predicting Rankings of Software Verification Competitions
Software verification competitions, such as the annual SV-COMP, evaluate
software verification tools with respect to their effectiveness and efficiency.
Typically, the outcome of a competition is a (possibly category-specific)
ranking of the tools. For many applications, such as building portfolio
solvers, it would be desirable to have an idea of the (relative) performance of
verification tools on a given verification task beforehand, i.e., prior to
actually running all tools on the task.
In this paper, we present a machine learning approach to predicting rankings
of tools on verification tasks. The method builds upon so-called label ranking
algorithms, which we complement with appropriate kernels providing a similarity
measure for verification tasks. Our kernels employ a graph representation for
software source code that mixes elements of control flow and program dependence
graphs with abstract syntax trees. Using data sets from SV-COMP, we demonstrate
that our rank prediction technique generalizes well and achieves high
predictive accuracy. In particular, our method outperforms a recently proposed
feature-based approach of Demyanova et al. (when applied to rank predictions).
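As a rough sketch of how kernel similarities between tasks might drive a ranking prediction (a Borda-style weighted aggregation shown only for illustration, not the label-ranking algorithm actually used; the data below are invented), consider:

    import numpy as np

    def predict_ranking(kernel_row, training_rankings):
        # kernel_row[t] is the kernel similarity between the new
        # verification task and training task t; training_rankings[t][j]
        # is the rank of tool j on task t (1 = best).  Tools are ordered
        # by their similarity-weighted average rank.
        ranks = np.asarray(training_rankings, dtype=float)
        weights = np.asarray(kernel_row, dtype=float)
        avg_rank = weights @ ranks / weights.sum()
        return np.argsort(avg_rank)             # tool indices, best first

    sims = [0.9, 0.2, 0.5]
    ranks = [[1, 2, 3, 4], [2, 1, 4, 3], [1, 3, 2, 4]]
    print(predict_ranking(sims, ranks))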
On Distributed Online Classification in the Midst of Concept Drifts
In this work, we analyze the generalization ability of distributed online
learning algorithms under stationary and non-stationary environments. We derive
bounds on the excess risk attained by each node in a connected network of
learners and study the performance advantage that diffusion strategies have
over individual non-cooperative processing. We conduct extensive simulations to
illustrate the results. Comment: 19 pages, 14 figures, to appear in Neurocomputing, 201
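As a minimal sketch of the kind of diffusion strategy referred to above (a generic adapt-then-combine update; the variable names and combination-matrix layout are assumptions, not the paper's notation), consider:

    import numpy as np

    def adapt_then_combine(weights, gradients, combine, step):
        # weights[k] is node k's parameter vector, gradients[k] its local
        # stochastic gradient, and combine[k, l] the weight node k gives
        # to neighbour l (rows sum to one).  Each node first adapts
        # locally, then averages its neighbours' intermediate iterates.
        intermediate = weights - step * gradients    # local adaptation
        return combine @ intermediate                # cooperative combination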