    Synthesizing Program Input Grammars

    Full text link
    We present an algorithm for synthesizing a context-free grammar encoding the language of valid program inputs from a set of input examples and black-box access to the program. Our algorithm addresses shortcomings of existing grammar inference algorithms, which both severely overgeneralize and are prohibitively slow. Our implementation, GLADE, leverages the grammar synthesized by our algorithm to fuzz-test programs with structured inputs. We show that GLADE substantially increases the incremental coverage on valid inputs compared to two baseline fuzzers.
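    The abstract describes the approach only at a high level. As a rough, hypothetical illustration of grammar-based fuzzing in general (not GLADE's actual algorithm or API), the sketch below randomly expands a toy context-free grammar and pipes each derived string to a black-box program under test; the grammar, the target command, and the exit-code check are all placeholders.

        import random
        import subprocess

        # Toy grammar standing in for one synthesized from example inputs.
        GRAMMAR = {
            "<expr>": [["<term>", "+", "<expr>"], ["<term>"]],
            "<term>": [["(", "<expr>", ")"], ["<num>"]],
            "<num>":  [["0"], ["1"], ["42"]],
        }

        def generate(symbol="<expr>", depth=0, max_depth=8):
            """Randomly expand a nonterminal into a terminal string."""
            if symbol not in GRAMMAR:
                return symbol
            rules = GRAMMAR[symbol]
            # Past max_depth, force the last (non-recursive) rule so expansion terminates.
            rule = random.choice(rules) if depth < max_depth else rules[-1]
            return "".join(generate(s, depth + 1, max_depth) for s in rule)

        def fuzz(target_cmd, trials=100):
            """Feed grammar-derived inputs to a black-box program and flag odd exits."""
            for _ in range(trials):
                candidate = generate()
                proc = subprocess.run(target_cmd, input=candidate.encode(),
                                      capture_output=True)
                if proc.returncode not in (0, 1):
                    print("suspicious exit", proc.returncode, "on input:", candidate)

        if __name__ == "__main__":
            fuzz(["./program_under_test"])  # hypothetical target binary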

    On Hilberg's Law and Its Links with Guiraud's Law

    Full text link
    Hilberg (1990) supposed that the finite-order excess entropy of a random human text is proportional to the square root of the text length. Assuming that Hilberg's hypothesis is true, we derive Guiraud's law, which states that the number of word types in a text is at least proportional to the square root of the text length. Our derivation is based on a mathematical conjecture in coding theory and on several experiments suggesting that words can be defined approximately as the nonterminals of the shortest context-free grammar for the text. Such an operational definition of words can be applied even to texts deprived of spaces, which do not allow for Mandelbrot's "intermittent silence" explanation of Zipf's and Guiraud's laws. In contrast to Mandelbrot's model, ours assumes probabilistic long-memory effects in human narration and might be capable of explaining Menzerath's law. Comment: To appear in the Journal of Quantitative Linguistics.
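    Stated symbolically (notation chosen here for illustration; the paper's own symbols may differ):

        % n    : text length
        % E(n) : finite-order excess entropy of the text
        % V(n) : number of distinct word types
        \begin{align*}
          E(n) &\propto \sqrt{n}              && \text{(Hilberg's hypothesis)} \\
          V(n) &\geq c\,\sqrt{n}, \quad c > 0 && \text{(Guiraud's law, as derived)}
        \end{align*}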

    Active Learning of Points-To Specifications

    Full text link
    When analyzing programs, large libraries pose significant challenges to static points-to analysis. A popular solution is to have a human analyst provide points-to specifications that summarize relevant behaviors of library code, which can substantially improve precision and handle missing code such as native code. We propose ATLAS, a tool that automatically infers points-to specifications. ATLAS synthesizes unit tests that exercise the library code, and then infers points-to specifications based on observations from these executions. ATLAS automatically infers specifications for the Java standard library, and produces better results for a client static information flow analysis on a benchmark of 46 Android apps compared to using existing handwritten specifications.
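    As a loose, language-shifted illustration of the idea (ATLAS itself targets Java bytecode and a richer specification language; the names below are invented, not the tool's API), the sketch runs synthesized tests against a library routine and records which argument positions the return value was observed to alias:

        # Illustrative sketch only: infer a coarse points-to-style specification
        # by running tests and checking whether the result aliases an argument.

        def infer_return_aliasing(func, make_args, trials=20):
            """Return the set of argument positions whose object the result may alias."""
            aliased = set()
            for _ in range(trials):
                args = make_args()
                result = func(*args)
                for i, arg in enumerate(args):
                    if result is arg:          # observed aliasing in this execution
                        aliased.add(i)
            return aliased

        # Example: list.__iadd__ returns the receiver, so position 0 is reported.
        if __name__ == "__main__":
            spec = infer_return_aliasing(lambda xs, ys: xs.__iadd__(ys),
                                         lambda: ([1, 2], [3]))
            print("return value may alias argument(s):", sorted(spec))  # [0]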

    Dynamic Protocol Reverse Engineering a Grammatical Inference Approach

    Get PDF
    Round-trip engineering of software from source code and reverse engineering of software from binary files have both been extensively studied, and the state of practice has documented tools and techniques. Forward engineering of protocols has also been extensively studied, and there are firmly established techniques for generating correct protocols. While observation of protocol behavior for performance testing has been studied and techniques established, reverse engineering of protocol control flow from observations of protocol behavior has not received the same level of attention. The state of practice in reverse engineering the control flow of computer network protocols consists mostly of ad hoc approaches. We examine state-of-practice tools and techniques used in three open source projects: Pidgin, Samba, and rdesktop. We examine techniques proposed by computational learning researchers for grammatical inference. We propose to extend the state of the art by inferring protocol control flow using grammatical-inference-inspired techniques to reverse engineer automata representations from captured data flows. We present evidence that grammatical inference is applicable to the problem domain under consideration.
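    As a minimal illustration of the usual grammatical-inference starting point (not the work's actual tooling; message names and sessions are invented), the sketch below builds a prefix tree acceptor from observed protocol sessions; state-merging heuristics would then generalize it into a compact automaton for the protocol's control flow.

        from collections import defaultdict

        def build_pta(sessions):
            """Return (transitions, accepting) for a tree-shaped automaton."""
            transitions = defaultdict(dict)   # state -> {symbol: next_state}
            accepting = set()
            next_state = [1]                  # state 0 is the root
            for session in sessions:
                state = 0
                for symbol in session:
                    if symbol not in transitions[state]:
                        transitions[state][symbol] = next_state[0]
                        next_state[0] += 1
                    state = transitions[state][symbol]
                accepting.add(state)          # the session ended in this state
            return dict(transitions), accepting

        if __name__ == "__main__":
            observed = [
                ["SYN", "SYN_ACK", "ACK", "DATA", "FIN"],
                ["SYN", "SYN_ACK", "ACK", "FIN"],
            ]
            trans, accept = build_pta(observed)
            print(len(trans), "states with outgoing edges;", len(accept), "accepting")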

    Search diversification techniques for grammatical inference

    Get PDF
    Grammatical Inference (GI) addresses the problem of learning a grammar G from a finite set of strings generated by G. By using GI techniques we aim to learn relations between syntactically structured sequences. The process of inferring the target grammar G can easily be posed as a search problem through a lattice of possible solutions. The vast majority of research in this area focuses on non-monotonic searches, i.e., searches that use the same heuristic function to perform a depth-first search through the lattice until a hypothesis is chosen. EDSM and S-EDSM are prime examples of this technique. In this paper we discuss the introduction of diversification into our search space [5]. By introducing diversification through pairwise incompatible merges, we traverse multiple disjoint paths in the search lattice and obtain better results for the inference process.
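    As a loose illustration of the diversification idea (simplifying "pairwise incompatible" to merges that share no state, which is not necessarily the paper's definition; the scores and state identifiers are invented), the sketch picks several mutually exclusive high-scoring merges and seeds one search branch from each, instead of committing to the single best-scoring merge as plain EDSM would:

        def diversify(candidate_merges, branches=3):
            """candidate_merges: list of (score, state_a, state_b); higher score is better."""
            chosen, used_states = [], set()
            for score, a, b in sorted(candidate_merges, reverse=True):
                if a not in used_states and b not in used_states:
                    chosen.append((a, b))
                    used_states.update((a, b))
                if len(chosen) == branches:
                    break
            return chosen   # each pair seeds an independent branch of the search

        if __name__ == "__main__":
            merges = [(9, 1, 4), (8, 1, 5), (7, 2, 6), (6, 3, 7)]
            print(diversify(merges))   # [(1, 4), (2, 6), (3, 7)]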
