1,550 research outputs found
Synthesizing Program Input Grammars
We present an algorithm for synthesizing a context-free grammar encoding the
language of valid program inputs from a set of input examples and blackbox
access to the program. Our algorithm addresses shortcomings of existing grammar
inference algorithms, which both severely overgeneralize and are prohibitively
slow. Our implementation, GLADE, leverages the grammar synthesized by our
algorithm to fuzz test programs with structured inputs. We show that GLADE
substantially increases the incremental coverage on valid inputs compared to
two baseline fuzzers.
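
To make the idea of generalizing an input example with blackbox access concrete, here is a minimal Python sketch. It is not GLADE's algorithm. It only illustrates one kind of oracle query such a synthesizer could make: checking whether a substring of a seed input can be repeated while the program still accepts. The oracle `balanced_parens` and the function `repeatable_spans` are hypothetical names introduced for this illustration.

```python
from typing import Callable, List, Tuple


def balanced_parens(s: str) -> bool:
    """Hypothetical stand-in for the blackbox program: accepts balanced parentheses."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        else:
            return False
    return depth == 0


def repeatable_spans(
    seed: str, oracle: Callable[[str], bool], max_reps: int = 3
) -> List[Tuple[int, int]]:
    """Return (start, end) spans whose substring can occur 0..max_reps times.

    Each candidate is checked only by querying the oracle, so this is a
    heuristic: passing a few repetition counts does not prove that the
    substring can be repeated arbitrarily often.
    """
    spans = []
    n = len(seed)
    for start in range(n):
        for end in range(start + 1, n + 1):
            prefix, middle, suffix = seed[:start], seed[start:end], seed[end:]
            if all(oracle(prefix + middle * k + suffix) for k in range(max_reps + 1)):
                spans.append((start, end))
    return spans


if __name__ == "__main__":
    print(repeatable_spans("(())", balanced_parens))
```

For the seed `(())`, the inner `()` and the whole string pass every check, which suggests candidate repetition rules such as `( ()* )`. A real synthesizer would also need to guard against overgeneralization, which this sketch does not attempt.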
On Hilberg's Law and Its Links with Guiraud's Law
Hilberg (1990) supposed that finite-order excess entropy of a random human
text is proportional to the square root of the text length. Assuming that
Hilberg's hypothesis is true, we derive Guiraud's law, which states that the
number of word types in a text is greater than proportional to the square root
of the text length. Our derivation is based on a mathematical conjecture in
coding theory and on several experiments suggesting that words can be defined
approximately as the nonterminals of the shortest context-free grammar for the
text. Such an operational definition of words can be applied even to texts
deprived of spaces, which do not allow for Mandelbrot's "intermittent
silence" explanation of Zipf's and Guiraud's laws. In contrast to
Mandelbrot's, our model assumes some probabilistic long-memory effects in human
narration and might be capable of explaining Menzerath's law.
Comment: To appear in Journal of Quantitative Linguistics
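
The two laws named in this abstract can be written compactly as follows. The notation is an assumption made for illustration and is not taken from the abstract: $H(n)$ is the entropy of a text block of length $n$ under a stationary model, $E(n)$ is the finite-order excess entropy read as the mutual information between two adjacent blocks of length $n$, and $V(n)$ is the number of word types in a text of length $n$. The last line is only one possible reading of "greater than proportional".

```latex
% Assumed definition: excess entropy of order n for a stationary process
E(n) = I\bigl(X_{1:n};\, X_{n+1:2n}\bigr) = 2H(n) - H(2n)

% Hilberg's hypothesis as stated in the abstract
E(n) \propto \sqrt{n}

% Guiraud's law, read here as a lower bound on vocabulary growth
V(n) \ge c\,\sqrt{n} \quad \text{for some constant } c > 0 \text{ and large } n
```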
Active Learning of Points-To Specifications
When analyzing programs, large libraries pose significant challenges to
static points-to analysis. A popular solution is to have a human analyst
provide points-to specifications that summarize relevant behaviors of library
code, which can substantially improve precision and handle missing code such as
native code. We propose ATLAS, a tool that automatically infers points-to
specifications. ATLAS synthesizes unit tests that exercise the library code,
and then infers points-to specifications based on observations from these
executions. ATLAS automatically infers specifications for the Java standard
library, and produces better results for a client static information flow
analysis on a benchmark of 46 Android apps compared to using existing
handwritten specifications.
Dynamic Protocol Reverse Engineering: A Grammatical Inference Approach
Round-trip engineering of software from source code and reverse engineering of software from binary files have both been extensively studied, and the state of practice has documented tools and techniques. Forward engineering of protocols has also been extensively studied, and there are firmly established techniques for generating correct protocols. While observation of protocol behavior for performance testing has been studied and techniques established, reverse engineering of protocol control flow from observations of protocol behavior has not received the same level of attention. The state of practice in reverse engineering the control flow of computer network protocols consists mostly of ad hoc approaches. We examine state-of-practice tools and techniques used in three open source projects: Pidgin, Samba, and rdesktop. We examine techniques proposed by computational learning researchers for grammatical inference. We propose to extend the state of the art by inferring protocol control flow, using techniques inspired by grammatical inference to reverse engineer automata representations from captured data flows. We present evidence that grammatical inference is applicable to the problem domain under consideration.
Search diversification techniques for grammatical inference
Grammatical Inference (GI) addresses the problem of learning a grammar G from a finite set of strings generated by G. By using GI techniques, we want to be able to learn relations between syntactically structured sequences. This process of inferring the target grammar G can easily be posed as a search problem through a lattice of possible solutions. The vast majority of research in this area focuses on non-monotonic searches, i.e., searches that use the same heuristic function to perform a depth-first search into the lattice until a hypothesis is chosen. EDSM and S-EDSM are prime examples of this technique. In this paper we discuss the introduction of diversification into our search space [5]. By introducing diversification through pairwise incompatible merges, we traverse multiple disjoint paths in the search lattice and obtain better results for the inference process. Peer-reviewed.
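
State-merging methods such as EDSM typically start their search from a prefix tree acceptor (PTA) built from the sample strings and then merge states to generalize. The Python sketch below builds only that starting point. It does not implement EDSM, S-EDSM, or the diversification strategy described in the abstract, and the names `build_pta` and `accepts` are hypothetical, chosen for this illustration.

```python
from typing import Dict, Iterable, Set, Tuple

Transitions = Dict[Tuple[int, str], int]


def build_pta(samples: Iterable[str]) -> Tuple[Transitions, Set[int]]:
    """Build a prefix tree acceptor from positive samples.

    State 0 is the root. Each distinct prefix of a sample gets its own
    state, and the state reached at the end of a sample is accepting.
    """
    transitions: Transitions = {}
    accepting: Set[int] = set()
    next_state = 1
    for word in samples:
        state = 0
        for symbol in word:
            key = (state, symbol)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)
    return transitions, accepting


def accepts(transitions: Transitions, accepting: Set[int], word: str) -> bool:
    """Run the acceptor on a word; a missing transition means rejection."""
    state = 0
    for symbol in word:
        key = (state, symbol)
        if key not in transitions:
            return False
        state = transitions[key]
    return state in accepting


if __name__ == "__main__":
    trans, acc = build_pta(["ab", "abb", "b"])
    print(accepts(trans, acc, "abb"), accepts(trans, acc, "a"))
```

The PTA accepts exactly the sample strings and nothing else. Each merge of two states moves the hypothesis one step through the lattice of possible automata, and the choice of which merges to try is where a search heuristic comes in.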
- …