4 research outputs found
Automated Derivation of Random Generators for Algebraic Data Types
Many testing techniques such as generational fuzzing or random property-based testing require the existence of some sort of random generation process for the values used as test inputs. Implementing such generators is usually a task left to end-users, who do their best to come up with somewhat sensible implementations after several iterations of trial and error. This necessary effort is of no surprise, implementing good random data generators is a hard task. It requires deep knowledge about both the domain of the data being generated, as well as the behavior of the stochastic process generating such data. In addition, when the data we want to generate has a large number of possible variations, this process is not only intricate, but also very cumbersome. To mitigate this issues, this thesis explores different ideas for automatically deriving random generators based on existing static information. In this light, we design and implement different derivation algorithms in Haskell for obtaining random generators of values encoded using Algebraic Data Types (ADTs). Although there exists other tools designed directly or indirectly for this very purpose, they are not without disadvantages. In particular, we aim to tackle the lack of flexibility and static guarantees in the distribution induced by derived generators. We show how automatically derived generators for ADTs can be framed using a simple yet powerful stochastic model. This models can be used to obtain analytical guarantees about the distribution of values produced by the derived generators. This, in consequence, can be used to optimize the stochastic generation parameters of the derived generators towards target distributions set by the user, providing more flexible derivation mechanisms
On the Termination Problem for Probabilistic Higher-Order Recursive Programs
In the last two decades, there has been much progress on model checking of
both probabilistic systems and higher-order programs. In spite of the emergence
of higher-order probabilistic programming languages, not much has been done to
combine those two approaches. In this paper, we initiate a study on the
probabilistic higher-order model checking problem, by giving some first
theoretical and experimental results. As a first step towards our goal, we
introduce PHORS, a probabilistic extension of higher-order recursion schemes
(HORS), as a model of probabilistic higher-order programs. The model of PHORS
may alternatively be viewed as a higher-order extension of recursive Markov
chains. We then investigate the probabilistic termination problem -- or,
equivalently, the probabilistic reachability problem. We prove that almost sure
termination of order-2 PHORS is undecidable. We also provide a fixpoint
characterization of the termination probability of PHORS, and develop a sound
(but possibly incomplete) procedure for approximately computing the termination
probability. We have implemented the procedure for order-2 PHORSs, and
confirmed that the procedure works well through preliminary experiments that
are reported at the end of the article
Branching Processes for QuickCheck Generators
In QuickCheck (or, more generally, random testing), it is challenging to control random data generators\u27 distributions---specially when it comes to user-defined algebraic data types (ADT). In this paper, we adapt results from an area of mathematics known as branching processes, and show how they help to analytically predict (at compile-time) the expected number of generated constructors, even in the presence of mutually recursive or composite ADTs. Using our probabilistic formulas, we design heuristics capable of automatically adjusting probabilities in order to synthesize generators which distributions are aligned with users\u27 demands. We provide a Haskell implementation of our mechanism in a tool called DRaGeN and perform case studies with real-world applications. When generating random values, our synthesized QuickCheck generators show improvements in code coverage when compared with those automatically derived by state-of-the-art tools