1,955 research outputs found

    Flexible constrained sampling with guarantees for pattern mining

    Pattern sampling has been proposed as a potential solution to the infamous pattern explosion. Instead of enumerating all patterns that satisfy the constraints, individual patterns are sampled with probability proportional to a given quality measure. Several sampling algorithms have been proposed, but each has limitations in 1) flexibility in terms of the quality measures and constraints that can be used, and/or 2) guarantees with respect to sampling accuracy. We therefore present Flexics, the first flexible pattern sampler that supports a broad class of quality measures and constraints while providing strong guarantees regarding sampling accuracy. To achieve this, we leverage the perspective on pattern mining as a constraint satisfaction problem and build upon the latest advances in sampling solutions in SAT as well as existing pattern mining algorithms. Furthermore, the proposed algorithm is applicable to a variety of pattern languages, which allows us to introduce and tackle the novel task of sampling sets of patterns. We introduce and empirically evaluate two variants of Flexics: 1) a generic variant that addresses the well-known itemset sampling task and the novel pattern set sampling task as well as a wide range of expressive constraints within these tasks, and 2) a specialized variant that exploits existing frequent itemset techniques to achieve substantial speed-ups. Experiments show that Flexics is both accurate and efficient, making it a useful tool for pattern-based data exploration. Comment: Accepted for publication in Data Mining & Knowledge Discovery journal (ECML/PKDD 2017 journal track)
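To make the sampling task in this abstract concrete, the sketch below draws frequent itemsets with probability proportional to their frequency. It is a brute-force baseline that enumerates every pattern first, which is exactly the step pattern samplers aim to avoid; it is not the Flexics algorithm. The function names, the toy transactions, and the choice of frequency as the quality measure are all assumptions made for illustration.

```python
import random
from itertools import combinations


def frequency(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def enumerate_patterns(transactions, min_freq):
    """Brute-force enumeration of all non-empty itemsets with frequency >= min_freq.

    Exponential in the number of items; only meant for tiny toy data.
    """
    items = sorted(set().union(*transactions))
    patterns = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            freq = frequency(candidate, transactions)
            if freq >= min_freq:  # the constraint: minimum frequency
                patterns[candidate] = freq
    return patterns


def sample_patterns(patterns, k, rng):
    """Draw k patterns (with replacement), each with probability
    proportional to its quality, here its frequency."""
    population = list(patterns)
    weights = [patterns[p] for p in population]
    return rng.choices(population, weights=weights, k=k)


# Hypothetical toy dataset: four transactions over items a, b, c.
transactions = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("b"),
]
patterns = enumerate_patterns(transactions, min_freq=2)
samples = sample_patterns(patterns, k=5, rng=random.Random(0))
```

With this toy data, itemsets such as {a} and {b} (frequency 3) are drawn more often than {a, b} (frequency 2), while {b, c} never appears because it fails the frequency constraint.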

    Light On String Solving: Approaches to Efficiently and Correctly Solving String Constraints

    Widespread use of string solvers in formal analysis of string-heavy programs has led to a growing demand for more efficient and reliable techniques which can be applied in this context, especially for real-world cases. Designing an algorithm for the (generally undecidable) satisfiability problem for systems of string constraints requires a thorough understanding of the structure of constraints present in the targeted cases. We approach these cases from several perspectives. First, we present an algorithm which works by reformulating the satisfiability of bounded word equations as a reachability problem for non-deterministic finite automata. Second, we present a transformation-system-based technique for solving string constraints. Third, we investigate benchmarks presented in the literature containing regular expression membership predicates and design a decision procedure for a PSPACE-complete sub-theory. Additionally, we introduce a new benchmarking framework for string solvers and use it to showcase the power of our algorithms via an extensive empirical evaluation over a diverse set of benchmarks.

    Criticality and Universality in the Unit-Propagation Search Rule

    The probability Psuccess(alpha, N) that stochastic greedy algorithms successfully solve the random SATisfiability problem is studied as a function of the ratio alpha of constraints per variable and the number N of variables. These algorithms assign variables according to the unit-propagation (UP) rule in the presence of constraints involving a unique variable (1-clauses), and according to some heuristic (H) prescription otherwise. In the infinite N limit, Psuccess vanishes at some critical ratio alpha_H which depends on the heuristic H. We show that the critical behaviour is determined by the UP rule only. In the case where only constraints with 2 and 3 variables are present, we give the phase diagram and identify two universality classes: the power law class, where Psuccess[alpha_H (1+epsilon N^{-1/3}), N] ~ A(epsilon)/N^gamma; the stretched exponential class, where Psuccess[alpha_H (1+epsilon N^{-1/3}), N] ~ exp[-N^{1/6} Phi(epsilon)]. Which class is selected depends on the characteristic parameters of input data. The critical exponent gamma is universal and calculated; the scaling functions A and Phi weakly depend on the heuristic H and are obtained from the solutions of reaction-diffusion equations for 1-clauses. Computation of some non-universal corrections allows us to match numerical results with good precision. The critical behaviour for constraints with >3 variables is given. Our results are interpreted in terms of dynamical graph percolation and we argue that they should apply to more general situations where UP is used. Comment: 30 pages, 13 figures
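The class of algorithms studied in this abstract can be illustrated with a minimal sketch: a single descent without backtracking that obeys the unit-propagation rule whenever a 1-clause exists and otherwise falls back to a heuristic. The heuristic used here (a uniformly random unassigned variable with a random sign) is only one assumed choice of H, and every function name and parameter below is hypothetical rather than taken from the paper. Literals are encoded as signed integers, so `-3` means "variable 3 is false".

```python
import random


def random_ksat(num_vars, num_clauses, rng, k=3):
    """Random k-SAT instance: each clause has k distinct variables,
    each negated independently with probability 1/2."""
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), k)
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses


def assign(clauses, literal):
    """Set `literal` to true: drop satisfied clauses and shorten the rest.

    Returns None if an empty clause (a contradiction) is produced.
    """
    reduced = []
    for clause in clauses:
        if literal in clause:
            continue  # clause is satisfied
        shorter = [lit for lit in clause if lit != -literal]
        if not shorter:
            return None  # every literal of this clause is now false
        reduced.append(shorter)
    return reduced


def greedy_up_search(clauses, num_vars, rng):
    """One greedy descent without backtracking; True if all clauses are satisfied."""
    unassigned = set(range(1, num_vars + 1))
    while clauses:
        units = [c[0] for c in clauses if len(c) == 1]
        if units:
            literal = units[0]  # UP rule: a 1-clause forces its literal
        else:
            # Assumed heuristic H: random unassigned variable, random sign.
            var = rng.choice(sorted(unassigned))
            literal = var if rng.random() < 0.5 else -var
        unassigned.discard(abs(literal))
        clauses = assign(clauses, literal)
        if clauses is None:
            return False
    return True


def success_rate(alpha, num_vars, trials, seed=0):
    """Empirical estimate of Psuccess(alpha, N) over `trials` random instances."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        clauses = random_ksat(num_vars, int(alpha * num_vars), rng)
        wins += greedy_up_search(clauses, num_vars, rng)
    return wins / trials
```

Calling `success_rate(alpha, num_vars, trials)` for a range of `alpha` values at increasing `num_vars` gives a rough numerical picture of how the success probability drops as the ratio of constraints per variable grows.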

    On abstraction refinement for program analyses in Datalog

    A central task for a program analysis concerns how to efficiently find a program abstraction that keeps only information relevant for proving properties of interest. We present a new approach for finding such abstractions for program analyses written in Datalog. Our approach is based on counterexample-guided abstraction refinement: when a Datalog analysis run fails using an abstraction, it seeks to generalize the cause of the failure to other abstractions, and pick a new abstraction that avoids a similar failure. Our solution uses a Boolean satisfiability formulation that is general, complete, and optimal: it is independent of the Datalog solver, it generalizes the failure of an abstraction to as many other abstractions as possible, and it identifies the cheapest refined abstraction to try next. We show the performance of our approach on a pointer analysis and a typestate analysis, on eight real-world Java benchmark programs.