Flexible constrained sampling with guarantees for pattern mining
Pattern sampling has been proposed as a potential solution to the infamous
pattern explosion. Instead of enumerating all patterns that satisfy the
constraints, individual patterns are sampled proportional to a given quality
measure. Several sampling algorithms have been proposed, but each of them has
its limitations when it comes to 1) flexibility in terms of quality measures
and constraints that can be used, and/or 2) guarantees with respect to sampling
accuracy. We therefore present Flexics, the first flexible pattern sampler that
supports a broad class of quality measures and constraints, while providing
strong guarantees regarding sampling accuracy. To achieve this, we leverage the
perspective on pattern mining as a constraint satisfaction problem and build
upon the latest advances in sampling solutions in SAT as well as existing
pattern mining algorithms. Furthermore, the proposed algorithm is applicable to
a variety of pattern languages, which allows us to introduce and tackle the
novel task of sampling sets of patterns. We introduce and empirically evaluate
two variants of Flexics: 1) a generic variant that addresses the well-known
itemset sampling task and the novel pattern set sampling task as well as a wide
range of expressive constraints within these tasks, and 2) a specialized
variant that exploits existing frequent itemset techniques to achieve
substantial speed-ups. Experiments show that Flexics is both accurate and
efficient, making it a useful tool for pattern-based data exploration.
Comment: Accepted for publication in Data Mining & Knowledge Discovery journal (ECML/PKDD 2017 journal track)
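To make the sampling task in the abstract above concrete, here is a minimal sketch of drawing itemsets with probability proportional to a quality measure. It is not Flexics: it enumerates every frequent itemset by brute force, which is exactly the step a sampler is meant to avoid, and it assumes support as the quality measure. The function names, the toy transactions, and the `min_support` threshold are all hypothetical and chosen for illustration.

```python
import random
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Enumerate every non-empty itemset with support >= min_support (brute force)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Support = number of transactions containing every item of the candidate.
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support:
                result[candidate] = support
    return result


def sample_patterns(transactions, min_support, k, seed=0):
    """Draw k itemsets, each with probability proportional to its support."""
    patterns = frequent_itemsets(transactions, min_support)
    rng = random.Random(seed)
    keys = list(patterns)
    weights = [patterns[p] for p in keys]
    return rng.choices(keys, weights=weights, k=k)


# Toy data (hypothetical): four transactions over the items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
patterns = frequent_itemsets(transactions, min_support=2)
samples = sample_patterns(transactions, min_support=2, k=5)
```

On this toy data the constraint `min_support=2` leaves five itemsets, and the two most frequent ones (`a` and `b`, support 3 each) are drawn more often than the rest. The enumeration cost grows exponentially with the number of items, which is why the abstract turns to constraint-based sampling instead.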
Light On String Solving: Approaches to Efficiently and Correctly Solving String Constraints
Widespread use of string solvers in the formal analysis of string-heavy programs has led to a growing demand for more efficient and reliable techniques that can be applied in this context, especially for real-world cases. Designing an algorithm for the (generally undecidable) satisfiability problem for systems of string constraints requires a thorough understanding of the structure of the constraints present in the targeted cases. We approach this problem from several perspectives. First, we present an algorithm that reformulates the satisfiability of bounded word equations as a reachability problem for non-deterministic finite automata. Second, we present a transformation-system-based technique for solving string constraints. Third, we investigate benchmarks from the literature that contain regular expression membership predicates and design a decision procedure for a PSPACE-complete sub-theory. Additionally, we introduce a new benchmarking framework for string solvers and use it to showcase the power of our algorithms via an extensive empirical evaluation over a diverse set of benchmarks.
Criticality and Universality in the Unit-Propagation Search Rule
The probability Psuccess(alpha, N) that stochastic greedy algorithms
successfully solve the random SATisfiability problem is studied as a function
of the ratio alpha of constraints per variable and the number N of variables.
These algorithms assign variables according to the unit-propagation (UP) rule
in the presence of constraints involving a unique variable (1-clauses), and
according to some heuristic (H) prescription otherwise. In the infinite N limit,
Psuccess vanishes at some critical ratio alpha_H which depends on the heuristic H. We
show that the critical behaviour is determined by the UP rule only. In the case
where only constraints with 2 and 3 variables are present, we give the phase
diagram and identify two universality classes: the power law class, where
Psuccess[alpha_H (1+epsilon N^{-1/3}), N] ~ A(epsilon)/N^gamma; the stretched
exponential class, where Psuccess[alpha_H (1+epsilon N^{-1/3}), N] ~
exp[-N^{1/6} Phi(epsilon)]. Which class is selected depends on the
characteristic parameters of input data. The critical exponent gamma is
universal and calculated; the scaling functions A and Phi weakly depend on the
heuristic H and are obtained from the solutions of reaction-diffusion equations
for 1-clauses. Computation of some non-universal corrections allows us to match
numerical results with good precision. The critical behaviour for constraints
with >3 variables is given. Our results are interpreted in terms of dynamical
graph percolation and we argue that they should apply to more general
situations where UP is used.
Comment: 30 pages, 13 figures
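The search procedure described in the abstract above can be sketched as a single greedy run without backtracking: apply the UP rule whenever a 1-clause exists, otherwise fall back to a heuristic. This is an illustrative sketch only. The heuristic used here (a uniformly random unassigned variable with a random value) is an assumption, as are all function names and the Monte Carlo estimator; the paper's own heuristics and analysis are not reproduced.

```python
import random


def random_3sat(n_vars, alpha, rng):
    """Random 3-SAT: round(alpha * n_vars) clauses over 3 distinct variables, random signs."""
    clauses = []
    for _ in range(round(alpha * n_vars)):
        variables = rng.sample(range(1, n_vars + 1), 3)
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses


def simplify(clauses, literal):
    """Set `literal` to true: drop satisfied clauses and shorten the others.

    Returns None if an empty clause (a contradiction) is produced.
    """
    out = []
    for clause in clauses:
        if literal in clause:
            continue  # clause is satisfied
        reduced = [l for l in clause if l != -literal]
        if not reduced:
            return None  # every literal of this clause is false
        out.append(reduced)
    return out


def greedy_with_up(clauses, n_vars, rng):
    """One run without backtracking; True if every clause ends up satisfied."""
    unassigned = set(range(1, n_vars + 1))
    while clauses:
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is not None:
            literal = unit  # UP rule: a 1-clause forces its literal
        else:
            # Heuristic step (assumed here): random variable, random value.
            var = rng.choice(sorted(unassigned))
            literal = var if rng.random() < 0.5 else -var
        unassigned.discard(abs(literal))
        clauses = simplify(clauses, literal)
        if clauses is None:
            return False
    return True


def estimate_psuccess(n_vars, alpha, runs, seed=0):
    """Monte Carlo estimate of Psuccess(alpha, N) over `runs` random instances."""
    rng = random.Random(seed)
    wins = sum(
        greedy_with_up(random_3sat(n_vars, alpha, rng), n_vars, rng)
        for _ in range(runs)
    )
    return wins / runs
```

Calling `estimate_psuccess(n_vars, alpha, runs)` for increasing `alpha` at fixed `n_vars` gives an empirical curve for the success probability; the finite-size scaling forms quoted in the abstract describe how such curves behave near the critical ratio as N grows.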
On abstraction refinement for program analyses in Datalog
A central task for a program analysis is to efficiently find a program abstraction that keeps only the information relevant for proving properties of interest. We present a new approach for finding such abstractions for program analyses written in Datalog. Our approach is based on counterexample-guided abstraction refinement: when a Datalog analysis run fails using an abstraction, it seeks to generalize the cause of the failure to other abstractions, and picks a new abstraction that avoids a similar failure. Our solution uses a Boolean satisfiability formulation that is general, complete, and optimal: it is independent of the Datalog solver, it generalizes the failure of an abstraction to as many other abstractions as possible, and it identifies the cheapest refined abstraction to try next. We show the performance of our approach on a pointer analysis and a typestate analysis, on eight real-world Java benchmark programs.