Mining patterns of unsatisfiable constraints to detect infeasible paths
Detection of infeasible paths is required in many areas including test coverage analysis, test case generation, and security vulnerability analysis. Existing approaches typically use static analysis coupled with symbolic evaluation, heuristics, or path-pattern analysis. This paper is related to these approaches but has a different objective: to analyze the code of real systems to build patterns of unsatisfiable constraints found in infeasible paths. The resulting patterns can be used to detect infeasible paths without the use of a constraint solver or the evaluation of the function calls involved, thus improving scalability. The patterns can be built gradually. Evaluation of the proposed approach shows promising results.
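The idea of matching patterns of unsatisfiable constraints instead of calling a solver can be illustrated with a minimal sketch. The constraint representation and the two patterns below are invented for illustration, not taken from the paper:

```python
# Hypothetical sketch: flag infeasible paths by matching simple patterns of
# unsatisfiable constraint pairs, without invoking a constraint solver.
from typing import List, Tuple

# A path constraint as (variable, operator, constant), e.g. ("x", ">", 5).
Constraint = Tuple[str, str, int]

def contradicts(a: Constraint, b: Constraint) -> bool:
    """Match known unsatisfiable patterns between two integer constraints."""
    va, opa, ca = a
    vb, opb, cb = b
    if va != vb:
        return False
    # Pattern: x > c1 and x < c2 with no integer strictly between them.
    if opa == ">" and opb == "<" and cb <= ca + 1:
        return True
    if opa == "<" and opb == ">" and ca <= cb + 1:
        return True
    # Pattern: x == c1 and x == c2 with c1 != c2.
    if opa == "==" and opb == "==" and ca != cb:
        return True
    return False

def path_is_infeasible(path: List[Constraint]) -> bool:
    """A path is flagged infeasible if any constraint pair matches a pattern."""
    return any(contradicts(path[i], path[j])
               for i in range(len(path)) for j in range(i + 1, len(path)))

# Example: the branch conditions x > 5 and x < 3 can never both hold.
print(path_is_infeasible([("x", ">", 5), ("y", "==", 0), ("x", "<", 3)]))  # True
```

Pattern matching of this kind is cheap and solver-free, but incomplete: paths whose infeasibility does not match a known pattern are simply not flagged, which is why the paper builds its pattern set gradually from real systems.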
Symbolic execution of verification languages and floating-point code
The focus of this thesis is a program analysis technique named symbolic execution. We present three main contributions to this field.

First, an investigation into comparing several state-of-the-art program analysis tools at the level of an intermediate verification language over a large set of benchmarks, and improvements to the state of the art of symbolic execution for this language. This is explored via a new tool, Symbooglix, that operates on the Boogie intermediate verification language.

Second, an investigation into performing symbolic execution of floating-point programs via a standardised theory of floating-point arithmetic that is supported by several existing constraint solvers. This is investigated via two independent extensions of the KLEE symbolic execution engine to support reasoning about floating-point operations (with one tool developed by the thesis author).

Third, an investigation into the use of coverage-guided fuzzing as a means for solving constraints over finite data types, inspired by the difficulties associated with solving floating-point constraints. The associated prototype tool, JFS, which builds on the LibFuzzer project, can at present be applied to a wide range of SMT queries over bit-vector and floating-point variables, and shows promise on floating-point constraints.
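The third contribution's core idea, reinterpreting constraint solving as fuzzing, can be sketched as follows: compile the query into an executable predicate over raw input bytes and search for a satisfying assignment by mutation, guided by a branch-distance style score. Everything below (the query, the distance function, the hill-climbing loop) is an illustrative stand-in for what JFS does with LibFuzzer's coverage feedback:

```python
import random
import struct

def predicate(x: float) -> bool:
    # Stand-in for an SMT query over one float variable, e.g.
    # (assert (and (fp.gt x 100.0) (fp.lt x 200.0)))
    return 100.0 < x < 200.0

def distance(x: float) -> float:
    # Branch-distance style feedback: 0 exactly when the predicate holds.
    if x != x:  # reject NaN inputs
        return float("inf")
    return max(0.0, 100.0 - x) + max(0.0, x - 200.0)

def fuzz_solve(seed: bytes, iters: int = 50000):
    """Hill-climb over raw input bytes, decoding them as an IEEE-754 double."""
    random.seed(0)
    best = bytearray(seed)
    best_d = distance(struct.unpack("<d", bytes(best))[0])
    for _ in range(iters):
        cand = bytearray(best)
        cand[random.randrange(8)] = random.randrange(256)
        x = struct.unpack("<d", bytes(cand))[0]
        if predicate(x):
            return x
        d = distance(x)
        if d < best_d:
            best, best_d = cand, d
    return None  # budget exhausted without a satisfying input

print(fuzz_solve(struct.pack("<d", 0.0)))
```

Because the search runs the constraint as ordinary code, floating-point semantics come for free from the hardware, which is the attraction of this approach relative to solver-based reasoning over the floating-point theory.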
Knowledge representation and inference for analysis and design of databases and tabular rule-based systems
Rule-based systems constitute a powerful tool for specification of knowledge in the design and implementation of knowledge-based systems. They also provide a universal programming paradigm for domains such as intelligent control, decision support, situation classification and operational knowledge encoding. In order to assure safe and reliable performance, such systems should satisfy certain formal requirements, including completeness and consistency. This paper addresses the issue of analysis and verification of selected properties of a class of such systems in a systematic way. A uniform, tabular scheme of single-level rule-based systems is considered. Such systems can be applied as a generalized form of databases for specification of data patterns (unconditional knowledge), or can be used for defining attributive decision tables (conditional knowledge in the form of rules). They can also serve as lower-level components of hierarchical, multi-level control and decision support knowledge-based systems. An algebraic knowledge representation paradigm using extended tabular representation, similar to relational database tables, is presented, and algebraic bases for system analysis, verification and design support are outlined.
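A minimal sketch of the completeness and consistency checks described above, for a toy attributive decision table (the attribute domains and rules are invented for illustration):

```python
from itertools import product

# Hypothetical decision table: attribute domains and (condition, decision) rules.
domains = {"temp": ["low", "high"], "door": ["open", "closed"]}
rules = [
    ({"temp": "low", "door": "open"}, "heat"),
    ({"temp": "low", "door": "closed"}, "heat"),
    ({"temp": "high", "door": "open"}, "off"),
]

def matches(cond, state):
    return all(state[a] == v for a, v in cond.items())

def all_states(domains):
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        yield dict(zip(names, values))

def missing_states(domains, rules):
    """Completeness: input states covered by no rule."""
    return [s for s in all_states(domains)
            if not any(matches(c, s) for c, _ in rules)]

def inconsistent_states(domains, rules):
    """Consistency: states where rules fire with conflicting decisions."""
    out = []
    for s in all_states(domains):
        decisions = {d for c, d in rules if matches(c, s)}
        if len(decisions) > 1:
            out.append((s, decisions))
    return out

print(missing_states(domains, rules))       # the one uncovered state
print(inconsistent_states(domains, rules))  # empty: the table is consistent
```

The paper's algebraic treatment generalises this kind of enumeration: tabular patterns over attribute values play the role of the condition dictionaries here, and completeness/consistency become properties checkable over the table as a whole.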
OWL-Miner: Concept Induction in OWL Knowledge Bases
The Resource Description Framework (RDF) and Web Ontology Language (OWL) have been widely used in recent years, and automated methods for the analysis of data and knowledge directly within these formalisms are of current interest. Concept induction is a technique for discovering descriptions of data, such as inducing OWL class expressions to describe RDF data. These class expressions capture patterns in the data which can be used to characterise interesting clusters or to act as classification rules over unseen data. The semantics of OWL is underpinned by Description Logics (DLs), a family of expressive and decidable fragments of first-order logic. Recently, methods of concept induction which are well studied in the field of Inductive Logic Programming have been applied to the related formalism of DLs. These methods have been developed for a number of purposes including unsupervised clustering and supervised classification. Refinement-based search is a concept induction technique which structures the search space of DL concept/OWL class expressions and progressively generalises or specialises candidate concepts to cover example data as guided by quality criteria such as accuracy. However, the current state of the art in this area is limited in that such methods: were not primarily designed to scale over large RDF/OWL knowledge bases; do not support class languages as expressive as OWL2-DL; or are limited to one purpose, such as learning OWL classes for integration into ontologies. Our work addresses these limitations by increasing the efficiency of these learning methods whilst permitting a concept language up to the expressivity of OWL2-DL classes. We describe methods which support both classification (predictive induction) and subgroup discovery (descriptive induction), which, in this context, are fundamentally related.

We have implemented our methods as the system called OWL-Miner and show by evaluation that our methods outperform state-of-the-art systems for DL learning in both the quality of solutions found and the speed with which they are computed. Furthermore, we achieve the best ever ten-fold cross-validation accuracy results on the long-standing benchmark problem of carcinogenesis. Finally, we present a case study on ongoing work in the application of OWL-Miner to a real-world problem directed at improving the efficiency of biological macromolecular crystallisation.
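The refinement-based search described above can be caricatured in a few lines: start from the most general concept, repeatedly specialise it with one more feature, and keep the refinement that improves accuracy on labelled examples. The feature-set representation below stands in for DL class expressions and is purely illustrative:

```python
# Toy refinement-based concept induction over sets of boolean features.
# Positive examples are "birds" described by invented feature sets.
examples = [
    (frozenset({"wings", "beak"}), True),
    (frozenset({"wings", "beak", "swims"}), True),
    (frozenset({"wings"}), False),
    (frozenset({"fur"}), False),
]
features = ["wings", "beak", "swims", "fur"]

def covers(concept, feats):
    # A concept (conjunction of features) covers an example with all of them.
    return concept <= feats

def accuracy(concept, examples):
    correct = sum(covers(concept, feats) == label for feats, label in examples)
    return correct / len(examples)

def refinements(concept, features):
    # Downward refinement: specialise by adding one more conjunct.
    return [concept | {f} for f in features if f not in concept]

def learn(examples, features):
    concept = frozenset()          # Top: covers every example
    best = accuracy(concept, examples)
    improved = True
    while improved:
        improved = False
        for r in refinements(concept, features):
            a = accuracy(r, examples)
            if a > best:
                concept, best, improved = r, a, True
    return concept, best

concept, acc = learn(examples, features)
print(set(concept), acc)
```

OWL-Miner's contribution is making this style of search efficient over large knowledge bases and a class language up to OWL2-DL expressivity; the greedy loop here is only the skeleton of the idea.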
Feedback driven adaptive combinatorial testing
The configuration spaces of modern software systems are too large to test exhaustively. Combinatorial interaction testing (CIT) approaches, such as covering arrays, systematically sample the configuration space and test only the selected configurations. The basic justification for CIT approaches is that they can cost-effectively exercise all system behaviors caused by the settings of t or fewer options. We conjecture, however, that in practice many such behaviors are not actually tested because of masking effects: failures that perturb execution so as to prevent some behaviors from being exercised. In this work we present a feedback-driven, adaptive, combinatorial testing approach aimed at detecting and working around masking effects. At each iteration we detect potential masking effects, heuristically isolate their likely causes, and then generate new covering arrays that allow previously masked combinations to be tested in the subsequent iteration. We empirically assess the effectiveness of the proposed approach on two large, widely used open-source software systems. Our results suggest that masking effects do exist and that our approach provides a promising and efficient way to work around them.
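The covering-array sampling that CIT relies on can be sketched with a greedy construction (a standard textbook approach, not the paper's own generator): repeatedly add the configuration that covers the most still-uncovered pairs of option settings.

```python
from itertools import combinations, product

def uncovered_pairs(rows, k, v):
    """All (option pair, value pair) interactions not covered by any row."""
    need = {(i, j, a, b)
            for (i, j) in combinations(range(k), 2)
            for (a, b) in product(range(v), repeat=2)}
    for row in rows:
        for (i, j) in combinations(range(k), 2):
            need.discard((i, j, row[i], row[j]))
    return need

def greedy_covering_array(k, v):
    """Add the row covering the most uncovered pairs until none remain."""
    rows = []
    need = uncovered_pairs(rows, k, v)
    while need:
        best_row, best_gain = None, -1
        for row in product(range(v), repeat=k):  # exhaustive; fine for toy sizes
            gain = sum((i, j, row[i], row[j]) in need
                       for (i, j) in combinations(range(k), 2))
            if gain > best_gain:
                best_row, best_gain = row, gain
        rows.append(best_row)
        for (i, j) in combinations(range(k), 2):
            need.discard((i, j, best_row[i], best_row[j]))
    return rows

arr = greedy_covering_array(4, 2)  # 4 binary options, strength t = 2
print(len(arr), "configurations instead of", 2 ** 4)
```

For four binary options the greedy array needs only a handful of rows rather than all 16 configurations; the paper's feedback loop would then regenerate such arrays around detected masking effects so that previously masked pairs get tested.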
AutoLog: A Log Sequence Synthesis Framework for Anomaly Detection
The rapid progress of modern computing systems has led to a growing interest in informative run-time logs. Various log-based anomaly detection techniques have been proposed to ensure software reliability. However, their implementation in the industry has been limited due to the lack of high-quality public log resources as training datasets.

While some log datasets are available for anomaly detection, they suffer from limitations in (1) comprehensiveness of log events; (2) scalability over diverse systems; and (3) flexibility of log utility. To address these limitations, we propose AutoLog, the first automated log generation methodology for anomaly detection. AutoLog uses program analysis to generate run-time log sequences without actually running the system. AutoLog starts with probing comprehensive logging statements associated with the call graphs of an application. Then, it constructs execution graphs for each method after pruning the call graphs to find log-related execution paths in a scalable manner. Finally, AutoLog propagates the anomaly label to each acquired execution path based on human knowledge. It generates flexible log sequences by walking along the log execution paths with controllable parameters. Experiments on 50 popular Java projects show that AutoLog acquires significantly more (9x-58x) log events than existing log datasets from the same system, and generates log messages much faster (15x) with a single machine than existing passive data collection approaches. We hope AutoLog can facilitate the benchmarking and adoption of automated log analysis techniques.

Comment: The paper has been accepted by ASE 2023 (Research Track).
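The pipeline described above, probing logging statements, pruning the call graph to log-related paths, and walking them to emit labelled sequences, can be sketched on a toy call graph (the graph, log templates, and labelling rule are all invented):

```python
calls = {  # toy static call graph: caller -> callees
    "main": ["connect", "serve"],
    "connect": ["auth"],
    "auth": [],
    "serve": ["handle"],
    "handle": [],
}
logs = {  # methods containing a logging statement, with its template
    "connect": "INFO connecting",
    "auth": "ERROR auth failed",
    "handle": "INFO request handled",
}

def reaches_log(m, seen=frozenset()):
    """Pruning step: keep methods that log or can reach a method that does."""
    if m in logs:
        return True
    return any(reaches_log(c, seen | {m})
               for c in calls.get(m, []) if c not in seen)

def log_sequences(m, depth=10):
    """Walk log-related paths, treating each callee as an alternative branch."""
    if depth == 0:
        return [[]]
    own = [logs[m]] if m in logs else []
    kids = [c for c in calls.get(m, []) if reaches_log(c)]
    if not kids:
        return [own]
    return [own + tail for c in kids for tail in log_sequences(c, depth - 1)]

# Label-propagation stand-in: call a sequence anomalous if it contains an ERROR.
for seq in log_sequences("main"):
    label = "anomalous" if any(s.startswith("ERROR") for s in seq) else "normal"
    print(seq, label)
```

AutoLog's actual pipeline handles real Java call graphs, per-method execution graphs, and human-curated anomaly labels; the sketch only shows why no execution of the system is needed to obtain labelled log sequences.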
Automated streamliner portfolios for constraint satisfaction problems
Funding: This work is supported by the EPSRC grants EP/P015638/1 and EP/P026842/1, and Nguyen Dang is a Leverhulme Early Career Fellow. We used the Cirrus UK National Tier-2 HPC Service at EPCC (http://www.cirrus.ac.uk) funded by the University of Edinburgh and EPSRC (EP/P020267/1).

Constraint Programming (CP) is a powerful technique for solving large-scale combinatorial problems. Solving a problem proceeds in two distinct phases: modelling and solving. Effective modelling has a huge impact on the performance of the solving process. Even with the advance of modern automated modelling tools, the search spaces involved can be so vast that problems can still be difficult to solve. To further constrain the model, a more aggressive step that can be taken is the addition of streamliner constraints, which are not guaranteed to be sound but are designed to focus effort on a highly restricted but promising portion of the search space. Previously, producing effective streamlined models was a manual, difficult and time-consuming task. This paper presents a completely automated process for the generation, search and selection of streamliner portfolios to produce a substantial reduction in search effort across a diverse range of problems. The results demonstrate a marked improvement in performance for both Chuffed, a CP solver with clause learning, and lingeling, a modern SAT solver.
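The streamliner idea can be sketched with a toy CSP: bolt an unsound but plausible extra constraint onto the model, search the streamlined model first, and fall back to the original model if the streamlined one has no solution. The problem and the "all values even" streamliner below are illustrative, not from the paper:

```python
def backtrack(domains, constraints, assignment=()):
    """Tiny chronological backtracking search over tuple assignments."""
    if len(assignment) == len(domains):
        return assignment
    for v in domains[len(assignment)]:
        cand = assignment + (v,)
        if all(c(cand) for c in constraints):
            sol = backtrack(domains, constraints, cand)
            if sol:
                return sol
    return None

# Toy CSP: three variables in 0..9, pairwise distinct, summing to 12.
domains = [range(10)] * 3

def ok(cand):
    if len(set(cand)) != len(cand):
        return False
    return sum(cand) <= 12 if len(cand) < 3 else sum(cand) == 12

def streamliner(cand):
    # Unsound extra constraint: it may exclude every solution, but when it
    # does not, it sharply prunes the search space.
    return all(v % 2 == 0 for v in cand)

sol = backtrack(domains, [ok, streamliner]) or backtrack(domains, [ok])
print(sol)
```

The paper automates what is hand-waved here: generating candidate streamliners, searching over portfolios of them, and selecting combinations that pay off across a whole problem class rather than a single instance.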
Solving hard subgraph problems in parallel
This thesis improves the state of the art in exact, practical algorithms for finding subgraphs. We study maximum clique, subgraph isomorphism, and maximum common subgraph problems. These are widely applicable: within computing science, subgraph problems arise in document clustering, computer vision, the design of communication protocols, model checking, compiler code generation, malware detection, cryptography, and robotics; beyond, applications occur in biochemistry, electrical engineering, mathematics, law enforcement, fraud detection, fault diagnosis, manufacturing, and sociology. We therefore consider both the ``pure'' forms of these problems, and variants with labels and other domain-specific constraints.
Although subgraph-finding should theoretically be hard, the constraint-based search algorithms we discuss can easily solve real-world instances involving graphs with thousands of vertices, and millions of edges. We therefore ask: is it possible to generate ``really hard'' instances for these problems, and if so, what can we learn? By extending research into combinatorial phase transition phenomena, we develop a better understanding of branching heuristics, as well as highlighting a serious flaw in the design of graph database systems.
This thesis also demonstrates how to exploit two of the kinds of parallelism offered by current computer hardware. Bit parallelism allows us to carry out operations on whole sets of vertices in a single instruction; this is largely routine. Thread parallelism, to make use of the multiple cores offered by all modern processors, is more complex. We suggest three desirable performance characteristics that we would like when introducing thread parallelism: lack of risk (parallel cannot be exponentially slower than sequential), scalability (adding more processing cores cannot make runtimes worse), and reproducibility (the same instance on the same hardware will take roughly the same time every time it is run). We then detail the difficulties in guaranteeing these characteristics when using modern algorithmic techniques.
Besides ensuring that parallelism cannot make things worse, we also increase the likelihood of it making things better. We compare randomised work stealing to new tailored strategies, and perform experiments to identify the factors contributing to good speedups. We show that whilst load balancing is difficult, the primary factor influencing the results is the interaction between branching heuristics and parallelism. By using parallelism to explicitly offset the commitment made to weak early branching choices, we obtain parallel subgraph solvers which are substantially and consistently better than the best sequential algorithms.
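The bit parallelism mentioned above is easy to demonstrate: encode each vertex's neighbourhood as an integer bitmask, and the candidate-set intersection at every branching step of a clique search becomes a single bitwise AND. The graph below is a toy example, and the search is plain branching without the bounding a real solver would use:

```python
# Toy graph: a triangle {0, 1, 2} plus a pendant edge to vertex 3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4
adj = [0] * n
for (u, v) in edges:
    adj[u] |= 1 << v
    adj[v] |= 1 << u

def popcount(x):
    return bin(x).count("1")

def max_clique(clique, cand, best):
    """Exhaustive branching; cand is a bitmask of candidate vertices."""
    if cand == 0:
        return clique if popcount(clique) > popcount(best) else best
    v = (cand & -cand).bit_length() - 1        # lowest set bit = next vertex
    # Include v: the candidate set shrinks to v's neighbourhood in one AND.
    best = max_clique(clique | (1 << v), cand & adj[v], best)
    # Exclude v.
    best = max_clique(clique, cand & ~(1 << v), best)
    return best

best = max_clique(0, (1 << n) - 1, 0)
print(bin(best), popcount(best))  # the maximum clique as a bitmask, and its size
```

On real hardware the same AND processes 64 (or, with vector registers, more) vertices per instruction, which is the "largely routine" bit parallelism the thesis contrasts with the much harder problem of thread parallelism.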