34 research outputs found

    Using deep learning to construct stochastic local search SAT solvers with performance bounds

    The Boolean Satisfiability problem (SAT) is the most prototypical NP-complete problem and of great practical relevance. One important class of solvers for this problem is that of stochastic local search (SLS) algorithms, which iteratively and randomly update a candidate assignment. Recent breakthrough results in theoretical computer science have established sufficient conditions under which SLS solvers are guaranteed to efficiently solve a SAT instance, provided they have access to suitable "oracles" that provide samples from an instance-specific distribution, exploiting an instance's local structure. Motivated by these results and the well-established ability of neural networks to learn common structure in large datasets, in this work we train oracles using Graph Neural Networks and evaluate them with two SLS solvers on random SAT instances of varying difficulty. We find that access to GNN-based oracles significantly boosts the performance of both solvers, allowing them, on average, to solve 17% more difficult instances (as measured by the ratio between clauses and variables) and to do so in 35% fewer steps, with improvements in the median number of steps of up to a factor of 8. As such, this work bridges formal results from theoretical computer science and practically motivated research on deep learning for constraint satisfaction problems, and establishes the promise of purpose-trained SAT solvers with performance guarantees. Comment: 15 pages, 9 figures, code available at https://github.com/porscheofficial/sls_sat_solving_with_deep_learnin
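    The abstract describes SLS solvers whose variable flips are guided by samples from a learned, instance-specific distribution. The following is a minimal illustrative sketch, not the paper's implementation: a WalkSAT-style loop in which the flipped variable in a violated clause is sampled with a bias towards marginals supplied by a stand-in oracle (the dictionary oracle_probs below is a hypothetical placeholder for what a trained GNN would output).

        import random

        def oracle_guided_sls(clauses, n_vars, oracle_probs, max_steps=10_000, seed=0):
            """clauses: list of clauses, each a list of signed ints (DIMACS-style literals).
            oracle_probs: dict mapping a variable to its oracle probability of being True."""
            rng = random.Random(seed)
            # Initialise the assignment by sampling each variable from the oracle marginal.
            assign = {v: rng.random() < oracle_probs.get(v, 0.5) for v in range(1, n_vars + 1)}

            def satisfied(clause):
                return any((lit > 0) == assign[abs(lit)] for lit in clause)

            for step in range(max_steps):
                unsat = [c for c in clauses if not satisfied(c)]
                if not unsat:
                    return assign, step          # found a satisfying assignment
                clause = rng.choice(unsat)
                # Prefer flipping variables whose current value disagrees with the oracle.
                weights = [abs(oracle_probs.get(abs(lit), 0.5) - assign[abs(lit)]) + 1e-3
                           for lit in clause]
                var = abs(rng.choices(clause, weights=weights, k=1)[0])
                assign[var] = not assign[var]
            return None, max_steps               # gave up

        # Toy instance: (x1 or x2) and (not x1 or x2); the oracle strongly suggests x2 = True.
        print(oracle_guided_sls([[1, 2], [-1, 2]], 2, {1: 0.5, 2: 0.9}))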

    Set-theoretic duality: A fundamental feature of combinatorial optimisation

    The duality between conflicts and diagnoses in the field of diagnosis, or between plans and landmarks in the field of planning, or between unsatisfiable cores and minimal co-satisfiable sets in SAT or CSP solving, has been known for many years. Recent wo
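    The duality referred to here is the classic hitting-set duality: the minimal diagnoses are exactly the minimal hitting sets of the minimal conflicts, and applying the same operation again recovers the conflicts. A brute-force illustration, intended purely for intuition on tiny set systems, might look as follows (all names are illustrative).

        from itertools import chain, combinations

        def minimal_hitting_sets(sets):
            """Return all inclusion-minimal sets intersecting every set in `sets` (brute force)."""
            universe = sorted(set(chain.from_iterable(sets)))
            hitting = [frozenset(c)
                       for r in range(1, len(universe) + 1)
                       for c in combinations(universe, r)
                       if all(set(c) & s for s in sets)]
            return [h for h in hitting if not any(other < h for other in hitting)]

        # Minimal conflicts of a toy diagnosis problem; their minimal hitting sets are the diagnoses.
        conflicts = [{"a", "b"}, {"b", "c"}, {"a", "c"}]
        diagnoses = minimal_hitting_sets(conflicts)
        print(diagnoses)
        # Hitting the diagnoses recovers the minimal conflicts, illustrating the duality.
        print(minimal_hitting_sets(diagnoses))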

    Efficient local search for Pseudo Boolean Optimization

    Algorithms and the Foundations of Software Technology

    Clustering of process instances based on model conformance checking techniques

    As event data becomes a ubiquitous source of information, data science techniques represent an unprecedented opportunity to analyze and react to the processes that generate this data. Process Mining is an emerging field that bridges the gap between traditional data analysis techniques, like Data Mining, and Business Process Management. One core value of Process Mining is the discovery of formal process models, such as Petri nets or BPMN models, which attempt to make sense of the events recorded in logs. Due to the complexity of event data, automated process discovery algorithms tend to create dense process models that are hard for humans to interpret. Fortunately, Conformance Checking, a sub-field of Process Mining, relates observed and modeled behavior, so that humans can map these two pieces of process information onto each other. Conformance checking relies on alignment artefacts, which associate process models with event logs. Different types of alignment artefacts exist, namely alignments, multi-alignments and anti-alignments. Currently, only alignments are addressed in depth in the literature; an alignment relates the process model to a given process instance. However, because logs contain many different behaviors, computing one alignment per process instance hinders the readability of the log-to-model relationships.
    The present thesis proposes to exploit conformance checking artefacts for clustering the process executions recorded in event logs, thereby extracting a restricted number of modeled representatives. Data clustering is a common method for extracting information from dense and complex data. By grouping objects into clusters by similarity, data clustering yields simpler datasets that capture the similarities and differences contained in the data. Using conformance checking artefacts in a clustering approach allows a reliable process model to serve as the baseline for grouping process instances. Hence, the discovered clusters are associated with modeled artefacts, which we call model-based trace variants, providing useful log-to-model explanations.
    From this motivation, we elaborate a set of methods for computing conformance checking artefacts. The first contribution is the computation of a single modeled behavior that represents a set of process instances, namely a multi-alignment. We then propose several alignment-based clustering approaches that produce clusters of process instances associated with a modeled artefact. Finally, we highlight the interest of anti-alignments for extracting deviations of process models with respect to the log; this latter artefact enables estimating model precision, and we show its impact on model-based clustering. We provide SAT encodings for all the proposed techniques. Heuristic algorithms are then added to cope with the computing capacity of today's computers, at the expense of losing optimality.
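    As a purely illustrative sketch (and not the thesis's SAT-based encoding), grouping traces around modeled representatives can be pictured as assigning each trace to the candidate model run with the smallest alignment cost; here a plain string-similarity score stands in for an optimal alignment on a Petri net, and all names below are hypothetical.

        from difflib import SequenceMatcher

        def alignment_cost(trace, model_run):
            """Cheap proxy for an optimal alignment cost: 1 minus a string-similarity ratio."""
            return 1.0 - SequenceMatcher(None, trace, model_run).ratio()

        def cluster_by_model_run(log, model_runs):
            """Assign each trace to the model run it aligns to most cheaply."""
            clusters = {run: [] for run in model_runs}
            for trace in log:
                best = min(model_runs, key=lambda run: alignment_cost(trace, run))
                clusters[best].append(trace)
            return clusters

        # Each string is a trace of activity labels; the two model runs act as modeled representatives.
        log = ["abcd", "abce", "acbd", "axyd", "xyzd"]
        model_runs = ["abcd", "axyd"]
        print(cluster_by_model_run(log, model_runs))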

    Improving data quality: data consistency, deduplication, currency and accuracy

    Data quality is one of the key problems in data management. An unprecedented amount of data has been accumulated and has become a valuable asset of organizations, and the value of that data relies greatly on its quality. However, real-life data is often dirty: it may be inconsistent, duplicated, stale, inaccurate or incomplete, which reduces its usability and increases the cost to businesses. This raises the need for improving data quality, which comprises five central issues: data consistency, data deduplication, data currency, data accuracy and information completeness. This thesis presents the results of our work on the first four issues: data consistency, deduplication, currency and accuracy.
    The first part of the thesis investigates incremental verification of data consistency in distributed data. Given a distributed database D, a set S of conditional functional dependencies (CFDs), the set V of violations of the CFDs in D, and updates ΔD to D, the problem is to find, with minimum data shipment, the changes ΔV to V in response to ΔD. Although the problems are intractable, we show that they are bounded: there exist algorithms that detect errors with computational cost and data shipment both linear in the size of ΔD and ΔV, independent of the size of the database D. Such incremental algorithms are provided for both vertically and horizontally partitioned data, and we show that they are optimal.
    The second part studies the interaction between record matching and data repairing. Record matching, the main technique underlying data deduplication, aims to identify tuples that refer to the same real-world object, while repairing makes a database consistent by fixing errors in the data using constraints. Most data cleaning systems treat these as separate processes, based on heuristic solutions. However, our studies show that repairing can effectively help identify matches, and vice versa. To capture this interaction, we propose a uniform framework that seamlessly unifies repairing and matching operations to clean a database based on integrity constraints, matching rules and master data.
    The third part presents our study of finding certain fixes that are absolutely correct for data repairing. Data repairing methods based on integrity constraints are normally heuristic: they may not find certain fixes, and worse still, they may even introduce new errors when attempting to repair the data. This is problematic when repairing critical data such as medical records, where a seemingly minor error can have disastrous consequences. We propose a framework and an algorithm for finding certain fixes, based on master data, a class of editing rules and user interactions, and we develop a prototype system.
    The fourth part introduces the inference of data currency and consistency for conflict resolution, where data currency aims to identify the current values of entities, and conflict resolution combines tuples that pertain to the same real-world entity into a single tuple while resolving conflicts, which is also an important issue for data deduplication. We show that data currency and consistency help each other in resolving conflicts. We study a number of associated fundamental problems and develop an approach to conflict resolution that infers data currency and consistency.
    The last part reports our study of data accuracy, focusing on the longstanding relative accuracy problem: given tuples t1 and t2 that refer to the same entity e, determine whether t1[A] is more accurate than t2[A], i.e., whether t1[A] is closer than t2[A] to the true value of the A attribute of e. We introduce a class of accuracy rules and an inference system with a chase procedure to deduce relative accuracy, and study the related fundamental problems. We also propose a framework and algorithms for inferring accurate values with user interaction.
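    For intuition only, the sketch below shows what detecting violations of a single conditional functional dependency (CFD) can look like on a handful of in-memory tuples; it is not the thesis's incremental, distributed algorithm, and the example CFD (for tuples with country UK, equal zip codes must imply equal cities) and all names are illustrative.

        def cfd_violations(tuples, condition, lhs, rhs):
            """Return pairs of tuple indices that match the constant pattern `condition`,
            agree on the `lhs` attributes, but differ on some `rhs` attribute."""
            matching = [(i, t) for i, t in enumerate(tuples)
                        if all(t.get(a) == v for a, v in condition.items())]
            violations = []
            for k, (i, t1) in enumerate(matching):
                for j, t2 in matching[k + 1:]:
                    if all(t1[a] == t2[a] for a in lhs) and any(t1[a] != t2[a] for a in rhs):
                        violations.append((i, j))
            return violations

        rows = [
            {"country": "UK", "zip": "EH8 9AB", "city": "Edinburgh"},
            {"country": "UK", "zip": "EH8 9AB", "city": "London"},   # violates the CFD
            {"country": "NL", "zip": "EH8 9AB", "city": "Delft"},    # outside the pattern
        ]
        print(cfd_violations(rows, {"country": "UK"}, ["zip"], ["city"]))   # [(0, 1)]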

    A SAT approach to branchwidth

    Branch decomposition is a prominent method for structurally decomposing a graph, a hypergraph, or a propositional formula in conjunctive normal form. The width of a branch decomposition provides a measure of how well the object is decomposed. For many applications, it is crucial to compute a branch decomposition whose width is as small as possible. We propose an approach based on Boolean Satisfiability (SAT) to finding branch decompositions of small width. The core of our approach is an efficient SAT encoding that determines with a single SAT call whether a given hypergraph admits a branch decomposition of a certain width. For our encoding, we propose a natural partition-based characterization of branch decompositions. The encoding size imposes a limit on the size of the given hypergraph. To break through this barrier and to scale the SAT approach to larger instances, we develop a new heuristic approach in which the SAT encoding is used to locally improve a given candidate decomposition until a fixed point is reached. This new SAT-based local improvement method now scales to instances with several thousand vertices and edges.
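    The driver around such an encoding can be pictured as a monotone search over candidate widths with one SAT call per probe. The sketch below shows only that outer loop, assuming a callable that answers "does a branch decomposition of width at most w exist?" by building the encoding and invoking an off-the-shelf SAT solver; both the callable and the binary search are illustrative assumptions, not the paper's exact procedure.

        def smallest_width(admits_width, upper_bound):
            """Binary search for the smallest w in [0, upper_bound] with admits_width(w) True,
            relying on monotonicity: if width w is achievable, so is any larger width."""
            lo, hi = 0, upper_bound
            while lo < hi:
                mid = (lo + hi) // 2
                if admits_width(mid):      # in practice: encode to CNF and make one SAT call
                    hi = mid
                else:
                    lo = mid + 1
            return lo

        # Dummy oracle standing in for "encode width <= w and run a SAT solver".
        print(smallest_width(lambda w: w >= 7, upper_bound=20))   # prints 7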