
    The Efficient Discovery of Interesting Closed Pattern Collections

    Enumerating closed sets that are frequent in a given database is a fundamental data mining technique that is used, e.g., in the context of market basket analysis, fraud detection, or Web personalization. There are two complementary reasons for the importance of closed sets, one semantic and one algorithmic: closed sets provide a condensed basis for non-redundant collections of interesting local patterns, and they can be enumerated efficiently. For many databases, however, even the closed set collection can be far too large for further use, and correspondingly its computation time can be infeasibly long. In such cases, it becomes necessary to focus on smaller collections of closed sets, and it is essential that these collections retain both properties: controlled semantics reflecting some notion of interestingness, and efficient enumerability. This thesis discusses three different approaches to achieving this: constraint-based closed set extraction, pruning by quantifying the degree or strength of closedness, and controlled random generation of closed sets instead of exhaustive enumeration.

    For the original closed set family, efficient enumerability follows from the fact that there is an efficiently computable inducing closure operator and that its fixpoints can be enumerated with an amortized polynomial number of closure computations. Perhaps surprisingly, it turns out that this connection does not generally hold for other constraint combinations: the restricted domains induced by additional constraints can mean that the fixpoints of the closure operator cannot be enumerated efficiently, or that an inducing closure operator does not exist at all. This thesis gives, for the first time, a formal axiomatic characterization of the constraint classes that allow the fixpoints of arbitrary closure operators to be enumerated efficiently, as well as of the constraint classes that guarantee the existence of a closure operator inducing the closed sets.

    As a complementary approach, the thesis generalizes the notion of closedness by quantifying its strength, i.e., the difference in supporting database records between a closed set and all its supersets. This gives rise to a measure of interestingness that is able to select long, and thus particularly informative, closed sets that are robust against noise and dynamic changes. Moreover, this measure is algorithmically sound: all closed sets with a minimum strength again form a closure system that can be enumerated efficiently, which ties directly into the results on constraint-based closed sets. In fact, both approaches can easily be combined.

    In some applications, however, the resulting set of constrained closed sets is still intractably large, or it is too difficult to find meaningful hard constraints at all (including values for their parameters). Therefore, the last part of this thesis presents an alternative algorithmic paradigm for the extraction of closed sets: instead of exhaustively listing a potentially exponential number of sets, randomly generate exactly the desired number of them. Using the Markov chain Monte Carlo method, this generation can be performed according to any desired probability distribution that favors interesting patterns. This novel randomized approach complements traditional enumeration techniques (including those mentioned above): on the one hand, it is only applicable in scenarios that do not require deterministic guarantees for the output, such as exploratory data analysis or global model construction; on the other hand, random closed set generation provides complete control over both the number and the distribution of the produced sets.
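
    The closure operator behind this approach is concrete enough to sketch: the closure of an item set is the intersection of all database records that support it, and a set is closed precisely when it equals its own closure. The minimal Scala sketch below illustrates this, together with one plausible reading of the strength measure as the minimum support lost when any single item is added; the toy database and all names are illustrative assumptions, not code from the thesis.

        // Sketch of the support-based closure operator; db, support, closure and
        // strength are illustrative names, not the thesis's implementation.
        object ClosedSets {
          type Item = Int
          type Transaction = Set[Item]

          // A toy transaction database (one Set per record).
          val db: Seq[Transaction] =
            Seq(Set(1, 2, 3), Set(1, 2), Set(2, 3), Set(1, 2, 3))

          // All records that support (contain) the item set x.
          def support(x: Set[Item]): Seq[Transaction] = db.filter(t => x.subsetOf(t))

          // Closure = intersection of all supporting records; x is closed iff closure(x) == x.
          def closure(x: Set[Item]): Set[Item] =
            support(x).reduceOption(_ intersect _).getOrElse(db.flatten.toSet)

          // One reading of "strength" (assumed here): the minimum number of
          // supporting records lost when x is extended by any single further item.
          def strength(x: Set[Item]): Int = {
            val candidates = db.flatten.toSet -- x
            val supX = support(x).size
            candidates.map(i => supX - support(x + i).size).min
          }

          def main(args: Array[String]): Unit = {
            println(closure(Set(1)))     // Set(1, 2): every record with item 1 also has item 2
            println(strength(Set(1, 2))) // 1: adding item 3 loses exactly one supporting record
          }
        }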

    Proof Nets as Processes

    This work describes a process algebraic interpretation of Proof Nets, the canonical objects representing Linear Logic proofs. It thereby offers a logically founded basis for deterministic, implicit parallelism. We present the delta-calculus, a novel interpretation of Linear Logic in the form of a typed process algebra that enjoys a Curry-Howard correspondence with Proof Nets. Reduction inherits the qualities of the logical objects: termination, deadlock-freedom, determinism, and, very importantly, a high degree of parallelism. We obtain the necessary soundness results and provide a propositions-as-types theorem. The basic system is extended in two directions: first, we adapt it to interpret Affine Logic; second, we propose extensions for general recursion and introduce a novel form of recursive linear types. As an application, we show a highly parallel, type-preserving translation from a linear System F and extend it to the recursive variation. Our interpretation can be seen as a more canonical, proof-theoretic alternative to several recent works on pi-calculus interpretations of linear sequent proofs (propositions-as-sessions), which exhibit reduced parallelism.

    Foundations of Software Science and Computation Structures

    This open access book constitutes the proceedings of the 24th International Conference on Foundations of Software Science and Computation Structures, FOSSACS 2021, which was held from March 27 to April 1, 2021, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021. The conference was planned to take place in Luxembourg but changed to an online format due to the COVID-19 pandemic. The 28 regular papers presented in this volume were carefully reviewed and selected from 88 submissions. They deal with research on theories and methods to support the analysis, integration, synthesis, transformation, and verification of programs and software systems.

    Verification of Non-Regular Program Properties

    Most temporal logics which have been introduced and studied in the past decades can be embedded into the modal mu-calculus. This is the case for, e.g., PDL, CTL, CTL*, ECTL, and LTL, and entails that these logics cannot express non-regular program properties. In recent years, some novel approaches towards an increase in expressive power have been made: Fixpoint Logic with Chop enriches the mu-calculus with a sequential composition operator and thereby allows context-free processes to be characterised. The Modal Iteration Calculus uses inflationary fixpoints to exceed the expressive power of the mu-calculus. Higher-Order Fixpoint Logic (HFL) incorporates a simply typed lambda-calculus into a setting with extremal fixpoint operators and even exceeds the expressive power of Fixpoint Logic with Chop. PDL, too, has been equipped with context-free programs instead of regular ones. In terms of expressivity there is a natural demand for richer frameworks, since program property specifications are simply not limited to the regular realm. Expressivity, however, usually comes at the price of an increased computational complexity of logic-related decision problems; for instance, the satisfiability problems for the logics mentioned above are undecidable. In this work we investigate the model checking problems of three different logics which are capable of expressing non-regular program properties, and aim at identifying fragments with feasible model checking complexity.

    Firstly, we develop a generic method for determining the complexity of model checking PDL over arbitrary classes of programs and show that the border to undecidability runs between PDL over indexed languages and PDL over context-sensitive languages. The problem is still in PTIME for PDL over linear indexed languages and in EXPTIME for PDL over indexed languages. We present concrete algorithms which allow model checkers for these two fragments to be implemented.

    We then introduce an extension of CTL in which the UNTIL and RELEASE operators are adorned with formal languages. These are interpreted over labeled paths and restrict the moments on such a path at which the operators are satisfied. The UNTIL operator, for instance, is satisfied if some path prefix forms a word in the language it is adorned with (besides the usual requirement that until that moment some property has to hold and at that very moment some other property must hold). Again, we determine the computational complexity of the model checking problem for varying classes of allowed languages in either operator. It turns out that enabling either context-sensitive languages in the UNTIL operator or context-free languages in the RELEASE operator renders the model checking problem undecidable, while it is EXPTIME-complete for indexed languages in the UNTIL operator and visibly pushdown languages in the RELEASE operator. PTIME-completeness results from allowing linear indexed languages in the UNTIL operator and deterministic context-free languages in the RELEASE operator. We also give concrete model checking algorithms for several interesting fragments of these logics.

    Finally, we turn our attention to the model checking problem of HFL, which we have already studied in previous works. On finite state models it is k-EXPTIME-complete for HFL(k), the fragment of HFL obtained by restricting functions in the lambda-calculus to order k. Novel in this work, however, is the generalisation, from the first-order case to functions of arbitrary order, of an idea to improve the best- and average-case behaviour of a model checking algorithm by using partial functions during the fixpoint iteration, guided by the neededness of arguments. This is possible because the semantics of a closed HFL formula is not a total function but the value of a function at some argument. Again, we give a concrete algorithm for such an improved model checker and argue that, despite the very high model checking complexity, this improvement is very useful in practice and gives feasible results for HFL with lower-order functions, backed up by a statistical analysis of the number of needed arguments on a concrete example. Furthermore, we show how HFL can be used as a tool for the development of algorithms: its high expressivity allows a wide variety of problems to be encoded as instances of model checking already in the first-order fragment. The rather unintuitive, yet very succinct, problem encodings, together with an analysis of the behaviour of the above-sketched optimisation, may give deep insights into the problem. We demonstrate this on the example of the universality problem for nondeterministic finite automata, where a slight variation of the optimised model checking algorithm yields one of the best methods known so far, which was itself only discovered recently. We also investigate typical model-theoretic properties for each of these logics and compare them with respect to expressive power.
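
    The neededness-guided optimisation admits a generic illustration: instead of tabulating the iterated function on its whole domain, a demand-driven solver computes values only at the arguments that the top-level query transitively requires, keeping the table partial. The Scala sketch below shows this idea on a standard least-fixpoint computation, instantiated with simple reachability; the lfp helper and the example transition system are invented for illustration and this is not the thesis's algorithm.

        // Generic demand-driven least-fixpoint solver: the value table stays
        // partial, holding only arguments the query transitively needs.
        // All names here (lfp, deps, step) are invented for illustration.
        object NeededArgs {
          import scala.collection.mutable

          def lfp[A](deps: A => Seq[A], step: (A, A => Boolean) => Boolean)(query: A): Boolean = {
            val value = mutable.Map.empty[A, Boolean]                 // partial table
            def eval(a: A): Boolean = value.getOrElseUpdate(a, false) // bottom of the lattice
            // Discover the arguments the query transitively depends on.
            val needed = mutable.LinkedHashSet(query)
            var frontier = List(query)
            while (frontier.nonEmpty) {
              val next = frontier.flatMap(deps).distinct.filterNot(needed)
              next.foreach(needed += _)
              frontier = next
            }
            // Iterate the (monotone) step function to stability on needed args only.
            var changed = true
            while (changed) {
              changed = false
              for (a <- needed) {
                val v = step(a, eval)
                if (v != eval(a)) { value(a) = v; changed = true }
              }
            }
            eval(query)
          }

          def main(args: Array[String]): Unit = {
            // EF-style reachability of state 3 from state 0; state 4 is never
            // needed by that query, so it is never tabulated.
            val succ = Map(0 -> Seq(1), 1 -> Seq(2), 2 -> Seq(0, 3),
                           3 -> Seq.empty[Int], 4 -> Seq(0))
            println(lfp[Int](succ, (s, v) => s == 3 || succ(s).exists(v))(0)) // true
          }
        }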

    Programming Languages and Systems

    This open access book constitutes the proceedings of the 28th European Symposium on Programming, ESOP 2019, which took place in Prague, Czech Republic, in April 2019, held as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2019.

    ITL Monitor: Compositional Runtime Analysis with Interval Temporal Logic

    Runtime verification has gained significant interest in recent years. It is a process in which the execution trace of a program is analysed while the program is running. A popular language for specifying temporal requirements in runtime verification is Linear Temporal Logic (LTL), which is excellent for expressing properties such as safety and liveness. Another formalism in use is Interval Temporal Logic (ITL). This logic has constructs for specifying the behaviour of programs that can be decomposed into subintervals of activity. Traditionally, only a restricted subset of ITL has been used for runtime verification, due to the limitations imposed by making the subset executable. In this thesis, an alternative restriction of ITL is considered as the basis for constructing a library of runtime verification monitors (ITL-Monitor). The thesis introduces a new first-occurrence operator (|>) into ITL and explores its properties. This operator is the basis of the translation from runtime monitors to their corresponding ITL formulae. ITL-Monitor is then introduced formally, and the algebraic properties of its operators are analysed. An implementation of ITL-Monitor is given, based upon the construction of a Domain Specific Language in Scala. The architecture of the underlying system comprises a network of concurrent actors built on top of Akka, an industrial-strength distributed actor framework. A number of example systems are constructed to evaluate ITL-Monitor's performance against alternative verification tools. ITL-Monitor is also subjected to a simulation that generates a very large quantity of state data. The monitors were observed to deliver consistent performance across execution traces of up to a million states, and to verify subintervals of up to 300 states against ITL formulae with evaluation complexity of O(n^3).
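
    The first-occurrence operator can be pictured with a small sketch: a monitor consumes one state at a time, extends its current interval, and reports at the first prefix on which its interval formula holds. The Scala below is a hypothetical illustration in the spirit of |>; its Monitor class and Formula type are invented for the example and are not ITL-Monitor's actual DSL.

        // Sketch of a first-occurrence monitor in the spirit of |>: report at
        // the first prefix of the trace satisfying the interval formula.
        // Monitor and Formula are hypothetical, not ITL-Monitor's DSL.
        object FirstOccurrence {
          type State   = Map[String, Int]
          type Formula = List[State] => Boolean // an interval is a finite state sequence

          final class Monitor(phi: Formula) {
            private var interval = List.empty[State]
            private var done = false
            // Feed one state; Some(n) at the first prefix (length n) where phi holds.
            def step(s: State): Option[Int] = {
              if (done) return None
              interval = interval :+ s
              if (phi(interval)) { done = true; Some(interval.length) } else None
            }
          }

          def main(args: Array[String]): Unit = {
            // phi: the interval ends in a state where x >= 3.
            val phi: Formula = iv => iv.nonEmpty && iv.last("x") >= 3
            val m = new Monitor(phi)
            (1 to 5).map(i => Map("x" -> i)).foreach { s =>
              m.step(s).foreach(n => println(s"first occurrence after $n states"))
            }
          }
        }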