5,081 research outputs found

    The Efficient Discovery of Interesting Closed Pattern Collections

    Get PDF
    Enumerating closed sets that are frequent in a given database is a fundamental data mining technique that is used, e.g., in the context of market basket analysis, fraud detection, or Web personalization. There are two complementing reasons for the importance of closed sets---one semantical and one algorithmic: closed sets provide a condensed basis for non-redundant collections of interesting local patterns, and they can be enumerated efficiently. For many databases, however, even the closed set collection can be way too large for further usage and correspondingly its computation time can be infeasibly long. In such cases, it is inevitable to focus on smaller collections of closed sets, and it is essential that these collections retain both: controlled semantics reflecting some notion of interestingness as well as efficient enumerability. This thesis discusses three different approaches to achieve this: constraint-based closed set extraction, pruning by quantifying the degree or strength of closedness, and controlled random generation of closed sets instead of exhaustive enumeration. For the original closed set family, efficient enumerability results from the fact that there is an inducing efficiently computable closure operator and that its fixpoints can be enumerated by an amortized polynomial number of closure computations. Perhaps surprisingly, it turns out that this connection does not generally hold for other constraint combinations, as the restricted domains induced by additional constraints can cause two things to happen: the fixpoints of the closure operator cannot be enumerated efficiently or an inducing closure operator does not even exist. This thesis gives, for the first time, a formal axiomatic characterization of constraint classes that allow to efficiently enumerate fixpoints of arbitrary closure operators as well as of constraint classes that guarantee the existence of a closure operator inducing the closed sets. As a complementary approach, the thesis generalizes the notion of closedness by quantifying its strength, i.e., the difference in supporting database records between a closed set and all its supersets. This gives rise to a measure of interestingness that is able to select long and thus particularly informative closed sets that are robust against noise and dynamic changes. Moreover, this measure is algorithmically sound because all closed sets with a minimum strength again form a closure system that can be enumerated efficiently and that directly ties into the results on constraint-based closed sets. In fact both approaches can easily be combined. In some applications, however, the resulting set of constrained closed sets is still intractably large or it is too difficult to find meaningful hard constraints at all (including values for their parameters). Therefore, the last part of this thesis presents an alternative algorithmic paradigm to the extraction of closed sets: instead of exhaustively listing a potentially exponential number of sets, randomly generate exactly the desired amount of them. By using the Markov chain Monte Carlo method, this generation can be performed according to any desired probability distribution that favors interesting patterns. This novel randomized approach complements traditional enumeration techniques (including those mentioned above): On the one hand, it is only applicable in scenarios that do not require deterministic guarantees for the output such as exploratory data analysis or global model construction. On the other hand, random closed set generation provides complete control over the number as well as the distribution of the produced sets.Das Aufzählen abgeschlossener Mengen (closed sets), die häufig in einer gegebenen Datenbank vorkommen, ist eine algorithmische Grundaufgabe im Data Mining, die z.B. in Warenkorbanalyse, Betrugserkennung oder Web-Personalisierung auftritt. Die Wichtigkeit abgeschlossener Mengen ist semantisch als auch algorithmisch begründet: Sie bilden eine nicht-redundante Basis zur Erzeugung von lokalen Mustern und können gleichzeitig effizient aufgezählt werden. Allerdings kann die Anzahl aller abgeschlossenen Mengen, und damit ihre Auflistungszeit, das Maß des effektiv handhabbaren oft deutlich übersteigen. In diesem Fall ist es unvermeidlich, kleinere Ausgabefamilien zu betrachten, und es ist essenziell, dass dabei beide o.g. Eigenschaften erhalten bleiben: eine kontrollierte Semantik im Sinne eines passenden Interessantheitsbegriffes sowie effiziente Aufzählbarkeit. Diese Arbeit stellt dazu drei Ansätze vor: das Einführen zusätzlicher Constraints, die Quantifizierung der Abgeschlossenheit und die kontrollierte zufällige Erzeugung einzelner Mengen anstelle von vollständiger Aufzählung. Die effiziente Aufzählbarkeit der ursprünglichen Familie abgeschlossener Mengen rührt daher, dass sie durch einen effizient berechenbaren Abschlussoperator erzeugt wird und dass desweiteren dessen Fixpunkte durch eine amortisiert polynomiell beschränkte Anzahl von Abschlussberechnungen aufgezählt werden können. Wie sich herausstellt ist dieser Zusammenhang im Allgemeinen nicht mehr gegeben, wenn die Funktionsdomäne durch Constraints einschränkt wird, d.h., dass die effiziente Aufzählung der Fixpunkte nicht mehr möglich ist oder ein erzeugender Abschlussoperator unter Umständen gar nicht existiert. Diese Arbeit gibt erstmalig eine axiomatische Charakterisierung von Constraint-Klassen, die die effiziente Fixpunktaufzählung von beliebigen Abschlussoperatoren erlauben, sowie von Constraint-Klassen, die die Existenz eines erzeugenden Abschlussoperators garantieren. Als ergänzenden Ansatz stellt die Dissertation eine Generalisierung bzw. Quantifizierung des Abgeschlossenheitsbegriffs vor, der auf der Differenz zwischen den Datenbankvorkommen einer Menge zu den Vorkommen all seiner Obermengen basiert. Mengen, die bezüglich dieses Begriffes stark abgeschlossen sind, weisen eine bestimmte Robustheit gegen Veränderungen der Eingabedaten auf. Desweiteren wird die gewünschte effiziente Aufzählbarkeit wiederum durch die Existenz eines effizient berechenbaren erzeugenden Abschlussoperators sichergestellt. Zusätzlich zu dieser algorithmischen Parallele zum Constraint-basierten Vorgehen, können beide Ansätze auch inhaltlich kombiniert werden. In manchen Anwendungen ist die Familie der abgeschlossenen Mengen, zu denen die beiden oben genannten Ansätze führen, allerdings immer noch zu groß bzw. ist es nicht möglich, sinnvolle harte Constraints und zugehörige Parameterwerte zu finden. Daher diskutiert diese Arbeit schließlich noch ein völlig anderes Paradigma zur Erzeugung abgeschlossener Mengen als vollständige Auflistung, nämlich die randomisierte Generierung einer Anzahl von Mengen, die exakt den gewünschten Vorgaben entspricht. Durch den Einsatz der Markov-Ketten-Monte-Carlo-Methode ist es möglich die Verteilung dieser Zufallserzeugung so zu steuern, dass das Ziehen interessanter Mengen begünstigt wird. Dieser neue Ansatz bildet eine sinnvolle Ergänzung zu herkömmlichen Techniken (einschließlich der oben genannten): Er ist zwar nur anwendbar, wenn keine deterministischen Garantien erforderlich sind, erlaubt aber andererseits eine vollständige Kontrolle über Anzahl und Verteilung der produzierten Mengen

    Transitions in large eddy simulation of box turbulence

    Full text link
    One promising decomposition of turbulent dynamics is that into building blocks such as equilibrium and periodic solutions and orbits connecting these. While the numerical approximation of such building blocks is feasible for flows in small domains and at low Reynolds numbers, computations in developed turbulence are currently out of reach because of the large number of degrees of freedom necessary to represent Navier-Stokes flow on all relevant spatial scales. We mitigate this problem by applying large eddy simulation (LES), which aims to model, rather than resolve, motion on scales below the filter length, which is fixed by a model parameter. By considering a periodic spatial domain, we avoid complications that arise in LES modelling in the presence of boundary layers. We consider the motion of an LES fluid subject to a constant body force of the Taylor-Green type as the separation between the forcing length scale and the filter length is increased. In particular, we discuss the transition from laminar to weakly turbulent motion, regulated by simple invariant solution, on a grid of 32332^3 points

    Lie-algebraic classification of effective theories with enhanced soft limits

    Full text link
    A great deal of effort has recently been invested in developing methods of calculating scattering amplitudes that bypass the traditional construction based on Lagrangians and Feynman rules. Motivated by this progress, we investigate the long-wavelength behavior of scattering amplitudes of massless scalar particles: Nambu-Goldstone (NG) bosons. The low-energy dynamics of NG bosons is governed by the underlying spontaneously broken symmetry, which likewise allows one to bypass the Lagrangian and connect the scaling of the scattering amplitudes directly to the Lie algebra of the symmetry generators. We focus on theories with enhanced soft limits, where the scattering amplitudes scale with a higher power of momentum than expected based on the mere existence of Adler's zero. Our approach is complementary to that developed recently by Cheung et al., and in the first step we reproduce their result. That is, as far as Lorentz-invariant theories with a single physical NG boson are concerned, we find no other nontrivial theories featuring enhanced soft limits beyond the already well-known ones: the Galileon and the Dirac-Born-Infeld (DBI) scalar. Next, we show that in a certain sense, these theories do not admit a nontrivial generalization to non-Abelian internal symmetries. Namely, for compact internal symmetry groups, all NG bosons featuring enhanced soft limits necessarily belong to the center of the group. For noncompact symmetry groups such as the ISO(nn) group featured by some multi-Galileon theories, these NG bosons then necessarily belong to an Abelian normal subgroup. The Lie-algebraic consistency constraints admit two infinite classes of solutions, generalizing the known multi-Galileon and multi-flavor DBI theories.Comment: 1+48 pages; v2: minor changes and some references added, matches version published in JHE

    Two-mode overconstrained three-DOFs rotational-translational linear-motor-based parallel-kinematics mechanism for machine tool applications

    Full text link
    The paper introduces a family of three-DOFs translational-rotational Parallel-Kinematics Mechanisms (PKMs) as well as the mobility analysis of such family using Lie-group theory. Each member of this family has two-rotational one-translational DOFs. A novel mechanism is presented and analyzed as a representative of that family. The use and the practical value of that modular mechanism are emphasized.<br /
    corecore