    Beyond Hypergraph Dualization

    This problem concerns hypergraph dualization and its generalization to poset dualization. A hypergraph H = (V, E) consists of a finite collection E of sets over a finite set V, i.e. E ⊆ P(V) (the powerset of V). The elements of E are called hyperedges, or simply edges. A hypergraph is said to be simple if none of its edges is contained in another. A transversal (or hitting set) of H is a set T ⊆ V that intersects every edge of E. A transversal is minimal if it does not contain any other transversal as a subset. The set of all minimal transversals of H is denoted by Tr(H), and the hypergraph (V, Tr(H)) is called the transversal hypergraph of H. Given a simple hypergraph H, the hypergraph dualization problem (Trans-Enum for short) asks for the enumeration, without repetition, of Tr(H). The Trans-Enum problem can also be formulated as a dualization problem in posets. Let (P, ≤) be a poset, i.e. ≤ is a reflexive, antisymmetric, and transitive relation on the set P. For A ⊆ P, ↓A (resp. ↑A) denotes the downward (resp. upward) closure of A under ≤ (so ↓A is an ideal and ↑A a filter of (P, ≤)). Two antichains (B+, B−) of P are said to be dual if ↓B+ ∪ ↑B− = P and ↓B+ ∩ ↑B− = ∅. Given an implicit description of a poset P and an antichain B+ (resp. B−) of P, the poset dualization problem (Dual-Enum for short) asks to enumerate the set B− (resp. B+), denoted by Dual(B+) = B− (resp. Dual(B−) = B+). Notice that Dual is an involution, i.e. Dual(Dual(B)) = B.
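    As a concrete illustration of these definitions (not of any Trans-Enum algorithm), the following brute-force sketch computes Tr(H) for a toy hypergraph by scanning the whole powerset; real algorithms avoid this exponential scan.

```python
from itertools import chain, combinations

def powerset(v):
    """All subsets of v, in order of increasing size."""
    return chain.from_iterable(combinations(v, r) for r in range(len(v) + 1))

def minimal_transversals(vertices, edges):
    """Tr(H): the inclusion-minimal sets intersecting every edge."""
    transversals = [set(t) for t in powerset(vertices)
                    if all(set(t) & e for e in edges)]
    return [t for t in transversals
            if not any(s < t for s in transversals)]

# H with V = {1, 2, 3} and E = {{1, 2}, {2, 3}}: Tr(H) is [{2}, {1, 3}]
print(minimal_transversals([1, 2, 3], [{1, 2}, {2, 3}]))
```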

    Neighborhood Inclusions for Minimal Dominating Sets Enumeration: Linear and Polynomial Delay Algorithms in P_7-Free and P_8-Free Chordal Graphs

    In [M. M. Kanté, V. Limouzy, A. Mary, and L. Nourine. On the enumeration of minimal dominating sets and related notions. SIAM Journal on Discrete Mathematics, 28(4):1916-1929, 2014] the authors give an O(n+m)-delay algorithm based on neighborhood inclusions for the enumeration of minimal dominating sets in split and P_6-free chordal graphs. In this paper, we investigate generalizations of this technique to P_k-free chordal graphs for larger integers k. In particular, we give O(n+m)- and O(n^3·m)-delay algorithms in the classes of P_7-free and P_8-free chordal graphs. As for P_k-free chordal graphs for k ≥ 9, we give evidence that such a technique is inefficient, as a key step of the algorithm, namely the irredundant extension problem, becomes NP-complete. Comment: 16 pages, 3 figures.
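    To fix the object being enumerated (this is not the paper's neighborhood-inclusion technique, just an exponential-time illustration on a hypothetical toy input), here is a brute-force sketch listing all minimal dominating sets of a graph given as an adjacency-set dictionary.

```python
from itertools import chain, combinations

def closed_neighborhood(graph, s):
    """N[S]: the vertices of S together with all their neighbors."""
    return set(s).union(*(graph[v] for v in s))

def minimal_dominating_sets(graph):
    """All inclusion-minimal sets D with N[D] = V, by exhaustive scan."""
    vertices = list(graph)
    subsets = chain.from_iterable(
        combinations(vertices, r) for r in range(1, len(vertices) + 1))
    dominating = [set(d) for d in subsets
                  if closed_neighborhood(graph, d) == set(vertices)]
    return [d for d in dominating if not any(s < d for s in dominating)]

# P_4, the path 1-2-3-4: minimal dominating sets are {1,3}, {1,4}, {2,3}, {2,4}
path4 = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(minimal_dominating_sets(path4))
```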

    Extended Dualization: Application to Maximal Pattern Mining

    The dualization problem in arbitrary posets is a well-studied problem in combinatorial enumeration and is a crucial step in many applications in logics, databases, artificial intelligence, and pattern mining. The objective of this paper is to study reductions of the dualization problem on arbitrary posets to the dualization problem on boolean lattices, for which output quasi-polynomial time algorithms exist. Quasi-polynomial time algorithms are algorithms which run in time n^{o(log n)}, where n is the combined size of the input and output. We introduce convex embedding and poset reflection as key notions to characterize such reductions. As a consequence, we identify posets which are not boolean lattices but for which the dualization problem remains solvable in quasi-polynomial time, and we propose a classification of posets with respect to dualization. From these results, we study how they can be applied to maximal pattern mining problems. We deduce a new classification of pattern mining problems and point out how known problems involving sequence and conjunctive-query patterns fit into this classification. Finally, we explain how to adapt the seminal Dualize & Advance algorithm to deal with such patterns. As far as we know, this is the first contribution to make explicit non-trivial reductions for studying the hardness of maximal pattern mining problems and to extend the Dualize & Advance algorithm to complex patterns.
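    The core loop of Dualize & Advance (sketched here for plain itemsets, not the extended patterns of this paper) alternates maximization with dualization. In the sketch below the predicate name `interesting` and the greedy `maximize` helper are illustrative stand-ins for any anti-monotone interestingness criterion such as frequency, and the dualization step is done by brute force rather than by a quasi-polynomial algorithm.

```python
from itertools import chain, combinations

def minimal_transversals(items, edges):
    """Minimal hitting sets of `edges`, by exhaustive scan."""
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    hitting = [set(t) for t in subsets if all(set(t) & e for e in edges)]
    return [t for t in hitting if not any(s < t for s in hitting)]

def maximize(pattern, items, interesting):
    """Greedily extend an interesting pattern to a maximal one."""
    for x in items:
        if x not in pattern and interesting(pattern | {x}):
            pattern = pattern | {x}
    return pattern

def dualize_and_advance(items, interesting):
    maximal = []  # maximal interesting patterns found so far
    while True:
        # Minimal patterns lying below no maximal pattern found so far
        # = minimal transversals of the complements of those patterns.
        candidates = minimal_transversals(items, [items - m for m in maximal])
        fresh = [c for c in candidates if interesting(c)]
        if not fresh:
            return maximal  # every interesting pattern is now covered
        maximal.append(maximize(fresh[0], items, interesting))

# Toy anti-monotone predicate: sets of size <= 2 not containing both 1 and 2.
items = frozenset({1, 2, 3})
print(dualize_and_advance(items, lambda p: len(p) <= 2 and not {1, 2} <= p))
```

    When no minimal uncovered pattern is interesting, anti-monotonicity guarantees that every interesting pattern lies below a found maximal one, which is what terminates the loop.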

    Dualization on Partially Ordered Sets: Preliminary Results

    The dualization problem on arbitrary posets is a crucial step in many applications in logics, databases, artificial intelligence, and pattern mining. The objective of this paper is to study reductions of the dualization problem on arbitrary posets to the dualization problem on boolean lattices, for which output quasi-polynomial time algorithms exist. We introduce convex embedding and poset reflection as key notions to characterize such reductions. As a consequence, we identify posets which are not boolean lattices but for which the dualization problem remains solvable in quasi-polynomial time, and we propose a classification of posets with respect to dualization. As far as we know, this is the first contribution to make explicit non-trivial reductions for studying the hardness of dualization problems on arbitrary posets.
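    For concreteness, the boolean-lattice target of these reductions is exactly hypergraph dualization; the following standard correspondence (stated here for illustration, not taken from the paper) makes that explicit in the notation of the first abstract above.

```latex
% Trans-Enum as Dual-Enum on the boolean lattice (standard observation).
Let $H=(V,E)$ be a simple hypergraph and take $P=(2^V,\subseteq)$ with
$B^- = E$. Then $\uparrow B^-$ is the family of sets containing some edge,
$\downarrow B^+$ is its complement (the sets containing no edge), and
\[
  B^+ \;=\; \mathrm{Dual}(E) \;=\; \{\, V \setminus T \;:\; T \in Tr(H) \,\},
\]
since $X$ contains no edge if and only if $V \setminus X$ is a transversal.
Hence enumerating $Tr(H)$ and dualizing $E$ in $(2^V,\subseteq)$ coincide.
```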

    The Efficient Discovery of Interesting Closed Pattern Collections

    Enumerating closed sets that are frequent in a given database is a fundamental data mining technique that is used, e.g., in the context of market basket analysis, fraud detection, or Web personalization. There are two complementary reasons for the importance of closed sets, one semantic and one algorithmic: closed sets provide a condensed basis for non-redundant collections of interesting local patterns, and they can be enumerated efficiently. For many databases, however, even the closed set collection can be far too large for further usage, and correspondingly its computation time can be infeasibly long. In such cases, it is inevitable to focus on smaller collections of closed sets, and it is essential that these collections retain both properties: controlled semantics reflecting some notion of interestingness, and efficient enumerability. This thesis discusses three different approaches to achieve this: constraint-based closed set extraction, pruning by quantifying the degree or strength of closedness, and controlled random generation of closed sets instead of exhaustive enumeration.

    For the original closed set family, efficient enumerability results from the fact that there is an inducing, efficiently computable closure operator and that its fixpoints can be enumerated with an amortized polynomial number of closure computations. Perhaps surprisingly, it turns out that this connection does not generally hold for other constraint combinations, as the restricted domains induced by additional constraints can cause two things to happen: the fixpoints of the closure operator cannot be enumerated efficiently, or an inducing closure operator does not even exist. This thesis gives, for the first time, a formal axiomatic characterization of the constraint classes that allow efficient enumeration of the fixpoints of arbitrary closure operators, as well as of the constraint classes that guarantee the existence of a closure operator inducing the closed sets.

    As a complementary approach, the thesis generalizes the notion of closedness by quantifying its strength, i.e., the difference in supporting database records between a closed set and all its supersets. This gives rise to a measure of interestingness that is able to select long, and thus particularly informative, closed sets that are robust against noise and dynamic changes. Moreover, this measure is algorithmically sound because all closed sets with a minimum strength again form a closure system that can be enumerated efficiently and that directly ties into the results on constraint-based closed sets. In fact, both approaches can easily be combined.

    In some applications, however, the resulting set of constrained closed sets is still intractably large, or it is too difficult to find meaningful hard constraints at all (including values for their parameters). Therefore, the last part of this thesis presents an alternative algorithmic paradigm for the extraction of closed sets: instead of exhaustively listing a potentially exponential number of sets, randomly generate exactly the desired number of them. By using the Markov chain Monte Carlo method, this generation can be performed according to any desired probability distribution that favors interesting patterns. This novel randomized approach complements traditional enumeration techniques (including those mentioned above): on the one hand, it is only applicable in scenarios that do not require deterministic guarantees for the output, such as exploratory data analysis or global model construction.
    On the other hand, random closed set generation provides complete control over the number as well as the distribution of the produced sets.
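    For the classical itemset setting, the inducing closure operator mentioned above is the map sending a set to the intersection of all database records that support it; its fixpoints are exactly the closed sets. A minimal sketch on a toy database follows, collecting the fixpoints by brute force over the powerset rather than by the amortized-polynomial enumeration the thesis builds on.

```python
from itertools import chain, combinations

# Toy transaction database over the items {1, 2, 3}.
DB = [frozenset(t) for t in ({1, 2, 3}, {1, 2}, {2, 3}, {1, 2, 3})]
ITEMS = frozenset().union(*DB)

def clo(x):
    """Closure of x: intersect all supporting records (extensive, monotone, idempotent)."""
    support = [t for t in DB if x <= t]
    return frozenset.intersection(*support) if support else ITEMS

# The closed sets are exactly the fixpoints of clo.
closed = {clo(frozenset(c))
          for c in chain.from_iterable(
              combinations(ITEMS, r) for r in range(len(ITEMS) + 1))}
print(sorted(sorted(c) for c in closed))  # [[1, 2], [1, 2, 3], [2], [2, 3]]
```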

    On Maximal Cliques with Connectivity Constraints in Directed Graphs

    Finding communities in the form of cohesive subgraphs is a fundamental problem in network analysis. In domains that model networks as undirected graphs, communities are generally associated with dense subgraphs, and many community models have been proposed. Maximal cliques are arguably the most widely studied among such models, with early works dating back to the 1960s and a continuous stream of research up to the present. In domains that model networks as directed graphs, several approaches for community detection have been proposed, but there seems to be no clear model of cohesive subgraph, i.e., of what a community should look like. We extend the fundamental model of clique to directed graphs, adding the natural constraint of strong connectivity within the clique. We characterize the problem by giving a tight bound for the number of such cliques in a graph and by highlighting useful structural properties. We then exploit these properties to produce the first polynomial-delay algorithm for enumerating maximal strongly connected cliques.
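    To make the object concrete, here is a brute-force sketch on a toy digraph, under one natural reading of the model that is an assumption here: every pair of clique vertices is joined by an arc in at least one direction, and the induced subgraph is strongly connected. The paper's contribution is the polynomial-delay algorithm, not this exponential scan.

```python
from itertools import combinations

def reachable(start, nodes, arcs):
    """Vertices of `nodes` reachable from `start` using arcs inside `nodes`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for (a, b) in arcs:
            if a == u and b in nodes and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def is_scc_clique(s, arcs):
    """Pairwise adjacent in at least one direction, and strongly connected."""
    pairwise = all((u, v) in arcs or (v, u) in arcs
                   for u, v in combinations(s, 2))
    v0 = next(iter(s))
    back = {(b, a) for (a, b) in arcs}  # reachability in the reverse graph
    return pairwise and reachable(v0, s, arcs) == s and reachable(v0, s, back) == s

def maximal_scc_cliques(vertices, arcs):
    subs = [set(c) for r in range(1, len(vertices) + 1)
            for c in combinations(vertices, r) if is_scc_clique(set(c), arcs)]
    return [s for s in subs if not any(s < t for t in subs)]

# Cycle 1 -> 2 -> 3 -> 1 plus the chord 1 -> 3: the unique answer is {1, 2, 3}.
print(maximal_scc_cliques([1, 2, 3], {(1, 2), (2, 3), (3, 1), (1, 3)}))
```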