Understanding the Frequency Distribution of Mechanically Stable Disk Packings

Author: Corey S. O’Hern
Guo-Jie Gao
Jerzy Bławzdziewicz
W. H. Press
Publication venue: 'American Physical Society (APS)'
Publication date: 08/06/2006

Relative frequencies of mechanically stable (MS) packings of frictionless bidisperse disks are studied numerically in small systems. The packings are created by successively compressing or decompressing a system of soft purely repulsive disks, followed by energy minimization, until only infinitesimal particle overlaps remain. For systems of up to 14 particles most of the MS packings were generated. We find that the packings are not equally probable as has been assumed in recent thermodynamic descriptions of granular systems. Instead, the frequency distribution, averaged over each packing-fraction interval $\Delta \phi$, grows exponentially with increasing $\phi$. Moreover, within each packing-fraction interval MS packings occur with frequencies $f_k$ that differ by many orders of magnitude. Also, key features of the frequency distribution do not change when we significantly alter the packing-generation algorithm: for example, frequent packings remain frequent and rare ones remain rare. These results indicate that the frequency distribution of MS packings is strongly influenced by geometrical properties of the multidimensional configuration space. By adding thermal fluctuations to a set of the MS packings, we were able to examine a number of local features of configuration space near each packing including the time required for a given packing to break to a distinct one, which enabled us to estimate the energy barriers that separate one packing from another. We found a positive correlation between the packing frequencies and the heights of the lowest energy barriers $\epsilon_0$. We also examined displacement fluctuations away from the MS packings to correlate the size and shape of the local basins near each packing to the packing frequencies.

Comment: 21 pages, 20 figures, 1 table

arXiv.org e-Print Archive

Crossref

Flexible constrained sampling with guarantees for pattern mining

Author: A Giacometti
A Zimmermann
C Bucilă
CP Gomes
F Bonchi
Luc De Raedt
M Berlingerio
M Boley
MA Hasan
Matthijs van Leeuwen
S Ermon
S Nijssen
T Calders
T Guns
Vladimir Dzyuba
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Pattern sampling has been proposed as a potential solution to the infamous pattern explosion. Instead of enumerating all patterns that satisfy the constraints, individual patterns are sampled proportional to a given quality measure. Several sampling algorithms have been proposed, but each of them has its limitations when it comes to 1) flexibility in terms of quality measures and constraints that can be used, and/or 2) guarantees with respect to sampling accuracy. We therefore present Flexics, the first flexible pattern sampler that supports a broad class of quality measures and constraints, while providing strong guarantees regarding sampling accuracy. To achieve this, we leverage the perspective on pattern mining as a constraint satisfaction problem and build upon the latest advances in sampling solutions in SAT as well as existing pattern mining algorithms. Furthermore, the proposed algorithm is applicable to a variety of pattern languages, which allows us to introduce and tackle the novel task of sampling sets of patterns. We introduce and empirically evaluate two variants of Flexics: 1) a generic variant that addresses the well-known itemset sampling task and the novel pattern set sampling task as well as a wide range of expressive constraints within these tasks, and 2) a specialized variant that exploits existing frequent itemset techniques to achieve substantial speed-ups. Experiments show that Flexics is both accurate and efficient, making it a useful tool for pattern-based data exploration.

Comment: Accepted for publication in Data Mining & Knowledge Discovery journal (ECML/PKDD 2017 journal track)
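To make the task statement concrete, the following is a minimal sketch of sampling itemsets proportional to a quality measure. It is not the Flexics algorithm: it enumerates every pattern first, which is exactly the cost that pattern samplers aim to avoid, and it is usable only on tiny inputs. The function names, the choice of support as the quality measure, and the toy transactions are all assumptions made for illustration.

```python
import itertools
import random


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def enumerate_patterns(transactions, min_support):
    """Return (itemset, support) for every non-empty itemset that
    satisfies the minimum-support constraint (brute force)."""
    items = sorted(set().union(*transactions))
    patterns = []
    for size in range(1, len(items) + 1):
        for combo in itertools.combinations(items, size):
            candidate = frozenset(combo)
            s = support(candidate, transactions)
            if s >= min_support:
                patterns.append((candidate, s))
    return patterns


def sample_patterns(transactions, min_support, k, seed=0):
    """Draw k itemsets with probability proportional to their support.
    Assumption: support plays the role of the quality measure."""
    patterns = enumerate_patterns(transactions, min_support)
    rng = random.Random(seed)
    itemsets = [p for p, _ in patterns]
    weights = [q for _, q in patterns]
    return rng.choices(itemsets, weights=weights, k=k)


# Hypothetical toy dataset: four transactions over the items a, b, c.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]
patterns = enumerate_patterns(transactions, min_support=2)
samples = sample_patterns(transactions, min_support=2, k=5)
```

With this toy data, five itemsets meet the constraint, and the singletons `a` and `b` (support 3) are each drawn 1.5 times as often as `c`, `ab`, or `ac` (support 2) in expectation.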

arXiv.org e-Print Archive

Crossref

Leiden University Scholary Publications

Reductions for Frequency-Based Data Mining Problems

Author: Miettinen Pauli
Neumann Stefan
Publication date: 01/01/2017

Studying the computational complexity of problems is one of the fundamental questions in computer science, if not the fundamental one. Yet, surprisingly little is known about the computational complexity of many central problems in data mining. In this paper we study frequency-based problems and propose a new type of reduction that allows us to compare the complexities of the maximal frequent pattern mining problems in different domains (e.g. graphs or sequences). Our results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader range of data mining problems. Our results show that, by allowing constraints in the pattern space, the complexities of many maximal frequent pattern mining problems collapse. These problems include maximal frequent subgraphs in labelled graphs, maximal frequent itemsets, and maximal frequent subsequences with no repetitions. In addition to theoretical interest, our results might yield more efficient algorithms for the studied problems.

Comment: This is an extended version of a paper of the same title to appear in the Proceedings of the 17th IEEE International Conference on Data Mining (ICDM'17)
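As a reminder of what one of the problems named above asks for, here is a small brute-force sketch of maximal frequent itemset mining: an itemset is frequent if enough transactions contain it, and maximal if no proper superset is also frequent. This illustrates only the problem definition, not the reductions studied in the paper; the function name and the toy data are hypothetical, and the running time is exponential in the number of items.

```python
from itertools import combinations


def maximal_frequent_itemsets(transactions, min_support):
    """Return the frequent itemsets that have no frequent proper superset."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_support:
                frequent.append(candidate)
    # Keep only itemsets that are not strictly contained in another frequent one.
    return [p for p in frequent if not any(p < q for q in frequent)]


# Hypothetical toy dataset: four transactions over the items a, b, c.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]
maximal = maximal_frequent_itemsets(transactions, min_support=2)
```

On this data the frequent itemsets are `a`, `b`, `c`, `ab`, and `ac`, of which only `ab` and `ac` are maximal.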

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Achieving New Upper Bounds for the Hypergraph Duality Problem through Logic

Author: Eiter T.
Gaur D. R.
Gottlob G.
Hagen M.
Henzinger T. A.
Kavvadias D. J.
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2014

The hypergraph duality problem DUAL is defined as follows: given two simple hypergraphs $\mathcal{G}$ and $\mathcal{H}$, decide whether $\mathcal{H}$ consists precisely of all minimal transversals of $\mathcal{G}$ (in which case we say that $\mathcal{G}$ is the dual of $\mathcal{H}$). This problem is equivalent to deciding whether two given non-redundant monotone DNFs are dual. It is known that non-DUAL, the complementary problem to DUAL, is in $\mathrm{GC}(\log^2 n,\mathrm{PTIME})$, where $\mathrm{GC}(f(n),\mathcal{C})$ denotes the complexity class of all problems that after a nondeterministic guess of $O(f(n))$ bits can be decided (checked) within complexity class $\mathcal{C}$. It was conjectured that non-DUAL is in $\mathrm{GC}(\log^2 n,\mathrm{LOGSPACE})$. In this paper we prove this conjecture and actually place the non-DUAL problem into the complexity class $\mathrm{GC}(\log^2 n,\mathrm{TC}^0)$, which is a subclass of $\mathrm{GC}(\log^2 n,\mathrm{LOGSPACE})$. We here refer to the logtime-uniform version of $\mathrm{TC}^0$, which corresponds to $\mathrm{FO(COUNT)}$, i.e., first order logic augmented by counting quantifiers. We achieve the latter bound in two steps. First, based on existing problem decomposition methods, we develop a new nondeterministic algorithm for non-DUAL that requires guessing $O(\log^2 n)$ bits. We then proceed by a logical analysis of this algorithm, allowing us to formulate its deterministic part in $\mathrm{FO(COUNT)}$. From this result, by the well-known inclusion $\mathrm{TC}^0\subseteq\mathrm{LOGSPACE}$, it follows that DUAL also belongs to $\mathrm{DSPACE}[\log^2 n]$. Finally, by exploiting the principles on which the proposed nondeterministic algorithm is based, we devise a deterministic algorithm that, given two hypergraphs $\mathcal{G}$ and $\mathcal{H}$, computes in quadratic logspace a transversal of $\mathcal{G}$ missing in $\mathcal{H}$.

Comment: Restructured the presentation in order to be the extended version of a paper that will shortly appear in SIAM Journal on Computing
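The definition of DUAL can be checked directly on small instances. The sketch below is an exponential-time brute force that enumerates all minimal transversals of one hypergraph and compares them with the other; it only illustrates the problem statement and has nothing to do with the guess-and-check or logspace algorithms of the paper. The function names and the example hypergraphs are assumptions made for illustration.

```python
from itertools import combinations


def is_transversal(candidate, hypergraph):
    """A transversal intersects every edge of the hypergraph."""
    return all(candidate & edge for edge in hypergraph)


def minimal_transversals(hypergraph):
    """Enumerate all minimal transversals by brute force."""
    vertices = sorted(set().union(*hypergraph))
    found = []
    # Candidates are visited by increasing size, so every minimal
    # transversal strictly contained in a candidate is already in `found`.
    for size in range(1, len(vertices) + 1):
        for combo in combinations(vertices, size):
            candidate = frozenset(combo)
            if is_transversal(candidate, hypergraph) and not any(
                t < candidate for t in found
            ):
                found.append(candidate)
    return set(found)


def is_dual(g, h):
    """Decide whether h consists precisely of the minimal transversals of g."""
    return set(h) == minimal_transversals(g)


# Hypothetical example: edges {1,2} and {2,3}.
G = [frozenset({1, 2}), frozenset({2, 3})]
H = [frozenset({2}), frozenset({1, 3})]
result = is_dual(G, H)
```

Here the minimal transversals of `G` are `{2}` and `{1, 3}`, so `is_dual(G, H)` holds, and because both hypergraphs are simple, `is_dual(H, G)` holds as well.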

arXiv.org e-Print Archive

CiteSeerX

Crossref

Open Research Exeter

Oxford University Research Archive

King's Research Portal

Discovery of the D-basis in binary tables based on hypergraph dualization

Author: Adaricheva Kira
Nation J.B.
Publication date: 01/01/2016

Discovery of (strong) association rules, or implications, is an important task in data management, and it finds application in artificial intelligence, data mining and the semantic web. We introduce a novel approach for the discovery of a specific set of implications, called the D-basis, that provides a representation for a reduced binary table, based on the structure of its Galois lattice. At the core of the method are the D-relation defined in the lattice theory framework, and the hypergraph dualization algorithm that allows us to effectively produce the set of transversals for a given Sperner hypergraph. The latter algorithm, first developed by specialists from Rutgers Center for Operations Research, has already found numerous applications in solving optimization problems in database theory, artificial intelligence and game theory. One application of the method is for analysis of gene expression data related to a particular phenotypic variable, and some initial testing is done for the data provided by the University of Hawaii Cancer Center

Nazarbayev University Repository