1,921 research outputs found

    Understanding the Frequency Distribution of Mechanically Stable Disk Packings

    Relative frequencies of mechanically stable (MS) packings of frictionless bidisperse disks are studied numerically in small systems. The packings are created by successively compressing or decompressing a system of soft, purely repulsive disks, followed by energy minimization, until only infinitesimal particle overlaps remain. For systems of up to 14 particles, most of the MS packings were generated. We find that the packings are not equally probable, as has been assumed in recent thermodynamic descriptions of granular systems. Instead, the frequency distribution, averaged over each packing-fraction interval $\Delta \phi$, grows exponentially with increasing $\phi$. Moreover, within each packing-fraction interval, MS packings occur with frequencies $f_k$ that differ by many orders of magnitude. Also, key features of the frequency distribution do not change when we significantly alter the packing-generation algorithm; for example, frequent packings remain frequent and rare ones remain rare. These results indicate that the frequency distribution of MS packings is strongly influenced by geometrical properties of the multidimensional configuration space. By adding thermal fluctuations to a set of the MS packings, we were able to examine a number of local features of configuration space near each packing, including the time required for a given packing to break to a distinct one, which enabled us to estimate the energy barriers that separate one packing from another. We found a positive correlation between the packing frequencies and the heights of the lowest energy barriers $\epsilon_0$. We also examined displacement fluctuations away from the MS packings to correlate the size and shape of the local basins near each packing to the packing frequencies. Comment: 21 pages, 20 figures, 1 table

    Flexible constrained sampling with guarantees for pattern mining

    Pattern sampling has been proposed as a potential solution to the infamous pattern explosion. Instead of enumerating all patterns that satisfy the constraints, individual patterns are sampled with probability proportional to a given quality measure. Several sampling algorithms have been proposed, but each of them has its limitations when it comes to 1) flexibility in terms of quality measures and constraints that can be used, and/or 2) guarantees with respect to sampling accuracy. We therefore present Flexics, the first flexible pattern sampler that supports a broad class of quality measures and constraints, while providing strong guarantees regarding sampling accuracy. To achieve this, we leverage the perspective on pattern mining as a constraint satisfaction problem and build upon the latest advances in sampling solutions in SAT as well as existing pattern mining algorithms. Furthermore, the proposed algorithm is applicable to a variety of pattern languages, which allows us to introduce and tackle the novel task of sampling sets of patterns. We introduce and empirically evaluate two variants of Flexics: 1) a generic variant that addresses the well-known itemset sampling task and the novel pattern set sampling task as well as a wide range of expressive constraints within these tasks, and 2) a specialized variant that exploits existing frequent itemset techniques to achieve substantial speed-ups. Experiments show that Flexics is both accurate and efficient, making it a useful tool for pattern-based data exploration. Comment: Accepted for publication in Data Mining & Knowledge Discovery journal (ECML/PKDD 2017 journal track)
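
To make the sampling target concrete, here is a minimal Python sketch of drawing itemsets with probability proportional to a quality measure. It is not Flexics: it enumerates every frequent itemset first, which is exactly the step a real pattern sampler avoids, and it assumes support as the quality measure. The function names `frequent_itemsets` and `sample_patterns` are hypothetical and chosen only for this illustration.

```python
import random
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration: map each frequent itemset to its support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            # Support = number of transactions that contain the itemset.
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                result[itemset] = support
    return result


def sample_patterns(patterns, k, seed=0):
    """Draw k patterns (with replacement), weighted by their quality value."""
    rng = random.Random(seed)
    keys = list(patterns)
    weights = [patterns[p] for p in keys]
    return rng.choices(keys, weights=weights, k=k)


transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
patterns = frequent_itemsets(transactions, min_support=2)
samples = sample_patterns(patterns, k=5)
```

On this toy data the itemset `{a}` has support 3 and the other four frequent itemsets have support 2, so `{a}` is drawn with probability 3/11 per sample.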

    Reductions for Frequency-Based Data Mining Problems

    Studying the computational complexity of problems is one of the - if not the - fundamental questions in computer science. Yet, surprisingly little is known about the computational complexity of many central problems in data mining. In this paper we study frequency-based problems and propose a new type of reduction that allows us to compare the complexities of the maximal frequent pattern mining problems in different domains (e.g. graphs or sequences). Our results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader range of data mining problems. Our results show that, by allowing constraints in the pattern space, the complexities of many maximal frequent pattern mining problems collapse. These problems include maximal frequent subgraphs in labelled graphs, maximal frequent itemsets, and maximal frequent subsequences with no repetitions. In addition to theoretical interest, our results might yield more efficient algorithms for the studied problems. Comment: This is an extended version of a paper of the same title to appear in the Proceedings of the 17th IEEE International Conference on Data Mining (ICDM'17)
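
As a small illustration of one of the problems named above, the following Python sketch computes maximal frequent itemsets by brute force. It assumes the usual definition (an itemset is frequent if at least `min_support` transactions contain it, and maximal if no proper superset is also frequent); the function name is hypothetical, and the exponential enumeration says nothing about the reductions or complexity results of the paper.

```python
from itertools import combinations


def maximal_frequent_itemsets(transactions, min_support):
    """Return the frequent itemsets that have no frequent proper superset."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                frequent.append(itemset)
    # Keep only itemsets that are not strictly contained in another frequent one.
    return [s for s in frequent if not any(s < other for other in frequent)]


transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
maximal = maximal_frequent_itemsets(transactions, min_support=2)
```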

    Achieving New Upper Bounds for the Hypergraph Duality Problem through Logic

    The hypergraph duality problem DUAL is defined as follows: given two simple hypergraphs $\mathcal{G}$ and $\mathcal{H}$, decide whether $\mathcal{H}$ consists precisely of all minimal transversals of $\mathcal{G}$ (in which case we say that $\mathcal{G}$ is the dual of $\mathcal{H}$). This problem is equivalent to deciding whether two given non-redundant monotone DNFs are dual. It is known that non-DUAL, the complementary problem to DUAL, is in $\mathrm{GC}(\log^2 n,\mathrm{PTIME})$, where $\mathrm{GC}(f(n),\mathcal{C})$ denotes the complexity class of all problems that, after a nondeterministic guess of $O(f(n))$ bits, can be decided (checked) within complexity class $\mathcal{C}$. It was conjectured that non-DUAL is in $\mathrm{GC}(\log^2 n,\mathrm{LOGSPACE})$. In this paper we prove this conjecture and actually place the non-DUAL problem into the complexity class $\mathrm{GC}(\log^2 n,\mathrm{TC}^0)$, which is a subclass of $\mathrm{GC}(\log^2 n,\mathrm{LOGSPACE})$. We here refer to the logtime-uniform version of $\mathrm{TC}^0$, which corresponds to $\mathrm{FO(COUNT)}$, i.e., first-order logic augmented by counting quantifiers. We achieve the latter bound in two steps. First, based on existing problem decomposition methods, we develop a new nondeterministic algorithm for non-DUAL that requires guessing $O(\log^2 n)$ bits. We then proceed by a logical analysis of this algorithm, allowing us to formulate its deterministic part in $\mathrm{FO(COUNT)}$. From this result, by the well-known inclusion $\mathrm{TC}^0\subseteq\mathrm{LOGSPACE}$, it follows that DUAL belongs also to $\mathrm{DSPACE}[\log^2 n]$. Finally, by exploiting the principles on which the proposed nondeterministic algorithm is based, we devise a deterministic algorithm that, given two hypergraphs $\mathcal{G}$ and $\mathcal{H}$, computes in quadratic logspace a transversal of $\mathcal{G}$ missing in $\mathcal{H}$. Comment: Restructured the presentation in order to be the extended version of a paper that will shortly appear in SIAM Journal on Computing
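
The definition of DUAL can be checked directly on tiny inputs. The Python sketch below is a brute-force illustration only: it enumerates all vertex subsets, so it runs in exponential time and has nothing to do with the guess-and-check or logspace algorithms of the paper. Hypergraphs are assumed to be given as lists of sets, and the function names are hypothetical.

```python
from itertools import combinations


def minimal_transversals(hypergraph):
    """Return all minimal vertex sets that intersect every edge."""
    vertices = sorted(set().union(*hypergraph))
    found = []
    # Enumerate candidates by increasing size, so every smaller minimal
    # transversal is already in `found` when a candidate is tested.
    for size in range(len(vertices) + 1):
        for candidate in combinations(vertices, size):
            s = frozenset(candidate)
            hits_all = all(s & edge for edge in hypergraph)
            if hits_all and not any(t <= s for t in found):
                found.append(s)
    return set(found)


def is_dual(g, h):
    """True if h consists precisely of the minimal transversals of g."""
    return minimal_transversals(g) == {frozenset(edge) for edge in h}


def missing_transversal(g, h):
    """Return a minimal transversal of g that is not an edge of h, or None."""
    missing = minimal_transversals(g) - {frozenset(edge) for edge in h}
    return min(missing, key=sorted) if missing else None


g = [{1, 2}, {2, 3}]
h = [{2}, {1, 3}]
```

For the example `g`, the minimal transversals are `{2}` and `{1, 3}`, so `is_dual(g, h)` holds, while dropping `{1, 3}` from `h` makes `missing_transversal` report it.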

    Discovery of the D-basis in binary tables based on hypergraph dualization

    Discovery of (strong) association rules, or implications, is an important task in data management, and it finds application in artificial intelligence, data mining and the semantic web. We introduce a novel approach for the discovery of a specific set of implications, called the D-basis, that provides a representation for a reduced binary table, based on the structure of its Galois lattice. At the core of the method are the D-relation defined in the lattice theory framework, and the hypergraph dualization algorithm that allows us to effectively produce the set of transversals for a given Sperner hypergraph. The latter algorithm, first developed by specialists from Rutgers Center for Operations Research, has already found numerous applications in solving optimization problems in data base theory, artificial intelligence and game theory. One application of the method is for analysis of gene expression data related to a particular phenotypic variable, and some initial testing is done for the data provided by the University of Hawaii Cancer Center