Understanding the Frequency Distribution of Mechanically Stable Disk Packings
Relative frequencies of mechanically stable (MS) packings of frictionless
bidisperse disks are studied numerically in small systems. The packings are
created by successively compressing or decompressing a system of soft purely
repulsive disks, followed by energy minimization, until only infinitesimal
particle overlaps remain. For systems of up to 14 particles, most of the MS
packings were generated. We find that the packings are not equally probable,
contrary to the assumption made in recent thermodynamic descriptions of granular
systems. Instead, the frequency distribution, averaged over each packing-fraction
interval, grows exponentially with increasing packing fraction. Moreover,
within each packing-fraction interval, MS packings occur with frequencies
that differ by many orders of magnitude. Also, key features of the frequency
distribution do not change when we significantly alter the packing-generation
algorithm; for example, frequent packings remain frequent and rare ones remain
rare. These results indicate that the frequency distribution of MS packings is
strongly influenced by geometrical properties of the multidimensional
configuration space. By adding thermal fluctuations to a set of the MS
packings, we were able to examine a number of local features of configuration
space near each packing including the time required for a given packing to
break to a distinct one, which enabled us to estimate the energy barriers that
separate one packing from another. We found a positive correlation between the
packing frequencies and the heights of the lowest energy barriers.
We also examined displacement fluctuations away from the MS packings to
correlate the size and shape of the local basins near each packing with the
packing frequencies.
Comment: 21 pages, 20 figures, 1 table
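The energy-minimization step in the packing-generation protocol can be illustrated with a short sketch. It assumes a harmonic repulsive pair potential, V(r) = (ε/2)(1 - r/σ)^2 for r < σ with σ the sum of the two radii, a periodic square box, and plain steepest descent with a fixed step. The abstract does not specify the potential or the minimizer, and the surrounding compress/decompress loop is not shown. All function names are hypothetical.

```python
import numpy as np


def pair_energy_and_forces(pos, radii, box, eps=1.0):
    """Total energy and per-particle forces for soft, purely repulsive disks.

    Assumed potential: V = (eps/2) * (1 - r/sigma)**2 for r < sigma, else 0,
    where sigma = radii[i] + radii[j]. Periodic square box of side `box`.
    """
    n = len(pos)
    energy = 0.0
    forces = np.zeros_like(pos)
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = pos[i] - pos[j]
            d -= box * np.round(d / box)  # minimum-image convention
            r = np.hypot(d[0], d[1])
            sigma = radii[i] + radii[j]
            if 0.0 < r < sigma:  # only overlapping pairs interact
                overlap = 1.0 - r / sigma
                energy += 0.5 * eps * overlap**2
                # -dV/dr = eps * overlap / sigma; the force pushes i away from j
                f = (eps * overlap / sigma) * d / r
                forces[i] += f
                forces[j] -= f
    return energy, forces


def minimize(pos, radii, box, step=0.05, tol=1e-16, max_iter=100_000):
    """Steepest descent (a stand-in, not necessarily the authors' minimizer)
    until the total energy drops below `tol`, i.e. overlaps are ~0."""
    pos = np.array(pos, dtype=float) % box
    energy, forces = pair_energy_and_forces(pos, radii, box)
    it = 0
    while energy >= tol and it < max_iter:
        pos = (pos + step * forces) % box  # move along the force, wrap into box
        energy, forces = pair_energy_and_forces(pos, radii, box)
        it += 1
    return pos, energy
```

For systems of up to 14 particles, the O(N^2) pair loop is cheap. Locating a mechanically stable packing would additionally require repeatedly rescaling the radii and calling `minimize`, which this sketch leaves out.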
Flexible constrained sampling with guarantees for pattern mining
Pattern sampling has been proposed as a potential solution to the infamous
pattern explosion. Instead of enumerating all patterns that satisfy the
constraints, individual patterns are sampled proportional to a given quality
measure. Several sampling algorithms have been proposed, but each of them has
its limitations when it comes to 1) flexibility in terms of quality measures
and constraints that can be used, and/or 2) guarantees with respect to sampling
accuracy. We therefore present Flexics, the first flexible pattern sampler that
supports a broad class of quality measures and constraints, while providing
strong guarantees regarding sampling accuracy. To achieve this, we leverage the
perspective on pattern mining as a constraint satisfaction problem and build
upon the latest advances in sampling solutions in SAT as well as existing
pattern mining algorithms. Furthermore, the proposed algorithm is applicable to
a variety of pattern languages, which allows us to introduce and tackle the
novel task of sampling sets of patterns. We introduce and empirically evaluate
two variants of Flexics: 1) a generic variant that addresses the well-known
itemset sampling task and the novel pattern set sampling task as well as a wide
range of expressive constraints within these tasks, and 2) a specialized
variant that exploits existing frequent itemset techniques to achieve
substantial speed-ups. Experiments show that Flexics is both accurate and
efficient, making it a useful tool for pattern-based data exploration.
Comment: Accepted for publication in Data Mining & Knowledge Discovery journal
(ECML/PKDD 2017 journal track)
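To make the sampling task concrete, the sketch below draws itemsets with probability proportional to a quality measure (frequency by default) among all itemsets that satisfy a minimum-frequency constraint. It is a brute-force baseline that first enumerates every pattern, which is exactly the pattern explosion that sampling is meant to avoid. It is not Flexics, whose SAT-based sampling machinery is not reproduced here. The function names are hypothetical.

```python
import random
from itertools import combinations


def frequency(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def enumerate_patterns(transactions, min_freq):
    """All non-empty itemsets with frequency >= min_freq (exponential in #items)."""
    items = sorted(set().union(*transactions))
    patterns = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            f = frequency(s, transactions)
            if f >= min_freq:
                patterns.append((s, f))
    return patterns


def sample_patterns(transactions, min_freq, n, quality=None, seed=None):
    """Draw n patterns (with replacement), each with probability proportional
    to quality(itemset, frequency); the default quality is the frequency."""
    if quality is None:
        quality = lambda s, f: f
    patterns = enumerate_patterns(transactions, min_freq)
    weights = [quality(s, f) for s, f in patterns]
    rng = random.Random(seed)
    return rng.choices([s for s, _ in patterns], weights=weights, k=n)
```

Swapping in a different `quality` function changes the target distribution without touching the enumeration, which is the kind of flexibility in quality measures the abstract refers to.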
Reductions for Frequency-Based Data Mining Problems
Studying the computational complexity of problems is one of the most
fundamental concerns, if not the most fundamental one, in computer science.
Yet, surprisingly little is known about the computational complexity of many
central problems in data mining. In
this paper we study frequency-based problems and propose a new type of
reduction that allows us to compare the complexities of the maximal frequent
pattern mining problems in different domains (e.g. graphs or sequences). Our
results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader
range of data mining problems. Our results show that, by allowing constraints
in the pattern space, the complexities of many maximal frequent pattern mining
problems collapse. These problems include maximal frequent subgraphs in
labelled graphs, maximal frequent itemsets, and maximal frequent subsequences
with no repetitions. In addition to being of theoretical interest, our results
might yield more efficient algorithms for the studied problems.
Comment: This is an extended version of a paper of the same title to appear in
the Proceedings of the 17th IEEE International Conference on Data Mining
(ICDM'17)
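For readers unfamiliar with the problems being compared, the sketch below computes maximal frequent itemsets by brute force. An itemset is frequent if at least `min_support` transactions contain it, and maximal if no frequent proper superset exists. The same definition carries over to subgraphs or subsequences once the subset relation is replaced by the corresponding sub-pattern relation. This only illustrates the problem definition, not the reductions studied in the paper, and the function name is hypothetical.

```python
from itertools import combinations


def maximal_frequent_itemsets(transactions, min_support):
    """Brute force: enumerate all itemsets, keep the frequent ones, then keep
    those with no frequent proper superset."""
    items = sorted(set().union(*transactions))
    frequent = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            support = sum(1 for t in transactions if s <= t)
            if support >= min_support:
                frequent.append(s)
    # `s < other` is Python's proper-subset test for sets
    return {s for s in frequent if not any(s < other for other in frequent)}


transactions = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("b")]
print(sorted("".join(sorted(s)) for s in maximal_frequent_itemsets(transactions, 2)))
# ['ab', 'ac']
```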
Achieving New Upper Bounds for the Hypergraph Duality Problem through Logic
The hypergraph duality problem DUAL is defined as follows: given two simple
hypergraphs $\mathcal{G}$ and $\mathcal{H}$, decide whether $\mathcal{H}$
consists precisely of all minimal transversals of $\mathcal{G}$ (in which case
we say that $\mathcal{H}$ is the dual of $\mathcal{G}$). This problem is
equivalent to deciding whether two given non-redundant monotone DNFs are dual.
It is known that non-DUAL, the complementary problem to DUAL, is in
$\mathrm{GC}(\log^2 n, \mathrm{PTIME})$, where $\mathrm{GC}(f(n), \mathcal{C})$
denotes the complexity class of all problems that, after a nondeterministic
guess of $O(f(n))$ bits, can be decided (checked) within complexity class
$\mathcal{C}$. It was conjectured that non-DUAL is in
$\mathrm{GC}(\log^2 n, \mathrm{LOGSPACE})$. In this paper we prove this
conjecture and actually place the non-DUAL problem into the complexity class
$\mathrm{GC}(\log^2 n, \mathrm{TC}^0)$, which is a subclass of
$\mathrm{GC}(\log^2 n, \mathrm{LOGSPACE})$. We here refer to the
logtime-uniform version of $\mathrm{TC}^0$, which corresponds to
$\mathrm{FO}(\mathrm{COUNT})$, i.e., first-order logic augmented by counting
quantifiers. We achieve the latter bound in two steps. First, based on existing
problem decomposition methods, we develop a new nondeterministic algorithm for
non-DUAL that requires guessing $O(\log^2 n)$ bits. We then proceed by a
logical analysis of this algorithm, allowing us to formulate its deterministic
part in $\mathrm{FO}(\mathrm{COUNT})$. From this result, by the well-known
inclusion $\mathrm{TC}^0 \subseteq \mathrm{LOGSPACE}$, it follows that DUAL
also belongs to $\mathrm{DSPACE}[\log^2 n]$. Finally, by exploiting the
principles on which the proposed nondeterministic algorithm is based, we devise
a deterministic algorithm that, given two hypergraphs $\mathcal{G}$ and
$\mathcal{H}$, computes in quadratic logspace a transversal of $\mathcal{G}$
missing in $\mathcal{H}$.
Comment: Restructured the presentation in order to be the extended version of
a paper that will shortly appear in SIAM Journal on Computing
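The definition of DUAL can be checked directly, if very inefficiently, by enumerating vertex subsets in order of increasing size and keeping each transversal that contains no previously found one. The sketch below does exactly that. It takes exponential time and bears no relation to the nondeterministic or quadratic-logspace algorithms of the paper. Hyperedges are represented as Python sets, and the function names are hypothetical.

```python
from itertools import combinations


def minimal_transversals(edges):
    """All minimal transversals (hitting sets) of a hypergraph given as a list of sets."""
    vertices = sorted(set().union(*edges))
    found = []
    # Increasing size guarantees every proper subset is examined before its supersets.
    for k in range(len(vertices) + 1):
        for combo in combinations(vertices, k):
            t = frozenset(combo)
            hits_all = all(t & e for e in edges)
            if hits_all and not any(m <= t for m in found):
                found.append(t)
    return set(found)


def is_dual(g_edges, h_edges):
    """DUAL: does H consist precisely of the minimal transversals of G?"""
    return minimal_transversals(g_edges) == {frozenset(e) for e in h_edges}


G = [{1, 2}, {2, 3}]
print(is_dual(G, [{2}, {1, 3}]))   # True
print(is_dual(G, [{2}]))           # False: {1, 3} is missing from H
```

In the second call, `{1, 3}` is a transversal of G missing in H, which is the kind of witness that the deterministic algorithm in the abstract computes.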
Discovery of the D-basis in binary tables based on hypergraph dualization
Discovery of (strong) association rules, or implications, is an important
task in data management, and it finds application in artificial intelligence,
data mining and the semantic web. We introduce a novel approach
for the discovery of a specific set of implications, called the D-basis, that provides
a representation for a reduced binary table, based on the structure of
its Galois lattice. At the core of the method are the D-relation defined in
the lattice theory framework and the hypergraph dualization algorithm that
allows us to effectively produce the set of transversals for a given Sperner hypergraph.
The latter algorithm, first developed by specialists from the Rutgers
Center for Operations Research, has already found numerous applications in
solving optimization problems in database theory, artificial intelligence and
game theory. One application of the method is the analysis of gene expression
data related to a particular phenotypic variable, and some initial testing has been
done on data provided by the University of Hawaii Cancer Center.
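An implication A -> b holds in a binary table exactly when every row that has all attributes of A also has b. Equivalently, b lies in the closure of A under the Galois connection between rows and attributes. The sketch below only checks individual implications this way, with each row given as the set of its 1-valued columns. It does not compute the D-basis, the D-relation, or any hypergraph dualization, and the names are hypothetical.

```python
def closure(attrs, rows, all_attributes):
    """Galois closure: the attributes shared by every row containing all of `attrs`.

    If no row contains `attrs`, the closure is the full attribute set.
    """
    attrs = frozenset(attrs)
    supporting = [row for row in rows if attrs <= row]
    if not supporting:
        return frozenset(all_attributes)
    return frozenset(supporting[0]).intersection(*supporting[1:])


def implication_holds(premise, conclusion, rows, all_attributes):
    """True iff premise -> conclusion is valid in the binary table."""
    return conclusion in closure(premise, rows, all_attributes)


# Each row lists the columns that hold a 1.
attributes = {"a", "b", "c", "d"}
rows = [frozenset("abc"), frozenset("ab"), frozenset("bd")]
print(sorted(closure({"a"}, rows, attributes)))          # ['a', 'b']
print(implication_holds({"a"}, "c", rows, attributes))   # False
```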