Constraint-based Sequential Pattern Mining with Decision Diagrams
Constrained sequential pattern mining aims at identifying frequent patterns
on a sequential database of items while observing constraints defined over the
item attributes. We introduce novel techniques for constraint-based sequential
pattern mining that rely on a multi-valued decision diagram representation of
the database. Specifically, our representation can accommodate multiple item
attributes and various constraint types, including a number of non-monotone
constraints. To evaluate the applicability of our approach, we develop an
MDD-based prefix-projection algorithm and compare its performance against a
typical generate-and-check variant, as well as a state-of-the-art
constraint-based sequential pattern mining algorithm. Results show that our
approach is competitive with or superior to these other methods in terms of
scalability and efficiency.
Comment: AAAI201
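The prefix-projection idea mentioned in the abstract can be illustrated with a minimal sketch. This is a plain PrefixSpan-style miner over sequences of single items; it does not use decision diagrams or item-attribute constraints, and the function name `prefix_span` and the toy database are assumptions made for illustration only.

```python
from collections import defaultdict


def prefix_span(db, min_support):
    """Return {pattern (tuple of items): support} for all frequent patterns.

    db is a list of sequences (lists of hashable items). A pattern is
    supported by a sequence if it occurs in it as a subsequence.
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start offset) pairs: the suffix of
        # each supporting sequence that follows the first match of the prefix.
        counts = defaultdict(int)
        for sid, start in projected:
            for item in set(db[sid][start:]):
                counts[item] += 1
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            new_projected = []
            for sid, start in projected:
                try:
                    pos = db[sid].index(item, start)
                except ValueError:
                    continue  # this sequence does not extend the pattern
                new_projected.append((sid, pos + 1))
            mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(db))])
    return results


# Toy database (hypothetical): three short sequences.
toy_db = [list("abc"), list("acb"), list("ab")]
patterns = prefix_span(toy_db, min_support=2)
```

A generate-and-check variant would instead enumerate candidate patterns and test each against the whole database; the projection step above avoids rescanning prefixes that have already been matched.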
Parameterized Streaming Algorithms for Min-Ones d-SAT
In this work, we initiate the study of the Min-Ones d-SAT problem in the parameterized streaming model. An instance of the problem consists of a d-CNF formula F and an integer k, and the objective is to determine if F has a satisfying assignment which sets at most k variables to 1. In the parameterized streaming model, input is provided as a stream, just as in the usual streaming model. A key difference is that the bound on the read-write memory available to the algorithm is O(f(k) log n) (f: N -> N, a computable function) as opposed to the O(log n) bound of the usual streaming model. The other important difference is that the number of passes the algorithm makes over its input must be a (preferably small) function of k.
We design a (k + 1)-pass parameterized streaming algorithm that solves Min-Ones d-SAT (d >= 2) using space O((kd^(ck) + k^d)log n) (c > 0, a constant) and a (d + 1)^k-pass algorithm that uses space O(k log n). We also design a streaming kernelization for Min-Ones 2-SAT that makes (k + 2) passes and uses space O(k^6 log n) to produce a kernel with O(k^6) clauses.
To complement these positive results, we show that any k-pass algorithm for Min-Ones d-SAT (d >= 2) requires space Omega(max{n^(1/k) / 2^k, log(n / k)}) on instances (F, k). This is achieved via a reduction from the streaming problem POT Pointer Chasing (Guha and McGregor [ICALP 2008]), which might be of independent interest. Given this, our (k + 1)-pass parameterized streaming algorithm is the best possible, inasmuch as the number of passes is concerned.
In contrast to the results of Fafianie and Kratsch [MFCS 2014] and Chitnis et al. [SODA 2015], who independently showed that there are 1-pass parameterized streaming algorithms for Vertex Cover (a restriction of Min-Ones 2-SAT), we show using lower bounds from Communication Complexity that for any d >= 1, a 1-pass streaming algorithm for Min-Ones d-SAT requires space Omega(n). This excludes the possibility of a 1-pass parameterized streaming algorithm for the problem. Additionally, we show that any p-pass algorithm for the problem requires space Omega(n/p).
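To make the problem statement concrete, here is a minimal offline sketch of a bounded search tree for Min-Ones d-SAT. It is not the streaming algorithm from the abstract: it keeps the whole formula in memory, and the clause encoding (signed integers) and the name `min_ones_sat` are assumptions for illustration. Starting from the all-zeros assignment, it repeatedly picks a violated clause and branches on which of its positive literals to set to 1, giving at most d branches per level and depth at most k.

```python
def min_ones_sat(clauses, k):
    """Return a frozenset of variables to set to 1 (size <= k) that satisfies
    every clause, or None if no such assignment exists.

    Each clause is a list of nonzero ints: +v is the literal v, -v is "not v".
    Variables not in the returned set are assigned 0.
    """

    def first_violated(ones):
        for clause in clauses:
            satisfied = any(
                (lit > 0 and lit in ones) or (lit < 0 and -lit not in ones)
                for lit in clause
            )
            if not satisfied:
                return clause
        return None

    def search(ones):
        clause = first_violated(ones)
        if clause is None:
            return ones
        if len(ones) == k:
            return None  # budget of ones exhausted
        # Any extension that satisfies this clause must set one of its
        # positive literals to 1 (its negated variables are already 1).
        for lit in clause:
            if lit > 0:
                result = search(ones | {lit})
                if result is not None:
                    return result
        return None

    return search(frozenset())


# Vertex Cover on the path 1-2-3-4, written as Min-Ones 2-SAT:
# one clause (u or v) per edge.
path_clauses = [[1, 2], [2, 3], [3, 4]]
cover = min_ones_sat(path_clauses, 2)
```

The Vertex Cover example reflects the remark in the abstract that Vertex Cover is a restriction of Min-Ones 2-SAT: every clause has only positive literals.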
Efficient Closed Pattern Mining in the Presence of Tough Block Constraints
In recent years, various constrained frequent pattern mining problem formulations and associated algorithms have been developed that enable the user to specify various itemset-based constraints that better capture the underlying application requirements and characteristics. In this paper we introduce a new class of block constraints that determine the significance of an itemset pattern by considering the dense block that is formed by the pattern's items and its associated set of transactions. Block constraints provide a natural framework by which a number of important problems can be specified and make it possible to solve numerous problems on binary and real-valued datasets. However, developing computationally efficient algorithms to find these block constraints poses a number of challenges as, unlike the different itemset-based constraints studied earlier, these block constraints are tough: they are neither anti-monotone, monotone, nor convertible. To overcome this problem, we introduce a new class of pruning methods that can be used to significantly reduce the overall search space and make it possible to develop computationally efficient block constraint mining algorithms. We present an algorithm called CBMiner that takes advantage of these pruning methods to find the closed itemsets that satisfy the block constraints. Our extensive performance study shows that CBMiner generates a more concise result set and can be order(s) of magnitude faster than the traditional frequent closed itemset mining algorithms.
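A small brute-force sketch can show why such a constraint is tough. It is not CBMiner and uses none of its pruning methods. It assumes one simple reading of a block constraint, namely that the block size (number of items times number of supporting transactions) must reach a threshold; the function name and the toy data are hypothetical.

```python
from itertools import combinations


def closed_itemsets_with_min_block(transactions, min_block_size):
    """Return {closed itemset: block size} for blocks of at least the given size.

    transactions is a list of sets of items. The block of an itemset is the
    itemset together with all transactions containing it; its size is taken
    here to be len(itemset) * len(supporting transactions) (an assumption).
    """
    items = sorted(set().union(*transactions))
    found = {}
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            itemset = frozenset(combo)
            tids = [i for i, t in enumerate(transactions) if itemset <= t]
            if not tids:
                continue
            # An itemset is closed if it equals the intersection of all
            # transactions that contain it.
            closure = frozenset.intersection(
                *(frozenset(transactions[i]) for i in tids)
            )
            if closure != itemset:
                continue
            block = len(itemset) * len(tids)
            if block >= min_block_size:
                found[itemset] = block
    return found


toy_transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
blocks = closed_itemsets_with_min_block(toy_transactions, 4)
```

On this toy data the itemset {a} has block size 3 and fails the threshold, while its superset {a, b} has block size 4 and passes, and the larger superset {a, b, c} has block size 3 and fails again. The constraint is therefore neither anti-monotone nor monotone, which is why simple subset-based or superset-based pruning does not apply.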
The Inverse Shapley Value Problem
For a weighted voting scheme used by voters to choose between two
candidates, the \emph{Shapley-Shubik Indices} (or \emph{Shapley values}) of
the voters provide a measure of how much control each voter can exert over the overall
outcome of the vote. Shapley-Shubik indices were introduced by Lloyd Shapley
and Martin Shubik in 1954 \cite{SS54} and are widely studied in social choice
theory as a measure of the "influence" of voters. The \emph{Inverse Shapley
Value Problem} is the problem of designing a weighted voting scheme which
(approximately) achieves a desired input vector of values for the
Shapley-Shubik indices. Despite much interest in this problem no provably
correct and efficient algorithm was known prior to our work.
We give the first efficient algorithm with provable performance guarantees
for the Inverse Shapley Value Problem. For any constant \eps > 0 our
algorithm runs in fixed \poly(n) time (the degree of the polynomial is
independent of \eps) and has the following performance guarantee: given as
input a vector of desired Shapley values, if any "reasonable" weighted voting
scheme (roughly, one in which the threshold is not too skewed) approximately
matches the desired vector of values to within some small error, then our
algorithm explicitly outputs a weighted voting scheme that achieves this vector
of Shapley values to within error \eps. If there is a "reasonable" voting
scheme in which all voting weights are integers at most \poly(n) that
approximately achieves the desired Shapley values, then our algorithm runs in
time \poly(n) and outputs a weighted voting scheme that achieves the target
vector of Shapley values to within error $\eps = n^{-1/8}$.
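For readers unfamiliar with the forward direction of this problem, the following minimal sketch computes Shapley-Shubik indices of a weighted voting scheme by enumerating all voter orderings and counting how often each voter is pivotal. It is exponential in the number of voters and is unrelated to the inverse algorithm described in the abstract; the function name, the non-negative integer weights, and the quota convention (a coalition wins when its weight reaches the quota) are assumptions for illustration.

```python
from fractions import Fraction
from itertools import permutations


def shapley_shubik(weights, quota):
    """Return the Shapley-Shubik index of each voter as an exact Fraction.

    A voter is pivotal in an ordering if the running weight first reaches
    the quota when that voter is added. Assumes non-negative weights and
    0 < quota <= sum(weights), so every ordering has exactly one pivot.
    """
    n = len(weights)
    pivots = [0] * n
    total_orderings = 0
    for order in permutations(range(n)):
        total_orderings += 1
        running = 0
        for voter in order:
            running += weights[voter]
            if running >= quota:
                pivots[voter] += 1
                break
    return [Fraction(count, total_orderings) for count in pivots]


# Three voters with weights 2, 1, 1 and quota 3.
indices = shapley_shubik([2, 1, 1], 3)
```

In this example the voter with weight 2 holds half of the total weight but has index 2/3, which illustrates that the indices are not proportional to the weights. The inverse problem asks for weights and a threshold that realize a given target vector of indices.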
A Law of Large Numbers for Weighted Majority
Consider an election between two candidates in which the voters' choices are
random and independent and the probability of a voter choosing the first
candidate is $p > 1/2$. Condorcet's Jury Theorem, which he derived from the weak
law of large numbers, asserts that if the number of voters tends to infinity
then the probability that the first candidate will be elected tends to one. The
notion of influence of a voter or its voting power is relevant for extensions
of the weak law of large numbers for voting rules which are more general than
simple majority. In this paper we point out two different ways to extend the
classical notions of voting power and influences to arbitrary probability
distributions. The extension relevant to us is the ``effect'' of a voter, which
is a weighted version of the correlation between the voter's vote and the
election's outcomes. We prove an extension of the weak law of large numbers to
weighted majority games when all individual effects are small and show that
this result does not apply to any voting rule which is not based on weighted
majority.
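The classical statement can be checked numerically. The sketch below covers only simple majority with independent voters who each choose the first candidate with the same probability p, assumed here to exceed 1/2; it does not model the weighted majority games or the "effect" measure studied in the paper. The function name is hypothetical.

```python
from math import comb


def majority_win_probability(n, p):
    """Probability that the first candidate gets a strict majority of n votes
    when each voter independently chooses that candidate with probability p."""
    needed = n // 2 + 1
    return sum(
        comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(needed, n + 1)
    )


# With p = 0.6 the winning probability increases with the (odd) number of voters.
probabilities = [majority_win_probability(n, 0.6) for n in (1, 3, 11, 101, 501)]
```

For n = 3 and p = 0.6 the value is 3(0.6)^2(0.4) + (0.6)^3 = 0.648, already above the single-voter probability of 0.6.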
FPTAS for Counting Monotone CNF
A monotone CNF formula is a Boolean formula in conjunctive normal form where
each variable appears positively. We design a deterministic fully
polynomial-time approximation scheme (FPTAS) for counting the number of
satisfying assignments for a given monotone CNF formula when each variable
appears in at most clauses. Equivalently, this is also an FPTAS for
counting set covers where each set contains at most elements. If we allow
variables to appear in a maximum of clauses (or sets to contain
elements), it is NP-hard to approximate it. Thus, this gives a complete
understanding of the approximability of counting for monotone CNF formulas. It
is also an important step towards a complete characterization of the
approximability for all bounded degree Boolean #CSP problems. In addition, we
study the hypergraph matching problem, which arises naturally towards a
complete classification of bounded degree Boolean #CSP problems, and show an
FPTAS for counting 3D matchings of hypergraphs with maximum degree .
Our main technique is correlation decay, a powerful tool to design
deterministic FPTAS for counting problems defined by local constraints among a
number of variables. All previous uses of this design technique fall into two
categories: each constraint involves at most two variables, such as independent
set, coloring, and spin systems in general; or each variable appears in at most
two constraints, such as matching, edge cover, and holant problem in general.
The CNF problems studied here have more complicated structures than these
problems and require new design and proof techniques. As it turns out, the
technique we developed for the CNF problem also works for the hypergraph
matching problem. We believe that it may also find applications in other CSP or
more general counting problems.
Comment: 24 pages, 2 figures. version 1=>2: minor edits, highlighted the
picture of set cover/packing, and an implication of our previous result in 3D
matchin
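The counting problem in the abstract above, and its equivalence with counting set covers, can be illustrated by exhaustive enumeration. The sketch below is exponential in the number of variables and has nothing to do with the correlation-decay FPTAS; the function names and the index-based encoding are assumptions for illustration. In the correspondence, each set is a variable and each element of the universe is a clause listing the sets that contain it.

```python
from itertools import product


def count_monotone_cnf(num_vars, clauses):
    """Count satisfying assignments of a monotone CNF formula.

    Each clause is a list of variable indices in range(num_vars); all
    literals are positive, so a clause holds if any listed variable is True.
    """
    count = 0
    for assignment in product([False, True], repeat=num_vars):
        if all(any(assignment[v] for v in clause) for clause in clauses):
            count += 1
    return count


def count_set_covers(universe, sets):
    """Count sub-collections of `sets` whose union contains every element
    of `universe`, via the monotone CNF with one clause per element."""
    clauses = [[i for i, s in enumerate(sets) if e in s] for e in universe]
    return count_monotone_cnf(len(sets), clauses)


# (x0 or x1) and (x1 or x2) has 5 satisfying assignments.
cnf_count = count_monotone_cnf(3, [[0, 1], [1, 2]])
# The same instance as set cover: universe {1, 2}, sets {1}, {1, 2}, {2}.
cover_count = count_set_covers([1, 2], [{1}, {1, 2}, {2}])
```

Under this encoding, a bound on how many clauses a variable appears in is the same as a bound on how many elements a set contains, which matches the two phrasings used in the abstract.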