9,589 research outputs found

    Constraint-based Sequential Pattern Mining with Decision Diagrams

    Constrained sequential pattern mining aims at identifying frequent patterns on a sequential database of items while observing constraints defined over the item attributes. We introduce novel techniques for constraint-based sequential pattern mining that rely on a multi-valued decision diagram representation of the database. Specifically, our representation can accommodate multiple item attributes and various constraint types, including a number of non-monotone constraints. To evaluate the applicability of our approach, we develop an MDD-based prefix-projection algorithm and compare its performance against a typical generate-and-check variant, as well as a state-of-the-art constraint-based sequential pattern mining algorithm. Results show that our approach is competitive with or superior to these other methods in terms of scalability and efficiency. Comment: AAAI201

    Parameterized Streaming Algorithms for Min-Ones d-SAT

    In this work, we initiate the study of the Min-Ones d-SAT problem in the parameterized streaming model. An instance of the problem consists of a d-CNF formula F and an integer k, and the objective is to determine if F has a satisfying assignment which sets at most k variables to 1. In the parameterized streaming model, input is provided as a stream, just as in the usual streaming model. A key difference is that the bound on the read-write memory available to the algorithm is O(f(k) log n) (f: N -> N, a computable function) as opposed to the O(log n) bound of the usual streaming model. The other important difference is that the number of passes the algorithm makes over its input must be a (preferably small) function of k. We design a (k + 1)-pass parameterized streaming algorithm that solves Min-Ones d-SAT (d >= 2) using space O((kd^(ck) + k^d) log n) (c > 0, a constant) and a (d + 1)^k-pass algorithm that uses space O(k log n). We also design a streaming kernelization for Min-Ones 2-SAT that makes (k + 2) passes and uses space O(k^6 log n) to produce a kernel with O(k^6) clauses. To complement these positive results, we show that any k-pass algorithm for Min-Ones d-SAT (d >= 2) requires space Omega(max{n^(1/k) / 2^k, log(n / k)}) on instances (F, k). This is achieved via a reduction from the streaming problem POT Pointer Chasing (Guha and McGregor [ICALP 2008]), which might be of independent interest. Given this, our (k + 1)-pass parameterized streaming algorithm is the best possible, inasmuch as the number of passes is concerned. In contrast to the results of Fafianie and Kratsch [MFCS 2014] and Chitnis et al. [SODA 2015], who independently showed that there are 1-pass parameterized streaming algorithms for Vertex Cover (a restriction of Min-Ones 2-SAT), we show using lower bounds from Communication Complexity that for any d >= 1, a 1-pass streaming algorithm for Min-Ones d-SAT requires space Omega(n). This excludes the possibility of a 1-pass parameterized streaming algorithm for the problem. Additionally, we show that any p-pass algorithm for the problem requires space Omega(n/p).
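    The problem definition in this abstract can be illustrated with a minimal offline sketch. It is a textbook bounded search tree (branching factor at most d, depth at most k), not the streaming algorithms of the paper, and it assumes the whole formula fits in memory. The function name `min_ones_sat` and the clause encoding (positive integer v for variable v, negative integer for its negation) are assumptions made for this example.

```python
from typing import FrozenSet, Optional, Sequence, Tuple

Clause = Tuple[int, ...]  # +v is the literal x_v, -v is the literal NOT x_v


def min_ones_sat(
    clauses: Sequence[Clause], k: int, ones: FrozenSet[int] = frozenset()
) -> Optional[FrozenSet[int]]:
    """Return a set of at most k variables to set to 1 that satisfies every
    clause (all other variables are 0), or None if no such set exists."""
    for clause in clauses:
        satisfied = any(
            (lit > 0 and lit in ones) or (lit < 0 and -lit not in ones)
            for lit in clause
        )
        if satisfied:
            continue
        # The clause is false under the current assignment. Any extension that
        # keeps the current ones must turn on one of its positive literals.
        if len(ones) >= k:
            return None  # budget of k ones is exhausted
        for lit in clause:
            if lit > 0:
                result = min_ones_sat(clauses, k, ones | {lit})
                if result is not None:
                    return result
        return None  # no positive literal could repair this clause
    return ones  # every clause is satisfied


# Vertex Cover on a triangle, written as Min-Ones 2-SAT: one clause per edge.
triangle = [(1, 2), (2, 3), (1, 3)]
print(min_ones_sat(triangle, 1))  # no cover of size 1 exists
print(min_ones_sat(triangle, 2))  # a cover of size 2 exists
```

    The triangle example mirrors the abstract's remark that Vertex Cover is a restriction of Min-Ones 2-SAT: each edge becomes a clause with two positive literals.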

    Efficient Closed Pattern Mining in the Presence of Tough Block Constraints

    In recent years, various constrained frequent pattern mining problem formulations and associated algorithms have been developed that enable the user to specify various itemset-based constraints that better capture the underlying application requirements and characteristics. In this paper we introduce a new class of block constraints that determine the significance of an itemset pattern by considering the dense block that is formed by the pattern's items and its associated set of transactions. Block constraints provide a natural framework by which a number of important problems can be specified and make it possible to solve numerous problems on binary and real-valued datasets. However, developing computationally efficient algorithms to find these block constraints poses a number of challenges because, unlike the different itemset-based constraints studied earlier, these block constraints are tough: they are neither anti-monotone, monotone, nor convertible. To overcome this problem, we introduce a new class of pruning methods that can be used to significantly reduce the overall search space and make it possible to develop computationally efficient block constraint mining algorithms. We present an algorithm called CBMiner that takes advantage of these pruning methods to find the closed itemsets that satisfy the block constraints. Our extensive performance study shows that CBMiner generates a more concise result set and can be order(s) of magnitude faster than the traditional frequent closed itemset mining algorithms.

    The Inverse Shapley Value Problem

    For f a weighted voting scheme used by n voters to choose between two candidates, the n Shapley-Shubik indices (or Shapley values) of f provide a measure of how much control each voter can exert over the overall outcome of the vote. Shapley-Shubik indices were introduced by Lloyd Shapley and Martin Shubik in 1954 [SS54] and are widely studied in social choice theory as a measure of the "influence" of voters. The Inverse Shapley Value Problem is the problem of designing a weighted voting scheme which (approximately) achieves a desired input vector of values for the Shapley-Shubik indices. Despite much interest in this problem no provably correct and efficient algorithm was known prior to our work. We give the first efficient algorithm with provable performance guarantees for the Inverse Shapley Value Problem. For any constant ε > 0 our algorithm runs in fixed poly(n) time (the degree of the polynomial is independent of ε) and has the following performance guarantee: given as input a vector of desired Shapley values, if any "reasonable" weighted voting scheme (roughly, one in which the threshold is not too skewed) approximately matches the desired vector of values to within some small error, then our algorithm explicitly outputs a weighted voting scheme that achieves this vector of Shapley values to within error ε. If there is a "reasonable" voting scheme in which all voting weights are integers at most poly(n) that approximately achieves the desired Shapley values, then our algorithm runs in time poly(n) and outputs a weighted voting scheme that achieves the target vector of Shapley values to within error ε = n^(-1/8).
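    For orientation, the forward direction (weights in, indices out) can be sketched by brute force. The sketch below uses the standard pivotal-voter definition of the Shapley-Shubik index and enumerates all n! orderings, so it is only usable for tiny n; it is not the inverse algorithm described in the abstract. The function name `shapley_shubik` and the weights-plus-quota encoding of the voting scheme are assumptions made for this example.

```python
from fractions import Fraction
from itertools import permutations
from typing import List, Sequence


def shapley_shubik(weights: Sequence[int], quota: int) -> List[Fraction]:
    """Shapley-Shubik index of each voter in the weighted voting scheme that
    passes a motion when the total weight of 'yes' voters reaches the quota."""
    n = len(weights)
    pivot_counts = [0] * n
    orderings = 0
    for order in permutations(range(n)):
        orderings += 1
        running = 0
        for voter in order:
            running += weights[voter]
            if running >= quota:
                # This voter turns the growing coalition from losing to winning.
                pivot_counts[voter] += 1
                break
    return [Fraction(count, orderings) for count in pivot_counts]


# Three voters with weights 2, 1, 1 and quota 3.
print(shapley_shubik([2, 1, 1], 3))
```

    In this small example the voter with weight 2 holds half of the total weight but is pivotal in four of the six orderings, which shows why recovering weights from target indices is not a simple rescaling.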

    A Law of Large Numbers for Weighted Majority

    Consider an election between two candidates in which the voters' choices are random and independent and the probability of a voter choosing the first candidate is p > 1/2. Condorcet's Jury Theorem, which he derived from the weak law of large numbers, asserts that if the number of voters tends to infinity then the probability that the first candidate will be elected tends to one. The notion of influence of a voter or its voting power is relevant for extensions of the weak law of large numbers for voting rules which are more general than simple majority. In this paper we point out two different ways to extend the classical notions of voting power and influences to arbitrary probability distributions. The extension relevant to us is the "effect" of a voter, which is a weighted version of the correlation between the voter's vote and the election's outcomes. We prove an extension of the weak law of large numbers to weighted majority games when all individual effects are small and show that this result does not apply to any voting rule which is not based on weighted majority.
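    The simple-majority case of Condorcet's Jury Theorem can be checked numerically. The sketch below computes the exact probability that the first candidate wins a strict majority among n independent voters; it covers only simple majority, not the weighted majority games treated in the paper. The function name `majority_win_probability` is an assumption made for this example.

```python
from fractions import Fraction
from math import comb


def majority_win_probability(n: int, p: Fraction) -> Fraction:
    """Probability that strictly more than half of n independent voters,
    each choosing the first candidate with probability p, vote for it."""
    needed = n // 2 + 1  # smallest number of votes that is a strict majority
    return sum(
        comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(needed, n + 1)
    )


p = Fraction(3, 5)
for n in (1, 3, 5, 101):
    print(n, float(majority_win_probability(n, p)))
```

    With p = 3/5 the printed probabilities increase with the number of voters, in line with the limit stated in the theorem.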

    FPTAS for Counting Monotone CNF

    A monotone CNF formula is a Boolean formula in conjunctive normal form where each variable appears positively. We design a deterministic fully polynomial-time approximation scheme (FPTAS) for counting the number of satisfying assignments for a given monotone CNF formula when each variable appears in at most 5 clauses. Equivalently, this is also an FPTAS for counting set covers where each set contains at most 5 elements. If we allow variables to appear in a maximum of 6 clauses (or sets to contain 6 elements), it is NP-hard to approximate it. Thus, this gives a complete understanding of the approximability of counting for monotone CNF formulas. It is also an important step towards a complete characterization of the approximability for all bounded degree Boolean #CSP problems. In addition, we study the hypergraph matching problem, which arises naturally towards a complete classification of bounded degree Boolean #CSP problems, and show an FPTAS for counting 3D matchings of hypergraphs with maximum degree 4. Our main technique is correlation decay, a powerful tool to design deterministic FPTAS for counting problems defined by local constraints among a number of variables. All previous uses of this design technique fall into two categories: each constraint involves at most two variables, such as independent set, coloring, and spin systems in general; or each variable appears in at most two constraints, such as matching, edge cover, and holant problem in general. The CNF problems studied here have more complicated structures than these problems and require new design and proof techniques. As it turns out, the technique we developed for the CNF problem also works for the hypergraph matching problem. We believe that it may also find applications in other CSP or more general counting problems. Comment: 24 pages, 2 figures. version 1=>2: minor edits, highlighted the picture of set cover/packing, and an implication of our previous result in 3D matchin
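    The quantity being approximated can be made concrete with an exact brute-force counter. The sketch below enumerates all 2^n assignments, so it is exponential and serves only as a reference on tiny instances; it is not the correlation-decay FPTAS of the paper. The function name `count_monotone_cnf` and the encoding of a clause as a tuple of 0-based variable indices are assumptions made for this example.

```python
from itertools import product
from typing import Sequence, Tuple


def count_monotone_cnf(num_vars: int, clauses: Sequence[Tuple[int, ...]]) -> int:
    """Count assignments satisfying a monotone CNF formula, where each clause
    is a tuple of variable indices that all appear positively."""
    count = 0
    for bits in product((False, True), repeat=num_vars):
        if all(any(bits[v] for v in clause) for clause in clauses):
            count += 1
    return count


# (x0 OR x1) AND (x1 OR x2). Read as set cover: variables are sets, clauses
# are elements, and a clause lists the sets that contain that element.
print(count_monotone_cnf(3, [(0, 1), (1, 2)]))  # → 5
```

    Under the set-cover reading, a variable that appears in at most 5 clauses corresponds to a set with at most 5 elements, which is the equivalence the abstract refers to.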