    K-tree: Large Scale Document Clustering

    We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm that, unlike k-means, forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare its performance and quality to CLUTO using document collections. The K-tree has a low time complexity that is suitable for large document collections. Its tree structure allows for efficient disk-based implementations where space requirements exceed main memory. Comment: 2 pages, SIGIR 200

    Fast Algorithms for Constructing Maximum Entropy Summary Trees

    Karloff and Shirley recently proposed summary trees as a new way to visualize large rooted trees (Eurovis 2013) and gave algorithms for generating a maximum-entropy k-node summary tree of an input n-node rooted tree. However, the algorithm generating optimal summary trees was only pseudo-polynomial (and worked only for integral weights); the authors left open the existence of a polynomial-time algorithm. In addition, the authors provided an additive approximation algorithm and a greedy heuristic, both working on real weights. This paper shows how to construct maximum-entropy k-node summary trees in time O(k^2 n + n log n) for real weights (indeed, as small as the time bound for the greedy heuristic given previously); how to speed up the approximation algorithm so that it runs in time O(n + (k^4/eps) log(k/eps)); and how to speed up the greedy algorithm so as to run in time O(kn + n log n). Altogether, these results make summary trees a much more practical tool than before. Comment: 17 pages, 4 figures. Extended version of paper appearing in ICALP 201
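The objective named in the abstract above can be illustrated with a minimal sketch. It assumes that the entropy of a summary tree is the Shannon entropy of its node weights after normalizing them to sum to 1; the function name `summary_entropy` and the sample weights are hypothetical and are not taken from the paper. The sketch only scores a given set of node weights; it does not construct a summary tree.

```python
import math


def summary_entropy(weights):
    """Shannon entropy (in bits) of node weights normalized to sum to 1.

    Assumption: this is the quantity a maximum-entropy summary tree maximizes.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    # Zero-weight nodes contribute nothing, since p * log(p) -> 0 as p -> 0.
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


# Two hypothetical 4-node summaries of the same total weight (100).
balanced = summary_entropy([25, 25, 25, 25])
skewed = summary_entropy([97, 1, 1, 1])
```

Under this assumption, a summary whose nodes carry similar weight scores higher than one dominated by a single heavy node, which is why an entropy objective favors informative groupings.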

    Approximating Directed Steiner Problems via Tree Embedding

    In the k-edge connected directed Steiner tree (k-DST) problem, we are given a directed graph G on n vertices with edge-costs, a root vertex r, a set of h terminals T and an integer k. The goal is to find a min-cost subgraph H of G that connects r to each terminal t by k edge-disjoint r,t-paths. This problem includes as special cases the well-known directed Steiner tree (DST) problem (the case k = 1) and the group Steiner tree (GST) problem. Despite having been studied and mentioned many times in the literature, e.g., by Feldman et al. [SODA'09, JCSS'12], by Cheriyan et al. [SODA'12, TALG'14] and by Laekhanukit [SODA'14], there was no known non-trivial approximation algorithm for k-DST for k >= 2, even in the special case where the input graph is directed acyclic and has a constant number of layers. If the input graph is not acyclic, the complexity status of k-DST is not known even for the very restricted special case of k = 2 and |T| = 2. In this paper, we make progress toward developing a non-trivial approximation algorithm for k-DST. We present an O(D k^{D-1} log n)-approximation algorithm for k-DST on directed acyclic graphs (DAGs) with D layers, which can be extended to a special case of k-DST on "general graphs" when an instance has a D-shallow optimal solution, i.e., there exist k edge-disjoint r,t-paths, each of length at most D, for every terminal t. For the case k = 1 (DST), our algorithm yields an approximation ratio of O(D log h), thus implying an O(log^3 h)-approximation algorithm for DST that runs in quasi-polynomial time (due to the height reduction of Zelikovsky [Algorithmica'97]). Consequently, as our algorithm works for general graphs, we obtain an O(D k^{D-1} log n)-approximation algorithm for a D-shallow instance of the k-edge-connected directed Steiner subgraph problem, where we wish to connect every pair of terminals by k edge-disjoint paths.

    A sufficiently fast algorithm for finding close to optimal clique trees

    We offer an algorithm that finds a clique tree such that the size of the largest clique is at most (2α+1)k, where k is the size of the largest clique in a clique tree in which this size is minimized and α is the approximation ratio of an α-approximation algorithm for the 3-way vertex cut problem. When α = 4/3, our algorithm's complexity is O(2^{4.67k} n · poly(n)) and it errs by a factor of 3.67, where poly(n) is the running time of linear programming. This algorithm is extended to find clique trees in which the state space of the largest clique is bounded. When k = O(log n), our algorithm yields a polynomial inference algorithm for Bayesian networks.

    On Finding the Adams Consensus Tree

    This paper presents a fast algorithm for finding the Adams consensus tree of a set of conflicting phylogenetic trees with identical leaf labels, for the first time improving the time complexity of a widely used algorithm invented by Adams in 1972 [1]. Our algorithm applies the centroid path decomposition technique [9] in a new way to traverse the input trees' centroid paths in unison, and runs in O(k n log n) time, where k is the number of input trees and n is the size of the leaf label set. (In comparison, the old algorithm from 1972 has a worst-case running time of O(k n^2).) For the special case of k = 2, an even faster algorithm running in O(n · log n / log log n) time is provided, which relies on an extension of the wavelet tree-based technique by Bose et al. [6] for orthogonal range counting on a grid. Our extended wavelet tree data structure also supports truncated range maximum queries efficiently and may be of independent interest to algorithm designers.

    Stackelberg Network Pricing Games

    We study a multi-player one-round game termed Stackelberg Network Pricing Game, in which a leader can set prices for a subset of m priceable edges in a graph. The other edges have a fixed cost. Based on the leader's decision, one or more followers optimize a polynomial-time solvable combinatorial minimization problem and choose a minimum cost solution satisfying their requirements based on the fixed costs and the leader's prices. The leader receives as revenue the total amount of prices paid by the followers for priceable edges in their solutions, and the problem is to find revenue maximizing prices. Our model extends several known pricing problems, including single-minded and unit-demand pricing, as well as Stackelberg pricing for certain follower problems like shortest path or minimum spanning tree. Our first main result is a tight analysis of a single-price algorithm for the single follower game, which provides a (1+ε) log m-approximation for any ε > 0. This can be extended to provide a (1+ε)(log k + log m)-approximation for the general problem and k followers. The latter result is essentially best possible, as the problem is shown to be hard to approximate within O(log^ε k + log^ε m). If followers have demands, the single-price algorithm provides a (1+ε)m^2-approximation, and the problem is hard to approximate within O(m^ε) for some ε > 0. Our second main result is a polynomial time algorithm for revenue maximization in the special case of Stackelberg bipartite vertex cover, which is based on non-trivial max-flow and LP-duality techniques. Our results can be extended to provide constant-factor approximations for any constant number of followers.
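The single-price idea in the abstract above can be sketched for one follower who solves a shortest-path problem, which the abstract lists as one possible follower problem. This is an illustration under stated assumptions, not the paper's algorithm: the candidate prices are supplied by the caller (the abstract does not say how they are chosen), ties between equally cheap paths are broken arbitrarily, and the names `cheapest_path` and `best_single_price` are hypothetical.

```python
import heapq


def cheapest_path(n, edges, price, src, dst):
    """Follower's response: Dijkstra on an undirected graph.

    edges holds (u, v, fixed_cost, priceable); priceable edges cost `price`.
    Returns (path_cost, priceable_edges_used), or None if dst is unreachable.
    """
    adj = [[] for _ in range(n)]
    for u, v, fixed_cost, priceable in edges:
        w = price if priceable else fixed_cost
        adj[u].append((v, w, priceable))
        adj[v].append((u, w, priceable))
    dist = {src: 0.0}
    used = {src: 0}  # priceable edges on the best known path to each node
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u == dst:
            return d, used[u]
        for v, w, priceable in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                used[v] = used[u] + (1 if priceable else 0)
                heapq.heappush(heap, (nd, v))
    return None


def best_single_price(n, edges, src, dst, candidates):
    """Try one uniform price on all priceable edges; keep the best revenue."""
    best_price, best_revenue = None, 0.0
    for p in candidates:
        result = cheapest_path(n, edges, p, src, dst)
        if result is None:
            continue
        _, count = result
        revenue = p * count  # leader is paid p per priceable edge used
        if revenue > best_revenue:
            best_price, best_revenue = p, revenue
    return best_price, best_revenue


# Hypothetical instance: a fixed-cost edge 0-2 competes with a priced path 0-1-2.
edges = [(0, 2, 10, False), (0, 1, 0, True), (1, 2, 0, True)]
result = best_single_price(3, edges, 0, 2, candidates=[1, 2, 4, 6])
```

In this instance the follower abandons the priced path once its total cost exceeds the fixed-cost alternative, so the leader's revenue rises with the price and then drops to zero.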