
    Approximation algorithms for stochastic and risk-averse optimization

    We present improved approximation algorithms in stochastic optimization. We prove that the multi-stage stochastic versions of covering integer programs (such as set cover and vertex cover) admit essentially the same approximation algorithms as their standard (non-stochastic) counterparts; this improves upon work of Swamy & Shmoys which shows an approximability that depends multiplicatively on the number of stages. We also present approximation algorithms for facility location and some of its variants in the 2-stage recourse model, improving on previous approximation guarantees. We give a 2.2975-approximation algorithm in the standard polynomial-scenario model and an algorithm with an expected per-scenario 2.4957-approximation guarantee, which is applicable to the more general black-box distribution model. Comment: Extension of a SODA'07 paper. To appear in SIAM J. Discrete Mat

    DisC Diversity: Result Diversification based on Dissimilarity and Coverage

    Recently, result diversification has attracted a lot of attention as a means to improve the quality of results retrieved by user queries. In this paper, we propose a new, intuitive definition of diversity called DisC diversity. A DisC diverse subset of a query result contains objects such that each object in the result is represented by a similar object in the diverse subset and the objects in the diverse subset are dissimilar to each other. We show that locating a minimum DisC diverse subset is an NP-hard problem and provide heuristics for its approximation. We also propose adapting DisC diverse subsets to a different degree of diversification. We call this operation zooming. We present efficient implementations of our algorithms based on the M-tree, a spatial index structure, and experimentally evaluate their performance. Comment: To appear at the 39th International Conference on Very Large Data Bases (VLDB), August 26-31, 2013, Riva del Garda, Trento, Ital
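
The DisC definition above can be illustrated with a minimal greedy sketch. It is not the paper's heuristic or its M-tree implementation. It assumes that "similar" means Euclidean distance at most a threshold `radius`, and the function name `greedy_disc_subset` is hypothetical. The result is a valid DisC diverse subset under that assumption, but not necessarily a minimum one.

```python
from math import dist  # Euclidean distance, Python 3.8+


def greedy_disc_subset(points, radius):
    """Return indices of a subset such that every point lies within
    `radius` of some chosen point (coverage) and chosen points are more
    than `radius` apart (dissimilarity).

    Assumption: similarity is Euclidean distance <= radius.
    """
    selected = []
    covered = [False] * len(points)
    for i, p in enumerate(points):
        if covered[i]:
            continue  # already represented by an earlier choice
        # p is farther than `radius` from every earlier choice,
        # otherwise it would have been marked as covered.
        selected.append(i)
        for j, q in enumerate(points):
            if dist(p, q) <= radius:
                covered[j] = True
    return selected


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (5, 5), (5, 6), (10, 0)]
    print(greedy_disc_subset(pts, 1.5))
```

Under the same assumption, a larger `radius` yields fewer representatives and a smaller one yields more, which loosely mirrors the idea of adapting the degree of diversification that the abstract calls zooming.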

    Representation Learning for Clustering: A Statistical Framework

    We address the problem of communicating domain knowledge from a user to the designer of a clustering algorithm. We propose a protocol in which the user provides a clustering of a relatively small random sample of a data set. The algorithm designer then uses that sample to come up with a data representation under which k-means clustering results in a clustering (of the full data set) that is aligned with the user's clustering. We provide a formal statistical model for analyzing the sample complexity of learning a clustering representation with this paradigm. We then introduce a notion of capacity of a class of possible representations, in the spirit of the VC-dimension, showing that classes of representations that have finite such dimension can be successfully learned with sample size error bounds, and end our discussion with an analysis of that dimension for classes of representations induced by linear embeddings. Comment: To be published in Proceedings of UAI 201

    Improved approximation of arbitrary shapes in DEM simulations with multi-spheres

    DEM simulations were originally designed for spherical particles only, but most real particles are anything but spherical. To address this problem, the multi-sphere method was invented. It makes it possible to clump several spheres together to create complex shapes. The proposed algorithm offers a novel method to create multi-sphere clumps for given arbitrary shapes. In particular, the use of modern clustering algorithms from the field of computational intelligence achieves satisfactory results. The clustering is embedded into an optimisation algorithm which uses a pre-defined criterion. A mostly unaided algorithm with only a few inputs and hyperparameters is able to approximate arbitrary shapes.

    Approximating Clustering of Fingerprint Vectors with Missing Values

    The problem of clustering fingerprint vectors is an interesting problem in Computational Biology that has been proposed in (Figureroa et al. 2004). In this paper we show some improvements in closing the gaps between the known lower bounds and upper bounds on the approximability of some variants of the biological problem. Namely, we are able to prove that the problem is APX-hard even when each fingerprint contains only two unknown positions. Moreover, we have studied some variants of the original problem, and we give two 2-approximation algorithms for the IECMV and OECMV problems when the number of unknown entries for each vector is at most a constant. Comment: 13 pages, 4 figure

    Motif Clustering and Overlapping Clustering for Social Network Analysis

    Motivated by applications in social network community analysis, we introduce a new clustering paradigm termed motif clustering. Unlike classical clustering, motif clustering aims to minimize the number of clustering errors associated with both edges and certain higher order graph structures (motifs) that represent "atomic units" of social organizations. Our contributions are two-fold. We first introduce motif correlation clustering, in which the goal is to agnostically partition the vertices of a weighted complete graph so that certain predetermined "important" social subgraphs mostly lie within the same cluster, while "less relevant" social subgraphs are allowed to lie across clusters. We then introduce the notion of motif covers, in which the goal is to cover the vertices of motifs via the smallest number of (near) cliques in the graph. Motif cover algorithms provide a natural solution for overlapping clustering, and they also play an important role in latent feature inference of networks. For both motif correlation clustering and its extension introduced via the covering problem, we provide hardness results, algorithmic solutions and community detection results for two well-studied social networks.