34,169 research outputs found

    The matching relaxation for a class of generalized set partitioning problems

    Full text link
    This paper introduces a discrete relaxation for the class of combinatorial optimization problems which can be described by a set partitioning formulation under packing constraints. We present two combinatorial relaxations based on computing maximum weighted matchings in suitable graphs. Besides providing dual bounds, the relaxations are also used on a variable reduction technique and a matheuristic. We show how that general method can be tailored to sample applications, and also perform a successful computational evaluation with benchmark instances of a problem in maritime logistics.Comment: 33 pages. A preliminary (4-page) version of this paper was presented at CTW 2016 (Cologne-Twente Workshop on Graphs and Combinatorial Optimization), with proceedings on Electronic Notes in Discrete Mathematic

    Explain3D: Explaining Disagreements in Disjoint Datasets

    Get PDF
    Data plays an important role in applications, analytic processes, and many aspects of human activity. As data grows in size and complexity, we are met with an imperative need for tools that promote understanding and explanations over data-related operations. Data management research on explanations has focused on the assumption that data resides in a single dataset, under one common schema. But the reality of today's data is that it is frequently un-integrated, coming from different sources with different schemas. When different datasets provide different answers to semantically similar questions, understanding the reasons for the discrepancies is challenging and cannot be handled by the existing single-dataset solutions. In this paper, we propose Explain3D, a framework for explaining the disagreements across disjoint datasets (3D). Explain3D focuses on identifying the reasons for the differences in the results of two semantically similar queries operating on two datasets with potentially different schemas. Our framework leverages the queries to perform a semantic mapping across the relevant parts of their provenance; discrepancies in this mapping point to causes of the queries' differences. Exploiting the queries gives Explain3D an edge over traditional schema matching and record linkage techniques, which are query-agnostic. Our work makes the following contributions: (1) We formalize the problem of deriving optimal explanations for the differences of the results of semantically similar queries over disjoint datasets. (2) We design a 3-stage framework for solving the optimal explanation problem. (3) We develop a smart-partitioning optimizer that improves the efficiency of the framework by orders of magnitude. (4)~We experiment with real-world and synthetic data to demonstrate that Explain3D can derive precise explanations efficiently

    Partitioning Complex Networks via Size-constrained Clustering

    Full text link
    The most commonly used method to tackle the graph partitioning problem in practice is the multilevel approach. During a coarsening phase, a multilevel graph partitioning algorithm reduces the graph size by iteratively contracting nodes and edges until the graph is small enough to be partitioned by some other algorithm. A partition of the input graph is then constructed by successively transferring the solution to the next finer graph and applying a local search algorithm to improve the current solution. In this paper, we describe a novel approach to partition graphs effectively especially if the networks have a highly irregular structure. More precisely, our algorithm provides graph coarsening by iteratively contracting size-constrained clusterings that are computed using a label propagation algorithm. The same algorithm that provides the size-constrained clusterings can also be used during uncoarsening as a fast and simple local search algorithm. Depending on the algorithm's configuration, we are able to compute partitions of very high quality outperforming all competitors, or partitions that are comparable to the best competitor in terms of quality, hMetis, while being nearly an order of magnitude faster on average. The fastest configuration partitions the largest graph available to us with 3.3 billion edges using a single machine in about ten minutes while cutting less than half of the edges than the fastest competitor, kMetis

    Clustering and Community Detection with Imbalanced Clusters

    Full text link
    Spectral clustering methods which are frequently used in clustering and community detection applications are sensitive to the specific graph constructions particularly when imbalanced clusters are present. We show that ratio cut (RCut) or normalized cut (NCut) objectives are not tailored to imbalanced cluster sizes since they tend to emphasize cut sizes over cut values. We propose a graph partitioning problem that seeks minimum cut partitions under minimum size constraints on partitions to deal with imbalanced cluster sizes. Our approach parameterizes a family of graphs by adaptively modulating node degrees on a fixed node set, yielding a set of parameter dependent cuts reflecting varying levels of imbalance. The solution to our problem is then obtained by optimizing over these parameters. We present rigorous limit cut analysis results to justify our approach and demonstrate the superiority of our method through experiments on synthetic and real datasets for data clustering, semi-supervised learning and community detection.Comment: Extended version of arXiv:1309.2303 with new applications. Accepted to IEEE TSIP

    Spectral Clustering with Imbalanced Data

    Full text link
    Spectral clustering is sensitive to how graphs are constructed from data particularly when proximal and imbalanced clusters are present. We show that Ratio-Cut (RCut) or normalized cut (NCut) objectives are not tailored to imbalanced data since they tend to emphasize cut sizes over cut values. We propose a graph partitioning problem that seeks minimum cut partitions under minimum size constraints on partitions to deal with imbalanced data. Our approach parameterizes a family of graphs, by adaptively modulating node degrees on a fixed node set, to yield a set of parameter dependent cuts reflecting varying levels of imbalance. The solution to our problem is then obtained by optimizing over these parameters. We present rigorous limit cut analysis results to justify our approach. We demonstrate the superiority of our method through unsupervised and semi-supervised experiments on synthetic and real data sets.Comment: 24 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1302.513
    corecore