34,169 research outputs found
The matching relaxation for a class of generalized set partitioning problems
This paper introduces a discrete relaxation for the class of combinatorial
optimization problems which can be described by a set partitioning formulation
under packing constraints. We present two combinatorial relaxations based on
computing maximum weighted matchings in suitable graphs. Besides providing dual
bounds, the relaxations are also used on a variable reduction technique and a
matheuristic. We show how that general method can be tailored to sample
applications, and also perform a successful computational evaluation with
benchmark instances of a problem in maritime logistics.Comment: 33 pages. A preliminary (4-page) version of this paper was presented
at CTW 2016 (Cologne-Twente Workshop on Graphs and Combinatorial
Optimization), with proceedings on Electronic Notes in Discrete Mathematic
Explain3D: Explaining Disagreements in Disjoint Datasets
Data plays an important role in applications, analytic processes, and many
aspects of human activity. As data grows in size and complexity, we are met
with an imperative need for tools that promote understanding and explanations
over data-related operations. Data management research on explanations has
focused on the assumption that data resides in a single dataset, under one
common schema. But the reality of today's data is that it is frequently
un-integrated, coming from different sources with different schemas. When
different datasets provide different answers to semantically similar questions,
understanding the reasons for the discrepancies is challenging and cannot be
handled by the existing single-dataset solutions.
In this paper, we propose Explain3D, a framework for explaining the
disagreements across disjoint datasets (3D). Explain3D focuses on identifying
the reasons for the differences in the results of two semantically similar
queries operating on two datasets with potentially different schemas. Our
framework leverages the queries to perform a semantic mapping across the
relevant parts of their provenance; discrepancies in this mapping point to
causes of the queries' differences. Exploiting the queries gives Explain3D an
edge over traditional schema matching and record linkage techniques, which are
query-agnostic. Our work makes the following contributions: (1) We formalize
the problem of deriving optimal explanations for the differences of the results
of semantically similar queries over disjoint datasets. (2) We design a 3-stage
framework for solving the optimal explanation problem. (3) We develop a
smart-partitioning optimizer that improves the efficiency of the framework by
orders of magnitude. (4)~We experiment with real-world and synthetic data to
demonstrate that Explain3D can derive precise explanations efficiently
Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph
partitioning together with applications and future research directions
Partitioning Complex Networks via Size-constrained Clustering
The most commonly used method to tackle the graph partitioning problem in
practice is the multilevel approach. During a coarsening phase, a multilevel
graph partitioning algorithm reduces the graph size by iteratively contracting
nodes and edges until the graph is small enough to be partitioned by some other
algorithm. A partition of the input graph is then constructed by successively
transferring the solution to the next finer graph and applying a local search
algorithm to improve the current solution.
In this paper, we describe a novel approach to partition graphs effectively
especially if the networks have a highly irregular structure. More precisely,
our algorithm provides graph coarsening by iteratively contracting
size-constrained clusterings that are computed using a label propagation
algorithm. The same algorithm that provides the size-constrained clusterings
can also be used during uncoarsening as a fast and simple local search
algorithm.
Depending on the algorithm's configuration, we are able to compute partitions
of very high quality outperforming all competitors, or partitions that are
comparable to the best competitor in terms of quality, hMetis, while being
nearly an order of magnitude faster on average. The fastest configuration
partitions the largest graph available to us with 3.3 billion edges using a
single machine in about ten minutes while cutting less than half of the edges
than the fastest competitor, kMetis
Clustering and Community Detection with Imbalanced Clusters
Spectral clustering methods which are frequently used in clustering and
community detection applications are sensitive to the specific graph
constructions particularly when imbalanced clusters are present. We show that
ratio cut (RCut) or normalized cut (NCut) objectives are not tailored to
imbalanced cluster sizes since they tend to emphasize cut sizes over cut
values. We propose a graph partitioning problem that seeks minimum cut
partitions under minimum size constraints on partitions to deal with imbalanced
cluster sizes. Our approach parameterizes a family of graphs by adaptively
modulating node degrees on a fixed node set, yielding a set of parameter
dependent cuts reflecting varying levels of imbalance. The solution to our
problem is then obtained by optimizing over these parameters. We present
rigorous limit cut analysis results to justify our approach and demonstrate the
superiority of our method through experiments on synthetic and real datasets
for data clustering, semi-supervised learning and community detection.Comment: Extended version of arXiv:1309.2303 with new applications. Accepted
to IEEE TSIP
Spectral Clustering with Imbalanced Data
Spectral clustering is sensitive to how graphs are constructed from data
particularly when proximal and imbalanced clusters are present. We show that
Ratio-Cut (RCut) or normalized cut (NCut) objectives are not tailored to
imbalanced data since they tend to emphasize cut sizes over cut values. We
propose a graph partitioning problem that seeks minimum cut partitions under
minimum size constraints on partitions to deal with imbalanced data. Our
approach parameterizes a family of graphs, by adaptively modulating node
degrees on a fixed node set, to yield a set of parameter dependent cuts
reflecting varying levels of imbalance. The solution to our problem is then
obtained by optimizing over these parameters. We present rigorous limit cut
analysis results to justify our approach. We demonstrate the superiority of our
method through unsupervised and semi-supervised experiments on synthetic and
real data sets.Comment: 24 pages, 7 figures. arXiv admin note: substantial text overlap with
arXiv:1302.513
- …