Optimum matchings in weighted bipartite graphs
Given an integer-weighted bipartite graph $G$, we consider the problems of
finding all the edges that occur in some minimum-weight matching of maximum
cardinality and of enumerating all the minimum-weight perfect matchings.
Moreover, we construct a subgraph of $G$, depending on an
$\varepsilon$-optimal solution of the dual linear program associated to the
assignment problem on $G$, that allows us to reduce these problems to their
unweighted variants on that subgraph. For instance, when $G$ has a perfect
matching and an $\varepsilon$-optimal solution of the dual linear program
associated to the assignment problem on $G$ is available, we solve the
problem of finding all the edges that occur in some minimum-weight perfect
matching in time linear in the number of edges. Therefore, starting from
scratch we get an algorithm that solves this problem in time
$O(\sqrt{n}\,m\log(nW))$, where $n$ is the number of vertices, $m$ the
number of edges, and $W$ the largest absolute edge weight. Comment: 11 pages
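As a toy illustration of the first problem (not the paper's linear-time method), the set of edges that occur in some minimum-weight perfect matching can be computed by brute force on a small instance; the function and variable names below are ours:

```python
from itertools import permutations

def min_weight_pm_edges(w):
    """Brute force: return (min cost, edges (i, j) that occur in some
    minimum-weight perfect matching) for the complete bipartite graph
    given by the square weight matrix w, where w[i][j] is the weight
    of the edge between left vertex i and right vertex j."""
    n = len(w)
    best = None
    matchings = []
    for perm in permutations(range(n)):  # each permutation is one perfect matching
        cost = sum(w[i][perm[i]] for i in range(n))
        if best is None or cost < best:
            best, matchings = cost, [perm]
        elif cost == best:
            matchings.append(perm)
    edges = {(i, p[i]) for p in matchings for i in range(n)}
    return best, sorted(edges)
```

With `w = [[1, 1], [1, 1]]` every matching is optimal, so all four edges are returned; with `w = [[1, 2], [2, 1]]` only the diagonal edges occur in an optimal matching.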
Robust Assignments via Ear Decompositions and Randomized Rounding
Many real-life planning problems require making a priori decisions before all
parameters of the problem have been revealed. An important special case
arises in scheduling, where a set of tasks needs to be assigned to an
available set of machines or personnel (resources) in such a way that every
task is assigned a resource and no two tasks share the same resource. In its
nominal form, the resulting computational problem is the
\emph{assignment problem} on general bipartite graphs.
This paper deals with a robust variant of the assignment problem modeling
situations where certain edges in the corresponding graph are \emph{vulnerable}
and may become unavailable after a solution has been chosen. The goal is to
choose a minimum-cost collection of edges such that if any vulnerable edge
becomes unavailable, the remaining part of the solution contains an assignment
of all tasks.
We present approximation results and hardness proofs for this type of
problem, and establish several connections to well-known concepts from
matching theory, robust optimization and LP-based techniques. Comment: Full version of ICALP 2016 paper
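The robustness requirement in this variant can be checked directly on small instances: delete each vulnerable edge in turn and test whether the surviving chosen edges still match every task. A minimal sketch (our own naming; Kuhn's augmenting-path algorithm stands in for any maximum-matching routine):

```python
def max_matching(n_left, n_right, edges):
    """Maximum bipartite matching via Kuhn's augmenting-path algorithm."""
    adj = [[] for _ in range(n_left)]
    for u, v in edges:
        adj[u].append(v)
    match_r = [-1] * n_right  # match_r[v] = left vertex matched to v, or -1

    def try_augment(u, seen):
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if match_r[v] == -1 or try_augment(match_r[v], seen):
                    match_r[v] = u
                    return True
        return False

    return sum(try_augment(u, set()) for u in range(n_left))

def is_robust(n_tasks, n_resources, chosen, vulnerable):
    """The chosen edge set is robust if, after deleting any single
    vulnerable edge, the remainder still matches every task."""
    for e in vulnerable:
        rest = [f for f in chosen if f != e]
        if max_matching(n_tasks, n_resources, rest) < n_tasks:
            return False
    return True
```

For example, two tasks backed up by three resources via the edges (0,0), (0,1), (1,1), (1,2) survive the loss of the vulnerable edge (0,0), whereas the bare assignment {(0,0), (1,1)} does not.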
Revisiting distance-based record linkage for privacy-preserving release of statistical datasets
Statistical Disclosure Control (SDC, for short) studies the problem of privacy-preserving data publishing in cases where the data is expected to be used for statistical analysis. An original dataset T containing sensitive information is transformed into a sanitized version T' which is released to the public. Both utility and privacy aspects are very important in this setting. For utility, T' must allow data miners or statisticians to obtain similar results to those which would have been obtained from the original dataset T. For privacy, T' must significantly reduce the ability of an adversary to infer sensitive information about the data subjects in T. One of the main a posteriori measures that the SDC community has considered up to now when analyzing the privacy offered by a given protection method is the Distance-Based Record Linkage (DBRL) risk measure. In this work, we argue that the classical DBRL risk measure is insufficient. For this reason, we introduce the novel Global Distance-Based Record Linkage (GDBRL) risk measure. We claim that this new measure must be evaluated alongside the classical DBRL measure in order to better assess the risk in publishing T' instead of T. After that, we describe how this new measure can be computed by the data owner and discuss the scalability of those computations. We conclude with extensive experimentation where we compare the risk assessments offered by our novel measure and by the classical one, using well-known SDC protection methods. Those experiments validate our hypothesis that the GDBRL risk measure yields, in many cases, higher risk assessments than the classical DBRL measure. In other words, relying solely on the classical DBRL measure for risk assessment might be misleading, as the true risk may in fact be higher.
Hence, we strongly recommend that the SDC community consider the new GDBRL risk measure as an additional measure when analyzing the privacy offered by SDC protection algorithms. Postprint (author's final draft)
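The contrast between per-record and global linkage can be sketched in a few lines: classical DBRL links each original record to its nearest sanitized record independently, while a global, assignment-based linkage in the spirit of GDBRL solves one linear assignment problem over the whole distance matrix. This is only an illustration of the idea, not the paper's exact definition of the GDBRL measure:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def dbrl_links(T, Tp):
    """Classical DBRL: link each original record to its nearest
    sanitized record, independently of the other records."""
    return np.argmin(cdist(T, Tp), axis=1)

def global_links(T, Tp):
    """Global variant: choose the one-to-one linkage minimizing the
    total distance, i.e. solve a linear assignment problem."""
    rows, cols = linear_sum_assignment(cdist(T, Tp))
    return cols  # cols[i] = sanitized record linked to original record i
```

On `T = [[0], [1]]` and `Tp = [[0.4], [2.0]]`, DBRL links both originals to the same sanitized record (0), while the global linkage correctly pairs the records one-to-one, which is exactly the kind of re-identification the per-record measure can miss.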
Gene Orthology Inference via Large-Scale Rearrangements for Partially Assembled Genomes
Recently we developed a gene orthology inference tool based on genome rearrangements (Journal of Bioinformatics and Computational Biology 19:6, 2021). Given a set of genomes, our method first computes all pairwise gene similarities. Then it runs pairwise ILP comparisons to compute optimal gene matchings, which minimize, by taking the similarities into account, the weighted rearrangement distance between the analyzed genomes (a problem that is NP-hard). The gene matchings are then integrated into gene families in the final step. Although the ILP is quite efficient and could conceptually analyze genomes that are not completely assembled but split into several contigs, our tool failed to complete that task. The main reason is that each pairwise ILP comparison includes an optimal capping that connects each end of a linear segment of one genome to an end of a linear segment in the other genome, producing an exponential increase of the search space.
In this work, we design and implement a heuristic capping algorithm that replaces the optimal capping by clustering the linear segments (based on their gene content intersections) into m ≥ 1 subsets, whose ends are capped independently. Furthermore, in each subset, instead of allowing all possible connections, we let only the ends of content-related segments be connected. Although there is no guarantee that m is much bigger than one, and despite the possible side effect of producing sub-optimal instead of optimal gene matchings, the heuristic works very well in practice, in terms of both running time and the quality of the computed solutions. Our experiments on real data show that we can now efficiently analyze fruit fly genomes with unfinished assemblies distributed in hundreds or even thousands of contigs, obtaining orthologies that are more similar to FlyBase orthologies than those computed by other inference tools. Moreover, for complete assemblies the version with heuristic capping reports orthologies that are very similar to those computed by the optimal version of our tool. Our approach is implemented in a pipeline that incorporates the pre-computation of gene similarities.
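The clustering step described above (grouping linear segments whose gene contents intersect, directly or transitively) can be sketched with a union-find structure; this is our simplified reading of the heuristic, not the tool's actual implementation:

```python
def cluster_segments(segments):
    """Group linear segments (each a set of gene ids) into clusters of
    content-related segments: two segments land in the same cluster
    whenever their gene contents intersect, directly or transitively.
    Returns the clusters as sorted lists of segment indices."""
    parent = list(range(len(segments)))

    def find(x):  # union-find root with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    gene_owner = {}  # first segment seen to contain each gene
    for i, seg in enumerate(segments):
        for g in seg:
            if g in gene_owner:
                parent[find(i)] = find(gene_owner[g])  # union the two segments
            else:
                gene_owner[g] = i

    clusters = {}
    for i in range(len(segments)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

The number of clusters returned plays the role of m above: ends are then capped only within each cluster instead of across all segments.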
Relative-Interior Solution for (Incomplete) Linear Assignment Problem with Applications to Quadratic Assignment Problem
We study the set of optimal solutions of the dual linear programming
formulation of the linear assignment problem (LAP) to propose a method for
computing a solution from the relative interior of this set. Assuming that an
arbitrary dual-optimal solution and an optimal assignment are available (for
which many efficient algorithms already exist), our method computes a
relative-interior solution in linear time. Since LAP occurs as a subproblem in
the linear programming relaxation of the quadratic assignment problem (QAP), we
employ our method as a new component in the family of dual-ascent algorithms
that provide bounds on the optimal value of QAP. To make our results applicable
to incomplete QAP, which is of interest in practical use-cases, we also provide
a linear-time reduction from incomplete LAP to complete LAP along with a
mapping that preserves optimality and membership in the relative interior. Our
experiments on publicly available benchmarks indicate that our approach with
relative-interior solution is frequently capable of providing superior bounds
and is otherwise at least comparable.
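The reduction from incomplete to complete LAP can be illustrated with simple big-M padding of the cost matrix; note that this naive sketch preserves optimality but, unlike the paper's reduction, makes no guarantee about membership in the dual's relative interior:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def solve_incomplete_lap(n, edges):
    """Solve an incomplete LAP on n x n where only the (i, j, cost)
    triples in `edges` are allowed: price forbidden pairs at a constant
    large enough that no optimal solution can use them, then run the
    Hungarian method on the resulting complete instance."""
    # Any fully-feasible assignment costs at most sum(|c|); an assignment
    # using a forbidden pair costs at least BIG - sum(|c|) > sum(|c|).
    BIG = 1 + 2 * sum(abs(c) for _, _, c in edges)
    C = np.full((n, n), BIG, dtype=float)
    for i, j, c in edges:
        C[i, j] = c
    rows, cols = linear_sum_assignment(C)
    assert all(C[i, j] < BIG for i, j in zip(rows, cols)), "no feasible assignment"
    return list(zip(rows.tolist(), cols.tolist()))
```

For instance, with n = 2 and allowed pairs (0,0) at cost 5, (0,1) at cost 1, and (1,0) at cost 2, the forbidden pair (1,1) is never chosen and the optimum is the anti-diagonal assignment of cost 3.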
Optimal assignment problem on record linkage
We present an application of the Hungarian Method, an optimal-assignment algorithm from graph theory, to record linkage in order to improve disclosure risk assessment. We should note that the Hungarian Method has O(n^3) complexity; three different methods are presented to reduce its computational cost.
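The abstract does not spell out the three cost-reduction methods; one standard technique in record linkage is blocking, which confines the cubic Hungarian step to small blocks of records sharing a blocking key. A hypothetical sketch (names and the scalar-record setup are ours):

```python
import numpy as np
from collections import defaultdict
from scipy.optimize import linear_sum_assignment

def blocked_linkage(T, Tp, key):
    """Blocking sketch: partition both files by a blocking key (e.g. a
    stable quasi-identifier) and run the O(n^3) Hungarian method only
    inside each block, so the cubic cost applies to block sizes rather
    than to the full file size n. Records here are plain numbers and
    the distance is |a - b|, purely for illustration."""
    blocks = defaultdict(lambda: ([], []))
    for i, r in enumerate(T):
        blocks[key(r)][0].append(i)
    for j, s in enumerate(Tp):
        blocks[key(s)][1].append(j)
    links = {}
    for left, right in blocks.values():
        if not left or not right:
            continue  # a block present in only one file yields no links
        D = np.array([[abs(T[i] - Tp[j]) for j in right] for i in left])
        rows, cols = linear_sum_assignment(D)
        for a, b in zip(rows, cols):
            links[left[a]] = right[b]
    return links
```

Blocking trades a small risk of missing cross-block links for a large speedup; the other cost-reduction routes typically prune the distance matrix or use approximate assignment, but which three the paper uses is not stated here.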