656 research outputs found
Knowledge Refinement via Rule Selection
In several different applications, including data transformation and entity
resolution, rules are used to capture aspects of knowledge about the
application at hand. Often, a large set of such rules is generated
automatically or semi-automatically, and the challenge is to refine the
encapsulated knowledge by selecting a subset of rules based on the expected
operational behavior of the rules on available data. In this paper, we carry
out a systematic complexity-theoretic investigation of the following rule
selection problem: given a set of rules specified by Horn formulas, and a pair
of an input database and an output database, find a subset of the rules that
minimizes the total error, that is, the number of false positive and false
negative errors arising from the selected rules. We first establish
computational hardness results for the decision problems underlying this
minimization problem, as well as upper and lower bounds for its
approximability. We then investigate a bi-objective optimization version of the
rule selection problem in which both the total error and the size of the
selected rules are taken into account. We show that testing for membership in
the Pareto front of this bi-objective optimization problem is DP-complete.
Finally, we show that a similar DP-completeness result holds for a bi-level
optimization version of the rule selection problem, where one minimizes first
the total error and then the size
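
The total-error objective described above can be illustrated with a minimal brute-force sketch. It assumes a toy encoding that is not from the paper: facts are tuples, each rule is a Python function applied once to the input database, and the rule names `r1`, `r2`, `r3` are hypothetical. It enumerates all subsets, so it is only a way to see the objective on tiny instances.

```python
from itertools import combinations

# Toy databases (hypothetical): input facts parent(x, y), desired facts anc(x, y).
INPUT_DB = {("parent", "a", "b"), ("parent", "b", "c")}
OUTPUT_DB = {("anc", "a", "b"), ("anc", "b", "c"), ("anc", "a", "c")}


def parents(db):
    """Return the (x, y) pairs of all parent(x, y) facts in db."""
    return [(x, y) for (rel, x, y) in db if rel == "parent"]


# Each rule maps the input database to the set of facts it derives in one step.
RULES = {
    # anc(x, y) :- parent(x, y)
    "r1": lambda db: {("anc", x, y) for x, y in parents(db)},
    # anc(x, z) :- parent(x, y), parent(y, z)
    "r2": lambda db: {("anc", x, z)
                      for x, y in parents(db)
                      for y2, z in parents(db) if y == y2},
    # anc(y, x) :- parent(x, y)   (a deliberately bad rule)
    "r3": lambda db: {("anc", y, x) for x, y in parents(db)},
}


def total_error(selected, rules, input_db, output_db):
    """Number of false positives plus false negatives of the selected rules."""
    derived = set()
    for name in selected:
        derived |= rules[name](input_db)
    false_positives = len(derived - output_db)
    false_negatives = len(output_db - derived)
    return false_positives + false_negatives


def best_subset(rules, input_db, output_db):
    """Try every subset, smallest first; keep the first one with minimum error."""
    names = sorted(rules)
    best = None
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            err = total_error(subset, rules, input_db, output_db)
            if best is None or err < best[0]:
                best = (err, subset)
    return best


print(best_subset(RULES, INPUT_DB, OUTPUT_DB))
```

Because subsets are tried in order of increasing size and only a strictly smaller error replaces the incumbent, ties are broken in favour of fewer rules, which mirrors the "first error, then size" ordering mentioned in the abstract.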
Entity-Linking via Graph-Distance Minimization
Entity-linking is a natural-language-processing task that consists in
identifying the entities mentioned in a piece of text, linking each to an
appropriate item in some knowledge base; when the knowledge base is Wikipedia,
the problem comes to be known as wikification (in this case, items are
Wikipedia articles). One instance of entity-linking can be formalized as an
optimization problem on the underlying concept graph, where the quantity to be
optimized is the average distance between chosen items. Inspired by this
application, we define a new graph problem which is a natural variant of the
Maximum Capacity Representative Set. We prove that our problem is NP-hard for
general graphs; nonetheless, under some restrictive assumptions, it turns out
to be solvable in linear time. For the general case, we propose two heuristics:
one tries to enforce the above assumptions and another one is based on the
notion of hitting distance; we show experimentally how these approaches perform
with respect to some baselines on a real-world dataset.
Comment: In Proceedings GRAPHITE 2014, arXiv:1407.7671. The second and third
authors were supported by the EU-FET grant NADINE (GA 288956)
The Power of Linear Programming for Valued CSPs
A class of valued constraint satisfaction problems (VCSPs) is characterised
by a valued constraint language, a fixed set of cost functions on a finite
domain. An instance of the problem is specified by a sum of cost functions from
the language with the goal to minimise the sum. This framework includes and
generalises well-studied constraint satisfaction problems (CSPs) and maximum
constraint satisfaction problems (Max-CSPs).
Our main result is a precise algebraic characterisation of valued constraint
languages whose instances can be solved exactly by the basic linear programming
relaxation. Using this result, we obtain tractability of several novel and
previously widely-open classes of VCSPs, including problems over valued
constraint languages that are: (1) submodular on arbitrary lattices; (2)
bisubmodular (also known as k-submodular) on arbitrary finite domains; (3)
weakly (and hence strongly) tree-submodular on arbitrary trees.
Comment: Corrected a few typos
Approximating max-min linear programs with local algorithms
A local algorithm is a distributed algorithm where each node must operate
solely based on the information that was available at system startup within a
constant-size neighbourhood of the node. We study the applicability of local
algorithms to max-min LPs where the objective is to maximise
$\min_{k \in K} \sum_{v \in V} c_{kv} x_v$ subject to
$\sum_{v \in V} a_{iv} x_v \le 1$ for each $i \in I$ and
$x_v \ge 0$ for each $v \in V$. Here $c_{kv} \ge 0$, $a_{iv} \ge 0$, and the
support sets $V_i = \{v : a_{iv} > 0\}$, $V_k = \{v : c_{kv} > 0\}$,
$I_v = \{i : a_{iv} > 0\}$, and $K_v = \{k : c_{kv} > 0\}$ have bounded size.
In the distributed setting, each agent $v$ is responsible for choosing the
value of $x_v$, and the communication network is a hypergraph $\mathcal{H}$
where the sets $V_k$ and $V_i$ constitute the hyperedges. We present
inapproximability results for a wide range of structural assumptions; for
example, even if $|V_i|$ and $|V_k|$ are bounded by some constants larger than
2, there is no local approximation scheme. To contrast the negative results, we
present a local approximation algorithm which achieves good approximation
ratios if we can bound the relative growth of the vertex neighbourhoods in
$\mathcal{H}$.
Comment: 16 pages, 2 figures
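
To make the setting concrete, here is a small sketch of a max-min LP in the usual form (assumed here: maximise the minimum over objectives $k$ of $\sum_v c_{kv} x_v$, subject to $\sum_v a_{iv} x_v \le 1$ for each constraint $i$ and $x_v \ge 0$). The local rule shown is the simple "safe" choice $x_v = \min_i 1/(a_{iv} |V_i|)$, where $V_i$ is the set of agents appearing in constraint $i$; it is a well-known baseline and not the approximation algorithm of the paper. The dictionary layout and the instance are hypothetical.

```python
def safe_solution(a):
    """Local 'safe' rule: agent v picks min over its constraints i of 1 / (a_iv * |V_i|).

    a maps each constraint i to {agent v: coefficient a_iv > 0}.
    Assumes every agent appears in at least one constraint.
    """
    x = {}
    for row in a.values():
        for v, coeff in row.items():
            bound = 1.0 / (coeff * len(row))
            x[v] = min(x.get(v, float("inf")), bound)
    return x


def is_feasible(a, x, tol=1e-9):
    """Check x_v >= 0 and sum_v a_iv * x_v <= 1 for every constraint i."""
    if any(value < 0 for value in x.values()):
        return False
    return all(
        sum(coeff * x[v] for v, coeff in row.items()) <= 1 + tol
        for row in a.values()
    )


def objective(c, x):
    """Max-min objective: the smallest value of sum_v c_kv * x_v over all k."""
    return min(
        sum(coeff * x[v] for v, coeff in row.items())
        for row in c.values()
    )


# Hypothetical instance: two constraints, two objectives, three agents.
A = {"i1": {"v1": 2, "v2": 1}, "i2": {"v2": 1, "v3": 1}}
C = {"k1": {"v1": 1, "v2": 1}, "k2": {"v2": 1, "v3": 1}}

x = safe_solution(A)
print(x, is_feasible(A, x), objective(C, x))
```

Each agent only needs the coefficients and sizes of the constraints it takes part in, which is what makes the rule local. Feasibility follows because every term $a_{iv} x_v$ in constraint $i$ is at most $1/|V_i|$.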
Matchings with lower quotas: Algorithms and complexity
We study a natural generalization of the maximum weight many-to-one matching problem. We are given an undirected bipartite graph $G = (A \,\dot\cup\, P, E)$ with weights on the edges in $E$, and with lower and upper quotas on the vertices in $P$. We seek a maximum weight many-to-one matching satisfying two sets of constraints: vertices in $A$ are incident to at most one matching edge, while vertices in $P$ are either unmatched or they are incident to a number of matching edges between their lower and upper quota. This problem, which we call maximum weight many-to-one matching with lower and upper quotas (WMLQ), has applications to the assignment of students to projects within university courses, where there are constraints on the minimum and maximum numbers of students that must be assigned to each project. In this paper, we provide a comprehensive analysis of the complexity of WMLQ from the viewpoints of classical polynomial time algorithms, fixed-parameter tractability, as well as approximability. We draw the line between NP-hard and polynomially tractable instances in terms of degree and quota constraints and provide efficient algorithms to solve the tractable ones. We further show that the problem can be solved in polynomial time for instances with bounded treewidth; however, the corresponding runtime is exponential in the treewidth with the maximum upper quota $u_{\max}$ as basis, and we prove that this dependence is necessary unless FPT = W[1]. The approximability of WMLQ is also discussed: we present an approximation algorithm for the general case with performance guarantee $u_{\max}+1$, which is asymptotically best possible unless P = NP. Finally, we elaborate on how most of our positive results carry over to matchings in arbitrary graphs with lower quotas.
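
The constraints of WMLQ can be checked on tiny instances with an exhaustive search. The sketch below is an illustration only, exponential in the number of vertices in A, and not one of the algorithms from the paper; the function name `solve_wmlq`, the data layout, and the student/project instance are all hypothetical.

```python
from itertools import product


def solve_wmlq(applicants, edges, quotas):
    """Exhaustively find a maximum weight matching respecting lower/upper quotas.

    applicants: list of vertices in A
    edges:      {(a, p): weight} for a in A, p in P
    quotas:     {p: (lower, upper)} for every p in P
    Returns (best_weight, {a: p}) where unmatched applicants are omitted.
    """
    # Each applicant is either unmatched (None) or assigned to one neighbour.
    options = [[None] + [p for (a2, p) in edges if a2 == a] for a in applicants]
    best = None
    for choice in product(*options):
        load = {}
        for p in choice:
            if p is not None:
                load[p] = load.get(p, 0) + 1
        # A project is either unused or filled between its lower and upper quota.
        if any(not (quotas[p][0] <= n <= quotas[p][1]) for p, n in load.items()):
            continue
        weight = sum(edges[(a, p)] for a, p in zip(applicants, choice)
                     if p is not None)
        if best is None or weight > best[0]:
            matching = {a: p for a, p in zip(applicants, choice) if p is not None}
            best = (weight, matching)
    return best


# Hypothetical instance: project p1 needs exactly two students, p2 exactly one.
STUDENTS = ["s1", "s2", "s3"]
EDGES = {("s1", "p1"): 5, ("s2", "p1"): 1, ("s2", "p2"): 6, ("s3", "p2"): 3}
QUOTAS = {"p1": (2, 2), "p2": (1, 1)}

print(solve_wmlq(STUDENTS, EDGES, QUOTAS))
```

In this instance the lower quota on p1 forces s2 onto its low-weight edge. If the lower quotas are dropped to 0, the search instead matches s1 to p1 and s2 to p2, which has a larger total weight.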
Optimal Algorithms for Scheduling under Time-of-Use Tariffs
We consider a natural generalization of classical scheduling problems in
which using a time unit for processing a job causes some time-dependent cost
which must be paid in addition to the standard scheduling cost. We study the
scheduling objectives of minimizing the makespan and the sum of (weighted)
completion times. It is not difficult to derive a polynomial-time algorithm for
preemptive scheduling to minimize the makespan on unrelated machines. The
problem of minimizing the total (weighted) completion time is considerably
harder, even on a single machine. We present a polynomial-time algorithm that
computes for any given sequence of jobs an optimal schedule, i.e., the optimal
set of time-slots to be used for scheduling jobs according to the given
sequence. This result is based on dynamic programming using a subtle analysis
of the structure of optimal solutions and a potential function argument. With
this algorithm, we solve the unweighted problem optimally in polynomial time.
For the more general problem, in which jobs may have individual weights, we
develop a polynomial-time approximation scheme (PTAS) based on a dual
scheduling approach introduced for scheduling on a machine of varying speed. As
the weighted problem is strongly NP-hard, our PTAS is the best possible
approximation we can hope for.
Comment: 17 pages; A preliminary version of this paper with a subset of
results appeared in the Proceedings of MFCS 201
Max flow vitality in general and $st$-planar graphs
The \emph{vitality} of an arc/node of a graph with respect to the maximum
flow between two fixed nodes $s$ and $t$ is defined as the reduction of the
maximum flow caused by the removal of that arc/node. In this paper we address
the issue of determining the vitality of arcs and/or nodes for the maximum flow
problem. We show how to compute the vitality of all arcs in a general
undirected graph by solving only max flow instances. In
$st$-planar graphs (directed or undirected), we show how to compute the vitality
of all arcs and all nodes in worst-case time. Moreover, after
determining the vitality of arcs and/or nodes, and given a planar embedding of
the graph, we can determine the vitality of a `contiguous' set of arcs/nodes in
time proportional to the size of the set.
Comment: 12 pages, 3 figures
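
The definition of vitality can be made concrete with a naive baseline: compute the maximum flow between the two fixed nodes (called `s` and `t` here), then remove each arc in turn and recompute. This sketch runs one max flow computation per arc plus one for the baseline, so it is only a reference implementation of the definition and not the faster method the abstract describes. The Edmonds-Karp routine, the function names, and the example graph are assumptions made for illustration.

```python
from collections import deque


def max_flow(capacity, s, t):
    """Edmonds-Karp max flow on a directed capacity map {(u, v): cap}."""
    residual = dict(capacity)
    for (u, v) in capacity:
        residual.setdefault((v, u), 0)
    adj = {}
    for (u, v) in residual:
        adj.setdefault(u, []).append(v)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        flow += bottleneck


def undirected_capacity(edges):
    """Model each undirected edge {u, v} as two opposite arcs of equal capacity."""
    capacity = {}
    for (u, v), cap in edges.items():
        capacity[(u, v)] = cap
        capacity[(v, u)] = cap
    return capacity


def arc_vitalities(edges, s, t):
    """Vitality of each edge: drop in max flow from s to t when it is removed."""
    base = max_flow(undirected_capacity(edges), s, t)
    result = {}
    for edge in edges:
        reduced = {e: c for e, c in edges.items() if e != edge}
        result[edge] = base - max_flow(undirected_capacity(reduced), s, t)
    return result


# Hypothetical undirected graph with edge capacities.
EDGES = {("s", "a"): 3, ("a", "t"): 2, ("s", "b"): 2, ("b", "t"): 3, ("a", "b"): 1}
print(arc_vitalities(EDGES, "s", "t"))
```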