
    Knowledge Refinement via Rule Selection

    In several different applications, including data transformation and entity resolution, rules are used to capture aspects of knowledge about the application at hand. Often, a large set of such rules is generated automatically or semi-automatically, and the challenge is to refine the encapsulated knowledge by selecting a subset of rules based on the expected operational behavior of the rules on available data. In this paper, we carry out a systematic complexity-theoretic investigation of the following rule selection problem: given a set of rules specified by Horn formulas, and a pair of an input database and an output database, find a subset of the rules that minimizes the total error, that is, the number of false positive and false negative errors arising from the selected rules. We first establish computational hardness results for the decision problems underlying this minimization problem, as well as upper and lower bounds for its approximability. We then investigate a bi-objective optimization version of the rule selection problem in which both the total error and the size of the selected rules are taken into account. We show that testing for membership in the Pareto front of this bi-objective optimization problem is DP-complete. Finally, we show that a similar DP-completeness result holds for a bi-level optimization version of the rule selection problem, where one minimizes first the total error and then the size.
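
A minimal sketch of the objective, assuming a simplified setting that is not the paper's Horn-formula formalism: each rule is modelled as a Python function mapping the input database (a set of facts) to the facts it derives, and each selected rule is applied once, non-recursively. The total error of a subset is the number of derived facts absent from the output database (false positives) plus the number of expected facts not derived (false negatives). The names `derive`, `total_error`, and `best_subset` are hypothetical, and the exhaustive search is exponential in the number of rules, so it is only usable on toy inputs.

```python
from itertools import combinations
from typing import Callable, List, Set, Tuple

Fact = Tuple[str, ...]
# Hypothetical rule model (an assumption): source facts -> derived facts.
Rule = Callable[[Set[Fact]], Set[Fact]]


def derive(rules: List[Rule], source: Set[Fact]) -> Set[Fact]:
    """Union of the facts produced by each selected rule on the source database."""
    derived: Set[Fact] = set()
    for rule in rules:
        derived |= rule(source)
    return derived


def total_error(rules: List[Rule], source: Set[Fact], target: Set[Fact]) -> int:
    """False positives plus false negatives of the selected rules."""
    derived = derive(rules, source)
    false_positives = len(derived - target)
    false_negatives = len(target - derived)
    return false_positives + false_negatives


def best_subset(rules: List[Rule], source: Set[Fact],
                target: Set[Fact]) -> Tuple[int, Tuple[int, ...]]:
    """Exhaustive search: minimum total error, ties broken toward fewer rules."""
    best: Tuple[int, Tuple[int, ...]] = (total_error([], source, target), ())
    for k in range(1, len(rules) + 1):
        for subset in combinations(range(len(rules)), k):
            err = total_error([rules[i] for i in subset], source, target)
            # Strict comparison: a larger subset never displaces an equally good smaller one.
            if err < best[0]:
                best = (err, subset)
    return best


if __name__ == "__main__":
    source = {("emp", "alice", "sales"), ("emp", "bob", "hr")}
    target = {("works", "alice", "sales"), ("works", "bob", "hr")}
    rules: List[Rule] = [
        lambda db: {("works", f[1], f[2]) for f in db if f[0] == "emp"},
        lambda db: {("person", f[1]) for f in db if f[0] == "emp"},
    ]
    print(best_subset(rules, source, target))  # (0, (0,))
```

Because subsets are enumerated by increasing size and only a strictly smaller error replaces the incumbent, the returned subset has minimum total error and, among those, minimum size, which mirrors the bi-level variant (error first, then size).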

    Entity-Linking via Graph-Distance Minimization

    Entity-linking is a natural-language-processing task that consists of identifying the entities mentioned in a piece of text and linking each to an appropriate item in some knowledge base; when the knowledge base is Wikipedia, the problem is known as wikification (in this case, items are Wikipedia articles). One instance of entity-linking can be formalized as an optimization problem on the underlying concept graph, where the quantity to be optimized is the average distance between chosen items. Inspired by this application, we define a new graph problem which is a natural variant of the Maximum Capacity Representative Set. We prove that our problem is NP-hard for general graphs; nonetheless, under some restrictive assumptions, it turns out to be solvable in linear time. For the general case, we propose two heuristics: one tries to enforce the above assumptions, and the other is based on the notion of hitting distance; we show experimentally how these approaches perform with respect to some baselines on a real-world dataset.
    Comment: In Proceedings GRAPHITE 2014, arXiv:1407.7671. The second and third authors were supported by the EU-FET grant NADINE (GA 288956)
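
To make the objective concrete, here is a small sketch assuming an unweighted, undirected concept graph given as an edge list and one candidate list per mention: it picks one candidate per mention so that the average pairwise shortest-path (BFS) distance between the chosen items is minimal. This exhaustive search is exponential in the number of mentions and is neither of the paper's heuristics; the function names and the toy graph are hypothetical.

```python
from collections import deque
from itertools import combinations, product
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def make_graph(edges: List[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected adjacency map from an edge list."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj: Graph, src: Hashable) -> Dict[Hashable, int]:
    """Hop distances from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_distance(adj: Graph, chosen: Sequence[Hashable]) -> float:
    """Average pairwise distance; infinite if some pair is disconnected."""
    pairs = list(combinations(chosen, 2))
    if not pairs:
        return 0.0
    total = 0
    for a, b in pairs:
        d = bfs_distances(adj, a).get(b)
        if d is None:
            return float("inf")
        total += d
    return total / len(pairs)


def link_exhaustive(adj: Graph, candidates: List[List[Hashable]]):
    """Try every choice of one candidate per mention; returns (inf, ()) if none is connected."""
    best = (float("inf"), ())
    for choice in product(*candidates):
        cost = average_distance(adj, choice)
        if cost < best[0]:
            best = (cost, choice)
    return best


if __name__ == "__main__":
    g = make_graph([("Jaguar (car)", "Car"), ("Car", "Mustang (car)"),
                    ("Jaguar (animal)", "Cat"), ("Mustang (horse)", "Horse")])
    print(link_exhaustive(g, [["Jaguar (car)", "Jaguar (animal)"],
                              ["Mustang (car)", "Mustang (horse)"]]))
    # (2.0, ('Jaguar (car)', 'Mustang (car)'))
```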

    The Power of Linear Programming for Valued CSPs

    A class of valued constraint satisfaction problems (VCSPs) is characterised by a valued constraint language, a fixed set of cost functions on a finite domain. An instance of the problem is specified by a sum of cost functions from the language with the goal of minimising the sum. This framework includes and generalises well-studied constraint satisfaction problems (CSPs) and maximum constraint satisfaction problems (Max-CSPs). Our main result is a precise algebraic characterisation of valued constraint languages whose instances can be solved exactly by the basic linear programming relaxation. Using this result, we obtain tractability of several novel and previously widely open classes of VCSPs, including problems over valued constraint languages that are: (1) submodular on arbitrary lattices; (2) bisubmodular (also known as k-submodular) on arbitrary finite domains; (3) weakly (and hence strongly) tree-submodular on arbitrary trees.
    Comment: Corrected a few typos
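
As a small illustration of one of the listed classes, the sketch below tests whether a cost function on a product of finite chains is submodular, i.e. whether $f(x \vee y) + f(x \wedge y) \le f(x) + f(y)$ with $\vee$ and $\wedge$ taken componentwise as max and min. A chain is only the simplest kind of lattice, and the check is brute force over all pairs of tuples; it does not implement the basic LP relaxation or the paper's algebraic characterisation. The function name is hypothetical.

```python
from itertools import product
from typing import Callable, Sequence, Tuple


def is_submodular(f: Callable[[Tuple[int, ...]], float], domain: Sequence[int],
                  arity: int, tol: float = 1e-12) -> bool:
    """Brute-force check of f(x v y) + f(x ^ y) <= f(x) + f(y) on a chain domain."""
    tuples = list(product(domain, repeat=arity))
    for x in tuples:
        for y in tuples:
            join = tuple(max(a, b) for a, b in zip(x, y))  # componentwise max
            meet = tuple(min(a, b) for a, b in zip(x, y))  # componentwise min
            if f(join) + f(meet) > f(x) + f(y) + tol:
                return False
    return True


if __name__ == "__main__":
    chain = range(3)
    print(is_submodular(lambda t: abs(t[0] - t[1]), chain, 2))  # True
    print(is_submodular(lambda t: t[0] * t[1], chain, 2))       # False
```

Distance-like costs such as $|x_1 - x_2|$ pass the test, while the product $x_1 x_2$, which is supermodular, fails it.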

    Approximating max-min linear programs with local algorithms

    A local algorithm is a distributed algorithm in which each node must operate solely on the information that was available at system startup within a constant-size neighbourhood of the node. We study the applicability of local algorithms to max-min LPs, where the objective is to maximise $\min_k \sum_v c_{kv} x_v$ subject to $\sum_v a_{iv} x_v \le 1$ for each $i$ and $x_v \ge 0$ for each $v$. Here $c_{kv} \ge 0$, $a_{iv} \ge 0$, and the support sets $V_i = \{v : a_{iv} > 0\}$, $V_k = \{v : c_{kv} > 0\}$, $I_v = \{i : a_{iv} > 0\}$ and $K_v = \{k : c_{kv} > 0\}$ have bounded size. In the distributed setting, each agent $v$ is responsible for choosing the value of $x_v$, and the communication network is a hypergraph $\mathcal{H}$ whose hyperedges are the sets $V_k$ and $V_i$. We present inapproximability results for a wide range of structural assumptions; for example, even if $|V_i|$ and $|V_k|$ are bounded by some constants larger than 2, there is no local approximation scheme. To contrast the negative results, we present a local approximation algorithm that achieves good approximation ratios if we can bound the relative growth of the vertex neighbourhoods in $\mathcal{H}$.
    Comment: 16 pages, 2 figures
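
The sketch below shows why feasibility is easy to achieve locally, assuming the instance is given as sparse dictionaries and that every agent occurs in at least one constraint. It uses a simple local rule, not the paper's algorithm: each agent sets $x_v = \min_{i \in I_v} 1/(a_{iv}\,|V_i|)$, which needs only information about the constraints incident to $v$. Every constraint then satisfies $\sum_v a_{iv} x_v \le \sum_{v \in V_i} 1/|V_i| = 1$, so the output is always feasible. The function names are hypothetical.

```python
from typing import Dict, Hashable

Row = Dict[Hashable, float]  # agent v -> positive coefficient (sparse row)


def simple_local_solution(a: Dict[Hashable, Row]) -> Dict[Hashable, float]:
    """x_v = min over constraints i containing v of 1 / (a_iv * |V_i|)."""
    x: Dict[Hashable, float] = {}
    for row in a.values():
        size = len(row)  # |V_i|
        for v, coeff in row.items():
            x[v] = min(x.get(v, float("inf")), 1.0 / (coeff * size))
    return x


def is_feasible(a: Dict[Hashable, Row], x: Dict[Hashable, float], tol: float = 1e-9) -> bool:
    """Check x >= 0 and sum_v a_iv x_v <= 1 for every constraint i."""
    if any(value < 0 for value in x.values()):
        return False
    return all(sum(coeff * x.get(v, 0.0) for v, coeff in row.items()) <= 1 + tol
               for row in a.values())


def objective(c: Dict[Hashable, Row], x: Dict[Hashable, float]) -> float:
    """min over objectives k of sum_v c_kv x_v."""
    return min(sum(coeff * x.get(v, 0.0) for v, coeff in row.items()) for row in c.values())


if __name__ == "__main__":
    a = {"i1": {"u": 1.0, "w": 1.0}, "i2": {"w": 2.0}}
    c = {"k1": {"u": 1.0}, "k2": {"w": 1.0}}
    x = simple_local_solution(a)
    print(x, is_feasible(a, x), objective(c, x))  # {'u': 0.5, 'w': 0.5} True 0.5
```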

    Matchings with lower quotas: Algorithms and complexity

    We study a natural generalization of the maximum weight many-to-one matching problem. We are given an undirected bipartite graph $G = (A \,\dot{\cup}\, P, E)$ with weights on the edges in E, and with lower and upper quotas on the vertices in P. We seek a maximum weight many-to-one matching satisfying two sets of constraints: vertices in A are incident to at most one matching edge, while vertices in P are either unmatched or incident to a number of matching edges between their lower and upper quota. This problem, which we call maximum weight many-to-one matching with lower and upper quotas (WMLQ), has applications to the assignment of students to projects within university courses, where there are constraints on the minimum and maximum numbers of students that must be assigned to each project. In this paper, we provide a comprehensive analysis of the complexity of WMLQ from the viewpoints of classical polynomial-time algorithms, fixed-parameter tractability, and approximability. We draw the line between NP-hard and polynomially tractable instances in terms of degree and quota constraints and provide efficient algorithms to solve the tractable ones. We further show that the problem can be solved in polynomial time for instances with bounded treewidth; however, the corresponding runtime is exponential in the treewidth with the maximum upper quota $u_{\max}$ as its base, and we prove that this dependence is necessary unless FPT = W[1]. The approximability of WMLQ is also discussed: we present an approximation algorithm for the general case with performance guarantee $u_{\max}+1$, which is asymptotically best possible unless P = NP. Finally, we elaborate on how most of our positive results carry over to matchings in arbitrary graphs with lower quotas.
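
The constraint set is easy to state in code. The sketch below, assuming edges are given as a dictionary of weights keyed by (vertex in A, vertex in P) pairs and that every vertex in P has an entry in both quota maps, checks whether an assignment is feasible (each vertex in P is either unmatched or has a load between its quotas) and finds an optimum by exhaustive enumeration. It is exponential in $|A|$ and is not one of the paper's algorithms; all names and the toy instance are hypothetical.

```python
from itertools import product
from typing import Dict, Hashable, List, Optional, Tuple

Edge = Tuple[Hashable, Hashable]  # (vertex in A, vertex in P)


def wmlq_value(assign: Dict[Hashable, Optional[Hashable]], weights: Dict[Edge, float],
               lower: Dict[Hashable, int], upper: Dict[Hashable, int]) -> Optional[float]:
    """Weight of the assignment, or None if it uses a non-edge or violates a quota."""
    load = {p: 0 for p in upper}
    total = 0.0
    for a, p in assign.items():
        if p is None:  # a stays unmatched
            continue
        if (a, p) not in weights:
            return None
        load[p] += 1
        total += weights[(a, p)]
    for p, k in load.items():
        # p must be unmatched or have a load within [lower, upper].
        if k != 0 and not (lower[p] <= k <= upper[p]):
            return None
    return total


def wmlq_bruteforce(A: List[Hashable], weights: Dict[Edge, float],
                    lower: Dict[Hashable, int], upper: Dict[Hashable, int]):
    """Enumerate every choice of at most one incident edge per vertex in A."""
    options = [[None] + [p for (a2, p) in weights if a2 == a] for a in A]
    best_value, best_assign = 0.0, {a: None for a in A}  # empty matching is always feasible
    for choice in product(*options):
        assign = dict(zip(A, choice))
        value = wmlq_value(assign, weights, lower, upper)
        if value is not None and value > best_value:
            best_value, best_assign = value, assign
    return best_value, best_assign


if __name__ == "__main__":
    A = ["s1", "s2", "s3"]
    w = {("s1", "p1"): 3.0, ("s2", "p1"): 3.0, ("s3", "p1"): 5.0, ("s3", "p2"): 1.0}
    quota = {"p1": 2, "p2": 1}  # lower == upper in this toy instance
    print(wmlq_bruteforce(A, w, quota, quota)[0])  # 8.0
```

In the toy instance, project `p1` needs exactly two students, so the optimum pairs `s3` with one of `s1`, `s2` in `p1` (weight 8) rather than filling both projects (weight 3 + 3 + 1 = 7).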

    Optimal Algorithms for Scheduling under Time-of-Use Tariffs

    We consider a natural generalization of classical scheduling problems in which using a time unit for processing a job causes some time-dependent cost which must be paid in addition to the standard scheduling cost. We study the scheduling objectives of minimizing the makespan and the sum of (weighted) completion times. It is not difficult to derive a polynomial-time algorithm for preemptive scheduling to minimize the makespan on unrelated machines. The problem of minimizing the total (weighted) completion time is considerably harder, even on a single machine. We present a polynomial-time algorithm that computes, for any given sequence of jobs, an optimal schedule, i.e., the optimal set of time slots to be used for scheduling jobs according to the given sequence. This result is based on dynamic programming using a subtle analysis of the structure of optimal solutions and a potential function argument. With this algorithm, we solve the unweighted problem optimally in polynomial time. For the more general problem, in which jobs may have individual weights, we develop a polynomial-time approximation scheme (PTAS) based on a dual scheduling approach introduced for scheduling on a machine of varying speed. As the weighted problem is strongly NP-hard, our PTAS is the best possible approximation we can hope for.
    Comment: 17 pages; a preliminary version of this paper with a subset of results appeared in the Proceedings of MFCS 201
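
For intuition about the fixed-sequence subproblem, here is a small dynamic program under simplifying assumptions that differ from the paper's setting: a single machine, integer time slots with a per-slot tariff over a finite horizon, non-preemptive processing, and jobs run in the given order. It minimizes the weighted sum of completion times plus the tariff paid for the occupied slots; its running time grows with the horizon length, so it should not be read as the paper's algorithm. The function name is hypothetical.

```python
from functools import lru_cache
from typing import Sequence


def fixed_order_cost(proc: Sequence[int], weights: Sequence[float],
                     tariff: Sequence[float]) -> float:
    """Min of sum_j w_j C_j + tariff of occupied slots, jobs run in the given order.

    Returns float('inf') if the jobs do not fit into the horizon len(tariff).
    """
    horizon = len(tariff)
    prefix = [0.0]
    for price in tariff:
        prefix.append(prefix[-1] + price)  # prefix[t] = cost of slots 0..t-1
    n = len(proc)

    @lru_cache(maxsize=None)
    def best(j: int, t: int) -> float:
        # Cheapest way to schedule jobs j..n-1, each starting no earlier than t.
        if j == n:
            return 0.0
        result = float("inf")
        for start in range(t, horizon - proc[j] + 1):
            end = start + proc[j]
            cost = weights[j] * end + (prefix[end] - prefix[start]) + best(j + 1, end)
            result = min(result, cost)
        return result

    return best(0, 0)


if __name__ == "__main__":
    # The second job idles past the expensive slot 1 instead of paying 9 for it.
    print(fixed_order_cost([1, 1], [1.0, 1.0], [0, 9, 0, 0]))  # 4.0
```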

    Max flow vitality in general and $st$-planar graphs

    The vitality of an arc/node of a graph with respect to the maximum flow between two fixed nodes $s$ and $t$ is defined as the reduction of the maximum flow caused by the removal of that arc/node. In this paper we address the issue of determining the vitality of arcs and/or nodes for the maximum flow problem. We show how to compute the vitality of all arcs in a general undirected graph by solving only $2(n-1)$ max flow instances and, in $st$-planar graphs (directed or undirected), we show how to compute the vitality of all arcs and all nodes in $O(n)$ worst-case time. Moreover, after determining the vitality of arcs and/or nodes, and given a planar embedding of the graph, we can determine the vitality of a "contiguous" set of arcs/nodes in time proportional to the size of the set.
    Comment: 12 pages, 3 figures
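
The definition translates directly into a brute-force baseline: compute the maximum flow once, then recompute it with each arc removed, for $m + 1$ max-flow computations in total. The sketch below does this on a directed graph with integer capacities, using Edmonds–Karp (BFS augmenting paths) on an adjacency matrix; it is a naive baseline, not the paper's $2(n-1)$-instance method or its $O(n)$ algorithm for $st$-planar graphs. Function names and the toy graph are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Arc = Tuple[int, int, int]  # (tail, head, capacity)


def max_flow(n: int, arcs: List[Arc], s: int, t: int) -> int:
    """Edmonds-Karp on an n x n residual-capacity matrix."""
    cap = [[0] * n for _ in range(n)]
    for u, v, c in arcs:
        cap[u][v] += c  # parallel arcs simply add up
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            return flow
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:  # push flow and update residual capacities
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


def arc_vitality(n: int, arcs: List[Arc], s: int, t: int) -> List[int]:
    """Vitality of each arc: max flow minus max flow without that arc."""
    base = max_flow(n, arcs, s, t)
    return [base - max_flow(n, arcs[:i] + arcs[i + 1:], s, t) for i in range(len(arcs))]


if __name__ == "__main__":
    arcs = [(0, 1, 3), (0, 2, 2), (1, 3, 2), (2, 3, 3), (1, 2, 1)]
    print(max_flow(4, arcs, 0, 3), arc_vitality(4, arcs, 0, 3))  # 5 [3, 2, 2, 3, 1]
```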