    A Double Exponential Lower Bound for the Distinct Vectors Problem

    In the (binary) Distinct Vectors problem we are given a binary matrix A with pairwise different rows and want to select at most k columns such that, restricting the matrix to these columns, all rows are still pairwise different. A result by Froese et al. [JCSS] implies a 2^(2^(O(k))) * poly(|A|)-time brute-force algorithm for Distinct Vectors. We prove that this running time bound is essentially optimal by showing that there is a constant c such that the existence of an algorithm solving Distinct Vectors with running time 2^(O(2^(ck))) * poly(|A|) would contradict the Exponential Time Hypothesis.
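
To make the problem statement concrete, the following is a minimal sketch of a naive solver that enumerates column subsets of size at most k. It is not the algorithm implied by Froese et al.; the function names `rows_distinct` and `distinct_vectors_bruteforce` are hypothetical, and the matrix is assumed to be a list of equal-length rows.

```python
from itertools import combinations


def rows_distinct(matrix, columns):
    """Return True if all rows are pairwise different on the given columns."""
    projected = [tuple(row[c] for c in columns) for row in matrix]
    return len(set(projected)) == len(projected)


def distinct_vectors_bruteforce(matrix, k):
    """Return a smallest set of at most k columns keeping all rows distinct.

    Returns None if no such set exists. This is plain subset enumeration
    (an illustrative assumption), so it is only usable on tiny inputs.
    """
    num_cols = len(matrix[0]) if matrix else 0
    for size in range(min(k, num_cols) + 1):
        for columns in combinations(range(num_cols), size):
            if rows_distinct(matrix, columns):
                return columns
    return None
```

For example, on the matrix with rows 001, 011, 101, 111 the third column is useless, and the first two columns together distinguish all rows.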

    Optimal Discretization is Fixed-parameter Tractable

    Given two disjoint sets $W_1$ and $W_2$ of points in the plane, the Optimal Discretization problem asks for the minimum size of a family of horizontal and vertical lines that separate $W_1$ from $W_2$, that is, in every region into which the lines partition the plane there are either only points of $W_1$, or only points of $W_2$, or the region is empty. Equivalently, Optimal Discretization can be phrased as a task of discretizing continuous variables: we would like to discretize the range of $x$-coordinates and the range of $y$-coordinates into as few segments as possible, maintaining that no pair of points from $W_1 \times W_2$ is projected onto the same pair of segments under this discretization. We provide a fixed-parameter algorithm for the problem, parameterized by the number of lines in the solution. Our algorithm works in time $2^{O(k^2 \log k)} n^{O(1)}$, where $k$ is the bound on the number of lines to find and $n$ is the number of points in the input. Our result answers in the affirmative a question of Bonnet, Giannopoulos, and Lampis [IPEC 2017] and of Froese (PhD thesis, 2018) and is in contrast with the known intractability of two closely related generalizations: the Rectangle Stabbing problem and the generalization in which the selected lines are not required to be axis-parallel. Comment: Accepted to ACM-SIAM Symposium on Discrete Algorithms (SODA 2021)
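
The separation condition can be illustrated with a short verifier. This is a sketch only, not the fixed-parameter algorithm from the paper: the function name `separates` and the input encoding (points as coordinate pairs, lines as lists of coordinates) are assumptions, and it is assumed that no point lies exactly on a line.

```python
from bisect import bisect_right


def separates(w1, w2, vertical, horizontal):
    """Check whether the given axis-parallel lines separate w1 from w2.

    vertical holds the x-coordinates of vertical lines, horizontal the
    y-coordinates of horizontal lines. Two points share a region exactly
    when they fall into the same x-segment and the same y-segment.
    """
    xs = sorted(vertical)
    ys = sorted(horizontal)

    def cell(point):
        x, y = point
        return (bisect_right(xs, x), bisect_right(ys, y))

    cells_w1 = {cell(p) for p in w1}
    cells_w2 = {cell(p) for p in w2}
    # Separation holds if no region contains points from both sets.
    return cells_w1.isdisjoint(cells_w2)
```

With points on the corners of a square in an alternating pattern, one vertical and one horizontal line through the middle separate the two sets, while a single line does not.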

    Fine-Grained Complexity Analysis of Some Combinatorial Data Science Problems

    This thesis is concerned with analyzing the computational complexity of NP-hard problems related to data science. For most of the problems considered in this thesis, the computational complexity has not been intensively studied before. We focus on the complexity of computing exact problem solutions and conduct a detailed analysis identifying tractable special cases. To this end, we adopt a parameterized viewpoint in which we identify several parameters that describe properties of a specific problem instance and allow the instance to be solved efficiently. We develop specialized algorithms whose running times are polynomial if the corresponding parameter value is constant. We also investigate in which cases the problems remain intractable even for small parameter values. We thereby chart the border between tractability and intractability for some practically motivated problems, which yields a better understanding of their computational complexity. In particular, we consider the following problems. General Position Subset Selection is the problem of selecting a maximum number of points in general position from a given set of points in the plane. Point sets in general position are well-studied in geometry and play a role in data visualization. We prove several computational hardness results and show how polynomial-time data reduction can be applied to solve the problem if the sought number of points in general position is very small or very large. The Distinct Vectors problem asks to select a minimum number of columns in a given matrix such that all rows in the selected submatrix are pairwise distinct. This problem is motivated by combinatorial feature selection. We prove a complexity dichotomy with respect to combinations of the minimum and the maximum pairwise Hamming distance of the rows for binary input matrices, thus separating polynomial-time solvable from NP-hard cases. 
Co-Clustering is a well-known matrix clustering problem in data mining where the goal is to partition a matrix into homogeneous submatrices. We conduct an extensive multivariate complexity analysis revealing several NP-hard and some polynomial-time solvable and fixed-parameter tractable cases. The generic F-free Editing problem is a graph modification problem in which a given graph has to be modified by a minimum number of edge modifications such that it does not contain any induced subgraph isomorphic to the graph F. We consider three special cases of this problem: the graph clustering problem Cluster Editing with applications in machine learning, the Triangle Deletion problem, which is motivated by network cluster analysis, and Feedback Arc Set in Tournaments with applications in rank aggregation. We introduce a new parameterization by the number of edge modifications above a lower bound derived from a packing of induced forbidden subgraphs and show fixed-parameter tractability for all three of the above problems with respect to this parameter. Moreover, we prove several NP-hardness results for other variants of F-free Editing for a constant parameter value. The problem DTW-Mean is to compute a mean time series of a given sample of time series with respect to the dynamic time warping distance. This is a fundamental problem in time series analysis, the complexity of which is unknown. We give an exact exponential-time algorithm for DTW-Mean and prove polynomial-time solvability for the special case of binary time series.
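
The dynamic time warping distance underlying DTW-Mean can be computed with a textbook dynamic program. The sketch below is for illustration only and is not the thesis's mean algorithm: the function name `dtw_cost` is hypothetical, and the choice of squared difference as local cost (without a final square root) is an assumption.

```python
def dtw_cost(x, y):
    """Minimum cumulative squared difference over all warping paths.

    x and y are non-empty sequences of numbers. A warping path aligns
    every element of x with at least one element of y and vice versa,
    preserving order.
    """
    if not x or not y:
        raise ValueError("both time series must be non-empty")
    inf = float("inf")
    # cost[i][j] is the optimal cost of aligning x[:i] with y[:j].
    cost = [[inf] * (len(y) + 1) for _ in range(len(x) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            local = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = local + min(
                cost[i - 1][j],      # x[i-1] joins the alignment of y[j-1]
                cost[i][j - 1],      # y[j-1] joins the alignment of x[i-1]
                cost[i - 1][j - 1],  # both series advance
            )
    return cost[len(x)][len(y)]
```

A mean in the sense of DTW-Mean would be a series minimizing the sum of such costs to all sample series; finding it is the hard part and is not attempted here.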

    Elements of dynamic and 2-SAT programming: paths, trees, and cuts

    This thesis presents faster (in terms of worst-case running times) exact algorithms for special cases of graph problems through dynamic programming and 2-SAT programming. Dynamic programming describes the procedure of breaking down a problem recursively into overlapping subproblems, that is, subproblems with common subsubproblems. Given optimal solutions to these subproblems, the dynamic program then combines them into an optimal solution for the original problem. 2-SAT programming refers to the procedure of reducing a problem to a set of 2-SAT formulas, that is, boolean formulas in conjunctive normal form in which each clause contains at most two literals. Computing whether such a formula is satisfiable (and computing a satisfying truth assignment, if one exists) takes linear time in the formula length. Hence, when satisfying truth assignments to some 2-SAT formulas correspond to a solution of the original problem and all formulas can be computed efficiently, that is, in polynomial time in the input size of the original problem, then the original problem can be solved in polynomial time. We next describe our main results. Diameter asks for the maximal distance between any two vertices in a given undirected graph. It is arguably among the most fundamental graph parameters. We provide both positive and negative parameterized results for distance-from-triviality-type parameters and parameter combinations that were observed to be small in real-world applications. In Length-Bounded Cut, we search for a bounded-size set of edges that intersects all paths between two given vertices of at most some given length. We confirm a conjecture from the literature by providing a polynomial-time algorithm for proper interval graphs which is based on dynamic programming. k-Disjoint Shortest Paths is the problem of finding (vertex-)disjoint paths between given vertex terminals such that each of these paths is a shortest path between the respective terminals. 
Its complexity for constant k > 2 has been an open problem for over 20 years. Using dynamic programming, we show that k-Disjoint Shortest Paths can be solved in polynomial time for each constant k. The problem Tree Containment asks whether a phylogenetic tree T is contained in a phylogenetic network N. A phylogenetic network (or tree) is a leaf-labeled single-source directed acyclic graph (or tree) in which each vertex has in-degree at most one or out-degree at most one. The problem stems from computational biology in the context of the tree of life (the history of speciation). We introduce a particular variant that resembles certain types of uncertainty in the input. We show that if each leaf label occurs at most twice in a phylogenetic tree N, then the problem can be solved in polynomial time and if labels can occur up to three times, then the problem becomes NP-hard. Lastly, Reachable Object is the problem of deciding whether there is a sequence of rational trades of objects among agents such that a given agent can obtain a certain object. A rational trade is a swap of objects between two agents where both agents profit from the swap, that is, they receive objects they prefer over the objects they trade away. This problem can be seen as a natural generalization of the well-known and well-studied Housing Market problem where the agents are arranged in a graph and only neighboring agents can trade objects. We prove a dichotomy result that states that the problem is polynomial-time solvable if each agent prefers at most two objects over its initially held object and it is NP-hard if each agent prefers at most three objects over its initially held object. We also provide a polynomial-time 2-SAT program for the case where the graph of agents is a cycle.
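
The 2-SAT programming technique in this abstract rests on the fact that 2-SAT formulas can be decided in linear time. The sketch below shows one standard way to do this, via strongly connected components of the implication graph (Kosaraju's algorithm). It is not taken from the thesis; the function name `solve_2sat` and the literal encoding (variable i as the integer i, its negation as -i) are assumptions.

```python
def solve_2sat(num_vars, clauses):
    """Return a satisfying assignment {var: bool}, or None if unsatisfiable.

    clauses is a list of pairs of non-zero integers; (a, b) means a OR b.
    """
    n = 2 * num_vars

    def index(lit):
        # Variable v maps to 2(v-1); its negation maps to 2(v-1)+1.
        return 2 * (abs(lit) - 1) + (1 if lit < 0 else 0)

    graph = [[] for _ in range(n)]
    reverse = [[] for _ in range(n)]
    for a, b in clauses:
        # (a OR b) is equivalent to (NOT a -> b) and (NOT b -> a).
        for src, dst in ((index(a) ^ 1, index(b)), (index(b) ^ 1, index(a))):
            graph[src].append(dst)
            reverse[dst].append(src)

    # First pass: iterative DFS recording vertices by completion time.
    order, seen = [], [False] * n
    for root in range(n):
        if seen[root]:
            continue
        seen[root] = True
        stack = [(root, iter(graph[root]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if not seen[nxt]:
                    seen[nxt] = True
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()

    # Second pass: label components on the reversed graph, so that labels
    # follow a topological order of the implication graph's condensation.
    comp = [-1] * n
    label = 0
    for root in reversed(order):
        if comp[root] != -1:
            continue
        comp[root] = label
        stack = [root]
        while stack:
            node = stack.pop()
            for nxt in reverse[node]:
                if comp[nxt] == -1:
                    comp[nxt] = label
                    stack.append(nxt)
        label += 1

    assignment = {}
    for var in range(1, num_vars + 1):
        pos, neg = index(var), index(-var)
        if comp[pos] == comp[neg]:
            return None  # a variable is equivalent to its own negation
        assignment[var] = comp[pos] > comp[neg]
    return assignment
```

A 2-SAT program in the sense of the abstract would construct polynomially many such clause lists from the input and call a solver like this on each of them.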

    Algorithmic aspects of resource allocation and multiwinner voting: theory and experiments

    This thesis is concerned with investigating elements of computational social choice in the light of real-world applications. We contribute to a better understanding of the areas of fair allocation and multiwinner voting. For both areas, inspired by real-world scenarios, we propose several new notions and extensions of existing models. Then, we analyze the complexity of answering the computational questions raised by the introduced concepts. To this end, we look through the lens of parameterized complexity. We identify different parameters which describe natural features specific to the computational problems we investigate. Exploiting the parameters, we successfully develop efficient algorithms for specific cases of the studied problems. We complement our analysis by showing which parameters presumably cannot be utilized for seeking efficient algorithms. Thereby, we provide comprehensive pictures of the computational complexity of the studied problems. Specifically, we concentrate on four topics that we present below, grouped by our two areas of interest. For all but one topic, we present experimental studies based on implementations of newly developed algorithms. We first focus on fair allocation of indivisible resources. In this setting, we consider a collection of indivisible resources and a group of agents. Each agent reports its utility evaluation of every resource and the task is to “fairly” allocate the resources such that each resource is allocated to at most one agent. We concentrate on the following two issues regarding this scenario. The social context in fair allocation of indivisible resources. In many fair allocation settings, it is unlikely that every agent knows all other agents. For example, consider a scenario where the agents represent employees of a large corporation. It is highly unlikely that every employee knows every other employee. 
Motivated by such settings, we propose a new model of graph envy-freeness by adapting the classical envy-freeness notion to account for social relations of agents modeled as social networks. We show that if the given social network of agents is simple (for example, if it is a directed acyclic graph), then indeed we can sometimes find fair allocations efficiently. However, we contrast these tractability results by showing NP-hardness for several cases, including those in which the given social network has a constant degree. Fair allocations among few agents with bounded rationality. Bounded rationality is the idea that humans, due to cognitive limitations, tend to simplify problems that they face. One of its manifestations is that human agents usually tend to report simple utilities over the resources that they want to allocate; for example, agents may categorize the available resources only into two groups of desirable and undesirable ones. Applying techniques for solving integer linear programs, we show that exploiting bounded rationality leads to efficient algorithms for finding envy-free and Pareto-efficient allocations, assuming a small number of agents. Further, we demonstrate that our result actually forms a framework that can be applied to a number of different fairness concepts like envy-freeness up to one good or envy-freeness up to any good. This way, we obtain efficient algorithms for a number of fair allocation problems (assuming few agents with bounded rationality). We also empirically show that our technique is applicable in practice. Further, we study multiwinner voting, where we are given a collection of voters and their preferences over a set of candidates. The outcome of a multiwinner voting rule is a group (or a set of groups in case of ties) of candidates that reflect the voters’ preferences best according to some objective. In this context, we investigate the following themes. The robustness of election outcomes. 
We study how robust outcomes of multiwinner elections are against possible mistakes made by voters. Assuming that each voter casts a ballot in the form of a ranking of candidates, we represent a mistake by a swap of adjacent candidates in a ballot. We find that for rules such as SNTV, k-Approval, and k-Borda, it is computationally easy to find the minimum number of swaps resulting in a change of an outcome. This task is, however, NP-hard for STV and the Chamberlin-Courant rule. We conclude our study of robustness by experimentally studying the average number of random swaps leading to a change of an outcome for several rules. Strategic voting in multiwinner elections. We ask whether a given group of cooperating voters can manipulate an election outcome in a favorable way. We focus on the k-Approval voting rule and we show that the computational complexity of answering the posed question has a rich structure. We spot several cases for which our problem is polynomial-time solvable. However, we also identify NP-hard cases. For several of them, we show how to circumvent the hardness by fixed-parameter tractability. We also present experimental studies indicating that our algorithms are applicable in practice.
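
The notion of a mistake as a swap of adjacent candidates can be illustrated on k-Approval. The sketch below is not one of the thesis's algorithms, and all function names are hypothetical. It assumes that each voter's top k candidates receive one point each, that the winning committees are the size-k candidate sets of maximum total score, and that there are at least k candidates; committees are found by brute force for clarity.

```python
from itertools import combinations


def k_approval_scores(ballots, k):
    """Each ballot is a ranking, best first; its top k candidates get a point."""
    scores = {c: 0 for ballot in ballots for c in ballot}
    for ballot in ballots:
        for candidate in ballot[:k]:
            scores[candidate] += 1
    return scores


def winning_committees(ballots, k):
    """Return all size-k committees with maximum total k-Approval score."""
    scores = k_approval_scores(ballots, k)
    committees = list(combinations(sorted(scores), k))
    best = max(sum(scores[c] for c in committee) for committee in committees)
    return {
        frozenset(committee)
        for committee in committees
        if sum(scores[c] for c in committee) == best
    }


def swap_adjacent(ballot, position):
    """Return a copy of the ballot with positions `position` and `position + 1` swapped."""
    swapped = list(ballot)
    swapped[position], swapped[position + 1] = swapped[position + 1], swapped[position]
    return swapped
```

Comparing `winning_committees` before and after a call to `swap_adjacent` on one ballot shows whether that single mistake changes the outcome.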