    The multicolored graph realization problem

    We introduce the multicolored graph realization problem (MGR). The input to this problem is a colored graph (G, φ), i.e., a graph G together with a coloring φ on its vertices. We associate each colored graph (G, φ) with a cluster graph Gφ in which, after collapsing all vertices with the same color to a node, we remove multiple edges and self-loops. A set of vertices S is multicolored when S has exactly one vertex from each color class. The MGR problem is to decide whether there is a multicolored set S so that, after identifying each vertex in S with its color class, G[S] coincides with Gφ. The MGR problem is related to the well-known class of generalized network problems, most of which are NP-hard, like the generalized Minimum Spanning Tree problem. The MGR is a generalization of the multicolored clique problem, which is known to be W[1]-hard when parameterized by the number of colors. Thus, MGR remains W[1]-hard when parameterized by the size of the cluster graph. These results imply that the MGR problem is W[1]-hard when parameterized by any graph parameter on Gφ, among which lies treewidth. Consequently, we look at the instances of the problem in which both the number of color classes and the treewidth of Gφ are unbounded. We consider three natural such graph classes: chordal graphs, convex bipartite graphs and 2-dimensional grid graphs. We show that MGR is NP-complete when Gφ is either chordal, biconvex bipartite, complete bipartite or a 2-dimensional grid. Our reductions show that the problem remains hard even when the maximum number of vertices in a color class is 3. In the case of the grid, the hardness holds even for graphs with bounded degree. We provide a complexity dichotomy with respect to cluster size.

    J. Díaz and M. Serna are partially supported by funds from the Spanish Agencia Estatal de Investigación under grant PID2020-112581GB-C21 (MOTION), and from AGAUR under grant 2017-SGR-786 (ALBCOM). Ö. Y. Diner is partially supported by the Scientific and Technological Research Council Tübitak under project BIDEB 2219-1059B191802095 and by Kadir Has University under project 2018-BAP-08. O. Serra is supported by the Spanish Agencia Estatal de Investigación under grant PID2020-113082GB-I00. Peer Reviewed. Postprint (published version).
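To make the definition concrete, the following is a minimal brute-force sketch of the MGR decision problem as stated above. It is not the authors' method (the abstract shows the problem is hard in general); the function names `cluster_graph` and `find_realization` and the input encoding (vertex list, edge list, color dictionary) are assumptions made for illustration only.

```python
from itertools import product


def cluster_graph(edges, color):
    """Edges of the cluster graph: pairs of distinct colors joined by some edge of G.

    Using a set of frozensets drops multiple edges; skipping equal colors drops self-loops.
    """
    return {frozenset((color[u], color[v])) for u, v in edges if color[u] != color[v]}


def find_realization(vertices, edges, color):
    """Return a multicolored set S such that G[S] matches the cluster graph, or None.

    Tries every choice of one vertex per color class, so the running time is
    exponential in the number of colors (illustration only).
    """
    edge_set = {frozenset(e) for e in edges}
    target = cluster_graph(edges, color)

    # Group vertices by color class.
    classes = {}
    for v in vertices:
        classes.setdefault(color[v], []).append(v)

    for choice in product(*classes.values()):
        # Edges of G[S], with each chosen vertex identified with its color.
        induced = {
            frozenset((color[u], color[v]))
            for i, u in enumerate(choice)
            for v in choice[i + 1:]
            if frozenset((u, v)) in edge_set
        }
        if induced == target:
            return set(choice)
    return None
```

For example, with color classes {a1, a2}, {b1}, {c1} and edges a1-b1, a2-c1, b1-c1, the cluster graph is a triangle but no single representative of the first class is adjacent to both b1 and c1, so no realization exists.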

    Multifactorial Evolutionary Algorithm For Clustered Minimum Routing Cost Problem

    The Clustered Minimum Routing Cost Tree Problem (CluMRCT) arises in various fields, both in theory and in application. Because CluMRCT is NP-hard, approximate approaches are suitable for finding solutions to this problem. Recently, the Multifactorial Evolutionary Algorithm (MFEA) has emerged as one of the most efficient approximation algorithms for dealing with many different kinds of problems. Therefore, this paper studies the application of MFEA to CluMRCT problems. In the proposed MFEA, we focus on crossover and mutation operators that create a valid solution of the CluMRCT problem in two levels: the first level constructs spanning trees for the graphs in clusters, while the second level builds a spanning tree connecting the clusters. To reduce resource consumption, we also introduce a new method of calculating the cost of a CluMRCT solution. The proposed algorithm is evaluated on numerous types of datasets. The experimental results demonstrate the effectiveness of the proposed algorithm, particularly on large instances.
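As background for the cost calculation mentioned above, here is a minimal sketch of one standard way to compute the routing cost of a spanning tree. It assumes the routing cost is the sum of tree path lengths over all unordered vertex pairs, and it uses the fact that an edge of weight w splitting the tree into parts of sizes s and n - s lies on exactly s(n - s) of those paths. This is not the paper's new method, which the abstract does not describe; the function name `routing_cost` and the edge-list encoding are illustrative assumptions.

```python
def routing_cost(n, tree_edges, root=0):
    """Sum of path lengths over all unordered vertex pairs of a tree.

    n          -- number of vertices, labeled 0..n-1
    tree_edges -- list of (u, v, weight) tuples forming a spanning tree
    """
    adj = {v: [] for v in range(n)}
    for u, v, w in tree_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    # Iterative DFS: every vertex appears in `order` after its parent.
    parent = {root: (None, 0)}
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v, w in adj[u]:
            if v not in parent:
                parent[v] = (u, w)
                stack.append(v)

    # Process children before parents to accumulate subtree sizes.
    size = {v: 1 for v in range(n)}
    cost = 0
    for v in reversed(order):
        p, w = parent[v]
        if p is not None:
            # The edge (p, v) separates size[v] vertices from the other n - size[v].
            cost += w * size[v] * (n - size[v])
            size[p] += size[v]
    return cost
```

This runs in time linear in the number of vertices, compared with computing all pairwise distances explicitly.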

    Combinatorial and Probabilistic Approaches to Motif Recognition

    Short substrings of genomic data that are responsible for biological processes, such as gene expression, are referred to as motifs. Motifs with the same function may not entirely match, due to mutation events at a few of the motif positions. Allowing for non-exact occurrences significantly complicates their discovery. Given a number of DNA strings, the motif recognition problem is the task of detecting motif instances in every given sequence without knowledge of the position of the instances or the pattern shared by these substrings. We describe a novel approach to motif recognition, and provide theoretical and experimental results that demonstrate its efficiency and accuracy. Our algorithm, MCL-WMR, builds an edge-weighted graph model of the given motif recognition problem and uses a graph clustering algorithm to quickly determine important subgraphs that need to be searched further for valid motifs. By considering a weighted graph model, we narrow the search dramatically to smaller problems that can be solved with significantly less computation. The Closest String problem is a subproblem of motif recognition, and it is NP-hard. We give a linear-time algorithm for a restricted version of the Closest String problem, and an efficient polynomial-time heuristic that solves the general problem with high probability. We initiate the study of the smoothed complexity of the Closest String problem, which in turn explains our empirical results that demonstrate the great capability of our probabilistic heuristic. Important to this analysis is the introduction of a perturbation model of the Closest String instances within which we provide a probabilistic analysis of our algorithm. The smoothed analysis suggests reasons why a well-known fixed parameter tractable algorithm solves Closest String instances extremely efficiently in practice. 
Although the Closest String model is robust to the oversampling of strings in the input, it is severely affected by the existence of outliers. We propose a refined model, the Closest String with Outliers problem, to overcome this limitation. A systematic parameterized complexity analysis accompanies the introduction of this problem, providing a surprising insight into the sensitivity of this problem to slightly different parameterizations. Through the application of probabilistic and combinatorial insights into the Closest String problem, we develop sMCL-WMR, a program that is much faster than its predecessor MCL-WMR. We apply and adapt sMCL-WMR and MCL-WMR to analyze the promoter regions of the canola seed-coat. Our results identify important regions of the canola genome that are responsible for specific biological activities. This knowledge may be used in the long-term aim of developing crop varieties with specific biological characteristics, such as being disease-resistant.
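For readers unfamiliar with the Closest String problem referred to above, the sketch below states it in code: given equal-length strings, find a center string minimizing the maximum Hamming distance to the inputs. This exhaustive search is only an illustration of the problem definition, not the thesis's linear-time algorithm or probabilistic heuristic; the function names and the default DNA alphabet are assumptions.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_string(strings, alphabet="ACGT"):
    """Return (center, radius) minimizing the maximum Hamming distance to `strings`.

    Enumerates all |alphabet|^L candidates, so it is only usable for very short
    strings; the first optimal candidate in enumeration order is returned.
    """
    length = len(strings[0])
    best = None
    for cand in product(alphabet, repeat=length):
        radius = max(hamming(cand, s) for s in strings)
        if best is None or radius < best[1]:
            best = ("".join(cand), radius)
    return best
```

A single outlier string far from the others forces a large radius for every candidate center, which is the sensitivity that motivates the Closest String with Outliers model.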

    On the Complexity of Community-aware Network Sparsification

    Network sparsification is the task of reducing the number of edges of a given graph while preserving some crucial graph property. In community-aware network sparsification, the preserved property concerns the subgraphs that are induced by the communities of the graph, which are given as vertex subsets. This is formalized in the $\Pi$-Network Sparsification problem: given an edge-weighted graph $G$, a collection $Z$ of $c$ subsets of $V(G)$ (communities), and two numbers $\ell, b$, the question is whether there exists a spanning subgraph $G'$ of $G$ with at most $\ell$ edges of total weight at most $b$ such that $G'[C]$ fulfills $\Pi$ for each community $C$. Here, we consider two graph properties $\Pi$: the connectivity property (Connectivity NWS) and the property of having a spanning star (Stars NWS). Since both problems are NP-hard, we study their parameterized and fine-grained complexity. We provide a tight $2^{\Omega(n^2+c)} \cdot \mathrm{poly}(n+|Z|)$-time running time lower bound based on the ETH for both problems, where $n$ is the number of vertices in $G$. The lower bound holds even in the restricted case when all communities have size at most 4, $G$ is a clique, and every edge has unit weight. For the connectivity property, the unit weight case with $G$ being a clique is the well-studied problem of computing a hypergraph support with a minimum number of edges. We then study the complexity of both problems parameterized by the feedback edge number $t$ of the solution graph $G'$. For Stars NWS, we present an XP-algorithm for $t$. This answers an open question by Korach and Stern [Disc. Appl. Math. '08] who asked for the existence of polynomial-time algorithms for $t=0$. In contrast, we show for Connectivity NWS that known polynomial-time algorithms for $t=0$ [Korach and Stern, Math. Program. '03; Klemz et al., SWAT '14] cannot be extended by showing that Connectivity NWS is NP-hard for $t=1$.
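The problem statement above can be read as a feasibility check on a candidate subgraph. The following sketch verifies a proposed solution for the connectivity property only; it does not search for a solution, which the abstract shows to be hard. The function name, the `(u, v, weight)` edge encoding, and the parameter names `max_edges` and `budget` (standing for the two numbers in the problem statement) are assumptions for illustration.

```python
def satisfies_connectivity_nws(sub_edges, communities, max_edges, budget):
    """Check whether a candidate subgraph is a valid Connectivity NWS solution.

    sub_edges   -- list of (u, v, weight) tuples, the edges of the candidate G'
    communities -- iterable of vertex collections
    max_edges   -- bound on the number of edges of G'
    budget      -- bound on the total edge weight of G'
    """
    if len(sub_edges) > max_edges:
        return False
    if sum(w for _, _, w in sub_edges) > budget:
        return False

    for community in communities:
        members = set(community)
        if not members:
            continue  # assumption: an empty community is trivially satisfied

        # Adjacency of the subgraph of G' induced by this community.
        adj = {v: [] for v in members}
        for u, v, _ in sub_edges:
            if u in members and v in members:
                adj[u].append(v)
                adj[v].append(u)

        # Depth-first search from an arbitrary member.
        start = next(iter(members))
        seen = {start}
        stack = [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        if seen != members:
            return False
    return True
```

Note that connectivity is required inside each induced subgraph: a path 1-2-3 connects vertices 1 and 3 in the whole graph, but the community {1, 3} is not satisfied because vertex 2 is not part of it.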

    Efficient Algorithms for Graph-Theoretic and Geometric Problems

    This thesis studies several different algorithmic problems in graph theory and in geometry. The applications of the problems studied range from circuit design optimization to fast matrix multiplication. First, we study a graph-theoretical model of the so-called "firefighter problem". The objective is to save as much of an area as possible by appropriately placing firefighters. We provide both new exact algorithms for the case of general graphs and approximation algorithms for the case of planar graphs. Next, we study drawing graphs within a given polygon in the plane. We present asymptotically tight upper and lower bounds for this problem. Further, we study the problem of Subgraph Isomorphism, which amounts to deciding whether an input graph (pattern) is isomorphic to a subgraph of another input graph (host graph). We show several new bounds on the time complexity of detecting small pattern graphs. Among other things, we provide a new framework for detection by testing polynomials for non-identity with zero. Finally, we study the problem of partitioning a 3D histogram into a minimum number of 3D boxes and its applications to efficient computation of matrix products for positive integer matrices. We provide an efficient approximation algorithm for the partitioning problem and several algorithms for integer matrix multiplication. The multiplication algorithms are explicitly or implicitly based on an interpretation of positive integer matrices as 3D histograms and their partitions.

    Fixed-parameter algorithms for some combinatorial problems in bioinformatics

    Fixed-parameter algorithmics was developed in the 1990s as an approach to solving NP-hard problems optimally within a guaranteed running time. It offers a new opportunity to solve NP-hard problems exactly even on large problem instances. In this thesis, we apply fixed-parameter algorithms to cope with three NP-hard problems in bioinformatics. The Flip Consensus Tree Problem is a combinatorial problem arising in computational phylogenetics. Using the formulation of the Flip Consensus Tree Problem as a graph-modification problem, we present a set of data reduction rules and two fixed-parameter algorithms with respect to the number of modifications. Additionally, we discuss several heuristic improvements to accelerate the running time of our algorithms in practice. We also report computational results on phylogenetic data. The Weighted Cluster Editing Problem is a graph-modification problem that arises in computational biology when clustering objects with respect to a given similarity or distance measure. We present one of our fixed-parameter algorithms with respect to the minimum modification cost and describe the idea of our fastest algorithm for this problem and its unweighted counterpart. The Bond Order Assignment Problem asks for a bond order assignment of a molecule graph that minimizes a penalty function. We prove several complexity results on this problem and give two exact fixed-parameter algorithms for it. Our algorithms are based on dynamic programming over a tree decomposition of the molecule graph and are fixed-parameter with respect to the treewidth of the molecule graph and the maximum atom valence. We implemented one of our algorithms with several heuristic improvements and evaluated it on a set of real molecule graphs. It turns out that our algorithm is very fast on this dataset and even outperforms a heuristic algorithm that is usually used in practice.