31 research outputs found

    Order independent structural alignment of circularly permuted proteins

    Circular permutation connects the N and C termini of a protein and concurrently cleaves elsewhere in the chain, providing an important mechanism for generating novel protein folds and functions. However, their prevalence in genomes is unknown because current detection methods can miss many occurrences and mistake random repeats for circular permutations. Here we develop a method for detecting circularly permuted proteins from structural comparison. Sequence-order-independent alignment of protein structures can be regarded as a special case of the maximum-weight independent set problem, which is known to be computationally hard. We develop an efficient approximation algorithm by repeatedly solving relaxations of an appropriate intermediate integer programming formulation. We show that the approximation ratio is much better than the theoretical worst-case ratio of $r = 1/4$. Circularly permuted proteins reported in the literature can be identified rapidly with our method, whereas they escape detection by publicly available servers for structural alignment. Comment: 5 pages, 3 figures, Accepted by IEEE-EMBS 2004 Conference Proceeding

    Inapproximability of maximal strip recovery

    In comparative genomics, the first step of sequence analysis is usually to decompose two or more genomes into syntenic blocks, that is, segments of homologous chromosomes. For the reliable recovery of syntenic blocks, noise and ambiguities in the genomic maps need to be removed first. Maximal Strip Recovery (MSR) is an optimization problem proposed by Zheng, Zhu, and Sankoff for reliably recovering syntenic blocks from genomic maps in the midst of noise and ambiguities. Given $d$ genomic maps as sequences of gene markers, the objective of MSR-$d$ is to find $d$ subsequences, one subsequence of each genomic map, such that the total length of syntenic blocks in these subsequences is maximized. For any constant $d \ge 2$, a polynomial-time $2d$-approximation for MSR-$d$ was previously known. In this paper, we show that for any $d \ge 2$, MSR-$d$ is APX-hard, even for the most basic version of the problem in which all gene markers are distinct and appear in positive orientation in each genomic map. Moreover, we provide the first explicit lower bounds on approximating MSR-$d$ for all $d \ge 2$. In particular, we show that MSR-$d$ is NP-hard to approximate within $\Omega(d/\log d)$. From the other direction, we show that the previous $2d$-approximation for MSR-$d$ can be optimized into a polynomial-time algorithm even if $d$ is not a constant but is part of the input. We then extend our inapproximability results to several related problems including CMSR-$d$, $\delta$-gap-MSR-$d$, and $\delta$-gap-CMSR-$d$. Comment: A preliminary version of this paper appeared in two parts in the Proceedings of the 20th International Symposium on Algorithms and Computation (ISAAC 2009) and the Proceedings of the 4th International Frontiers of Algorithmics Workshop (FAW 2010)

    Linear-vertex kernel for the problem of packing r-stars into a graph without long induced paths

    Let integers $r \ge 2$ and $d \ge 3$ be fixed. Let $\mathcal{G}_d$ be the set of graphs with no induced path on $d$ vertices. We study the problem of packing $k$ vertex-disjoint copies of $K_{1,r}$ ($k \ge 2$) into a graph $G$ from the point of view of parameterized preprocessing, i.e., kernelization. We show that every graph $G \in \mathcal{G}_d$ can be reduced, in polynomial time, to a graph $G' \in \mathcal{G}_d$ with $O(k)$ vertices such that $G$ has at least $k$ vertex-disjoint copies of $K_{1,r}$ if and only if $G'$ has. Such a result is known for arbitrary graphs $G$ when $r = 2$, and we conjecture that it holds for every $r \ge 2$.

    Core congestion is inherent in hyperbolic networks

    We investigate the impact of negative curvature on traffic congestion in large-scale networks. We prove that every Gromov hyperbolic network $G$ admits a core, thus answering in the positive a conjecture by Jonckheere, Lou, Bonahon, and Baryshnikov, Internet Mathematics, 7 (2011), which is based on the experimental observation by Narayan and Saniee, Physical Review E, 84 (2011), that real-world networks with small hyperbolicity have a core congestion. Namely, we prove that for every subset $X$ of vertices of a $\delta$-hyperbolic graph $G$ there exists a vertex $m$ of $G$ such that the disk $D(m, 4\delta)$ of radius $4\delta$ centered at $m$ intercepts at least one half of the total flow between all pairs of vertices of $X$, where the flow between two vertices $x, y \in X$ is carried by geodesic (or quasi-geodesic) $(x,y)$-paths. A set $S$ intercepts the flow between two nodes $x$ and $y$ if $S$ intersects every shortest path between $x$ and $y$. Contrary to what was conjectured by Jonckheere et al., we show that $m$ is not (and cannot be) the center of mass of $X$ but is a node close to the median of $X$ in the so-called injective hull of $X$. In the case of non-uniform traffic between nodes of $X$ (in this case, the unit flow exists only between certain pairs of nodes of $X$ defined by a commodity graph $R$), we prove a primal-dual result showing that for any $\rho > 5\delta$ the size of a $\rho$-multi-core (i.e., the number of disks of radius $\rho$) intercepting all pairs of $R$ is upper bounded by the maximum number of pairwise $(\rho - 3\delta)$-apart pairs of $R$.
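    The interception condition in this abstract (a set S intercepts the flow between x and y if S meets every shortest path between them) can be illustrated on a small unweighted graph. The sketch below is not the paper's method; it is a minimal check that assumes an undirected graph stored as an adjacency dictionary. The names `bfs_dist`, `intercepts`, and the example graph `cycle4` are hypothetical. It relies on the observation that some shortest path avoids S exactly when the distance between x and y is unchanged after deleting S.

```python
from collections import deque

def bfs_dist(adj, source, target, blocked=frozenset()):
    """Hop distance from source to target avoiding `blocked`; None if unreachable."""
    if source in blocked or target in blocked:
        return None
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for v in adj[u]:
            if v not in blocked and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None

def intercepts(adj, S, x, y):
    """True if every shortest x-y path contains a vertex of S."""
    S = frozenset(S)
    if x in S or y in S:
        return True
    full = bfs_dist(adj, x, y)
    if full is None:
        # Assumption: with no x-y path at all, the condition holds vacuously.
        return True
    avoiding = bfs_dist(adj, x, y, blocked=S)
    # A shortest path avoiding S exists iff the distance is unchanged without S.
    return avoiding is None or avoiding > full

# Hypothetical example: a 4-cycle 0-1-2-3-0 has two shortest paths from 0 to 2.
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
```

    On `cycle4`, the set `{1}` leaves the path through vertex 3 untouched, so it does not intercept the flow between 0 and 2, while `{1, 3}` does.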

    On tree-constrained matchings and generalizations

    A Graph-Theoretic Barcode Ordering Model for Linked-Reads

    Given a set of intervals on the real line, an interval graph records these intervals as nodes and their intersections as edges. Identifying (i.e., merging) pairs of nodes in an interval graph results in a multiple-interval graph. Given only the nodes and the edges of the multiple-interval graph without knowing the underlying intervals, we are interested in the following questions. Can one determine how many intervals correspond to each node? Can one compute a walk over the multiple-interval graph nodes that reflects the ordering of the original intervals? These questions are closely related to linked-read DNA sequencing, where barcodes are assigned to long molecules whose intersection graph forms an interval graph. Each barcode may correspond to multiple molecules, which complicates downstream analysis and corresponds to the identification of nodes of the corresponding interval graph. Resolving the above graph-theoretic problems would facilitate analyses of linked-read sequencing data by enabling the conceptual separation of barcodes into molecules and by providing, through the order of the molecules, a skeleton for accurately assembling the genome. Here, we propose a framework that takes as input an arbitrary intersection graph (such as an overlap graph of barcodes) and constructs a heuristic approximation of the ordering of the original intervals.
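    The two constructions described in this abstract, building an interval graph and then identifying nodes to obtain a multiple-interval graph, can be sketched in a few lines. This is only an illustration of the definitions, not the paper's ordering framework. The function names, the closed-interval convention, the choice to drop self-loops after merging, and the molecule and barcode data are all assumptions made for the example.

```python
from itertools import combinations

def interval_graph(intervals):
    """Edges between every pair of intersecting closed intervals.

    `intervals` maps a node name to a (start, end) pair.
    """
    edges = set()
    for a, b in combinations(sorted(intervals), 2):
        (s1, e1), (s2, e2) = intervals[a], intervals[b]
        if s1 <= e2 and s2 <= e1:  # closed intervals overlap
            edges.add((a, b))
    return edges

def identify_nodes(edges, label):
    """Merge nodes that share a label (e.g. molecules sharing a barcode).

    Assumption: an edge between two nodes with the same label is dropped
    rather than kept as a self-loop.
    """
    merged = set()
    for a, b in edges:
        la, lb = label[a], label[b]
        if la != lb:
            merged.add(tuple(sorted((la, lb))))
    return merged

# Hypothetical data: four molecules, where barcode "A" tags two of them.
molecules = {"m1": (0, 4), "m2": (3, 8), "m3": (10, 14), "m4": (13, 18)}
barcode = {"m1": "A", "m2": "B", "m3": "A", "m4": "C"}

molecule_edges = interval_graph(molecules)
barcode_edges = identify_nodes(molecule_edges, barcode)
```

    In this toy input, barcode "A" becomes adjacent to both "B" and "C" even though no single molecule overlaps both, which is the ambiguity the abstract describes.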

    On Tree-Constrained Matchings and Generalizations

    We consider the following Tree-Constrained Bipartite Matching problem: Given two rooted trees $T_1 = (V_1, E_1)$, $T_2 = (V_2, E_2)$ and a weight function $w: V_1 \times V_2 \mapsto \mathbb{R}_+$, find a maximum weight matching $\mathcal{M}$ between nodes of the two trees, such that none of the matched nodes is an ancestor of another matched node in either of the trees. This generalization of the classical bipartite matching problem appears, for example, in the computational analysis of live cell video data. We show that the problem is $\mathcal{APX}$-hard and thus, unless $\mathcal{P} = \mathcal{NP}$, disprove a previous claim that it is solvable in polynomial time. Furthermore, we give a $2$-approximation algorithm based on a combination of the local ratio technique and a careful use of the structure of basic feasible solutions of a natural LP-relaxation, which we also show to have an integrality gap of $2 - o(1)$. In the second part of the paper, we consider a natural generalization of the problem, where trees are replaced by partially ordered sets (posets). We show that the local ratio technique gives a $2k\rho$-approximation for the $k$-dimensional matching generalization of the problem, in which the maximum number of incomparable elements below (or above) any given element in each poset is bounded by $\rho$. We finally give an almost matching integrality gap example, and an inapproximability result showing that the dependence on $\rho$ is most likely unavoidable.
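    The feasibility constraint of the problem stated in this abstract (no matched node is an ancestor of another matched node in either tree) can be made concrete with a small checker. This sketch only verifies a candidate matching and computes its weight; it is not the 2-approximation algorithm from the paper. The parent-dictionary tree representation, the function names, and the example trees and weights are assumptions chosen for illustration.

```python
def ancestors(parent, v):
    """Proper ancestors of v in a rooted tree given as a child -> parent map."""
    result = set()
    while parent.get(v) is not None:
        v = parent[v]
        result.add(v)
    return result

def is_feasible(parent1, parent2, matching):
    """Check that `matching` (a list of (u, v) pairs) is tree-constrained."""
    left = [u for u, _ in matching]
    right = [v for _, v in matching]
    # Matching condition: every node is used at most once.
    if len(set(left)) != len(left) or len(set(right)) != len(right):
        return False
    # Tree condition: no matched node has a matched proper ancestor.
    for parent, nodes in ((parent1, left), (parent2, right)):
        chosen = set(nodes)
        for v in nodes:
            if ancestors(parent, v) & chosen:
                return False
    return True

def matching_weight(w, matching):
    """Total weight of a matching under the weight map w[(u, v)]."""
    return sum(w[(u, v)] for u, v in matching)

# Hypothetical trees: each has a root with two leaf children.
tree1 = {"r1": None, "a": "r1", "b": "r1"}
tree2 = {"r2": None, "c": "r2", "d": "r2"}
weights = {("a", "c"): 2.0, ("b", "d"): 3.0, ("r1", "c"): 10.0, ("a", "d"): 1.0}
```

    Here `[("a", "c"), ("b", "d")]` is feasible, while `[("r1", "c"), ("a", "d")]` is not, because `r1` is an ancestor of `a` in the first tree.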

    Recognizing Unit Multiple Intervals Is Hard

    Multiple interval graphs are a well-known generalization of interval graphs introduced in the 1970s to deal with situations arising naturally in scheduling and allocation. A d-interval is the union of d intervals on the real line, and a graph is a d-interval graph if it is the intersection graph of d-intervals. In particular, it is a unit d-interval graph if it admits a d-interval representation where every interval has unit length. Whereas it has been known for a long time that recognizing 2-interval graphs and other related classes such as 2-track interval graphs is NP-complete, the complexity of recognizing unit 2-interval graphs remains open. Here, we settle this question by proving that the recognition of unit 2-interval graphs is also NP-complete. Our proof technique uses a completely different approach from the other hardness results for recognizing related classes. Furthermore, we extend the result to unit d-interval graphs for any d ≄ 2, which does not follow directly in graph recognition problems: as an example, it took almost 20 years to close the gap between d = 2 and d > 2 for the recognition of d-track interval graphs. Our result has several implications, including that recognizing (x, ..., x) d-interval graphs and depth r unit 2-interval graphs is NP-complete for every x ≄ 11 and every r ≄ 4.