Order independent structural alignment of circularly permuted proteins
Circular permutation connects the N and C termini of a protein and
concurrently cleaves elsewhere in the chain, providing an important mechanism
for generating novel protein folds and functions. However, their prevalence in
genomes is unknown because current detection methods can miss many occurrences,
mistaking random repeats for circular permutations. Here we develop a method for
detecting circularly permuted proteins from structural comparison. Sequence-order-independent
alignment of protein structures can be regarded as a special case
of the maximum-weight independent set problem, which is known to be
computationally hard. We develop an efficient approximation algorithm by
repeatedly solving relaxations of an appropriate intermediate integer
programming formulation, and we show that its approximation ratio is much
better than the theoretical worst-case ratio. Circularly permuted
proteins reported in the literature can be identified rapidly with our method,
whereas they escape detection by publicly available servers for structural
alignment.
Comment: 5 pages, 3 figures, Accepted by IEEE-EMBS 2004 Conference Proceedings
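The conflict structure behind sequence-order-independent alignment can be illustrated with a simple greedy heuristic for maximum-weight independent set. This is not the relaxation-based algorithm described above; it is a minimal sketch assuming a hypothetical input of candidate fragment pairs (a residue range in protein A, a residue range in protein B, and a similarity weight), where two candidates conflict only if they reuse residues of either protein. Because no ordering constraint is imposed between chosen fragments, a circular permutation can be aligned.

```python
from typing import List, Tuple

# Hypothetical candidate format: (start_a, end_a, start_b, end_b, weight),
# with inclusive residue ranges in protein A and protein B.
Fragment = Tuple[int, int, int, int, float]


def overlaps(s1: int, e1: int, s2: int, e2: int) -> bool:
    """True if the inclusive ranges [s1, e1] and [s2, e2] share a residue."""
    return s1 <= e2 and s2 <= e1


def conflict(f: Fragment, g: Fragment) -> bool:
    """Two candidates conflict if they reuse residues of protein A or of protein B.
    No constraint on relative order is imposed (order independence)."""
    return overlaps(f[0], f[1], g[0], g[1]) or overlaps(f[2], f[3], g[2], g[3])


def greedy_mwis(frags: List[Fragment]) -> List[Fragment]:
    """Greedy heuristic for maximum-weight independent set on the conflict graph:
    scan candidates by decreasing weight and keep each one that conflicts
    with none of the candidates kept so far."""
    chosen: List[Fragment] = []
    for f in sorted(frags, key=lambda f: f[4], reverse=True):
        if not any(conflict(f, g) for g in chosen):
            chosen.append(f)
    return chosen


if __name__ == "__main__":
    # Fragment 1 precedes fragment 2 in protein A but follows it in protein B.
    candidates = [(0, 9, 50, 59, 8.0), (50, 59, 0, 9, 7.5), (5, 14, 55, 64, 6.0)]
    for frag in greedy_mwis(candidates):
        print(frag)
```

The two selected fragments appear in opposite orders in the two proteins, which is the pattern a sequence-order-dependent aligner cannot capture; the third candidate is rejected because it reuses residues 5 to 9 of protein A.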
Inapproximability of maximal strip recovery
In comparative genomics, the first step of sequence analysis is usually to
decompose two or more genomes into syntenic blocks that are segments of
homologous chromosomes. For the reliable recovery of syntenic blocks, noise and
ambiguities in the genomic maps need to be removed first. Maximal Strip
Recovery (MSR) is an optimization problem proposed by Zheng, Zhu, and Sankoff
for reliably recovering syntenic blocks from genomic maps in the midst of noise
and ambiguities. Given d genomic maps as sequences of gene markers, the
objective of MSR-d is to find d subsequences, one subsequence of each
genomic map, such that the total length of syntenic blocks in these
subsequences is maximized. For any constant d ≥ 2, a polynomial-time
2d-approximation for MSR-d was previously known. In this paper, we show that
for any d ≥ 2, MSR-d is APX-hard, even for the most basic version of the
problem in which all gene markers are distinct and appear in positive
orientation in each genomic map. Moreover, we provide the first explicit lower
bounds on approximating MSR-d for all d ≥ 2; in particular, we give an explicit
ratio, depending on d, within which MSR-d is NP-hard to approximate. From the other
direction, we show that the previous 2d-approximation for MSR-d can be
optimized into a polynomial-time algorithm even if d is not a constant but is
part of the input. We then extend our inapproximability results to several
related problems, including CMSR-d, δ-gap-MSR-d, and δ-gap-CMSR-d.
Comment: A preliminary version of this paper appeared in two parts in the
Proceedings of the 20th International Symposium on Algorithms and Computation
(ISAAC 2009) and the Proceedings of the 4th International Frontiers of
Algorithmics Workshop (FAW 2010).
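To make the MSR objective concrete for the basic version considered above (distinct markers, all in positive orientation), the following exhaustive search is a minimal sketch that assumes every map contains the same marker set. It runs in exponential time, consistent with the hardness results, and it is not the 2d-approximation mentioned in the abstract; the helper names `strips` and `msr_brute_force` are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def strips(subseqs: List[List[int]]) -> List[List[int]]:
    """Split the first subsequence into maximal strips: runs of markers that are
    also consecutive, in the same order, in every other subsequence.
    Assumes all subsequences contain the same distinct markers (positive orientation)."""
    pos = [{m: i for i, m in enumerate(s)} for s in subseqs]
    first = subseqs[0]
    result, current = [], [first[0]]
    for prev, m in zip(first, first[1:]):
        if all(p[m] == p[prev] + 1 for p in pos):
            current.append(m)
        else:
            result.append(current)
            current = [m]
    result.append(current)
    return result


def msr_brute_force(maps: List[List[int]]) -> Tuple[int, List[List[int]]]:
    """Exact MSR for tiny inputs: try marker subsets from largest to smallest and
    return the first whose induced subsequences split into strips of length >= 2."""
    markers = maps[0]
    for k in range(len(markers), 1, -1):
        for kept in combinations(markers, k):
            keep = set(kept)
            subseqs = [[m for m in g if m in keep] for g in maps]
            blocks = strips(subseqs)
            if all(len(b) >= 2 for b in blocks):
                return k, blocks
    return 0, []


if __name__ == "__main__":
    # Two maps (d = 2); marker 3 is out of place in the second map.
    print(msr_brute_force([[1, 2, 3, 4, 5], [1, 3, 2, 4, 5]]))  # (4, [[1, 2, 4, 5]])
```

Dropping the single noisy marker leaves one syntenic block of length 4. Because every retained marker must lie in a strip of length at least 2, the number of kept markers equals the total length of the recovered blocks.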
Linear-vertex kernel for the problem of packing r-stars into a graph without long induced paths
Let integers r and d be fixed. Let G_d be the set of
graphs with no induced path on d vertices. We study the problem of packing k
vertex-disjoint copies of K_{1,r} into a graph G from the parameterized
preprocessing (i.e., kernelization) point of view. We show that
every graph G in G_d can be reduced, in polynomial time, to a graph G' in G_d
with O(k) vertices such that G has at least k
vertex-disjoint copies of K_{1,r} if and only if G' has. Such a result is
known for arbitrary graphs G when r = 2, and we conjecture that it holds for
every r.
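A kernel must preserve the yes/no answer of the packing question. To compare a reduced instance against the original on very small graphs, the following exhaustive search decides whether a graph contains k vertex-disjoint r-stars (copies of K_{1,r}). It is a minimal sketch of the decision problem only, not the kernelization itself; the adjacency-dictionary input and the function names are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def graph(edges: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Build an undirected adjacency dictionary from an edge list."""
    adj: Dict[int, Set[int]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def has_star_packing(adj: Dict[int, Set[int]], r: int, k: int,
                     used: FrozenSet[int] = frozenset()) -> bool:
    """True if the unused part of the graph contains k vertex-disjoint copies of K_{1,r}.
    Exhaustive backtracking over (center, r leaves) choices: exponential time."""
    if k == 0:
        return True
    for center in adj:
        if center in used:
            continue
        free = sorted(adj[center] - used)
        for leaves in combinations(free, r):
            if has_star_packing(adj, r, k - 1, used | {center, *leaves}):
                return True
    return False


if __name__ == "__main__":
    two_claws = graph([(0, 1), (0, 2), (0, 3), (4, 5), (4, 6), (4, 7)])
    print(has_star_packing(two_claws, 3, 2), has_star_packing(two_claws, 3, 3))
```

Here the two disjoint claws give a packing of size 2 but not 3; running the same check on a graph and on its reduced version should give identical answers for every k.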
Core congestion is inherent in hyperbolic networks
We investigate the impact of negative curvature on traffic
congestion in large-scale networks. We prove that every Gromov hyperbolic
network admits a core, thus positively answering a conjecture by
Jonckheere, Lou, Bonahon, and Baryshnikov (Internet Mathematics, 7 (2011)), which
is based on the experimental observation by Narayan and Saniee (Physical Review
E, 84 (2011)) that real-world networks with small hyperbolicity exhibit core
congestion. Namely, we prove that for every subset X of vertices of a
δ-hyperbolic graph G there exists a vertex m of G such that the
disk centered at m, with radius a constant multiple of δ, intercepts at least
one half of the total flow between all pairs of vertices of X, where the flow
between two vertices x, y of X is carried by geodesic (or quasi-geodesic)
(x, y)-paths. A set S intercepts the flow between two nodes x and y if
S intersects every shortest path between x and y. Contrary to what
was conjectured by Jonckheere et al., we show that m is not (and cannot be)
the center of mass of X but is a node close to the median of X in the
so-called injective hull of X. In the case of non-uniform traffic between nodes
of X (where unit flow exists only between certain pairs of nodes
of X, defined by a commodity graph R), we prove a primal-dual result showing
that for any sufficiently large radius ρ the size of a ρ-multi-core (i.e., the number
of disks of radius ρ) intercepting all pairs of R is upper bounded by
the maximum number of pairwise far-apart pairs of R.
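The two notions used above, hyperbolicity and interception of flow, can be checked directly on small unweighted graphs. The sketch below is illustrative only and assumes a connected graph given as an adjacency dictionary; the function names are hypothetical. It computes the hyperbolicity constant δ with the four-point condition (one standard definition of Gromov hyperbolicity, equivalent to other definitions up to a constant factor; the abstract does not say which one it uses) and tests interception using the fact that, when neither endpoint is in S, S meets every shortest x-y path exactly when deleting S increases the x-y distance or disconnects the pair.

```python
from collections import deque
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def bfs(adj: Graph, src, banned: FrozenSet = frozenset()) -> Dict:
    """Hop distances from src in the graph with the vertices in banned deleted."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist and v not in banned:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def four_point_delta(adj: Graph) -> float:
    """Four-point hyperbolicity: over all quadruples, half the gap between
    the two largest of the three pairwise distance sums."""
    d = {v: bfs(adj, v) for v in adj}
    delta = 0.0
    for x, y, z, w in combinations(adj, 4):
        sums = sorted([d[x][y] + d[z][w], d[x][z] + d[y][w], d[x][w] + d[y][z]],
                      reverse=True)
        delta = max(delta, (sums[0] - sums[1]) / 2)
    return delta


def intercepts(adj: Graph, S: Set, x, y) -> bool:
    """True if the vertex set S meets every shortest x-y path."""
    if x in S or y in S:
        return True
    d_xy = bfs(adj, x)[y]
    return bfs(adj, x, banned=frozenset(S)).get(y, float("inf")) > d_xy


def disk(adj: Graph, m, radius: int) -> Set:
    """Vertices within hop distance radius of m."""
    return {v for v, dv in bfs(adj, m).items() if dv <= radius}


if __name__ == "__main__":
    c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}  # 4-cycle
    print(four_point_delta(c4), disk(c4, 0, 1), intercepts(c4, {1, 3}, 0, 2))
```

On the 4-cycle, a single vertex cannot intercept the flow between opposite corners (the other side of the cycle is an equally short route), whereas a disk of radius 1 around one corner covers three vertices; trees, by contrast, have four-point δ equal to 0.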
A Graph-Theoretic Barcode Ordering Model for Linked-Reads
Consider a set of intervals on the real line; an interval graph records these intervals as nodes and their intersections as edges. Identifying (i.e., merging) pairs of nodes in an interval graph results in a multiple-interval graph. Given only the nodes and the edges of the multiple-interval graph, without knowing the underlying intervals, we are interested in the following questions. Can one determine how many intervals correspond to each node? Can one compute a walk over the multiple-interval graph nodes that reflects the ordering of the original intervals? These questions are closely related to linked-read DNA sequencing, where barcodes are assigned to long molecules whose intersection graph forms an interval graph. Each barcode may correspond to multiple molecules, which complicates downstream analysis; this situation corresponds to the identification of nodes of the interval graph. Resolving the above graph-theoretic problems would facilitate the analysis of linked-read sequencing data by enabling the conceptual separation of barcodes into molecules and by providing, through the order of the molecules, a skeleton for accurately assembling the genome. Here, we propose a framework that takes as input an arbitrary intersection graph (such as an overlap graph of barcodes) and constructs a heuristic approximation of the ordering of the original intervals.
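The construction in the first two sentences above can be made concrete. The sketch below builds the interval graph of some hypothetical molecules (closed intervals) and then identifies molecules that share a barcode, producing the multiple-interval graph that is actually observed. It does not implement the ordering heuristic of the framework; all names and coordinates are illustrative.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Edge = FrozenSet[str]


def interval_graph(intervals: Dict[str, Tuple[float, float]]) -> Set[Edge]:
    """Edges between closed intervals [start, end] that intersect."""
    return {frozenset((a, b)) for a, b in combinations(intervals, 2)
            if intervals[a][0] <= intervals[b][1] and intervals[b][0] <= intervals[a][1]}


def merge_nodes(edges: Set[Edge], label: Dict[str, str]) -> Set[Edge]:
    """Identify nodes that share a label (barcode); drop self-loops created by merging."""
    merged: Set[Edge] = set()
    for e in edges:
        u, v = (label[x] for x in e)
        if u != v:
            merged.add(frozenset((u, v)))
    return merged


if __name__ == "__main__":
    # Hypothetical molecules laid out left to right along the genome.
    molecules = {"m1": (0, 10), "m2": (8, 20), "m3": (18, 30), "m4": (28, 40)}
    barcode = {"m1": "A", "m2": "B", "m3": "A", "m4": "C"}  # m1 and m3 share barcode A
    observed = merge_nodes(interval_graph(molecules), barcode)
    print(sorted(sorted(e) for e in observed))
```

The underlying interval graph is the path m1, m2, m3, m4, but only the merged graph with edges A-B and A-C is observed. Barcode A stands for two molecules, so the walk A, B, A, C recovers the original left-to-right order; the two questions above ask how much of this can be inferred from the merged graph alone.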
On Tree-Constrained Matchings and Generalizations
We consider the following Tree-Constrained Bipartite Matching problem: given two rooted trees T_1 and T_2 and a weight function w on pairs of nodes (one node from each tree), find a maximum-weight matching between nodes of the two trees such that no matched node is an ancestor of another matched node in either tree. This generalization of the classical bipartite matching problem appears, for example, in the computational analysis of live cell video data. We show that the problem is NP-hard and thus, unless P = NP, disprove a previous claim that it is solvable in polynomial time. Furthermore, we give a constant-factor approximation algorithm based on a combination of the local ratio technique and a careful use of the structure of basic feasible solutions of a natural LP relaxation, whose integrality gap we also establish.
In the second part of the paper, we consider a natural generalization of the problem in which the trees are replaced by partially ordered sets (posets). We show that the local ratio technique gives an approximation algorithm, with a ratio depending on k, for the k-dimensional matching generalization of the problem, in which the maximum number of incomparable elements below (or above) any given element in each poset is bounded by k. Finally, we give an almost matching integrality gap example and an inapproximability result showing that the dependence on k is most likely unavoidable.
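For very small instances, the feasibility condition of Tree-Constrained Bipartite Matching (matched nodes pairwise non-ancestral in each tree) can be checked exhaustively. The sketch below is a reference brute force, useful for testing approximation algorithms on toy inputs; it is not the local-ratio algorithm described in the abstract. The child-to-parent dictionaries and the weight dictionary keyed by node pairs are assumed input formats.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def ancestors(parent: Dict, v) -> Set:
    """All proper ancestors of v in a rooted tree given as a child -> parent map."""
    result = set()
    while v in parent:
        v = parent[v]
        result.add(v)
    return result


def antichain(parent: Dict, nodes: List) -> bool:
    """True if no node in the list is an ancestor of another."""
    return all(u not in ancestors(parent, v) and v not in ancestors(parent, u)
               for u, v in combinations(nodes, 2))


def best_tree_constrained_matching(parent1: Dict, parent2: Dict,
                                   weight: Dict[Pair, float]) -> Tuple[List[Pair], float]:
    """Exhaustive search over subsets of weighted pairs (exponential time)."""
    pairs = list(weight)
    best: List[Pair] = []
    best_value = 0.0
    for k in range(1, len(pairs) + 1):
        for chosen in combinations(pairs, k):
            left = [u for u, _ in chosen]
            right = [v for _, v in chosen]
            if len(set(left)) < k or len(set(right)) < k:
                continue  # some node is used twice: not a matching
            if antichain(parent1, left) and antichain(parent2, right):
                value = sum(weight[p] for p in chosen)
                if value > best_value:
                    best, best_value = list(chosen), value
    return best, best_value


if __name__ == "__main__":
    p1 = {"b": "a", "c": "a"}  # tree 1: root a with children b, c
    p2 = {"y": "x", "z": "x"}  # tree 2: root x with children y, z
    w = {("a", "x"): 5, ("b", "y"): 3, ("c", "z"): 3, ("b", "z"): 1}
    print(best_tree_constrained_matching(p1, p2, w))
```

Matching the two roots alone is worth 5, but it blocks every other pair because a root is an ancestor of all other nodes; the best tree-constrained matching instead uses the two leaf pairs.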
Recognizing Unit Multiple Intervals Is Hard
Multiple interval graphs are a well-known generalization of interval graphs, introduced in the 1970s to deal with situations arising naturally in scheduling and allocation. A d-interval is the union of d intervals on the real line, and a graph is a d-interval graph if it is the intersection graph of d-intervals. In particular, it is a unit d-interval graph if it admits a d-interval representation in which every interval has unit length. Whereas it has long been known that recognizing 2-interval graphs and other related classes, such as 2-track interval graphs, is NP-complete, the complexity of recognizing unit 2-interval graphs had remained open. Here, we settle this question by proving that the recognition of unit 2-interval graphs is also NP-complete. Our proof technique uses a completely different approach from the other hardness results for recognizing related classes. Furthermore, we extend the result to unit d-interval graphs for any d ≥ 2, an extension that does not follow directly in graph recognition problems: as an example, it took almost 20 years to close the gap between d = 2 and d > 2 for the recognition of d-track interval graphs. Our result has several implications, including that recognizing (x, …, x) d-interval graphs and depth r unit 2-interval graphs is NP-complete for every x ≥ 11 and every r ≥ 4.
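Although recognizing unit d-interval graphs is hard, checking a proposed representation is straightforward. The sketch below assumes closed unit intervals [l, l + 1], so two intervals intersect exactly when their left endpoints differ by at most 1; the representation format (each vertex mapped to a list of left endpoints) and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set, Tuple


def d_interval_graph(rep: Dict[Hashable, List[float]]) -> Set[FrozenSet]:
    """Intersection graph of unit d-intervals: each vertex is a union of closed
    unit intervals [l, l + 1], given by their left endpoints l."""
    edges = set()
    for u, v in combinations(rep, 2):
        if any(abs(a - b) <= 1 for a in rep[u] for b in rep[v]):
            edges.add(frozenset((u, v)))
    return edges


def is_unit_representation(vertices: Iterable[Hashable],
                           edges: Iterable[Tuple[Hashable, Hashable]],
                           rep: Dict[Hashable, List[float]]) -> bool:
    """True if rep covers exactly the given vertices and realizes exactly the given edges."""
    return set(rep) == set(vertices) and d_interval_graph(rep) == set(map(frozenset, edges))


if __name__ == "__main__":
    # Claw K_{1,3}: center c adjacent to three pairwise non-adjacent leaves.
    claw_rep = {"c": [0, 10], "l1": [0.9, 100], "l2": [-0.9, 200], "l3": [10.5, 300]}
    print(is_unit_representation(["c", "l1", "l2", "l3"],
                                 [("c", "l1"), ("c", "l2"), ("c", "l3")], claw_rep))
```

The claw K_{1,3} is not a unit interval graph, yet it has the unit 2-interval representation above: the center uses its second interval to reach the third leaf, while each leaf parks its second interval far away.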