10,332 research outputs found
An Efficient Algorithm For Chinese Postman Walk on Bi-directed de Bruijn Graphs
Sequence assembly from short reads is an important problem in biology. It is
known that solving the sequence assembly problem exactly on a bi-directed de
Bruijn graph or a string graph is intractable. However finding a Shortest
Double stranded DNA string (SDDNA) containing all the k-long words in the reads
seems to be a good heuristic to get close to the original genome. This problem
is equivalent to finding a cyclic Chinese Postman (CP) walk on the underlying
un-weighted bi-directed de Bruijn graph built from the reads. The Chinese
Postman walk Problem (CPP) is solved by reducing it to a general bi-directed
flow on this graph which runs in O(|E|2 log2(|V |)) time. In this paper we show
that the cyclic CPP on bi-directed graphs can be solved without reducing it to
bi-directed flow. We present a ?(p(|V | + |E|) log(|V |) + (dmaxp)3) time
algorithm to solve the cyclic CPP on a weighted bi-directed de Bruijn graph,
where p = max{|{v|din(v) - dout(v) > 0}|, |{v|din(v) - dout(v) < 0}|} and dmax
= max{|din(v) - dout(v)}. Our algorithm performs asymptotically better than the
bidirected flow algorithm when the number of imbalanced nodes p is much less
than the nodes in the bi-directed graph. From our experimental results on
various datasets, we have noticed that the value of p/|V | lies between 0.08%
and 0.13% with 95% probability
The Fast Heuristic Algorithms and Post-Processing Techniques to Design Large and Low-Cost Communication Networks
It is challenging to design large and low-cost communication networks. In
this paper, we formulate this challenge as the prize-collecting Steiner Tree
Problem (PCSTP). The objective is to minimize the costs of transmission routes
and the disconnected monetary or informational profits. Initially, we note that
the PCSTP is MAX SNP-hard. Then, we propose some post-processing techniques to
improve suboptimal solutions to PCSTP. Based on these techniques, we propose
two fast heuristic algorithms: the first one is a quasilinear time heuristic
algorithm that is faster and consumes less memory than other algorithms; and
the second one is an improvement of a stateof-the-art polynomial time heuristic
algorithm that can find high-quality solutions at a speed that is only inferior
to the first one. We demonstrate the competitiveness of our heuristic
algorithms by comparing them with the state-of-the-art ones on the largest
existing benchmark instances (169 800 vertices and 338 551 edges). Moreover, we
generate new instances that are even larger (1 000 000 vertices and 10 000 000
edges) to further demonstrate their advantages in large networks. The
state-ofthe-art algorithms are too slow to find high-quality solutions for
instances of this size, whereas our new heuristic algorithms can do this in
around 6 to 45s on a personal computer. Ultimately, we apply our
post-processing techniques to update the bestknown solution for a notoriously
difficult benchmark instance to show that they can improve near-optimal
solutions to PCSTP. In conclusion, we demonstrate the usefulness of our
heuristic algorithms and post-processing techniques for designing large and
low-cost communication networks
Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff
The relative ease of collaborative data science and analysis has led to a
proliferation of many thousands or millions of of the same datasets
in many scientific and commercial domains, acquired or constructed at various
stages of data analysis across many users, and often over long periods of time.
Managing, storing, and recreating these dataset versions is a non-trivial task.
The fundamental challenge here is the : the more
storage we use, the faster it is to recreate or retrieve versions, while the
less storage we use, the slower it is to recreate or retrieve versions. Despite
the fundamental nature of this problem, there has been a surprisingly little
amount of work on it. In this paper, we study this trade-off in a principled
manner: we formulate six problems under various settings, trading off these
quantities in various ways, demonstrate that most of the problems are
intractable, and propose a suite of inexpensive heuristics drawing from
techniques in delay-constrained scheduling, and spanning tree literature, to
solve these problems. We have built a prototype version management system, that
aims to serve as a foundation to our DATAHUB system for facilitating
collaborative data science. We demonstrate, via extensive experiments, that our
proposed heuristics provide efficient solutions in practical dataset versioning
scenarios
A Tutorial on Clique Problems in Communications and Signal Processing
Since its first use by Euler on the problem of the seven bridges of
K\"onigsberg, graph theory has shown excellent abilities in solving and
unveiling the properties of multiple discrete optimization problems. The study
of the structure of some integer programs reveals equivalence with graph theory
problems making a large body of the literature readily available for solving
and characterizing the complexity of these problems. This tutorial presents a
framework for utilizing a particular graph theory problem, known as the clique
problem, for solving communications and signal processing problems. In
particular, the paper aims to illustrate the structural properties of integer
programs that can be formulated as clique problems through multiple examples in
communications and signal processing. To that end, the first part of the
tutorial provides various optimal and heuristic solutions for the maximum
clique, maximum weight clique, and -clique problems. The tutorial, further,
illustrates the use of the clique formulation through numerous contemporary
examples in communications and signal processing, mainly in maximum access for
non-orthogonal multiple access networks, throughput maximization using index
and instantly decodable network coding, collision-free radio frequency
identification networks, and resource allocation in cloud-radio access
networks. Finally, the tutorial sheds light on the recent advances of such
applications, and provides technical insights on ways of dealing with mixed
discrete-continuous optimization problems
On the Complexity of Spill Everywhere under SSA Form
Compilation for embedded processors can be either aggressive (time consuming
cross-compilation) or just in time (embedded and usually dynamic). The
heuristics used in dynamic compilation are highly constrained by limited
resources, time and memory in particular. Recent results on the SSA form open
promising directions for the design of new register allocation heuristics for
embedded systems and especially for embedded compilation. In particular,
heuristics based on tree scan with two separated phases -- one for spilling,
then one for coloring/coalescing -- seem good candidates for designing
memory-friendly, fast, and competitive register allocators. Still, also because
of the side effect on power consumption, the minimization of loads and stores
overhead (spilling problem) is an important issue. This paper provides an
exhaustive study of the complexity of the ``spill everywhere'' problem in the
context of the SSA form. Unfortunately, conversely to our initial hopes, many
of the questions we raised lead to NP-completeness results. We identify some
polynomial cases but that are impractical in JIT context. Nevertheless, they
can give hints to simplify formulations for the design of aggressive
allocators.Comment: 10 page
- …