Spontaneous Expulsion of Giant Lipid Vesicles Induced by Laser Tweezers
Irradiation of a giant unilamellar lipid bilayer vesicle with a focused laser
spot leads to a tense pressurized state which persists indefinitely after laser
shutoff. If the vesicle contains another object it can then be gently and
continuously expelled from the tense outer vesicle. Remarkably, the inner
object can be almost as large as the parent vesicle; its volume is replaced
during the exit process. We offer a qualitative theoretical model to explain
these and related phenomena. The main hypothesis is that the laser trap pulls
in lipid and ejects it in the form of submicron objects, whose osmotic activity
then drives the expulsion. Comment: Plain TeX file; uses harvmac and epsf; .ps available at
http://dept.physics.upenn.edu/~nelson/expulsion.p
RLZAP: Relative Lempel-Ziv with Adaptive Pointers
Relative Lempel-Ziv (RLZ) is a popular algorithm for compressing databases of
genomes from individuals of the same species when fast random access is
desired. With Kuruppu et al.'s (SPIRE 2010) original implementation, a
reference genome is selected and then the other genomes are greedily parsed
into phrases exactly matching substrings of the reference. Deorowicz and
Grabowski (Bioinformatics, 2011) pointed out that letting each phrase end with
a mismatch character usually gives better compression because many of the
differences between individuals' genomes are single-nucleotide substitutions.
Ferrada et al. (SPIRE 2014) then pointed out that also using relative pointers
and run-length compressing them usually gives even better compression. In this
paper we generalize Ferrada et al.'s idea so that it also handles short
insertions, deletions and multi-character substitutions well. We show
experimentally that our generalization achieves better compression than
Ferrada et al.'s implementation with comparable random-access times.
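The greedy parsing with a trailing mismatch character described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `rlz_parse` and `rlz_decode` are hypothetical, the longest-match search is a naive quadratic scan (practical tools use a suffix array or FM-index of the reference), and the relative, run-length-compressed pointers of Ferrada et al. and RLZAP are not modelled.

```python
def rlz_parse(reference: str, target: str):
    """Greedily parse `target` into (position, length, mismatch) phrases.

    Each phrase copies reference[position:position + length] and then
    appends one explicit mismatch character ("" only at the very end).
    """
    phrases = []
    i = 0
    while i < len(target):
        best_pos, best_len = 0, 0
        # Naive longest-match search over every start in the reference.
        for start in range(len(reference)):
            length = 0
            while (start + length < len(reference)
                   and i + length < len(target)
                   and reference[start + length] == target[i + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = start, length
        i += best_len
        # The character after the match is stored verbatim in the phrase.
        mismatch = target[i] if i < len(target) else ""
        phrases.append((best_pos, best_len, mismatch))
        if mismatch:
            i += 1
    return phrases


def rlz_decode(reference: str, phrases) -> str:
    """Rebuild the target from the reference and the phrase list."""
    return "".join(reference[p:p + n] + c for p, n, c in phrases)


reference = "ACGTACGT"
target = "ACGAACGT"  # one substitution relative to the reference
phrases = rlz_parse(reference, target)
print(phrases)
print(rlz_decode(reference, phrases) == target)
```

With a single substitution, the mismatch character absorbs the difference and the next phrase resumes copying from the reference, which is the effect the abstract attributes to Deorowicz and Grabowski's variant.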
Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts
We study the approximate string matching and regular expression matching
problem for the case when the text to be searched is compressed with the
Ziv-Lempel adaptive dictionary compression schemes. We present a time-space
trade-off that leads to algorithms improving the previously known complexities
for both problems. In particular, we significantly improve the space bounds,
which in practical applications are likely to be a bottleneck.
Efficient LZ78 factorization of grammar compressed text
We present an efficient algorithm for computing the LZ78 factorization of a
text, where the text is represented as a straight line program (SLP), which is
a context free grammar in the Chomsky normal form that generates a single
string. Given an SLP of size $n$ representing a text $S$ of length $N$, our
algorithm computes the LZ78 factorization of $S$ in $O(n\sqrt{N} + m\log N)$ time
and $O(n\sqrt{N} + m)$ space, where $m$ is the number of resulting LZ78 factors.
We also show how to improve the algorithm so that the $n\sqrt{N}$ term in the
time and space complexities becomes either $nL$, where $L$ is the length of the
longest LZ78 factor, or $(N - \alpha)$ where $\alpha \geq 0$ is a quantity
which depends on the amount of redundancy that the SLP captures with respect to
substrings of $S$ of a certain length. Since $m = O(N/\log_\sigma N)$ where
$\sigma$ is the alphabet size, the latter is asymptotically at least as fast as
a linear time algorithm which runs on the uncompressed string when $\sigma$ is
constant, and can be more efficient when the text is compressible, i.e. when
$m$ and $n$ are small. Comment: SPIRE 201
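For orientation, the LZ78 factorization itself can be sketched on an uncompressed string as below. This is the textbook linear-scan method with a trie, not the SLP-based algorithm of the paper; the names `lz78_factorize` and `lz78_decode` and the `(parent, char)` factor encoding are assumptions made for illustration.

```python
def lz78_factorize(text: str):
    """Return LZ78 factors as (parent_factor_id, next_char) pairs.

    Factor ids start at 1; id 0 denotes the empty phrase (the trie root).
    """
    trie = {}          # (node, char) -> child node id
    factors = []
    node = 0           # current trie node, 0 is the root
    next_id = 1
    for ch in text:
        child = trie.get((node, ch))
        if child is not None:
            node = child               # keep extending the current phrase
        else:
            trie[(node, ch)] = next_id  # new phrase = known phrase + ch
            next_id += 1
            factors.append((node, ch))
            node = 0
    if node != 0:
        # The text ended inside a phrase that already exists in the trie.
        factors.append((node, ""))
    return factors


def lz78_decode(factors) -> str:
    """Invert lz78_factorize."""
    phrases = [""]
    for parent, ch in factors:
        phrases.append(phrases[parent] + ch)
    return "".join(phrases[1:])


factors = lz78_factorize("abababa")
print(factors)               # four factors: a, b, ab, aba
print(lz78_decode(factors))
```

The number of factors produced here corresponds to the quantity the abstract calls the number of resulting LZ78 factors.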
Practical Evaluation of Lempel-Ziv-78 and Lempel-Ziv-Welch Tries
We present the first thorough practical study of the Lempel-Ziv-78 and the
Lempel-Ziv-Welch computation based on trie data structures. With a careful
selection of trie representations we can beat well-tuned popular trie data
structures like Judy, m-Bonsai or Cedar.
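To make the role of the trie concrete, here is a minimal sketch of Lempel-Ziv-Welch compression in which the dictionary is a trie stored in a plain Python `dict`. The representation and the name `lzw_compress` are assumptions for illustration only; the paper's contribution lies in far more compact and faster trie representations, which are not reproduced here.

```python
def lzw_compress(text: str, alphabet: str):
    """Return the LZW code sequence of `text`.

    Codes 0..len(alphabet)-1 are the single characters; longer phrases
    receive increasing codes as they are inserted into the trie.
    """
    root_child = {ch: i for i, ch in enumerate(alphabet)}
    trie = {}                  # (node, char) -> child node id
    next_id = len(alphabet)
    codes = []
    node = None                # None means no phrase has been started yet
    for ch in text:
        if node is None:
            node = root_child[ch]
            continue
        child = trie.get((node, ch))
        if child is not None:
            node = child       # the longer phrase is already known
        else:
            codes.append(node)           # emit the longest known phrase
            trie[(node, ch)] = next_id   # register phrase + ch
            next_id += 1
            node = root_child[ch]        # restart from the single character
    if node is not None:
        codes.append(node)
    return codes


print(lzw_compress("abababa", "ab"))
```

Every input character triggers exactly one trie lookup, so the lookup and insert cost of the chosen trie representation dominates the running time, which is why the choice of representation matters in practice.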
Optimizing XML Compression
The eXtensible Markup Language (XML) provides a powerful and flexible means
of encoding and exchanging data. As it turns out, its main advantage as an
encoding format (namely, its requirement that all open and close markup tags
are present and properly balanced) also yields one of its main disadvantages:
verbosity. XML-conscious compression techniques seek to overcome this drawback.
Many of these techniques first separate XML structure from the document
content, and then compress each independently. Further compression gains can be
realized by identifying and compressing together document content that is
highly similar, thereby amortizing the storage costs of auxiliary information
required by the chosen compression algorithm. Additionally, the proper choice
of compression algorithm is an important factor not only for the achievable
compression gain, but also for access performance. Hence, choosing a
compression configuration that optimizes compression gain requires one to
determine (1) a partitioning strategy for document content, and (2) the best
available compression algorithm to apply to each set within this partition. In
this paper, we show that finding an optimal compression configuration with
respect to compression gain is an NP-hard optimization problem. This problem
remains intractable even if one considers a single compression algorithm for
all content. We also describe an approximation algorithm for selecting a
partitioning strategy for document content based on the branch-and-bound
paradigm. Comment: 16 pages, extended version of paper accepted for XSym 200
Faster subsequence recognition in compressed strings
Computation on compressed strings is one of the key approaches to processing
massive data sets. We consider local subsequence recognition problems on
strings compressed by straight-line programs (SLP), which is closely related to
Lempel-Ziv compression. For an SLP-compressed text of length $\bar m$, and an
uncompressed pattern of length $n$, Cégielski et al. gave an algorithm for
local subsequence recognition running in time $O(\bar m n^2 \log n)$. We improve
the running time to $O(\bar m n^{1.5})$. Our algorithm can also be used to
compute the longest common subsequence between a compressed text and an
uncompressed pattern in time $O(\bar m n^{1.5})$; the same problem with a
compressed pattern is known to be NP-hard.
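As a point of reference for the objects involved, the sketch below shows a tiny straight-line program and the naive baseline of decompressing it before running the classical longest-common-subsequence dynamic program. It is not the algorithm of the paper, which avoids decompression; the rule format (a `dict` mapping a symbol to either a character or a pair of symbols) and all function names are assumptions for illustration.

```python
from functools import lru_cache


def make_slp_tools(rules):
    """Return (length, expand) helpers for an SLP given as a dict.

    rules[symbol] is either a single character (terminal rule) or a pair
    (left_symbol, right_symbol) whose expansions are concatenated.
    """
    @lru_cache(maxsize=None)
    def length(symbol):
        rhs = rules[symbol]
        if isinstance(rhs, str):
            return 1
        return length(rhs[0]) + length(rhs[1])

    def expand(symbol):
        rhs = rules[symbol]
        if isinstance(rhs, str):
            return rhs
        return expand(rhs[0]) + expand(rhs[1])

    return length, expand


def lcs_length(a: str, b: str) -> int:
    """Classical O(len(a) * len(b)) LCS dynamic program, row by row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


# Five rules generate a text of length 8: E -> DD -> CCCC -> abababab.
rules = {"A": "a", "B": "b", "C": ("A", "B"), "D": ("C", "C"), "E": ("D", "D")}
length, expand = make_slp_tools(rules)
text = expand("E")
print(length("E"), text)
print(lcs_length(text, "bbb"))
```

The text length can be computed from the grammar alone, whereas the baseline LCS needs the fully expanded text; working directly on the rules is what algorithms on compressed strings aim for.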
A Faster Implementation of Online Run-Length Burrows-Wheeler Transform
Run-length encoding Burrows-Wheeler Transformed strings, resulting in
Run-Length BWT (RLBWT), is a powerful tool for processing highly repetitive
strings. We propose a new algorithm for online RLBWT working in run-compressed
space, which runs in $O(n\lg r)$ time and $O(r\lg n)$ bits of space, where $n$
is the length of input string $S$ received so far and $r$ is the number of runs
in the BWT of the reversed $S$. We improve the state-of-the-art algorithm for
online RLBWT in terms of empirical construction time. Adopting the dynamic list
for maintaining a total order, we can replace rank queries in a dynamic wavelet
tree on a run-length compressed string by the direct comparison of labels in a
dynamic list. The empirical results for various benchmarks show the efficiency
of our algorithm, especially for highly repetitive strings. Comment: In Proc. IWOCA201
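The idea of run-length encoding a Burrows-Wheeler transformed string can be illustrated with the offline sketch below. It sorts all rotations explicitly, so it needs space proportional to the square of the input length and is nothing like the online, run-compressed-space construction of the paper; the `$` sentinel and the function names are assumptions for illustration.

```python
from itertools import groupby


def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: sort all rotations of text + sentinel, take last column.

    The sentinel is assumed not to occur in `text` and to sort before
    every character of `text`.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def run_length_encode(s: str):
    """Collapse maximal runs of equal characters into (char, count) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


transformed = bwt("banana")
runs = run_length_encode(transformed)
print(transformed)
print(runs, len(runs))
```

The length of the run list is the number of runs in the BWT; on highly repetitive inputs it tends to be much smaller than the input length, which is what makes run-compressed space attractive.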