Serially-regulated biological networks fully realize a constrained set of functions
We show that biological networks with serial regulation (each node regulated
by at most one other node) are constrained to {\it direct functionality}, in
which the sign of the effect of an environmental input on a target species
depends only on the direct path from the input to the target, even when there
is a feedback loop allowing for multiple interaction pathways. Using a
stochastic model for a set of small transcriptional regulatory networks that
have been studied experimentally, we further find that all networks can achieve
all functions permitted by this constraint under reasonable settings of
biochemical parameters. This underscores the functional versatility of the
networks.
Comment: 9 pages, 3 figures
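The direct-functionality constraint has a simple arithmetic reading: when each node has at most one regulator, the direct path from input to target is unique, and the overall sign of the effect is the product of the interaction signs along that path. A minimal sketch (the node names and signs below are illustrative, not taken from the paper):

```python
# Illustrative sketch of "direct functionality" in a serially regulated
# network: the net sign of an input's effect on a target is the product
# of the +1 (activation) / -1 (repression) signs along the unique direct
# path. Path contents here are hypothetical examples.

def path_sign(signs):
    """Product of +1/-1 interaction signs along the direct path."""
    result = 1
    for s in signs:
        result *= s
    return result

# Input -> A (activation, +1) -> target (repression, -1):
print(path_sign([+1, -1]))   # -1: the input represses the target overall
```

Per the abstract, this sign is fixed by the direct path alone, even when a feedback loop adds further interaction pathways.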
Faster subsequence recognition in compressed strings
Computation on compressed strings is one of the key approaches to processing
massive data sets. We consider local subsequence recognition problems on
strings compressed by straight-line programs (SLPs), a scheme closely related to
Lempel--Ziv compression. For an SLP-compressed text of length , and an
uncompressed pattern of length , C{\'e}gielski et al. gave an algorithm for
local subsequence recognition running in time . We improve
the running time to . Our algorithm can also be used to
compute the longest common subsequence between a compressed text and an
uncompressed pattern in time ; the same problem with a
compressed pattern is known to be NP-hard.
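For context, the uncompressed version of subsequence recognition is a simple greedy two-pointer scan; the paper's contribution is doing this kind of matching directly on the SLP-compressed text without decompressing. A baseline sketch of the plain-string case (not the compressed algorithm):

```python
# Baseline sketch: subsequence recognition on an UNCOMPRESSED string.
# The cited algorithms work on SLP-compressed text instead; this only
# illustrates the underlying relation being recognized.

def is_subsequence(pattern, text):
    """Greedy check: does pattern occur in text as a (not necessarily
    contiguous) subsequence? Consumes text left to right."""
    it = iter(text)
    # `c in it` advances the iterator until c is found (or it is exhausted),
    # so matches are forced to occur in increasing text positions.
    return all(c in it for c in pattern)

print(is_subsequence("ace", "abcde"))  # True
print(is_subsequence("aec", "abcde"))  # False
```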
RLZAP: Relative Lempel-Ziv with Adaptive Pointers
Relative Lempel-Ziv (RLZ) is a popular algorithm for compressing databases of
genomes from individuals of the same species when fast random access is
desired. With Kuruppu et al.'s (SPIRE 2010) original implementation, a
reference genome is selected and then the other genomes are greedily parsed
into phrases exactly matching substrings of the reference. Deorowicz and
Grabowski (Bioinformatics, 2011) pointed out that letting each phrase end with
a mismatch character usually gives better compression because many of the
differences between individuals' genomes are single-nucleotide substitutions.
Ferrada et al. (SPIRE 2014) then pointed out that also using relative pointers
and run-length compressing them usually gives even better compression. In this
paper we generalize Ferrada et al.'s idea to also handle short insertions,
deletions, and multi-character substitutions well. We show experimentally that our
generalization achieves better compression than Ferrada et al.'s implementation
with comparable random-access times.
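The greedy parsing step that all of these RLZ variants share can be sketched in a few lines: each phrase copies the longest available substring of the reference and then, following Deorowicz and Grabowski's observation, ends with one explicit mismatch character. The quadratic matching below is for clarity only (real implementations index the reference with a suffix structure), and the toy sequences are made up:

```python
# Hedged sketch of greedy relative Lempel-Ziv parsing with a trailing
# mismatch character per phrase (the Deorowicz-Grabowski variant the
# abstract builds on). Naive O(n*m) longest-match search for clarity.

def rlz_parse(reference, target):
    """Parse `target` into phrases (ref_pos, length, mismatch_char):
    the longest match found anywhere in `reference`, then the next
    character of `target` stored literally."""
    phrases, i, n = [], 0, len(target)
    while i < n:
        best_pos, best_len = 0, 0
        for j in range(len(reference)):
            l = 0
            # Reserve at least one target character for the mismatch slot.
            while (j + l < len(reference) and i + l < n - 1
                   and reference[j + l] == target[i + l]):
                l += 1
            if l > best_len:
                best_pos, best_len = j, l
        phrases.append((best_pos, best_len, target[i + best_len]))
        i += best_len + 1
    return phrases

# Toy genomes differing by one substitution:
print(rlz_parse("ACGTACGT", "ACGAACGT"))  # [(0, 3, 'A'), (0, 3, 'T')]
```

Decoding simply concatenates each copied reference substring with its mismatch character; RLZAP's adaptive, run-length-compressed pointers then shrink the encoding of the `(ref_pos, length)` components.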
Who Chooses Open-Source Software?
Economists and legal scholars have debated the reasons people adopt open-source software, and accordingly whether and to what extent the open-source model can scale, replacing proprietary rights as a primary means of production. In this Article, we use the release by a biotechnology company of similar software under both proprietary and open-source licenses to investigate who uses open-source software and why. We find that academic users are somewhat more likely to adopt open-source software than private firms. We find only modest differences in the willingness of open-source users to modify or improve existing programs. And we find that users of open-source software often make business decisions that seem indifferent to the norms of open-source distribution. Our findings cast some doubt on the penetration of the open-source ethos beyond traditional software markets.
Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts
We study the approximate string matching and regular expression matching
problem for the case when the text to be searched is compressed with the
Ziv-Lempel adaptive dictionary compression schemes. We present a time-space
trade-off that leads to algorithms improving the previously known complexities
for both problems. In particular, we significantly improve the space bounds,
which in practical applications are likely to be a bottleneck.
Information in Infinite Ensembles of Infinitely-Wide Neural Networks
In this preliminary work, we study the generalization properties of infinite
ensembles of infinitely-wide neural networks. Amazingly, this model family
admits tractable calculations for many information-theoretic quantities. We
report analytical and empirical investigations in the search for signals that
correlate with generalization.
Comment: 2nd Symposium on Advances in Approximate Bayesian Inference, 201
Optimizing XML Compression
The eXtensible Markup Language (XML) provides a powerful and flexible means
of encoding and exchanging data. As it turns out, its main advantage as an
encoding format (namely, its requirement that all open and close markup tags
are present and properly balanced) also yields one of its main disadvantages:
verbosity. XML-conscious compression techniques seek to overcome this drawback.
Many of these techniques first separate XML structure from the document
content, and then compress each independently. Further compression gains can be
realized by identifying and compressing together document content that is
highly similar, thereby amortizing the storage costs of auxiliary information
required by the chosen compression algorithm. Additionally, the proper choice
of compression algorithm is an important factor not only for the achievable
compression gain, but also for access performance. Hence, choosing a
compression configuration that optimizes compression gain requires one to
determine (1) a partitioning strategy for document content, and (2) the best
available compression algorithm to apply to each set within this partition. In
this paper, we show that finding an optimal compression configuration with
respect to compression gain is an NP-hard optimization problem. This problem
remains intractable even if one considers a single compression algorithm for
all content. We also describe an approximation algorithm for selecting a
partitioning strategy for document content based on the branch-and-bound
paradigm.
Comment: 16 pages, extended version of paper accepted for XSym 200
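The separation step described above, splitting an XML document into a structure stream and a content stream and compressing each independently, can be sketched directly with the standard library. The tiny document and the single-algorithm choice (zlib) are illustrative; the paper's optimization problem is precisely about choosing the content partition and the per-partition algorithm:

```python
# Hedged sketch of structure/content separation for XML-conscious
# compression. The document and container layout are illustrative;
# the paper studies how to partition content and pick algorithms.
import xml.etree.ElementTree as ET
import zlib

doc = "<lib><book><title>Moby-Dick</title><year>1851</year></book></lib>"
root = ET.fromstring(doc)

structure, content = [], []
for elem in root.iter():          # document order: lib, book, title, year
    structure.append(elem.tag)    # tag skeleton goes to one stream
    if elem.text and elem.text.strip():
        content.append(elem.text.strip())  # text content goes to another

packed_structure = zlib.compress("|".join(structure).encode())
packed_content = zlib.compress("|".join(content).encode())
print(len(packed_structure), len(packed_content))
```

Grouping similar content (e.g. all `year` values together) before compressing is what amortizes the per-stream overhead; the NP-hardness result says finding the gain-optimal grouping is intractable in general.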
Efficient LZ78 factorization of grammar compressed text
We present an efficient algorithm for computing the LZ78 factorization of a
text, where the text is represented as a straight-line program (SLP), a
context-free grammar in Chomsky normal form that generates a single
string. Given an SLP of size representing a text of length , our
algorithm computes the LZ78 factorization of in time
and space, where is the number of resulting LZ78 factors.
We also show how to improve the algorithm so that the term in the
time and space complexities becomes either , where is the length of the
longest LZ78 factor, or where is a quantity
which depends on the amount of redundancy that the SLP captures with respect to
substrings of of a certain length. Since where
is the alphabet size, the latter is asymptotically at least as fast as
a linear time algorithm which runs on the uncompressed string when is
constant, and can be more efficient when the text is compressible, i.e. when
and are small.
Comment: SPIRE 201
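As a point of reference for what is being computed, LZ78 factorization on a plain (uncompressed) string greedily extends each factor by one character beyond the longest previously seen factor. This baseline sketch is not the paper's SLP-based algorithm, which avoids decompressing the text:

```python
# Baseline sketch: LZ78 factorization of an UNCOMPRESSED string.
# Each factor = (index of longest previous factor that is a prefix, new char);
# index 0 denotes the empty factor. The cited algorithm computes the same
# factorization directly from an SLP without expanding the text.

def lz78_factorize(text):
    trie = {}      # (parent_factor_index, char) -> factor index
    factors = []
    i, n = 0, len(text)
    while i < n:
        parent = 0
        # Walk down the trie as long as the text extends a known factor.
        while i < n and (parent, text[i]) in trie:
            parent = trie[(parent, text[i])]
            i += 1
        if i < n:
            trie[(parent, text[i])] = len(factors) + 1
            factors.append((parent, text[i]))
            i += 1
        else:
            factors.append((parent, ""))  # input ended mid-factor
    return factors

# Factors: a | b | ab | abc
print(lz78_factorize("abababc"))  # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'c')]
```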
Protein-DNA computation by stochastic assembly cascade
The assembly of RecA on single-stranded DNA is measured and interpreted as a
stochastic finite-state machine that is able to discriminate fine differences
between sequences, a basic computational operation. RecA filaments efficiently
scan DNA sequence through a cascade of random nucleation and disassembly events
that is mechanistically similar to the dynamic instability of microtubules.
This iterative cascade is a multistage kinetic proofreading process that
amplifies minute differences, even a single base change. Our measurements
suggest that this stochastic Turing-like machine can compute certain integral
transforms.
Comment: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC129313/
http://www.pnas.org/content/99/18/11589.abstrac
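The amplification the abstract attributes to the multistage cascade is the standard kinetic-proofreading effect: if each stage lets a "wrong" substrate through with probability f relative to a "right" one, n effectively independent stages sharpen the discrimination to f**n. A purely numerical illustration (the values of f and n are made up, not measurements from the paper):

```python
# Hedged numeric illustration of multistage kinetic proofreading:
# n independent stages with per-stage relative error f give overall
# discrimination f**n. Numbers are illustrative only.

f = 0.1   # per-stage error ratio (wrong vs right substrate), assumed
for n in (1, 2, 3):
    print(n, f ** n)   # discrimination sharpens geometrically with n
```

This geometric sharpening is how repeated nucleation/disassembly rounds can resolve even a single-base difference.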