Lightweight LCP Construction for Very Large Collections of Strings
The longest common prefix array is a very advantageous data structure that,
combined with the suffix array and the Burrows-Wheeler transform, makes it
possible to efficiently compute some combinatorial properties of a string that
are useful in several
applications, especially in biological contexts. Nowadays, the input data for
many problems are big collections of strings, for instance the data coming from
"next-generation" DNA sequencing (NGS) technologies. In this paper we present
the first lightweight algorithm (called extLCP) for the simultaneous
computation of the longest common prefix array and the Burrows-Wheeler
transform of a very large collection of strings having any length. The
computation is realized by performing disk data accesses only via sequential
scans, and the total disk space usage never needs more than twice the output
size, excluding the disk space required for the input. Moreover, extLCP can
also compute the suffix array of the strings of the collection without needing
any further data structure. Finally, we test our algorithm on real
data and compare our results with another tool capable of working in external
memory on large collections of strings.
Comment: This manuscript version is made available under the CC-BY-NC-ND 4.0
license (http://creativecommons.org/licenses/by-nc-nd/4.0/). The final version
of this manuscript is in press in Journal of Discrete Algorithm
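To make the three structures named in this abstract concrete, here is a minimal in-memory sketch for a single string. It is a naive textbook construction and not extLCP, which works on string collections in external memory; the function names and the `$` end marker are assumptions chosen for illustration.

```python
def suffix_array(s):
    # Naive construction: sort suffix start positions by the suffix text.
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    # lcp[k] is the length of the longest common prefix of the suffixes
    # starting at sa[k-1] and sa[k]; lcp[0] is defined as 0.
    def common(a, b):
        n = 0
        while a + n < len(s) and b + n < len(s) and s[a + n] == s[b + n]:
            n += 1
        return n

    return [0] + [common(sa[k - 1], sa[k]) for k in range(1, len(sa))]


def bwt(s, sa):
    # The BWT lists the character preceding each suffix in sorted order;
    # s[i - 1] wraps around to the last character when i == 0.
    return "".join(s[i - 1] for i in sa)


text = "banana$"  # "$" is an assumed end marker, smaller than any letter
sa = suffix_array(text)
lcp = lcp_array(text, sa)
transformed = bwt(text, sa)
```

The sort compares whole suffixes, so this sketch is only suitable for short inputs; it is meant to show what the arrays contain, not how to build them at scale.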
Memory Augmented Control Networks
Planning problems in partially observable environments cannot be solved
directly with convolutional networks and require some form of memory. However, even
memory networks with sophisticated addressing schemes are unable to learn
intelligent reasoning satisfactorily due to the complexity of simultaneously
learning to access memory and plan. To mitigate these challenges, we introduce
the Memory Augmented Control Network (MACN). The proposed network architecture
consists of three main parts. The first part uses convolutions to extract
features and the second part uses a neural network-based planning module to
pre-plan in the environment. The third part uses a network controller that
learns to store those specific instances of past information that are necessary
for planning. The performance of the network is evaluated in discrete grid
world environments for path planning in the presence of simple and complex
obstacles. We show that our network learns to plan and can generalize to new
environments
A structural analysis of the A5/1 state transition graph
We describe efficient algorithms to analyze the cycle structure of the graph
induced by the state transition function of the A5/1 stream cipher used in GSM
mobile phones and report on the results of the implementation. The analysis is
performed in five steps utilizing HPC clusters, GPGPU and external memory
computation. A great reduction of this huge state transition graph of 2^64
nodes is achieved by focusing on special nodes in the first step and removing
leaf nodes that can be detected with limited effort in the second step. This
step does not break the overall structure of the graph and keeps at least one
node on every cycle. In the third step the nodes of the reduced graph are
connected by weighted edges. Since the number of nodes is still huge, an
efficient bitslice approach is presented that is implemented with NVIDIA's CUDA
framework and executed on several GPUs concurrently. An external memory
algorithm based on the STXXL library and its parallel pipelining feature
further reduces the graph in the fourth step. The result is a graph containing
only cycles that can be further analyzed in internal memory to count the number
and size of the cycles. This full analysis, which previously would have taken
months, can now be completed within a few days, making it possible to present
structural results for the full graph for the first time. The structure of the
A5/1 graph deviates notably from the theoretical results for random mappings.
Comment: In Proceedings GRAPHITE 2012, arXiv:1210.611
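The idea of removing leaf nodes until only cycles remain can be illustrated on a small functional graph held in memory. This is a toy sketch under the assumption that the transition function is given as a list `f` where `f[v]` is the successor of state `v`; it does not reproduce the paper's five-step pipeline, its GPU bitslice code, or its external-memory stage, and the function name is hypothetical.

```python
from collections import deque


def cycle_lengths(f):
    # f[v] is the single successor of node v, so every node has out-degree 1.
    n = len(f)
    indeg = [0] * n
    for v in f:
        indeg[v] += 1

    # Repeatedly remove nodes with no predecessor (leaves). Nodes on a cycle
    # always keep at least one predecessor, so they are never removed.
    removed = [False] * n
    queue = deque(v for v in range(n) if indeg[v] == 0)
    while queue:
        v = queue.popleft()
        removed[v] = True
        w = f[v]
        indeg[w] -= 1
        if indeg[w] == 0:
            queue.append(w)

    # What is left consists only of cycles; walk each one once to measure it.
    seen = [False] * n
    lengths = []
    for v in range(n):
        if removed[v] or seen[v]:
            continue
        length = 0
        w = v
        while not seen[w]:
            seen[w] = True
            length += 1
            w = f[w]
        lengths.append(length)
    return sorted(lengths)


# Toy mapping: 0 -> 1 -> 2 -> 0 is a 3-cycle, 5 -> 5 is a fixed point,
# and nodes 3, 4, 6 lie on tails leading into those cycles.
toy = [1, 2, 0, 0, 3, 5, 5]
result = cycle_lengths(toy)
```

For the toy mapping, the pruning removes nodes 4, 6 and 3, leaving one cycle of length 3 and one of length 1.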
GraphMP: An Efficient Semi-External-Memory Big Graph Processing System on a Single Machine
Recent studies showed that single-machine graph processing systems can be
highly competitive with cluster-based approaches on large-scale problems. While
several out-of-core graph processing systems and computation models have been
proposed, the high disk I/O overhead could significantly reduce performance in
many practical cases. In this paper, we propose GraphMP to tackle big graph
analytics on a single machine. GraphMP achieves low disk I/O overhead with
three techniques. First, we design a vertex-centric sliding window (VSW)
computation model to avoid reading and writing vertices on disk. Second, we
propose a selective scheduling method to skip loading and processing
unnecessary edge shards on disk. Third, we use a compressed edge cache
mechanism to fully utilize the available memory of a machine to reduce the
amount of disk accesses for edges. Extensive evaluations have shown that
GraphMP could outperform state-of-the-art systems such as GraphChi, X-Stream
and GridGraph by 31.6x, 54.5x and 23.1x respectively, when running popular
graph applications on a billion-vertex graph
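The first two techniques, keeping vertices in memory and skipping edge shards that cannot contribute, can be sketched with a small breadth-first search. This is a hedged illustration and not GraphMP's implementation: the shard layout (edges grouped by destination-vertex interval), the per-shard set of source vertices used for the skip test, and all function names are assumptions, and the shards here are plain in-memory lists standing in for files on disk.

```python
def make_shards(edges, num_vertices, num_shards):
    # Assumption: each shard holds the edges whose destination falls into one
    # contiguous vertex interval, plus the set of sources seen in that shard.
    size = -(-num_vertices // num_shards)  # ceiling division
    shards = [{"edges": [], "sources": set()} for _ in range(num_shards)]
    for src, dst in edges:
        shard = shards[dst // size]
        shard["edges"].append((src, dst))
        shard["sources"].add(src)
    return shards


def bfs_levels(shards, num_vertices, root):
    # Vertex values stay in memory for the whole run; only shards are "loaded".
    inf = float("inf")
    dist = [inf] * num_vertices
    dist[root] = 0
    active = {root}
    shards_loaded = 0
    while active:
        next_active = set()
        for shard in shards:
            # Selective scheduling: skip a shard when none of its source
            # vertices changed in the previous iteration.
            if shard["sources"].isdisjoint(active):
                continue
            shards_loaded += 1
            for src, dst in shard["edges"]:
                if src in active and dist[src] + 1 < dist[dst]:
                    dist[dst] = dist[src] + 1
                    next_active.add(dst)
        active = next_active
    return dist, shards_loaded


edges = [(0, 1), (1, 2), (2, 3)]
shards = make_shards(edges, num_vertices=4, num_shards=2)
levels, loaded = bfs_levels(shards, num_vertices=4, root=0)
```

On this four-vertex chain the loop runs four iterations over two shards, yet only three shard loads occur, because the skip test rules out the other five.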