
    Lightweight LCP Construction for Very Large Collections of Strings

    The longest common prefix (LCP) array is a very useful data structure that, combined with the suffix array and the Burrows-Wheeler transform (BWT), allows several combinatorial properties of a string to be computed efficiently, which is valuable in many applications, especially in biological contexts. Nowadays, the input data for many problems are big collections of strings, for instance the data coming from "next-generation" DNA sequencing (NGS) technologies. In this paper we present the first lightweight algorithm, called extLCP, for the simultaneous computation of the longest common prefix array and the Burrows-Wheeler transform of a very large collection of strings of any length. The computation accesses the disk only via sequential scans, and the total disk space usage never exceeds twice the output size, excluding the disk space required for the input. Moreover, extLCP can also compute the suffix array of the strings in the collection without requiring any further data structure. Finally, we test our algorithm on real data and compare our results with another tool capable of working in external memory on large collections of strings. Comment: This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/ The final version of this manuscript is in press in the Journal of Discrete Algorithms.
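
    The sketch below is not extLCP itself, which builds its output in external memory via sequential scans; it is a naive in-memory Python illustration of the three structures the abstract mentions (a generalized suffix array, the LCP array and a BWT) for a small collection. The distinct low-valued terminator characters and the plain-concatenation BWT are assumptions made for illustration and differ in minor details from the multi-string BWT used by extLCP.

    # Naive in-memory illustration of the structures extLCP computes for a
    # string collection: generalized suffix array, LCP array and BWT.
    # extLCP builds these in external memory with sequential scans; this
    # O(n^2 log n) sketch only shows what the outputs look like.
    def collection_sa_lcp_bwt(strings):
        # Distinct terminators (an assumption here) keep suffixes of
        # different strings from comparing equal; chr(1+i) sorts before 'A'.
        text = "".join(s + chr(1 + i) for i, s in enumerate(strings))
        sa = sorted(range(len(text)), key=lambda i: text[i:])

        def lcp(a, b):
            # Length of the longest common prefix of the two suffixes.
            n = 0
            while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
                n += 1
            return n

        lcp_array = [0] + [lcp(sa[k - 1], sa[k]) for k in range(1, len(sa))]
        # BWT of the concatenation: character preceding each sorted suffix.
        bwt = "".join(text[i - 1] if i > 0 else text[-1] for i in sa)
        return sa, lcp_array, bwt

    sa, lcp_array, bwt = collection_sa_lcp_bwt(["ACGT", "ACGA"])
    print(sa)
    print(lcp_array)
    print(bwt)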

    Memory Augmented Control Networks

    Planning problems in partially observable environments cannot be solved directly with convolutional networks and require some form of memory. However, even memory networks with sophisticated addressing schemes are unable to learn intelligent reasoning satisfactorily, due to the complexity of simultaneously learning to access memory and to plan. To mitigate these challenges, we introduce the Memory Augmented Control Network (MACN). The proposed network architecture consists of three main parts. The first part uses convolutions to extract features, and the second part uses a neural network-based planning module to pre-plan in the environment. The third part uses a network controller that learns to store those specific instances of past information that are necessary for planning. The performance of the network is evaluated in discrete grid-world environments for path planning in the presence of simple and complex obstacles. We show that our network learns to plan and can generalize to new environments.
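
    A minimal PyTorch sketch of the three-part layout described above follows. The layer sizes are assumptions, a single convolution stands in for the planning module, and a plain LSTM cell stands in for the paper's memory-augmented controller; this is a structural illustration, not the MACN implementation.

    # Sketch of the three MACN parts: (1) convolutional feature extraction,
    # (2) a planning module, (3) a recurrent controller with memory.
    import torch
    import torch.nn as nn

    class MACNSketch(nn.Module):
        def __init__(self, in_channels=2, hidden=64, n_actions=4):
            super().__init__()
            # Part 1: convolutional feature extraction from the observed map.
            self.features = nn.Sequential(
                nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            )
            # Part 2: a planning module over the feature map (a single
            # convolution stands in for the iterative planner).
            self.planner = nn.Conv2d(32, 16, kernel_size=3, padding=1)
            # Part 3: a recurrent controller that carries information across
            # time steps and outputs an action distribution (LSTM stands in
            # for the paper's memory controller).
            self.controller = nn.LSTMCell(16, hidden)
            self.policy = nn.Linear(hidden, n_actions)

        def forward(self, obs, state=None):
            # obs: (batch, channels, H, W) partial observation of the world.
            x = self.features(obs)
            plan = self.planner(x)            # (batch, 16, H, W)
            plan = plan.mean(dim=(2, 3))      # pool to a per-step plan summary
            h, c = self.controller(plan, state)
            return self.policy(h), (h, c)

    model = MACNSketch()
    logits, state = model(torch.zeros(1, 2, 16, 16))
    print(logits.shape)  # torch.Size([1, 4])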

    A structural analysis of the A5/1 state transition graph

    We describe efficient algorithms to analyze the cycle structure of the graph induced by the state transition function of the A5/1 stream cipher used in GSM mobile phones, and report on the results of their implementation. The analysis is performed in five steps utilizing HPC clusters, GPGPU and external memory computation. A great reduction of this huge state transition graph of 2^64 nodes is achieved by focusing on special nodes in the first step and removing leaf nodes that can be detected with limited effort in the second step. This step does not break the overall structure of the graph and keeps at least one node on every cycle. In the third step the nodes of the reduced graph are connected by weighted edges. Since the number of nodes is still huge, an efficient bitslice approach is presented that is implemented with NVIDIA's CUDA framework and executed on several GPUs concurrently. An external memory algorithm based on the STXXL library and its parallel pipelining feature further reduces the graph in the fourth step. The result is a graph containing only cycles, which can be further analyzed in internal memory to count the number and size of the cycles. This full analysis, which previously would have taken months, can now be completed within a few days and allows us to present structural results for the full graph for the first time. The structure of the A5/1 graph deviates notably from the theoretical results for random mappings. Comment: In Proceedings GRAPHITE 2012, arXiv:1210.611
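
    The leaf-removal step lends itself to a small, purely illustrative sketch: in the graph of a state transition function (one outgoing edge per node), repeatedly deleting nodes with no incoming edge leaves exactly the nodes that lie on cycles. The Python below applies this idea to a random mapping of 1,000 nodes; the actual analysis does the same for 2^64 A5/1 states using GPUs and external memory, and the function names here are illustrative.

    # Toy, in-memory version of the leaf-pruning idea: peel off nodes with
    # in-degree zero until only the nodes lying on cycles remain.
    import random
    from collections import deque

    def cycle_nodes(f):
        """f: list where f[v] is the successor of node v (a functional graph)."""
        n = len(f)
        indegree = [0] * n
        for v in range(n):
            indegree[f[v]] += 1
        queue = deque(v for v in range(n) if indegree[v] == 0)  # initial leaves
        alive = [True] * n
        while queue:
            v = queue.popleft()
            alive[v] = False
            w = f[v]
            indegree[w] -= 1
            if indegree[w] == 0:   # w became a leaf after removing v
                queue.append(w)
        return [v for v in range(n) if alive[v]]

    random.seed(1)
    f = [random.randrange(1000) for _ in range(1000)]  # random mapping
    on_cycles = cycle_nodes(f)
    print(len(on_cycles), "of", len(f), "nodes lie on cycles")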

    GraphMP: An Efficient Semi-External-Memory Big Graph Processing System on a Single Machine

    Recent studies have shown that single-machine graph processing systems can be highly competitive with cluster-based approaches on large-scale problems. While several out-of-core graph processing systems and computation models have been proposed, high disk I/O overhead can significantly reduce performance in many practical cases. In this paper, we propose GraphMP to tackle big graph analytics on a single machine. GraphMP achieves low disk I/O overhead with three techniques. First, we design a vertex-centric sliding window (VSW) computation model to avoid reading and writing vertices on disk. Second, we propose a selective scheduling method to skip loading and processing unnecessary edge shards on disk. Third, we use a compressed edge cache mechanism to fully utilize the available memory of a machine and reduce the number of disk accesses for edges. Extensive evaluations show that GraphMP outperforms state-of-the-art systems such as GraphChi, X-Stream and GridGraph by 31.6x, 54.5x and 23.1x respectively, when running popular graph applications on a billion-vertex graph.
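
    A schematic Python sketch of the selective-scheduling idea follows: edges are grouped into shards, and a shard is skipped entirely if none of its source vertices is active in the current iteration. The PageRank-style update, the shard layout and the function names are illustrative assumptions, not GraphMP's actual interface; the real system keeps vertices in memory, streams edge shards from disk and caches them in compressed form.

    # One sparse update over edge shards, skipping shards whose source
    # vertices did not change in the previous iteration.
    def pagerank_like_step(shards, active, rank, out_degree, damping=0.85):
        delta = {}
        for shard in shards:
            # Selective scheduling: skip a shard with no active sources.
            if not any(src in active for src, _ in shard):
                continue
            for src, dst in shard:
                if src in active:
                    delta[dst] = delta.get(dst, 0.0) + damping * rank[src] / out_degree[src]
        return delta

    edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
    shards = [edges[:2], edges[2:]]            # two toy edge shards
    out_degree = {0: 1, 1: 1, 2: 2, 3: 0}
    rank = {v: 0.25 for v in range(4)}
    print(pagerank_like_step(shards, active={2}, rank=rank, out_degree=out_degree))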