29,627 research outputs found
Fine-sorting One-dimensional Particle-In-Cell Algorithm with Monte-Carlo Collisions on a Graphics Processing Unit
Particle-in-cell (PIC) simulations with Monte-Carlo collisions are used in
plasma science to explore a variety of kinetic effects. One major problem is
the long run-time of such simulations. Even on modern computer systems, PIC
codes take a considerable amount of time for convergence. Most of the
computations can be massively parallelized, since particles behave
independently of each other within one time step. Current graphics processing
units (GPUs) offer an attractive means for execution of the parallelized code.
In this contribution we show a one-dimensional PIC code running on Nvidia GPUs
using the CUDA environment. A distinctive feature of the code is that size of
the cells that the code uses to sort the particles with respect to their
coordinates is comparable to size of the grid cells used for discretization of
the electric field. Hence, we call the corresponding algorithm "fine-sorting".
Implementation details and optimization of the code are discussed and the
speed-up compared to classical CPU approaches is computed
Handling Massive N-Gram Datasets Efficiently
This paper deals with the two fundamental problems concerning the handling of
large n-gram language models: indexing, that is compressing the n-gram strings
and associated satellite data without compromising their retrieval speed; and
estimation, that is computing the probability distribution of the strings from
a large textual source. Regarding the problem of indexing, we describe
compressed, exact and lossless data structures that achieve, at the same time,
high space reductions and no time degradation with respect to state-of-the-art
solutions and related software packages. In particular, we present a compressed
trie data structure in which each word following a context of fixed length k,
i.e., its preceding k words, is encoded as an integer whose value is
proportional to the number of words that follow such context. Since the number
of words following a given context is typically very small in natural
languages, we lower the space of representation to compression levels that were
never achieved before. Despite the significant savings in space, our technique
introduces a negligible penalty at query time. Regarding the problem of
estimation, we present a novel algorithm for estimating modified Kneser-Ney
language models, that have emerged as the de-facto choice for language modeling
in both academia and industry, thanks to their relatively low perplexity
performance. Estimating such models from large textual sources poses the
challenge of devising algorithms that make a parsimonious use of the disk. The
state-of-the-art algorithm uses three sorting steps in external memory: we show
an improved construction that requires only one sorting step thanks to
exploiting the properties of the extracted n-gram strings. With an extensive
experimental analysis performed on billions of n-grams, we show an average
improvement of 4.5X on the total running time of the state-of-the-art approach.Comment: Published in ACM Transactions on Information Systems (TOIS), February
2019, Article No: 2
Dynamic Ordered Sets with Exponential Search Trees
We introduce exponential search trees as a novel technique for converting
static polynomial space search structures for ordered sets into fully-dynamic
linear space data structures.
This leads to an optimal bound of O(sqrt(log n/loglog n)) for searching and
updating a dynamic set of n integer keys in linear space. Here searching an
integer y means finding the maximum key in the set which is smaller than or
equal to y. This problem is equivalent to the standard text book problem of
maintaining an ordered set (see, e.g., Cormen, Leiserson, Rivest, and Stein:
Introduction to Algorithms, 2nd ed., MIT Press, 2001).
The best previous deterministic linear space bound was O(log n/loglog n) due
Fredman and Willard from STOC 1990. No better deterministic search bound was
known using polynomial space.
We also get the following worst-case linear space trade-offs between the
number n, the word length w, and the maximal key U < 2^w: O(min{loglog n+log
n/log w, (loglog n)(loglog U)/(logloglog U)}). These trade-offs are, however,
not likely to be optimal.
Our results are generalized to finger searching and string searching,
providing optimal results for both in terms of n.Comment: Revision corrects some typoes and state things better for
applications in subsequent paper
A Grammar Compression Algorithm based on Induced Suffix Sorting
We introduce GCIS, a grammar compression algorithm based on the induced
suffix sorting algorithm SAIS, introduced by Nong et al. in 2009. Our solution
builds on the factorization performed by SAIS during suffix sorting. We
construct a context-free grammar on the input string which can be further
reduced into a shorter string by substituting each substring by its
correspondent factor. The resulting grammar is encoded by exploring some
redundancies, such as common prefixes between suffix rules, which are sorted
according to SAIS framework. When compared to well-known compression tools such
as Re-Pair and 7-zip, our algorithm is competitive and very effective at
handling repetitive string regarding compression ratio, compression and
decompression running time
Deterministic sub-linear space LCE data structures with efficient construction
Given a string of symbols, a longest common extension query
asks for the length of the longest common prefix of the
th and th suffixes of . LCE queries have several important
applications in string processing, perhaps most notably to suffix sorting.
Recently, Bille et al. (J. Discrete Algorithms 25:42-50, 2014, Proc. CPM 2015:
65-76) described several data structures for answering LCE queries that offers
a space-time trade-off between data structure size and query time. In
particular, for a parameter , their best deterministic
solution is a data structure of size which allows LCE queries to be
answered in time. However, the construction time for all
deterministic versions of their data structure is quadratic in . In this
paper, we propose a deterministic solution that achieves a similar space-time
trade-off of query time using
space, but significantly improve the construction time to
.Comment: updated titl
Space-Efficient Re-Pair Compression
Re-Pair is an effective grammar-based compression scheme achieving strong
compression rates in practice. Let , , and be the text length,
alphabet size, and dictionary size of the final grammar, respectively. In their
original paper, the authors show how to compute the Re-Pair grammar in expected
linear time and words of working space on top
of the text. In this work, we propose two algorithms improving on the space of
their original solution. Our model assumes a memory word of bits and a re-writable input text composed by such words. Our
first algorithm runs in expected time and uses
words of space on top of the text for any parameter
chosen in advance. Our second algorithm runs in expected
time and improves the space to words
- …