747 research outputs found
08081 Abstracts Collection -- Data Structures
From February 17th to 22nd 2008, the Dagstuhl Seminar 08081 ``Data Structures\u27\u27 was held in the International Conference and Research Center (IBFI),
Schloss Dagstuhl. It brought together 49 researchers from four continents to discuss recent developments concerning data structures in terms of research but also in terms of new technologies that impact how data can be stored, updated,
and retrieved.
During the seminar a fair number of participants presented their current
research. There was discussion of ongoing work, and in addition an open problem
session was held. This paper first describes the seminar topics and goals in general, then gives the minutes of the open problem session, and concludes with
abstracts of the presentations given during the seminar.
Where appropriate and available, links to extended abstracts or full papers are provided
Beyond Reuse Distance Analysis: Dynamic Analysis for Characterization of Data Locality Potential
Emerging computer architectures will feature drastically decreased flops/byte
(ratio of peak processing rate to memory bandwidth) as highlighted by recent
studies on Exascale architectural trends. Further, flops are getting cheaper
while the energy cost of data movement is increasingly dominant. The
understanding and characterization of data locality properties of computations
is critical in order to guide efforts to enhance data locality. Reuse distance
analysis of memory address traces is a valuable tool to perform data locality
characterization of programs. A single reuse distance analysis can be used to
estimate the number of cache misses in a fully associative LRU cache of any
size, thereby providing estimates on the minimum bandwidth requirements at
different levels of the memory hierarchy to avoid being bandwidth bound.
However, such an analysis only holds for the particular execution order that
produced the trace. It cannot estimate potential improvement in data locality
through dependence preserving transformations that change the execution
schedule of the operations in the computation. In this article, we develop a
novel dynamic analysis approach to characterize the inherent locality
properties of a computation and thereby assess the potential for data locality
enhancement via dependence preserving transformations. The execution trace of a
code is analyzed to extract a computational directed acyclic graph (CDAG) of
the data dependences. The CDAG is then partitioned into convex subsets, and the
convex partitioning is used to reorder the operations in the execution trace to
enhance data locality. The approach enables us to go beyond reuse distance
analysis of a single specific order of execution of the operations of a
computation in characterization of its data locality properties. It can serve a
valuable role in identifying promising code regions for manual transformation,
as well as assessing the effectiveness of compiler transformations for data
locality enhancement. We demonstrate the effectiveness of the approach using a
number of benchmarks, including case studies where the potential shown by the
analysis is exploited to achieve lower data movement costs and better
performance.Comment: Transaction on Architecture and Code Optimization (2014
10091 Abstracts Collection -- Data Structures
From February 28th to March 5th 2010, the Dagstuhl Seminar 10091 "Data
Structures" was held in Schloss Dagstuhl~--~Leibniz Center for
Informatics. It brought together 45 international researchers to
discuss recent developments concerning data structures in terms of
research, but also in terms of new technologies that impact how data
can be stored, updated, and retrieved. During the seminar a fair
number of participants presented their current research and open
problems where discussed. This document first briefly describes the
seminar topics and then gives the abstracts of the presentations given
during the seminar
Cache-oblivious index for approximate string matching
This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text T of length n and a positive integer k, we want to construct an index of T such that for any input pattern P, we can find all its k-error matches in T efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies O((nlog kn)B) disk pages and finds all k-error matches with O((|P|+occ)B+log knloglog Bn) I/Os, where B denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require Ω (|P|+occ+poly(logn)) I/Os. The second index reduces the space to O((nlogn)B) disk pages, and the I/O complexity is O((|P|+occ)B+log k(k+1)nloglogn) . © 2011 Elsevier B.V. All rights reserved.postprin
Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems
Two emerging hardware trends will dominate the database system technology in
the near future: increasing main memory capacities of several TB per server and
massively parallel multi-core processing. Many algorithmic and control
techniques in current database technology were devised for disk-based systems
where I/O dominated the performance. In this work we take a new look at the
well-known sort-merge join which, so far, has not been in the focus of research
in scalable massively parallel multi-core data processing as it was deemed
inferior to hash joins. We devise a suite of new massively parallel sort-merge
(MPSM) join algorithms that are based on partial partition-based sorting.
Contrary to classical sort-merge joins, our MPSM algorithms do not rely on a
hard to parallelize final merge step to create one complete sort order. Rather
they work on the independently created runs in parallel. This way our MPSM
algorithms are NUMA-affine as all the sorting is carried out on local memory
partitions. An extensive experimental evaluation on a modern 32-core machine
with one TB of main memory proves the competitive performance of MPSM on large
main memory databases with billions of objects. It scales (almost) linearly in
the number of employed cores and clearly outperforms competing hash join
proposals - in particular it outperforms the "cutting-edge" Vectorwise parallel
query engine by a factor of four.Comment: VLDB201
Acceleration of Computational Geometry Algorithms for High Performance Computing Based Geo-Spatial Big Data Analysis
Geo-Spatial computing and data analysis is the branch of computer science that deals with real world location-based data. Computational geometry algorithms are algorithms that process geometry/shapes and is one of the pillars of geo-spatial computing. Real world map and location-based data can be huge in size and the data structures used to process them extremely big leading to huge computational costs. Furthermore, Geo-Spatial datasets are growing on all V’s (Volume, Variety, Value, etc.) and are becoming larger and more complex to process in-turn demanding more computational resources. High Performance Computing is a way to breakdown the problem in ways that it can run in parallel on big computers with massive processing power and hence reduce the computing time delivering the same results but much faster.This dissertation explores different techniques to accelerate the processing of computational geometry algorithms and geo-spatial computing like using Many-core Graphics Processing Units (GPU), Multi-core Central Processing Units (CPU), Multi-node setup with Message Passing Interface (MPI), Cache optimizations, Memory and Communication optimizations, load balancing, Algorithmic Modifications, Directive based parallelization with OpenMP or OpenACC and Vectorization with compiler intrinsic (AVX). This dissertation has applied at least one of the mentioned techniques to the following problems. Novel method to parallelize plane sweep based geometric intersection for GPU with directives is presented. Parallelization of plane sweep based Voronoi construction, parallelization of Segment tree construction, Segment tree queries and Segment tree-based operations has been presented. Spatial autocorrelation, computation of getis-ord hotspots are also presented. Acceleration performance and speedup results are presented in each corresponding chapter
What Storage Access Privacy is Achievable with Small Overhead?
Oblivious RAM (ORAM) and private information retrieval (PIR) are classic
cryptographic primitives used to hide the access pattern to data whose storage
has been outsourced to an untrusted server. Unfortunately, both primitives
require considerable overhead compared to plaintext access. For large-scale
storage infrastructure with highly frequent access requests, the degradation in
response time and the exorbitant increase in resource costs incurred by either
ORAM or PIR prevent their usage. In an ideal scenario, a privacy-preserving
storage protocols with small overhead would be implemented for these heavily
trafficked storage systems to avoid negatively impacting either performance
and/or costs. In this work, we study the problem of the best $\mathit{storage\
access\ privacy}\mathit{small\ overhead}\mathit{differential\ privacy\ access}\mathit{oblivious\ access}\epsilon = \Omega(\log n)\epsilon = \Theta(\log n)O(1)\epsilon = \Theta(\log n)O(\log\log n)$
overhead. This construction uses a new oblivious, two-choice hashing scheme
that may be of independent interest.Comment: To appear at PODS'1
Parallel Longest Increasing Subsequence and van Emde Boas Trees
This paper studies parallel algorithms for the longest increasing subsequence
(LIS) problem. Let be the input size and be the LIS length of the
input. Sequentially, LIS is a simple problem that can be solved using dynamic
programming (DP) in work. However, parallelizing LIS is a
long-standing challenge. We are unaware of any parallel LIS algorithm that has
optimal work and non-trivial parallelism (i.e., or
span).
This paper proposes a parallel LIS algorithm that costs work,
span, and space, and is much simpler than the previous
parallel LIS algorithms. We also generalize the algorithm to a weighted version
of LIS, which maximizes the weighted sum for all objects in an increasing
subsequence. To achieve a better work bound for the weighted LIS algorithm, we
designed parallel algorithms for the van Emde Boas (vEB) tree, which has the
same structure as the sequential vEB tree, and supports work-efficient parallel
batch insertion, deletion, and range queries.
We also implemented our parallel LIS algorithms. Our implementation is
light-weighted, efficient, and scalable. On input size , our LIS
algorithm outperforms a highly-optimized sequential algorithm (with cost) on inputs with . Our algorithm is also much faster
than the best existing parallel implementation by Shen et al. (2022) on all
input instances.Comment: to be published in Proceedings of the 35th ACM Symposium on
Parallelism in Algorithms and Architectures (SPAA '23
- …