Towards a Holistic Integration of Spreadsheets with Databases: A Scalable Storage Engine for Presentational Data Management
Spreadsheet software is the tool of choice for interactive ad-hoc data
management, with adoption by billions of users. However, spreadsheets are not
scalable, unlike database systems. On the other hand, database systems, while
highly scalable, do not support interactivity as a first-class primitive. We
are developing DataSpread, to holistically integrate spreadsheets as a
front-end interface with databases as a back-end datastore, providing
scalability to spreadsheets, and interactivity to databases, an integration we
term presentational data management (PDM). In this paper, we make a first step
towards this vision: developing a storage engine for PDM, studying how to
flexibly represent spreadsheet data within a database and how to support and
maintain access by position. We first conduct an extensive survey of
spreadsheet use to motivate our functional requirements for a storage engine
for PDM. We develop a natural set of mechanisms for flexibly representing
spreadsheet data and demonstrate that identifying the optimal representation is
NP-Hard; however, we develop an efficient approach to identify the optimal
representation from an important and intuitive subclass of representations. We
extend our mechanisms with positional access mechanisms that don't suffer from
cascading update issues, leading to constant time access and modification
performance. We evaluate these representations on a workload of typical
spreadsheets and spreadsheet operations, and observe up to a 20% reduction in
storage and up to a 50% reduction in formula evaluation time.
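The cascading-update problem mentioned above can be illustrated with a minimal Python sketch. This is not DataSpread's storage engine: the class and method names are hypothetical, and the plain Python list used for the position index has linear-time inserts, so the sketch does not reproduce the access and modification costs claimed in the abstract. It only shows the indirection idea: if every stored row carried its own row number, inserting a row would force renumbering all rows after it, whereas storing rows under stable keys and keeping the position-to-key mapping separate means an insert touches only the mapping.

```python
class PositionalSheet:
    """Hypothetical sketch: rows live under stable keys; a separate
    ordered index maps a position to the key of the row at that position."""

    def __init__(self):
        self._order = []     # position (0-based) -> stable row key
        self._rows = {}      # stable row key -> list of cell values
        self._next_key = 0

    def insert_row(self, position, cells):
        """Insert a row at `position` without rewriting any stored row."""
        key = self._next_key
        self._next_key += 1
        self._rows[key] = list(cells)
        # Only the index changes. A list insert is O(n); a tree-based
        # index would be needed for better asymptotic behaviour.
        self._order.insert(position, key)
        return key

    def delete_row(self, position):
        """Remove the row at `position`; later rows keep their keys."""
        key = self._order.pop(position)
        del self._rows[key]

    def get_row(self, position):
        """Fetch the row currently displayed at `position`."""
        return self._rows[self._order[position]]


sheet = PositionalSheet()
sheet.insert_row(0, ["a", 1])
sheet.insert_row(1, ["c", 3])
sheet.insert_row(1, ["b", 2])   # lands between the two existing rows
```

After the third insert, the row `["c", 3]` is reached at position 2 even though its stored record was never modified.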
Distributed-Memory Breadth-First Search on Massive Graphs
This chapter studies the problem of traversing large graphs using the
breadth-first search order on distributed-memory supercomputers. We consider
both the traditional level-synchronous top-down algorithm as well as the
recently discovered direction optimizing algorithm. We analyze the performance
and scalability trade-offs in using different local data structures such as CSR
and DCSC, enabling in-node multithreading, and graph decompositions such as 1D
and 2D decomposition.
Comment: arXiv admin note: text overlap with arXiv:1104.451
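As a point of reference for the algorithms named in this abstract, here is a minimal serial sketch of level-synchronous top-down BFS over a CSR (compressed sparse row) adjacency structure, written in Python. It is an illustration only: the function names are hypothetical, and nothing here is distributed, multithreaded, or taken from the chapter's implementation. The direction-optimizing variant additionally switches to a bottom-up step, in which unvisited vertices search for a parent in the current frontier; that step is not shown.

```python
def build_csr(num_vertices, edges):
    """Build CSR arrays (row_ptr, col_idx) for an undirected graph."""
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # row_ptr[v] .. row_ptr[v + 1] delimits the neighbours of v in col_idx.
    row_ptr = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        row_ptr[v + 1] = row_ptr[v] + degree[v]
    col_idx = [0] * row_ptr[-1]
    cursor = row_ptr[:-1].copy()  # next free slot for each vertex
    for u, v in edges:
        col_idx[cursor[u]] = v
        cursor[u] += 1
        col_idx[cursor[v]] = u
        cursor[v] += 1
    return row_ptr, col_idx


def bfs_levels(row_ptr, col_idx, source):
    """Top-down BFS, one frontier per level; unreached vertices get -1."""
    level = [-1] * (len(row_ptr) - 1)
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:
            for v in col_idx[row_ptr[u]:row_ptr[u + 1]]:
                if level[v] == -1:      # first visit fixes the level
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier        # the "synchronization" point
    return level


row_ptr, col_idx = build_csr(6, [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)])
levels = bfs_levels(row_ptr, col_idx, 0)
```

In a distributed setting the frontier swap at the end of each level is where processes would exchange newly discovered vertices; here it is a simple list assignment.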
Succinct Indexable Dictionaries with Applications to Encoding k-ary Trees, Prefix Sums and Multisets
We consider the indexable dictionary problem, which consists of storing
a set $S \subseteq \{0,\ldots,m-1\}$ for some integer $m$, while supporting the
operations of $\mathrm{rank}(x)$, which returns the number of elements in $S$ that are
less than $x$ if $x \in S$, and $-1$ otherwise; and $\mathrm{select}(i)$, which returns
the $i$-th smallest element in $S$. We give a data structure that supports both
operations in $O(1)$ time on the RAM model and requires
$\mathcal{B}(n,m) + o(n) + O(\lg \lg m)$ bits to store a set of size $n$, where
$\mathcal{B}(n,m) = \lceil \lg \binom{m}{n} \rceil$ is the minimum number of bits
required to store any $n$-element subset from a universe of size $m$. Previous
dictionaries taking this space only supported (yes/no) membership queries in
$O(1)$ time. In the cell probe model we can remove the $O(\lg \lg m)$ additive
term in the space bound, answering a question raised by Fich and Miltersen, and
Pagh.
We present extensions and applications of our indexable dictionary data
structure, including:
An information-theoretically optimal representation of a $k$-ary cardinal
tree that supports standard operations in constant time,
A representation of a multiset of size $n$ from $\{0,\ldots,m-1\}$ in
$\mathcal{B}(n,m+n) + o(n)$ bits that supports (appropriate generalizations of)
$\mathrm{rank}$ and $\mathrm{select}$ operations in constant time, and
A representation of a sequence of $n$ non-negative integers summing up to
$m$ in $\mathcal{B}(n,m+n) + o(n)$ bits that supports prefix sum queries in
constant time.
Comment: Final version of SODA 2002 paper; supersedes Leicester Tech report
2002/1
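The interface of an indexable dictionary, and the information-theoretic bound it is measured against, can be made concrete with a short Python sketch. This is deliberately not succinct and not the paper's data structure: it stores the set as a plain sorted list and answers rank by binary search, so it matches only the semantics of the two operations (rank returns the number of smaller elements for members and -1 for non-members; select returns the i-th smallest element). The class name, the function name `min_bits`, and the choice of 1-based indexing for select are assumptions made for illustration.

```python
from bisect import bisect_left
from math import comb


class SortedSetDictionary:
    """Non-succinct stand-in that mirrors rank/select semantics only."""

    def __init__(self, elements):
        self._items = sorted(set(elements))

    def rank(self, x):
        """Count of elements smaller than x if x is stored, else -1."""
        i = bisect_left(self._items, x)
        if i < len(self._items) and self._items[i] == x:
            return i
        return -1

    def select(self, i):
        """Return the i-th smallest element (1-based, by assumption)."""
        return self._items[i - 1]


def min_bits(n, m):
    """ceil(lg C(m, n)): bits needed to name one n-subset of an m-universe."""
    # For an integer N >= 1, ceil(lg N) equals (N - 1).bit_length().
    return (comb(m, n) - 1).bit_length()


d = SortedSetDictionary([15, 3, 8, 7])
bound = min_bits(4, 16)   # lower bound for 4 elements out of 16
```

For this example, a sorted list of four machine integers uses far more than `bound` bits; the point of a succinct structure is to stay close to that bound while still answering both queries quickly.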