Write-limited sorts and joins for persistent memory

Author: Chen S.
Kevin L.
Kim H.
Myers D.
Qureshi M. K.
Publication venue: 'VLDB Endowment'
Publication date: 01/01/2014

To mitigate the impact of the widening gap between the memory needs of CPUs and what standard memory technology can deliver, system architects have introduced a new class of memory technology termed persistent memory. Persistent memory is byte-addressable, but exhibits asymmetric I/O: writes are typically one order of magnitude more expensive than reads. Byte addressability combined with I/O asymmetry renders the performance profile of persistent memory unique. Thus, it becomes imperative to find new ways to seamlessly incorporate it into database systems. We do so in the context of query processing. We focus on the fundamental operations of sort and join processing. We introduce the notion of write-limited algorithms that effectively minimize the I/O cost. We give a high-level API that enables the system to dynamically optimize the workflow of the algorithms; or, alternatively, allows the developer to tune the write profile of the algorithms. We present four different techniques to incorporate persistent memory into the database processing stack in light of this API. We have implemented and extensively evaluated all our proposals. Our results show that the algorithms deliver on their promise of I/O-minimality and tunable performance. We showcase the merits and deficiencies of each implementation technique, thus taking a solid first step towards incorporating persistent memory into query processing.
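The read/write trade-off described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm or API; it is a generic multi-pass selection sort, and the function name `multipass_selection_sort`, the `buffer_size` parameter, and the read/write counters are all hypothetical. It assumes the working buffer lives in fast memory, so only accesses to the simulated input and output are counted. Each output element is written exactly once, and a smaller buffer is paid for with extra read passes.

```python
import heapq


def multipass_selection_sort(data, buffer_size):
    """Sort `data`, writing each output element exactly once.

    Returns (sorted_list, reads, writes), where reads and writes count
    element accesses to the simulated input and output.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    output, reads, writes = [], 0, 0
    last_key = None  # (value, index) of the last element emitted so far
    while len(output) < len(data):
        # One full scan of the input per pass. The (value, index) keys
        # keep duplicate values distinct, so none is emitted twice.
        candidates = (
            (value, index)
            for index, value in enumerate(data)
            if last_key is None or (value, index) > last_key
        )
        batch = heapq.nsmallest(buffer_size, candidates)
        reads += len(data)
        output.extend(value for value, _ in batch)
        writes += len(batch)
        last_key = batch[-1]
    return output, reads, writes


data = [5, 3, 8, 3, 1, 9, 2]
small = multipass_selection_sort(data, buffer_size=3)  # 3 passes over the input
large = multipass_selection_sort(data, buffer_size=7)  # 1 pass over the input
```

With seven elements, a buffer of three needs three read passes and a buffer of seven needs one, while the number of output writes stays at seven in both cases.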

CiteSeerX

Crossref

Edinburgh Research Explorer

Reordering Rows for Better Compression: Beyond the Lexicographic Order

Author: Gutarra Eduardo
Kaser Owen
Lemire Daniel
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/07/2012

Sorting database tables before compressing them improves the compression rate. Can we do better than the lexicographical order? For minimizing the number of runs in a run-length encoding compression scheme, the best approaches to row-ordering are derived from traveling salesman heuristics, although there is a significant trade-off between running time and compression. A new heuristic, Multiple Lists, which is a variant on Nearest Neighbor that trades off compression for a major running-time speedup, is a good option for very large tables. However, for some compression schemes, it is more important to generate long runs rather than few runs. For this case, another novel heuristic, Vortex, is promising. We find that we can improve run-length encoding up to a factor of 3 whereas we can improve prefix coding by up to 80%: these gains are on top of the gains due to lexicographically sorting the table. We prove that the new row reordering is optimal (within 10%) at minimizing the runs of identical values within columns, in a few cases.

Comment: to appear in ACM TOD
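To make the run-counting objective concrete, here is a minimal sketch that counts runs of identical values per column and compares three row orders of a tiny table. It does not implement Multiple Lists or Vortex; the helper `count_runs` and the example table are hypothetical, and the third order is a hand-picked reflected (Gray-code-like) order chosen only to show that lexicographic sorting is not always the best order.

```python
def count_runs(rows):
    """Total number of runs of identical values, summed over all columns."""
    if not rows:
        return 0
    runs = len(rows[0])  # every column starts with one run
    for prev, cur in zip(rows, rows[1:]):
        # A new run starts in each column whose value changes.
        runs += sum(a != b for a, b in zip(prev, cur))
    return runs


unsorted_rows = [(1, 0), (0, 1), (1, 1), (0, 0)]
lexicographic = sorted(unsorted_rows)
# Hand-picked order: the second column reverses direction in the second half.
reflected = [(0, 0), (0, 1), (1, 1), (1, 0)]

runs_unsorted = count_runs(unsorted_rows)
runs_lexicographic = count_runs(lexicographic)
runs_reflected = count_runs(reflected)
```

On this table the unsorted order yields 7 runs, the lexicographic order 6, and the reflected order 5.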

arXiv.org e-Print Archive

R-libre

Crossref

Conclave: secure multi-party computation on big data (extended TR)

Author: Araki Toshinori
Beaver Donald
Boyle Elette
Faber Sky
Furukawa Jun
Gascón Adrià
Goldreich Oded
Hamlin Ariel
He Xi
Hirschman Albert O.
Ion Mihaela
Jagomägis Roman
Jónsson Kristján Valur
Kamara Seny
Narayan Arjun
U.S. Census Bureau
Yao Andrew C.
Yu Yuan
Zaharia Matei
Zheng Wenting
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019

Secure Multi-Party Computation (MPC) allows mutually distrusting parties to run joint computations without revealing private data. Current MPC algorithms scale poorly with data size, which makes MPC on "big data" prohibitively slow and inhibits its practical use. Many relational analytics queries can maintain MPC's end-to-end security guarantee without using cryptographic MPC techniques for all operations. Conclave is a query compiler that accelerates such queries by transforming them into a combination of data-parallel, local cleartext processing and small MPC steps. When parties trust others with specific subsets of the data, Conclave applies new hybrid MPC-cleartext protocols to run additional steps outside of MPC and improve scalability further. Our Conclave prototype generates code for cleartext processing in Python and Spark, and for secure MPC using the Sharemind and Obliv-C frameworks. Conclave scales to data sets between three and six orders of magnitude larger than state-of-the-art MPC frameworks support on their own. Thanks to its hybrid protocols, Conclave also substantially outperforms SMCQL, the most similar existing system.

Comment: Extended technical report for EuroSys 2019 pape
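The general idea of pairing local cleartext work with a small secure step can be sketched as a toy. This is not Conclave's compiler, its protocols, or the Sharemind or Obliv-C APIs. It assumes each party first aggregates its own rows in cleartext, then the per-party totals are combined with textbook additive secret sharing, simulated in a single process. All names (`local_preaggregate`, `share`, `secure_sum`, `MODULUS`) are hypothetical, and values are assumed to be non-negative integers whose sum stays below the modulus.

```python
import secrets
from collections import Counter

MODULUS = 2**61 - 1  # assumed share modulus; the true total must stay below it


def local_preaggregate(rows):
    """Cleartext step: a party sums its own (key, amount) rows per key."""
    totals = Counter()
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


def share(value, n_parties):
    """Split `value` into n additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Simulated secure step: sum one private value per party via shares."""
    n = len(private_values)
    all_shares = [share(value, n) for value in private_values]
    # Party j adds up the j-th share it received from every party.
    partial_sums = [sum(s[j] for s in all_shares) % MODULUS for j in range(n)]
    # Only the partial sums are combined; no single share reveals an input.
    return sum(partial_sums) % MODULUS


party_rows = [
    [("x", 4), ("y", 1), ("x", 6)],
    [("x", 5), ("z", 2)],
    [("y", 7), ("x", 1)],
]
local_totals = [local_preaggregate(rows) for rows in party_rows]
total_x = secure_sum([totals.get("x", 0) for totals in local_totals])
```

Here the secure step handles one number per party instead of every input row, which is the kind of reduction in MPC work the abstract describes in general terms.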

arXiv.org e-Print Archive

Crossref

Boston University Institutional Repository (OpenBU)