27 research outputs found

Optimal Hashing in External Memory

Author: Conway Alex
Shilane Philip
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)
Publication date: 01/01/2018

Hash tables are a ubiquitous class of dictionary data structures. However, standard hash table implementations do not translate well into the external memory model, because they do not incorporate locality for insertions. Iacono and Pătraşcu established an update/query tradeoff curve for external-hash tables: a hash table that performs insertions in O(lambda/B) amortized IOs requires Omega(log_lambda N) expected IOs for queries, where N is the number of items that can be stored in the data structure, B is the size of a memory transfer, M is the size of memory, and lambda is a tuning parameter. They provide a complicated hashing data structure, which we call the IP hash table, that meets this curve for lambda that is Omega(log log M + log_M N). In this paper, we present a simpler external-memory hash table, the Bundle of Arrays Hash Table (BOA), that is optimal for a narrower range of lambda. The simplicity of BOAs allows them to be readily modified to achieve the following results:
- A new external-memory data structure, the Bundle of Trees Hash Table (BOT), that matches the performance of the IP hash table, while retaining some of the simplicity of the BOAs.
- The Cache-Oblivious Bundle of Trees Hash Table (COBOT), the first cache-oblivious hash table. This data structure matches the optimality of BOTs and IP hash tables over the same range of lambda.
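To get a feel for the update/query tradeoff stated in this abstract, the following minimal sketch evaluates the two expressions lambda/B and log_lambda N for a few values of lambda. The function names and the sample values of N, B, and lambda are hypothetical, chosen only for illustration, and the constant factors hidden by the O and Omega notation are ignored, so the numbers show the shape of the curve rather than real IO counts.

```python
import math


def insert_cost_ios(lam: float, block_size: int) -> float:
    """Amortized insertion cost lambda / B, with constant factors dropped."""
    return lam / block_size


def query_bound_ios(lam: float, num_items: int) -> float:
    """Query bound log_lambda(N), with constant factors dropped.

    Requires lam > 1 so that the logarithm base is valid.
    """
    return math.log(num_items) / math.log(lam)


# Hypothetical parameters: N items, blocks of B items per memory transfer.
N = 2 ** 30
B = 1024

for lam in (4, 32, 1024):
    print(
        f"lambda={lam}: "
        f"insert ~ {insert_cost_ios(lam, B):.5f} IOs, "
        f"query ~ {query_bound_ios(lam, N):.2f} IOs"
    )
```

Raising lambda makes each insertion more expensive and each query cheaper, which is the sense in which lambda acts as a tuning parameter.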

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Assert(!Defined(Sequential I/O))

Author: Cheng Li
Darren Sawyer
Fred Douglis
Hyong Shim
Philip Shilane
Publication venue: USENIX Association
Publication date: 01/01/2014

The term sequential I/O is widely used in systems research with the intuitive understanding that it means consecutive access. From a survey of the literature, though, this intuitive understanding has translated into numerous, inconsistent definitions. Since sequential I/O is such a fundamental concept in systems research, we believe that a sequentiality metric should allow us to compare access patterns in a meaningful way. We explore access properties that could be incorporated into potential metrics for sequential I/O including: access size, gaps between accesses, multi-stream, and inter-arrival time. We then analyze hundreds of large-scale storage traces and discuss how potential metrics compare. Interestingly, we find that I/O traces considered highly sequential by one metric can be highly random according to another metric. We further demonstrate that many plausible metrics are weakly correlated, though metrics weighted by size have more consistency. While there may not be a single metric for sequential I/O that is best in all cases, we believe systems researchers should more carefully consider, and state, which definition they use.
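The claim that one trace can look sequential under one metric and random under another can be illustrated with a small sketch. The two metrics below (fraction of consecutive accesses, and the same fraction weighted by access size) are illustrative assumptions and are not necessarily the definitions evaluated in the paper; the trace is made up.

```python
from typing import List, Tuple

Access = Tuple[int, int]  # (offset, size), both in bytes


def consecutive_flags(trace: List[Access]) -> List[bool]:
    """Mark each access that starts exactly where the previous one ended."""
    flags = []
    prev_end = None
    for offset, size in trace:
        flags.append(prev_end is not None and offset == prev_end)
        prev_end = offset + size
    return flags


def fraction_consecutive_accesses(trace: List[Access]) -> float:
    """Count-based metric: share of accesses (after the first) that are consecutive."""
    if len(trace) < 2:
        return 0.0
    return sum(consecutive_flags(trace)[1:]) / (len(trace) - 1)


def fraction_consecutive_bytes(trace: List[Access]) -> float:
    """Size-weighted metric: share of bytes (after the first access) moved consecutively."""
    flags = consecutive_flags(trace)
    total = sum(size for _, size in trace[1:])
    if total == 0:
        return 0.0
    sequential = sum(size for (_, size), flag in zip(trace, flags) if flag)
    return sequential / total


# Made-up trace: two large back-to-back accesses followed by four small scattered ones.
MIB = 1024 * 1024
trace = [(0, MIB), (MIB, MIB), (5000, 512), (90000, 512), (7000, 512), (300000, 512)]

by_count = fraction_consecutive_accesses(trace)
by_bytes = fraction_consecutive_bytes(trace)
print(f"by access count: {by_count:.3f}, by bytes: {by_bytes:.3f}")
```

Counting accesses, only one of the five transitions is consecutive, while by bytes almost all of the data is transferred consecutively, so the same trace ranks very differently under the two definitions.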

CiteSeerX

A survey and classification of storage deduplication systems

Author: Anand Ashok
Arcangeli Andrea
Berliner Brian
Bolosky William J.
Broder Andrei
Chen Feng
Chute Christopher
Clements Austin T.
Collberg Christian
Debnath Biplob
Dong Wei
Douglis Fred
Dubnicki Cezary
Dutch
El-Shimi Ahmed
Eshghi Kave
Guo Fanglu
Gupta Aayush
Hong Bo
José Pereira
João Paulo
Kruus Erik
Liguori Anthony
Lillibridge Mark
Lu Guanlin
Manber Udi
Milos Grzegorz
Nath Partho
Ng Chun-Ho
Quinlan Sean
Rhea Sean
Shilane Philip
Srinivasan Kiran
Suzaki Kuniyasu
Tarasov Vasily
Ungureanu Cristian
Wright Jeff
Xia Wen
You Lawrence
Zhu Benjamin
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/07/2014

The automatic elimination of duplicate data in a storage system, commonly known as deduplication, is increasingly accepted as an effective technique to reduce storage costs. Thus, it has been applied to different storage types, including archives and backups, primary storage, within solid state disks, and even to random access memory. Although the general approach to deduplication is shared by all storage types, each poses specific challenges and leads to different trade-offs and solutions. This diversity is often misunderstood, thus underestimating the relevance of new research and development. The first contribution of this paper is a classification of deduplication systems according to six criteria that correspond to key design decisions: granularity, locality, timing, indexing, technique, and scope. This classification identifies and describes the different approaches used for each of them. As a second contribution, we describe which combinations of these design decisions have been proposed and found more useful for challenges in each storage type. Finally, outstanding research challenges and unexplored design points are identified and discussed.

This work is funded by the European Regional Development Fund (ERDF) through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the Fundação para a Ciência e a Tecnologia (FCT; Portuguese Foundation for Science and Technology) within project RED FCOMP-01-0124-FEDER-010156 and the FCT by PhD scholarship SFRH-BD-71372-2010.
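As a concrete reference point for the design criteria named in the abstract above, here is a minimal sketch of one possible combination: fixed-size chunk granularity, a full in-memory fingerprint index, and deduplication performed at write time. This is an illustrative assumption, not a system described in the survey; the function names `dedup_store` and `restore` are hypothetical.

```python
import hashlib
from typing import Dict, List


def dedup_store(data: bytes, chunk_size: int, index: Dict[str, bytes]) -> List[str]:
    """Split data into fixed-size chunks and store each distinct chunk once.

    Returns a recipe: the ordered list of chunk fingerprints needed to
    rebuild the data. The index maps fingerprint -> chunk contents.
    """
    recipe = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in index:
            index[fingerprint] = chunk  # first time this content is seen
        recipe.append(fingerprint)
    return recipe


def restore(recipe: List[str], index: Dict[str, bytes]) -> bytes:
    """Rebuild the original data by concatenating the chunks named in the recipe."""
    return b"".join(index[fingerprint] for fingerprint in recipe)


# Made-up input: the first and third 4 KiB chunks are identical.
data = b"A" * 4096 + b"B" * 4096 + b"A" * 4096
index: Dict[str, bytes] = {}
recipe = dedup_store(data, 4096, index)
print(f"chunks written: {len(recipe)}, unique chunks stored: {len(index)}")
```

Changing any single decision, for example moving to variable-size chunks or running deduplication as a background pass, yields a different point in the design space the classification covers.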

Universidade do Minho: RepositoriUM

Crossref

Distinctive regions of 3D surfaces

Author: Philip Shilane
Thomas Funkhouser
Publication date: 01/01/2007

Selecting the most important regions of a surface is useful for shape matching and a variety of applications in computer graphics and geometric modeling. While previous research has analyzed geometric properties of meshes in isolation, we select regions that distinguish a shape from objects of a different type. Our approach to analyzing distinctive regions is based on performing a shape-based search using each region as a query into a database. Distinctive regions of a surface have shape consistent with objects of the same type and different from objects of other types. We demonstrate the utility of detecting distinctive surface regions for shape matching and other graphics applications including mesh visualization, icon generation, and mesh simplification
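The retrieval-based idea in this abstract can be sketched in a few lines: use a region's descriptor as a query and score the region by how many of its nearest database neighbors come from the same object class. The descriptor vectors, the squared Euclidean distance, the precision-at-k score, and all names below are assumptions made for illustration; the abstract does not specify the descriptor or scoring function used.

```python
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def squared_distance(a: Descriptor, b: Descriptor) -> float:
    """Squared Euclidean distance between two descriptors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distinctiveness(
    query: Descriptor,
    query_label: str,
    database: List[Tuple[Descriptor, str]],
    k: int = 3,
) -> float:
    """Fraction of the k nearest database regions that share the query's class."""
    ranked = sorted(database, key=lambda entry: squared_distance(query, entry[0]))
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for _, label in top if label == query_label) / len(top)


# Made-up 2D descriptors for regions taken from two object classes.
database = [
    ((0.0, 0.0), "chair"), ((0.1, 0.0), "chair"), ((0.0, 0.1), "chair"),
    ((5.0, 5.0), "table"), ((5.1, 5.0), "table"), ((2.5, 2.5), "table"),
]

# A chair region whose neighbors are all chairs, and one that resembles table regions.
score_distinctive = distinctiveness((0.05, 0.05), "chair", database)
score_generic = distinctiveness((2.6, 2.6), "chair", database)
print(score_distinctive, score_generic)
```

A high score means the region retrieves objects of its own type, which matches the abstract's description of a distinctive region; a low score means the region looks like regions found on other types of objects.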

CiteSeerX