40 research outputs found

Dynamic Data Structures for Document Collections and Graphs

Author: Munro J. Ian
Nekrich Yakov
Vitter Jeffrey Scott
Publication date: 19/03/2015

In the dynamic indexing problem, we must maintain a changing collection of text documents so that we can efficiently support insertions, deletions, and pattern matching queries. We are especially interested in developing efficient data structures that store and query the documents in compressed form. All previous compressed solutions to this problem rely on answering rank and select queries on a dynamic sequence of symbols. Because of the lower bound in [Fredman and Saks, 1989], answering rank queries presents a bottleneck in compressed dynamic indexing. In this paper we show how this lower bound can be circumvented using our new framework. We demonstrate that the gap between static and dynamic variants of the indexing problem can be almost closed. Our method is based on a novel framework for adding dynamism to static compressed data structures. Our framework also applies more generally to dynamizing other problems. We show, for example, how our framework can be applied to develop compressed representations of dynamic graphs and binary relations.
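For readers unfamiliar with the rank and select queries mentioned in the abstract, the following is a naive illustration of their standard definitions on a static sequence. It is only a sketch for orientation: the function names are chosen here for illustration, both queries scan the sequence in linear time, and nothing in it reflects the compressed dynamic structures developed in the paper.

```python
def rank(seq, symbol, i):
    """Return the number of occurrences of `symbol` in seq[:i]."""
    return sum(1 for c in seq[:i] if c == symbol)


def select(seq, symbol, k):
    """Return the position of the k-th occurrence (k >= 1) of `symbol`.

    Returns -1 if `symbol` occurs fewer than k times.
    """
    seen = 0
    for pos, c in enumerate(seq):
        if c == symbol:
            seen += 1
            if seen == k:
                return pos
    return -1
```

Succinct indexes answer both queries far faster than this linear scan; the abstract's point is that supporting rank under updates is the costly part.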

arXiv.org e-Print Archive

CiteSeerX

Crossref

Competitive Parallel Disk Prefetching and Buffer Management

Author: Barve Rakesh
Kallahalla Mahesh
Varman Peter J.
Vitter Jeffrey Scott
Publication venue: 'Elsevier BV'
Publication date: 21/03/2011

We provide a competitive analysis framework for online prefetching and buffer management algorithms in parallel I/O systems, using a read-once model of block references. This has widespread applicability to key I/O-bound applications such as external merging and concurrent playback of multiple video streams. Two realistic lookahead models, global lookahead and local lookahead, are defined. Algorithms NOM and GREED based on these two forms of lookahead are analyzed for shared buffer and distributed buffer configurations, both of which occur frequently in existing systems. An important aspect of our work is that we show how to implement both models of lookahead in practice using the simple techniques of forecasting and flushing. Given a -disk parallel I/O system and a globally shared I/O buffer that can hold up to disk blocks, we derive a lower bound of on the competitive ratio of any deterministic online prefetching algorithm with lookahead. NOM is shown to match the lower bound using global -block lookahead. In contrast, using only local lookahead results in an competitive ratio. When the buffer is distributed into portions of blocks each, the algorithm GREED based on local lookahead is shown to be optimal, and NOM is within a constant factor of optimal. Thus we provide a theoretical basis for the intuition that global lookahead is more valuable for prefetching in the case of a shared buffer configuration, whereas it is enough to provide local lookahead in the case of the distributed configuration. Finally, we analyze the performance of these algorithms for reference strings generated by a uniformly random stochastic process, and we show that they achieve the minimal expected number of I/Os. These results also give bounds on the worst-case expected performance of algorithms which employ randomization in the data layout.

KU ScholarWorks

Cylindrical Static and Kinetic Binary Space Partitions

Author: Agarwal Pankaj K.
Guibas Leonidas J.
Murali T. M.
Vitter Jeffrey Scott
Publication venue: 'Elsevier BV'
Publication date: 01/01/2000

P. K. Agarwal, L. Guibas, T. M. Murali, and J. S. Vitter. “Cylindrical Static and Kinetic Binary Space Partitions,” Computational Geometry, 16(2), 2000, 103–127. An extended abstract appears in Proceedings of the 13th Annual ACM Symposium on Computational Geometry (SCG ’97), Nice, France, June 1997, 39–48.

Elsevier - Publisher Connector

KU ScholarWorks

Cylindrical Static and Kinetic Binary Space Partitions

Author: Agarwal Pankaj K.
Guibas Leonidas J.
Murali T. M.
Vitter Jeffrey Scott
Publication venue: 'Elsevier BV'
Publication date: 21/03/2011

KU ScholarWorks

Hierarchical Bin Buffering: Online Local Moments for Dynamic External Memory Arrays

Author: Chakrabarti K.
Daniel Lemire
Geffner S.
Gray J.
Lemire D.
Li B.-C.
Moerkotte G.
Owen Kaser
Schmidt R. R.
Scott D.
Vitter J. S.
Zhou F.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/05/2008

Local moments are used for local regression, to compute statistical measures such as sums, averages, and standard deviations, and to approximate probability distributions. We consider the case where the data source is a very large I/O array of size n and we want to compute the first N local moments, for some constant N. Without precomputation, this requires O(n) time. We develop a sequence of algorithms of increasing sophistication that use precomputation and additional buffer space to speed up queries. The simpler algorithms partition the I/O array into consecutive ranges called bins, and they are applicable not only to local-moment queries, but also to algebraic queries (MAX, AVERAGE, SUM, etc.). With N buffers of size √n, time complexity drops to O(√n). A more sophisticated approach uses hierarchical buffering and has a logarithmic time complexity (O(b log_b n)), when using N hierarchical buffers of size n/b. Using Overlapped Bin Buffering, we show that only a single buffer is needed, as with wavelet-based algorithms, but using much less storage. Applications exist in multidimensional and statistical databases over massive data sets, interactive image processing, and visualization.
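The simplest idea in the abstract, partitioning the array into bins of about √n elements and precomputing one aggregate per bin, can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the paper's implementation: it handles only SUM queries with a single buffer, ignores updates and external-memory concerns, and the class and method names (`BinnedSums`, `range_sum`) are hypothetical.

```python
import math


class BinnedSums:
    """Range-sum queries over a fixed array using precomputed per-bin sums."""

    def __init__(self, data):
        self.data = list(data)
        n = len(self.data)
        # Bin size close to sqrt(n); at least 1 so empty input is handled.
        self.bin_size = max(1, math.isqrt(n))
        # One precomputed sum per bin (the "buffer").
        self.bin_sums = [
            sum(self.data[i:i + self.bin_size])
            for i in range(0, n, self.bin_size)
        ]

    def range_sum(self, lo, hi):
        """Return sum(data[lo:hi]) while reading O(sqrt(n)) stored values."""
        if not 0 <= lo <= hi <= len(self.data):
            raise IndexError("range out of bounds")
        total = 0
        i = lo
        while i < hi:
            if i % self.bin_size == 0 and i + self.bin_size <= hi:
                # The whole bin lies inside the range: use its stored sum.
                total += self.bin_sums[i // self.bin_size]
                i += self.bin_size
            else:
                # Partial bin at either end: read individual elements.
                total += self.data[i]
                i += 1
        return total
```

A query reads at most about √n precomputed bin sums plus fewer than 2√n individual elements at the two ends of the range, which is where the O(√n) bound comes from.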

arXiv.org e-Print Archive

R-libre

Crossref

Efficient Update of Indexes for Dynamically Changing Web Documents

Author: B. Brewington
D.E. Knuth
E. Ukkonen
I.H. Witten
J. Cho
J. Zobel
Jeffrey Scott Vitter
Lipyeow Lim
Min Wang
R. Baeza-Yates
R.A. Baeza-Yates
R.S. Boyer
Ramesh Agarwal
S. Lawrence
Sriram Padmanabhan
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Water Event Categorization Using Sub-Metered Water and Coincident Electricity Data

Author: J. Scott Vitter
Michael Webber
Publication venue: 'MDPI AG'
Publication date: 01/05/2018

This study evaluated the potential for data from dedicated water sub-meters and circuit-level electricity gauges to support accurate water end-use disaggregation tools. A supervised learning algorithm was trained to categorize end-use events from an existing database consisting of features related to whole-home and hot water use. Additional features were defined based on dedicated irrigation metering and circuit-level electricity gauges on major water appliances. Support vector machine classifiers were trained and tested on portions of the database using multiple feature combinations, and then externally validated on water event data collected under dissimilar conditions from a demonstration house in Austin, Texas, USA. On the testing data, a trained classifier achieved true positive rates for occurrences and volume exceeding 95% for most categories and 93% for toilet events. Performance for faucet events was less than 90%. Initial results suggest that dedicated sub-meters and circuit-level electricity gauges can facilitate highly accurate categorization with simple features that do not rely on flow rate gradients.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals