40 research outputs found

    Dynamic Data Structures for Document Collections and Graphs

    In the dynamic indexing problem, we must maintain a changing collection of text documents so that we can efficiently support insertions, deletions, and pattern matching queries. We are especially interested in developing efficient data structures that store and query the documents in compressed form. All previous compressed solutions to this problem rely on answering rank and select queries on a dynamic sequence of symbols. Because of the lower bound in [Fredman and Saks, 1989], answering rank queries presents a bottleneck in compressed dynamic indexing. In this paper we show how this lower bound can be circumvented using our new framework. We demonstrate that the gap between static and dynamic variants of the indexing problem can be almost closed. Our method is based on a novel framework for adding dynamism to static compressed data structures. Our framework also applies more generally to dynamizing other problems. We show, for example, how our framework can be applied to develop compressed representations of dynamic graphs and binary relations
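
    For readers unfamiliar with the rank and select queries mentioned in this abstract, the following is a minimal sketch of what the two operations return on a plain string. It is a naive linear-time illustration only; the function names and the 1-based occurrence count in `select` are assumptions made here, and nothing below reflects the compressed or dynamic structures the paper develops.

```python
def rank(seq: str, symbol: str, i: int) -> int:
    """Number of occurrences of `symbol` in the prefix seq[:i]."""
    return seq[:i].count(symbol)


def select(seq: str, symbol: str, j: int) -> int:
    """Index of the j-th occurrence (1-based) of `symbol`, or -1 if there is none."""
    pos = -1
    for _ in range(j):
        # Continue the search just past the previous occurrence.
        pos = seq.find(symbol, pos + 1)
        if pos == -1:
            return -1
    return pos
```

    The two queries are inverse in the sense that `rank(seq, c, select(seq, c, j) + 1) == j` whenever the j-th occurrence exists.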

    Competitive Parallel Disk Prefetching and Buffer Management

    We provide a competitive analysis framework for online prefetching and buffer management algorithms in parallel I/O systems, using a read-once model of block references. This has widespread applicability to key I/O-bound applications such as external merging and concurrent playback of multiple video streams. Two realistic lookahead models, global lookahead and local lookahead, are defined. Algorithms NOM and GREED based on these two forms of lookahead are analyzed for shared buffer and distributed buffer configurations, both of which occur frequently in existing systems. An important aspect of our work is that we show how to implement both models of lookahead in practice using the simple techniques of forecasting and flushing. Given a D-disk parallel I/O system and a globally shared I/O buffer that can hold up to M disk blocks, we derive a lower bound of Ω(√D) on the competitive ratio of any deterministic online prefetching algorithm with O(M) lookahead. NOM is shown to match the lower bound using global M-block lookahead. In contrast, using only local lookahead results in an Ω(D) competitive ratio. When the buffer is distributed into D portions of M/D blocks each, the algorithm GREED based on local lookahead is shown to be optimal, and NOM is within a constant factor of optimal. Thus we provide a theoretical basis for the intuition that global lookahead is more valuable for prefetching in the case of a shared buffer configuration, whereas it is enough to provide local lookahead in the case of the distributed configuration. Finally, we analyze the performance of these algorithms for reference strings generated by a uniformly random stochastic process and we show that they achieve the minimal expected number of I/Os. These results also give bounds on the worst-case expected performance of algorithms which employ randomization in the data layout

    Cylindrical Static and Kinetic Binary Space Partitions

    P. K. Agarwal, L. Guibas, T. M. Murali, and J. S. Vitter. “Cylindrical Static and Kinetic Binary Space Partitions,” Computational Geometry, 16(2), 2000, 103–127. An extended abstract appears in Proceedings of the 13th Annual ACM Symposium on Computational Geometry (SCG ’97), Nice, France, June 1997, 39–48

    Hierarchical Bin Buffering: Online Local Moments for Dynamic External Memory Arrays

    Local moments are used for local regression, to compute statistical measures such as sums, averages, and standard deviations, and to approximate probability distributions. We consider the case where the data source is a very large I/O array of size n and we want to compute the first N local moments, for some constant N. Without precomputation, this requires O(n) time. We develop a sequence of algorithms of increasing sophistication that use precomputation and additional buffer space to speed up queries. The simpler algorithms partition the I/O array into consecutive ranges called bins, and they are applicable not only to local-moment queries, but also to algebraic queries (MAX, AVERAGE, SUM, etc.). With N buffers of size √n, time complexity drops to O(√n). A more sophisticated approach uses hierarchical buffering and has a logarithmic time complexity, O(b log_b n), when using N hierarchical buffers of size n/b. Using Overlapped Bin Buffering, we show that only a single buffer is needed, as with wavelet-based algorithms, but using much less storage. Applications exist in multidimensional and statistical databases over massive data sets, interactive image processing, and visualization
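
    The simplest idea in this abstract, partitioning the array into bins and precomputing one aggregate per bin, can be sketched as follows for SUM queries. This is an illustrative sketch under stated assumptions, not the paper's algorithm: the class name `BinnedSum`, the in-memory list standing in for the I/O array, and the half-open query range are all choices made here, and the hierarchical and overlapped variants are not shown.

```python
import math


class BinnedSum:
    """Range-SUM over a fixed array using one precomputed total per bin."""

    def __init__(self, data):
        self.data = list(data)
        n = len(self.data)
        # Bin width of roughly sqrt(n); at least 1 so an empty array works.
        self.width = max(1, math.isqrt(n))
        # One precomputed sum per bin (the last bin may be shorter).
        self.bins = [sum(self.data[i:i + self.width])
                     for i in range(0, n, self.width)]

    def range_sum(self, lo, hi):
        """Sum of data[lo:hi], assuming 0 <= lo <= hi <= len(data)."""
        total = 0
        i = lo
        while i < hi:
            if i % self.width == 0 and i + self.width <= hi:
                # The whole bin lies inside the query: use its stored total.
                total += self.bins[i // self.width]
                i += self.width
            else:
                # Partial bin at either end: read individual elements.
                total += self.data[i]
                i += 1
        return total
```

    A query reads at most about √n stored bin totals plus fewer than 2√n individual elements, which is where the O(√n) bound for the simple bin scheme comes from.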

    Water Event Categorization Using Sub-Metered Water and Coincident Electricity Data

    This study evaluated the potential for data from dedicated water sub-meters and circuit-level electricity gauges to support accurate water end-use disaggregation tools. A supervised learning algorithm was trained to categorize end-use events from an existing database consisting of features related to whole-home and hot water use. Additional features were defined based on dedicated irrigation metering and circuit-level electricity gauges on major water appliances. Support vector machine classifiers were trained and tested on portions of the database using multiple feature combinations, and then externally validated on water event data collected under dissimilar conditions from a demonstration house in Austin, Texas, USA. On the testing data, a trained classifier achieved true positive rates for occurrences and volume exceeding 95% for most categories and 93% for toilet events. Performance for faucet events was less than 90%. Initial results suggest that dedicated sub-meters and circuit-level electricity gauges can facilitate highly accurate categorization with simple features that do not rely on flow rate gradients

    US&R

    Competitive Parallel Disk Prefetching and Buffer Management

    We provide a competitive analysis framework for online prefetching and buffer management algorithms in parallel I/O systems, using a read-once model of block references. This has widespread applicability to key I/O-bound applications such as external merging and concurrent playback of multiple video streams. Two realistic lookahead models, global lookahead and local lookahead, are defined. Algorithms NOM and GREED based on these two forms of lookahead are analyzed for shared buffer and distributed buffer configurations, both of which occur frequently in existing systems. An important aspect of our work is that we show how to implement both models of lookahead in practice using the simple techniques of forecasting and flushing. Given a D-disk parallel I/O system and a globally shared I/O buffer that can hold up to M disk blocks, we derive a lower bound of Ω(√D) on the competitive ratio of any deterministic online prefetching algorithm with O(M) lookahead. NOM is shown to..