Search CORE

603 research outputs found

Collaborative OLAP with Tag Clouds: Web 2.0 OLAP Formalism and Experimental Evaluation

Author: Aouiche Kamel
Godin Robert
Lemire Daniel
Publication venue
Publication date: 15/03/2016
Field of study

Increasingly, business projects are ephemeral. New Business Intelligence tools must support ad-lib data sources and quick perusal. Meanwhile, tag clouds are a popular community-driven visualization technique. Hence, we investigate tag-cloud views with support for OLAP operations such as roll-ups, slices, dices, clustering, and drill-downs. As a case study, we implemented an application where users can upload data and immediately navigate through its ad hoc dimensions. To support social networking, views can be easily shared and embedded in other Web sites. Algorithmically, our tag-cloud views are approximate range top-k queries over spontaneous data cubes. We present experimental evidence that iceberg cuboids provide adequate online approximations. We benchmark several browser-oblivious tag-cloud layout optimizations.Comment: Software at https://github.com/lemire/OLAPTagClou

arXiv.org e-Print Archive

CiteSeerX

Hierarchical Bin Buffering: Online Local Moments for Dynamic External Memory Arrays

Author: Chakrabarti K.
Daniel Lemire
Geffner S.
Gray J.
Lemire D.
Li B.-C.
Moerkotte G.
Owen Kaser
Schmidt R. R.
Scott D.
Vitter J. S.
Zhou F.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/05/2008
Field of study

Local moments are used for local regression, to compute statistical measures such as sums, averages, and standard deviations, and to approximate probability distributions. We consider the case where the data source is a very large I/O array of size n and we want to compute the first N local moments, for some constant N. Without precomputation, this requires O(n) time. We develop a sequence of algorithms of increasing sophistication that use precomputation and additional buffer space to speed up queries. The simpler algorithms partition the I/O array into consecutive ranges called bins, and they are applicable not only to local-moment queries, but also to algebraic queries (MAX, AVERAGE, SUM, etc.). With N buffers of size sqrt{n}, time complexity drops to O(sqrt n). A more sophisticated approach uses hierarchical buffering and has a logarithmic time complexity (O(b log_b n)), when using N hierarchical buffers of size n/b. Using Overlapped Bin Buffering, we show that only a single buffer is needed, as with wavelet-based algorithms, but using much less storage. Applications exist in multidimensional and statistical databases over massive data sets, interactive image processing, and visualization

arXiv.org e-Print Archive

R-libre

Crossref

Umzi: Unified Multi-Zone Indexing for Large-Scale HTAP

Author: Barber Ronald
Luo Chen
Raman Vijayshankar
Sidle Richard
Tian Yuanyuan
Tözün Pinar
Publication venue
Publication date: 01/01/2019
Field of study

The IT University of Copenhagen's Repository

Collaborative OLAP with Tag Clouds: Web 2.0 OLAP Formalism and Experimental Evaluation

Author: Godin Robert
Kamel Aouiche
Lemire Daniel
Publication venue
Publication date: 01/01/2008
Field of study

Archipel - Université du Québec à Montréal

Collaborative OLAP with Tag Clouds: Web 2.0 OLAP Formalism and Experimental Evaluation

Author: Aouiche Kamel
Godin Robert
Lemire Daniel
Publication venue
Publication date: 01/01/2008
Field of study

R-libre

Archipel - Université du Québec à Montréal

Secondary Indexing in One Dimension: Beyond B-trees and Bitmap Indexes

Author: Pagh Rasmus
Rao S. Srinivasa
Publication venue
Publication date: 18/11/2008
Field of study

Let S be a finite, ordered alphabet, and let x = x_1 x_2 ... x_n be a string over S. A "secondary index" for x answers alphabet range queries of the form: Given a range [a_l,a_r] over S, return the set I_{[a_l;a_r]} = {i |x_i \in [a_l; a_r]}. Secondary indexes are heavily used in relational databases and scientific data analysis. It is well-known that the obvious solution, storing a dictionary for the position set associated with each character, does not always give optimal query time. In this paper we give the first theoretically optimal data structure for the secondary indexing problem. In the I/O model, the amount of data read when answering a query is within a constant factor of the minimum space needed to represent I_{[a_l;a_r]}, assuming that the size of internal memory is (|S| log n)^{delta} blocks, for some constant delta > 0. The space usage of the data structure is O(n log |S|) bits in the worst case, and we further show how to bound the size of the data structure in terms of the 0-th order entropy of x. We show how to support updates achieving various time-space trade-offs. We also consider an approximate version of the basic secondary indexing problem where a query reports a superset of I_{[a_l;a_r]} containing each element not in I_{[a_l;a_r]} with probability at most epsilon, where epsilon > 0 is the false positive probability. For this problem the amount of data that needs to be read by the query algorithm is reduced to O(|I_{[a_l;a_r]}| log(1/epsilon)) bits.Comment: 16 page

arXiv.org e-Print Archive

The IT University of Copenhagen's Repository