Search CORE

1,070 research outputs found

Diamond Dicing

Author: Antony
Bouman
Börzsönyi
Cerf
Daniel Lemire
Donjerkovic
Engene
Fang
Frank
Godin
Hahn
Hazel Webb
Kaser
Knorr
Kondo
Korn
Kumar
Lemire
Ley
Mazón
MonetDB BV
Netflix Inc.
Ng
O'Neil
Owen Kaser
Porter
Rizzi
Sarawagi
Tang
Transaction Processing Performance Council
Turney
Webb
Webb
Wille
Ślezak
Publication venue: 'Elsevier BV'
Publication date: 01/09/2013
Field of study

In OLAP, analysts often select an interesting sample of the data. For example, an analyst might focus on products bringing revenues of at least 100 000 dollars, or on shops having sales greater than 400 000 dollars. However, current systems do not allow the application of both of these thresholds simultaneously, selecting products and shops satisfying both thresholds. For such purposes, we introduce the diamond cube operator, filling a gap among existing data warehouse operations. Because of the interaction between dimensions the computation of diamond cubes is challenging. We compare and test various algorithms on large data sets of more than 100 million facts. We find that while it is possible to implement diamonds in SQL, it is inefficient. Indeed, our custom implementation can be a hundred times faster than popular database engines (including a row-store and a column-store).Comment: 29 page

arXiv.org e-Print Archive

R-libre

Crossref

Entropy in Image Analysis II

Author
Publication venue: 'MDPI AG'
Publication date: 01/05/2021
Field of study

Image analysis is a fundamental task for any application where extracting information from images is required. The analysis requires highly sophisticated numerical and analytical methods, particularly for those applications in medicine, security, and other fields where the results of the processing consist of data of vital importance. This fact is evident from all the articles composing the Special Issue "Entropy in Image Analysis II", in which the authors used widely tested methods to verify their results. In the process of reading the present volume, the reader will appreciate the richness of their methods and applications, in particular for medical imaging and image security, and a remarkable cross-fertilization among the proposed research areas

Directory of Open Access Books (DOAB)

In-memory business intelligence: a Wits context

Author: Dulabh Harshila Ravjee
Publication venue
Publication date: 01/01/2014
Field of study

The organisational demand for real-time, flexible and cheaper approaches to Business Intelligence is impacting the Business Intelligence ecosystem. In-memory databases, in-memory analytics, the availability of 64 bit computing power, as well as the reduced costs of memory, are enabling technologies to meet this demand. This research report examines whether these technologies will have an evolutionary or a revolutionary impact on traditional Business Intelligence implementations. An in-memory analytic solution was developed for University of the Witwatersrand Procurement Office, to evaluate the benefits claimed for the in-memory approach for Business intelligence, in the development, reporting and analysis processes. A survey was used to collect data on the users' experience when using an in-memory solution. The results indicate that the in-memory solution offers a fast, flexible and visually rich user experience. However, there are certain key steps of the traditional BI approach that cannot be omitted. The conclusion reached is that the in-memory approach to Business Intelligence can co-exist with the traditional Business Intelligence approach, so that the merits of both approaches can be leveraged to enhance value for an organisation

Wits Institutional Repository on DSPACE

A Survey on Array Storage, Query Languages, and Systems

Author: Cheng Yu
Rusu Florin
Publication venue
Publication date: 19/02/2013
Field of study

Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is a renewed interest in array data processing in the context of Big Data. To the best of our knowledge, a unified resource that summarizes and analyzes array processing research over its long existence is currently missing. In this survey, we provide a guide for past, present, and future research in array processing. The survey is organized along three main topics. Array storage discusses all the aspects related to array partitioning into chunks. The identification of a reduced set of array operators to form the foundation for an array query language is analyzed across multiple such proposals. Lastly, we survey real systems for array processing. The result is a thorough survey on array data storage and processing that should be consulted by anyone interested in this research topic, independent of experience level. The survey is not complete though. We greatly appreciate pointers towards any work we might have forgotten to mention.Comment: 44 page

arXiv.org e-Print Archive

CiteSeerX

SoK: Chasing Accuracy and Privacy, and Catching Both in Differentially Private Histogram Publication

Author: Nelson Boel
Reuben Jenni
Publication venue
Publication date: 01/01/2020
Field of study

Histograms and synthetic data are of key importance in data analysis. However, researchers have shown that even aggregated data such as histograms, containing no obvious sensitive attributes, can result in privacy leakage. To enable data analysis, a strong notion of privacy is required to avoid risking unintended privacy violations.Such a strong notion of privacy is differential privacy, a statistical notion of privacy that makes privacy leakage quantifiable. The caveat regarding differential privacy is that while it has strong guarantees for privacy, privacy comes at a cost of accuracy. Despite this trade-off being a central and important issue in the adoption of differential privacy, there exists a gap in the literature regarding providing an understanding of the trade-off and how to address it appropriately. Through a systematic literature review (SLR), we investigate the state-of-the-art within accuracy improving differentially private algorithms for histogram and synthetic data publishing. Our contribution is two-fold: 1) we identify trends and connections in the contributions to the field of differential privacy for histograms and synthetic data and 2) we provide an understanding of the privacy/accuracy trade-off challenge by crystallizing different dimensions to accuracy improvement. Accordingly, we position and visualize the ideas in relation to each other and external work, and deconstruct each algorithm to examine the building blocks separately with the aim of pinpointing which dimension of accuracy improvement each technique/approach is targeting. Hence, this systematization of knowledge (SoK) provides an understanding of in which dimensions and how accuracy improvement can be pursued without sacrificing privacy

Chalmers Research