661 research outputs found
A Framework for Developing Real-Time OLAP algorithm using Multi-core processing and GPU: Heterogeneous Computing
The overwhelmingly increasing amount of stored data has spurred researchers
seeking different methods in order to optimally take advantage of it which
mostly have faced a response time problem as a result of this enormous size of
data. Most of solutions have suggested materialization as a favourite solution.
However, such a solution cannot attain Real- Time answers anyhow. In this paper
we propose a framework illustrating the barriers and suggested solutions in the
way of achieving Real-Time OLAP answers that are significantly used in decision
support systems and data warehouses
Collaborative OLAP with Tag Clouds: Web 2.0 OLAP Formalism and Experimental Evaluation
Increasingly, business projects are ephemeral. New Business Intelligence
tools must support ad-lib data sources and quick perusal. Meanwhile, tag clouds
are a popular community-driven visualization technique. Hence, we investigate
tag-cloud views with support for OLAP operations such as roll-ups, slices,
dices, clustering, and drill-downs. As a case study, we implemented an
application where users can upload data and immediately navigate through its ad
hoc dimensions. To support social networking, views can be easily shared and
embedded in other Web sites. Algorithmically, our tag-cloud views are
approximate range top-k queries over spontaneous data cubes. We present
experimental evidence that iceberg cuboids provide adequate online
approximations. We benchmark several browser-oblivious tag-cloud layout
optimizations.Comment: Software at https://github.com/lemire/OLAPTagClou
Clustering-Based Materialized View Selection in Data Warehouses
Materialized view selection is a non-trivial task. Hence, its complexity must
be reduced. A judicious choice of views must be cost-driven and influenced by
the workload experienced by the system. In this paper, we propose a framework
for materialized view selection that exploits a data mining technique
(clustering), in order to determine clusters of similar queries. We also
propose a view merging algorithm that builds a set of candidate views, as well
as a greedy process for selecting a set of views to materialize. This selection
is based on cost models that evaluate the cost of accessing data using views
and the cost of storing these views. To validate our strategy, we executed a
workload of decision-support queries on a test data warehouse, with and without
using our strategy. Our experimental results demonstrate its efficiency, even
when storage space is limited
Dimensional enrichment of statistical linked open data
On-Line Analytical Processing (OLAP) is a data analysis technique typically used for local and well-prepared data. However, initiatives like Open Data and Open Government bring new and publicly available data on the web that are to be analyzed in the same way. The use of semantic web technologies for this context is especially encouraged by the Linked Data initiative. There is already a considerable amount of statistical linked open data sets published using the RDF Data Cube Vocabulary (QB) which is designed for these purposes. However, QB lacks some essential schema constructs (e.g., dimension levels) to support OLAP. Thus, the QB4OLAP vocabulary has been proposed to extend QB with the necessary constructs and be fully compliant with OLAP. In this paper, we focus on the enrichment of an existing QB data set with QB4OLAP semantics. We first thoroughly compare the two vocabularies and outline the benefits of QB4OLAP. Then, we propose a series of steps to automate the enrichment of QB data sets with specific QB4OLAP semantics; being the most important, the definition of aggregate functions and the detection of new concepts in the dimension hierarchy construction. The proposed steps are defined to form a semi-automatic enrichment method, which is implemented in a tool that enables the enrichment in an interactive and iterative fashion. The user can enrich the QB data set with QB4OLAP concepts (e.g., full-fledged dimension hierarchies) by choosing among the candidate concepts automatically discovered with the steps proposed. Finally, we conduct experiments with 25 users and use three real-world QB data sets to evaluate our approach. The evaluation demonstrates the feasibility of our approach and shows that, in practice, our tool facilitates, speeds up, and guarantees the correct results of the enrichment process.Peer ReviewedPostprint (author's final draft
Data Cube Approximation and Mining using Probabilistic Modeling
On-line Analytical Processing (OLAP) techniques commonly used in data warehouses allow the exploration of data cubes according to different analysis axes (dimensions) and under different abstraction levels in a dimension hierarchy. However, such techniques are not aimed at mining multidimensional data.
Since data cubes are nothing but multi-way tables, we propose to analyze the potential of two probabilistic modeling techniques, namely non-negative multi-way array factorization and log-linear modeling, with the ultimate objective of compressing and mining aggregate and multidimensional values. With the first technique, we compute the set of components that best fit the initial data set and whose superposition coincides with the original data; with the second technique we identify a parsimonious model (i.e., one with a reduced set of parameters), highlight strong associations among dimensions and discover possible outliers in data cells. A real life example will be
used to (i) discuss the potential benefits of the modeling output on cube exploration and mining, (ii) show how OLAP queries can be answered in an approximate way, and (iii) illustrate the strengths and limitations of these modeling approaches
An O(1) Solution to the Prefix Sum Problem on a Specialized Memory Architecture
In this paper we study the Prefix Sum problem introduced by Fredman.
We show that it is possible to perform both update and retrieval in O(1) time
simultaneously under a memory model in which individual bits may be shared by
several words.
We also show that two variants (generalizations) of the problem can be solved
optimally in time under the comparison based model of
computation.Comment: 12 page
Integrating data warehouses with web data : a survey
This paper surveys the most relevant research on combining Data Warehouse (DW) and Web data. It studies the XML
technologies that are currently being used to integrate, store, query, and retrieve Web data and their application to DWs. The paper
reviews different DW distributed architectures and the use of XML languages as an integration tool in these systems. It also introduces
the problem of dealing with semistructured data in a DW. It studies Web data repositories, the design of multidimensional databases for
XML data sources, and the XML extensions of OnLine Analytical Processing techniques. The paper addresses the application of
information retrieval technology in a DW to exploit text-rich document collections. The authors hope that the paper will help to discover
the main limitations and opportunities that offer the combination of the DW and the Web fields, as well as to identify open research
line
- …