508 research outputs found
QB2OLAP : enabling OLAP on statistical linked open data
Publication and sharing of multidimensional (MD) data on the Semantic Web (SW) opens new opportunities for the use of On-Line Analytical Processing (OLAP). The RDF Data Cube (QB) vocabulary, the current standard for statistical data publishing, however, lacks key MD concepts such as dimension hierarchies and aggregate functions. QB4OLAP was proposed to remedy this. However, QB4OLAP requires extensive manual annotation and users must still write queries in SPARQL, the standard query language for RDF, which typical OLAP users are not familiar with. In this demo, we present QB2OLAP, a tool for enabling OLAP on existing QB data. Without requiring any RDF, QB(4OLAP), or SPARQL skills, it allows semi-automatic transformation of a QB data set into a QB4OLAP one via enrichment with QB4OLAP semantics, exploration of the enriched schema, and querying with the high-level OLAP language QL that exploits the QB4OLAP semantics and is automatically translated to SPARQL.
Peer reviewed. Postprint (author's final draft)
Using Ontologies for the Design of Data Warehouses
Obtaining an implementation of a data warehouse is a complex task that forces
designers to acquire wide knowledge of the domain, thus requiring a high level
of expertise and making it a failure-prone task. Based on our experience, we
have identified a set of situations encountered in real-world projects
in which we believe that the use of ontologies will improve several aspects of
the design of data warehouses. The aim of this article is to describe several
shortcomings of current data warehouse design approaches and discuss the
benefit of using ontologies to overcome them. This work is a starting point for
discussing the convenience of using ontologies in data warehouse design.
Comment: 15 pages, 2 figures
Integrating data warehouses with web data : a survey
This paper surveys the most relevant research on combining Data Warehouse (DW) and Web data. It studies the XML
technologies that are currently being used to integrate, store, query, and retrieve Web data and their application to DWs. The paper
reviews different DW distributed architectures and the use of XML languages as an integration tool in these systems. It also introduces
the problem of dealing with semistructured data in a DW. It studies Web data repositories, the design of multidimensional databases for
XML data sources, and the XML extensions of OnLine Analytical Processing techniques. The paper addresses the application of
information retrieval technology in a DW to exploit text-rich document collections. The authors hope that the paper will help to discover
the main limitations and opportunities that the combination of the DW and Web fields offers, as well as to identify open research
lines
QB4OLAP : Enabling business intelligence over semantic web data
First-place prize awarded by the Academia Nacional de Ingeniería.
The World-Wide Web was initially conceived as a repository of information tailored for human consumption. In the last decade, the idea of transforming the web into a machine-understandable web of data has gained momentum. To this end, the World Wide Web Consortium (W3C) maintains a set of standards, referred to as the Semantic Web (SW), which allow data and metadata to be shared openly. Among these are the Resource Description Framework (RDF), which represents data as graphs; RDF-S and OWL, which describe the data structure via ontologies or vocabularies; and SPARQL, the RDF query language. On top of the RDF data model, standards and recommendations can be built to represent data that adheres to other models. The multidimensional (MD) model views data in an n-dimensional space, usually called a data cube, composed of dimensions and facts. The former reflect the perspectives from which data are viewed, and the latter correspond to points in this space, associated with (usually) quantitative data (also known as measures). Facts can be aggregated, disaggregated, and filtered using the dimensions. This process is called Online Analytical Processing (OLAP). Although the RDF Data Cube Vocabulary (QB) is the W3C standard for representing statistical data, which resembles MD data, it does not include key features needed for OLAP analysis, such as dimension hierarchies, dimension level attributes, and aggregate functions. To enable this kind of analysis over SW data cubes, in this thesis we propose the QB4OLAP vocabulary, an extension of QB. A problem remains, however: writing efficient analytical queries over SW data cubes requires a deep knowledge of RDF and SPARQL, which typical OLAP users are unlikely to have. We address this problem in this thesis. Our approach is based on allowing analytical users to write queries using what they know best: OLAP operations over data cubes, without dealing with SW technicalities. For this, we devised CQL, a simple, high-level query language over data cubes. We then make use of the structural metadata provided by QB4OLAP to translate CQL queries into SPARQL ones. We adapt general-purpose SPARQL query optimization techniques and propose query improvement strategies to produce efficient SPARQL queries. We evaluate our implementation by tailoring the well-known Star Schema benchmark, which allows us to compare our proposal against existing ones in a fair way. We show that our approach outperforms the others.
Finally, as another result, our experiments allow us to study which combinations of improvement strategies best fit an analytical scenario.
The World-Wide Web was conceived as a repository of information to be processed and consumed by humans. In the last decade, however, the idea of transforming the Web into a large database of machine-processable data has gained momentum. To this end, the World Wide Web Consortium (W3C) has established a series of standards, also known as Semantic Web (SW) standards, which allow data and metadata to be shared in open formats. Notable among these standards are the Resource Description Framework (RDF), a graph-based data model for representing data and the relationships among them; RDF-S and OWL, which allow the structure and meaning of data to be described by means of ontologies or vocabularies; and the SPARQL query language. These standards can be used to build representations of other data models, for example tabular or relational data. The multidimensional (MD) data model represents data in an n-dimensional space, usually called a data cube, composed of dimensions and facts. The former reflect the perspectives from which the data are to be analyzed, while the latter correspond to points in this n-dimensional space, with which usually numeric values, known as measures, are associated. Facts can be aggregated and summarized, disaggregated, and filtered using the dimensions. This process is known as Online Analytical Processing (OLAP).
Although the W3C has established a standard that can be used to publish multidimensional data, known as the RDF Data Cube Vocabulary (QB), it does not include some aspects of the MD model that are essential for OLAP-style analysis, such as dimension hierarchies, attributes on dimension levels, and aggregate functions for summarizing measure values. To enable this kind of analysis over cubes on the SW, this thesis proposes a vocabulary, called QB4OLAP, that extends the QB vocabulary. However, performing OLAP-style analysis efficiently over QB4OLAP cubes requires a deep knowledge of RDF and SPARQL, which are far from popular among typical OLAP users. This thesis also addresses this problem. Our approach consists of offering OLAP users a set of classic operations and then automatically translating these operations into SPARQL queries. We begin by defining a high-level query language for cubes, the Cube Query Language (CQL), and then exploit the metadata represented with QB4OLAP to perform the translation to SPARQL. We also improve the performance of the resulting queries by adapting and applying existing SPARQL query optimization techniques. To evaluate our proposal, we adapt to SW standards the Star Schema benchmark, which is the standard for evaluating OLAP-style systems. This allows us to compare our approach with other existing proposals, as well as to assess the impact of our SPARQL query improvement strategies. From this comparison we conclude that our approach outperforms other existing proposals and that our improvement techniques increase system performance tenfold
On-line analytical processing
On-line analytical processing (OLAP) describes an approach to decision support that aims to extract knowledge from a data warehouse or, more specifically, from data marts. Its main idea is to provide non-expert users with navigation through data, so that they can interactively generate ad hoc queries without the intervention of IT professionals. The name was introduced in contrast to on-line transactional processing (OLTP), to reflect the differing requirements and characteristics of these two classes of use. The concept falls in the area of business intelligence.
Peer reviewed. Postprint (author's final draft)
Business Intelligence for Small and Middle-Sized Entreprises
Data warehouses are the core of decision support systems, which nowadays
are used by all kinds of enterprises in the entire world. Although many
studies have been conducted on the need of decision support systems (DSSs) for
small businesses, most of them adopt existing solutions and approaches, which
are appropriate for large-scale enterprises, but are inadequate for small and
middle-sized enterprises. Small enterprises require cheap, lightweight
architectures and tools (hardware and software) providing on-line data
analysis. In order to ensure these features, we review web-based business
intelligence approaches. For real-time analysis, the traditional OLAP
architecture is cumbersome and storage-costly; therefore, we also review
in-memory processing. Consequently, this paper discusses the existing
approaches and tools working in main memory and/or with web interfaces (including
freeware tools), relevant for small and middle-sized enterprises in decision
making