5 research outputs found

    An Iterative Methodology for Defining Big Data Analytics Architectures

    Thanks to the advances achieved in the last decade, the lack of adequate technologies to deal with Big Data characteristics such as Data Volume is no longer an issue. Instead, recent studies highlight that one of the main Big Data problems is the lack of expertise needed to select adequate technologies and build the correct Big Data architecture for the problem at hand. To tackle this problem, we present our methodology for generating Big Data pipelines based on several requirements derived from Big Data features that are critical for selecting the most appropriate tools and techniques. Our approach thus reduces the know-how required to select and build Big Data architectures by providing a step-by-step methodology that guides Big Data architects in creating their Big Data pipelines for the case at hand. Our methodology has been tested in two use cases. This work has been funded by the ECLIPSE project (RTI2018-094283-B-C32) from the Spanish Ministry of Science, Innovation and Universities.
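    The core idea of such a methodology can be illustrated with a small, purely hypothetical sketch: Big Data requirements (e.g. volume, velocity) are mapped to candidate technologies for each pipeline stage. The feature names, levels, and tool catalogue below are illustrative assumptions, not the methodology proposed in the paper.

```python
# Minimal sketch (not the paper's methodology): mapping Big Data requirements
# to candidate pipeline technologies. Feature names, levels, and the tool
# catalogue are hypothetical examples. Assumes Python 3.9+.

from dataclasses import dataclass

@dataclass
class Requirement:
    feature: str  # e.g. "volume", "velocity", "variety"
    level: str    # e.g. "low", "high"

# Hypothetical catalogue: candidate tools per requirement and level.
CANDIDATES = {
    ("velocity", "high"): ["Apache Kafka", "Apache Flink"],
    ("velocity", "low"): ["batch ETL"],
    ("volume", "high"): ["HDFS", "Apache Spark"],
    ("variety", "high"): ["schema-on-read data lake"],
}

def suggest_pipeline(requirements: list[Requirement]) -> dict[str, list[str]]:
    """Return candidate tools per requirement; a real methodology would also
    resolve conflicts between requirements and order the pipeline stages."""
    return {
        f"{r.feature}={r.level}": CANDIDATES.get((r.feature, r.level), ["<needs expert review>"])
        for r in requirements
    }

if __name__ == "__main__":
    reqs = [Requirement("volume", "high"), Requirement("velocity", "high")]
    print(suggest_pipeline(reqs))
```

    In the paper's methodology, decisions of this kind are guided step by step by the architect rather than resolved automatically.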

    A New Big Data Benchmark for OLAP Cube Design Using Data Pre-Aggregation Techniques

    In recent years, several new technologies have enabled OLAP processing over Big Data sources. Among these technologies, we highlight those that allow data pre-aggregation because of their demonstrated performance in data querying. This is the case of Apache Kylin, a Hadoop-based technology that supports sub-second queries over fact tables with billions of rows combined with ultra-high-cardinality dimensions. However, taking advantage of data pre-aggregation techniques to design analytic models for Big Data OLAP is not a trivial task. It requires very advanced knowledge of the underlying technologies and user querying patterns. A poor OLAP cube design significantly alters several key performance metrics, including: (i) the analytic capabilities of the cube (the time and ability to answer a query), (ii) the size of the OLAP cube, and (iii) the time required to build it. Therefore, in this paper we (i) propose a benchmark to help Big Data OLAP designers choose the most suitable cube design for their goals, (ii) identify and describe the main requirements and trade-offs for effectively designing a Big Data OLAP cube that takes advantage of data pre-aggregation techniques, and (iii) validate our benchmark in a case study. This work has been funded by the ECLIPSE project (RTI2018-094283-B-C32) from the Spanish Ministry of Science, Innovation and Universities.
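    To make the trade-off concrete, the following minimal sketch (plain pandas, not Apache Kylin's API) materializes aggregates for every combination of a few dimensions: queries can then be answered from a small cuboid instead of scanning the fact table, at the cost of cube size and build time. The table, column names, and data are hypothetical.

```python
# Minimal sketch of the pre-aggregation idea behind Big Data OLAP engines
# (not Apache Kylin's API): materialize summary tables ("cuboids") for chosen
# dimension combinations so queries hit a small aggregate instead of the fact table.

import itertools
import pandas as pd

# Hypothetical fact table.
fact = pd.DataFrame({
    "country": ["ES", "ES", "US", "US"],
    "product": ["A", "B", "A", "B"],
    "year":    [2021, 2021, 2022, 2022],
    "sales":   [10.0, 20.0, 30.0, 40.0],
})

def build_cuboids(fact: pd.DataFrame, dims: list[str], measure: str) -> dict:
    """Pre-aggregate the measure for every non-empty subset of dimensions.
    More cuboids mean faster queries but a bigger cube and a longer build."""
    cuboids = {}
    for r in range(1, len(dims) + 1):
        for combo in itertools.combinations(dims, r):
            cuboids[combo] = fact.groupby(list(combo), as_index=False)[measure].sum()
    return cuboids

cube = build_cuboids(fact, ["country", "product", "year"], "sales")
# A query grouped by (country, year) is answered from the matching cuboid:
print(cube[("country", "year")])
```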

    An Iterative Method Aligned with Strategic Plans for Modeling, Integrating, and Analyzing Data in Big Data Scenarios

    This doctoral thesis analyzes the characteristics of Big Data sources as well as the existing approaches for processing them and using them in Business Intelligence applications. As the main result of this research, a methodology for the management, analysis, and visualization of Big Data is presented. This methodology is based on the analysis of the requirements of Business Intelligence applications and systematically guides the application of the other techniques presented: (i) a method for generating the design and validating the Big Data architecture, (ii) techniques for the efficient integration of data sources, (iii) the design of optimal data models and performance comparison for Big Data OLAP (On-Line Analytical Processing) systems, and (iv) the design of collaborative Business Intelligence applications. The proposed methodology and methods help to reduce the high failure rate in the adoption of Big Data strategies in organizations. In addition, the benchmarking proposal presented for Big Data OLAP systems is the first known approach for this type of system, enabling their study and comparison. Big Data OLAP systems allow the execution of analytical queries, reports, and dashboards with sub-second response times over data models whose tables contain up to tens of billions of rows.

    Beyond TPC-DS, a benchmark for Big Data OLAP systems (BDOLAP-Bench)

    Online Analytical Processing (OLAP) systems with Big Data support allow storing tables of up to tens of billions of rows or terabytes of data. At the same time, these tools allow the execution of analytical queries with interactive response times, thus making them suitable for implementing Business Intelligence applications. However, since there can be significant differences in query and data loading performance between current Big Data OLAP tools, it is worthwhile to evaluate and compare them using a benchmark. Yet we found that none of the existing approaches is really suitable for this type of system. To address this, in this research we propose a new benchmark specifically designed for Big Data OLAP systems and based on the widely adopted TPC-DS benchmark. To overcome the shortcomings of TPC-DS, we propose (i) a set of transformations to support the implementation of its sales data mart on any current Big Data OLAP system, (ii) a selection of 16 genuine OLAP queries, and (iii) an improved data maintenance performance metric. Moreover, we validated our benchmark by implementing it on four representative systems. This research has been funded by the AETHER-UA project (PID2020-112540RB-C43) of the Spanish Ministry of Science and Innovation and by the BALLADEER project (PROMETEO/2021/088), funded by the Conselleria d’Innovació, Universitats, Ciència i Societat Digital.
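    The sketch below shows, under stated assumptions, the general shape of a benchmark driver for such systems: it times a fixed set of queries against any engine exposed through a simple run_query callable and reports per-query latencies plus their geometric mean. It is not the BDOLAP-Bench implementation, and the placeholder queries merely stand in for the 16 genuine OLAP queries of the paper.

```python
# Minimal benchmark-driver sketch (not BDOLAP-Bench): time each query against
# an engine exposed as a run_query(sql) callable and summarize the results.
# The engine interface and the demo queries are hypothetical.

import statistics
import time
from typing import Callable, Iterable

def benchmark(run_query: Callable[[str], object], queries: Iterable[str]) -> dict:
    """Execute each query once and report response times in seconds."""
    timings = {}
    for i, sql in enumerate(queries, start=1):
        start = time.perf_counter()
        run_query(sql)
        timings[f"Q{i}"] = time.perf_counter() - start
    return {
        "per_query": timings,
        "geomean_s": statistics.geometric_mean(timings.values()),
    }

if __name__ == "__main__":
    # Stand-in engine: a real run would connect to the Big Data OLAP system under test.
    def fake_engine(sql: str):
        time.sleep(0.01)  # simulate sub-second query latency

    demo_queries = ["SELECT 1", "SELECT 2"]  # placeholders, not the paper's 16 queries
    print(benchmark(fake_engine, demo_queries))
```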

    A Novel Multidimensional Approach to Integrate Big Data in Business Intelligence

    The huge amount of information available and its heterogeneity have surpassed the capacity of current data management technologies. Dealing with huge amounts of structured and unstructured data, often referred to as Big Data, is a hot research topic and a technological challenge. In this paper, the authors present an approach aimed at enabling OLAP queries over different, heterogeneous data sources. Their approach is based on the MapReduce paradigm and integrates different formats into the recent RDF Data Cube format. The benefit of their approach is that it can query different sources of information while maintaining, at the same time, an integrated, comprehensive view of the available data. The paper discusses the advantages and disadvantages, as well as the implementation challenges, that such an approach presents. Furthermore, the approach is evaluated in detail by means of a case study. This work has been funded by the Spanish Ministry of Economy and Competitiveness under the project Grant GEODAS-BI (TIN2012-37493-C03-03).
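    A minimal sketch of the integration idea, assuming a MapReduce-style map/reduce pair: heterogeneous source records are mapped to dimension/measure pairs and reduced into observations resembling the RDF Data Cube vocabulary (qb:). Record shapes, dimension names, and prefixes are hypothetical and not taken from the paper.

```python
# Minimal sketch (not the authors' implementation): a map/reduce pair that
# turns heterogeneous source records into aggregated observations shaped like
# RDF Data Cube (qb:) triples. All names and prefixes are hypothetical.

from collections import defaultdict

def map_record(record: dict):
    """Map one source record to ((dimension values), measure) pairs."""
    key = (record.get("country", "unknown"), record.get("year", "unknown"))
    yield key, float(record.get("sales", 0))

def reduce_observations(pairs):
    """Aggregate measures per dimension key and emit qb:Observation-like triples."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    for i, ((country, year), total) in enumerate(sorted(totals.items())):
        obs = f"ex:obs{i}"
        yield (obs, "rdf:type", "qb:Observation")
        yield (obs, "ex:refCountry", country)
        yield (obs, "ex:refYear", str(year))
        yield (obs, "ex:sales", str(total))

if __name__ == "__main__":
    sources = [
        {"country": "ES", "year": 2014, "sales": "10.5"},  # e.g. parsed from CSV
        {"country": "ES", "year": 2014, "sales": "4.5"},   # e.g. parsed from JSON
    ]
    pairs = (pair for rec in sources for pair in map_record(rec))
    for triple in reduce_observations(pairs):
        print(triple)
```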