
    Implementing Multidimensional Data Warehouses into NoSQL

    Not Only SQL (NoSQL) databases are becoming increasingly popular and offer interesting strengths such as scalability and flexibility. In this paper, we investigate the use of NoSQL systems for implementing OLAP (On-Line Analytical Processing) systems. More precisely, we are interested in instantiating OLAP systems (from the conceptual level to the logical level) and in instantiating an aggregation lattice (optimization). We define a set of rules to map star schemas into two NoSQL models: column-oriented and document-oriented. The experimental part is carried out using the TPC reference benchmark. Our experiments show that our rules can effectively instantiate such systems (star schema and lattice). We also analyze the differences between the two NoSQL systems considered. In our experiments, HBase (column-oriented) turns out to be faster than MongoDB (document-oriented) in terms of loading time.
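
    The two mappings can be pictured concretely. Below is a minimal, hypothetical sketch (plain Python, no database required) of how a single star-schema fact and its dimensions might be laid out for a column-oriented store (row key plus column families, HBase-style) and for a document-oriented store (one nested document, MongoDB-style); the schema names (`LineOrder`-like fact, `Customer`, `Date`) are illustrative, not the paper's.

```python
# Hypothetical star schema: one fact row with two dimensions.
fact = {"customer_id": "C42", "date_id": "D20240101", "quantity": 17, "revenue": 250.0}
dimensions = {
    "Customer": {"C42": {"name": "Jones", "city": "Toulouse"}},
    "Date": {"D20240101": {"year": 2024, "month": 1}},
}

def to_column_oriented(fact, dimensions):
    """HBase-style layout: one row key, one column family per dimension,
    plus one family for the measures."""
    row_key = fact["customer_id"] + "|" + fact["date_id"]
    cells = {"Measures:quantity": fact["quantity"], "Measures:revenue": fact["revenue"]}
    for dim_name, table in dimensions.items():
        key = fact[dim_name.lower() + "_id"]
        for attr, value in table[key].items():
            cells[f"{dim_name}:{attr}"] = value   # family:qualifier cell
    return row_key, cells

def to_document_oriented(fact, dimensions):
    """MongoDB-style layout: one document per fact, dimensions embedded
    as sub-documents."""
    doc = {"quantity": fact["quantity"], "revenue": fact["revenue"]}
    for dim_name, table in dimensions.items():
        doc[dim_name] = table[fact[dim_name.lower() + "_id"]]
    return doc

print(to_column_oriented(fact, dimensions))
print(to_document_oriented(fact, dimensions))
```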

    Benchmarking Big Data OLAP NoSQL Databases

    With the advent of Big Data, new challenges have emerged regarding the evaluation of decision support systems (DSS). Existing evaluation benchmarks are not configured to handle massive data volume and wide data diversity. In this paper, we introduce a new DSS benchmark that supports multiple data storage systems, such as relational and Not Only SQL (NoSQL) systems. Our scheme recognizes numerous data models (snowflake, star and flat topologies) and several data formats (CSV, JSON, TBL, XML, etc.). It entails complex data generation characterized within the “volume, variety, and velocity” (3V) framework. Moreover, our scheme enables distributed and parallel data generation. Finally, we exhibit some experimental results obtained with KoalaBench.
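
    The abstract does not detail KoalaBench's generator; purely as an illustration of distributed, multi-format generation, the hedged sketch below fans record creation out over worker processes and serializes each chunk as either CSV or JSON. All names here (`make_record`, the field list) are invented for the example.

```python
import csv, io, json
from multiprocessing import Pool

def make_record(i):
    # Invented flat-topology record; a real benchmark would follow its schema.
    return {"id": i, "product": f"P{i % 100}", "quantity": i % 10, "price": 9.99}

def generate_chunk(args):
    """Produce one chunk of records in the requested format."""
    start, count, fmt = args
    records = [make_record(i) for i in range(start, start + count)]
    if fmt == "json":
        return "\n".join(json.dumps(r) for r in records)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=records[0].keys())
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

if __name__ == "__main__":
    # Four workers, each producing an independent chunk (parallel generation).
    tasks = [(0, 3, "csv"), (3, 3, "csv"), (6, 3, "json"), (9, 3, "json")]
    with Pool(4) as pool:
        for chunk in pool.map(generate_chunk, tasks):
            print(chunk)
```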

    Entrepôts de données multidimensionnelles NoSQL

    On-Line Analytical Processing (OLAP) data are traditionally managed by relational databases. Unfortunately, managing Big Data (very large volumes of data) becomes difficult in that setting. In such a context, "Not-Only SQL" (NoSQL) environments can, as an alternative, provide scalability while keeping some flexibility for an OLAP system. We therefore define rules to convert a star schema, as well as its optimization, the lattice of pre-computed aggregates, into two NoSQL logical models: column-oriented or document-oriented. Using these rules, we implement and analyze two decision support systems, one per model, with MongoDB and HBase. We compare them on data loading (with data generated using the TPC-DS benchmark), lattice computation and querying.
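
    The lattice of pre-computed aggregates mentioned here is the set of all group-bys over subsets of the dimensions. As a hedged illustration (plain Python, toy data, invented column names), the sketch below enumerates every cuboid of a two-dimension schema and aggregates a measure for each:

```python
from itertools import combinations

# Toy fact table: two dimensions (city, year) and one measure (revenue).
facts = [
    {"city": "Lyon", "year": 2023, "revenue": 10.0},
    {"city": "Lyon", "year": 2024, "revenue": 20.0},
    {"city": "Paris", "year": 2024, "revenue": 5.0},
]
dims = ("city", "year")

# Each subset of the dimensions is one cuboid of the aggregate lattice.
for r in range(len(dims), -1, -1):
    for group in combinations(dims, r):
        cuboid = {}
        for f in facts:
            key = tuple(f[d] for d in group)  # () is the apex (grand total)
            cuboid[key] = cuboid.get(key, 0.0) + f["revenue"]
        print(group or "(apex)", cuboid)
```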

    Implementation of Multidimensional Databases with Document-Oriented NoSQL

    NoSQL (Not Only SQL) systems are becoming popular due to known advantages such as horizontal scalability and elasticity. In this paper, we study the implementation of data warehouses with document-oriented NoSQL systems. We propose mapping rules that transform the multidimensional data model into logical document-oriented models. We consider three different logical models and use them to instantiate data warehouses. We focus on data loading, model-to-model conversion and OLAP cuboid computation.
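
    The abstract does not spell out the three logical models; a common trio in this line of work, shown below as a hedged sketch with an invented sales schema, is a flat document (all attributes at one level), an embedded document (one sub-document per dimension), and a split layout (facts and dimensions in separate collections linked by identifiers).

```python
# One fact about sales, modeled three ways (illustrative field names).

flat = {  # Model 1: everything at the top level of a single document
    "quantity": 17, "revenue": 250.0,
    "customer_name": "Jones", "customer_city": "Toulouse",
    "date_year": 2024, "date_month": 1,
}

embedded = {  # Model 2: one nested sub-document per dimension
    "measures": {"quantity": 17, "revenue": 250.0},
    "customer": {"name": "Jones", "city": "Toulouse"},
    "date": {"year": 2024, "month": 1},
}

split = {  # Model 3: separate "collections" referencing each other by id
    "facts": [{"quantity": 17, "revenue": 250.0,
               "customer_id": "C42", "date_id": "D20240101"}],
    "customers": [{"_id": "C42", "name": "Jones", "city": "Toulouse"}],
    "dates": [{"_id": "D20240101", "year": 2024, "month": 1}],
}
```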

    Implementation of multidimensional databases in column-oriented NoSQL systems

    NoSQL (Not Only SQL) systems are becoming popular due to known advantages such as horizontal scalability and elasticity. In this paper, we study the implementation of multidimensional data warehouses with column-oriented NoSQL systems. We define mapping rules that transform the conceptual multidimensional data model into logical column-oriented models. We consider three different logical models and use them to instantiate data warehouses. We focus on data loading, model-to-model conversion and OLAP cuboid computation.
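
    Again the three column-oriented models are not named in the abstract; as a hedged counterpart to the document-oriented sketch above, the obvious design axis in a column-family store is how attributes are spread over families and tables. The fragment below (illustrative names, no HBase client needed) expresses the same fact as cells under a single family, under one family per dimension, and in a normalized layout with separate dimension tables.

```python
row_key = "C42|D20240101"  # invented composite key: customer id | date id

one_family = {  # all attributes under a single column family "cf"
    "cf:quantity": 17, "cf:revenue": 250.0,
    "cf:customer_name": "Jones", "cf:customer_city": "Toulouse",
    "cf:date_year": 2024, "cf:date_month": 1,
}

family_per_dimension = {  # one family per dimension, one for the measures
    "Measures:quantity": 17, "Measures:revenue": 250.0,
    "Customer:name": "Jones", "Customer:city": "Toulouse",
    "Date:year": 2024, "Date:month": 1,
}

normalized = {  # dimensions kept in their own tables, fact stores only keys
    "fact": {"Measures:quantity": 17, "Measures:revenue": 250.0,
             "Keys:customer_id": "C42", "Keys:date_id": "D20240101"},
    "customer_table": {"C42": {"Info:name": "Jones", "Info:city": "Toulouse"}},
    "date_table": {"D20240101": {"Info:year": 2024, "Info:month": 1}},
}
```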

    In-Memory Databases

    This bachelor thesis deals with in-memory databases and the concepts developed to build such systems: data in these databases are kept in main memory, which can process them several times faster but is at the same time a volatile storage medium. To ground these concepts, the thesis summarizes the development of database systems from their beginnings to the present. The first database types were hierarchical and network databases, which were replaced already in the 1970s by the first relational databases, whose development continues to this day; they are currently represented mainly by OLTP and OLAP systems. Object, object-relational and NoSQL databases are also covered, together with the spread of Big Data and the options for processing it. To explain how data is kept in main memory, the memory hierarchy is introduced, from processor registers through caches and main memory down to hard disks, with information on the latency and volatility of each storage medium. The thesis then discusses ways of organizing data in memory, explaining the row and column data layouts and how each can be exploited for maximum data-processing performance; this section also covers compression techniques that use main-memory space as economically as possible. The following section presents techniques that keep changes in these databases persistent even though the database runs on a volatile medium. Alongside traditional durability techniques, the concept of a differential buffer is introduced, into which all changes are written, and the process of merging data from this buffer with data in the main store is described. The next section surveys existing in-memory databases such as SAP HANA and TimesTen from Oracle, as well as hybrid systems that work primarily on disk but can also operate in memory, one example being SQLite; it compares the individual systems, assesses how far they use the concepts introduced in the previous chapters, and closes with a table summarizing these systems. The remaining parts of the thesis concern performance testing of these databases. First, the test data, originating from the DBLP database, are described, along with how they were obtained and transformed into a form usable for testing. The testing methodology is then described in two parts: the first compares the performance of a disk-based database with an in-memory database, using SQLite and its option to run the database in memory; the second compares the performance of the row and column data layouts in an in-memory database, using SAP HANA, which can store data in both layouts. The thesis concludes with an analysis of the results obtained from these benchmarks.
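
    The first benchmark in the thesis relies on SQLite's ability to run either on disk or entirely in memory. That switch is a one-line change, as this minimal stdlib sketch shows (table layout and row counts are invented for the example):

```python
import sqlite3, time

def time_inserts(target, rows=100_000):
    """Create a table in the given SQLite target and time bulk inserts."""
    conn = sqlite3.connect(target)
    conn.execute("DROP TABLE IF EXISTS t")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
    start = time.perf_counter()
    conn.executemany("INSERT INTO t (payload) VALUES (?)",
                     (("x" * 32,) for _ in range(rows)))
    conn.commit()
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed

print("on disk:  ", time_inserts("bench.db"))
print("in memory:", time_inserts(":memory:"))  # volatile: lost on close
```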

    Modelação ágil para sistemas de Big Data Warehousing

    Information Systems, with the popularization of the Big Data concept, have begun to consider aspects relating to infrastructures capable of handling the collection, storage, processing and analysis of vast amounts of heterogeneous data with little or no structure, generated at ever increasing speeds. These have been the challenges inherent to the transition from data modelling in traditional Data Warehouses to Big Data environments. The state of the art reflects that the scientific field of Big Data Warehousing is recent and ambiguous, and that it shows gaps regarding approaches to the design and implementation of these systems; thus, in recent years, several authors, motivated by the absence of scientific and technical work, have developed studies in the area to explore suitable models (representation of logical and technological components, data flows and data structures), methods and instantiations (demonstration cases using prototypes and benchmarks). This dissertation builds on the general proposal of design patterns for Big Data Warehousing systems (M. Y. Santos & Costa, 2019) and proposes a method aimed at semi-automating that design proposal, consisting of seven computational rules that are presented, demonstrated and validated with examples based on real contexts. To present the agile modelling process, a flowchart was created for each rule, making it possible to show every step. Comparing the results obtained after applying the method with those of a fully manual modelling, the proposed work delivers a correct but general model that works as a suggested Big Data Warehouse design for the user, who should then validate and adjust the result taking into account the context of the case under analysis, the queries to be used and the characteristics of the data.
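
    The seven rules themselves are not given in the abstract. Purely to suggest their flavor, the hedged sketch below applies one plausible such step, collapsing a star-schema description into a single denormalized, wide-table description of the kind used in Big Data Warehouse design; the input format and the rule are invented for illustration, not taken from the dissertation.

```python
def denormalize(star):
    """One illustrative modelling rule: flatten a star-schema description
    into a single wide table, prefixing attributes with their dimension."""
    columns = [f"{star['fact']}_{m}" for m in star["measures"]]
    for dim, attrs in star["dimensions"].items():
        columns += [f"{dim}_{a}" for a in attrs]
    return {"table": star["fact"], "columns": columns}

star = {  # invented conceptual model description
    "fact": "sales",
    "measures": ["quantity", "revenue"],
    "dimensions": {"customer": ["name", "city"], "date": ["year", "month"]},
}
print(denormalize(star))
# {'table': 'sales', 'columns': ['sales_quantity', 'sales_revenue',
#  'customer_name', 'customer_city', 'date_year', 'date_month']}
```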

    Implantation Not Only SQL des bases de données multidimensionnelles

    NoSQL (Not Only SQL) systems are developing notably thanks to their ability to easily manage large volumes of data and their flexibility regarding data types. In this article, we study the implementation of a multidimensional data warehouse with a document-oriented NoSQL system. We propose transformation rules for moving from a conceptual multidimensional model to a document-oriented NoSQL logical model. We propose three types of transformation for implementing multidimensional data warehouses. We experiment with these three approaches using MongoDB, studying data loading, the processes for converting one implementation type into another, and the pre-computation of aggregates inherent to multidimensional data warehouses.

    Etiquetagem e rastreio de fontes de dados num Big Data Warehouse

    Advances in Information Technologies lead organizations to search for business value and competitive advantage through collecting, storing, processing and analyzing data. Data Warehouses appear as a fundamental piece in data storage, facilitating data analysis from different perspectives and allowing the extraction of information that can be used in decision making. The high availability of new data sources and the advances that have been made for their collection and storage lead to the production of an enormous amount of heterogeneous data generated at increasing rates. Adjacent to this fact, the concept of Big Data appeared, associated with the volume, velocity and variety of data, that is, large volumes of data with different degrees of complexity, often without structure or organization, characteristics that make it impossible to use traditional tools. Thus, the need arises to adopt the Big Data Warehouse context, which naturally brings other challenges, because it implies the adoption of new technologies, as well as of new logical models that allow greater flexibility in the management of unstructured and denormalized data. Therefore, when the volume of data and its heterogeneity start to increase, since the data derive from several sources with very different characteristics, new challenges associated with Big Data emerge, namely Data Governance. The Data Governance domain covers a group of subdomains, such as Data Quality and Metadata Management, which provide a set of processes to support the high complexity inherent in the data. As the volume of data in a Big Data Warehouse starts to increase, the business processes also increase, so it becomes necessary to have additional information about these data, for example, which tables and attributes were stored, when and by whom they were created, and the several updates they underwent. The aim of this dissertation is to propose a system for the governance of a Big Data Warehouse, in order to make its content known, as well as how it is evolving over time. To this end, a graph-based data cataloging system for the Big Data Warehouse is proposed, built on the tagging and lineage tracking of data sources, with the collected metadata stored in a database. In addition to gathering the basic characteristics of the data, it records information about access policies, profiling, similarity, key performance indicators and business processes.
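
    A data catalog of this kind is naturally a labelled graph: sources, tables and attributes as nodes, with lineage and containment as edges. The hedged sketch below uses `networkx` to record one invented lineage chain; the node names, tags and relations are illustrative, not the dissertation's metamodel.

```python
import networkx as nx

catalog = nx.DiGraph()

# Nodes: a raw source, a warehouse table, and one of its attributes.
catalog.add_node("src:crm_export", kind="source", format="CSV")
catalog.add_node("tbl:customers", kind="table", created_by="etl_user")
catalog.add_node("attr:customers.city", kind="attribute", tags=["pii:low"])

# Edges: lineage (source feeds table) and containment (table owns attribute).
catalog.add_edge("src:crm_export", "tbl:customers", relation="feeds")
catalog.add_edge("tbl:customers", "attr:customers.city", relation="contains")

# Trace where an attribute ultimately comes from.
upstream = nx.ancestors(catalog, "attr:customers.city")
print(upstream)  # {'src:crm_export', 'tbl:customers'}
```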

    A survey of advanced encryption for database security: primitives, schemes, and attacks

    The use of traditional encryption techniques in Database Management Systems is limited, as encrypting data within the database can prevent basic functionality such as ordering and searching. Advanced encryption techniques and trusted hardware, however, can enable standard functionality to be achieved on encrypted databases, and a number of such schemes have been proposed in the recent literature. In this survey, different approaches to database security through software and hardware components are explored and compared based on performance and security, and relevant attacks are discussed.
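
    One of the simplest schemes in this space, and a useful concrete anchor, is equality search over encrypted values via a keyed "blind index": ciphertexts are stored alongside an HMAC of the plaintext, so the server can match on the HMAC without ever seeing the value. Below is a minimal sketch using the `cryptography` package (the table layout and keys are invented for the example):

```python
import hmac, hashlib
from cryptography.fernet import Fernet

enc_key = Fernet.generate_key()                   # protects the stored values
index_key = b"a-separate-32-byte-secret-key!!!"   # keys the blind index
f = Fernet(enc_key)

def blind_index(value: bytes) -> str:
    # Deterministic keyed digest: equal plaintexts -> equal index values.
    return hmac.new(index_key, value, hashlib.sha256).hexdigest()

# "Insert": store (blind index, ciphertext) pairs; plaintext is never stored.
rows = [(blind_index(name), f.encrypt(name))
        for name in (b"alice", b"bob", b"alice")]

# "SELECT ... WHERE name = 'alice'": match on the index, then decrypt.
needle = blind_index(b"alice")
matches = [f.decrypt(ct) for idx, ct in rows if idx == needle]
print(matches)  # [b'alice', b'alice']
```

    The trade-off, of the kind such surveys weigh, is that a deterministic index leaks equality patterns to the server even though the values themselves stay encrypted.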