245 research outputs found

    Growth of relational model: Interdependence and complementary to big data

    A database management system is a core application of computer science that provides a platform for the creation, movement, and use of voluminous data. The area has witnessed a series of developments and technological advancements, from the conventional structured database to the recent buzzword, big data. This paper aims to provide a complete model of the relational database, which is still widely used because of its well-known ACID properties: atomicity, consistency, isolation, and durability. Specifically, the objective of this paper is to highlight the adoption of relational-model approaches by big data techniques. Towards explaining the reasons for this incorporation, this paper qualitatively studies the advancements made over time in the relational data model. First, the variations in data storage layout are illustrated based on the needs of the application. Second, quick data-retrieval techniques such as indexing, query processing, and concurrency control methods are examined. The paper provides vital insights for appraising the efficiency of the structured database in the unstructured environment, particularly when both consistency and scalability become an issue in the working of a hybrid transactional and analytical database management system.
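    The ACID guarantees named above can be made concrete in a few lines. The sketch below (a minimal illustration using Python's built-in sqlite3 module; the accounts schema and transfer logic are my own invention, not from the paper) shows atomicity and consistency: a transfer either commits both updates or rolls both back.

```python
import sqlite3

# Hypothetical two-account transfer illustrating the "A" and "C" in ACID:
# either both UPDATEs commit together, or neither does.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; roll back on any failure."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            # Consistency check: disallow overdrafts.
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False

assert transfer(conn, "alice", "bob", 30) is True    # succeeds
assert transfer(conn, "alice", "bob", 500) is False  # rolled back, no change
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

    The failed second transfer leaves both balances exactly where the first commit put them, which is the atomicity the paper credits for the relational model's longevity.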

    TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark

    The TPC-D benchmark was developed almost 20 years ago, and even though its current incarnation as TPC-H could be considered superseded by TPC-DS, one can still learn from it. We focus on the technical level, summarizing the challenges posed by the TPC-H workload as we now understand them, which w…

    SAP HANA Platform

    This thesis discusses the in-memory database SAP HANA. It describes in detail the architecture and the new technologies this database employs. The next part compares the speed of inserting and selecting records against the existing relational database MaxDB. For the purposes of this testing I created a simple application in the ABAP language, which allows the user to run the tests and display their results. These are summarized in the last chapter and show SAP HANA as clearly faster when selecting data, but comparable, or slower, when inserting data into the database. I see the contribution of my work in the summary of the significant changes that data stored in main memory brings, and in a clear comparison of the execution speed of the basic types of queries.

    Make the most out of your SIMD investments: counter control flow divergence in compiled query pipelines

    Increasing single instruction multiple data (SIMD) capabilities in modern hardware allow for the compilation of data-parallel query pipelines. This means GPU-like challenges arise: control flow divergence causes underutilization of vector-processing units. In this paper, we present efficient algorithms for the AVX-512 architecture to address this issue. These algorithms allow for the fine-grained assignment of new tuples to idle SIMD lanes. Furthermore, we present strategies for their integration with compiled query pipelines so that tuples are never evicted from registers. We evaluate our approach with three query types: (i) a table scan query based on TPC-H Query 1, which performs up to 34% faster when addressing underutilization, (ii) a hash-join query, where we observe up to 25% higher performance, and (iii) an approximate geospatial join query, which shows performance improvements of up to 30%.
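    The lane-refill idea can be sketched in scalar code. The toy below is my own loose analogy, not the paper's AVX-512 algorithm: each "lane" holds one tuple, a predicate disqualifies some tuples (the divergence point), and instead of running the expensive stage on a partially empty vector, idle lanes are refilled from the input stream first.

```python
# Scalar simulation of lane refill (names and structure are illustrative,
# not the authors'): keep the vector of lanes full before running the
# expensive pipeline stage, so no lane sits idle on a disqualified tuple.
LANES = 8

def filtered_pipeline(tuples, predicate, stage):
    """Apply `stage` to tuples passing `predicate`, keeping all lanes busy."""
    out, it = [], iter(tuples)
    exhausted = False
    while True:
        lanes = []  # active lanes, each holding one surviving tuple
        # Refill: pull from the input until the vector is full (or input ends).
        while len(lanes) < LANES and not exhausted:
            try:
                t = next(it)
                if predicate(t):   # divergence point: some tuples drop out here
                    lanes.append(t)
            except StopIteration:
                exhausted = True
        if not lanes:
            break
        # The expensive stage now runs over a (mostly) full vector of lanes.
        out.extend(stage(t) for t in lanes)
    return out

result = filtered_pipeline(range(20), lambda x: x % 3 == 0, lambda x: x * x)
```

    In real AVX-512 code the refill is done with compress/expand instructions over mask registers so tuples stay in registers; the sketch only captures the scheduling idea.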

    Study of SAP Hana in the in-memory context

    Information systems play a crucial role in the management of any company and are one of the factors that provide a competitive edge, through their ability to process and analyse large amounts of data. They can be used, for example, for stock control, sales analysis, profits and losses, and end-of-period balance sheets, among many other tasks. Some of these companies have large workforces, millions of transactions per day, and several sites that need to communicate with each other, so database accesses are frequent and increasingly complex, with some queries taking hours to materialize. To address some of these problems, SAP, the German software company, created Hana, a platform built on a database that runs entirely in memory, which can make some of these accesses up to one hundred times faster. Although it is generally known that this technology can be advantageous to organizations, it is not yet clear what those advantages are or how valuable they can be, since speed alone may not justify an investment in such an expensive technology. In the course of this dissertation, the objective is to determine what concrete business value the technology can bring to an organization, which processes can be improved, and why an investment in in-memory technology can be an asset compared with traditional databases on persistent storage.
    From a more practical standpoint, an analysis is made of real data on exports from every country since 1963 (about 103M records), with a particular focus on deep tables. The execution time of analytical queries, both in Hana and in R/3, serves as the performance measure. On the in-memory side, the Hana Cloud Platform is used, in order to exploit the full capability of in-memory processing. In the end, the objective is to extract general conclusions, at a certain level of abstraction, that could be transferred to other areas and used by other companies. Finally, rather than a deep analysis of Hana's technical specifics, this dissertation analyses how those specifics can bring value to the business. Since adopting a technology of this kind always involves a considerable investment, organizations need a thorough analysis of the advantages and disadvantages, the pros and cons, from a technical point of view. Although a prudent strategy of improving the system little by little, rather than through radical changes, is almost always the best option, here we are dealing with a disruptive technology, and I believe that the sooner organizations adopt it, the greater the advantages will be over a purely long-term strategy.

    Density-Aware Linear Algebra in a Column-Oriented In-Memory Database System

    Linear algebra operations appear in nearly every application in advanced analytics, machine learning, and various scientific domains. To this day, many data analysts and scientists tend to use statistics software packages or hand-crafted solutions for their analysis. In the era of data deluge, however, external statistics packages and custom analysis programs that often run on single workstations are incapable of keeping up with the vast increase in data volume. In particular, there is an increasing demand from scientists for large-scale data manipulation, orchestration, and advanced data management capabilities. These are among the key features of a mature relational database management system (DBMS). With the rise of main-memory database systems, it has now become feasible to also consider applications that build on linear algebra. This thesis presents a deep integration of linear algebra functionality into an in-memory column-oriented database system. In particular, this work shows that it has become feasible to execute linear algebra queries on large data sets directly in a DBMS-integrated engine (LAPEG), without the need to transfer data and without being restricted by hard-disk latencies. From the various application examples cited in this work, we deduce a number of requirements that are relevant for a database system that includes linear algebra functionality. Besides the deep integration of matrices and numerical algorithms, these include optimization of expressions, transparent matrix handling, scalability and data parallelism, and data manipulation capabilities. These requirements are addressed by our linear algebra engine. In particular, the core contributions of this thesis are: firstly, we show that the columnar storage layer of an in-memory DBMS yields an easy adoption of efficient sparse matrix data types and algorithms.
    Furthermore, we show that the execution of linear algebra expressions significantly benefits from different techniques that are inspired by database technology. In a novel way, we implemented several of these optimization strategies in LAPEG's optimizer (SpMachO), which uses an advanced density estimation method (SpProdest) to predict the matrix density of intermediate results. Moreover, we present an adaptive matrix data type, AT Matrix, to obviate the need for scientists to select appropriate matrix representations. The tiled substructure of AT Matrix is exploited by our matrix multiplication to saturate the different sockets of a multicore main-memory platform, reaching a speed-up of up to 6x compared to alternative approaches. Finally, a major part of this thesis is devoted to the topic of data manipulation, where we propose a matrix manipulation API and present different mutable matrix types to enable fast insertions and deletions. We conclude that our linear algebra engine is well suited to process dynamic, large matrix workloads in an optimized way. In particular, the DBMS-integrated LAPEG fills the linear algebra gap and makes the columnar in-memory DBMS attractive as an efficient, scalable ad-hoc analysis platform for scientists.
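    Why a density estimate matters can be shown with a toy model. The sketch below is my own construction, not the thesis's SpProdest: for C = A·B over k-length inner products, assuming independently placed non-zeros, each output entry is non-zero with probability 1 − (1 − dA·dB)^k, and an optimizer could use that estimate to pick a sparse or dense kernel for the intermediate result before computing it.

```python
import random

# Toy density-aware planning (illustrative names, not from the thesis):
# predict the density of C = A @ B from the input densities alone, then
# compare against the actual sparse-sparse product.
def expected_density(d_a, d_b, k):
    """P[C[i,j] != 0] under independently placed non-zeros: each of the
    k terms A[i,l]*B[l,j] is non-zero with probability d_a * d_b."""
    return 1.0 - (1.0 - d_a * d_b) ** k

def random_sparse(rows, cols, density, rng):
    """Sparse matrix as a coordinate dict {(i, j): value}, ~`density` fill."""
    return {(i, j): rng.uniform(1.0, 2.0)
            for i in range(rows) for j in range(cols)
            if rng.random() < density}

def spmm_density(a, b, m, n):
    """Actual fill of the product: (i, j) is non-zero iff some l links them."""
    cols_of = {}  # row i of A -> set of its non-zero columns
    for (i, l) in a:
        cols_of.setdefault(i, set()).add(l)
    out = {(i, j) for (l, j) in b
           for i, cols in cols_of.items() if l in cols}
    return len(out) / (m * n)

rng = random.Random(42)
m = k = n = 60
a = random_sparse(m, k, 0.05, rng)
b = random_sparse(k, n, 0.05, rng)
est = expected_density(len(a) / (m * k), len(b) / (k * n), k)
act = spmm_density(a, b, m, n)
```

    With 5% dense inputs the product is already an order of magnitude denser, which is exactly the kind of blow-up an expression optimizer wants to anticipate when choosing the representation of an intermediate.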