6 research outputs found

    Time series database in Industrial IoT and its testing tool

    Abstract. Data gathering is at the core of the Industrial Internet of Things. The data is time- and event-based, so time series data is a key concept in the Industrial Internet of Things, and a dedicated time series database is required to process and store it. Developing a solution and choosing the right time series database for it can be difficult. An inefficient comparison of time series databases can lead to wrong choices and, consequently, to delays and financial losses. This thesis improves the tools for comparing time series databases in the context of the Industrial Internet of Things. It identifies the functional and non-functional requirements of a time series database in the Industrial Internet of Things, and designs and implements a performance test bench. It also provides a practical example of how time series databases can be compared using the identified requirements and the developed test bench. The example is used to examine how the selected time series databases fulfill these requirements. Eight functional and eight non-functional requirements were identified. The functional requirements included, e.g., aggregation support, information models, and hierarchical configurations. The non-functional requirements included, e.g., scalability, performance, and lifecycle. The test bench takes the Industrial Internet of Things point of view by testing the database in three scenarios: write-heavy, read-heavy, and concurrent write and read operations. In the practical example, ABB's cpmPlus History, InfluxDB, and TimescaleDB were evaluated. Both the requirement evaluation and the performance testing showed that cpmPlus History performed best, InfluxDB second best, and TimescaleDB worst. cpmPlus History showed extensive support for the requirements and the best performance in all performance test cases. InfluxDB showed high performance for data writing, while TimescaleDB showed better performance for data reading.
    Finnish title: Aikasarjatietokanta teollisuuden esineiden internetissä ja sen testipenkki.
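The three test scenarios named in the abstract (write-heavy, read-heavy, and concurrent write and read) can be pictured with a minimal sketch. This is not the thesis's test bench: `InMemoryStore`, `run_writes`, `run_reads`, and `timed` are hypothetical names, and the in-memory store merely stands in for a real database client such as one for InfluxDB or TimescaleDB.

```python
import threading
import time


class InMemoryStore:
    """Hypothetical stand-in for a time series database client."""

    def __init__(self):
        self._points = []
        self._lock = threading.Lock()

    def write(self, timestamp, value):
        with self._lock:
            self._points.append((timestamp, value))

    def read_range(self, start, end):
        # Return all points with start <= timestamp < end.
        with self._lock:
            return [p for p in self._points if start <= p[0] < end]


def run_writes(store, n):
    for t in range(n):
        store.write(t, float(t))


def run_reads(store, n, window=10):
    for t in range(n):
        store.read_range(t, t + window)


def timed(*workers):
    """Run each (function, *args) worker in its own thread; return elapsed seconds."""
    threads = [
        threading.Thread(target=fn, args=tuple(args)) for fn, *args in workers
    ]
    start = time.perf_counter()
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return time.perf_counter() - start


store = InMemoryStore()
results = {
    "write_heavy": timed((run_writes, store, 1000)),
    "read_heavy": timed((run_reads, store, 200)),
    "concurrent": timed((run_writes, store, 1000), (run_reads, store, 200)),
}
```

Swapping `InMemoryStore` for a real client, and raising the point counts, would turn the elapsed times in `results` into a rough per-scenario comparison; the numbers from this in-memory version carry no meaning about any actual database.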

    Análise de dados científicos baseada em algoritmos de indexação bitmap (Scientific data analysis based on bitmap indexing algorithms)

    Large-scale computer simulations often consume and produce large volumes of raw data files, which may come in different formats. Users usually need to analyze domain-specific data based on data elements that are related across multiple files generated during the simulation run. Existing solutions, such as FastBit and NoDB, support this analysis by indexing raw data so that specific elements in regions of interest of raw data files can be accessed directly. However, those solutions can analyze only a single raw data file at a time, and they are used only after the simulation has finished. The ARMFUL architecture proposes a solution capable of ensuring dataflow management, recording related raw data elements in a provenance database, and combining raw data file analysis techniques at runtime. Through a data model that integrates simulation execution data with domain data, the architecture allows queries on data elements related across multiple files. This dissertation proposes the implementation of instances of the raw data indexing and query processing components of the ARMFUL architecture, aiming to reduce the elapsed time of data ingestion into the provenance database and to support exploratory analysis of raw data.
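As a rough illustration of the bitmap indexing idea behind the title, the sketch below builds one bitmap per distinct column value and answers a two-column query with a bitwise AND. It is not the ARMFUL or FastBit implementation: the function names and the sample columns are hypothetical, and the bitmaps are plain uncompressed Python integers, whereas production systems typically compress them.

```python
def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def rows_of(bitmap):
    """Decode a bitmap into the ascending list of matching row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical columns describing five mesh elements of a simulation output.
region = ["inlet", "wall", "inlet", "outlet", "wall"]
phase = ["gas", "gas", "liquid", "gas", "liquid"]

region_idx = build_bitmap_index(region)
phase_idx = build_bitmap_index(phase)

# Rows where region == "wall" AND phase == "gas": one bitwise AND, no row scan.
hits = rows_of(region_idx["wall"] & phase_idx["gas"])
```

Here `hits` is `[1]`, since row 1 is the only element that is both on the wall and in the gas phase. The appeal of the approach is that multi-condition selections reduce to cheap bitwise operations over the index instead of scans of the raw file.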

    Database system support of simulation data

    Supported by increasingly efficient HPC infrastructure, numerical simulations are rapidly expanding to fields such as oil and gas, medicine, and meteorology. As simulations become more precise and cover longer periods of time, they may produce files with terabytes of data that need to be analyzed efficiently. In this paper, we investigate techniques for managing such data using an array DBMS. We take advantage of multidimensional arrays, which nicely model the dimensions and variables used in numerical simulations. However, a naive approach to mapping simulation data files may lead to sparse arrays, which hurts query response time, in particular when the simulation uses irregular meshes to model its physical domain. We propose efficient techniques that map coordinate values in numerical simulations to evenly distributed cells in array chunks using equi-depth histograms and space-filling curves. We implemented our techniques in SciDB and, through experiments over real-world data, compared them with two other approaches: row-store and column-store DBMSs. The results indicate that multidimensional arrays and column-stores are much faster than a traditional row-store system for queries over larger amounts of simulation data. They also help identify the scenarios where array DBMSs are most efficient, and those where they are outperformed by column-stores.
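The mapping described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names and sample coordinates are hypothetical, and the Z-order (Morton) curve is an assumed choice, since the abstract does not say which space-filling curve is used.

```python
from bisect import bisect_right


def equi_depth_edges(values, n_bins):
    """Return n_bins - 1 inner boundaries so each bin holds about the same count."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def to_cell(value, edges):
    """Map a coordinate value to the index of its equi-depth bin."""
    return bisect_right(edges, value)


def morton2d(i, j, bits=16):
    """Interleave the bits of two cell indices into one Z-order position."""
    code = 0
    for b in range(bits):
        code |= ((i >> b) & 1) << (2 * b)
        code |= ((j >> b) & 1) << (2 * b + 1)
    return code


# Irregular mesh: most coordinates cluster near 0, a few lie far away.
xs = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 10.0, 50.0]
edges = equi_depth_edges(xs, 4)
cells = [to_cell(x, edges) for x in xs]
```

With these inputs `edges` is `[0.2, 0.4, 10.0]` and `cells` is `[0, 0, 1, 1, 2, 2, 3, 3]`, so every bin receives two points. Four equal-width bins over the same range would put seven of the eight points in the first bin and leave two bins empty, which is the sparsity problem the abstract attributes to a naive mapping. `morton2d` then orders two-dimensional cell indices so that cells close in space tend to be close in the linear order, for example `morton2d(3, 3)` gives 15.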
