
    Database Vs Data Warehouse

    Data warehouse technology comprises a set of concepts and methods that give users useful information for decision making. The need to build a data warehouse arises from the need to improve the quality of information in the organization. Data coming from different sources and in a variety of forms, both structured and unstructured, are filtered according to business rules and integrated into a single large data collection. Using IT solutions, managers have understood that data stored in operational systems, including databases, are an informational gold mine that must be exploited. Data warehouses have been developed to answer the increasing demand for complex analysis, which could not be properly achieved with operational databases. The present paper emphasizes some of the criteria that application developers can use to choose between a database solution and a data warehouse solution.
    Keywords: data warehouse, database, database management systems, information systems, data organisation in external memory, business intelligence

    Scalable Model-Based Management of Correlated Dimensional Time Series in ModelarDB+

    To monitor critical infrastructure, high quality sensors sampled at a high frequency are increasingly used. However, as they produce huge amounts of data, only simple aggregates are stored. This removes outliers and fluctuations that could indicate problems. As a remedy, we present a model-based approach for managing time series with dimensions that exploits correlation in and among time series. Specifically, we propose compressing groups of correlated time series using an extensible set of model types within a user-defined error bound (possibly zero). We name this new category of model-based compression methods for time series Multi-Model Group Compression (MMGC). We present the first MMGC method GOLEMM and extend model types to compress time series groups. We propose primitives for users to effectively define groups for differently sized data sets, and based on these, an automated grouping method using only the time series dimensions. We propose algorithms for executing simple and multi-dimensional aggregate queries on models. Last, we implement our methods in the Time Series Management System (TSMS) ModelarDB (ModelarDB+). Our evaluation shows that compared to widely used formats, ModelarDB+ provides up to 13.7 times faster ingestion due to high compression, 113 times better compression due to the adaptivity of GOLEMM, 630 times faster aggregates by using models, and close to linear scalability. It is also extensible and supports online query processing.
    Comment: 12 pages, 28 figures, and 1 table
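
The idea of representing a time series by models within a user-defined error bound can be illustrated with a minimal sketch. This is not GOLEMM and not ModelarDB+ code: it assumes a single series, a simple piecewise-constant model, and an absolute error bound, and the function names `compress_constant` and `segment_sum` are hypothetical. It also shows how an aggregate (here a sum) might be answered from the models alone.

```python
from typing import List, Tuple

# One segment: (start index, end index inclusive, constant value).
Segment = Tuple[int, int, float]


def compress_constant(values: List[float], error_bound: float) -> List[Segment]:
    """Greedily split a series into constant segments.

    Each segment stores the midpoint of its min and max, so every original
    value lies within error_bound of the stored constant (up to rounding).
    """
    segments: List[Segment] = []
    if not values:
        return segments
    start = 0
    lo = hi = values[0]
    for i in range(1, len(values)):
        v = values[i]
        new_lo, new_hi = min(lo, v), max(hi, v)
        if new_hi - new_lo <= 2 * error_bound:
            # The value still fits the current constant model.
            lo, hi = new_lo, new_hi
        else:
            # Close the current segment and start a new one at i.
            segments.append((start, i - 1, (lo + hi) / 2))
            start, lo, hi = i, v, v
    segments.append((start, len(values) - 1, (lo + hi) / 2))
    return segments


def segment_sum(segments: List[Segment]) -> float:
    """Approximate SUM computed from the models, without the raw values."""
    return sum((end - start + 1) * value for start, end, value in segments)


readings = [10.0, 10.2, 9.9, 10.1, 15.0, 15.3, 15.1]
models = compress_constant(readings, error_bound=0.25)
approx_total = segment_sum(models)
```

With an error bound of zero, only runs of identical values share a segment, so the representation becomes lossless. The sum computed from the segments differs from the exact sum by at most the error bound times the number of values.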

    ShenZhen transportation system (SZTS): a novel big data benchmark suite

    Data analytics is at the core of the supply chain for both products and services in modern economies and societies. Big data workloads, however, are placing unprecedented demands on computing technologies, calling for a deep understanding and characterization of these emerging workloads. In this paper, we propose ShenZhen Transportation System (SZTS), a novel big data Hadoop benchmark suite composed of real-life transportation analysis applications with real-life input data sets from Shenzhen in China. SZTS uniquely focuses on a specific and real-life application domain, whereas other existing Hadoop benchmark suites, such as HiBench and CloudRank-D, consist of generic algorithms with synthetic inputs. We perform a cross-layer workload characterization at the microarchitecture level, the operating system (OS) level, and the job level, revealing unique characteristics of SZTS compared to existing Hadoop benchmarks as well as general-purpose multi-core PARSEC benchmarks. We also study the sensitivity of workload behavior with respect to input data size, and we propose a methodology for identifying representative input data sets.

    Comprehensive characterization of an open source document search engine

    This work performs a thorough characterization and analysis of the open source Lucene search library. The article describes in detail the architecture, functionality, and micro-architectural behavior of the search engine, and investigates prominent online document search research issues. In particular, we study how intra-server index partitioning affects the response time and throughput, explore the potential use of low power servers for document search, and examine the sources of performance degradation and the causes of tail latencies. Some of our main conclusions are the following: (a) intra-server index partitioning can reduce tail latencies but with diminishing benefits as incoming query traffic increases, (b) low power servers given enough partitioning can provide the same average and tail response times as conventional high performance servers, (c) index search is a CPU-intensive cache-friendly application, and (d) C-states are the main culprits for performance degradation in document search.
    Web of Science162art. no. 1
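
To make the distinction between average and tail response times concrete, here is a small sketch that summarises a list of latencies with the mean and a nearest-rank percentile. The function name `percentile_nearest_rank` and the sample latencies are hypothetical and are not taken from the article's measurements.

```python
from typing import List


def percentile_nearest_rank(samples: List[float], p: int) -> float:
    """Return the p-th percentile (integer p in 1..100) by the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples)
    # Ceiling of p * n / 100 using integer arithmetic to avoid rounding issues.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: mostly fast, with two slow queries.
latencies_ms = [10.0] * 98 + [500.0, 500.0]
mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile_nearest_rank(latencies_ms, 50)
p99_ms = percentile_nearest_rank(latencies_ms, 99)
```

In this made-up sample the median stays at the fast value while the 99th percentile lands on a slow query, which is why tail latency is usually reported separately from the average.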

    An Open Source Based Data Warehouse Architecture to Support Decision Making in the Tourism Sector

    In this paper, an alternative tourism-oriented data warehousing architecture is proposed that makes use of recent free and open source technologies such as Java, PostgreSQL and XML. The aim of this architecture is to support the decision making process and to give an integrated view of the whole tourism reality in an established context (local, regional, national, etc.) without requiring large investments in the necessary software.
    Keywords: Tourism, Data warehousing architecture

    Mission scheduler for a rail guided vehicle system

    A transport system with automatic guided vehicles (AGVs) is a fully automatic system that provides logistics services in industrial environments such as warehouses and production plants. These systems have reached such a degree of maturity as to allow, in their daily use, the application of heuristic algorithms for optimizing the various operations they perform: for instance, finding the shortest paths between working stations and storage areas, assigning movements and strategic positions to idle vehicles, and managing batteries efficiently for long life. One relevant algorithm, presented and developed in this thesis, concerns the sorting of products in the shipping phase, which affects the scheduling of tasks assigned to the autonomous vehicles. The scheduler aims to determine which operations have stricter constraints and higher priority than others. Studies and practice have shown that adopting a valid scheduler brings considerable improvements in system performance; consequently, it is advisable to dedicate time and effort to finding the right one. The algorithms presented obtained a successful outcome and have been implemented for the production of a modern automated warehouse located in the city of Cesena, Italy. The paper is divided into four chapters, with a further one dedicated to conclusions.
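
As an illustration of one operation mentioned above, finding shortest paths between working stations and storage areas, the following sketch uses Dijkstra's algorithm on a small track graph. Dijkstra's algorithm is a standard exact method chosen here for brevity; it is not the algorithm developed in the thesis, and the station names and track lengths are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Directed track graph: node -> list of (neighbor, track length in metres).
Graph = Dict[str, List[Tuple[str, float]]]


def shortest_distances(graph: Graph, source: str) -> Dict[str, float]:
    """Return the shortest distance from source to every reachable node."""
    dist: Dict[str, float] = {source: 0.0}
    queue: List[Tuple[float, str]] = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # Stale queue entry; a shorter route was already found.
        for neighbor, length in graph.get(node, []):
            candidate = d + length
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return dist


# Hypothetical warehouse layout.
layout: Graph = {
    "storage": [("junction", 4.0), ("station_a", 10.0)],
    "junction": [("station_a", 3.0), ("station_b", 7.0)],
    "station_a": [("station_b", 2.0)],
    "station_b": [],
}
distances = shortest_distances(layout, "storage")
```

A scheduler could use such distances as one input when deciding which vehicle to assign to a mission, alongside the priorities and constraints the abstract describes.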

    ARENA—augmented reality to enhanced experimentation in smart warehouses

    The current industrial scenario demands advances that depend on expensive and sophisticated solutions. Augmented Reality (AR) can complement the real world with virtual elements. Given this capability, an AR experience can meet the demand for prototype testing and new solutions, predicting problems and failures that may only exist in real situations. This work presents an environment for experimentation with advanced behaviors in smart factories, allowing experimentation with multi-robot systems (MRS) that are interconnected, cooperative, and interacting with virtual elements. The concept of ARENA introduces a novel approach to realistic and immersive experimentation in industrial environments, aiming to evaluate new technologies aligned with Industry 4.0. The proposed method consists of a small-scale warehouse, inspired by a real scenario characterized in this paper, managed by a group of fully interconnected autonomous forklifts, which are embodied by a swarm of tiny robots developed and prepared to operate in the small-scale scenario. AR is employed to enhance the capabilities of the swarm robots, allowing box handling and virtual forklifts. Virtual laser range finders (LRF) are specially designed as a segmentation of a global RGB-D camera to improve robot perception, allowing obstacle avoidance and environment mapping. This infrastructure enables the evaluation of new strategies to improve manufacturing productivity without compromising production through automation faults.

    The use of alternative data models in data warehousing environments

    Data Warehouses are increasing their data volume at an accelerated rate; high disk space consumption, slow query response times, and complex database administration are common problems in these environments. The lack of a proper data model and of an adequate architecture specifically targeted towards these environments are the root causes of these problems. Inefficient management of stored data includes duplicate values at column level and poor management of data sparsity, which derives from a low data density and affects the final size of Data Warehouses. It has been demonstrated that the Relational Model and Relational technology are not the best techniques for managing duplicates and data sparsity. The novelty of this research is to compare some data models, considering their data density and their data sparsity management, in order to optimise Data Warehouse environments. The Binary-Relational, the Associative/Triple Store and the Transrelational models have been investigated, and based on the research results a novel Alternative Data Warehouse Reference architectural configuration has been defined. For the Transrelational model, no database implementation existed. It was therefore necessary to develop an instantiation of its storage mechanism, and as far as could be determined this is the first public domain instantiation available of the storage mechanism for the Transrelational model.
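
The two storage issues named above, low data density and duplicate values at column level, can be made concrete with a short sketch. It assumes that density means the fraction of non-null cells in a table, and it uses dictionary encoding as one common way to store each distinct column value only once. It is not a description of how the Binary-Relational, Associative/Triple Store or Transrelational models work, and the function names and sample rows are hypothetical.

```python
from typing import Hashable, List, Optional, Sequence, Tuple


def data_density(rows: Sequence[Sequence[Optional[object]]]) -> float:
    """Fraction of cells that hold a value (are not None)."""
    total = sum(len(row) for row in rows)
    filled = sum(1 for row in rows for cell in row if cell is not None)
    return filled / total if total else 0.0


def dictionary_encode(column: Sequence[Hashable]) -> Tuple[List[Hashable], List[int]]:
    """Store each distinct value once and replace cells with integer codes."""
    dictionary: List[Hashable] = []
    positions = {}
    codes: List[int] = []
    for value in column:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        codes.append(positions[value])
    return dictionary, codes


# Hypothetical sparse table: (country, city, rating).
rows = [
    ("DE", "Berlin", None),
    ("DE", None, None),
    ("FR", "Paris", 5),
]
density = data_density(rows)
values, codes = dictionary_encode([row[0] for row in rows])
```

Here six of the nine cells are filled, and the country column shrinks to two stored values plus one small integer code per row.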