289 research outputs found

    Segment Oriented Compression Scheme for MOLAP Based on Extendible Multidimensional Arrays

    Get PDF
    Many statistical and MOLAP applications use multidimensional arrays as the basic data structure to allow the efficient and convenient storage and retrieval of large volumes of business data for decision making. Allocation of data or data compression is a key performance factor for this purpose because performance strongly depends on the amount of storage required and availability of memory. This holds especially for data warehousing environments in which huge amounts of data have to be dealt with. The most evident consequence of data compression is that it reduces storage cost by packing more logical data per unit of physical capacity. And improved performance is a net outcome because less physical data need to be retrieved during scan-oriented queries. In this paper, an efficient data compression technique is proposed based on the notion of extendible array. The main idea of the scheme is to compress each of the segments of the extendible array using the position information only. We compare the proposed scheme for different performance issues with prominent compression schemes.</p

    Incremental aggregation on MOLAP cube based on n-dimensional extendible karnaugh arrays

    Get PDF
    Data is increasing so rapidly that new data warehousing approaches are required to process and analyze data. Aggregation of data incrementally is needed to fast access of data and compute aggregation functions. Multidimensional arrays are generally used for this purpose. But some disadvantages such as address space requirement is large and processing time is comparatively slow in case of aggregation. For this purpose we use Extendible Karnaugh Array (EKA). EKA is an efficient scheme which has better performance than other data structures that we have tested in our research. In this research work we use EKA as basic structure for implementing incremental aggregation of data and evaluate its performance over other approaches. We use Multidimensional Online Analytical Processing (MOLAP) which stores data in optimized multi-dimensional array storage, rather than in a relational database. We create 4 and 6 dimensional MOLAP data cube using Traditional Multidimensional Array (TMA) and EKA scheme and compare incremental aggregation with Relational Online Analytical Processing (ROLAP). The effective outcome of EKA structure for incremental aggregation on 4 and 6 dimensional MOLAP structure is shown by some experimental results and efficiency is proved for n higher dimensions

    OpenSDM - An Open Sensor Data Management Tool

    Full text link
    Exchange of scientific data and metadata between single users or organizations is a challenging task due to differences in data formats, the genesis of data collection, ontologies and prior knowledge of the users. Different data storage requirements, mostly defined by the structure, size and access scenarios, require also different data storage solutions, since there is no and there cannot be a data format which is suitable for all tasks and needs that occur especially in a scientific workflow. Besides data, the generation and handling of additional corresponding metadata leads us to the additional challenge of defining the meaning of data, which should be formulated in a way that it can be commonly understood to get out a maximum of expected and shareable information of the observed processes. In our domain we are able to take advantage of standards defined by the Open Geospatial Consortium, namely the standards defined by the Sensor Web Enablement, WaterML and CF-NetCDF working groups. Even though these standards are freely available and some of them are commonly used in specialized software packages, the adaption in widespread end-user software solutions still seems to be in its beginnings. This contribution describes a software solution developed at Graz University of Technology, which targets the storage and exchange of measurement data with a special focus on meteorological, water quantity and water quality observation data collected within the last three decades. The solution was planned on basis of long-term experience in sewer monitoring and was built on top of open-source software only. It allows high-performance storage of time series and associated metadata, access-controlled web services for programmatic access, validation tasks, event detection, automated alerting and notification. An additional web-based graphical user interface was created which gives full control to end-users. The OpenSDM software approach makes it easier for measurement station operators, maintainers and end-users to take advantage of the standards of the Open Geospatial Consortium, which usage should be promoted in the water related communities

    A Data Structure for Spatio-Temporal Databases

    Get PDF
    The advantages and applications of spatial mechanisms are well documented; however, there are very few being designed. The principal hinderance to the design of spatial mechanisms is the great difficulty involved in specifying spatial problems and in interpreting spatial solutions. Similarly, the development of spatial codes to implement these techniques is held back by the lack of means to easily visualize and verify solutions, particularly in the realm of relational databases. If spatial mechanisms are to be successful, the designer must be able to synthesize, analyse and evaluate, as well as load and extract information, using a single code representing a spatial structure. This entails the implementation of spatial relationships involving spatial data structures. It is with this in mind that the Canadian Hydrographic Service database group embarked on the development of a new type of spatial database structure based on the quadtree concept

    On indexing highly dynamic multidimensional datasets for interactive analytics

    Get PDF
    Orientador : Prof. Dr. Luis Carlos Erpen de BonaTese (doutorado) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defesa: Curitiba, 15/04/2016Inclui referências : f. 77-91Área de concentração : Ciência da computaçãoResumo: Indexação de dados multidimensionais tem sido extensivamente pesquisada nas últimas décadas. Neste trabalho, um novo workload OLAP identificado no Facebook é apresentado, caracterizado por (a) alta dinamicidade e dimensionalidade, (b) escala e (c) interatividade e simplicidade de consultas, inadequado para os SGBDs OLAP e técnicas de indexação de dados multidimensionais atuais. Baseado nesse caso de uso, uma nova estratégia de indexação e organização de dados multidimensionais para SGBDs em memória chamada Granular Partitioning é proposta. Essa técnica extende a visão tradicional de partitionamento em banco de dados, particionando por intervalo todas as dimensões do conjunto de dados e formando pequenos blocos que armazenam dados de forma não coordenada e esparsa. Desta forma, é possível atingir altas taxas de ingestão de dados sem manter estrutura auxiliar alguma de indexação. Este trabalho também descreve como um SGBD OLAP capaz de suportar um modelo de dados composto por cubos, dimensões e métricas, além de operações como roll-ups, drill-downs e slice and dice (filtros) eficientes pode ser construído com base nessa nova técnica de organização de dados. Com objetivo de validar experimentalmente a técnica apresentada, este trabalho apresenta o Cubrick, um novo SGBD OLAP em memória distribuída e otimizada para a execução de consultas analíticas baseado em Granular Partitioning, escritas desde a primeira linha de código para este trabalho. Finalmente, os resultados de uma avaliação experimental extensiva contendo conjuntos de dados e consultas coletadas de projetos pilotos que utilizam Cubrick é apresentada; em seguida, é mostrado que a escala desejada pode ser alcançada caso os dados sejam organizados de acordo com o Granular Partitioning e o projeto seja focado em simplicidade, ingerindo milhões de registros por segundo continuamente de uxos de dados em tempo real, e concorrentemente executando consultas com latência inferior a 1 segundo.Abstrct: Indexing multidimensional data has been an active focus of research in the last few decades. In this work, we present a new type of OLAP workload found at Facebook and characterized by (a) high dynamicity and dimensionality, (b) scale and (c) interactivity and simplicity of queries, that is unsuited for most current OLAP DBMSs and multidimensional indexing techniques. To address this use case, we propose a novel multidimensional data organization and indexing strategy for in-memory DBMSs called Granular Partitioning. This technique extends the traditional view of database partitioning by range partitioning every dimension of the dataset and organizing the data within small containers in an unordered and sparse fashion, in such a way to provide high ingestion rates and indexed access through every dimension without maintaining any auxiliary data structures. We also describe how an OLAP DBMS able to support a multidimensional data model composed of cubes, dimensions and metrics and operations such as roll-up, drill-down as well as efficient slice and dice filtering) can be built on top of this new data organization technique. In order to experimentally validate the described technique we present Cubrick, a new in-memory distributed OLAP DBMS for interactive analytics based on Granular Partitioning we have written from the ground up at Facebook. Finally, we present results from a thorough experimental evaluation that leveraged datasets and queries collected from a few pilot Cubrick deployments. We show that by properly organizing the dataset according to Granular Partitioning and focusing the design on simplicity, we are able to achieve the target scale and store tens of terabytes of in-memory data, continuously ingest millions of records per second from realtime data streams and still execute sub-second queries

    A parametric prototype for spatiotemporal databases

    Get PDF
    The main goal of this project is to design and implement the parametric database (ParaDB). Conceptually, ParaDB consists of the parametric data model (ParaDM) and the parametric structured query language (ParaSQL). Parametric data model is a data model for multi-dimensional databases such as temporal, spatial, spatiotemporal, or multi-level secure databases. Main difference compared to the classical relational data model is that ParaDM models an object as a single tuple, and an attribute is defined as a function from parametric elements. The set of parametric elements is closed under union, intersection, and complementation. These operations are counterparts of or, and, and not in a natural language like English. Therefore, the closure properties provide very flexible ways to query on objects without introducing additional self-join operations which are frequently required in other multi-dimensional database models

    A Survey on Spatial Indexing

    Get PDF
    Spatial information processing has been a centre of attention of research in the previous decade. In spatial databases, data related with spatial coordinates and extents are retrieved based on spatial proximity. A large number of spatial indexes have been proposed to make ease of efficient indexing of spatial objects in large databases and spatial data retrieval. The goal of this paper is to review the advance techniques of the access methods. This paper tries to classify the existing multidimensional access methods, according to the types of indexing, and their performance over spatial queries. K-d trees out performs quad tress without requiring additional memory usage

    An overview of decision table literature 1982-1995.

    Get PDF
    This report gives an overview of the literature on decision tables over the past 15 years. As much as possible, for each reference, an author supplied abstract, a number of keywords and a classification are provided. In some cases own comments are added. The purpose of these comments is to show where, how and why decision tables are used. The literature is classified according to application area, theoretical versus practical character, year of publication, country or origin (not necessarily country of publication) and the language of the document. After a description of the scope of the interview, classification results and the classification by topic are presented. The main body of the paper is the ordered list of publications with abstract, classification and comments.
    corecore