108 research outputs found

    Cracking KD-Tree: The first multidimensional adaptive indexing

    Workload-aware physical data access structures are crucial to achieving short response times in (exploratory) data analysis tasks, as commonly required for Big Data and Data Science applications. Recently proposed techniques such as automatic index advisers (for a priori known static workloads) and query-driven adaptive incremental indexing (for a priori unknown dynamic workloads) form the state of the art for building single-dimensional indexes for single-attribute query predicates. However, similar techniques for more demanding multi-attribute query predicates, which are vital for any data analysis task, have not yet been proposed. In this paper, we present our ongoing work on a new set of workload-adaptive indexing techniques that focus on creating multidimensional indexes. We present our proof of concept, the Cracking KD-Tree, an adaptive indexing approach that generates a KD-Tree based on multidimensional range query predicates. It works by incrementally creating partial multidimensional indexes as a by-product of query processing. The indexes are produced only on those parts of the data that are accessed, and their creation cost is effectively distributed across a stream of queries. Experimental results show that the Cracking KD-Tree is three times faster than creating a full KD-Tree, one order of magnitude faster than executing full scans, and two orders of magnitude faster than using uni-dimensional full or adaptive indexes on multiple columns.
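
    The idea of building a partial index as a by-product of range queries can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Node` layout, the `min_size` threshold, and the choice to crack a leaf on the first query bound that separates its points are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    # A leaf holds unindexed points; an inner node holds None and a split.
    points: Optional[List[Point]]
    dim: int = 0
    pivot: float = 0.0
    left: Optional["Node"] = None    # points with p[dim] < pivot
    right: Optional["Node"] = None   # points with p[dim] >= pivot

def crack(node: Node, dim: int, pivot: float) -> None:
    """Turn a leaf into an inner node split at a query bound (hypothetical helper)."""
    pts = node.points
    node.left = Node([p for p in pts if p[dim] < pivot])
    node.right = Node([p for p in pts if p[dim] >= pivot])
    node.points, node.dim, node.pivot = None, dim, pivot

def range_query(node: Node, lo: Point, hi: Point, min_size: int = 2) -> List[Point]:
    """Return points with lo[d] <= p[d] < hi[d] for every d, refining the tree on the way."""
    if node.points is None:  # inner node: visit only the sides the query overlaps
        out: List[Point] = []
        if lo[node.dim] < node.pivot:
            out += range_query(node.left, lo, hi, min_size)
        if hi[node.dim] > node.pivot:
            out += range_query(node.right, lo, hi, min_size)
        return out
    if len(node.points) > min_size:
        for d in range(len(lo)):
            vals = [p[d] for p in node.points]
            for bound in (lo[d], hi[d]):
                # Crack only if the bound leaves points on both sides.
                if min(vals) < bound <= max(vals):
                    crack(node, d, bound)
                    return range_query(node, lo, hi, min_size)
    # Small or unsplittable leaf: fall back to a scan of this piece only.
    return [p for p in node.points if all(l <= x < h for l, x, h in zip(lo, p, hi))]

points = [(1, 1), (2, 5), (3, 3), (6, 2), (7, 7), (8, 4)]
root = Node(list(points))
result = sorted(range_query(root, (2, 2), (7, 6)))
```

    Only the regions touched by a query are split, so untouched data stays as a plain unsorted piece and the indexing cost is spread over later queries.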

    COVID-19 Data Warehouse: A Systematic Literature Review

    The coronavirus disease (COVID-19) has affected the whole world and led clinicians to use the available knowledge to diagnose or predict the infection. The Data Warehouse (DW) is one of the most crucial tools that may enhance decision-making. In this paper, three main questions are investigated concerning the use of DW in the COVID-19 pandemic. The effect of using DW in diagnosis and prediction is investigated, and the most used DW architecture is explored. The sectors that attracted much research attention, such as diagnosing, predicting, and finding correlations among features, are also examined. The selected studies are papers on DW for COVID-19 published between 2019 and 2022 in the digital libraries (ACM, IEEE, Springer, Science Direct, and Elsevier). During the research, many limitations were detected, and some future work is presented. Enterprise DW is the most used architecture for COVID-19 DW, while finding correlations among features and prediction are the sectors that attracted the researchers' attention.

    Mineração de dados em sistemas OLAP (Data mining in OLAP systems)

    Master's dissertation in Informatics Engineering. The many benefits provided by data warehouses, in particular regarding storage and data processing, have led to substantial growth of the data warehousing market and of the number of organizations that have adopted these systems. In fact, the data model of this type of structure allows the user to perform a large number of different operations: running complex queries, selecting the most interesting data sets, aggregating and comparing values, and providing different visualizations of the data. However, its complexity brings computation and materialization costs. Pre-computing a cube from a data warehouse provides short response times for analytical queries, but it requires an enormous amount of space to store all the materialized views. The application of data mining techniques, such as algorithms for mining association rules, allows the discovery of frequent itemsets among the data and, consequently, the definition of OLAP exploration or usage preferences. The study of OLAP preferences presented in this dissertation aims to identify the data most accessed by users, so that it becomes possible to agree on which parts of a cube do not need to be materialized because they are not used in analysis processes. By materializing only the parts that matter for the analysis, it is possible to preserve a satisfactory query response time while significantly reducing the amount of memory used.
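
    As a rough illustration of how frequent itemsets could point to the parts of a cube worth materializing, the following Python sketch counts how often each combination of dimensions appears in a query log. The log contents, the dimension names, and the `min_support` value are hypothetical, and the brute-force enumeration stands in for whichever association-rule algorithm the dissertation actually uses.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support (brute force)."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        # Count every non-empty subset of the dimensions used by this query.
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    n = len(transactions)
    return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

# Hypothetical query log: each entry lists the dimensions one OLAP query grouped by.
query_log = [
    {"product", "time"},
    {"product", "time", "store"},
    {"product"},
    {"time", "store"},
]
frequent = frequent_itemsets(query_log, min_support=0.5)
```

    Dimension combinations that fall below the support threshold, such as `("product", "store")` here, would be candidates for leaving unmaterialized.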

    Automatic physical database design: recommending materialized views

    This work discusses physical database design, focusing on the problem of selecting materialized views to improve the performance of a database system. We first address the satisfiability and implication problems for mixed arithmetic constraints. The results are used to support the construction of a search space for view selection problems. We propose an approach for constructing a search space based on identifying maximum commonalities among queries and on rewriting queries using views. These commonalities are used to define candidate views for materialization, from which an optimal or near-optimal set can be chosen as a solution to the view selection problem. Using a search space constructed this way, we address a specific instance of the view selection problem that aims at minimizing the view maintenance cost of multiple materialized views using multi-query optimization techniques. Further, we study this same problem in the context of a commercial database management system in the presence of memory and time restrictions. We also suggest a heuristic approach for maintaining the views while guaranteeing that the restrictions are satisfied. Finally, we consider a dynamic version of the view selection problem where the workload is a sequence of query and update statements. In this case, the views can be created (materialized) and dropped during the execution of the workload. We have implemented our approaches to the dynamic view selection problem and performed extensive experimental testing. Our experiments show that our approaches perform in most cases better than previous ones in terms of effectiveness and efficiency.

    Report on the 6th ADBIS’2002 conference

    The 6th East European Conference ADBIS 2002 was held on September 8-11, 2002 in Bratislava, Slovakia. It was organised by the Slovak University of Technology (and, in particular, its Faculty of Electrical Engineering and Information Technology) in Bratislava in co-operation with the ACM SIGMOD, the Moscow ACM SIGMOD Chapter, and the Slovak Society for Computer Science. The call for papers attracted 115 submissions from 35 countries. The international program committee, consisting of 43 researchers from 21 countries, selected 25 full papers and 4 short papers for a monograph volume published by Springer-Verlag. Besides those 29 regular papers, the volume also includes 3 invited papers presented at the conference as invited lectures. Additionally, 20 papers were selected for the Research Communications volume. The authors of accepted papers come from 22 countries on 4 continents, indicating the truly international recognition of the ADBIS conference series. The conference had 104 registered participants from 22 countries and included invited lectures, tutorials, and regular sessions. This report describes the goals of the conference and summarizes the issues discussed during the sessions.

    Flexible Integration and Efficient Analysis of Multidimensional Datasets from the Web

    If numeric data from the Web are brought together, natural scientists can compare climate measurements with estimations, financial analysts can evaluate companies based on balance sheets and daily stock market values, and citizens can explore the GDP per capita from several data sources. However, the heterogeneity and size of the data remain a problem. This work presents methods to query a uniform view, the Global Cube, of available datasets from the Web and builds on Linked Data query approaches.

    RETAIL DATA ANALYTICS USING GRAPH DATABASE

    Big data is an area focused on storing, processing, and visualizing huge amounts of data. Today data is growing faster than ever before. We need to find the right tools and applications and build an environment that can help us obtain valuable insights from the data. Retail is one of the domains that collects huge amounts of transaction data every day. Retailers need to understand their customers' purchasing patterns and behavior in order to make better business decisions. Market basket analysis is a field of data mining that focuses on discovering patterns in retail transaction data. Our goal is to find tools and applications that retailers can use to quickly understand their data and make better business decisions. Due to the amount and complexity of the data, it is not possible to do such activities manually. Trends change very quickly, and retailers want to adapt to change and take action quickly. This requires automating processes and using algorithms that are efficient and fast. In our work, we mine transaction data by modeling the data as graphs. We use clustering algorithms to discover communities (clusters) in the data and then use the clusters to build a recommendation system that can recommend products to customers based on their buying behavior.
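
    The pipeline of modeling transactions as a graph, clustering it, and recommending from the clusters can be sketched in Python as follows. The abstract does not name the clustering algorithm or a graph database, so this sketch assumes plain in-memory dictionaries and uses connected components of a thresholded co-purchase graph as a stand-in for community detection; the baskets and the `min_count` threshold are made up.

```python
from collections import defaultdict
from itertools import combinations

def co_purchase_graph(baskets, min_count=2):
    """Link two products if they were bought together at least min_count times."""
    weight = defaultdict(int)
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            weight[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), w in weight.items():
        if w >= min_count:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def clusters(graph):
    """Connected components, used here as a simple substitute for community detection."""
    seen, result = set(), []
    for start in list(graph):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(graph[node] - comp)
        seen |= comp
        result.append(comp)
    return result

def recommend(product, groups):
    """Suggest the other products in the same cluster, or nothing if the product is unknown."""
    for g in groups:
        if product in g:
            return sorted(g - {product})
    return []

# Hypothetical transaction data.
baskets = [
    ["bread", "butter"], ["bread", "butter", "jam"], ["butter", "jam"],
    ["beer", "chips"], ["beer", "chips"], ["bread", "beer"],
]
groups = clusters(co_purchase_graph(baskets))
```

    A real system would likely weight edges and use a dedicated community detection method, but the flow from baskets to graph to clusters to recommendations stays the same.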
