416 research outputs found

    Efficient Incremental View Maintenance for Data Warehousing

    Get PDF
    Data warehousing and on-line analytical processing (OLAP) are essential elements for decision support applications. Since most OLAP queries are complex and are often executed over huge volumes of data, the solution in practice is to employ materialized views to improve query performance. One important issue for utilizing materialized views is to maintain the view consistency upon source changes. However, most prior work focused on simple SQL views with distributive aggregate functions, such as SUM and COUNT. This dissertation proposes to consider broader types of views than previous work. First, we study views with complex aggregate functions such as variance and regression. Such statistical functions are of great importance in practice. We propose a workarea function model and design a generic framework to tackle incremental view maintenance and answering queries using views for such functions. We have implemented this approach in a prototype system of IBM DB2. An extensive performance study shows significant performance gains by our techniques. Second, we consider materialized views with PIVOT and UNPIVOT operators. Such operators are widely used for OLAP applications and for querying sparse datasets. We demonstrate that the efficient maintenance of views with PIVOT and UNPIVOT operators requires more generalized operators, called GPIVOT and GUNPIVOT. We formally define and prove the query rewriting rules and propagation rules for such operators. We also design a novel view maintenance framework for applying these rules to obtain an efficient maintenance plan. Extensive performance evaluations reveal the effectiveness of our techniques. Third, materialized views are often integrated from multiple data sources. Due to source autonomicity and dynamicity, concurrency may occur during view maintenance. We propose a generic concurrency control framework to solve such maintenance anomalies. This solution extends previous work in that it solves the anomalies under both source data and schema changes and thus achieves full source autonomicity. We have implemented this technique in a data warehouse prototype developed at WPI. The extensive performance study shows that our techniques put little extra overhead on existing concurrent data update processing techniques while allowing for this new functionality

    A Design of Intelligent Pre-fetching Materialized Views Mechanism for Enhancing Summary Queries on Data Warehouses

    Get PDF
    To build up a materialized view that perfectly satisfies the need of the specific enterprise it serves is now the biggest challenge especially when it comes to larger and larger scale enterprises as well as more and more complicated and yet necessary socio-economical information. In this paper, we shall develop an Intelligent Materialized VIews Pre-fetching mechanism, also known as an IMVIP, from the characteristics of affinity grouping so as to enhance the efficiency of summary data warehouse querying. The IMVIP mechanism consists of the following two methods: the Apriori-Model association method and the Linear Structure Relation. The Apriori-Model association method explores and deduces the combination of the relations among individual user session. It is especially suitable for applications where the combinations of the relations are to be explored among multi-objective queries made by more than one decision maker. On the other hand, the Linear Structure Relation Model develops a set of principles as to the explorations into the deduced relation combination above with an aim to constructing a series of causal-effect association rules. Thus, we can not only pre-fetch and materialize views that really satisfy the needs of the decision makers so as to enhance the efficiency of summary data warehouse queries but also build up intelligent query paths according to the cause-and-effect association rules in order to attain the goal of providing helpful suggestions for decision-making

    View maintenance and change notification for application program views

    Full text link

    Data Quality Management: Trade-offs in Data Characteristics to Maintain Data Quality

    Get PDF
    We are living in an age of information in which organizations are crumbling under the pressure of exponentially growing data. Increased data quality ensures better decision making, thereby enabling companies to stay competitive in the market. To improve data quality, it is imperative to identify all the characteristics that describe data. And, building on one characteristic results in compromising another, creating a trade-off. There are many well established and interesting theories regarding data quality and data characteristics. However, we found that there is a lack of research and literature regarding how trade-offs are handled between the different types of data that is stored by an organization. To understand how organisations deal with trade-offs, we chose a framework formulated by Eppler, where various data characteristics trade-offs are discussed. After a pre-study with experts in this field, we narrowed it down to three main data characteristic trade-offs and these were further analysed through interviews. Based on the interviews conducted and the literature review, we could prioritize data types under different data characteristics. This research gives insight to how data characteristics trade-offs should be accomplished in organizations

    Automatic physical database design : recommending materialized views

    Get PDF
    This work discusses physical database design while focusing on the problem of selecting materialized views for improving the performance of a database system. We first address the satisfiability and implication problems for mixed arithmetic constraints. The results are used to support the construction of a search space for view selection problems. We proposed an approach for constructing a search space based on identifying maximum commonalities among queries and on rewriting queries using views. These commonalities are used to define candidate views for materialization from which an optimal or near-optimal set can be chosen as a solution to the view selection problem. Using a search space constructed this way, we address a specific instance of the view selection problem that aims at minimizing the view maintenance cost of multiple materialized views using multi-query optimization techniques. Further, we study this same problem in the context of a commercial database management system in the presence of memory and time restrictions. We also suggest a heuristic approach for maintaining the views while guaranteeing that the restrictions are satisfied. Finally, we consider a dynamic version of the view selection problem where the workload is a sequence of query and update statements. In this case, the views can be created (materialized) and dropped during the execution of the workload. We have implemented our approaches to the dynamic view selection problem and performed extensive experimental testing. Our experiments show that our approaches perform in most cases better than previous ones in terms of effectiveness and efficiency

    Incremental View Maintenance For Collection Programming

    Get PDF
    In the context of incremental view maintenance (IVM), delta query derivation is an essential technique for speeding up the processing of large, dynamic datasets. The goal is to generate delta queries that, given a small change in the input, can update the materialized view more efficiently than via recomputation. In this work we propose the first solution for the efficient incrementalization of positive nested relational calculus (NRC+) on bags (with integer multiplicities). More precisely, we model the cost of NRC+ operators and classify queries as efficiently incrementalizable if their delta has a strictly lower cost than full re-evaluation. Then, we identify IncNRC+; a large fragment of NRC+ that is efficiently incrementalizable and we provide a semantics-preserving translation that takes any NRC+ query to a collection of IncNRC+ queries. Furthermore, we prove that incremental maintenance for NRC+ is within the complexity class NC0 and we showcase how recursive IVM, a technique that has provided significant speedups over traditional IVM in the case of flat queries [25], can also be applied to IncNRC+.Comment: 24 pages (12 pages plus appendix

    The impact of microservices: an empirical analysis of the emerging software architecture

    Get PDF
    Dissertação de mestrado em Informatics EngineeringThe applications’ development paradigm has faced changes in recent years, with modern development being characterized by the need to continuously deliver new software iterations. With great affinity with those principles, microservices is a software architecture which features characteristics that potentially promote multiple quality attributes often required by modern, large-scale applications. Its recent growth in popularity and acceptance in the industry made this architectural style often described as a form of modernizing applications that allegedly solves all the traditional monolithic applications’ inconveniences. However, there are multiple worth mentioning costs associated with its adoption, which seem to be very vaguely described in existing empirical research, being often summarized as "the complexity of a distributed system". The adoption of microservices provides the agility to achieve its promised benefits, but to actually reach them, several key implementation principles have to be honored. Given that it is still a fairly recent approach to developing applications, the lack of established principles and knowledge from development teams results in the misjudgment of both costs and values of this architectural style. The outcome is often implementations that conflict with its promised benefits. In order to implement a microservices-based architecture that achieves its alleged benefits, there are multiple patterns and methodologies involved that add a considerable amount of complexity. To evaluate its impact in a concrete and empirical way, one same e-commerce platform was developed from scratch following a monolithic architectural style and two architectural patterns based on microservices, featuring distinct inter-service communication and data management mechanisms. The effort involved in dealing with eventual consistency, maintaining a communication infrastructure, and managing data in a distributed way portrayed significant overheads not existent in the development of traditional applications. Nonetheless, migrating from a monolithic architecture to a microservicesbased is currently accepted as the modern way of developing software and this ideology is not often contested, nor the involved technical challenges are appropriately emphasized. Sometimes considered over-engineering, other times necessary, this dissertation contributes with empirical data from insights that showcase the impact of the migration to microservices in several topics. From the trade-offs associated with the use of specific patterns, the development of the functionalities in a distributed way, and the processes to assure a variety of quality attributes, to performance benchmarks experiments and the use of observability techniques, the entire development process is described and constitutes the object of study of this dissertation.O paradigma de desenvolvimento de aplicações tem visto alterações nos últimos anos, sendo o desenvolvimento moderno caracterizado pela necessidade de entrega contínua de novas iterações de software. Com grande afinidade com esses princípios, microsserviços são uma arquitetura de software que conta com características que potencialmente promovem múltiplos atributos de qualidade frequentemente requisitados por aplicações modernas de grandes dimensões. O seu recente crescimento em popularidade e aceitação na industria fez com que este estilo arquitetural se comumente descrito como uma forma de modernizar aplicações que alegadamente resolve todos os inconvenientes apresentados por aplicações monolíticas tradicionais. Contudo, existem vários custos associados à sua adoção, aparentemente descritos de forma muito vaga, frequentemente sumarizados como a "complexidade de um sistema distribuído". A adoção de microsserviços fornece a agilidade para atingir os seus benefícios prometidos, mas para os alcançar, vários princípios de implementação devem ser honrados. Dado que ainda se trata de uma forma recente de desenvolver aplicações, a falta de princípios estabelecidos e conhecimento por parte das equipas de desenvolvimento resulta em julgamentos errados dos custos e valores deste estilo arquitetural. O resultado geralmente são implementações que entram em conflito com os seus benefícios prometidos. De modo a implementar uma arquitetura baseada em microsserviços com os benefícios prometidos existem múltiplos padrões que adicionam considerável complexidade. De modo a avaliar o impacto dos microsserviços de forma concreta e empírica, foi desenvolvida uma mesma plataforma e-commerce de raiz segundo uma arquitetura monolítica e duas arquitetura baseadas em microsserviços, contando com diferentes mecanismos de comunicação entre os serviços. O esforço envolvido em lidar com consistência eventual, manter a infraestrutura de comunicação e gerir os dados de uma forma distribuída representaram desafios não existentes no desenvolvimento de aplicações tradicionais. Apesar disso, a ideologia de migração de uma arquitetura monolítica para uma baseada em microsserviços é atualmente aceite como a forma moderna de desenvolver aplicações, não sendo frequentemente contestada nem os seus desafios técnicos são apropriadamente enfatizados. Por vezes considerado overengineering, outras vezes necessário, a presente dissertação visa contribuir com dados práticos relativamente ao impacto da migração para arquiteturas baseadas em microsserviços em diversos tópicos. Desde os trade-offs envolvidos no uso de padrões específicos, o desenvolvimento das funcionalidades de uma forma distribuída e nos processos para assegurar uma variedade de atributos de qualidade, até análise de benchmarks de performance e uso de técnicas de observabilidade, todo o desenvolvimento é descrito e constitui o objeto de estudo da dissertação

    Self Maintenance of Materialized XQuery Views via Query Containment and Re-Writing

    Get PDF
    In recent years XML, the eXtensible Markup Language has become the de-facto standard for publishing and exchanging information on the web and in enterprise data integration systems. Materialized views are often used in information integration systems to present a unified schema for efficient querying of distributed and possibly heterogenous data sources. On similar lines, ACE-XQ, an XQuery based semantic caching system shows the significant performance gains achieved by caching query results (as materialized views) and using these materialized views along with query containment techniques for answering future queries over distributed XML data sources. To keep data in these materialized views of ACE-XQ up-to-date, the view must be maintained i.e. whenever the base data changes, the corresponding cached data in the materialized view must also be updated. This thesis builds on the query containment ideas of ACE-XQ and proposes an efficient approach for self-maintenance of materialized views. Our experimental results illustrate the significant performance improvement achieved by this strategy over view re-computation for a variety of situations

    Scalable Automated Incrementalization for Real-Time Static Analyses

    Get PDF
    This thesis proposes a framework for easy development of static analyses, whose results are incrementalized to provide instantaneous feedback in an integrated development environment (IDE). Today, IDEs feature many tools that have static analyses as their foundation to assess software quality and catch correctness problems. Yet, these tools often fail to provide instantaneous feedback and are thus restricted to nightly build processes. This precludes developers from fixing issues at their inception time, i.e., when the problem and the developed solution are both still fresh in mind. In order to provide instantaneous feedback, incrementalization is a well-known technique that utilizes the fact that developers make only small changes to the code and, hence, analysis results can be re-computed fast based on these changes. Yet, incrementalization requires carefully crafted static analyses. Thus, a manual approach to incrementalization is unattractive. Automated incrementalization can alleviate these problems and allows analyses writers to formulate their analyses as queries with the full data set in mind, without worrying over the semantics of incremental changes. Existing approaches to automated incrementalization utilize standard technologies, such as deductive databases, that provide declarative query languages, yet also require to materialize the full dataset in main-memory, i.e., the memory is permanently blocked by the data required for the analyses. Other standard technologies such as relational databases offer better scalability due to persistence, yet require large transaction times for data. Both technologies are not a perfect match for integrating static analyses into an IDE, since the underlying data, i.e., the code base, is already persisted and managed by the IDE. Hence, transitioning the data into a database is redundant work. In this thesis a novel approach is proposed that provides a declarative query language and automated incrementalization, yet retains in memory only a necessary minimum of data, i.e., only the data that is required for the incrementalization. The approach allows to declare static analyses as incrementally maintained views, where the underlying formalism for incrementalization is the relational algebra with extensions for object-orientation and recursion. The algebra allows to deduce which data is the necessary minimum for incremental maintenance and indeed shows that many views are self-maintainable, i.e., do not require to materialize memory at all. In addition an optimization for the algebra is proposed that allows to widen the range of self-maintainable views, based on domain knowledge of the underlying data. The optimization works similar to declaring primary keys for databases, i.e., the optimization is declared on the schema of the data, and defines which data is incrementally maintained in the same scope. The scope makes all analyses (views) that correlate only data within the boundaries of the scope self-maintainable. The approach is implemented as an embedded domain specific language in a general-purpose programming language. The implementation can be understood as a database-like engine with an SQL-style query language and the execution semantics of the relational algebra. As such the system is a general purpose database-like query engine and can be used to incrementalize other domains than static analyses. To evaluate the approach a large variety of static analyses were sampled from real-world tools and formulated as incrementally maintained views in the implemented engine

    A abordagem POESIA para a integração de dados e serviços na Web semantica

    Get PDF
    Orientador: Claudia Bauzer MedeirosTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: POESIA (Processes for Open-Ended Systems for lnformation Analysis), a abordagem proposta neste trabalho, visa a construção de processos complexos envolvendo integração e análise de dados de diversas fontes, particularmente em aplicações científicas. A abordagem é centrada em dois tipos de mecanismos da Web semântica: workflows científicos, para especificar e compor serviços Web; e ontologias de domínio, para viabilizar a interoperabilidade e o gerenciamento semânticos dos dados e processos. As principais contribuições desta tese são: (i) um arcabouço teórico para a descrição, localização e composição de dados e serviços na Web, com regras para verificar a consistência semântica de composições desses recursos; (ii) métodos baseados em ontologias de domínio para auxiliar a integração de dados e estimar a proveniência de dados em processos cooperativos na Web; (iii) implementação e validação parcial das propostas, em urna aplicação real no domínio de planejamento agrícola, analisando os benefícios e as limitações de eficiência e escalabilidade da tecnologia atual da Web semântica, face a grandes volumes de dadosAbstract: POESIA (Processes for Open-Ended Systems for Information Analysis), the approach proposed in this work, supports the construction of complex processes that involve the integration and analysis of data from several sources, particularly in scientific applications. This approach is centered in two types of semantic Web mechanisms: scientific workflows, to specify and compose Web services; and domain ontologies, to enable semantic interoperability and management of data and processes. The main contributions of this thesis are: (i) a theoretical framework to describe, discover and compose data and services on the Web, inc1uding mIes to check the semantic consistency of resource compositions; (ii) ontology-based methods to help data integration and estimate data provenance in cooperative processes on the Web; (iii) partial implementation and validation of the proposal, in a real application for the domain of agricultural planning, analyzing the benefits and scalability problems of the current semantic Web technology, when faced with large volumes of dataDoutoradoCiência da ComputaçãoDoutor em Ciência da Computaçã
    corecore