3 research outputs found

    Container-Managed ETL Applications for Integrating Data in Near Real-Time

    As the analytical capabilities and applications of e-business systems expand, providing real-time access to critical business performance indicators has become crucial to improving the speed and effectiveness of business operations. Monitoring business activities requires focused yet incremental enterprise application integration (EAI) efforts, and it requires balancing real-time information needs against historical perspectives. Decision making in traditional data warehouse environments is often delayed because data cannot be propagated from the source system to the data warehouse in a timely manner. In this paper, we present an architecture for a container-based ETL (extraction, transformation, loading) environment that supports continual near real-time data integration, with the aim of shortening the time it takes to make business decisions and minimizing the latency between the cause and effect of a business decision. Instead of using vendor-proprietary ETL solutions, we use an ETL container for managing ETLets (pronounced “et-lets”), which carry out the ETL processing tasks. The architecture takes full advantage of existing J2EE (Java 2 Platform, Enterprise Edition) technology and enables the implementation of a distributed, scalable, near real-time ETL environment. We have fully implemented the proposed architecture. Furthermore, we compare the ETL container to alternative continuous data integration approaches.
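
The container idea in this abstract can be illustrated with a minimal Python sketch. It assumes a container that routes each incoming source event to the ETLets registered for that event type, so data reaches the warehouse as events arrive rather than in batches. The class names, method names (`register`, `on_event`, `extract`, `transform`, `load`) and the in-memory `warehouse` list are hypothetical and chosen for illustration; the paper's actual J2EE-based ETLet API is not given in the abstract.

```python
from typing import Dict, List


class ETLet:
    """Hypothetical unit of ETL work (not the paper's actual interface)."""

    def extract(self, event: dict) -> dict:
        # Default: the event payload already is the extracted record.
        return event

    def transform(self, record: dict) -> dict:
        # Default: pass the record through unchanged.
        return record

    def load(self, record: dict, target: List[dict]) -> None:
        # Default: append the record to the target store.
        target.append(record)


class OrderTotalETLet(ETLet):
    """Example ETLet that derives an order total from quantity and unit price."""

    def transform(self, record: dict) -> dict:
        return {**record, "total": record["quantity"] * record["unit_price"]}


class ETLContainer:
    """Routes each incoming source event to the ETLets registered for its type."""

    def __init__(self) -> None:
        self._etlets: Dict[str, List[ETLet]] = {}
        self.warehouse: List[dict] = []  # in-memory stand-in for the data warehouse

    def register(self, event_type: str, etlet: ETLet) -> None:
        self._etlets.setdefault(event_type, []).append(etlet)

    def on_event(self, event_type: str, event: dict) -> int:
        """Process one event as it arrives; return how many ETLets handled it."""
        handlers = self._etlets.get(event_type, [])
        for etlet in handlers:
            record = etlet.transform(etlet.extract(event))
            etlet.load(record, self.warehouse)
        return len(handlers)


container = ETLContainer()
container.register("order_created", OrderTotalETLet())
handled = container.on_event(
    "order_created", {"order_id": 7, "quantity": 3, "unit_price": 5}
)
```

Keeping the routing in the container and the processing logic in small ETLet classes is one way to let new processing tasks be added without touching the event-handling code.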

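    The refreshment process in data warehouse systems: a guide to conceptual modelling of the data extraction task (O processo de refrescamento nos sistemas de data warehouse: guião de modelação conceptual da tarefa de extracção de dados)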

    Data Warehouse Systems (DWS) have in recent years become the most widely used decision support systems in organisations, integrating data from different sources into data warehouse stores. As time advances and the sources from which warehouse data is integrated change, the data warehouse contents must be regularly refreshed so that they reflect the state of the underlying data sources. This dissertation proposes an approach whose main goals are to make explicit and document the data warehouse refreshment problem and to present a guideline for the conceptual modelling of the data extraction task, in order to enrich the subsequent design steps for the formal specification of the refreshment process. The contributions are twofold. First, the dissertation provides a detailed outline of the refreshment problem, including the main concepts and issues that characterise DWS (decision-making functionality, approaches to integrating data sources, and architectural components), the constraints and tasks that make up the refreshment process, and the main approaches available in the literature. Second, it proposes a guideline for UML-based conceptual modelling of the data extraction task, giving the sequence of steps a designer should follow and the modelling constructs for representing the data extracted from the sources, according to the rules that isolate and extract the data relevant to decision making.

    Detecting and Tolerating Byzantine Faults in Database Systems

    This thesis describes the design, implementation, and evaluation of a replication scheme to handle Byzantine faults in transaction processing database systems. The scheme compares the answers to queries and updates from multiple replicas, which are off-the-shelf database systems, to provide a single database that is Byzantine fault tolerant. The scheme works when the replicas are homogeneous, but it also allows heterogeneous replication, in which replicas come from different vendors. Heterogeneous replicas reduce the impact of bugs and security compromises because they are implemented independently and are thus less likely to suffer correlated failures. A final component of the scheme is a repair mechanism that can correct the state of a faulty replica, ensuring the longevity of the scheme. The main challenge in designing a replication scheme for transaction processing systems is ensuring that the replicas' state does not diverge while allowing a high degree of concurrency. We have developed two novel concurrency control protocols, commit barrier scheduling (CBS) and snapshot epoch scheduling (SES), that provide strong consistency and good performance. The two protocols provide different types of consistency: CBS provides single-copy serializability and SES provides single-copy snapshot isolation. We have implemented both protocols in the context of a replicated SQL database. Our implementation has been tested with production versions of several commercial and open source databases as replicas. Our experiments show that a configuration that can tolerate one faulty replica has only a modest performance overhead (about 10-20% for the TPC-C benchmark). Our implementation successfully masks several Byzantine faults observed in practice, and we have used it to find a new bug in MySQL.
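
The answer-comparison step mentioned in this abstract can be sketched in Python as follows. This is a simplified illustration, not the thesis's CBS or SES protocol, which also deal with concurrency. It assumes that at most `f` replicas are faulty, that an answer reported by at least `f + 1` replicas is accepted, and that answers have already been normalised into hashable values (for example, tuples of rows in a canonical order). The function name `vote` is hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def vote(answers: Sequence[Hashable], f: int) -> Tuple[Hashable, List[int]]:
    """Return (agreed answer, indexes of replicas that disagreed with it).

    Assumes at most f replicas are faulty, so any answer reported by at
    least f + 1 replicas was reported by at least one correct replica.
    """
    if not answers:
        raise ValueError("no answers to compare")
    answer, count = Counter(answers).most_common(1)[0]
    if count < f + 1:
        raise RuntimeError("no answer is backed by f + 1 replicas")
    # Replicas whose answer differs are candidates for repair.
    suspects = [i for i, a in enumerate(answers) if a != answer]
    return answer, suspects


# Three replicas answer the same query; the third returns a corrupted balance.
good = (("alice", 10),)
bad = (("alice", 99),)
result, suspects = vote([good, good, bad], f=1)
```

In this usage, `result` is the answer shared by the first two replicas and `suspects` contains index 2, which a repair mechanism like the one the abstract describes could then target.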