2 research outputs found

    Captura de dados em tempo real em sistemas de data warehousing

    Master's dissertation in Informatics Engineering. The widespread adoption of information systems has contributed significantly to the way users interact with companies and their systems. This new relationship between customer and supplier has significantly increased the volume of data generated by organizations, creating new needs regarding how to maintain and manage all this information. As a result, companies have increasingly invested in solutions that keep all their information processed and consolidated in a single data repository. These systems are commonly called data warehousing systems. Traditionally, they are refreshed offline, at intervals that may be daily or weekly. However, growing competitiveness in the business world makes this kind of refresh inadequate, because it delays the reaction to the action that triggered the information. Long refresh periods leave the information out of date, which reduces its importance and value to the organization. It is therefore increasingly necessary for the information stored in a data warehousing system to be as recent as possible, while avoiding interruptions in its availability. The need to obtain information in real time poses several challenges, such as keeping data accessible 24 hours a day, 7 days a week, 365 days a year, reducing data latency, and avoiding operational bottlenecks in transactional systems. It is thus imperative to use non-intrusive data collection techniques that act at the moment a given event occurs in an operational system and reflect its information immediately (or almost immediately) in the data warehousing system.
This dissertation studies the problems related to real-time data capture and designs a component capable of supporting a universal real-time data extraction system, one that captures the changes occurring in transactional systems in a non-intrusive way and communicates them to the data warehousing system at the right time.

    Pragmatic development of service based real-time change data capture

    This thesis makes a contribution to the Change Data Capture (CDC) field by providing an empirical evaluation of the performance of CDC architectures in the context of real-time data warehousing. CDC is a mechanism for providing data warehouse architectures with fresh data from Online Transaction Processing (OLTP) databases. There are two types of CDC architectures: pull architectures and push architectures. There is little data on the performance of CDC architectures in a real-time environment, and such performance data is required to determine the real-time viability of the two architectures. We propose that push CDC architectures are optimal for real-time CDC. However, push CDC architectures are seldom implemented because they are highly intrusive towards existing systems and arduous to maintain. As part of our contribution, we pragmatically develop a service-based push CDC solution, which addresses the issues of intrusiveness and maintainability. Our solution uses Data Access Services (DAS) to decouple CDC logic from the applications. A requirement for the DAS is to place minimal overhead on a transaction in an OLTP environment. We synthesize DAS literature and pragmatically develop DAS that efficiently execute transactions in an OLTP environment. Essentially, we develop efficient RESTful DAS, which expose Transactions As A Resource (TAAR). We evaluate the TAAR solution and three pull CDC mechanisms in a real-time environment, using the industry-recognised TPC-C benchmark. The optimal CDC mechanism in a real-time environment will capture change data with minimal latency and will have a negligible effect on the database's transactional throughput. Capture latency is the time it takes a CDC mechanism to capture a data change that has been applied to an OLTP database. A standard definition for capture latency and how to measure it does not exist in the field. We create this definition and extend the TPC-C benchmark to make the capture latency measurement.
The results from our evaluation show that pull CDC is capable of real-time CDC at low levels of user concurrency. However, as the level of user concurrency scales upwards, pull CDC has a significant impact on the database's transaction rate, which affirms the theory that pull CDC architectures are not viable in a real-time architecture. TAAR CDC, on the other hand, is capable of real-time CDC and places a minimal overhead on the transaction rate, although this performance comes at the expense of CPU resources.
EThOS - Electronic Theses Online Service, GB, United Kingdom
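To make the notion of capture latency in the abstract above more concrete, the following is a minimal sketch of a pull (polling) CDC loop that reports, for each captured row, the time between the change being applied and the change being captured. It is not the thesis's TAAR solution, its TPC-C extension, or its formal definition of capture latency. The `orders` table, the `committed_at` column, and the functions `make_db`, `apply_change`, and `pull_changes` are all hypothetical names chosen for illustration, and an in-memory SQLite database stands in for the OLTP system.

```python
import sqlite3
import time


def make_db():
    """Create a hypothetical in-memory 'OLTP' database with one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, amount REAL, committed_at REAL)"
    )
    return conn


def apply_change(conn, amount, clock=time.monotonic):
    """Apply one change and store the time at which it was applied.

    Assumption: the timestamp is taken just before the insert, so it only
    approximates the true commit time. Storing it in the source table is
    itself a form of intrusion on the transactional schema.
    """
    committed_at = clock()
    cur = conn.execute(
        "INSERT INTO orders (amount, committed_at) VALUES (?, ?)",
        (amount, committed_at),
    )
    conn.commit()
    return cur.lastrowid


def pull_changes(conn, last_seen_id, clock=time.monotonic):
    """Poll for rows newer than last_seen_id.

    Returns (new_last_seen_id, [(row_id, capture_latency_seconds), ...]).
    """
    rows = conn.execute(
        "SELECT id, committed_at FROM orders WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    captured_at = clock()
    captured = [(row_id, captured_at - ts) for row_id, ts in rows]
    new_last = rows[-1][0] if rows else last_seen_id
    return new_last, captured


if __name__ == "__main__":
    conn = make_db()
    apply_change(conn, 10.0)
    time.sleep(0.05)  # simulated polling interval
    last_id, captured = pull_changes(conn, 0)
    for row_id, latency in captured:
        print(f"row {row_id} captured after {latency:.3f}s")
```

In a sketch like this, the measured latency of a pull mechanism is bounded below by how often the poll runs, and every poll issues a query against the source database. That is one plausible reading of why the abstract reports pull CDC affecting the transaction rate as concurrency grows, though the thesis's own measurements are what support that claim.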