209 research outputs found

    A unified view of data-intensive flows in business intelligence systems: a survey

    Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet the complex requirements of next-generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time, operational data flows that integrate source data at runtime. Both academia and industry thus need a clear understanding of the foundations of data-intensive flows and of the challenges of moving towards next-generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next-generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, identifying the challenges that remain to be addressed and showing how current solutions can be applied to address them.
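
    The combination the survey describes, one flow definition serving both the batched ETL path and the runtime (operational) path, can be made concrete with a small sketch. This is purely illustrative; the survey proposes dimensions and an architecture, not an API, and all names below are invented.

    ```python
    # A minimal sketch of a "data-intensive flow" that can run either as a
    # batched ETL job or as an incremental (near real-time) feed.
    from dataclasses import dataclass
    from typing import Callable, Iterable

    Record = dict

    @dataclass
    class Flow:
        extract: Callable[[], Iterable[Record]]   # pull from a source
        transform: Callable[[Record], Record]     # shape into analysis-ready form
        load: Callable[[Record], None]            # write to the target store

        def run_batch(self) -> None:
            # traditional DW population: pull everything, transform, load
            for rec in self.extract():
                self.load(self.transform(rec))

        def run_incremental(self, changes: Iterable[Record]) -> None:
            # same transform/load logic applied to a change stream at runtime
            for rec in changes:
                self.load(self.transform(rec))

    warehouse = []
    flow = Flow(
        extract=lambda: [{"sale_id": 1, "amount": "19.90"}],
        transform=lambda r: {**r, "amount": float(r["amount"])},
        load=warehouse.append,
    )
    flow.run_batch()                                          # batched ETL path
    flow.run_incremental([{"sale_id": 2, "amount": "5.00"}])  # operational path
    print(warehouse)
    ```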

    Improvement of data quality with timeliness in information manufacturing system (IMS)

    Nowadays, in the digital world, organizations and enterprises such as banks, hospitals, telecommunications companies and retail shops have an information manufacturing system (IMS) for storing the organization’s data in digital format. Every day, a large quantity of data is manipulated (inserted, deleted and updated) in the information manufacturing systems of those enterprises or organizations. To be successful, the IMS must maintain the data and transform it into useful information for decision makers or users. Much of its value rests in the quality of the data, which may be divided into two classes: objective and time-related. Maintaining quality in both classes may require the completeness, accuracy and consistency of the data, as well as the timeliness of the information generation. As a further complication, the objective data quality class may not be independent: it can depend on the timeliness of the time-related class. The main purpose of this research is the improvement of data quality with timeliness in an IMS. This starts with observing the reasons for the change of objective data quality over time, using both theoretical and experimental data quality measurements. Novel approaches to ensuring the best possible information quality are developed and evaluated by observing how objective data quality changes with timeliness in a purpose-built IMS.
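
    The dependency between the objective class and timeliness can be made concrete with the timeliness metric from the information-manufacturing literature (Ballou et al.), whose IMS terminology this abstract follows. The sketch below assumes that formulation; the research’s own measurements may differ.

    ```python
    # timeliness = max(0, 1 - currency / volatility) ** s
    # where currency is the elapsed age of the data, volatility is its
    # shelf-life, and s tunes how sharply value decays with age.

    def currency(delivery_time: float, input_time: float, age: float = 0.0) -> float:
        """Elapsed time since the data was created, in consistent units (e.g. hours)."""
        return (delivery_time - input_time) + age

    def timeliness(currency: float, volatility: float, s: float = 1.0) -> float:
        """1.0 = perfectly fresh; 0.0 = past its useful shelf-life."""
        return max(0.0, 1.0 - currency / volatility) ** s

    # A record captured 6 hours ago, with a 24-hour shelf-life:
    c = currency(delivery_time=30.0, input_time=24.0)
    print(timeliness(c, volatility=24.0))  # 0.75
    ```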

    Building Data Warehouse and Dashboard of Church Congregation Data

    A data warehouse is essential for an organization to process and analyze the data it produces. Hence, a data warehouse, together with a dashboard to visualize the processed data, is built to accommodate the church administrator’s need to analyze a large set of church congregation data. The data warehouse is built following the Kimball principle, which emphasizes the implementation of a dimensional model in the data warehouse rather than the relational model used in a regular transactional database. An ETL (extract, transform, load) process retrieves all data from the regular transactional database and transforms it so that it can be loaded into the data warehouse. A dashboard is then built to visualize the data from the data warehouse so that users can view the processed data easily. Users can also export the processed data into an Excel file that can be downloaded from the dashboard. A web service is built to fetch data from the data warehouse and return it to the dashboard.
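
    As a rough illustration of the pipeline the abstract outlines, the sketch below loads a toy star schema (one dimension, one fact table) and runs the kind of aggregate query a dashboard web service would serve. Table and column names are invented, not taken from the paper.

    ```python
    # Kimball-style flow in miniature: extract rows from a transactional
    # store, resolve surrogate keys, load a star schema, query it.
    import sqlite3

    dw = sqlite3.connect(":memory:")
    dw.executescript("""
        CREATE TABLE dim_member (member_key INTEGER PRIMARY KEY, name TEXT, district TEXT);
        CREATE TABLE fact_attendance (member_key INTEGER, service_date TEXT, attended INTEGER);
    """)

    # Extract: rows as they might come from the transactional database.
    source_rows = [("Ana", "North", "2024-01-07", True),
                   ("Ben", "South", "2024-01-07", False)]

    # Transform + load: assign each member a surrogate key, then insert the fact.
    for name, district, service_date, attended in source_rows:
        cur = dw.execute("INSERT INTO dim_member (name, district) VALUES (?, ?)",
                         (name, district))
        dw.execute("INSERT INTO fact_attendance VALUES (?, ?, ?)",
                   (cur.lastrowid, service_date, int(attended)))

    # The dashboard's web service would run aggregate queries like this one:
    for row in dw.execute("""SELECT d.district, SUM(f.attended)
                             FROM fact_attendance f JOIN dim_member d USING (member_key)
                             GROUP BY d.district"""):
        print(row)
    ```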

    Updating Data Warehouses with Temporal Data

    There has been a growing trend to use temporal data in a data warehouse for making strategic and tactical decisions. The key idea of temporal data management is to make data available at the right time, with different time intervals. Storing temporal data enables this by making all the different time slices of the data available to whoever needs them, so users with different data latency needs can all be accommodated. Data can be “frozen” via a view on the proper time slice. Data as of a point in time can be obtained across multiple tables or multiple subject areas, resolving consistency and synchronization issues. This paper discusses implementation issues such as temporal data updates, the coexistence of loads and queries against the same table, the performance of load and report queries, and the maintenance of views against tables with temporal data.
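
    The “frozen view on the proper time slice” idea can be sketched with a row-versioned table carrying half-open validity intervals. The schema and names below are illustrative, not the paper’s.

    ```python
    # Each row carries a [valid_from, valid_to) interval; a view freezes a
    # consistent snapshot while loads keep appending new versions.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE customer_t (
                      id INTEGER, tier TEXT,
                      valid_from TEXT, valid_to TEXT)""")

    # Two versions of customer 1: upgraded to 'gold' on 2024-03-01.
    db.executemany("INSERT INTO customer_t VALUES (?, ?, ?, ?)", [
        (1, "silver", "2024-01-01", "2024-03-01"),
        (1, "gold",   "2024-03-01", "9999-12-31"),
    ])

    # "Freezing" a slice: a view over the proper time interval, so reports
    # see data as of 2024-02-15 regardless of later loads.
    db.execute("""CREATE VIEW customer_asof_feb AS
                  SELECT id, tier FROM customer_t
                  WHERE valid_from <= '2024-02-15' AND '2024-02-15' < valid_to""")

    print(db.execute("SELECT * FROM customer_asof_feb").fetchall())  # [(1, 'silver')]
    ```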

    Striving towards Near Real-Time Data Integration for Data Warehouses

    The amount of information available to large-scale enterprises is growing rapidly. While operational systems are designed to meet well-specified (short) response time requirements, the focus of data warehouses is generally the strategic analysis of business data integrated from heterogeneous source systems. The decision-making process in traditional data warehouse environments is often delayed because data cannot be propagated from the source systems to the data warehouse in time. A real-time data warehouse aims to decrease the time it takes to make business decisions and tries to attain zero latency between the cause and effect of a business decision. In this paper we present an architecture of an ETL environment for real-time data warehouses which supports continual near real-time data propagation. The architecture takes full advantage of existing J2EE (Java 2 Platform, Enterprise Edition) technology and enables the implementation of a distributed, scalable, near real-time ETL environment. Instead of using vendor-proprietary ETL (extraction, transformation, loading) solutions, which are often hard to scale and often do not support optimization of the time frames allocated for data extracts, we propose ETLets (pronounced “et-lets”) and Enterprise JavaBeans (EJB) for the ETL processing tasks.
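
    The ETLets themselves are J2EE components; as a language-neutral sketch of the continual near real-time propagation they implement, the loop below drains a change feed in small allocated time frames (micro-batches). All names are invented.

    ```python
    # Continual micro-batch propagation: collect whatever changes arrive
    # within each small time window, then transform and load them as a unit.
    import queue, threading, time

    changes: queue.Queue = queue.Queue()   # stands in for the source's change feed
    warehouse = []

    def etl_worker(stop: threading.Event, window: float = 0.5) -> None:
        """Drain the feed once per time window and load the micro-batch."""
        while not stop.is_set():
            time.sleep(window)             # the allocated time frame for one extract
            batch = []
            while not changes.empty():
                batch.append(changes.get())
            # transform + load the micro-batch as a single unit
            warehouse.extend({**rec, "loaded_at": time.time()} for rec in batch)

    stop = threading.Event()
    worker = threading.Thread(target=etl_worker, args=(stop,))
    worker.start()
    changes.put({"order_id": 7, "total": 42.0})   # a change lands in the feed...
    time.sleep(1.0)                               # ...and is propagated within ~0.5 s
    stop.set()
    worker.join()
    print(warehouse)
    ```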

    The refreshment process in data warehouse systems: guidelines for conceptual modelling of the data extraction task

    Data Warehouse Systems (DWS) have become very popular in recent years for decision making, integrating data from internal and external sources into data warehouse repositories. As time advances and the sources from which warehouse data is integrated change, the data warehouse contents must be regularly refreshed, so that warehouse data reflects the state of the underlying data sources. This dissertation proposes an approach whose main goals are to make the data warehouse refreshment problem explicit and documented, and to present guidelines for the conceptual modelling of data extraction that enrich the subsequent design steps towards a formal specification of the refreshment process. The contributions are twofold. First, it provides a detailed outline of the data warehouse refreshment problem, including the main concepts and issues that characterize the general domain of DWS, such as decision-support functionalities, data source integration approaches and architecture, the tasks and constraints that make up the refreshment process, and the main approaches available in the literature. Second, it proposes guidelines for UML-based conceptual modelling of data extraction, giving the sequence of steps a designer should follow and providing the modelling constructs for representing the data extracted from the sources, according to the rules that isolate and extract the data relevant for decision making.
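
    The extraction task being modelled, applying declarative rules to isolate the decision-relevant source data that changed since the last refreshment, might look like the following sketch. The rule and field names are hypothetical; the dissertation specifies such rules in UML, not code.

    ```python
    # One refreshment cycle's extraction step: keep only rows that satisfy
    # the relevance rule and were modified since the last refreshment.
    from datetime import date

    last_refresh = date(2024, 1, 1)

    def relevant(row: dict) -> bool:
        # the kind of rule the conceptual model would capture for the designer
        return row["status"] == "active" and row["modified"] > last_refresh

    source = [
        {"id": 1, "status": "active",   "modified": date(2024, 2, 3)},
        {"id": 2, "status": "inactive", "modified": date(2024, 2, 4)},
        {"id": 3, "status": "active",   "modified": date(2023, 12, 9)},
    ]
    extracted = [row for row in source if relevant(row)]
    print(extracted)   # only row 1 qualifies for this refreshment cycle
    ```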

    Data freshness and data accuracy: a state of the art

    In the context of Data Integration Systems (DIS) that provide access to large amounts of data extracted and integrated from autonomous data sources, users are highly concerned about data quality. Traditionally, data quality is characterized via multiple quality factors. Among the quality dimensions proposed in the literature, this report analyzes two main ones: data freshness and data accuracy. Concretely, we analyze the various definitions of both quality dimensions, their underlying metrics, and the features of a DIS that impact their evaluation. We present a taxonomy of existing work on both quality dimensions in several kinds of DIS, and we discuss open research problems.
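
    To make the two dimensions concrete: freshness of an integrated result is often evaluated by propagating source currencies through the DIS (below, a max-plus-delay rule, one common choice among those surveyed, not the only one), while accuracy is often estimated as the fraction of correct values. Both functions are illustrative.

    ```python
    # Freshness propagation: the integrated result is only as fresh as its
    # stalest source, plus the cost of the integration step itself.
    def result_currency(source_currencies: list[float], processing_delay: float) -> float:
        return max(source_currencies) + processing_delay

    # Accuracy as the ratio of correct values to total values.
    def accuracy(correct_values: int, total_values: int) -> float:
        return correct_values / total_values

    print(result_currency([2.0, 12.0, 5.0], processing_delay=1.5))  # 13.5 hours
    print(accuracy(980, 1000))                                      # 0.98
    ```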