
    Quality measures for ETL processes: from goals to implementation

    Extraction-transformation-loading (ETL) processes play an increasingly important role in supporting modern business operations. These processes are centred on artifacts with high variability and diverse lifecycles, which correspond to key business entities. The complexity of these activities has been examined through the prism of business process management, mainly focusing on functional requirements and performance optimization. However, the quality dimension has not yet been thoroughly investigated, and a more human-centric approach is needed to bring ETL processes closer to business users' requirements. In this paper, we take a first step in this direction by defining a sound model of ETL process quality characteristics, with quantitative measures for each characteristic, based on existing literature. Our model shows dependencies among quality characteristics and can provide the basis for subsequent analysis using goal modeling techniques. We showcase the use of goal modeling for ETL process design through a use case, employing a goal model that includes quantitative components (i.e., indicators) to evaluate and analyse alternative design decisions.
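The idea of pairing each quality characteristic with a quantitative indicator and a goal threshold can be sketched as follows. This is an illustrative model only, not the paper's actual formalism; the characteristic names, thresholds, and `depends_on` field are invented for the example.

```python
# Hypothetical sketch of a quality characteristic with a quantitative
# indicator, a target threshold, and dependencies on other characteristics.
from dataclasses import dataclass, field

@dataclass
class QualityCharacteristic:
    name: str
    indicator: float   # measured value, normalised to [0, 1]
    target: float      # goal threshold an ETL design must meet
    depends_on: list = field(default_factory=list)  # characteristics it relies on

    def satisfied(self) -> bool:
        # A goal is met when the measured indicator reaches the target.
        return self.indicator >= self.target

# Example: evaluating one candidate ETL design against two goals.
reliability = QualityCharacteristic("reliability", indicator=0.97, target=0.95)
freshness = QualityCharacteristic("data freshness", indicator=0.80, target=0.90,
                                  depends_on=["reliability"])

for c in (reliability, freshness):
    print(f"{c.name}: {'met' if c.satisfied() else 'not met'}")
```

Comparing alternative designs then amounts to re-measuring the indicators for each alternative and checking which set of goals each one satisfies.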

    A unified view of data-intensive flows in business intelligence systems : a survey

    Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet the complex requirements of next-generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus need a clear understanding of the foundations of data-intensive flows and of the challenges of moving towards next-generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next-generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, identifying the challenges that remain to be addressed and showing how current solutions can be applied to address them.

    Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure

    Big data research has attracted great attention in science, technology, industry, and society. It is developing along with the evolving scientific paradigm, the fourth industrial revolution, and the transformational innovation of technologies. However, its nature and fundamental challenge have not been recognized, and its own methodology has not yet been formed. This paper explores and answers the following questions: What is big data? What are the basic methods for representing, managing, and analyzing big data? What is the relationship between big data and knowledge? Can we find a mapping from big data into knowledge space? What kind of infrastructure is required to support not only big data management and analysis but also knowledge discovery, sharing, and management? What is the relationship between big data and the scientific paradigm? What is the nature and fundamental challenge of big data computing? A multi-dimensional perspective is presented toward a methodology of big data computing.

    An Open Source Based Data Warehouse Architecture to Support Decision Making in the Tourism Sector

    In this paper an alternative tourism-oriented data warehousing architecture is proposed which makes use of recent free and open source technologies such as Java, PostgreSQL, and XML. The aim of this architecture is to support the decision-making process and to give an integrated view of the whole tourism reality in an established context (local, regional, national, etc.) without requiring large investments in the necessary software.
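The kind of decision-support backbone such an architecture rests on is a star schema in the relational database. A minimal sketch, with invented table and column names, is shown below; `sqlite3` stands in for PostgreSQL so the example is self-contained.

```python
# Illustrative star schema for a tourism data warehouse: two dimension
# tables and one fact table, queried for a decision-support aggregate.
# sqlite3 is used here only as a self-contained stand-in for PostgreSQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date   (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_visits (
    region_id INTEGER REFERENCES dim_region,
    date_id   INTEGER REFERENCES dim_date,
    visitors  INTEGER
);
""")
conn.execute("INSERT INTO dim_region VALUES (1, 'Coastal')")
conn.execute("INSERT INTO dim_date VALUES (1, 2024, 7)")
conn.execute("INSERT INTO fact_visits VALUES (1, 1, 12500)")

# A typical decision-support query: total visitors per region.
row = conn.execute("""
    SELECT r.name, SUM(f.visitors)
    FROM fact_visits f JOIN dim_region r USING (region_id)
    GROUP BY r.name
""").fetchone()
print(row)  # ('Coastal', 12500)
```

The same schema and query run unchanged on PostgreSQL via a driver such as psycopg, which is what keeps the software cost of such an architecture low.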

    Supporting adaptiveness of cyber-physical processes through action-based formalisms

    Cyber-Physical Processes (CPPs) refer to a new generation of business processes enacted in many application environments (e.g., emergency management, smart manufacturing, etc.), in which the presence of Internet-of-Things devices and embedded ICT systems (e.g., smartphones, sensors, actuators) strongly influences the coordination of the real-world entities (e.g., humans, robots, etc.) inhabiting such environments. A Process Management System (PMS) employed for executing CPPs is required to automatically adapt its running processes to anomalous situations and exogenous events while minimising human intervention. In this paper, we tackle this issue by introducing an approach and an adaptive cognitive PMS, called SmartPM, which combines process execution monitoring, unanticipated exception detection, and automated resolution strategies, leveraging three well-established action-based formalisms developed for reasoning about actions in Artificial Intelligence (AI): the situation calculus, IndiGolog, and automated planning. Notably, the use of SmartPM does not require any expertise in the internal workings of the AI tools involved in the system.
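The monitor-detect-repair loop such a system runs can be sketched in a few lines. This is a deliberately naive illustration with hypothetical state variables and action names, not SmartPM's actual situation-calculus machinery.

```python
# Hypothetical sketch of adaptive process execution: compare the expected
# world state against observations, and repair the plan on a mismatch.

def detect_exception(expected: dict, observed: dict) -> set:
    """Return the state variables whose observed value deviates."""
    return {k for k in expected if observed.get(k) != expected[k]}

def repair_plan(plan: list, deviations: set) -> list:
    """Naive repair: prepend one recovery action per deviated variable.
    (A planner would instead synthesise a recovery sub-plan.)"""
    recovery = [f"restore_{var}" for var in sorted(deviations)]
    return recovery + plan

plan = ["move_to_area", "collect_data"]
expected = {"battery_ok": True, "link_up": True}
observed = {"battery_ok": True, "link_up": False}   # exogenous event: link lost

deviations = detect_exception(expected, observed)
if deviations:
    plan = repair_plan(plan, deviations)
print(plan)  # ['restore_link_up', 'move_to_area', 'collect_data']
```

The point of the action-based formalisms is precisely to replace the hard-coded `repair_plan` above with automated reasoning, so that no resolution strategy has to be anticipated at design time.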

    Determining the Data Needs for Decision Making in Public Libraries

    Library decision makers evaluate community needs and library capabilities in order to select the appropriate services offered by their particular institution. Evaluations of the programs and services may indicate that some are ineffective or inefficient, or that formerly popular services are no longer needed. The internal and external conditions used for decision making change; monitoring these conditions and evaluations allows the library to make new decisions that maintain its relevance to the community. Administrators must have ready access to appropriate data that will give them the information they need for library decision making.

    Today’s computer-based libraries accumulate electronic data in their integrated library systems (ILS) and other operational databases; however, these systems do not provide tools for examining the data to reveal trends and patterns, nor do they have any means of integrating important information from other programs and files where the data are stored in incompatible formats. These restrictions are overcome by using a data warehouse and a set of analytical software tools, which together form a decision support system. To succeed, the data warehouse must be tailored to specific needs and users. Libraries that wish to pursue decision support can begin by performing a needs analysis to determine the most important uses of the proposed warehouse and to identify the data elements needed to support those uses.

    The purpose of this study is to complete the needs analysis phase for a data warehouse for a public library that is interested in using its electronic data for data mining and other analytical processes. This study is applied research. Data on users’ needs were collected through two rounds of face-to-face interviews with purposively selected participants. The phase one interviews were semi-structured, designed to discover the uses of the data warehouse and then the data required for those uses. The phase two interviews were structured, presenting selected data elements from the ILS to interviewees, who were asked to evaluate how they would use each element in decision making.

    Analysis of these interviews showed that the library needs data from sources that vary in physical format, in summary levels, and in data definitions. The library should construct data marts, carefully designed for future integration into a data warehouse. The only data source that is ready for a data mart is the bibliographic database of the integrated library system. Entities and relationships from the ILS are identified for a circulation data mart, and the entities and their attributes are described. A second data mart is suggested for integrating vendor reports for the online databases. Vendor reports vary widely in how they define their variables and in the summary levels of their statistics; unified data definitions need to be created for the variables of importance so that online database usage may be compared with other data on the use of library resources, reflected in the circulation data mart.

    Administrators need data to address a number of other decision situations. These decisions require data from other library sources that are not optimized for data warehousing, or that are external to the library; suggestions are made for future development of data marts using these sources. The study concludes by recommending that libraries wishing to undertake similar studies begin with a pre-assessment of the entire institution, its data sources, and its management structure, conducted by a consultant. The needs assessment itself should include a focus group session in addition to the interviews.
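The unification step recommended for the vendor-report data mart can be sketched as a field-mapping layer: each vendor's report fields are mapped onto one common definition before loading. The vendor names, field names, and figures below are invented for illustration.

```python
# Hypothetical sketch: normalising heterogeneous vendor usage reports
# onto one common set of data definitions for a data mart.

COMMON_FIELDS = {"database", "year", "month", "searches"}

# Per-vendor mapping from the vendor's field names to the common ones.
VENDOR_MAPPINGS = {
    "vendorA": {"db_name": "database", "yr": "year",
                "mo": "month", "queries": "searches"},
    "vendorB": {"resource": "database", "year": "year",
                "month": "month", "search_count": "searches"},
}

def normalise(vendor: str, record: dict) -> dict:
    """Rename a vendor record's fields to the unified definitions."""
    mapping = VENDOR_MAPPINGS[vendor]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

rows = [
    normalise("vendorA", {"db_name": "HistoryRef", "yr": 2024, "mo": 3, "queries": 210}),
    normalise("vendorB", {"resource": "SciIndex", "year": 2024, "month": 3, "search_count": 145}),
]
# Once normalised, usage can be aggregated across vendors and compared
# with circulation figures from the other data mart.
print(rows[0]["searches"] + rows[1]["searches"])  # 355
```

Differing summary levels (e.g., daily vs. monthly counts) would need an additional roll-up step to the coarsest common level before such comparisons are valid.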

    Implementation of business intelligence tools using open source approach

    Discovering business intelligence is the modern organization’s way of gaining competitive advantage in its market, supported by Decision Support Systems or Business Intelligence systems. The first step in any decision support system is to create the repository of data from which the system collects and displays any information requested. This repository is the source of all business intelligence, and implementing it requires the right software tools, essential for the data warehouse. Therefore, when choosing the software tool, the project size, budget constraints, and risks should be kept in mind. Overall, the right choice depends on the organization’s needs and ambitions. The essential work done here is to demonstrate that open source software can be an accurate and reliable tool for implementing data warehouse projects. The two ETL solutions used were:
    • Pentaho Data Integration (Kettle) Community Edition (open source software)
    • SQL Server 2005 Integration Services (SSIS) Enterprise Edition (proprietary software)
    The proprietary, commercial software in question (like others of its kind) is widely used. However, the open source solution has key features recognized by organizations worldwide, and this work shows the different functionalities and benefits of the open source approach.
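Regardless of tool, both solutions implement the same extract-transform-load pattern, which can be sketched in plain Python. This stand-in is illustrative only; it is neither Kettle nor SSIS code, and the source fields are invented.

```python
# Minimal extract-transform-load step of the kind either ETL tool runs.

def extract(source_rows):
    """Extract: read raw rows from the operational source."""
    return list(source_rows)

def transform(rows):
    """Transform: clean and conform values for the warehouse schema,
    dropping rows that fail a basic quality check."""
    return [
        {"customer": r["name"].strip().title(), "amount_eur": round(r["amount"], 2)}
        for r in rows
        if r.get("amount") is not None
    ]

def load(rows, warehouse):
    """Load: append conformed rows to the target table."""
    warehouse.extend(rows)

source = [{"name": "  ana silva ", "amount": 19.999},
          {"name": "bo", "amount": None}]          # rejected by the quality check
warehouse = []
load(transform(extract(source)), warehouse)
print(warehouse)  # [{'customer': 'Ana Silva', 'amount_eur': 20.0}]
```

What the graphical tools add on top of this pattern is mainly connectivity, scheduling, logging, and error handling, which is where the feature comparison between the two editions becomes meaningful.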

    ENHANCED BI SYSTEMS WITH ON-DEMAND DATA BASED ON SEMANTIC-ENABLED ENTERPRISE SOA

    Since the 1990s, companies have been investing in IT infrastructure initiatives such as Enterprise Resource Planning (ERP) systems, Supply Chain Management (SCM) systems, and Customer Relationship Management (CRM) systems in order to increase efficiency, effectiveness, and internal process integration, among other goals. The current value of Business Intelligence (BI) for companies can be summarized by two main achievements: improved management of processes and improved operational processes. This paper identifies current requirements of BI and presents a linkage to service-oriented architectures, including the added value. Semantic-enabled Enterprise Service-Oriented Architecture (SESOA) is an enterprise solution that links businesses to external systems based on Web Services and the SOA concept. It is a lightweight web application that annotates Web Services coming from different service providers with semantics, so that the indexing and discovery of these services can be more comprehensive. BI applications can be considered service consumers in SESOA and can discover, select, and invoke the services supplied by the external systems (service providers). In this way, SESOA forms the bridge between the SOA and BI concepts to deliver “on-demand” data as services in real time, opening the BI market to include SMEs as main sources of these services.
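The discover-select-invoke cycle for a BI consumer against a semantically annotated registry can be sketched as follows. This is an illustrative model only, not SESOA's actual interface; the class, tag vocabulary, and service names are invented.

```python
# Hypothetical sketch: a registry that annotates services with semantic
# tags so a BI consumer can discover, select, and invoke them.

class ServiceRegistry:
    def __init__(self):
        self._services = []   # (name, annotation tags, callable endpoint)

    def publish(self, name, tags, endpoint):
        """Provider side: register a service with its semantic annotations."""
        self._services.append((name, set(tags), endpoint))

    def discover(self, *wanted_tags):
        """Consumer side: return services whose annotations cover
        every requested concept."""
        wanted = set(wanted_tags)
        return [(n, ep) for n, tags, ep in self._services if wanted <= tags]

registry = ServiceRegistry()
registry.publish("sales_feed", {"sales", "real-time", "eur"},
                 lambda: {"revenue": 4200})
registry.publish("hr_report", {"hr", "monthly"},
                 lambda: {"headcount": 37})

# BI application as service consumer: discover, select the first match, invoke.
matches = registry.discover("sales", "real-time")
name, endpoint = matches[0]
print(name, endpoint())  # sales_feed {'revenue': 4200}
```

In a real SOA setting the endpoints would be remote Web Service calls and the tags would come from a shared ontology rather than free-form strings; the matching logic, however, stays the same subset test.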