Quality measures for ETL processes: from goals to implementation
Extraction-transformation-loading (ETL) processes play an increasingly important role in supporting modern business operations. These business processes are centred on artifacts with high variability and diverse lifecycles, which correspond to key business entities. The apparent complexity of these activities has been examined through the prism of business process management, mainly focusing on functional requirements and performance optimization. However, the quality dimension has not yet been thoroughly investigated, and a more human-centric approach is needed to bring these processes closer to business users' requirements. In this paper, we take a first step in this direction by defining a sound model of ETL process quality characteristics and quantitative measures for each characteristic, based on existing literature. Our model shows dependencies among quality characteristics and can provide the basis for subsequent analysis using goal modeling techniques. We showcase the use of goal modeling for ETL process design through a use case, where we employ a goal model that includes quantitative components (i.e., indicators) for the evaluation and analysis of alternative design decisions.
Peer reviewed. Postprint (author's final draft).
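The idea of scoring alternative ETL designs against quantitative indicators can be sketched in a few lines. All names, targets and the scoring rule below are illustrative assumptions, not the paper's actual model:

```python
# Hypothetical sketch: quality characteristics with quantitative indicators,
# used to compare alternative ETL designs. Names and targets are invented.
from dataclasses import dataclass

@dataclass
class Indicator:
    name: str          # e.g. a throughput indicator for a performance goal
    value: float
    target: float
    higher_is_better: bool = True

    def satisfaction(self) -> float:
        """Degree (0..1) to which the indicator meets its target."""
        ratio = (self.value / self.target if self.higher_is_better
                 else self.target / self.value)
        return min(1.0, ratio)

def design_score(indicators: list[Indicator]) -> float:
    """Average satisfaction across all indicators of a candidate design."""
    return sum(i.satisfaction() for i in indicators) / len(indicators)

# Two alternative ETL designs, scored against the same goal model.
design_a = [Indicator("rows_per_second", 8000, 10000),
            Indicator("error_rate", 0.02, 0.01, higher_is_better=False)]
design_b = [Indicator("rows_per_second", 12000, 10000),
            Indicator("error_rate", 0.008, 0.01, higher_is_better=False)]

print(design_score(design_a) < design_score(design_b))  # prints True
```

A real goal model would also encode the dependencies among characteristics; a flat average is used here only to keep the sketch short.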
A unified view of data-intensive flows in business intelligence systems : a survey
Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet the complex requirements of next-generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus need a clear understanding of the foundations of data-intensive flows and of the challenges of moving towards next-generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next-generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that remain to be addressed and how current solutions can be applied to address them.
Peer reviewed. Postprint (author's final draft).
Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure
Big data research has attracted great attention in science, technology,
industry and society. It is developing with the evolving scientific paradigm,
the fourth industrial revolution, and the transformational innovation of
technologies. However, its nature and fundamental challenges have not yet been
recognized, and a methodology of its own has not yet been formed. This paper explores
and answers the following questions: What is big data? What are the basic
methods for representing, managing and analyzing big data? What is the
relationship between big data and knowledge? Can we find a mapping from big
data into knowledge space? What kind of infrastructure is required to support
not only big data management and analysis but also knowledge discovery, sharing
and management? What is the relationship between big data and science paradigm?
What is the nature and fundamental challenge of big data computing? A
multi-dimensional perspective is presented toward a methodology of big data
computing.
Comment: 59 pages
An Open Source Based Data Warehouse Architecture to Support Decision Making in the Tourism Sector
In this paper an alternative tourism-oriented data warehousing architecture is proposed, which makes use of recent free and open source technologies such as Java, PostgreSQL and XML. The aim of this architecture is to support the decision-making process and to give an integrated view of the whole tourism reality in an established context (local, regional, national, etc.) without requiring large investments to obtain the necessary software.
Keywords: tourism, data warehousing architecture
Selection process of auto-ID technology in warehouse management: A Delphi study
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.
In a supply chain, a warehouse is a crucial component linking all chain parties. Automatic identification and data capture (auto-ID) technologies, e.g. RFID and barcodes, are among the essential technologies in the 21st-century knowledge-based economy. Selecting an auto-ID technology is a long-term investment, and it contributes to improving operational efficiency, achieving cost savings and creating opportunities for higher revenues. Interest in auto-ID research for warehouse management is rather stagnant and relatively small in comparison with other research domains such as transport, logistics and supply chain. Although some previous studies have explored factors for the auto-ID selection decision in a warehouse environment, those factors (e.g., operational factors) have been examined separately, and researchers have paid little attention to the full set of key factors that may affect this decision. In fact, no framework in the literature comprehensively investigates the critical factors influencing the auto-ID selection decision and how these factors should be combined to produce a successful auto-ID selection process in warehouse management. Therefore, the main aim of this research is to investigate empirically the auto-ID technology-selection process and to determine the key factors that influence decision makers when selecting auto-ID technology in the warehouse environment. This research is preceded by a comprehensive and systematic review of the relevant literature to identify the set of factors that may affect the technology selection decision. The Technology-Organisation-Environment (TOE) framework has been used as a lens to categorise the identified factors (Tornatzky & Fleischer, 1990).
Data were collected by first conducting a modified (mixed-method) two-round Delphi study with a worldwide panel of 107 experts, including academics, industry practitioners and consultants in auto-ID technologies. The results of the Delphi study were then verified via follow-up interviews, both face-to-face and by telephone, carried out with 19 experts across the world. In nature, this research is positivist, exploratory/descriptive, deductive/inductive and quantitative/qualitative. The quantitative data were analysed using the Statistical Package for the Social Sciences (SPSS V.18), while the qualitative data of the Delphi study and the interviews were analysed manually using a quantitative content analysis approach and a thematic content analysis approach, respectively. The findings of this research cover the motivations and reasons of warehouses in seeking to use auto-ID technologies, the challenges in making an auto-ID decision, recommendations to address those challenges, the key steps that should be followed in making an auto-ID selection decision, and the key factors, with their relative importance, that influence the auto-ID selection decision in a warehouse. The results of the Delphi study show that the six major factors affecting the auto-ID selection decision in warehouse management are: organisational, operational, structural, resources, external environmental and technological factors (in decreasing order of importance). In addition, 54 key sub-factors have been identified under these major factors and ranked in decreasing order of their importance mean scores. However, the importance of these factors depends on the objectives and strategic motivations of the warehouse; its size; the type of business; the nature of the business environment; sectors; market types; products; and countries. Based on the findings of the Delphi study and the interviews, a comprehensive multi-stage framework for the auto-ID technology selection process has been developed.
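The ranking step described above, ordering factors by the importance mean scores of the expert panel, can be sketched as follows. The ratings and factor names are invented for illustration and are not the thesis data:

```python
# Illustrative sketch (not the thesis data): ranking Delphi-study factors by
# the mean importance score assigned by the expert panel.
from statistics import mean

# Hypothetical panel ratings on a 1-5 Likert scale, one list per factor.
ratings = {
    "organisational": [5, 4, 5, 4],
    "operational":    [4, 4, 5, 4],
    "technological":  [3, 3, 4, 3],
}

# Rank factors in decreasing order of mean importance, as the study reports.
ranked = sorted(ratings, key=lambda f: mean(ratings[f]), reverse=True)
print(ranked)  # prints ['organisational', 'operational', 'technological']
```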
This research indicates that the selection process is complex and needs support and close collaboration from all participants involved, such as the IT team, top management, the warehouse manager, functional managers, experts, stakeholders and vendors. Moreover, warehouse managers should carry out this collaborative process before adopting the technology in order to reduce the high risks involved and achieve successful implementation. This research makes several contributions for both academics and practitioners concerned with auto-ID selection in a warehouse environment. Academically, it provides a holistic multi-stage framework that explains the critical issues within the decision-making process for auto-ID technology in warehouse management. Moreover, it contributes to the body of auto-ID and warehouse management literature by synthesising the literature on key dimensions of the auto-ID (RFID/barcode) selection decision in the warehouse field. This research also provides a theoretical basis upon which future research on auto-ID selection and implementation can be built. Practically, the findings provide valuable insights for warehouse managers and executives involved in auto-ID selection and advance their understanding of the issues in the technology selection process that need to be considered.
Damascus University, Syria, and The British Council, Manchester.
Supporting adaptiveness of cyber-physical processes through action-based formalisms
Cyber-Physical Processes (CPPs) refer to a new generation of business processes enacted in many application environments (e.g., emergency management, smart manufacturing), in which the presence of Internet-of-Things devices and embedded ICT systems (e.g., smartphones, sensors, actuators) strongly influences the coordination of the real-world entities (e.g., humans, robots) inhabiting such environments. A Process Management System (PMS) employed for executing CPPs is required to adapt its running processes automatically to anomalous situations and exogenous events while minimising human intervention. In this paper, we tackle this issue by introducing an approach and an adaptive cognitive PMS, called SmartPM, which combines process execution monitoring, unanticipated exception detection and automated resolution strategies, leveraging three well-established action-based formalisms developed for reasoning about actions in Artificial Intelligence (AI): the situation calculus, IndiGolog and automated planning. Interestingly, the use of SmartPM does not require any expertise in the internal workings of the AI tools involved in the system.
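The monitor-detect-adapt cycle described for SmartPM can be sketched as a toy loop. The task names and the naive recovery rule are illustrative assumptions; the actual system reasons with the situation calculus, IndiGolog and automated planning rather than this hard-coded repair:

```python
# Minimal sketch of a monitor-detect-adapt cycle for process execution.
# Task names and the "insert a repair task" strategy are invented.

def execute(task: str, world: dict) -> bool:
    """Pretend to run a task; it succeeds only if its precondition holds."""
    return world.get(task + "_possible", True)

def replan(remaining: list, failed: str) -> list:
    """Naive recovery: insert a repair task before retrying the failed one."""
    return ["repair_" + failed] + remaining

def run_process(tasks: list, world: dict) -> list:
    trace, queue = [], list(tasks)
    while queue:
        task = queue.pop(0)
        if execute(task, world):
            trace.append(task)
        else:                                     # unanticipated exception detected
            world[task + "_possible"] = True      # assume the repair restores it
            queue = replan([task] + queue, task)
    return trace

world = {"move_robot_possible": False}            # exogenous event blocks a task
print(run_process(["take_photo", "move_robot", "send_data"], world))
# prints ['take_photo', 'repair_move_robot', 'move_robot', 'send_data']
```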
Determining the Data Needs for Decision Making in Public Libraries
Library decision makers evaluate community needs and library capabilities in order to select the appropriate services offered by their particular institution. Evaluations of the programs and services may indicate that some are ineffective or inefficient, or that formerly popular services are no longer needed. The internal and external conditions used for decision making change. Monitoring these conditions and evaluations allows the library to make new decisions that maintain its relevance to the community.
Administrators must have ready access to appropriate data that will give them the information they need for library decision making. Today’s computer-based libraries accumulate electronic data in their integrated library systems (ILS) and other operational databases; however, these systems do not provide tools for examining the data to reveal trends and patterns, nor do they have any means of integrating important information from other programs and files where the data are stored in incompatible formats. These restrictions are overcome by use of a data warehouse and a set of analytical software tools, forming a decision support system. The data warehouse must be tailored to specific needs and users to succeed. Libraries that wish to pursue decision support can begin by performing a needs analysis to determine the most important use of the proposed warehouse and to identify the data elements needed to support this use.
The purpose of this study is to complete the needs analysis phase for a data warehouse for a certain public library that is interested in using its electronic data for data mining and other analytical processes. This study is applied research. Data on users’ needs were collected through two rounds of face-to-face interviews. Participants were selected purposively. The phase one interviews were semi-structured, designed to discover the uses of the data warehouse, and then what data were required for those uses. The phase two interviews were structured, and presented selected data elements from the ILS to interviewees who were asked to evaluate how they would use each element in decision making.
Analysis of these interviews showed that the library needs data from sources that vary in physical format, in summary levels, and in data definitions. The library should construct data marts, carefully designed for future integration into a data warehouse. The only data source that is ready for a data mart is the bibliographic database of the integrated library system. Entities and relationships from the ILS are identified for a circulation data mart. The entities and their attributes are described.
A second data mart is suggested for integrating vendor reports for the online databases. Vendor reports vary widely in how they define their variables and in the summary levels of their statistics. Unified data definitions need to be created for the variables of importance so that online database usage may be compared with other data on use of library resources, reflected in the circulation data mart.
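The unification step described above, mapping each vendor's field names onto one shared definition so usage becomes comparable, can be sketched as follows. The vendor field names and figures are invented for illustration:

```python
# Sketch: two vendors report online database usage under different field
# names; mapping both onto unified definitions makes them comparable.

def normalize(report: dict, mapping: dict) -> dict:
    """Rename vendor-specific fields to the unified data definitions."""
    return {unified: report[field] for field, unified in mapping.items()}

vendor_a = {"FullTextRequests": 120, "Month": "2024-01"}
vendor_b = {"articles_viewed": 95,  "period": "2024-01"}

unified_a = normalize(vendor_a, {"FullTextRequests": "uses", "Month": "period"})
unified_b = normalize(vendor_b, {"articles_viewed": "uses", "period": "period"})

total = unified_a["uses"] + unified_b["uses"]
print(total)  # prints 215 (full-text uses across both vendors for 2024-01)
```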
Administrators need data to address a number of other decision situations. These decisions require data from other library sources that are not optimized for data warehousing, or that are external to the library. Suggestions are made for future development of data marts using these sources.
The study concludes by recommending that libraries wishing to undertake similar studies begin with a pre-assessment of the entire institution, its data sources, and its management structure, conducted by a consultant. The needs assessment itself should include a focus group session in addition to the interviews.
Implementation of business intelligence tools using open source approach
Discovering business intelligence is the modern organization’s way of gaining competitive advantage in its market, supported by decision support systems or business intelligence systems. The first step in any decision support system is to create the repository of data from which the system collects and displays any information requested. This repository is the source of all business intelligence, and implementing it requires the right software tools, essential for the data warehouse. Therefore, when choosing the software tool, the project size, budget constraints and risks should be kept in mind. Overall, the right choice depends on the organization’s needs and ambitions. The essential work to be done here is to demonstrate that open source software can be an accurate and reliable tool for implementing data warehouse projects. The two ETL solutions used were:
• Pentaho Kettle Data Integration Community Editions (Open Source Software)
• SQL Server 2005 Integration Services (SSIS) Enterprise Edition (Proprietary Software)
The proprietary, commercial software in question (as well as others) is widely used. However,
an open source solution has key features recognized by organizations worldwide and this
work will show the different functionalities and benefits of this open source approach.
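The extract-transform-load cycle that both tools implement can be sketched in a few lines. This toy pipeline stands in for a Kettle or SSIS job and is an illustration only, not either tool's API:

```python
# Toy ETL pipeline: extract from a messy CSV source, cleanse, load into a
# warehouse table. Column names and data are invented for illustration.
import csv, io, sqlite3

raw = "id,amount\n1,10.5\n2,abc\n3,4.0\n"      # extract: a messy CSV source

def transform(rows):
    """Keep only rows whose amount parses as a number (data cleansing step)."""
    for row in rows:
        try:
            yield int(row["id"]), float(row["amount"])
        except ValueError:
            pass                               # route bad rows to a reject stream

conn = sqlite3.connect(":memory:")             # load target: the warehouse
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 transform(csv.DictReader(io.StringIO(raw))))

print(conn.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone())
# prints (2, 14.5) -- the malformed row was rejected during transformation
```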
Enhanced BI systems with on-demand data based on semantic-enabled enterprise SOA
Since the 1990s, companies have been investing in IT infrastructure initiatives such as Enterprise Resource Planning (ERP), Supply Chain Management (SCM) and Customer Relationship Management (CRM) systems in order to increase efficiency, effectiveness and internal process integration, among other goals. The current value of Business Intelligence (BI) for companies can be summarized by two main achievements: improved management processes and improved operational processes. This paper identifies current requirements of BI and presents a linkage to service-oriented architectures, including their added value. The Semantic-enabled Enterprise Service-Oriented Architecture (SESOA) is an enterprise solution that links businesses to external systems based on Web Services and the SOA concept. It is a lightweight web application that annotates Web Services coming from different service providers with semantics, so that the indexing and discovery of these services can be more comprehensive. BI applications can be considered service consumers in SESOA and can discover, select and invoke the services supplied by the external systems (the service providers). In this way, SESOA forms a bridge between the SOA and BI concepts to deliver the "on-demand" data as services in real time, and this opens the BI market to include SMEs as main providers of these services.
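The discovery step described above, a BI consumer selecting a service by semantic concept rather than by name, can be sketched as a tiny registry lookup. The registry structure, endpoints and tags are illustrative assumptions, not SESOA's actual annotation format:

```python
# Sketch: services annotated with semantic tags, and a BI consumer
# discovering one by concept. All endpoints and tags are invented.

registry = [
    {"endpoint": "http://erp.example.com/orders", "concepts": {"sales", "order"}},
    {"endpoint": "http://crm.example.com/leads",  "concepts": {"customer", "lead"}},
    {"endpoint": "http://scm.example.com/stock",  "concepts": {"inventory", "warehouse"}},
]

def discover(wanted: set) -> list:
    """Return endpoints whose semantic annotations cover the wanted concepts."""
    return [s["endpoint"] for s in registry if wanted <= s["concepts"]]

# A BI application asks for an on-demand data service about inventory.
print(discover({"inventory"}))  # prints ['http://scm.example.com/stock']
```

A real implementation would match against an ontology rather than flat tag sets, which is what makes the indexing "more comprehensive" in the paper's terms.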