
    Classification of Metadata Categories in Data Warehousing - A Generic Approach

    Using appropriate metadata is a central success factor for (re)engineering and using data warehouse systems effectively and efficiently. The approach presented in this paper aims to reduce the effort of developing and operating data warehouse systems and thus to increase the usability and acceptance of a data warehouse. To achieve these objectives, identifying the appropriate metadata is an important task. To avoid processing the "wrong" object data and thus compromising the acceptance of a data warehouse system, a systematic approach to categorizing and identifying the appropriate metadata is essential. This paper presents such a generic approach. After investigating and structuring problem situations that can occur in data warehousing, metadata categories are identified to solve a given problem situation. A use case illustrates the approach.
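    The core idea of the approach — mapping problem situations in data warehousing to the metadata categories that help resolve them — can be sketched as a simple lookup. The situation and category names below are invented placeholders, not the paper's actual taxonomy:

    ```python
    # Hypothetical sketch: map data-warehouse problem situations to the
    # metadata categories that help resolve them. All names are illustrative.
    PROBLEM_TO_METADATA = {
        "unknown data lineage": ["transformation metadata", "source-system metadata"],
        "ambiguous business terms": ["business metadata", "glossary metadata"],
        "slow query performance": ["technical metadata", "usage statistics"],
    }

    def metadata_for(problem: str) -> list[str]:
        """Return the metadata categories relevant to a given problem situation."""
        return PROBLEM_TO_METADATA.get(problem, [])
    ```

    A real taxonomy would be derived from the structured problem situations the paper investigates; the dictionary merely illustrates the lookup idea.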

    A model-based software architecture for XML data and metadata integration in data warehouse systems

    This project was carried out to develop a system prototype of an electronic tendering (e-Tender) system. Several steps were taken, starting with information gathering and analysis, developing a prototype, and ending with system testing. The prototype was further tested with real users to analyze the document flow speed. In conclusion, the e-Tendering system offers a better approach than the manual tender process. The document flow speed increased by 58.5%, which suggests a more efficient process.

    Enterprise Metadata Management: Identifying Success Factors For Implementing Managed Metadata Environments

    Managed metadata environments (MME) are being employed in organisations that need to ensure consistent and efficient capture, integration and delivery of enterprise metadata. Initiatives to implement an MME in an organisation can be a daunting endeavour, and various information systems have evolved over time to support such environments. The expert study at hand used a multi-round Delphi research method to identify critical success factors of these initiatives. Of the ten critical success factors nominated through the early rounds, nine were found to be very to extremely important and one moderately important. The identified success factors can be used as a basis for implementation frameworks in metadata management initiatives. An effective and efficient metadata management system is one of the key components of data and information management, and can greatly aid organisations' efforts toward improved information quality and governance.

    A unified view of data-intensive flows in business intelligence systems : a survey

    Data-intensive flows are central processes in today's business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet the complex requirements of next-generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next-generation BI environments. In this paper we present a survey of today's research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next-generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that still need to be addressed, and how the current solutions can be applied to address these challenges.
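    The traditionally batched ETL process that the survey contrasts with runtime data integration can be illustrated with a minimal sketch. The source systems, field names, and sample data below are invented for illustration:

    ```python
    # Minimal batched ETL sketch: extract rows from two hypothetical source
    # systems, transform them into a common analysis-ready schema, and load
    # them into a list standing in for a data warehouse table.

    def extract():
        # Two source systems with differing schemas (invented sample data).
        crm = [{"cust": "Ada", "spend": "120.50"}]
        web = [{"customer_name": "Grace", "total": "80.00"}]
        return crm, web

    def transform(crm, web):
        # Conform both sources to one schema; cast string amounts to float.
        rows = [{"customer": r["cust"], "amount": float(r["spend"])} for r in crm]
        rows += [{"customer": r["customer_name"], "amount": float(r["total"])} for r in web]
        return rows

    def load(rows, warehouse):
        # Append the conformed rows to the target table.
        warehouse.extend(rows)

    warehouse = []
    load(transform(*extract()), warehouse)
    ```

    The operational, real-time flows the survey discusses would instead run this conformance logic per event at query or arrival time, rather than over periodic batches.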

    ETL for data science?: A case study

    Big data has driven data science development and research over the last years. However, there is a problem: most data science projects don't make it to production. This can happen because many data scientists don't use a reference data science methodology. Another aggravating element is the data itself, its quality and processing. The problem can be mitigated through research, progress and case-study documentation on the topic, fostering knowledge dissemination and reuse. In particular, data mining can benefit from the knowledge of other mature fields that explore similar matters, like data warehousing. To address the problem, this dissertation performs a case study of the project "IA-SI - Artificial Intelligence in Incentives Management", which aims to improve the management of European grant funds through data mining. The key contributions of this study, to academia and to the project's development and success, are: (1) a combined process model of the most used data mining process models and their tasks, extended with the ETL subsystems and other selected data warehousing best practices; (2) application of this combined process model to the project and all its documentation; (3) contribution to the project's prototype implementation, regarding the data understanding and data preparation tasks. This study concludes that CRISP-DM is still a reference, as it includes all the other data mining process models' tasks and detailed descriptions, and that its combination with data warehousing best practices is useful to the project IA-SI and potentially to other data mining projects.
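    A combined process model of this kind builds on the six standard CRISP-DM phases. The sketch below lists those phases; the data-warehousing annotations are illustrative guesses, not the dissertation's actual mapping of ETL subsystems to phases:

    ```python
    # CRISP-DM's six standard phases, annotated with example data-warehousing
    # concerns (the annotations are illustrative, not the study's model).
    CRISP_DM_PHASES = [
        ("Business Understanding", None),
        ("Data Understanding", "data profiling"),
        ("Data Preparation", "ETL: cleansing and conforming"),
        ("Modeling", None),
        ("Evaluation", None),
        ("Deployment", "ETL: scheduling and lineage"),
    ]

    for phase, dw_note in CRISP_DM_PHASES:
        print(phase if dw_note is None else f"{phase} ({dw_note})")
    ```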

    Personalized Biomedical Data Integration
