62 research outputs found

    Implementing data-driven decision support system based on independent educational data mart

    Decision makers in the educational field always seek new technologies and tools that provide solid, fast answers to support the decision-making process. They need a platform that utilizes students’ academic data and turns it into knowledge for making the right strategic decisions. In this paper, a roadmap for implementing a data-driven decision support system (DSS) is presented based on an educational data mart. The independent data mart is implemented on the students’ degrees in 8 subjects in a private school (Al-Iskandaria Primary School in Basrah province, Iraq). The DSS implementation roadmap starts from pre-processing a paper-based data source and ends with providing three categories of online analytical processing (OLAP) queries (multidimensional OLAP, desktop OLAP and web OLAP). A key performance indicator (KPI) is implemented as an essential part of the educational DSS to measure school performance. The static evaluation method shows that the proposed DSS follows the privacy, security and performance aspects with no errors after inspecting the DSS knowledge base. The evaluation shows that the data-driven DSS based on an independent data mart with KPI and OLAP is one of the best platforms to support short-to-long term academic decisions.

    XLDM: an xlink-based multidimensional metamodel

    The growth of data available on the Internet and the improvement of ways to handle it constitute an important issue when designing a data model. In this context, XML provides the necessary formalism to establish a standard to represent and exchange data. Since data warehouse technologies are often used for data analysis, it is necessary to define a data cube model for XML. However, data representation in XML may generate syntactic, semantic and structural heterogeneity problems in XML documents, which are not considered by related approaches. To solve these problems, a data schema must be defined. This paper proposes a metamodel to specify XML document cubes, based on relationships between elements and XML documents. This approach solves the XML data heterogeneity problems by taking advantage of data schema definition and relationships defined by XLink. The methodology provides formal rules to define the proposed concepts. This formalism is then instantiated using XML Schema and XLink. The paper also presents a case study in the medical field and a comparison with XBRL Dimensions and a financial and multidimensional data model which uses XLink.

    EM-OLAP Framework - Econometric Model Transformation Method for OLAP Design in Intelligence Systems

    Econometrics is currently one of the most popular approaches to economic analysis. To better support advances in this area, it is necessary to apply econometric problems to econometric intelligent systems. The article describes an econometric OLAP framework that supports the design of a multidimensional database to support econometric analyses and thereby increase the effectiveness of the development of econometric intelligent systems. The first part of the article creates formal rules, in mathematical notation, for the new transformation of the econometric model (TEM) method, which transforms an econometric model into a multidimensional schema. In the proposed TEM method, the authors pay attention to measuring the quality and understandability of the multidimensional schema, and compare the proposed method with the original TEM-CM method. In the second part of the article, the authors create a multidimensional database prototype according to the new TEM method and design an OLAP application for econometric analysis.

    Business Intelligence on Non-Conventional Data

    The revolution in digital communications witnessed over the last decade has had a significant impact on the world of Business Intelligence (BI). In the big data era, the amount and diversity of data that can be collected and analyzed for the decision-making process transcends the restricted and structured set of internal data that BI systems are conventionally limited to. This thesis investigates the unique challenges imposed by three specific categories of non-conventional data: social data, linked data and schemaless data. Social data comprises the user-generated content published through websites and social media, which can provide a fresh and timely perception of people’s tastes and opinions. In Social BI (SBI), the analysis focuses on topics, meant as specific concepts of interest within the subject area. In this context, this thesis proposes meta-star, an alternative strategy to the traditional star schema for modeling hierarchies of topics to enable OLAP analyses. The thesis also presents an architectural framework of a real SBI project and a cross-disciplinary benchmark for SBI. Linked data employ the Resource Description Framework (RDF) to provide a public network of interlinked, structured, cross-domain knowledge. In this context, this thesis proposes an interactive and collaborative approach to build aggregation hierarchies from linked data. Schemaless data refers to the storage of data in NoSQL databases that do not force a predefined schema, but let database instances embed their own local schemata. In this context, this thesis proposes an approach to determine the schema profile of a document-based database; the goal is to assist users in a schema-on-read analysis process by helping them understand the rules that drove the usage of the different schemata.
    A final and complementary contribution of this thesis is an innovative technique in the field of recommendation systems to overcome user disorientation in the analysis of a large and heterogeneous wealth of data.

    Development of new data partitioning and allocation algorithms for query optimization of distributed data warehouse systems

    Distributed databases, and in particular distributed data warehousing, are becoming an increasingly important technology for information integration and data analysis. Data Warehouse (DW) systems are used by decision makers for performance measurement and decision support. However, although data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, the OLAP query response time is strongly affected by the volume of data that needs to be accessed from storage disks. Data partitioning is one of the physical design techniques that may be used to optimize query processing cost in DWs. It is a non-redundant optimization technique because it does not replicate data, contrary to redundant techniques like materialized views and indexes. The warehouse partitioning problem is concerned with determining the set of dimension tables to be partitioned and using them to generate the fact table fragments. In this work an enhanced grouping algorithm that avoids the limitations of some existing vertical partitioning algorithms is proposed. Furthermore, a static partitioning algorithm that allows fragmentation at early stages of schema design is presented. The thesis also investigates the performance of the data warehouse after implementing a combination of Genetic Algorithm (GA) and Simulated Annealing (SA) techniques to horizontally partition the data warehouse star schema. It then presents the experimentation and implementation results of the proposed algorithm. This research presents different approaches to optimize the allocation cost of data fragments, using a greedy mathematical model and a combination of simulated annealing and genetic algorithm to determine the site-by-site allocation leading to optimal solutions for fragment distribution. Throughout this thesis, the terms fragmentation and partitioning are used interchangeably.

    Evaluación de la calidad de datos en un sistema de Data Warehousing: un enfoque basado en contextos [Data quality evaluation in a Data Warehousing system: a context-based approach]

    Data Warehousing Systems are of great relevance for supporting decision making and data analysis. This has been proven over time through their widespread development and use at industrial level in all kinds of organizations, and through the large number of scientific studies that have focused on such systems. Many researchers have presented the need to incorporate and maintain data quality in Data Warehousing Systems. However, there is no consensus in the research community on how to do so, or on whether it is possible to define a single set of quality dimensions for Data Warehouse systems, since this set of dimensions may depend on the purpose for which the data are used. Moreover, once the data are in the Data Warehouse another challenge arises: how they will be used. Quality requirements may vary among different domains and among different users, not only because of the task they need to perform, but also because the quality perceived by one user may differ from the quality perceived by another. Since data come from different sources with different levels of quality, analysis domains can vary, and users can perceive quality in different ways depending on many factors (their profile, the task to be performed, etc.), a data-context-based approach is considered for the evaluation of Data Quality in Data Warehousing Systems. In this thesis a systematic literature review is carried out to obtain an overview of existing research on the use of contexts in Data Warehousing Systems and/or in the evaluation of Data Quality. The results provide an overview of the state of the art, which allows a first proposal to be made for assessing data quality in Data Warehousing Systems with a context-based approach. This first proposal is the starting point of a broader and deeper investigation that will allow quality management in Data Warehousing Systems.

    Open archival information systems for database preservation

    Integrated master's thesis. Informatics and Computing Engineering. Universidade do Porto. Faculdade de Engenharia. 201