7 research outputs found

    The Mechanics of Enterprise Architecture Principles

    Inspired by the city planning metaphor, enterprise architecture (EA) has gained considerable attention from academia and industry for systematically planning an IT landscape. Since EA is a relatively young discipline, a great deal of its work focuses on architecture representations (descriptive EA) that conceptualize the different architecture layers, their components, and relationships. Besides architecture representations, EA should comprise principles that guide architecture design and evolution toward predefined value and outcomes (prescriptive EA). However, research on EA principles is still very limited. Notwithstanding the increasing consensus regarding EA principles’ role and definition, the limited publications neither discuss what can be considered suitable principles nor explain how they can be turned into effective means to achieve expected EA outcomes. This study seeks to strengthen EA’s extant theoretical core by investigating EA principles through a mixed methods research design comprising a literature review, an expert study, and three case studies. The first contribution of this study is that it sheds light on the ambiguous interpretation of EA principles in extant research by ontologically distinguishing between principles and non-principles, as well as deriving a set of suitable EA (meta-)principles. The second contribution connects the nascent academic discourse on EA principles to studies on EA value and outcomes. This study conceptualizes the “mechanics” of EA principles as a value-creation process, in which EA principles shape the architecture design and guide its evolution, thereby realizing EA outcomes. Consequently, this study brings EA’s underserved prescriptive aspect to the fore and helps enrich its theoretical foundations.

    An Evaluation Framework for Data Quality Tools

    Data quality is a major stake for large organizations, and software companies are proposing an increasing number of tools focusing on these issues. The scope of these tools is moving from specific applications (deduplication, address normalization, etc.) to a more global perspective integrating all areas of data quality (profiling, rule detection, etc.). A framework is needed to help managers choose this type of tool. In this article, we focus on tool functionalities that aim to measure the quality of data(bases). We explain what one can expect of such functionalities in a CRM context, and we propose a general matrix that can be used for the evaluation and comparison of these tools.
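
    To make the idea of such an evaluation matrix concrete, here is a minimal sketch in Python of a tools-by-functionalities scoring matrix with weighted aggregation; the tool names, criteria, scores, and weights are invented for illustration and are not taken from the article.

# Hypothetical evaluation matrix for data quality measurement tools.
# Tool names, criteria, per-criterion scores (0-5), and weights are illustrative only.

CRITERIA_WEIGHTS = {
    "profiling": 0.30,             # column statistics, value distributions
    "rule_detection": 0.25,        # discovery of constraints and dependencies
    "deduplication": 0.25,         # record matching / duplicate elimination
    "address_normalization": 0.20,
}

TOOL_SCORES = {
    "tool_a": {"profiling": 4, "rule_detection": 2, "deduplication": 5, "address_normalization": 3},
    "tool_b": {"profiling": 5, "rule_detection": 4, "deduplication": 2, "address_normalization": 1},
}

def weighted_score(scores: dict) -> float:
    """Aggregate a tool's per-criterion scores into one weighted value."""
    return sum(CRITERIA_WEIGHTS[criterion] * score for criterion, score in scores.items())

if __name__ == "__main__":
    # Rank the candidate tools by their aggregated score.
    for tool, scores in sorted(TOOL_SCORES.items(), key=lambda kv: weighted_score(kv[1]), reverse=True):
        print(f"{tool}: {weighted_score(scores):.2f}")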

    Mining climate data for shire level wheat yield predictions in Western Australia

    Climate change and the reduction of available agricultural land are two of the most important factors affecting global food production, especially in terms of wheat stores. An ever-increasing world population places a huge demand on these resources. Consequently, there is a dire need to optimise food production. Estimations of crop yield for the South West agricultural region of Western Australia have usually been based on statistical analyses by the Department of Agriculture and Food in Western Australia. These estimations involve a system of crop planting recommendations and yield prediction tools based on crop variety trials. However, many crop failures have arisen when farmers adhered to crop recommendations that ran contrary to the reported estimations. Consequently, the Department has sought to investigate new avenues of analysis that improve its estimations and recommendations. This thesis explores a new approach to the way analyses are carried out, through the introduction of new methods of analysis, such as data mining and online analytical processing, into the strategy. Additionally, this research attempts to provide a better understanding of the effects on wheat yields of both gradual variation parameters, such as soil type, and continuous variation parameters, such as rainfall and temperature. The ultimate aim of the research is to enhance the prediction efficiency of wheat yields. The task was formidable due to the complex and dichotomous mixture of gradual and continuous variability data, which required successive information transformations. It necessitated the progressive moulding of the data into useful information, practical knowledge and effective industry practices. Ultimately, this new direction is intended to improve crop predictions and thereby reduce crop failures. The research journey involved data exploration, grappling with the complexity of Geographic Information Systems (GIS), discovering and learning data-compatible software tools, and forging an effective processing method through an iterative cycle of action research experimentation. A series of trials was conducted to determine the combined effects of rainfall and temperature variations on wheat crop yields. These experiments specifically related to the South West agricultural region of Western Australia, and the study focused on the wheat-producing shires within the study area. The investigations involved a combination of macro and micro analysis techniques for visual data mining and data mining classification techniques, respectively. The research activities revealed that wheat yield was most dependent upon rainfall and temperature. In addition, they showed that rainfall cyclically affected the temperature and soil type due to the moisture retention of crop growing locations. Results from the regression analyses showed that the statistical prediction of wheat yields from historical data may be enhanced by data mining techniques, including classification. The main contribution to knowledge arising from this research is the provision of an alternate and supplementary method of wheat crop prediction within the study area. Another contribution is the division of the study area into a GIS surface grid of 100-hectare cells onto which the interpolated data was projected. Furthermore, the framework proposed in this thesis offers other researchers with similarly structured complex data the benefits of a general processing pathway to help them navigate their own investigations through variegated analytical exploration spaces. In addition, it offers insights and suggestions for future directions in other contextual research explorations.
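
    As a purely illustrative companion to the regression results mentioned above, the sketch below fits an ordinary least-squares model of yield against growing-season rainfall and mean temperature; the figures are synthetic, and the real study works from historical shire-level records and a richer feature set (soil type, GIS-interpolated grids), with classification techniques following the same pattern on binned yields.

import numpy as np

# Synthetic shire-season records: growing-season rainfall (mm), mean temperature (C), yield (t/ha).
# Values are invented; the thesis works from historical shire-level data.
data = np.array([
    [320.0, 17.5, 1.9],
    [410.0, 16.8, 2.4],
    [280.0, 18.2, 1.5],
    [460.0, 16.1, 2.7],
    [350.0, 17.0, 2.1],
])

X = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1]])  # intercept, rainfall, temperature
y = data[:, 2]

# Ordinary least squares: yield ~ b0 + b1 * rainfall + b2 * temperature
(b0, b1, b2), *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"yield ~ {b0:.3f} + {b1:.4f} * rainfall + {b2:.3f} * temperature")

# Predict the yield for a new, hypothetical season.
rainfall, temperature = 390.0, 16.9
print("predicted yield (t/ha):", round(b0 + b1 * rainfall + b2 * temperature, 2))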

    A conceptual framework and a risk management approach for interoperability between geospatial datacubes

    Today, we observe wide use of geospatial databases implemented in many forms (e.g., transactional centralized systems, distributed databases, multidimensional datacubes). Among these possibilities, the multidimensional datacube is the most appropriate for supporting interactive analysis and guiding an organization’s strategic decisions, especially when different epochs and levels of information granularity are involved. However, one may need to use several geospatial multidimensional datacubes, which may be semantically heterogeneous and have different degrees of appropriateness to the context of use. Overcoming the problems of semantic heterogeneity and of differing appropriateness to the context of use in a manner that is transparent to users has been the principal aim of interoperability for the last fifteen years. However, in spite of successful initiatives, today's solutions have evolved in a non-systematic way. Moreover, no solution has been found to address the specific semantic problems related to interoperability between geospatial datacubes. In this thesis, we suppose that it is possible to define an approach that addresses these semantic problems to support interoperability between geospatial datacubes. To that end, we first describe interoperability between geospatial datacubes. Then, we define and categorize the semantic heterogeneity problems that may occur during the interoperability process of different geospatial datacubes. In order to resolve semantic heterogeneity between geospatial datacubes, we propose a conceptual framework that is essentially based on human communication. In this framework, software agents representing the geospatial datacubes involved in the interoperability process communicate with one another. Such communication aims at exchanging information about the content of the geospatial datacubes. Then, in order to help agents make appropriate decisions during the interoperability process, we evaluate a set of indicators of the external quality (fitness-for-use) of geospatial datacube schemas and of the production context (e.g., metadata). Finally, we implement the proposed approach to show its feasibility.
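
    The communication idea can be pictured with a small sketch: two objects standing in for datacube agents exchange metadata about a shared dimension, and a fitness-for-use score is computed from a few external-quality indicators before deciding whether the dimension can be used. The indicator names, thresholds, and metadata fields below are invented for illustration and are not the thesis's actual model.

from dataclasses import dataclass, field

@dataclass
class DatacubeAgent:
    """Stands in for a software agent wrapping one geospatial datacube."""
    name: str
    dimensions: dict = field(default_factory=dict)  # dimension name -> metadata

    def describe(self, dimension: str) -> dict:
        """Answer another agent's request for metadata about one dimension."""
        return self.dimensions.get(dimension, {})

def fitness_for_use(metadata: dict, max_resolution_m: float, min_year: int) -> float:
    """Score a pair of invented external-quality indicators on a 0..1 scale."""
    score = 0.0
    if metadata.get("resolution_m", float("inf")) <= max_resolution_m:
        score += 0.5  # spatial resolution is fine enough for the intended use
    if metadata.get("reference_year", 0) >= min_year:
        score += 0.5  # the data is recent enough for the intended use
    return score

if __name__ == "__main__":
    cube_a = DatacubeAgent("cube_a", {"land_use": {"resolution_m": 30, "reference_year": 2019}})
    cube_b = DatacubeAgent("cube_b", {"land_use": {"resolution_m": 250, "reference_year": 2010}})

    # cube_a asks cube_b about its 'land_use' dimension and scores the answer.
    remote_metadata = cube_b.describe("land_use")
    score = fitness_for_use(remote_metadata, max_resolution_m=100, min_year=2015)
    print(f"{cube_b.name}.land_use fitness-for-use score: {score}")  # 0.0 here, so likely rejected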

    Deteção de problemas de qualidade nos dados

    In the life of organizations, access to large amounts of data, which are essential to decision-making processes, is very common. Sometimes the data have quality problems that affect the quality of decisions. In data warehouses, while data are being manipulated into formats that facilitate decision-making processes, further data quality problems (PQD) may be identified. There are several approaches to detecting and correcting PQD. These approaches aim to classify the different types of PQD that may occur in the data and to indicate possible ways of detecting and correcting them. Some tools exist that are based on these PQD detection and correction approaches; however, they usually target specific PQD and have considerable acquisition costs. The detection and correction process can be complex and time-consuming. These detection and correction processes are usually chained together and may go by different names, such as “data cleaning” and “dirty data elimination”. The goal of this work was to develop a PQD detection solution that is freeware, performs well, and is easy to use. A state-of-the-art survey was carried out, presenting concepts important for understanding the dissertation theme, and different technologies useful for developing the solution were studied. A robust, easy-to-use PQD detection solution was developed that allows the user to design a workflow with a chosen sequence of PQD operations. The user can save the workflow for later execution with the same or other data sources. The solution has an easily extensible structure for adding new types of PQD as well as new detection engines and algorithms. The evaluation of the solution shows that it provides a graphical interface that facilitates workflow design and the configuration of PQD operations, and that it performs well, using Java pipeline programming with parallel streams.
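
    The solution itself is implemented in Java with parallel streams; the sketch below is only a rough Python analogue of the workflow idea, in which a user-chosen sequence of detection operations is applied to a data source and the flagged issues are collected. Operation names and sample data are invented; in practice each operation could also be dispatched to a process or thread pool to mirror the parallel execution described above.

from typing import Callable

Row = dict
Detector = Callable[[list], list]

def detect_missing_values(rows: list) -> list:
    """Flag rows in which any field is empty."""
    return [f"row {i}: missing value in '{column}'"
            for i, row in enumerate(rows) for column, value in row.items() if value == ""]

def detect_duplicate_emails(rows: list) -> list:
    """Flag repeated values of the 'email' field."""
    seen = set()
    issues = []
    for i, row in enumerate(rows):
        email = row.get("email", "")
        if email and email in seen:
            issues.append(f"row {i}: duplicate email '{email}'")
        seen.add(email)
    return issues

def run_workflow(rows: list, operations: list) -> list:
    """Apply a user-chosen sequence of detection operations to one data source."""
    issues = []
    for operation in operations:
        issues.extend(operation(rows))
    return issues

if __name__ == "__main__":
    source = [
        {"name": "Ana", "email": "ana@example.com"},
        {"name": "", "email": "ana@example.com"},
    ]
    workflow = [detect_missing_values, detect_duplicate_emails]  # order chosen by the user
    for issue in run_workflow(source, workflow):
        print(issue)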

    Medición de la calidad de datos: un enfoque parametrizable

    Databases are now one of the main assets of companies. Data quality problems lead to errors or a lack of precision in data analysis, which can result in high costs for the company. This thesis therefore focuses on the study of mechanisms for measuring data quality. We present a state of the art on the measurement of some quality dimensions and experiment with a real application from a financial business area, in the CRM application domain, within a database replication scheme. To measure quality, we apply a methodology in which the quality metrics are obtained by refining the organization's quality goals. As a result, we obtained a library of quality measurement methods and a database with the measurements taken for the financial application. The proposed methods are parameterizable and extensible and can be used in different applications. Our approach can be used in companies for different purposes: statistics, partitioning of tables according to their quality, improvements in the exploitation of information, and data-cleaning tasks, among others.
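
    As an illustration of what a parameterizable measurement method could look like, the sketch below defines a completeness metric whose scope (the columns to inspect) and acceptance threshold, derived from a quality goal, are parameters, so the same method can be reused across applications. The sample rows and parameter values are invented.

# A parameterizable completeness metric: the columns to inspect and the pass
# threshold (refined from an organizational quality goal) are parameters,
# so the same measurement method can be reused across applications.
# The sample rows and parameter values are invented.

def completeness(rows: list, columns: list) -> float:
    """Fraction of non-empty values over the selected columns."""
    total = len(rows) * len(columns)
    if total == 0:
        return 1.0
    filled = sum(1 for row in rows for column in columns if row.get(column) not in (None, ""))
    return filled / total

def measure(rows: list, columns: list, threshold: float) -> dict:
    """Return the measured value together with a pass/fail flag."""
    value = completeness(rows, columns)
    return {"metric": "completeness", "columns": columns, "value": value, "ok": value >= threshold}

if __name__ == "__main__":
    customers = [
        {"id": 1, "phone": "555-0101", "email": "a@example.com"},
        {"id": 2, "phone": "", "email": "b@example.com"},
    ]
    print(measure(customers, columns=["phone", "email"], threshold=0.9))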

    A framework for quality evaluation in data integration systems

    Ensuring and maximizing the quality and integrity of information is a crucial process for today's enterprise information systems (EIS). It requires a clear understanding of the interdependencies between the dimensions characterizing the quality of data (QoD), the quality of the conceptual data model (QoM) of the database, the keystone of the EIS, and the quality of data management and integration processes (QoP). Improving one quality dimension (such as data accuracy or model expressiveness) may have negative consequences for other quality dimensions (e.g., freshness or completeness of data). In this paper we briefly present a framework, called QUADRIS, for adopting a quality improvement strategy on one or more dimensions of QoD or QoM while considering the collateral effects on the other interdependent quality dimensions. We also present the scenarios of our ongoing validations on a CRM EIS.
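
    The point about interdependencies can be pictured as a small dependency table: improving one dimension has side effects on others. The dimensions and effect signs below are generic examples for illustration only, not the QUADRIS model itself.

# Toy view of the interdependencies between quality dimensions (QoD, QoM, QoP).
# The dimensions and effect signs are generic examples, not the QUADRIS model itself:
# +1 means improving the row dimension tends to help the column dimension,
# -1 means it tends to hurt it, 0 means no notable effect is assumed.

EFFECTS = {
    "accuracy": {"freshness": -1, "completeness": 0, "model_expressiveness": 0},
    "freshness": {"accuracy": -1, "completeness": -1, "model_expressiveness": 0},
    "model_expressiveness": {"accuracy": 0, "completeness": 0, "freshness": -1},
}

def collateral_effects(dimension: str) -> dict:
    """Return the assumed non-neutral side effects of improving one dimension."""
    return {other: effect for other, effect in EFFECTS.get(dimension, {}).items() if effect != 0}

if __name__ == "__main__":
    # Planning to improve accuracy? Check which other dimensions may degrade.
    print("collateral effects of improving accuracy:", collateral_effects("accuracy"))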
