83 research outputs found

    Data interoperability across borders : a case study of the Abbotsford-Sumas aquifer (British Columbia-Washington State)

    Get PDF
    The ability to integrate data from multiple sources is central to geographic information science (GIS).  Although data integration is an active field of research in the GIS community, a number of challenges remain unresolved. Interoperability research addressing data integration challenges experienced by institutions in an international setting also remains sparse. Groundwater is an example of an environmental phenomenon which does not respect political borders, and its management requires data from multiple jurisdictions. The Abbotsford-Sumas aquifer, straddling the Canada US border, is used as a case study to explore integration challenges in an international setting. Development of groundwater management practices to ensure a sustained source of good quality groundwater is dependent, on an understanding of the conceptual model of the aquifer. Due to a lack of geophysical studies, geological information contained in the water well reports, is the chief source of depth-specific lithological information. The use of this information in constructing the conceptual model is constrained by poor data quality and a lack of an integrated and standardized lithological database. To achieve the research goals of exploring integration challenges in an international setting, lithological datasets from BC and Washington State are integrated. The resultant lithological database is used to test the usability of water well reports for constructing the conceptual model. Numerous interoperability challenges such as data availability, lack of metadata, data quality and formats, database structure, semantics, policies and cooperation are identified as inhibitors of data integration. Despite the numerous challenges the lithological database is useful in constructing a generalized conceptual model. This research is important as it presents challenges to data integration that should be considered as a starting point for environmental management projects

    Metadata-driven data integration

    Get PDF
    Cotutela: Universitat Politècnica de Catalunya i Université Libre de Bruxelles, IT4BI-DC programme for the joint Ph.D. degree in computer science.Data has an undoubtable impact on society. Storing and processing large amounts of available data is currently one of the key success factors for an organization. Nonetheless, we are recently witnessing a change represented by huge and heterogeneous amounts of data. Indeed, 90% of the data in the world has been generated in the last two years. Thus, in order to carry on these data exploitation tasks, organizations must first perform data integration combining data from multiple sources to yield a unified view over them. Yet, the integration of massive and heterogeneous amounts of data requires revisiting the traditional integration assumptions to cope with the new requirements posed by such data-intensive settings. This PhD thesis aims to provide a novel framework for data integration in the context of data-intensive ecosystems, which entails dealing with vast amounts of heterogeneous data, from multiple sources and in their original format. To this end, we advocate for an integration process consisting of sequential activities governed by a semantic layer, implemented via a shared repository of metadata. From an stewardship perspective, this activities are the deployment of a data integration architecture, followed by the population of such shared metadata. From a data consumption perspective, the activities are virtual and materialized data integration, the former an exploratory task and the latter a consolidation one. Following the proposed framework, we focus on providing contributions to each of the four activities. We begin proposing a software reference architecture for semantic-aware data-intensive systems. Such architecture serves as a blueprint to deploy a stack of systems, its core being the metadata repository. Next, we propose a graph-based metadata model as formalism for metadata management. We focus on supporting schema and data source evolution, a predominant factor on the heterogeneous sources at hand. For virtual integration, we propose query rewriting algorithms that rely on the previously proposed metadata model. We additionally consider semantic heterogeneities in the data sources, which the proposed algorithms are capable of automatically resolving. Finally, the thesis focuses on the materialized integration activity, and to this end, proposes a method to select intermediate results to materialize in data-intensive flows. Overall, the results of this thesis serve as contribution to the field of data integration in contemporary data-intensive ecosystems.Les dades tenen un impacte indubtable en la societat. La capacitat d’emmagatzemar i processar grans quantitats de dades disponibles és avui en dia un dels factors claus per l’èxit d’una organització. No obstant, avui en dia estem presenciant un canvi representat per grans volums de dades heterogenis. En efecte, el 90% de les dades mundials han sigut generades en els últims dos anys. Per tal de dur a terme aquestes tasques d’explotació de dades, les organitzacions primer han de realitzar una integració de les dades, combinantles a partir de diferents fonts amb l’objectiu de tenir-ne una vista unificada d’elles. Per això, aquest fet requereix reconsiderar les assumpcions tradicionals en integració amb l’objectiu de lidiar amb els requisits imposats per aquests sistemes de tractament massiu de dades. Aquesta tesi doctoral té com a objectiu proporcional un nou marc de treball per a la integració de dades en el context de sistemes de tractament massiu de dades, el qual implica lidiar amb una gran quantitat de dades heterogènies, provinents de múltiples fonts i en el seu format original. Per això, proposem un procés d’integració compost d’una seqüència d’activitats governades per una capa semàntica, la qual és implementada a partir d’un repositori de metadades compartides. Des d’una perspectiva d’administració, aquestes activitats són el desplegament d’una arquitectura d’integració de dades, seguit per la inserció d’aquestes metadades compartides. Des d’una perspectiva de consum de dades, les activitats són la integració virtual i materialització de les dades, la primera sent una tasca exploratòria i la segona una de consolidació. Seguint el marc de treball proposat, ens centrem en proporcionar contribucions a cada una de les quatre activitats. La tesi inicia proposant una arquitectura de referència de software per a sistemes de tractament massiu de dades amb coneixement semàntic. Aquesta arquitectura serveix com a planell per a desplegar un conjunt de sistemes, sent el repositori de metadades al seu nucli. Posteriorment, proposem un model basat en grafs per a la gestió de metadades. Concretament, ens centrem en donar suport a l’evolució d’esquemes i fonts de dades, un dels factors predominants en les fonts de dades heterogènies considerades. Per a l’integració virtual, proposem algorismes de rescriptura de consultes que usen el model de metadades previament proposat. Com a afegitó, considerem heterogeneïtat semàntica en les fonts de dades, les quals els algorismes de rescriptura poden resoldre automàticament. Finalment, la tesi es centra en l’activitat d’integració materialitzada. Per això proposa un mètode per a seleccionar els resultats intermedis a materialitzar un fluxes de tractament intensiu de dades. En general, els resultats d’aquesta tesi serveixen com a contribució al camp d’integració de dades en els ecosistemes de tractament massiu de dades contemporanisLes données ont un impact indéniable sur la société. Le stockage et le traitement de grandes quantités de données disponibles constituent actuellement l’un des facteurs clés de succès d’une entreprise. Néanmoins, nous assistons récemment à un changement représenté par des quantités de données massives et hétérogènes. En effet, 90% des données dans le monde ont été générées au cours des deux dernières années. Ainsi, pour mener à bien ces tâches d’exploitation des données, les organisations doivent d’abord réaliser une intégration des données en combinant des données provenant de sources multiples pour obtenir une vue unifiée de ces dernières. Cependant, l’intégration de quantités de données massives et hétérogènes nécessite de revoir les hypothèses d’intégration traditionnelles afin de faire face aux nouvelles exigences posées par les systèmes de gestion de données massives. Cette thèse de doctorat a pour objectif de fournir un nouveau cadre pour l’intégration de données dans le contexte d’écosystèmes à forte intensité de données, ce qui implique de traiter de grandes quantités de données hétérogènes, provenant de sources multiples et dans leur format d’origine. À cette fin, nous préconisons un processus d’intégration constitué d’activités séquentielles régies par une couche sémantique, mise en oeuvre via un dépôt partagé de métadonnées. Du point de vue de la gestion, ces activités consistent à déployer une architecture d’intégration de données, suivies de la population de métadonnées partagées. Du point de vue de la consommation de données, les activités sont l’intégration de données virtuelle et matérialisée, la première étant une tâche exploratoire et la seconde, une tâche de consolidation. Conformément au cadre proposé, nous nous attachons à fournir des contributions à chacune des quatre activités. Nous commençons par proposer une architecture logicielle de référence pour les systèmes de gestion de données massives et à connaissance sémantique. Une telle architecture consiste en un schéma directeur pour le déploiement d’une pile de systèmes, le dépôt de métadonnées étant son composant principal. Ensuite, nous proposons un modèle de métadonnées basé sur des graphes comme formalisme pour la gestion des métadonnées. Nous mettons l’accent sur la prise en charge de l’évolution des schémas et des sources de données, facteur prédominant des sources hétérogènes sous-jacentes. Pour l’intégration virtuelle, nous proposons des algorithmes de réécriture de requêtes qui s’appuient sur le modèle de métadonnées proposé précédemment. Nous considérons en outre les hétérogénéités sémantiques dans les sources de données, que les algorithmes proposés sont capables de résoudre automatiquement. Enfin, la thèse se concentre sur l’activité d’intégration matérialisée et propose à cette fin une méthode de sélection de résultats intermédiaires à matérialiser dans des flux des données massives. Dans l’ensemble, les résultats de cette thèse constituent une contribution au domaine de l’intégration des données dans les écosystèmes contemporains de gestion de données massivesPostprint (published version

    Role-based Data Management

    Get PDF
    Database systems build an integral component of today’s software systems and as such they are the central point for storing and sharing a software system’s data while ensuring global data consistency at the same time. Introducing the primitives of roles and their accompanied metatype distinction in modeling and programming languages, results in a novel paradigm of designing, extending, and programming modern software systems. In detail, roles as modeling concept enable a separation of concerns within an entity. Along with its rigid core, an entity may acquire various roles in different contexts during its lifetime and thus, adapts its behavior and structure dynamically during runtime. Unfortunately, database systems, as important component and global consistency provider of such systems, do not keep pace with this trend. The absence of a metatype distinction, in terms of an entity’s separation of concerns, in the database system results in various problems for the software system in general, for the application developers, and finally for the database system itself. In case of relational database systems, these problems are concentrated under the term role-relational impedance mismatch. In particular, the whole software system is designed by using different semantics on various layers. In case of role-based software systems in combination with relational database systems this gap in semantics between applications and the database system increases dramatically. Consequently, the database system cannot directly represent the richer semantics of roles as well as the accompanied consistency constraints. These constraints have to be ensured by the applications and the database system loses its single point of truth characteristic in the software system. As the applications are in charge of guaranteeing global consistency, their development requires more effort in data management. Moreover, the software system’s data management is distributed over several layers, which results in an unstructured software system architecture. To overcome the role-relational impedance mismatch and bring the database system back in its rightful position as single point of truth in a software system, this thesis introduces the novel and tripartite RSQL approach. It combines a novel database model that represents the metatype distinction as first class citizen in a database system, an adapted query language on the database model’s basis, and finally a proper result representation. Precisely, RSQL’s logical database model introduces Dynamic Data Types, to directly represent the separation of concerns within an entity type on the schema level. On the instance level, the database model defines the notion of a Dynamic Tuple that combines an entity with the notion of roles and thus, allows for dynamic structure adaptations during runtime without changing an entity’s overall type. These definitions build the main data structures on which the database system operates. Moreover, formal operators connecting the query language statements with the database model data structures, complete the database model. The query language, as external database system interface, features an individual data definition, data manipulation, and data query language. Their statements directly represent the metatype distinction to address Dynamic Data Types and Dynamic Tuples, respectively. As a consequence of the novel data structures, the query processing of Dynamic Tuples is completely redesigned. As last piece for a complete database integration of a role-based notion and its accompanied metatype distinction, we specify the RSQL Result Net as result representation. It provides a novel result structure and features functionalities to navigate through query results. Finally, we evaluate all three RSQL components in comparison to a relational database system. This assessment clearly demonstrates the benefits of the roles concept’s full database integration

    Integration of biological data: systems, infrastructures and programmable tools

    Full text link
    Tesis doctoral inédita leída en la Universidad Autónoma de Madrid. Escuela Politécnica Superior, Departamento de Ingeniería informática. Fecha de lectura: 19-05-200

    Designing Data Spaces

    Get PDF
    This open access book provides a comprehensive view on data ecosystems and platform economics from methodical and technological foundations up to reports from practical implementations and applications in various industries. To this end, the book is structured in four parts: Part I “Foundations and Contexts” provides a general overview about building, running, and governing data spaces and an introduction to the IDS and GAIA-X projects. Part II “Data Space Technologies” subsequently details various implementation aspects of IDS and GAIA-X, including eg data usage control, the usage of blockchain technologies, or semantic data integration and interoperability. Next, Part III describes various “Use Cases and Data Ecosystems” from various application areas such as agriculture, healthcare, industry, energy, and mobility. Part IV eventually offers an overview of several “Solutions and Applications”, eg including products and experiences from companies like Google, SAP, Huawei, T-Systems, Innopay and many more. Overall, the book provides professionals in industry with an encompassing overview of the technological and economic aspects of data spaces, based on the International Data Spaces and Gaia-X initiatives. It presents implementations and business cases and gives an outlook to future developments. In doing so, it aims at proliferating the vision of a social data market economy based on data spaces which embrace trust and data sovereignty

    A Development Method for the Conceptual Design of Multi-View Modeling Tools with an Emphasis on Consistency Requirements

    Get PDF
    The main objective of this thesis is to bridge the gap between modeling method experts on the one side and tool developers on the other. More precisely, the focus is on the specification of requirements for multi-view modeling tools. In this regard, the thesis introduces a methodological approach that supports the specification of conceptual designs for multi-view modeling tools in a stepwise manner: the MuVieMoT approach. MuVieMoT utilizes generic multi-view modeling concepts and the model-driven engineering paradigm to establish an overarching specification of multi-view modeling tools with an emphasis on consistency requirements. The approach builds on and extends the theoretical foundation of metamodeling and multi-view modeling: generic multi-view modeling concepts, integrated multi-view modeling approaches, and possibilities for formalized modeling method specifications. Applicability and utility of MuVieMoT are evaluated using an illustrative scenario, therefore specifying a conceptual design for a multi-view modeling tool for the Semantic Object Model enterprise modeling method. The thesis moreover introduces the MuVieMoT modeling environment, enabling the efficient application of the approach as well as the model-driven development of initial multi-view modeling tools based on the conceptual models created with MuVieMoT. Consequently, the approach fosters an intersubjective and unambiguous understanding of the tool requirements between method experts and tool developers

    An approach to enacting business process models in support of the life cycle of integrated manufacturing systems

    Get PDF
    The complexity of enterprise engineering processes requires the application of reference architectures as means of guiding the achievement of an adequate level of business integration. This research aims to address important aspects of this requirement by associating the formalism of reference architectures to various life cycle phases of integrating manufacturing systems (IMS) and enabling their use in addressing contemporary system engineering issues. In pursuit of this aim, the following research activities were carried out: (1) to devise a framework which supports key phases of the IMS life cycle and (2) to populate part of this framework with an initial combination of architectures which can be encapsulated into a computer-aided systems engineering environment. This has led to the creation of a workbench capable of providing support for modelling, analysis, simulation, rapid-prototyping, configuration and run-time operation of an IMS, based on a consistent set of models associated with the engineering processes involved. The research effort concentrated on selecting and investigating the use of appropriate formalisms which underpin a selection of architectures and tools (i. e. CIM-OSA, Petrinets, object-oriented methods and CIM-BIOSYS), this by designing, implementing, applying and testing the workbench. The main contribution of this research is to demonstrate that it is possible to retain an adequate level of formalism, via computational structures and models, which extend through the IMS life cycle from a conceptual description of the system through to actions that the system performs when operating. The underlying methodology which supported this contribution is based on enacting models of system behaviour which encode important coordination aspects of manufacturing systems. The strategy for demonstrating the incorporation of formalism to the IMS life cycle was to enable the aggregation into a workbench of knowledge of 'what' the system is expected to achieve (i. e. 'problems' to be addressed) and 'how' the system can achieve it (i. e possible 'solutions'). Within the workbench, such a knowledge is represented through an amalgamation of business process modelling and object-oriented modelling approaches which, when adequately manipulated, can lead to business integration

    Head-Driven Phrase Structure Grammar

    Get PDF
    Head-Driven Phrase Structure Grammar (HPSG) is a constraint-based or declarative approach to linguistic knowledge, which analyses all descriptive levels (phonology, morphology, syntax, semantics, pragmatics) with feature value pairs, structure sharing, and relational constraints. In syntax it assumes that expressions have a single relatively simple constituent structure. This volume provides a state-of-the-art introduction to the framework. Various chapters discuss basic assumptions and formal foundations, describe the evolution of the framework, and go into the details of the main syntactic phenomena. Further chapters are devoted to non-syntactic levels of description. The book also considers related fields and research areas (gesture, sign languages, computational linguistics) and includes chapters comparing HPSG with other frameworks (Lexical Functional Grammar, Categorial Grammar, Construction Grammar, Dependency Grammar, and Minimalism)

    Suspect Until Proven Guilty, a Problematization of State Dossier Systems via Two Case Studies: The United States and China

    Get PDF
    This dissertation problematizes the state dossier system (SDS): the production and accumulation of personal information on citizen subjects exceeding the reasonable bounds of risk management. SDS - comprising interconnecting subsystems of records and identification - damage individual autonomy and self-determination, impacting not only human rights, but also the viability of the social system. The research, a hybrid of case-study and cross-national comparison, was guided in part by a theoretical model of four primary SDS driving forces: technology, political economy, law and public sentiment. Data sources included government documents, academic texts, investigative journalism, NGO reports and industry white papers. The primary analytical instrument was the juxtaposition of two individual cases: the U.S. and China. Research found that constraints on the extent of the U.S. SDS today may not be significantly different from China\u27s, a system undergoing significant change amidst growing public interest in privacy and anonymity. Much activity within the U.S., such as the practice of suspicious activity reporting, is taking place outside the domain of federal privacy laws, while ID systems appear to advance and expand despite clear public opposition. Momentum for increasingly comprehensive SDS appears to be growing, in part because the harms may not be immediately evident to the data subjects. The future of SDS globally will depend on an informed and active public; law and policy will need to adjust to better regulate the production and storage of personal information. To that end, the dissertation offers a general model and linguistic toolkit for the further analysis of SDS

    Head-Driven Phrase Structure Grammar

    Get PDF
    Head-Driven Phrase Structure Grammar (HPSG) is a constraint-based or declarative approach to linguistic knowledge, which analyses all descriptive levels (phonology, morphology, syntax, semantics, pragmatics) with feature value pairs, structure sharing, and relational constraints. In syntax it assumes that expressions have a single relatively simple constituent structure. This volume provides a state-of-the-art introduction to the framework. Various chapters discuss basic assumptions and formal foundations, describe the evolution of the framework, and go into the details of the main syntactic phenomena. Further chapters are devoted to non-syntactic levels of description. The book also considers related fields and research areas (gesture, sign languages, computational linguistics) and includes chapters comparing HPSG with other frameworks (Lexical Functional Grammar, Categorial Grammar, Construction Grammar, Dependency Grammar, and Minimalism)
    corecore