74 research outputs found

    Metadata-driven data integration

    Get PDF
    Cotutela: Universitat Politècnica de Catalunya i Université Libre de Bruxelles, IT4BI-DC programme for the joint Ph.D. degree in computer science.Data has an undoubtable impact on society. Storing and processing large amounts of available data is currently one of the key success factors for an organization. Nonetheless, we are recently witnessing a change represented by huge and heterogeneous amounts of data. Indeed, 90% of the data in the world has been generated in the last two years. Thus, in order to carry on these data exploitation tasks, organizations must first perform data integration combining data from multiple sources to yield a unified view over them. Yet, the integration of massive and heterogeneous amounts of data requires revisiting the traditional integration assumptions to cope with the new requirements posed by such data-intensive settings. This PhD thesis aims to provide a novel framework for data integration in the context of data-intensive ecosystems, which entails dealing with vast amounts of heterogeneous data, from multiple sources and in their original format. To this end, we advocate for an integration process consisting of sequential activities governed by a semantic layer, implemented via a shared repository of metadata. From an stewardship perspective, this activities are the deployment of a data integration architecture, followed by the population of such shared metadata. From a data consumption perspective, the activities are virtual and materialized data integration, the former an exploratory task and the latter a consolidation one. Following the proposed framework, we focus on providing contributions to each of the four activities. We begin proposing a software reference architecture for semantic-aware data-intensive systems. Such architecture serves as a blueprint to deploy a stack of systems, its core being the metadata repository. Next, we propose a graph-based metadata model as formalism for metadata management. We focus on supporting schema and data source evolution, a predominant factor on the heterogeneous sources at hand. For virtual integration, we propose query rewriting algorithms that rely on the previously proposed metadata model. We additionally consider semantic heterogeneities in the data sources, which the proposed algorithms are capable of automatically resolving. Finally, the thesis focuses on the materialized integration activity, and to this end, proposes a method to select intermediate results to materialize in data-intensive flows. Overall, the results of this thesis serve as contribution to the field of data integration in contemporary data-intensive ecosystems.Les dades tenen un impacte indubtable en la societat. La capacitat d’emmagatzemar i processar grans quantitats de dades disponibles és avui en dia un dels factors claus per l’èxit d’una organització. No obstant, avui en dia estem presenciant un canvi representat per grans volums de dades heterogenis. En efecte, el 90% de les dades mundials han sigut generades en els últims dos anys. Per tal de dur a terme aquestes tasques d’explotació de dades, les organitzacions primer han de realitzar una integració de les dades, combinantles a partir de diferents fonts amb l’objectiu de tenir-ne una vista unificada d’elles. Per això, aquest fet requereix reconsiderar les assumpcions tradicionals en integració amb l’objectiu de lidiar amb els requisits imposats per aquests sistemes de tractament massiu de dades. Aquesta tesi doctoral té com a objectiu proporcional un nou marc de treball per a la integració de dades en el context de sistemes de tractament massiu de dades, el qual implica lidiar amb una gran quantitat de dades heterogènies, provinents de múltiples fonts i en el seu format original. Per això, proposem un procés d’integració compost d’una seqüència d’activitats governades per una capa semàntica, la qual és implementada a partir d’un repositori de metadades compartides. Des d’una perspectiva d’administració, aquestes activitats són el desplegament d’una arquitectura d’integració de dades, seguit per la inserció d’aquestes metadades compartides. Des d’una perspectiva de consum de dades, les activitats són la integració virtual i materialització de les dades, la primera sent una tasca exploratòria i la segona una de consolidació. Seguint el marc de treball proposat, ens centrem en proporcionar contribucions a cada una de les quatre activitats. La tesi inicia proposant una arquitectura de referència de software per a sistemes de tractament massiu de dades amb coneixement semàntic. Aquesta arquitectura serveix com a planell per a desplegar un conjunt de sistemes, sent el repositori de metadades al seu nucli. Posteriorment, proposem un model basat en grafs per a la gestió de metadades. Concretament, ens centrem en donar suport a l’evolució d’esquemes i fonts de dades, un dels factors predominants en les fonts de dades heterogènies considerades. Per a l’integració virtual, proposem algorismes de rescriptura de consultes que usen el model de metadades previament proposat. Com a afegitó, considerem heterogeneïtat semàntica en les fonts de dades, les quals els algorismes de rescriptura poden resoldre automàticament. Finalment, la tesi es centra en l’activitat d’integració materialitzada. Per això proposa un mètode per a seleccionar els resultats intermedis a materialitzar un fluxes de tractament intensiu de dades. En general, els resultats d’aquesta tesi serveixen com a contribució al camp d’integració de dades en els ecosistemes de tractament massiu de dades contemporanisLes données ont un impact indéniable sur la société. Le stockage et le traitement de grandes quantités de données disponibles constituent actuellement l’un des facteurs clés de succès d’une entreprise. Néanmoins, nous assistons récemment à un changement représenté par des quantités de données massives et hétérogènes. En effet, 90% des données dans le monde ont été générées au cours des deux dernières années. Ainsi, pour mener à bien ces tâches d’exploitation des données, les organisations doivent d’abord réaliser une intégration des données en combinant des données provenant de sources multiples pour obtenir une vue unifiée de ces dernières. Cependant, l’intégration de quantités de données massives et hétérogènes nécessite de revoir les hypothèses d’intégration traditionnelles afin de faire face aux nouvelles exigences posées par les systèmes de gestion de données massives. Cette thèse de doctorat a pour objectif de fournir un nouveau cadre pour l’intégration de données dans le contexte d’écosystèmes à forte intensité de données, ce qui implique de traiter de grandes quantités de données hétérogènes, provenant de sources multiples et dans leur format d’origine. À cette fin, nous préconisons un processus d’intégration constitué d’activités séquentielles régies par une couche sémantique, mise en oeuvre via un dépôt partagé de métadonnées. Du point de vue de la gestion, ces activités consistent à déployer une architecture d’intégration de données, suivies de la population de métadonnées partagées. Du point de vue de la consommation de données, les activités sont l’intégration de données virtuelle et matérialisée, la première étant une tâche exploratoire et la seconde, une tâche de consolidation. Conformément au cadre proposé, nous nous attachons à fournir des contributions à chacune des quatre activités. Nous commençons par proposer une architecture logicielle de référence pour les systèmes de gestion de données massives et à connaissance sémantique. Une telle architecture consiste en un schéma directeur pour le déploiement d’une pile de systèmes, le dépôt de métadonnées étant son composant principal. Ensuite, nous proposons un modèle de métadonnées basé sur des graphes comme formalisme pour la gestion des métadonnées. Nous mettons l’accent sur la prise en charge de l’évolution des schémas et des sources de données, facteur prédominant des sources hétérogènes sous-jacentes. Pour l’intégration virtuelle, nous proposons des algorithmes de réécriture de requêtes qui s’appuient sur le modèle de métadonnées proposé précédemment. Nous considérons en outre les hétérogénéités sémantiques dans les sources de données, que les algorithmes proposés sont capables de résoudre automatiquement. Enfin, la thèse se concentre sur l’activité d’intégration matérialisée et propose à cette fin une méthode de sélection de résultats intermédiaires à matérialiser dans des flux des données massives. Dans l’ensemble, les résultats de cette thèse constituent une contribution au domaine de l’intégration des données dans les écosystèmes contemporains de gestion de données massivesPostprint (published version

    Weiterentwicklung analytischer Datenbanksysteme

    Get PDF
    This thesis contributes to the state of the art in analytical database systems. First, we identify and explore extensions to better support analytics on event streams. Second, we propose a novel polygon index to enable efficient geospatial data processing in main memory. Third, we contribute a new deep learning approach to cardinality estimation, which is the core problem in cost-based query optimization.Diese Arbeit trägt zum aktuellen Forschungsstand von analytischen Datenbanksystemen bei. Wir identifizieren und explorieren Erweiterungen um Analysen auf Eventströmen besser zu unterstützen. Wir stellen eine neue Indexstruktur für Polygone vor, die eine effiziente Verarbeitung von Geodaten im Hauptspeicher ermöglicht. Zudem präsentieren wir einen neuen Ansatz für Kardinalitätsschätzungen mittels maschinellen Lernens

    Efficient Incremental Data Analysis

    Get PDF
    Many data-intensive applications require real-time analytics over streaming data. In a growing number of domains -- sensor network monitoring, social web applications, clickstream analysis, high-frequency algorithmic trading, and fraud detections to name a few -- applications continuously monitor stream events to promptly react to certain data conditions. These applications demand responsive analytics even when faced with high volume and velocity of incoming changes, large numbers of users, and complex processing requirements. Developing suitable online analytics engine that meets these requirements is challenging. In this thesis, we study techniques for efficient online processing of complex analytical queries, ranging from standard database queries to complex machine learning and digital signal processing workflows. First, we focus on the problem of efficient incremental computation for database queries. We have developed a system, called DBToaster, that compiles declarative queries into high-performance stream processing engines that keep query results (views) fresh at very high update rates. At the heart of our system is a recursive query compilation algorithm that materializes a set of supporting higher-order delta views to achieve a substantially lower view maintenance cost. We study the trade-offs between single-tuple and batch incremental processing in local execution, and we present a novel approach for compiling view maintenance code into data-parallel programs optimized for distributed execution. DBToaster supports millions of complete view refreshes per second for a broad range of queries and outperforms commercial database and stream engines by orders of magnitude. We also study the incremental computation for queries written as iterative linear algebra, which can capture many machine learning and scientific calculations. We have developed a framework, called LINVIEW, for capturing deltas of linear algebra programs and understanding their computational cost. Linear algebra operations tend to cause an avalanche effect where even very local changes to the input matrices spread out and infect all of the intermediate results and the final view, causing incremental view maintenance to lose its performance benefit over re-evaluation. We develop techniques based on matrix factorizations to contain such epidemics of change and make incremental view maintenance of linear algebra practical and usually substantially cheaper than re-evaluation. We show, both analytically and experimentally, the usefulness of these techniques when applied to standard analytics tasks. Our last research question concerns the integration of general-purpose query processors and domain-specific operations to enable deep data exploration in both online and offline analysis. We advocate a deep integration of signal processing operations and general-purpose query processors. We demonstrate that in-situ processing of tempo-relational and signal data through a unified query language empowers users to express end-to-end workflows more succinctly inside one system while at the same time offering orders of magnitude better performance than existing popular data management systems

    Monitoring Network Data Streams

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Flexible Integration and Efficient Analysis of Multidimensional Datasets from the Web

    Get PDF
    If numeric data from the Web are brought together, natural scientists can compare climate measurements with estimations, financial analysts can evaluate companies based on balance sheets and daily stock market values, and citizens can explore the GDP per capita from several data sources. However, heterogeneities and size of data remain a problem. This work presents methods to query a uniform view - the Global Cube - of available datasets from the Web and builds on Linked Data query approaches

    An integrated approach to deliver OLAP for multidimensional Semantic Web Databases

    Get PDF
    Semantic Webs (SW) and web data have become increasingly important sources to support Business Intelligence (BI), but they are difficult to manage due to the exponential increase in their volumes, inconsistency in semantics and complexity in representations. On-Line Analytical Processing (OLAP) is an important tool in analysing large and complex BI data, but it lacks the capability of processing disperse SW data due to the nature of its design. A new concept with a richer vocabulary than the existing ones for OLAP is needed to model distributed multidimensional semantic web databases. A new OLAP framework is developed, with multiple layers including additional vocabulary, extended OLAP operators, and usage of SPARQL to model heterogeneous semantic web data, unify multidimensional structures, and provide new enabling functions for interoperability. The framework is presented with examples to demonstrate its capability to unify existing vocabularies with additional vocabulary elements to handle both informational and topological data in Graph OLAP. The vocabularies used in this work are: the RDF Cube Vocabulary (QB) – proposed by the W3C to allow multi-dimensional, mostly statistical, data to be published in RDF; and the QB4OLAP – a QB extension introducing standard OLAP operators. The framework enables the composition of multiple databases (e.g. energy consumptions and property market values etc.) to generate observations through semantic pipe-like operators. This approach is demonstrated through Use Cases containing highly valuable data collected from a real-life environment. Its usability is proved through the development and usage of semantic pipe-like operators able to deliver OLAP specific functionalities. To the best of my knowledge there is no available data modelling approach handling both informational and topological Semantic Web data, which is designed either to provide OLAP capabilities over Semantic Web databases or to provide a means to connect such databases for further OLAP analysis. The thesis proposes that the presented work provides a wider understanding of: ways to access Semantic Web data; ways to build specialised Semantic Web databases, and, how to enrich them with powerful capabilities for further Business Intelligence

    Towards Prescriptive Analytics in Cyber-Physical Systems

    Get PDF
    More and more of our physical world today is being monitored and controlled by so-called cyber-physical systems (CPSs). These are compositions of networked autonomous cyber and physical agents such as sensors, actuators, computational elements, and humans in the loop. Today, CPSs are still relatively small-scale and very limited compared to CPSs to be witnessed in the future. Future CPSs are expected to be far more complex, large-scale, wide-spread, and mission-critical, and found in a variety of domains such as transportation, medicine, manufacturing, and energy, where they will bring many advantages such as the increased efficiency, sustainability, reliability, and security. To unleash their full potential, CPSs need to be equipped with, among other features, the support for automated planning and control, where computing agents collaboratively and continuously plan and control their actions in an intelligent and well-coordinated manner to secure and optimize a physical process, e.g., electricity flow in the power grid. In today’s CPSs, the control is typically automated, but the planning is solely performed by humans. Unfortunately, it is intractable and infeasible for humans to plan every action in a future CPS due to the complexity, scale, and volatility of a physical process. Due to these properties, the control and planning has to be continuous and automated in future CPSs. Humans may only analyse and tweak the system’s operation using the set of tools supporting prescriptive analytics that allows them (1) to make predictions, (2) to get the suggestions of the most prominent set of actions (decisions) to be taken, and (3) to analyse the implications as if such actions were taken. This thesis considers the planning and control in the context of a large-scale multi-agent CPS. Based on the smart-grid use-case, it presents a so-called PrescriptiveCPS – which is (the conceptual model of) a multi-agent, multi-role, and multi-level CPS automatically and continuously taking and realizing decisions in near real-time and providing (human) users prescriptive analytics tools to analyse and manage the performance of the underlying physical system (or process). Acknowledging the complexity of CPSs, this thesis provides contributions at the following three levels of scale: (1) the level of a (full) PrescriptiveCPS, (2) the level of a single PrescriptiveCPS agent, and (3) the level of a component of a CPS agent software system. At the CPS level, the contributions include the definition of PrescriptiveCPS, according to which it is the system of interacting physical and cyber (sub-)systems. Here, the cyber system consists of hierarchically organized inter-connected agents, collectively managing instances of so-called flexibility, decision, and prescription models, which are short-lived, focus on the future, and represent a capability, an (user’s) intention, and actions to change the behaviour (state) of a physical system, respectively. At the agent level, the contributions include the three-layer architecture of an agent software system, integrating the number of components specially designed or enhanced to support the functionality of PrescriptiveCPS. At the component level, the most of the thesis contribution is provided. The contributions include the description, design, and experimental evaluation of (1) a unified multi-dimensional schema for storing flexibility and prescription models (and related data), (2) techniques to incrementally aggregate flexibility model instances and disaggregate prescription model instances, (3) a database management system (DBMS) with built-in optimization problem solving capability allowing to formulate optimization problems using SQL-like queries and to solve them “inside a database”, (4) a real-time data management architecture for processing instances of flexibility and prescription models under (soft or hard) timing constraints, and (5) a graphical user interface (GUI) to visually analyse the flexibility and prescription model instances. Additionally, the thesis discusses and exemplifies (but provides no evaluations of) (1) domain-specific and in-DBMS generic forecasting techniques allowing to forecast instances of flexibility models based on historical data, and (2) powerful ways to analyse past, current, and future based on so-called hypothetical what-if scenarios and flexibility and prescription model instances stored in a database. Most of the contributions at this level are based on the smart-grid use-case. In summary, the thesis provides (1) the model of a CPS with planning capabilities, (2) the design and experimental evaluation of prescriptive analytics techniques allowing to effectively forecast, aggregate, disaggregate, visualize, and analyse complex models of the physical world, and (3) the use-case from the energy domain, showing how the introduced concepts are applicable in the real world. We believe that all this contribution makes a significant step towards developing planning-capable CPSs in the future.Mehr und mehr wird heute unsere physische Welt überwacht und durch sogenannte Cyber-Physical-Systems (CPS) geregelt. Dies sind Kombinationen von vernetzten autonomen cyber und physischen Agenten wie Sensoren, Aktoren, Rechenelementen und Menschen. Heute sind CPS noch relativ klein und im Vergleich zu CPS der Zukunft sehr begrenzt. Zukünftige CPS werden voraussichtlich weit komplexer, größer, weit verbreiteter und unternehmenskritischer sein sowie in einer Vielzahl von Bereichen wie Transport, Medizin, Fertigung und Energie – in denen sie viele Vorteile wie erhöhte Effizienz, Nachhaltigkeit, Zuverlässigkeit und Sicherheit bringen – anzutreffen sein. Um ihr volles Potenzial entfalten zu können, müssen CPS unter anderem mit der Unterstützung automatisierter Planungs- und Steuerungsfunktionalität ausgestattet sein, so dass Agents ihre Aktionen gemeinsam und kontinuierlich auf intelligente und gut koordinierte Weise planen und kontrollieren können, um einen physischen Prozess wie den Stromfluss im Stromnetz sicherzustellen und zu optimieren. Zwar sind in den heutigen CPS Steuerung und Kontrolle typischerweise automatisiert, aber die Planung wird weiterhin allein von Menschen durchgeführt. Leider ist diese Aufgabe nur schwer zu bewältigen, und es ist für den Menschen schlicht unmöglich, jede Aktion in einem zukünftigen CPS auf Basis der Komplexität, des Umfangs und der Volatilität eines physikalischen Prozesses zu planen. Aufgrund dieser Eigenschaften müssen Steuerung und Planung in CPS der Zukunft kontinuierlich und automatisiert ablaufen. Der Mensch soll sich dabei ganz auf die Analyse und Einflussnahme auf das System mit Hilfe einer Reihe von Werkzeugen konzentrieren können. Derartige Werkzeuge erlauben (1) Vorhersagen, (2) Vorschläge der wichtigsten auszuführenden Aktionen (Entscheidungen) und (3) die Analyse und potentiellen Auswirkungen der zu fällenden Entscheidungen. Diese Arbeit beschäftigt sich mit der Planung und Kontrolle im Rahmen großer Multi-Agent-CPS. Basierend auf dem Smart-Grid als Anwendungsfall wird ein sogenanntes PrescriptiveCPS vorgestellt, welches einem Multi-Agent-, Multi-Role- und Multi-Level-CPS bzw. dessen konzeptionellem Modell entspricht. Diese PrescriptiveCPS treffen und realisieren automatisch und kontinuierlich Entscheidungen in naher Echtzeit und stellen Benutzern (Menschen) Prescriptive-Analytics-Werkzeuge und Verwaltung der Leistung der zugrundeliegenden physischen Systeme bzw. Prozesse zur Verfügung. In Anbetracht der Komplexität von CPS leistet diese Arbeit Beiträge auf folgenden Ebenen: (1) Gesamtsystem eines PrescriptiveCPS, (2) PrescriptiveCPS-Agenten und (3) Komponenten eines CPS-Agent-Software-Systems. Auf CPS-Ebene umfassen die Beiträge die Definition von PrescriptiveCPS als ein System von wechselwirkenden physischen und cyber (Sub-)Systemen. Das Cyber-System besteht hierbei aus hierarchisch organisierten verbundenen Agenten, die zusammen Instanzen sogenannter Flexibility-, Decision- und Prescription-Models verwalten, welche von kurzer Dauer sind, sich auf die Zukunft konzentrieren und Fähigkeiten, Absichten (des Benutzers) und Aktionen darstellen, die das Verhalten des physischen Systems verändern. Auf Agenten-Ebene umfassen die Beiträge die Drei-Ebenen-Architektur eines Agentensoftwaresystems sowie die Integration von Komponenten, die insbesondere zur besseren Unterstützung der Funktionalität von PrescriptiveCPS entwickelt wurden. Der Schwerpunkt dieser Arbeit bilden die Beiträge auf der Komponenten-Ebene, diese umfassen Beschreibung, Design und experimentelle Evaluation (1) eines einheitlichen multidimensionalen Schemas für die Speicherung von Flexibility- and Prescription-Models (und verwandten Daten), (2) der Techniken zur inkrementellen Aggregation von Instanzen eines Flexibilitätsmodells und Disaggregation von Prescription-Models, (3) eines Datenbankmanagementsystem (DBMS) mit integrierter Optimierungskomponente, die es erlaubt, Optimierungsprobleme mit Hilfe von SQL-ähnlichen Anfragen zu formulieren und sie „in einer Datenbank zu lösen“, (4) einer Echtzeit-Datenmanagementarchitektur zur Verarbeitung von Instanzen der Flexibility- and Prescription-Models unter (weichen oder harten) Zeitvorgaben und (5) einer grafische Benutzeroberfläche (GUI) zur Visualisierung und Analyse von Instanzen der Flexibility- and Prescription-Models. Darüber hinaus diskutiert und veranschaulicht diese Arbeit beispielhaft ohne detaillierte Evaluation (1) anwendungsspezifische und im DBMS integrierte Vorhersageverfahren, die die Vorhersage von Instanzen der Flexibility- and Prescription-Models auf Basis historischer Daten ermöglichen, und (2) leistungsfähige Möglichkeiten zur Analyse von Vergangenheit, Gegenwart und Zukunft auf Basis sogenannter hypothetischer „What-if“-Szenarien und der in der Datenbank hinterlegten Instanzen der Flexibility- and Prescription-Models. Die meisten der Beiträge auf dieser Ebene basieren auf dem Smart-Grid-Anwendungsfall. Zusammenfassend befasst sich diese Arbeit mit (1) dem Modell eines CPS mit Planungsfunktionen, (2) dem Design und der experimentellen Evaluierung von Prescriptive-Analytics-Techniken, die eine effektive Vorhersage, Aggregation, Disaggregation, Visualisierung und Analyse komplexer Modelle der physischen Welt ermöglichen und (3) dem Anwendungsfall der Energiedomäne, der zeigt, wie die vorgestellten Konzepte in der Praxis Anwendung finden. Wir glauben, dass diese Beiträge einen wesentlichen Schritt in der zukünftigen Entwicklung planender CPS darstellen.Mere og mere af vores fysiske verden bliver overvåget og kontrolleret af såkaldte cyber-fysiske systemer (CPSer). Disse er sammensætninger af netværksbaserede autonome IT (cyber) og fysiske (physical) agenter, såsom sensorer, aktuatorer, beregningsenheder, og mennesker. I dag er CPSer stadig forholdsvis små og meget begrænsede i forhold til de CPSer vi kan forvente i fremtiden. Fremtidige CPSer forventes at være langt mere komplekse, storstilede, udbredte, og missionskritiske, og vil kunne findes i en række områder såsom transport, medicin, produktion og energi, hvor de vil give mange fordele, såsom øget effektivitet, bæredygtighed, pålidelighed og sikkerhed. For at frigøre CPSernes fulde potentiale, skal de bl.a. udstyres med støtte til automatiseret planlægning og kontrol, hvor beregningsagenter i samspil og løbende planlægger og styrer deres handlinger på en intelligent og velkoordineret måde for at sikre og optimere en fysisk proces, såsom elforsyningen i elnettet. I nuværende CPSer er styringen typisk automatiseret, mens planlægningen udelukkende er foretaget af mennesker. Det er umuligt for mennesker at planlægge hver handling i et fremtidigt CPS på grund af kompleksiteten, skalaen, og omskifteligheden af en fysisk proces. På grund af disse egenskaber, skal kontrol og planlægning være kontinuerlig og automatiseret i fremtidens CPSer. Mennesker kan kun analysere og justere systemets drift ved hjælp af det sæt af værktøjer, der understøtter præskriptive analyser (prescriptive analytics), der giver dem mulighed for (1) at lave forudsigelser, (2) at få forslagene fra de mest fremtrædende sæt handlinger (beslutninger), der skal tages, og (3) at analysere konsekvenserne, hvis sådanne handlinger blev udført. Denne afhandling omhandler planlægning og kontrol i forbindelse med store multi-agent CPSer. Baseret på en smart-grid use case, præsenterer afhandlingen det såkaldte PrescriptiveCPS hvilket er (den konceptuelle model af) et multi-agent, multi-rolle, og multi-level CPS, der automatisk og kontinuerligt tager beslutninger i nær-realtid og leverer (menneskelige) brugere præskriptiveanalyseværktøjer til at analysere og håndtere det underliggende fysiske system (eller proces). I erkendelse af kompleksiteten af CPSer, giver denne afhandling bidrag til følgende tre niveauer: (1) niveauet for et (fuldt) PrescriptiveCPS, (2) niveauet for en enkelt PrescriptiveCPS agent, og (3) niveauet for en komponent af et CPS agent software system. På CPS-niveau, omfatter bidragene definitionen af PrescriptiveCPS, i henhold til hvilken det er det system med interagerende fysiske- og IT- (under-) systemer. Her består IT-systemet af hierarkisk organiserede forbundne agenter der sammen styrer instanser af såkaldte fleksibilitet (flexibility), beslutning (decision) og præskriptive (prescription) modeller, som henholdsvis er kortvarige, fokuserer på fremtiden, og repræsenterer en kapacitet, en (brugers) intention, og måder til at ændre adfærd (tilstand) af et fysisk system. På agentniveau omfatter bidragene en tre-lags arkitektur af et agent software system, der integrerer antallet af komponenter, der er specielt konstrueret eller udbygges til at understøtte funktionaliteten af PrescriptiveCPS. Komponentniveauet er hvor afhandlingen har sit hovedbidrag. Bidragene omfatter beskrivelse, design og eksperimentel evaluering af (1) et samlet multi- dimensionelt skema til at opbevare fleksibilitet og præskriptive modeller (og data), (2) teknikker til trinvis aggregering af fleksibilitet modelinstanser og disaggregering af præskriptive modelinstanser (3) et database management system (DBMS) med indbygget optimeringsproblemløsning (optimization problem solving) der gør det muligt at formulere optimeringsproblemer ved hjælp af SQL-lignende forespørgsler og at løse dem "inde i en database", (4) en realtids data management arkitektur til at behandle instanser af fleksibilitet og præskriptive modeller under (bløde eller hårde) tidsbegrænsninger, og (5) en grafisk brugergrænseflade (GUI) til visuelt at analysere fleksibilitet og præskriptive modelinstanser. Derudover diskuterer og eksemplificerer afhandlingen (men giver ingen evalueringer af) (1) domæne-specifikke og in-DBMS generiske prognosemetoder der gør det muligt at forudsige instanser af fleksibilitet modeller baseret på historiske data, og (2) kraftfulde måder at analysere tidligere-, nutids- og fremtidsbaserede såkaldte hypotetiske hvad-hvis scenarier og fleksibilitet og præskriptive modelinstanser gemt i en database. De fleste af bidragene på dette niveau er baseret på et smart-grid brugsscenarie. Sammenfattende giver afhandlingen (1) modellen for et CPS med planlægningsmulighed, (2) design og eksperimentel evaluering af præskriptive analyse teknikker der gør det muligt effektivt at forudsige, aggregere, disaggregere, visualisere og analysere komplekse modeller af den fysiske verden, og (3) brugsscenariet fra energiområdet, der viser, hvordan de indførte begreber kan anvendes i den virkelige verden. Vi mener, at dette bidrag udgør et betydeligt skridt i retning af at udvikle CPSer til planlægningsbrug i fremtiden

    BIG DATA AND ANALYTICS AS A NEW FRONTIER OF ENTERPRISE DATA MANAGEMENT

    Get PDF
    Big Data and Analytics (BDA) promises significant value generation opportunities across industries. Even though companies increase their investments, their BDA initiatives fall short of expectations and they struggle to guarantee a return on investments. In order to create business value from BDA, companies must build and extend their data-related capabilities. While BDA literature has emphasized the capabilities needed to analyze the increasing volumes of data from heterogeneous sources, EDM researchers have suggested organizational capabilities to improve data quality. However, to date, little is known how companies actually orchestrate the allocated resources, especially regarding the quality and use of data to create value from BDA. Considering these gaps, this thesis – through five interrelated essays – investigates how companies adapt their EDM capabilities to create additional business value from BDA. The first essay lays the foundation of the thesis by investigating how companies extend their Business Intelligence and Analytics (BI&A) capabilities to build more comprehensive enterprise analytics platforms. The second and third essays contribute to fundamental reflections on how organizations are changing and designing data governance in the context of BDA. The fourth and fifth essays look at how companies provide high quality data to an increasing number of users with innovative EDM tools, that are, machine learning (ML) and enterprise data catalogs (EDC). The thesis outcomes show that BDA has profound implications on EDM practices. In the past, operational data processing and analytical data processing were two “worlds” that were managed separately from each other. With BDA, these "worlds" are becoming increasingly interdependent and organizations must manage the lifecycles of data and analytics products in close coordination. Also, with BDA, data have become the long-expected, strategically relevant resource. As such data must now be viewed as a distinct value driver separate from IT as it requires specific mechanisms to foster value creation from BDA. BDA thus extends data governance goals: in addition to data quality and regulatory compliance, governance should facilitate data use by broadening data availability and enabling data monetization. Accordingly, companies establish comprehensive data governance designs including structural, procedural, and relational mechanisms to enable a broad network of employees to work with data. Existing EDM practices therefore need to be rethought to meet the emerging BDA requirements. While ML is a promising solution to improve data quality in a scalable and adaptable way, EDCs help companies democratize data to a broader range of employees
    corecore