48 research outputs found

    Semantic data ingestion for intelligent, value-driven big data analytics

    In this position paper we describe a conceptual model for intelligent Big Data analytics based on both semantic and machine learning AI techniques (called AI ensembles). These processes are linked to business outcomes by explicitly modelling data value and using semantic technologies as the underlying medium of communication between the diverse processes and organisations creating AI ensembles. Furthermore, we show how data governance can direct and enhance these ensembles by providing recommendations and insights that ensure the generated output delivers the highest possible value for the organisation.
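
    A minimal sketch of how a data-value annotation could be attached to a dataset with semantic technologies, here using the rdflib library; the ex: vocabulary and property names are illustrative assumptions, not the paper's actual value model:

        # Illustrative sketch: annotate a dataset with a data-value score as RDF so
        # downstream analytics processes can query it. The ex: vocabulary is a
        # hypothetical stand-in, not the model described in the paper.
        from rdflib import Graph, Namespace, Literal
        from rdflib.namespace import RDF, XSD

        EX = Namespace("http://example.org/datavalue#")
        g = Graph()
        g.bind("ex", EX)

        dataset = EX["customer-transactions"]
        g.add((dataset, RDF.type, EX.Dataset))
        g.add((dataset, EX.hasValueDimension, EX.Timeliness))
        g.add((dataset, EX.valueScore, Literal(0.82, datatype=XSD.double)))

        # Any process in the AI ensemble can discover high-value datasets via SPARQL.
        results = g.query("""
            PREFIX ex: <http://example.org/datavalue#>
            SELECT ?d ?s WHERE { ?d ex:valueScore ?s . FILTER(?s > 0.5) }
        """)
        for row in results:
            print(row.d, row.s)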

    Engineering Agile Big-Data Systems

    To be effective, data-intensive systems require extensive ongoing customisation to reflect changing user requirements, organisational policies, and the structure and interpretation of the data they hold. Manual customisation is expensive, time-consuming, and error-prone. In large complex systems, the value of the data can be such that exhaustive testing is necessary before any new feature can be added to the existing design. In most cases, the precise details of requirements, policies and data will change during the lifetime of the system, forcing a choice between expensive modification and continued operation with an inefficient design. Engineering Agile Big-Data Systems outlines an approach to dealing with these problems in software and data engineering, describing a methodology for aligning these processes throughout product lifecycles. It discusses tools which can be used to achieve these goals, and, in a number of case studies, shows how the tools and methodology have been used to improve a variety of academic and business systems.


    Tokenized Ecosystem of Personal Data - Exemplified on the Context of the Smart City

    Data-driven businesses, services, and even the smart cities of tomorrow depend on access to data not only from machines, but also on the personal data of consumers, clients, and citizens. Sustainable utilization of such data must be based on legal compliance, ethical soundness, and consent. Data subjects today largely lack empowerment over the utilization and monetization of their personal data. To change this, we propose a tokenized ecosystem of personal data (TokPD), combining anonymization, referencing, encryption, decentralization, and functional layering to establish a privacy-preserving solution for the processing of personal data. This tokenized ecosystem is a more generalized variant of the smart city ecosystem described in the preceding publication "Smart Cities of Self-Determined Data Subjects" (Frecè & Selzam 2017), with a focus on further options for decentralization. We use the example of a smart city to demonstrate how TokPD ensures the data subjects’ privacy, grants the smart city access to a large number of new data sources, and simultaneously handles user consent to ensure compliance with modern data protection regulation.
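
    A minimal sketch of the kind of token-plus-encryption layering such an ecosystem could use, assuming the third-party cryptography package; the field names, consent check, and overall flow are illustrative assumptions, not the TokPD specification:

        # Illustrative sketch: a personal data record is encrypted, and only a
        # pseudonymous token referencing the ciphertext circulates in the ecosystem.
        # Field names and the consent rule are assumptions, not the TokPD spec.
        import hashlib
        import json
        import secrets
        from cryptography.fernet import Fernet  # third-party: pip install cryptography

        key = Fernet.generate_key()              # held by the data subject / wallet
        cipher = Fernet(key)

        record = {"subject": "citizen-42", "sensor": "home-energy", "kwh": 3.7}
        ciphertext = cipher.encrypt(json.dumps(record).encode())

        # Pseudonymous token: no personal data, just a salted reference to the ciphertext.
        salt = secrets.token_hex(8)
        token = hashlib.sha256(salt.encode() + ciphertext).hexdigest()

        consent = {"token": token, "purpose": "smart-city-traffic-planning", "granted": True}

        def read_record(token_store, token, consent):
            """A data consumer may decrypt only if consent for the purpose was granted."""
            if not consent["granted"]:
                raise PermissionError("no consent for this purpose")
            return json.loads(cipher.decrypt(token_store[token]).decode())

        print(read_record({token: ciphertext}, token, consent))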

    The changing face of floodplains in the Mississippi River Basin detected by a 60-year land use change dataset

    Floodplains provide essential ecosystem functions, yet >80% of European and North American floodplains are substantially modified. Despite floodplain changes over the past century, comprehensive, long-term land use change data within large river basin floodplains are limited. Long-term land use data can be used to quantify floodplain functions and provide spatially explicit information for management, restoration, and flood-risk mitigation. We present a comprehensive dataset quantifying floodplain land use change along the 3.3 million km² Mississippi River Basin (MRB) covering 60 years (1941–2000) at 250-m resolution. We developed four products as part of this work: (i) a Google Earth Engine interactive map visualization interface, (ii) Python code that runs in any internet browser, (iii) an online tutorial with visualizations facilitating classroom code application, and (iv) an instructional video demonstrating code application and database reproduction. Our data show that the MRB’s natural floodplain ecosystems have been substantially altered to agricultural and developed land uses. These products will support MRB resilience and sustainability goals by advancing data-driven decision making on floodplain restoration, buyout, and conservation scenarios.
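
    A minimal sketch of how such a floodplain land use layer might be loaded and displayed with the Earth Engine Python API and geemap; the asset ID, band names, and palette are hypothetical placeholders, not the published dataset's identifiers:

        # Illustrative sketch: display a floodplain land-use raster for two epochs in
        # an interactive map. Asset ID, band names, and the class palette are
        # placeholders, not the identifiers of the MRB dataset described above.
        import ee
        import geemap

        ee.Initialize()

        ASSET_ID = "users/example/mrb_floodplain_landuse"  # placeholder asset path
        landuse = ee.Image(ASSET_ID)

        vis = {"min": 1, "max": 5,
               "palette": ["466b9f", "dec5c5", "ab0000", "b3ac9f", "68ab5f"]}

        m = geemap.Map(center=[35.0, -90.0], zoom=5)       # roughly the lower MRB
        m.addLayer(landuse.select("landuse_1941"), vis, "Land use 1941")
        m.addLayer(landuse.select("landuse_2000"), vis, "Land use 2000")
        m  # renders the interactive map when run in a notebook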

    Einsatz und Bewertung komponentenbasierter Metadaten in einer föderierten Infrastruktur für Sprachressourcen am Beispiel der CMDI

    This thesis examines the use of the Component Metadata Infrastructure (CMDI) within the federated infrastructure CLARIN, identifying a range of concrete problem cases. To develop corresponding solution strategies, various methods are adapted and applied to the quality analysis of metadata and to optimizing their use in a federated environment. Specifically, this concerns above all the adoption of modelling strategies from the Linked Data community, the adoption of principles and quality metrics from object-oriented programming for CMD metadata components, and the use of centrality measures from graph and network analysis to assess the cohesion of the entire metadata federation. The thesis focuses on the analysis of the schemas and schema components in use, as well as on the individual vocabularies employed, in the interplay of all participating centres.
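
    A minimal sketch of how graph centrality measures could be applied to a network of metadata components, using networkx; the component names and reuse edges are invented for illustration, not taken from CLARIN's component registry:

        # Illustrative sketch: model CMD metadata components that reuse one another as
        # a directed graph and rank them by centrality to gauge how strongly the
        # federation hangs together. Names and edges are invented for illustration.
        import networkx as nx

        g = nx.DiGraph()
        # edge (A, B) means component A includes/reuses component B
        g.add_edges_from([
            ("TextCorpusProfile", "GeneralInfo"),
            ("TextCorpusProfile", "Access"),
            ("LexiconProfile", "GeneralInfo"),
            ("LexiconProfile", "Access"),
            ("ToolProfile", "GeneralInfo"),
        ])

        # Components that many profiles depend on score high on in-degree centrality.
        in_centrality = nx.in_degree_centrality(g)
        pagerank = nx.pagerank(g)

        for name in sorted(g.nodes, key=in_centrality.get, reverse=True):
            print(f"{name:18s} in-degree={in_centrality[name]:.2f} pagerank={pagerank[name]:.2f}")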

    Automating Global Geospatial Data Set Analysis : Visualizing flood disasters in the cities of the Global South

    Flooding is the most devastating natural hazard, affecting tens of millions of people yearly and causing billions of US dollars in damages globally. The people most affected by flooding are those with a high level of everyday vulnerability and limited resources for flood protection and recovery. Geospatial data from the Global South is severely lacking, and geospatial proficiency needs to be improved at the local level so that geospatial data and analysis can be efficiently utilized in disaster risk reduction schemes and urban planning in the Global South. This thesis focuses on the use of automated global geospatial dataset analysis in disaster risk reduction in the Global South, using the Python programming language to produce an automated flood analysis and visualization model. In this study, the automated model was developed and tested in two highly relevant cases: the city of Bangkok, Thailand, and the urban area of Tula de Allende, Mexico. The results of the thesis show that, with minimal user interaction, the automated flood model ingests flood extent and depth data produced by ICEYE, a global population estimation raster produced by the German Aerospace Center (DLR), and OpenStreetMap (OSM) data, performs multiple relevant analyses of these data, and produces an interactive map highlighting the severity and effects of a flooding event. The automated flood model performs consistently and accurately while producing key statistics and standardized visualizations of flooding events, which offers first responders a very fast first estimate of the scale of a flooding event and helps plan an appropriate response anywhere around the globe.

    Global geospatial data sets are often created to examine large-scale geographical phenomena; however, the results of this thesis show that they can also be used to analyze detailed local-level phenomena when paired with supporting data. The advantage of using global geospatial data sets is that, when sufficiently accurate and precise, they remove the most time-consuming part of geospatial analysis: finding suitable data. Fast reaction is of utmost importance in the first hours of a natural hazard like flooding; thus, automated analysis produced on a global scale could significantly help international humanitarian aid and first responders. Using an automated model also standardizes the results, removing human error and interpretation and enabling the accurate comparison of historical flood data in due time.
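
    A minimal sketch of the kind of exposure calculation such a model could perform, assuming the flood-depth raster and the population raster have already been resampled onto the same grid; the file names, band layout, and depth threshold are illustrative assumptions:

        # Illustrative sketch: estimate the population exposed to flooding by
        # overlaying a flood-depth raster with a population raster on the same grid.
        # File names and the 0.1 m depth threshold are assumptions for illustration.
        import numpy as np
        import rasterio

        with rasterio.open("flood_depth_m.tif") as src:        # e.g. derived from ICEYE data
            depth = src.read(1).astype(float)
            if src.nodata is not None:
                depth[depth == src.nodata] = 0.0

        with rasterio.open("population_per_cell.tif") as src:  # e.g. DLR population estimate
            population = src.read(1).astype(float)
            if src.nodata is not None:
                population[population == src.nodata] = 0.0

        flooded = depth > 0.1                       # cells with more than 10 cm of water
        exposed_people = population[flooded].sum()
        mean_depth = depth[flooded].mean() if flooded.any() else 0.0

        print(f"Estimated exposed population: {exposed_people:,.0f}")
        print(f"Mean water depth in flooded cells: {mean_depth:.2f} m")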

    RESEARCH DATA MANAGEMENT AND A SYSTEM DESIGN TO SEMI-AUTOMATICALLY COMPLETE INTEGRATED DATA MANAGEMENT PLANS [POSITION PAPER]

    Data is an integral part of modern scientific work. Good research data management (RDM) and the communication of the related information is an extremely important matter: it is crucial not only for the ongoing research and its claims but also for future uses of the data. In recent years, guiding principles such as the FAIR principles, and initiatives at the national and international level such as NFDI and NFDI4Ing, have been established to improve RDM. Data and their metadata are often handled in file-system-like structures which are versioned and logged. The information relating to the handling of the data is documented in a data management plan (DMP). DMPs are also usually managed in similar file structures. They are made available in editable document formats as well as online free-text forms, which users are required to keep updating manually. These are isolated documents which have no direct relation to the data for verification and are not understood in a common, consistent way. In this paper, research data management for large-scale interdisciplinary projects is presented. On the one hand, it introduces contemporary practices of RDM; on the other hand, it helps researchers determine the features of an RDM system when they need to select or develop a system for this purpose. It further introduces a system design for the semi-automatic completion of DMP functions in a collaborative environment, i.e. a virtual research environment (VRE). It is assumed that the proposed system will assist and enable users to semi-automatically update an integrated DMP during all phases of the data life cycle. A direct relation to the data for verification, as well as a common and consistent understanding, will also be maintainable.
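
    A minimal sketch of how a DMP section could be kept in sync with the datasets registered in a VRE; the field names and the update rule are illustrative assumptions, not the proposed system's actual design:

        # Illustrative sketch: a DMP dataset inventory is regenerated from the datasets
        # currently registered in a virtual research environment, so the plan stays
        # consistent with the data it describes. Field names are assumptions.
        from dataclasses import dataclass, field
        from datetime import date

        @dataclass
        class DatasetRecord:
            name: str
            format: str
            size_mb: float
            storage_location: str
            license: str = "CC-BY-4.0"

        @dataclass
        class DataManagementPlan:
            project: str
            last_updated: date = field(default_factory=date.today)
            datasets: list[DatasetRecord] = field(default_factory=list)

            def refresh_from_vre(self, registered_datasets: list[DatasetRecord]) -> None:
                """Semi-automatic update: overwrite the dataset inventory, stamp the date."""
                self.datasets = list(registered_datasets)
                self.last_updated = date.today()

        dmp = DataManagementPlan(project="Example interdisciplinary project")
        dmp.refresh_from_vre([
            DatasetRecord("sensor_timeseries", "CSV", 120.5, "vre://storage/raw/"),
            DatasetRecord("simulation_output", "NetCDF", 2048.0, "vre://storage/model/"),
        ])
        print(f"{dmp.project}: {len(dmp.datasets)} datasets, updated {dmp.last_updated}")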

    Design and Implementation of a Research Data Management System: The CRC/TR32 Project Database (TR32DB)

    Research data management (RDM) includes all processes and measures which ensure that research data are well organised, documented, preserved, stored, backed up, accessible, available, and re-usable. Corresponding RDM systems or repositories form the technical framework supporting the collection, accurate documentation, storage, back-up, sharing, and provision of research data created in a specific environment, such as a research group or institution. The measures required for the implementation of an RDM system vary according to the discipline or the purpose of data (re-)use. In the context of RDM, the documentation of research data is an essential duty. It has to be carried out with accurate, standardized, and interoperable metadata to ensure the interpretability, understandability, shareability, and long-lasting usability of the data. RDM is gaining importance as the amount of digital information grows. New technologies enable more digital data to be created, often automatically; consequently, the volume of digital data, including big data and small data, is expected to roughly double in size every two years. With regard to e-science, this increase has been described as the data deluge, and the paradigm change in science has led to data-intensive science. Scientific data financed by public funding in particular are increasingly required to be archived, documented, provided, or even made openly accessible by policy makers, funding agencies, journals, and other institutions. RDM can prevent the loss of data; otherwise around 80-90% of generated research data disappear and are not available for re-use or further studies, leading to empty archives or RDM systems. The reasons for this are well known and are of a technical, socio-cultural, and ethical nature, such as missing user participation and data-sharing knowledge, as well as a lack of time or resources. In addition, the fear of exploitation and the missing or limited reward for publishing and sharing data play an important role.

    This thesis presents an approach to handling the research data of the collaborative, multidisciplinary, long-term DFG-funded research project Collaborative Research Centre/Transregio 32 (CRC/TR32) “Patterns in Soil-Vegetation-Atmosphere Systems: Monitoring, Modelling, and Data Assimilation”. In this context, an RDM system, the so-called CRC/TR32 project database (TR32DB), was designed and implemented. The TR32DB considers the demands of the project participants (e.g. heterogeneous data from different disciplines with various file sizes) and the requirements of the DFG, as well as general challenges in RDM. For this purpose, an RDM system was established that comprises a well-described, self-designed metadata schema, file-based data storage, a well-elaborated metadata database, and a corresponding user-friendly web interface. The whole system was developed in close cooperation with the Regional Computing Centre of the University of Cologne (RRZK), where it is also hosted.

    The documentation of the research data with accurate metadata is of key importance. For this purpose, a specific TR32DB Metadata Schema was designed, consisting of multi-level metadata properties. It distinguishes between general and data-type-specific (e.g. data, publication, report) properties and was developed according to the project background, the demands of the various data types, and recent associated metadata standards and principles. Consequently, it is interoperable with recent metadata standards such as Dublin Core and the DataCite Metadata Schema, as well as core elements of the ISO 19115:2003 Metadata Standard and the INSPIRE Directive. Furthermore, the schema supports optional, mandatory, and automatically generated metadata properties and provides predefined, obligatory, and self-established controlled vocabulary lists. The integrated mapping to the DataCite Metadata Schema facilitates the straightforward application of a Digital Object Identifier (DOI) for a dataset. The file-based data storage is organized in a folder system corresponding to the structure of the CRC/TR32 and additionally distinguishes between several data types (e.g. data, publication, report). It is embedded in the Andrew File System hosted by the RRZK. The file system is capable of storing and backing up all data, is highly scalable, supports location independence, and enables easy administration via Access Control Lists. In addition, the relational database management system MySQL stores the metadata according to the aforementioned TR32DB Metadata Schema, as well as further necessary administrative data. A user-friendly web-based graphical user interface provides access to the TR32DB system. The web interface offers metadata input, search, and download of data, while the visualization of important geodata is handled by an internal WebGIS. This web interface, like the entire RDM system, is self-developed and adjusted to the specific demands.

    Overall, the TR32DB system was developed according to the needs and requirements of the CRC/TR32 scientists, fits the demands of the DFG, and also addresses general problems and challenges of RDM. With regard to the changing demands of the CRC/TR32 and technological advances, the system is being and will continue to be further developed. The established TR32DB approach has already been applied successfully to another interdisciplinary research project. It is therefore transferable and generally capable of archiving all data generated by the CRC/TR32 with accurate, interoperable metadata to ensure the re-use of the data beyond the end of the project.
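
    A minimal sketch of the kind of mapping from an internal metadata record to the DataCite Metadata Schema that makes DOI registration straightforward; the internal field names are illustrative assumptions, and only a handful of the mandatory DataCite properties are shown:

        # Illustrative sketch: translate an internal (TR32DB-style) metadata record into
        # mandatory properties of the DataCite Metadata Schema so a DOI can be requested
        # for the dataset. Internal field names are assumptions for illustration.
        internal_record = {
            "title": "Soil moisture time series, example test site",
            "creators": ["Doe, Jane", "Roe, Richard"],
            "year": 2013,
            "data_type": "data",          # internal type: data, publication, report, ...
            "doi": "10.1234/example-doi", # placeholder identifier
        }

        def to_datacite(record: dict) -> dict:
            """Map an internal record to a DataCite-style metadata dictionary."""
            return {
                "identifier": {"identifier": record["doi"], "identifierType": "DOI"},
                "creators": [{"creatorName": name} for name in record["creators"]],
                "titles": [{"title": record["title"]}],
                "publisher": "CRC/TR32 Project Database (TR32DB)",
                "publicationYear": str(record["year"]),
                "resourceType": {"resourceTypeGeneral": "Dataset",
                                 "resourceType": record["data_type"]},
            }

        print(to_datacite(internal_record))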