25 research outputs found

    Verification and Validation of Semantic Annotations

    In this paper, we propose a framework to perform verification and validation of semantically annotated data. The annotations, extracted from websites, are verified against the schema.org vocabulary and Domain Specifications to ensure the syntactic correctness and completeness of the annotations. The Domain Specifications allow checking the compliance of annotations against corresponding domain-specific constraints. The validation mechanism detects errors and inconsistencies between the content of the analyzed schema.org annotations and the content of the web pages where the annotations were found. (Comment: Accepted for the A.P. Ershov Informatics Conference 2019 (the PSI Conference Series, 12th edition) proceedings.)
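    To make the idea concrete, here is a minimal Python sketch of the kind of check a Domain Specification enables: an annotation extracted from a web page is tested for its declared type and required schema.org properties. The specification format, property list and checker below are invented for illustration; the paper's actual Domain Specifications are defined separately and are considerably richer.

        import json

        # Hypothetical, minimal domain specification for a lodging business:
        # required schema.org properties and the expected value types.
        DOMAIN_SPEC = {
            "@type": "Hotel",
            "required": {"name": str, "address": dict, "telephone": str},
        }

        def verify(annotation: dict, spec: dict) -> list:
            """Return a list of human-readable verification errors."""
            errors = []
            if annotation.get("@type") != spec["@type"]:
                errors.append(f"expected @type {spec['@type']!r}, got {annotation.get('@type')!r}")
            for prop, expected in spec["required"].items():
                value = annotation.get(prop)
                if value is None:
                    errors.append(f"missing required property {prop!r}")
                elif not isinstance(value, expected):
                    errors.append(f"property {prop!r} should be of type {expected.__name__}")
            return errors

        annotation = json.loads("""
        {
          "@context": "https://schema.org",
          "@type": "Hotel",
          "name": "Alpenhof",
          "telephone": "+43 512 000000"
        }
        """)
        print(verify(annotation, DOMAIN_SPEC))  # -> ["missing required property 'address'"]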

    Approaching location-based services from a place-based perspective: from data to services?

    Despite the seemingly obvious importance of a link between notions of place and the provision of context in location-based services (LBS), truly place-based LBS remain rare. Place is attractive as a concept for designing services as it focuses on ways in which people, rather than machines, represent and talk about places. We review papers which have extracted place-relevant information from a variety of sources, examining their rationales, the data sources used, the characteristics of the data under study and the ways in which place is represented. Although the data sources used are subject to a wide range of biases, we find that existing methods and data sources are capable of extracting a wide range of place-related information. We suggest categories of LBS which could profit from such information, for example, by using place-related natural language (e.g. vernacular placenames) in tracking and routing services and moving the focus from geometry to place semantics in location-based retrieval. A key future challenge will be to integrate data derived from multiple sources if we are to advance from individual case studies focusing on a single aspect of place to services which can deal with multiple aspects of place.

    An Application Programming Interface (API) framework for digital government

    The digital transformation of society obliges governments to evolve towards increasingly complex digital environments. These environments require strong coordination efforts to ensure a synergistic integration of different systems and actors. Application programming interfaces (APIs) are the connective nodes of digital components and thus instrumental enablers of this integration. Yet today, the integration of digital components is often done on an ‘ad hoc’ basis, disregarding the potential value-added for the whole digital environment. This work proposes a framework for a cohesive API adoption in government environments. The framework is distilled from the analysis of an extensive literature review conducted on government API adoption practices to date. The output of this analysis identifies actions to be taken by governments to improve their API infrastructure and related organisational processes. The framework offers 12 ‘proposals’ arranged around four organisational pillars, namely, policy support, platforms and ecosystems, people, and processes. Actions are then organised into the three levels of organisational management, i.e. strategic, tactical and operational. Motivations, implementation details and a self-assessment checklist are provided for each of the proposals. Given that the maturity of digital government structures is uneven, the framework has been designed to be flexible enough to help governments identify the specific actions they need to focus on. The work outlines the basis of an API maturity assessment tool.
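    As an illustration of the framework's shape (not its actual content), the Python sketch below encodes proposals arranged by pillar and management level, each with a self-assessment checklist. The proposal titles and checklist items are invented placeholders, since the report's twelve proposals are only summarised in this abstract.

        from dataclasses import dataclass
        from enum import Enum

        class Pillar(Enum):
            POLICY_SUPPORT = "policy support"
            PLATFORMS_AND_ECOSYSTEMS = "platforms and ecosystems"
            PEOPLE = "people"
            PROCESSES = "processes"

        class Level(Enum):
            STRATEGIC = "strategic"
            TACTICAL = "tactical"
            OPERATIONAL = "operational"

        @dataclass
        class Proposal:
            title: str            # hypothetical wording, not the report's actual text
            pillar: Pillar
            level: Level
            checklist: list[str]  # self-assessment items

        proposals = [
            Proposal("Adopt a government-wide API strategy", Pillar.POLICY_SUPPORT,
                     Level.STRATEGIC, ["Is an API strategy document published?"]),
            Proposal("Operate a central API catalogue", Pillar.PLATFORMS_AND_ECOSYSTEMS,
                     Level.OPERATIONAL, ["Are all public APIs discoverable in one place?"]),
        ]

        # Self-assessment pass: list proposals whose checklist is not yet satisfied.
        def open_actions(answers: dict[str, bool]) -> list[str]:
            return [p.title for p in proposals
                    if not all(answers.get(item, False) for item in p.checklist)]

        print(open_actions({"Is an API strategy document published?": True}))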

    Proposing a Methodology for Designing an Enterprise Knowledge Graph to Ensure Interoperability Between Heterogeneous Data Sources

    The main research goal of this thesis is to propose an approach for designing an Enterprise Knowledge Graph (EKG) to ensure interoperability between different heterogeneous sources, taking into account the existing efforts and automation processes developed by ENGIE in their attempt to build a domain-based EKG. Reaching this goal demands a deep understanding of the existing state of the art on EKG approaches and their technologies, with a focus on data transformation and query methods, and finally a comparative presentation of any new findings in this challenge of defining an end-to-end formula for EKG construction. The criteria for evaluating the different works were chosen to cover the following questions: (i) What are the implications and practical expectations of different design strategies to realize an EKG? (ii) How do those strategies affect semantic complexity and decrease or increase performance? (iii) Is it possible to maintain low latency and permanent updates? Furthermore, our work was limited to one use case defined by ENGIE, exploring open accident data and road-map data as a starting point for EKG construction. We experiment with transforming data from heterogeneous sources into a final unified RDF datastore ready to be used as the foundation of an EKG. We then present the technical challenges, the vocabulary and the methods used to solve the EKG definition problem. Finally, a side goal of the thesis is to practically test and compare technical methods for data integration, enrichment and transformation. Most importantly, we test the ability to query geospatial information, which is a key element for this domain-based EKG. In this work, we present the different implementations of RDF stores that support a geographic query language for RDF data (GeoSPARQL), an OGC standard for the geographic and semantic representation of geographical data. Furthermore, we formed our test data as a subset of ENGIE's big data and benchmarked it across the knowledge transformation and linkage phases, using state-of-the-art Semantic Web tools.
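    For a flavour of the geospatial querying such a thesis benchmarks, below is a hedged sketch of a GeoSPARQL query issued from Python via the SPARQLWrapper client. The endpoint URL, the ex: vocabulary, the class and property names, and the road name "A7" are all hypothetical; running it requires a GeoSPARQL-enabled RDF store (e.g., GraphDB or Apache Jena Fuseki with the GeoSPARQL extension).

        from SPARQLWrapper import SPARQLWrapper, JSON

        # Hypothetical endpoint: the thesis's actual graph is not public.
        ENDPOINT = "http://localhost:7200/repositories/ekg"

        QUERY = """
        PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
        PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
        PREFIX uom:  <http://www.opengis.net/def/uom/OGC/1.0/>
        PREFIX ex:   <http://example.org/ekg/>

        # Accidents within 500 m of road segments named 'A7'.
        SELECT ?accident ?road
        WHERE {
          ?accident a ex:Accident ;
                    geo:hasGeometry/geo:asWKT ?aWkt .
          ?road     a ex:RoadSegment ;
                    ex:name "A7" ;
                    geo:hasGeometry/geo:asWKT ?rWkt .
          FILTER (geof:distance(?aWkt, ?rWkt, uom:metre) < 500)
        }
        """

        sparql = SPARQLWrapper(ENDPOINT)
        sparql.setQuery(QUERY)
        sparql.setReturnFormat(JSON)
        for row in sparql.query().convert()["results"]["bindings"]:
            print(row["accident"]["value"], row["road"]["value"])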

    Enriching and validating geographic information on the web

    The continuous growth of available data on the World Wide Web has led to an unprecedented amount of available information. However, the enormous variance in data quality and trustworthiness of information sources impairs the great potential of this vast amount of information. This observation especially applies to geographic information on the Web, i.e., information describing entities that are located on the Earth’s surface. With the advent of mobile devices, the impact of geographic Web information on our everyday life has substantially grown. The mobile devices have also enabled the creation of novel data sources such as OpenStreetMap (OSM), a collaborative crowd-sourced map providing open cartographic information. Today, we use geographic information in many applications, including routing, location recommendation, or geographic question answering. The processing of geographic Web information yields unique challenges. First, the descriptions of geographic entities on the Web are typically not validated. Since not all Web information sources are trustworthy, the correctness of some geographic Web entities is questionable. Second, geographic information sources on the Web are typically isolated from each other. The missing integration of information sources hinders the efficient use of geographic Web information for many applications. Third, the description of geographic entities is typically incomplete. Depending on the application, missing information is a decisive criterion for (not) using a particular data source. Due to the large scale of the Web, the manual correction of these problems is usually not feasible such that automated approaches are required. In this thesis, we tackle these challenges from three different angles. (i) Validation of geographic Web information: We validate geographic Web information by detecting vandalism in OpenStreetMap, for instance, the replacement of a street name with advertisement. To this end, we present the OVID model for automated vandalism detection in OpenStreetMap. (ii) Enrichment of geographic Web information through integration: We integrate OpenStreetMap with other geographic Web information sources, namely knowledge graphs, by identifying entries corresponding to the same real-world entities in both data sources. We present the OSM2KG model for automated identity link discovery between OSM and knowledge graphs. (iii) Enrichment of missing information in geographic Web information: We consider semantic annotations of geographic entities on Web pages as an additional data source. We exploit existing annotations of categorical properties of Web entities as training data to enrich missing categorical properties in geographic Web information. For all of the proposed models, we conduct extensive evaluations on real-world datasets. Our experimental results confirm that the proposed solutions reliably outperform existing baselines. Furthermore, we demonstrate the utility of geographic Web information in two application scenarios. (i) Corpus of geographic entity embeddings: We introduce the GeoVectors corpus, a linked open dataset of ready-to-use embeddings of geographic entities. With GeoVectors, we substantially lower the burden to use geographic data in machine learning applications. (ii) Application to event impact prediction: We employ several geographic Web information sources to predict the impact of public events on road traffic.
To this end, we use cartographic, event, and event venue information from the Web.
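    The OSM2KG model itself learns latent representations of entities; as a simplified stand-in, the sketch below illustrates identity link discovery with a naive heuristic: link an OSM node to a knowledge-graph candidate when the labels match exactly and the coordinates lie within a distance threshold. All IRIs and records are toy data, not the thesis's evaluation data.

        import math

        # Toy records: an OSM node and knowledge-graph candidates (hypothetical data).
        osm_node = {"name": "Hannover Hauptbahnhof", "lat": 52.3765, "lon": 9.7410}
        kg_candidates = [
            {"iri": "http://example.org/kg/Q678", "label": "Hannover Hauptbahnhof",
             "lat": 52.3767, "lon": 9.7415},
            {"iri": "http://example.org/kg/Q999", "label": "Hannover Zoo",
             "lat": 52.3850, "lon": 9.7690},
        ]

        def haversine_m(lat1, lon1, lat2, lon2):
            """Great-circle distance in metres."""
            r = 6371000.0
            p1, p2 = math.radians(lat1), math.radians(lat2)
            dp, dl = p2 - p1, math.radians(lon2 - lon1)
            a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
            return 2 * r * math.asin(math.sqrt(a))

        def link(node, candidates, max_dist_m=250.0):
            """Return the IRI of the closest candidate with an exact label match."""
            best = None
            for c in candidates:
                d = haversine_m(node["lat"], node["lon"], c["lat"], c["lon"])
                if c["label"] == node["name"] and d <= max_dist_m:
                    if best is None or d < best[0]:
                        best = (d, c["iri"])
            return best[1] if best else None

        print(link(osm_node, kg_candidates))  # -> http://example.org/kg/Q678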

    Technologies for a FAIRer use of Ocean Best Practices

    The publication and dissemination of best practices in ocean observing is pivotal for multiple aspects of modern marine science, including cross-disciplinary interoperability, improved reproducibility of observations and analyses, and training of new practitioners. Often, best practices are not published in a scientific journal and may not even be formally documented, residing solely within the minds of individuals who pass the information along through direct instruction. Naturally, documenting best practices is essential to accelerate high-quality marine science; however, documentation in a drawer has little impact. To enhance the application and development of best practices, we must leverage contemporary document handling technologies to make best practices discoverable, accessible, and interlinked, echoing the logic of the FAIR data principles [1].

    Development and Evaluation of a Holistic, Cloud-driven and Microservices-based Architecture for Automated Semantic Annotation of Web Documents

    The Semantic Web is based on the concept of representing information on the web such that computers can both understand and process it. This implies defining context for web information to give it a well-defined meaning. Semantic annotation is the process of adding annotation data to web information to provide that context. However, despite several solutions and techniques for semantic annotation, the field still faces challenges that have hindered the growth of the Semantic Web. Despite recent significant technological innovations such as cloud computing, the Internet of Things and mobile computing, and their various integrations with semantic technologies to proffer IT solutions, little has been done towards leveraging these technologies to address semantic annotation challenges. Hence, this research investigates leveraging the cloud computing paradigm to address some semantic annotation challenges, with a focus on an automated system for providing semantic annotation as a service. Firstly, considering the disparate nature of most current semantic annotation solutions, a holistic perspective on semantic annotation is proposed based on a set of requirements. Then, a capability assessment of the feasibility of leveraging cloud computing is conducted, producing a Cloud Computing Capability Model for Holistic Semantic Annotation. Furthermore, an investigation into application deployment patterns in the cloud and how they relate to holistic semantic annotation was conducted. A set of determinant factors that define different patterns for application deployment in the cloud was identified, and this resulted in the development of a Cloud Computing Maturity Model and the conceptualisation of a “Cloud-Driven” development methodology for holistic semantic annotation in the cloud. Key components of the “Cloud-Driven” concept include Microservices, Operating System-Level Virtualisation and Orchestration. Given the role microservices software architectural patterns play in developing solutions that can fully maximise cloud computing benefits, CloudSea, a holistic, cloud-driven and microservices-based architecture for automated semantic annotation of web documents, is proposed as a novel approach to semantic annotation. The architecture draws on the theory of “Design Patterns” in software engineering; its design and development subsequently resulted in twelve Design Patterns and a Pattern Language for Holistic Semantic Annotation, based on the CloudSea architectural design. As proof of concept, a prototype implementation of CloudSea was developed and deployed in the cloud based on the “Cloud-Driven” methodology, and a functionality evaluation was carried out on it. A comparative evaluation of the CloudSea architecture was also conducted in relation to current semantic annotation solutions, both proposed in academic literature and existing as industry solutions. In addition, to evaluate the proposed Cloud Computing Maturity Model for Holistic Semantic Annotation, an experimental evaluation was conducted by developing six instances of the prototype and deploying them differently, based on the patterns described in the model. This empirical investigation tested the instances for performance through a series of API load tests, and the results confirmed the validity of both the “Cloud-Driven” methodology and the entire model.
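    The CloudSea prototype itself is not reproduced here; the following toy sketch only illustrates the general idea of exposing semantic annotation as an HTTP service, using nothing but the Python standard library. The entity list, port and response format are invented, and CloudSea decomposes this into multiple orchestrated microservices rather than a single handler.

        import json
        from http.server import BaseHTTPRequestHandler, HTTPServer

        # Toy gazetteer standing in for a real annotation backend.
        KNOWN_ENTITIES = {
            "Berlin": "https://schema.org/City",
            "Rhine": "https://schema.org/RiverBodyOfWater",
        }

        class AnnotateHandler(BaseHTTPRequestHandler):
            def do_POST(self):
                length = int(self.headers.get("Content-Length", 0))
                text = self.rfile.read(length).decode("utf-8")
                # Naive annotation: mark every known entity mentioned in the text.
                annotations = [{"mention": name, "type": typ}
                               for name, typ in KNOWN_ENTITIES.items() if name in text]
                body = json.dumps({"text": text, "annotations": annotations}).encode("utf-8")
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.end_headers()
                self.wfile.write(body)

        if __name__ == "__main__":
            HTTPServer(("localhost", 8080), AnnotateHandler).serve_forever()

    A quick smoke test, e.g. curl --data "Berlin lies far from the Rhine" http://localhost:8080/, returns a JSON document annotating both mentions.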

    Representação aberta e semântica de anotações de incidentes em mapas web (Open and semantic representation of incident annotations on web maps)

    Master's dissertation, Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação, Florianópolis, 2013. Abstract: There is an increasing number of initiatives using Web-based mapping systems that rely on crowdsourcing as a collaborative problem-solving and data-production model. In these initiatives, large groups of users can collaboratively annotate spatial things on a map, such as places or incidents related to security, health and transportation. Ideally, these crowdsourcing initiatives should produce Linked Open Data (LOD) to enable people or systems to share the annotations and the generated knowledge as structured data and, consequently, improve the mapping and resolution of incidents. This work presents an approach for producing open and semantic annotations of incidents on maps. In this approach, annotations are represented using the W3C Open Annotation data model and adopt geospatial coordinates, referenced via the geo URI scheme, as the annotation target. Moreover, the approach combines Semantic Web technologies with crowdsourced map annotations in a way that enriches maps with semantic information. To demonstrate the feasibility of the approach, two prototypes are developed, allowing the open and semantic representation of annotations associated with geospatial coordinates.
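    For concreteness, here is a minimal sketch of such an annotation, built in Python against the W3C Web Annotation JSON-LD context (the standardised successor of Open Annotation), with a geo URI (RFC 5870) as target. The body text and coordinates are illustrative placeholders, not data from the dissertation.

        import json

        # A minimal incident annotation following the W3C (Open/Web) Annotation
        # model, targeting a point on the Earth's surface via a geo URI.
        annotation = {
            "@context": "http://www.w3.org/ns/anno.jsonld",
            "type": "Annotation",
            "motivation": "commenting",
            "body": {
                "type": "TextualBody",
                "value": "Flooded street blocking traffic",  # placeholder incident
                "format": "text/plain",
            },
            # geo URI (RFC 5870): latitude,longitude of the annotated incident.
            "target": "geo:-27.5954,-48.5480",
        }

        print(json.dumps(annotation, indent=2))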