47 research outputs found

    Scalable Data Integration for Linked Data

    Get PDF
    Linked Data describes an extensive set of structured but heterogeneous datasources where entities are connected by formal semantic descriptions. In thevision of the Semantic Web, these semantic links are extended towards theWorld Wide Web to provide as much machine-readable data as possible forsearch queries. The resulting connections allow an automatic evaluation to findnew insights into the data. Identifying these semantic connections betweentwo data sources with automatic approaches is called link discovery. We derivecommon requirements and a generic link discovery workflow based on similaritiesbetween entity properties and associated properties of ontology concepts. Mostof the existing link discovery approaches disregard the fact that in times ofBig Data, an increasing volume of data sources poses new demands on linkdiscovery. In particular, the problem of complex and time-consuming linkdetermination escalates with an increasing number of intersecting data sources.To overcome the restriction of pairwise linking of entities, holistic clusteringapproaches are needed to link equivalent entities of multiple data sources toconstruct integrated knowledge bases. In this context, the focus on efficiencyand scalability is essential. For example, reusing existing links or backgroundinformation can help to avoid redundant calculations. However, when dealingwith multiple data sources, additional data quality problems must also be dealtwith. This dissertation addresses these comprehensive challenges by designingholistic linking and clustering approaches that enable reuse of existing links.Unlike previous systems, we execute the complete data integration workflowvia a distributed processing system. At first, the LinkLion portal will beintroduced to provide existing links for new applications. These links act asa basis for a physical data integration process to create a unified representationfor equivalent entities from many data sources. We then propose a holisticclustering approach to form consolidated clusters for same real-world entitiesfrom many different sources. At the same time, we exploit the semantic typeof entities to improve the quality of the result. The process identifies errorsin existing links and can find numerous additional links. Additionally, theentity clustering has to react to the high dynamics of the data. In particular,this requires scalable approaches for continuously growing data sources withmany entities as well as additional new sources. Previous entity clusteringapproaches are mostly static, focusing on the one-time linking and clustering ofentities from few sources. Therefore, we propose and evaluate new approaches for incremental entity clustering that supports the continuous addition of newentities and data sources. To cope with the ever-increasing number of LinkedData sources, efficient and scalable methods based on distributed processingsystems are required. Thus we propose distributed holistic approaches to linkmany data sources based on a clustering of entities that represent the samereal-world object. The implementation is realized on Apache Flink. In contrastto previous approaches, we utilize efficiency-enhancing optimizations for bothdistributed static and dynamic clustering. An extensive comparative evaluationof the proposed approaches with various distributed clustering strategies showshigh effectiveness for datasets from multiple domains as well as scalability on amulti-machine Apache Flink cluster

    Deliverable D9.3 Final Project Report

    Get PDF
    This document comprises the final report of LinkedTV. It includes a publishable summary, a plan for use and dissemination of foreground and a report covering the wider societal implications of the project in the form of a questionnaire

    Linked Data Entity Summarization

    Get PDF
    On the Web, the amount of structured and Linked Data about entities is constantly growing. Descriptions of single entities often include thousands of statements and it becomes difficult to comprehend the data, unless a selection of the most relevant facts is provided. This doctoral thesis addresses the problem of Linked Data entity summarization. The contributions involve two entity summarization approaches, a common API for entity summarization, and an approach for entity data fusion

    Distributed Semantic Social Networks: Architecture, Protocols and Applications

    Get PDF
    Online social networking has become one of the most popular services on the Web. Especially Facebook with its 845Mio+ monthly active users and 100Mrd+ friendship relations creates a Web inside the Web. Drawing on the metaphor of islands, Facebook is becoming more like a continent. However, users are locked up on this continent with hardly any opportunity to communicate easily with users on other islands and continents or even to relocate trans-continentally. In addition to that, privacy, data ownership and freedom of communication issues are problematically in centralized environments. The idea of distributed social networking enables users to overcome the drawbacks of centralized social networks. The goal of this thesis is to provide an architecture for distributed social networking based on semantic technologies. This architecture consists of semantic artifacts, protocols and services which enable social network applications to work in a distributed environment and with semantic interoperability. Furthermore, this thesis presents applications for distributed semantic social networking and discusses user interfaces, architecture and communication strategies for this application category.Soziale Netzwerke gehören zu den beliebtesten Online Diensten im World Wide Web. Insbesondere Facebook mit seinen mehr als 845 Mio. aktiven Nutzern im Monat und mehr als 100 Mrd. Nutzer- Beziehungen erzeugt ein eigenständiges Web im Web. Den Nutzern dieser Sozialen Netzwerke ist es jedoch schwer möglich mit Nutzern in anderen Sozialen Netzwerken zu kommunizieren oder aber mit ihren Daten in ein anderes Netzwerk zu ziehen. Zusätzlich dazu werden u.a. Privatsphäre, Eigentumsrechte an den eigenen Daten und uneingeschränkte Freiheit in der Kommunikation als problematisch empfunden. Die Idee verteilter Soziale Netzwerke ermöglicht es, diese Probleme zentralisierter Sozialer Netzwerke zu überwinden. Das Ziel dieser Arbeit ist die Darstellung einer Architektur verteilter Soziale Netzwerke welche auf semantischen Technologien basiert. Diese Architektur besteht aus semantischen Artefakten, Protokollen und Diensten und ermöglicht die Kommunikation von Sozialen Anwendungen in einer verteilten Infrastruktur. Darüber hinaus präsentiert diese Arbeit mehrere Applikationen für verteilte semantische Soziale Netzwerke und diskutiert deren Nutzer-Schnittstellen, Architektur und Kommunikationsstrategien. 

    What are Links in Linked Open Data? A Characterization and Evaluation of Links between Knowledge Graphs on the Web

    Get PDF
    Linked Open Data promises to provide guiding principles to publish interlinked knowledge graphs on the Web in the form of findable, accessible, interoperable and reusable datasets. We argue that while as such, Linked Data may be viewed as a basis for instantiating the FAIR principles, there are still a number of open issues that cause significant data quality issues even when knowledge graphs are published as Linked Data. Firstly, in order to define boundaries of single coherent knowledge graphs within Linked Data, a principled notion of what a dataset is, or, respectively, what links within and between datasets are, has been missing. Secondly, we argue that in order to enable FAIR knowledge graphs, Linked Data misses standardised findability and accessability mechanism, via a single entry link. In order to address the first issue, we (i) propose a rigorous definition of a naming authority for a Linked Data dataset (ii) define different link types for data in Linked datasets, (iii) provide an empirical analysis of linkage among the datasets of the Linked Open Data cloud, and (iv) analyse the dereferenceability of those links. We base our analyses and link computations on a scalable mechanism implemented on top of the HDT format, which allows us to analyse quantity and quality of different link types at scale.Series: Working Papers on Information Systems, Information Business and Operation

    Knowledge-Driven Harmonization of Sensor Observations: Exploiting Linked Open Data for IoT Data Streams

    Get PDF
    The rise of the Internet of Things leads to an unprecedented number of continuous sensor observations that are available as IoT data streams. Harmonization of such observations is a labor-intensive task due to heterogeneity in format, syntax, and semantics. We aim to reduce the effort for such harmonization tasks by employing a knowledge-driven approach. To this end, we pursue the idea of exploiting the large body of formalized public knowledge represented as statements in Linked Open Data

    A survey of large-scale reasoning on the Web of data

    Get PDF
    As more and more data is being generated by sensor networks, social media and organizations, the Webinterlinking this wealth of information becomes more complex. This is particularly true for the so-calledWeb of Data, in which data is semantically enriched and interlinked using ontologies. In this large anduncoordinated environment, reasoning can be used to check the consistency of the data and of asso-ciated ontologies, or to infer logical consequences which, in turn, can be used to obtain new insightsfrom the data. However, reasoning approaches need to be scalable in order to enable reasoning over theentire Web of Data. To address this problem, several high-performance reasoning systems, whichmainly implement distributed or parallel algorithms, have been proposed in the last few years. Thesesystems differ significantly; for instance in terms of reasoning expressivity, computational propertiessuch as completeness, or reasoning objectives. In order to provide afirst complete overview of thefield,this paper reports a systematic review of such scalable reasoning approaches over various ontologicallanguages, reporting details about the methods and over the conducted experiments. We highlight theshortcomings of these approaches and discuss some of the open problems related to performing scalablereasoning

    Efficient Management for Geospatial and Temporal Data using Ontology-based Data Access Techniques

    Get PDF
    Το μοντέλο δεδομένων RDF και η γλώσσα επερωτήσεων SPARQL είναι ευρέως διαδεδομένα για την χρήση τους με σκοπό την ενοποίηση πληροφορίας που προέρχεται από διαφορετικές πηγές. Ο αυξανόμενος αριθμός των γεωχωρικών συνόλων δεδομένων που είναι πλέον διαθέσιμα σαν γεωχωρικά διασυνδεδεμένα δεδομένα οδήγησε στην εμφάνιση επεκτάσεων του μοντέλου δεδομένων RDF και της γλώσσας επερωτήσεων SPARQL. Δύο από τις σημαντικότερες επεκτάσεις αυτές είναι η γλώσσα GeoSPARQL, η οποία έγινε OGC πρότυπο, και το πλαίσιο του μοντέλου δεδομένων stRDF και της γλώσσας επερωτήσεων stSPARQL. Και οι δύο προσεγγίσεις μπορούν να χρησιμοποιηθούν για την αναπαράσταση και επερώτηση διασυνδεδεμένων γεωχωρικών δεδομένων, ενώ το μοντέλο stRDF και η γλώσσα stSPARQL παρέχουν επίσης επιπλέον λειτουργικότητα για την αναπαράσταση και επερώτηση χρονικών δεδομένων. Παρότι ο αριθμός των δεδομένων που είναι διαθέσιμα σαν γεωχωρικά ή και χρονικά διασυνδεδεμένα δεδομένα αυξάνεται, η μετατροπή των γεωχωρικών δεδομένων σε RDF και η αποθήκευσή τους σε αποθετήρια RDF δεν είναι πάντα η βέλτιστη λύση, ειδικά όταν τα δεδομένα βρίσκονται εξαρχής σε σχεσιακές βάσεις οι οποίες μπορεί να έχουν αρκετά μεγάλο μέγεθος ή και να ενημερώνονται πολύ συχνά. Στα πλαίσια αυτής της διδακτορικής διατριβής, προτείνουμε μια λύση βασισμένη στην ανάκτηση πληροφορίας με χρήση οντολογιών και αντιστοιχίσεων για την επερώτηση δεδομένων πάνω από γεωχωρικές σχεσιακές βάσεις δεδομένων. Επεκτείνουμε τεχνικές επανεγγραφής GeoSPARQL ερωτημάτων σε SQL ώστε η αποτίμηση των επερωτήσεων να γίνεται εξολοκλήρου στο γεωχωρικό σύστημα διαχείρισης βάσεων δεδομένων. Επίσης, εισαγάγουμε επιπλέον λειτουργικότητα στη χρονική συνιστώσα του μοντέλου δεδομένων stRDF και της γλώσσας επερωτήσεων stSPARQL, προκειμένου να διευκολυνθεί η υποστήριξη χρονικών τελεστών σε συστήματα OBDA. Στη συνέχεια, επεκτείνουμε τις παραπάνω μεθόδους με την υποστήριξη διαφορετικών πηγών δεδομένων πέρα από σχεσιακές βάσεις και παρουσιάζουμε μια OBDA προσέγγιση που επιτρέπει τη δημιουργία εικονικών RDF γράφων πάνω από δεδομένα που βρίσκονται διαθέσιμα στο διαδίκτυο σε διάφορες μορφές (πχ. HTML πίνακες, web διεπαφές), με χρήση οντολογιών και αντιστοιχίσεων. Συγκρίναμε την απόδοση του συστήματός μας με ένα σχετικό σύστημα και τα αποτελέσματα έδειξαν ότι πέραν του ότι το σύστημά μας παρέχει μεγαλύτερη λειτουργικότητα (πχ. υποστηρίζει περισσότερα είδη πηγών δεδομένων, περιλαμβάνει απλούστερες διαδικασίες και εξασφαλίζει καλύτερη απόδοση. Τέλος, παρουσιάζουμε την εφαρμογή των μεθόδων και συστημάτων που περιγράφονται στη διατριβή σε πραγματικά σενάρια χρήσης.The data model RDF and query language SPARQL have been widely used for the integration of data coming from different souces. Due to the increasing number of geospatial datasets that are being available as linked open data, a lot of effort focuses in the development of geospatial (and temporal, accordingly) extensions of the framework of RDF and SPARQL. Two highlights of these efforts are the query language GeoSPARQL, that is an OGC standard, and the framework of stRDF and stSPARQL. Both frameworks can be used for the representation and querying of linked geospatial data, and stSPARQL also includes a temporal dimension. Although a lot of geospatial (and some temporal) RDF stores started to emerge, converting geospatial data into RDF and then storing it into an RDF stores is not always best practice, especially when the data exists in a relational database that is fairly large and/or it gets updated frequently. In this thesis, we propose an Ontology-based Data Access (OBDA) approach for accessing geospatial data stored in geospatial relational databases, using the OGC standard GeoSPARQL and R2RML or OBDA mappings. We introduce extensions to an existing SPARQL-to-SQL translation method to support GeoSPARQL features. We describe the implementation of our approach in the system Ontop-spatial, an extension of the OBDA system Ontop for creating virtual geospatial RDF graphs on top of geospatial relational databases. Ontop-spatial is the first geospatial OBDA system and outperforms state-of-the-art geospatial RDF stores. We also show how to answer queries with temproal operators in the OBDA framework, by utilizing the framework stRDF and the query language stSPARQL which we extend with some new features. Next, we extend the data sources supported by Ontop-spatial going beyond relational database management systems, and we present our OBDA solutions for creating virtual RDF graphs on top of various web data sources (e.g., HTML tables, Web APIs) using ontologies and mappings. We compared the performance of our approach with a related implementation and evaluation results showed that not only does Ontop-spatial support more functionalities (e.g., more data sources, more simple workflow), but it also achieves better performance. Last, we describe how the work described in this thesis is applied in real-world application scenarios
    corecore