121 research outputs found

    Scalable Data Integration for Linked Data

    Get PDF
    Linked Data describes an extensive set of structured but heterogeneous datasources where entities are connected by formal semantic descriptions. In thevision of the Semantic Web, these semantic links are extended towards theWorld Wide Web to provide as much machine-readable data as possible forsearch queries. The resulting connections allow an automatic evaluation to findnew insights into the data. Identifying these semantic connections betweentwo data sources with automatic approaches is called link discovery. We derivecommon requirements and a generic link discovery workflow based on similaritiesbetween entity properties and associated properties of ontology concepts. Mostof the existing link discovery approaches disregard the fact that in times ofBig Data, an increasing volume of data sources poses new demands on linkdiscovery. In particular, the problem of complex and time-consuming linkdetermination escalates with an increasing number of intersecting data sources.To overcome the restriction of pairwise linking of entities, holistic clusteringapproaches are needed to link equivalent entities of multiple data sources toconstruct integrated knowledge bases. In this context, the focus on efficiencyand scalability is essential. For example, reusing existing links or backgroundinformation can help to avoid redundant calculations. However, when dealingwith multiple data sources, additional data quality problems must also be dealtwith. This dissertation addresses these comprehensive challenges by designingholistic linking and clustering approaches that enable reuse of existing links.Unlike previous systems, we execute the complete data integration workflowvia a distributed processing system. At first, the LinkLion portal will beintroduced to provide existing links for new applications. These links act asa basis for a physical data integration process to create a unified representationfor equivalent entities from many data sources. We then propose a holisticclustering approach to form consolidated clusters for same real-world entitiesfrom many different sources. At the same time, we exploit the semantic typeof entities to improve the quality of the result. The process identifies errorsin existing links and can find numerous additional links. Additionally, theentity clustering has to react to the high dynamics of the data. In particular,this requires scalable approaches for continuously growing data sources withmany entities as well as additional new sources. Previous entity clusteringapproaches are mostly static, focusing on the one-time linking and clustering ofentities from few sources. Therefore, we propose and evaluate new approaches for incremental entity clustering that supports the continuous addition of newentities and data sources. To cope with the ever-increasing number of LinkedData sources, efficient and scalable methods based on distributed processingsystems are required. Thus we propose distributed holistic approaches to linkmany data sources based on a clustering of entities that represent the samereal-world object. The implementation is realized on Apache Flink. In contrastto previous approaches, we utilize efficiency-enhancing optimizations for bothdistributed static and dynamic clustering. An extensive comparative evaluationof the proposed approaches with various distributed clustering strategies showshigh effectiveness for datasets from multiple domains as well as scalability on amulti-machine Apache Flink cluster

    Acta Polytechnica Hungarica 2016

    Get PDF

    Semantic Systems. The Power of AI and Knowledge Graphs

    Get PDF
    This open access book constitutes the refereed proceedings of the 15th International Conference on Semantic Systems, SEMANTiCS 2019, held in Karlsruhe, Germany, in September 2019. The 20 full papers and 8 short papers presented in this volume were carefully reviewed and selected from 88 submissions. They cover topics such as: web semantics and linked (open) data; machine learning and deep learning techniques; semantic information management and knowledge integration; terminology, thesaurus and ontology management; data mining and knowledge discovery; semantics in blockchain and distributed ledger technologies

    frances : cloud-based historical text mining with deep learning and parallel processing

    Get PDF
    frances is an advanced cloud-based text mining digital platform that leverages information extraction, knowledge graphs, natural language processing (NLP), deep learning, and parallel processing techniques. It has been specifically designed to unlock the full potential of historical digital textual collections, such as those from the National Library of Scotland, offering cloud-based capabilities and extended support for complex NLP analyses and data visualizations. frances enables realtime recurrent operational text mining and provides robust capabilities for temporal analysis, accompanied by automatic visualizations for easy result inspection. In this paper, we present the motivation behind the development of frances, emphasizing its innovative design and novel implementation aspects. We also outline future development directions. Additionally, we evaluate the platform through two comprehensive case studies in history and publishing history. Feedback from participants in these studies demonstrates that frances accelerates their work and facilitates rapid testing and dissemination of ideas.Postprin

    Alexandria: Extensible Framework for Rapid Exploration of Social Media

    Full text link
    The Alexandria system under development at IBM Research provides an extensible framework and platform for supporting a variety of big-data analytics and visualizations. The system is currently focused on enabling rapid exploration of text-based social media data. The system provides tools to help with constructing "domain models" (i.e., families of keywords and extractors to enable focus on tweets and other social media documents relevant to a project), to rapidly extract and segment the relevant social media and its authors, to apply further analytics (such as finding trends and anomalous terms), and visualizing the results. The system architecture is centered around a variety of REST-based service APIs to enable flexible orchestration of the system capabilities; these are especially useful to support knowledge-worker driven iterative exploration of social phenomena. The architecture also enables rapid integration of Alexandria capabilities with other social media analytics system, as has been demonstrated through an integration with IBM Research's SystemG. This paper describes a prototypical usage scenario for Alexandria, along with the architecture and key underlying analytics.Comment: 8 page

    ΠžΠΊΡ€ΡƒΠΆΠ΅ΡšΠ΅ Π·Π° Π°Π½Π°Π»ΠΈΠ·Ρƒ ΠΈ ΠΎΡ†Π΅Π½Ρƒ ΠΊΠ²Π°Π»ΠΈΡ‚Π΅Ρ‚Π° Π²Π΅Π»ΠΈΠΊΠΈΡ… ΠΈ ΠΏΠΎΠ²Π΅Π·Π°Π½ΠΈΡ… ΠΏΠΎΠ΄Π°Ρ‚Π°ΠΊΠ°

    Get PDF
    Linking and publishing data in the Linked Open Data format increases the interoperability and discoverability of resources over the Web. To accomplish this, the process comprises several design decisions, based on the Linked Data principles that, on one hand, recommend to use standards for the representation and the access to data on the Web, and on the other hand to set hyperlinks between data from different sources. Despite the efforts of the World Wide Web Consortium (W3C), being the main international standards organization for the World Wide Web, there is no one tailored formula for publishing data as Linked Data. In addition, the quality of the published Linked Open Data (LOD) is a fundamental issue, and it is yet to be thoroughly managed and considered. In this doctoral thesis, the main objective is to design and implement a novel framework for selecting, analyzing, converting, interlinking, and publishing data from diverse sources, simultaneously paying great attention to quality assessment throughout all steps and modules of the framework. The goal is to examine whether and to what extent are the Semantic Web technologies applicable for merging data from different sources and enabling end-users to obtain additional information that was not available in individual datasets, in addition to the integration into the Semantic Web community space. Additionally, the Ph.D. thesis intends to validate the applicability of the process in the specific and demanding use case, i.e. for creating and publishing an Arabic Linked Drug Dataset, based on open drug datasets from selected Arabic countries and to discuss the quality issues observed in the linked data life-cycle. To that end, in this doctoral thesis, a Semantic Data Lake was established in the pharmaceutical domain that allows further integration and developing different business services on top of the integrated data sources. Through data representation in an open machine-readable format, the approach offers an optimum solution for information and data dissemination for building domain-specific applications, and to enrich and gain value from the original dataset. This thesis showcases how the pharmaceutical domain benefits from the evolving research trends for building competitive advantages. However, as it is elaborated in this thesis, a better understanding of the specifics of the Arabic language is required to extend linked data technologies utilization in targeted Arabic organizations.ПовСзивањС ΠΈ ΠΎΠ±Ρ˜Π°Π²Ρ™ΠΈΠ²Π°ΡšΠ΅ ΠΏΠΎΠ΄Π°Ρ‚Π°ΠΊΠ° Ρƒ Ρ„ΠΎΡ€ΠΌΠ°Ρ‚Ρƒ "ПовСзани ΠΎΡ‚Π²ΠΎΡ€Π΅Π½ΠΈ ΠΏΠΎΠ΄Π°Ρ†ΠΈ" (Π΅Π½Π³. Linked Open Data) ΠΏΠΎΠ²Π΅Ρ›Π°Π²Π° интСропСрабилност ΠΈ могућности Π·Π° ΠΏΡ€Π΅Ρ‚Ρ€Π°ΠΆΠΈΠ²Π°ΡšΠ΅ рСсурса ΠΏΡ€Π΅ΠΊΠΎ Web-Π°. ΠŸΡ€ΠΎΡ†Π΅Ρ јС заснован Π½Π° Linked Data ΠΏΡ€ΠΈΠ½Ρ†ΠΈΠΏΠΈΠΌΠ° (W3C, 2006) који са јСднС странС Π΅Π»Π°Π±ΠΎΡ€ΠΈΡ€Π° стандардС Π·Π° ΠΏΡ€Π΅Π΄ΡΡ‚Π°Π²Ρ™Π°ΡšΠ΅ ΠΈ приступ ΠΏΠΎΠ΄Π°Ρ†ΠΈΠΌΠ° Π½Π° WΠ΅Π±Ρƒ (RDF, OWL, SPARQL), Π° са Π΄Ρ€ΡƒΠ³Π΅ странС, ΠΏΡ€ΠΈΠ½Ρ†ΠΈΠΏΠΈ ΡΡƒΠ³Π΅Ρ€ΠΈΡˆΡƒ ΠΊΠΎΡ€ΠΈΡˆΡ›Π΅ΡšΠ΅ Ρ…ΠΈΠΏΠ΅Ρ€Π²Π΅Π·Π° ΠΈΠ·ΠΌΠ΅Ρ’Ρƒ ΠΏΠΎΠ΄Π°Ρ‚Π°ΠΊΠ° ΠΈΠ· Ρ€Π°Π·Π»ΠΈΡ‡ΠΈΡ‚ΠΈΡ… ΠΈΠ·Π²ΠΎΡ€Π°. Упркос Π½Π°ΠΏΠΎΡ€ΠΈΠΌΠ° W3C ΠΊΠΎΠ½Π·ΠΎΡ€Ρ†ΠΈΡ˜ΡƒΠΌΠ° (W3C јС Π³Π»Π°Π²Π½Π° ΠΌΠ΅Ρ’ΡƒΠ½Π°Ρ€ΠΎΠ΄Π½Π° ΠΎΡ€Π³Π°Π½ΠΈΠ·Π°Ρ†ΠΈΡ˜Π° Π·Π° стандардС Π·Π° Web-Ρƒ), Π½Π΅ ΠΏΠΎΡΡ‚ΠΎΡ˜ΠΈ Ρ˜Π΅Π΄ΠΈΠ½ΡΡ‚Π²Π΅Π½Π° Ρ„ΠΎΡ€ΠΌΡƒΠ»Π° Π·Π° ΠΈΠΌΠΏΠ»Π΅ΠΌΠ΅Π½Ρ‚Π°Ρ†ΠΈΡ˜Ρƒ процСса ΠΎΠ±Ρ˜Π°Π²Ρ™ΠΈΠ²Π°ΡšΠ΅ ΠΏΠΎΠ΄Π°Ρ‚Π°ΠΊΠ° Ρƒ Linked Data Ρ„ΠΎΡ€ΠΌΠ°Ρ‚Ρƒ. Π£Π·ΠΈΠΌΠ°Ρ˜ΡƒΡ›ΠΈ Ρƒ ΠΎΠ±Π·ΠΈΡ€ Π΄Π° јС ΠΊΠ²Π°Π»ΠΈΡ‚Π΅Ρ‚ ΠΎΠ±Ρ˜Π°Π²Ρ™Π΅Π½ΠΈΡ… ΠΏΠΎΠ²Π΅Π·Π°Π½ΠΈΡ… ΠΎΡ‚Π²ΠΎΡ€Π΅Π½ΠΈΡ… ΠΏΠΎΠ΄Π°Ρ‚Π°ΠΊΠ° ΠΎΠ΄Π»ΡƒΡ‡ΡƒΡ˜ΡƒΡ›ΠΈ Π·Π° Π±ΡƒΠ΄ΡƒΡ›ΠΈ Ρ€Π°Π·Π²ΠΎΡ˜ Web-Π°, Ρƒ овој Π΄ΠΎΠΊΡ‚ΠΎΡ€ΡΠΊΠΎΡ˜ Π΄ΠΈΡΠ΅Ρ€Ρ‚Π°Ρ†ΠΈΡ˜ΠΈ, Π³Π»Π°Π²Π½ΠΈ Ρ†ΠΈΡ™ јС (1) дизајн ΠΈ ΠΈΠΌΠΏΠ»Π΅ΠΌΠ΅Π½Ρ‚Π°Ρ†ΠΈΡ˜Π° ΠΈΠ½ΠΎΠ²Π°Ρ‚ΠΈΠ²Π½ΠΎΠ³ ΠΎΠΊΠ²ΠΈΡ€Π° Π·Π° ΠΈΠ·Π±ΠΎΡ€, Π°Π½Π°Π»ΠΈΠ·Ρƒ, ΠΊΠΎΠ½Π²Π΅Ρ€Π·ΠΈΡ˜Ρƒ, мСђусобно повСзивањС ΠΈ ΠΎΠ±Ρ˜Π°Π²Ρ™ΠΈΠ²Π°ΡšΠ΅ ΠΏΠΎΠ΄Π°Ρ‚Π°ΠΊΠ° ΠΈΠ· Ρ€Π°Π·Π»ΠΈΡ‡ΠΈΡ‚ΠΈΡ… ΠΈΠ·Π²ΠΎΡ€Π° ΠΈ (2) Π°Π½Π°Π»ΠΈΠ·Π° ΠΏΡ€ΠΈΠΌΠ΅Π½Π° ΠΎΠ²ΠΎΠ³ приступа Ρƒ Ρ„Π°Ρ€ΠΌΠ°Ρ†eутском Π΄ΠΎΠΌΠ΅Π½Ρƒ. ΠŸΡ€Π΅Π΄Π»ΠΎΠΆΠ΅Π½Π° докторска Π΄ΠΈΡΠ΅Ρ€Ρ‚Π°Ρ†ΠΈΡ˜Π° Π΄Π΅Ρ‚Π°Ρ™Π½ΠΎ ΠΈΡΡ‚Ρ€Π°ΠΆΡƒΡ˜Π΅ ΠΏΠΈΡ‚Π°ΡšΠ΅ ΠΊΠ²Π°Π»ΠΈΡ‚Π΅Ρ‚Π° Π²Π΅Π»ΠΈΠΊΠΈΡ… ΠΈ ΠΏΠΎΠ²Π΅Π·Π°Π½ΠΈΡ… СкосистСма ΠΏΠΎΠ΄Π°Ρ‚Π°ΠΊΠ° (Π΅Π½Π³. Linked Data Ecosystems), ΡƒΠ·ΠΈΠΌΠ°Ρ˜ΡƒΡ›ΠΈ Ρƒ ΠΎΠ±Π·ΠΈΡ€ могућност ΠΏΠΎΠ½ΠΎΠ²Π½ΠΎΠ³ ΠΊΠΎΡ€ΠΈΡˆΡ›Π΅ΡšΠ° ΠΎΡ‚Π²ΠΎΡ€Π΅Π½ΠΈΡ… ΠΏΠΎΠ΄Π°Ρ‚Π°ΠΊΠ°. Π Π°Π΄ јС мотивисан ΠΏΠΎΡ‚Ρ€Π΅Π±ΠΎΠΌ Π΄Π° сС ΠΎΠΌΠΎΠ³ΡƒΡ›ΠΈ истраТивачима ΠΈΠ· арапских Π·Π΅ΠΌΠ°Ρ™Π° Π΄Π° ΡƒΠΏΠΎΡ‚Ρ€Π΅Π±ΠΎΠΌ сСмантичких Π²Π΅Π± Ρ‚Π΅Ρ…Π½ΠΎΠ»ΠΎΠ³ΠΈΡ˜Π° ΠΏΠΎΠ²Π΅ΠΆΡƒ својС ΠΏΠΎΠ΄Π°Ρ‚ΠΊΠ΅ са ΠΎΡ‚Π²ΠΎΡ€Π΅Π½ΠΈΠΌ ΠΏΠΎΠ΄Π°Ρ†ΠΈΠΌΠ°, ΠΊΠ°ΠΎ Π½ΠΏΡ€. DBpedia-јом. Π¦ΠΈΡ™ јС Π΄Π° сС испита Π΄Π° Π»ΠΈ ΠΎΡ‚Π²ΠΎΡ€Π΅Π½ΠΈ ΠΏΠΎΠ΄Π°Ρ†ΠΈ ΠΈΠ· Арапских Π·Π΅ΠΌΠ°Ρ™Π° ΠΎΠΌΠΎΠ³ΡƒΡ›Π°Π²Π°Ρ˜Ρƒ ΠΊΡ€Π°Ρ˜ΡšΠΈΠΌ корисницима Π΄Π° Π΄ΠΎΠ±ΠΈΡ˜Ρƒ Π΄ΠΎΠ΄Π°Ρ‚Π½Π΅ ΠΈΠ½Ρ„ΠΎΡ€ΠΌΠ°Ρ†ΠΈΡ˜Π΅ којС нису доступнС Ρƒ ΠΏΠΎΡ˜Π΅Π΄ΠΈΠ½Π°Ρ‡Π½ΠΈΠΌ скуповима ΠΏΠΎΠ΄Π°Ρ‚Π°ΠΊΠ°, ΠΏΠΎΡ€Π΅Π΄ ΠΈΠ½Ρ‚Π΅Π³Ρ€Π°Ρ†ΠΈΡ˜Π΅ Ρƒ сСмантички WΠ΅Π± простор. Докторска Π΄ΠΈΡΠ΅Ρ€Ρ‚Π°Ρ†ΠΈΡ˜Π° ΠΏΡ€Π΅Π΄Π»Π°ΠΆΠ΅ ΠΌΠ΅Ρ‚ΠΎΠ΄ΠΎΠ»ΠΎΠ³ΠΈΡ˜Ρƒ Π·Π° Ρ€Π°Π·Π²ΠΎΡ˜ Π°ΠΏΠ»ΠΈΠΊΠ°Ρ†ΠΈΡ˜Π΅ Π·Π° Ρ€Π°Π΄ са ΠΏΠΎΠ²Π΅Π·Π°Π½ΠΈΠΌ (Linked) ΠΏΠΎΠ΄Π°Ρ†ΠΈΠΌΠ° ΠΈ ΠΈΠΌΠΏΠ»Π΅ΠΌΠ΅Π½Ρ‚ΠΈΡ€Π° софтвСрско Ρ€Π΅ΡˆΠ΅ΡšΠ΅ којС ΠΎΠΌΠΎΠ³ΡƒΡ›ΡƒΡ˜Π΅ ΠΏΡ€Π΅Ρ‚Ρ€Π°ΠΆΠΈΠ²Π°ΡšΠ΅ консолидованог скупа ΠΏΠΎΠ΄Π°Ρ‚Π°ΠΊΠ° ΠΎ Π»Π΅ΠΊΠΎΠ²ΠΈΠΌΠ° ΠΈΠ· ΠΈΠ·Π°Π±Ρ€Π°Π½ΠΈΡ… арапских Π·Π΅ΠΌΠ°Ρ™Π°. Консолидовани скуп ΠΏΠΎΠ΄Π°Ρ‚Π°ΠΊΠ° јС ΠΈΠΌΠΏΠ»Π΅ΠΌΠ΅Π½Ρ‚ΠΈΡ€Π°Π½ Ρƒ ΠΎΠ±Π»ΠΈΠΊΡƒ Π‘Π΅ΠΌΠ°Π½Ρ‚ΠΈΡ‡ΠΊΠΎΠ³ Ρ˜Π΅Π·Π΅Ρ€Π° ΠΏΠΎΠ΄Π°Ρ‚Π°ΠΊΠ° (Π΅Π½Π³. Semantic Data Lake). Ова Ρ‚Π΅Π·Π° ΠΏΠΎΠΊΠ°Π·ΡƒΡ˜Π΅ ΠΊΠ°ΠΊΠΎ фармацСутска ΠΈΠ½Π΄ΡƒΡΡ‚Ρ€ΠΈΡ˜Π° ΠΈΠΌΠ° користи ΠΎΠ΄ ΠΏΡ€ΠΈΠΌΠ΅Π½Π΅ ΠΈΠ½ΠΎΠ²Π°Ρ‚ΠΈΠ²Π½ΠΈΡ… Ρ‚Π΅Ρ…Π½ΠΎΠ»ΠΎΠ³ΠΈΡ˜Π° ΠΈ истраТивачких Ρ‚Ρ€Π΅Π½Π΄ΠΎΠ²Π° ΠΈΠ· области сСмантичких Ρ‚Π΅Ρ…Π½ΠΎΠ»ΠΎΠ³ΠΈΡ˜Π°. ΠœΠ΅Ρ’ΡƒΡ‚ΠΈΠΌ, ΠΊΠ°ΠΊΠΎ јС Π΅Π»Π°Π±ΠΎΡ€ΠΈΡ€Π°Π½ΠΎ Ρƒ овој Ρ‚Π΅Π·ΠΈ, ΠΏΠΎΡ‚Ρ€Π΅Π±Π½ΠΎ јС Π±ΠΎΡ™Π΅ Ρ€Π°Π·ΡƒΠΌΠ΅Π²Π°ΡšΠ΅ спСцифичности арапског јСзика Π·Π° ΠΈΠΌΠΏΠ»Π΅ΠΌΠ΅Π½Ρ‚Π°Ρ†ΠΈΡ˜Ρƒ Linked Data Π°Π»Π°Ρ‚Π° ΠΈ ΡšΡƒΡ…ΠΎΠ²Ρƒ ΠΏΡ€ΠΈΠΌΠ΅Π½Ρƒ са ΠΏΠΎΠ΄Π°Ρ†ΠΈΠΌΠ° ΠΈΠ· Арапских Π·Π΅ΠΌΠ°Ρ™Π°

    Distributed Holistic Clustering on Linked Data

    Full text link
    Link discovery is an active field of research to support data integration in the Web of Data. Due to the huge size and number of available data sources, efficient and effective link discovery is a very challenging task. Common pairwise link discovery approaches do not scale to many sources with very large entity sets. We here propose a distributed holistic approach to link many data sources based on a clustering of entities that represent the same real-world object. Our clustering approach provides a compact and fused representation of entities, and can identify errors in existing links as well as many new links. We support a distributed execution of the clustering approach to achieve faster execution times and scalability for large real-world data sets. We provide a novel gold standard for multi-source clustering, and evaluate our methods with respect to effectiveness and efficiency for large data sets from the geographic and music domains
    • …
    corecore