    Semantic Retrieval of Relevant Sources for Large Scale Virtual Documents

    The term big data has come into use in recent years to refer to the ever-increasing amount of data that organizations store, process, and analyze. Big data is characterized by its volume, variety, and velocity, which make it difficult to process with conventional database management systems; this has created a need for schema-less management systems. Even these are not a complete solution for big data analysis, however, because their processing considers only structural information and ignores semantic information. Content management systems such as Wikipedia store and link huge numbers of documents and files, and although such a CMS uses clusters and a distributed framework for storing big data, it lacks semantic linking and analysis: the references retrieved for a particular article are random and enormous. Reducing the number of references for a selected content therefore requires semantic matching. In this paper we propose a framework that uses the distributed parallel processing capability of the Hadoop Distributed File System (HDFS) to perform semantic analysis over a large volume of documents (big data) and find the best-matched source document in a collection of source documents for a given virtual document.
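
    The distributed machinery aside, the core matching step can be pictured as scoring every source document against the virtual document and keeping the best. Below is a minimal single-machine sketch in Python, assuming TF-IDF weighting over pre-tokenized documents; the weighting scheme and input shapes are illustrative assumptions, not details taken from the paper, whose contribution is running such an analysis at scale on HDFS.

        import math
        from collections import Counter

        def tf_idf_vectors(docs):
            # docs: list of token lists; df counts how many docs contain each term.
            n = len(docs)
            df = Counter()
            for doc in docs:
                df.update(set(doc))
            vectors = []
            for doc in docs:
                tf = Counter(doc)
                vectors.append({t: (tf[t] / len(doc)) * math.log(n / df[t])
                                for t in tf})
            return vectors

        def cosine(a, b):
            dot = sum(a[t] * b[t] for t in set(a) & set(b))
            norm = (math.sqrt(sum(v * v for v in a.values()))
                    * math.sqrt(sum(v * v for v in b.values())))
            return dot / norm if norm else 0.0

        def best_match(virtual_doc, sources):
            # Index of the source document most similar to the virtual document.
            vecs = tf_idf_vectors([virtual_doc] + sources)
            return max(range(len(sources)),
                       key=lambda i: cosine(vecs[0], vecs[i + 1]))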

    Self-adaptive Based Model for Ambiguity Resolution of The Linked Data Query for Big Data Analytics

    Integration of heterogeneous data sources is a crucial step in big data analytics, but it creates ambiguity during mapping between the sources due to variation in query terms, data structure, and granularity conflicts. There is, however, limited research on effective big data integration that addresses this ambiguity issue. This paper introduces a self-adaptive model for big data integration that exploits the data structure during querying in order to mitigate and resolve ambiguities. An assessment of preliminary work on the Geography and Quran datasets is reported to illustrate the feasibility of the proposed model, motivating future work such as solving complex queries.
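
    As an illustration of structure-aware disambiguation, the hypothetical sketch below ranks candidate mappings for a query term by lexical similarity plus a small structural bonus when the candidate field's parent entity also appears in the query context. The candidate shape, weights, and bonus are assumptions for illustration, not the paper's actual model.

        from difflib import SequenceMatcher

        def rank_mappings(term, candidates, context=frozenset()):
            # candidates: (source, field, parent_entity) triples drawn from the
            # structure of each data source; context: lower-cased query entities.
            def score(candidate):
                source, field, parent = candidate
                lexical = SequenceMatcher(None, term.lower(), field.lower()).ratio()
                structural = 0.2 if parent.lower() in context else 0.0
                return lexical + structural
            return sorted(candidates, key=score, reverse=True)

        # e.g. rank_mappings("city", [("geo", "place_name", "location"),
        #                             ("quran", "surah", "chapter")],
        #                    context={"location"})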

    Using Provenance for Quality Assessment and Repair in Linked Open Data

    As the number of data sources publishing their data on the Web of Data grows, we are experiencing an immense growth of the Linked Open Data cloud. The lack of control over the published sources, which may be untrustworthy or unreliable, along with their dynamic nature, which often invalidates links and causes conflicts or other discrepancies, can lead to poor-quality data. In order to judge data quality, a number of quality indicators have been proposed, coupled with quality metrics that quantify the “quality level” of a dataset. In addition, some approaches improve dataset quality through a repair process that corrects invalidities caused by constraint violations by either removing or adding triples. In this paper we argue that provenance is a critical factor that should be taken into account during repairs to ensure that the most reliable data is kept. Based on this idea, we propose quality metrics that take provenance into account and evaluate their applicability as repair guidelines in a particular data fusion setting.
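
    The repair idea the paper argues for can be sketched in a few lines: when a set of triples jointly violates a constraint, rank them by the trust assigned to their provenance (here, the named graph each came from) and keep the most reliable one. The quadruple shape and trust table are illustrative assumptions.

        def provenance_aware_repair(conflicting_triples, trust):
            # conflicting_triples: (s, p, o, graph) quadruples that together
            # violate some constraint; trust: graph IRI -> score in [0, 1].
            ranked = sorted(conflicting_triples,
                            key=lambda t: trust.get(t[3], 0.0),
                            reverse=True)
            keep, delete = ranked[0], ranked[1:]
            return keep, delete

        # e.g. provenance_aware_repair(
        #          [("x", "a", "Person", "g1"), ("x", "a", "Company", "g2")],
        #          trust={"g1": 0.9, "g2": 0.4})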

    A guided walk into link key candidate extraction with relational concept analysis

    Data interlinking is an important task for linked data interoperability. One possible technique for finding links is the use of link keys, which generalise relational keys to pairs of RDF models. We show how link key candidates may be extracted directly from RDF datasets by encoding the extraction problem in relational concept analysis. This method deals with non-functional properties and circularly dependent link key expressions; as such, it generalises the methods presented for non-dependent link keys and for link keys over the relational model. The proposed method is able to return link key candidates involving several classes at once.
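
    Stripped of the relational-concept-analysis encoding, a link key candidate over a single property pair (p, q) is supported by the instance pairs that share a value on p in one dataset and on q in the other. The naive enumeration below, with made-up input shapes, conveys the basic idea; the paper's RCA encoding is what makes dependent and circular link key expressions tractable.

        from collections import defaultdict

        def link_key_candidates(ds1, ds2, min_support=2):
            # ds1, ds2: instance -> {property -> set of values}. Quadratic and
            # in-memory, so purely illustrative.
            links = defaultdict(set)
            for i1, props1 in ds1.items():
                for i2, props2 in ds2.items():
                    for p, v1 in props1.items():
                        for q, v2 in props2.items():
                            if v1 & v2:  # shared value on (p, q)
                                links[(p, q)].add((i1, i2))
            return {pq: pairs for pq, pairs in links.items()
                    if len(pairs) >= min_support}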

    A Combinational Method to Determining Identical Entities from Heterogeneous Knowledge Graphs

    With the increasing demand for intelligent services, knowledge graph technologies have attracted much attention, and various application-specific knowledge bases have been developed in industry and academia. In particular, open knowledge bases play an important role in constructing new knowledge bases by serving as reference data sources. However, identifying the same entities across heterogeneous knowledge sources is not trivial. This study focuses on extracting and determining exact and precise entities, which is essential for merging and fusing various knowledge sources. To this end, several algorithms for extracting identical entities are proposed, and their performance is evaluated on real-world knowledge sources.
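
    The abstract does not detail the proposed algorithms, but the general shape of a combinational matcher can be sketched: layer a cheap exact-label test over a fuzzier attribute-overlap score. The field names, Jaccard measure, and threshold below are illustrative assumptions only.

        def jaccard(a, b):
            union = a | b
            return len(a & b) / len(union) if union else 0.0

        def same_entity(e1, e2, threshold=0.6):
            # Exact label match decides immediately; otherwise fall back to
            # attribute overlap between the two entity descriptions.
            if e1["label"].lower() == e2["label"].lower():
                return True
            return jaccard(set(e1["attributes"]), set(e2["attributes"])) >= threshold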

    Scalable and Distributed Methods for Entity Matching, Consolidation and Disambiguation over Linked Data Corpora

    With respect to large-scale, static Linked Data corpora, in this paper we discuss scalable and distributed methods for entity consolidation (also known as smushing, entity resolution, or object consolidation) to locate and process names that signify the same entity. We investigate (i) a baseline approach, which uses explicit owl:sameAs relations to perform consolidation; (ii) extended entity consolidation, which additionally uses a subset of OWL 2 RL/RDF rules to derive novel owl:sameAs relations through the semantics of inverse-functional properties, functional properties, and (max-)cardinality restrictions with value one; (iii) deriving weighted concurrence measures between entities in the corpus based on shared inlinks/outlinks and attribute values using statistical analyses; and (iv) disambiguating (initially) consolidated entities based on inconsistency detection using OWL 2 RL/RDF rules. Our methods are based upon distributed sorts and scans of the corpus, where we deliberately avoid the requirement of indexing all the data. Throughout, we offer an evaluation over a diverse Linked Data corpus consisting of 1.118 billion quadruples derived from a domain-agnostic, open crawl of 3.985 million RDF/XML Web documents, demonstrating the feasibility of our methods at that scale and giving insights into the quality of the results for real-world data.
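
    The baseline step (i) amounts to computing the symmetric, transitive closure of owl:sameAs and electing one canonical identifier per equivalence class. The in-memory union-find sketch below illustrates only that closure step; the paper's methods rely on distributed sorts and scans precisely to avoid holding such a structure for a billion-quadruple corpus.

        class UnionFind:
            # Disjoint sets over IRIs; the smallest IRI in a set becomes canonical.
            def __init__(self):
                self.parent = {}

            def find(self, x):
                self.parent.setdefault(x, x)
                while self.parent[x] != x:
                    self.parent[x] = self.parent[self.parent[x]]  # path halving
                    x = self.parent[x]
                return x

            def union(self, a, b):
                ra, rb = self.find(a), self.find(b)
                if ra != rb:
                    if rb < ra:
                        ra, rb = rb, ra
                    self.parent[rb] = ra

        def consolidate(same_as_pairs):
            # Map every IRI occurring in owl:sameAs statements to its canonical IRI.
            uf = UnionFind()
            for a, b in same_as_pairs:
                uf.union(a, b)
            return {x: uf.find(x) for x in uf.parent}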

    Artificial intelligence for ocean science data integration: current state, gaps, and way forward

    A Survey of the First 20 Years of Research on Semantic Web and Linked Data

    This paper is a survey of the research topics in the field of the Semantic Web, Linked Data, and the Web of Data. The study looks at the contributions of this research community over its first twenty years of existence. Compiling several bibliographical sources and bibliometric indicators, we identify the main research trends and reference some of their major publications to provide an overview of that initial period. We conclude with some perspectives on future research challenges.