
    Open Government Data (OGD) Publication as Linked Open Data (LOD): A Survey

    Open Government Data (OGD) is a movement that has spread worldwide, enabling the publication of thousands of datasets on the Web with the aim of making transparency and participatory governance concrete. This initiative can create value by linking data that describe the same phenomenon from different perspectives, using traditional Web and Semantic Web technologies. One framework for these technologies is the Linked Data movement, which guides the publication of data and their interconnection in a machine-readable form, enabling automatic interpretation and exploitation. Nevertheless, publishing Open Government Data as Linked Open Data (LOD) is not a trivial task, owing to several obstacles such as data heterogeneity. Many works dealing with this transformation process have been published, and they need to be investigated thoroughly to deduce the general trends and issues in this field. The current work proposes a classification of existing methods for OGD-to-LOD transformation and a synthesis study that highlights their main trends and challenges.
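
    As a minimal sketch of what such a publication step can look like (assuming rdflib; the base URI and dataset values are illustrative, not taken from the survey), one OGD dataset description can be expressed with the DCAT vocabulary and interlinked with an external LOD resource:

        # Sketch: lift one OGD dataset description into RDF (Turtle) with DCAT.
        from rdflib import Graph, Literal, Namespace, URIRef
        from rdflib.namespace import DCAT, DCTERMS, RDF, XSD

        BASE = Namespace("http://example.org/ogd/")  # hypothetical namespace

        g = Graph()
        g.bind("dcat", DCAT)
        g.bind("dcterms", DCTERMS)

        ds = BASE["city-budget-2023"]  # hypothetical dataset
        g.add((ds, RDF.type, DCAT.Dataset))
        g.add((ds, DCTERMS.title, Literal("City budget 2023", lang="en")))
        g.add((ds, DCTERMS.issued, Literal("2023-01-15", datatype=XSD.date)))
        # Interlinking: point the local resource at an external LOD resource.
        g.add((ds, DCTERMS.spatial, URIRef("http://dbpedia.org/resource/Berlin")))

        print(g.serialize(format="turtle"))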

    Application of Semantics to Solve Problems in Life Sciences

    Thesis defense date: 10 December 2018. The amount of information generated on the Web has increased in recent years. Most of this information is accessible as text, with human beings as the Web's main users. However, despite all the advances in natural language processing, computers have trouble processing this textual information. In this context, there are application domains, such as the Life Sciences, in which large amounts of information are being published as structured data. The analysis of these data is of vital importance not only for the advancement of science but also for producing advances in healthcare. However, these data are located in different repositories and stored in different formats, which makes their integration difficult. In this context, the Linked Data paradigm arises as a technology that includes the application of standards proposed by the W3C community, such as HTTP URIs and the RDF and OWL standards. Using this technology, this doctoral thesis was developed to cover the following main objectives: 1) to promote the use of Linked Data by the community of users in the Life Sciences; 2) to facilitate the design of SPARQL queries by discovering the model underlying RDF repositories; 3) to create a collaborative environment that facilitates the consumption of Linked Data by end users; 4) to develop an algorithm that automatically discovers the OWL semantic model of an RDF repository; and 5) to develop an OWL representation of ICD-10-CM, called Dione, that offers an automatic methodology for classifying patients' diseases and subsequently validating the classification using an OWL reasoner.
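
    Objectives 2) and 4) revolve around exposing the model that underlies an RDF repository. As a hedged illustration only (the thesis's discovery algorithm is more elaborate; the endpoint, LIMIT, and library choice are my assumptions), a single SPARQL query issued through SPARQLWrapper can already enumerate which classes occur and which properties their instances use:

        # Sketch: enumerate (class, property) pairs to reveal a repository's model.
        from SPARQLWrapper import SPARQLWrapper, JSON

        endpoint = SPARQLWrapper("https://dbpedia.org/sparql")  # any SPARQL endpoint
        endpoint.setQuery("""
            SELECT ?class ?property (COUNT(*) AS ?uses)
            WHERE { ?s a ?class ; ?property ?o . }
            GROUP BY ?class ?property
            ORDER BY DESC(?uses)
            LIMIT 20
        """)
        endpoint.setReturnFormat(JSON)

        # Note: on large public endpoints this aggregation can be slow or time out.
        for row in endpoint.query().convert()["results"]["bindings"]:
            print(row["class"]["value"], row["property"]["value"], row["uses"]["value"])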

    Scalable Data Integration for Linked Data

    Linked Data describes an extensive set of structured but heterogeneous data sources where entities are connected by formal semantic descriptions. In the vision of the Semantic Web, these semantic links are extended towards the World Wide Web to provide as much machine-readable data as possible for search queries. The resulting connections allow an automatic evaluation to find new insights into the data. Identifying these semantic connections between two data sources with automatic approaches is called link discovery. We derive common requirements and a generic link discovery workflow based on similarities between entity properties and associated properties of ontology concepts. Most of the existing link discovery approaches disregard the fact that, in times of Big Data, an increasing volume of data sources poses new demands on link discovery. In particular, the problem of complex and time-consuming link determination escalates with an increasing number of intersecting data sources. To overcome the restriction of pairwise linking of entities, holistic clustering approaches are needed to link equivalent entities of multiple data sources to construct integrated knowledge bases. In this context, the focus on efficiency and scalability is essential. For example, reusing existing links or background information can help to avoid redundant calculations. However, when dealing with multiple data sources, additional data quality problems must also be dealt with. This dissertation addresses these comprehensive challenges by designing holistic linking and clustering approaches that enable reuse of existing links. Unlike previous systems, we execute the complete data integration workflow via a distributed processing system. At first, the LinkLion portal will be introduced to provide existing links for new applications. These links act as a basis for a physical data integration process to create a unified representation for equivalent entities from many data sources. We then propose a holistic clustering approach to form consolidated clusters for the same real-world entities from many different sources. At the same time, we exploit the semantic type of entities to improve the quality of the result. The process identifies errors in existing links and can find numerous additional links. Additionally, the entity clustering has to react to the high dynamics of the data. In particular, this requires scalable approaches for continuously growing data sources with many entities as well as additional new sources. Previous entity clustering approaches are mostly static, focusing on the one-time linking and clustering of entities from a few sources. Therefore, we propose and evaluate new approaches for incremental entity clustering that support the continuous addition of new entities and data sources. To cope with the ever-increasing number of Linked Data sources, efficient and scalable methods based on distributed processing systems are required. Thus we propose distributed holistic approaches to link many data sources based on a clustering of entities that represent the same real-world object. The implementation is realized on Apache Flink. In contrast to previous approaches, we utilize efficiency-enhancing optimizations for both distributed static and dynamic clustering. An extensive comparative evaluation of the proposed approaches with various distributed clustering strategies shows high effectiveness for datasets from multiple domains as well as scalability on a multi-machine Apache Flink cluster.
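
    The pairwise core of link discovery that the dissertation generalizes can be conveyed with a toy sketch (the entities, the compared property, and the threshold are illustrative assumptions; the holistic, distributed approaches above go far beyond this baseline): score entity pairs from two sources by the similarity of a shared property and keep high-scoring pairs as owl:sameAs candidates.

        # Sketch: pairwise link discovery via token-Jaccard similarity on labels.
        import re

        def jaccard(a: str, b: str) -> float:
            ta = set(re.findall(r"\w+", a.lower()))
            tb = set(re.findall(r"\w+", b.lower()))
            return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

        source_a = {"a:1": "Barack Obama", "a:2": "New York City"}      # toy data
        source_b = {"b:7": "Obama, Barack", "b:9": "City of New York"}  # toy data

        THRESHOLD = 0.5  # assumed cut-off
        for ea, la in source_a.items():
            for eb, lb in source_b.items():
                score = jaccard(la, lb)
                if score >= THRESHOLD:
                    print(f"{ea} owl:sameAs {eb}  (similarity {score:.2f})")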

    Knowledge extraction from unstructured data and classification through distributed ontologies

    The World Wide Web has changed the way humans use and share any kind of information. The Web removed several access barriers to published information and has become an enormous space where users can easily navigate through heterogeneous resources (such as linked documents) and can easily edit, modify, or produce them. Documents implicitly enclose information and relationships among them that are accessible only to human beings. Indeed, the Web of documents evolved towards a space of data silos, linked to each other only through untyped references (such as hypertext references) that only humans were able to understand. A growing desire to programmatically access pieces of data implicitly enclosed in documents has characterized the recent efforts of the Web research community. Direct access means structured data, thus enabling computing machinery to easily exploit the linking of different data sources. It has become crucial for the Web community to provide a technology stack for easing data integration at large scale, first structuring the data using standard ontologies and afterwards linking them to external data. Ontologies became the best practice for defining axioms and relationships among classes, and the Resource Description Framework (RDF) became the basic data model chosen to represent ontology instances (i.e. an instance is a value of an axiom, class, or attribute). Data has become the new oil; in particular, extracting information from semi-structured textual documents on the Web is key to realizing the Linked Data vision. In the literature these problems have been addressed with several proposals and standards, which mainly focus on technologies to access the data and on formats to represent the semantics of the data and their relationships. With the increasing volume of interconnected and serialized RDF data, RDF repositories may suffer from data overloading and may become a single point of failure for the overall Linked Data vision. One of the goals of this dissertation is to propose a thorough approach to managing large-scale RDF repositories and to distribute them in a redundant and reliable peer-to-peer RDF architecture. The architecture consists of a logic to distribute and mine the knowledge and of a set of physical peer nodes organized in a ring topology based on a Distributed Hash Table (DHT). Each node shares the same logic and provides an entry point that enables clients to query the knowledge base using atomic, disjunctive, and conjunctive SPARQL queries. The consistency of the results is increased using a data redundancy algorithm that replicates each RDF triple on multiple nodes so that, in the case of peer failure, other peers can retrieve the data needed to resolve the queries. Additionally, a distributed load-balancing algorithm is used to maintain a uniform distribution of the data among the participating peers by dynamically changing the key space assigned to each node in the DHT. Recently, the process of data structuring has gained more and more attention when applied to the large volume of textual information spread on the Web, such as legacy data, newspapers, scientific papers, or (micro-)blog posts. This process mainly consists of three steps: i) the extraction from the text of atomic pieces of information, called named entities; ii) the classification of these pieces of information through ontologies; iii) the disambiguation of them through Uniform Resource Identifiers (URIs) identifying real-world objects.
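
    The triple placement this architecture implies can be sketched as follows (a toy model under assumptions: SHA-1 hashing, eight peers, and a replication factor of three are illustrative choices, not the dissertation's exact algorithm): each RDF triple is hashed onto the DHT ring and replicated on the next nodes, so queries survive peer failures.

        # Sketch: consistent-hashing placement of RDF triples with replication.
        import hashlib
        from bisect import bisect_right

        REPLICAS = 3  # assumed replication factor
        nodes = sorted(
            (int(hashlib.sha1(f"node-{i}".encode()).hexdigest(), 16), f"node-{i}")
            for i in range(8)
        )

        def place(triple: tuple) -> list:
            """Return the REPLICAS ring nodes responsible for a triple."""
            key = int(hashlib.sha1(" ".join(triple).encode()).hexdigest(), 16)
            start = bisect_right([h for h, _ in nodes], key) % len(nodes)
            return [nodes[(start + k) % len(nodes)][1] for k in range(REPLICAS)]

        print(place(("dbr:Turin", "rdf:type", "dbo:City")))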
    As a step towards interconnecting the Web to real-world objects via named entities, different techniques have been proposed. The second objective of this work is to propose a comparison of these approaches in order to highlight their strengths and weaknesses in different scenarios, such as scientific papers, news articles, or user-generated content. We created the Named Entity Recognition and Disambiguation (NERD) web framework, publicly accessible on the Web (through a REST API and a web user interface), which unifies several named entity extraction technologies. Moreover, we proposed the NERD ontology, a reference ontology for comparing the results of these technologies. Recently, the NERD ontology has been included in the NIF (Natural language processing Interchange Format) specification, part of the Creating Knowledge out of Interlinked Data (LOD2) project. Summarizing, this dissertation defines a framework for the extraction of knowledge from unstructured data and its classification via distributed ontologies. A detailed study of the Semantic Web and knowledge extraction fields is proposed to define the issues under investigation in this work. The dissertation then proposes an architecture to tackle the single-point-of-failure issue introduced by the RDF repositories spread across the Web. Although the use of ontologies enables a Web where data is structured and comprehensible by computing machinery, human users may take advantage of it, especially for the annotation task. Hence, this work describes an annotation tool for web editing and for audio and video annotation, with a web front-end user interface built on top of a distributed ontology. Furthermore, this dissertation details a thorough comparison of the state of the art of named entity technologies. The NERD framework is presented as a technology to encompass existing solutions in the named entity extraction field, and the NERD ontology is presented as a reference ontology in the field. Finally, this work highlights three use cases whose purpose is to reduce the number of data silos spread across the Web: a Linked Data approach to augment the automatic classification task in a Systematic Literature Review, an application to lift educational data stored in Sharable Content Object Reference Model (SCORM) data silos to the Web of Data, and a scientific conference venue enhancer plugin on top of several live data collectors. Significant research efforts have been devoted to combining the efficiency of a reliable data structure with the importance of data extraction techniques. This dissertation opens different research doors that mainly join two research communities: the Semantic Web and the Natural Language Processing communities. The Web provides a considerable amount of data on which NLP techniques may shed light. The use of the URI as a unique identifier may provide one milestone for the materialization of entities lifted from raw text to real-world objects.
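
    The three-step pipeline (extraction, classification, disambiguation) can be made concrete with a small client against one of the extractors a framework like NERD unifies. The sketch below targets the public DBpedia Spotlight endpoint; the URL, parameters, and response fields reflect my understanding of that service and are assumptions, and this is not the NERD REST API itself:

        # Sketch: extract surface forms, classify them with ontology types,
        # and disambiguate them to URIs via DBpedia Spotlight (assumed API).
        import requests

        resp = requests.get(
            "https://api.dbpedia-spotlight.org/en/annotate",
            params={"text": "Tim Berners-Lee invented the World Wide Web.",
                    "confidence": 0.5},
            headers={"Accept": "application/json"},
            timeout=30,
        )
        resp.raise_for_status()

        for res in resp.json().get("Resources", []):
            # surface form -> ontology types -> disambiguated URI
            print(res["@surfaceForm"], "|", res["@types"], "|", res["@URI"])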

    Cross-Platform Text Mining and Natural Language Processing Interoperability - Proceedings of the LREC2016 conference

    No abstract available

    A structural and quantitative analysis of the web of linked data and its components to perform data retrieval

    This research consists of a quantitative and structural analysis of the Web of Linked Data, with the aim of improving data retrieval across different sources. Statistical techniques are applied to obtain quantitative metrics of the Web of Linked Data; for the structural analysis, a Social Network Analysis (SNA) is performed. To get a picture of the Web of Linked Data suitable for analysis, we rely on the diagram of the Linking Open Data (LOD) cloud, an online catalogue of datasets whose information has been published using Linked Data techniques. The datasets are published in a language called the Resource Description Framework (RDF), which creates links between them so that the information can be reused. The goal of this quantitative and structural analysis of the Web of Linked Data is to improve data retrieval. For that purpose we take advantage of the Schema.org markup vocabulary and of the Linked Open Vocabularies (LOV) project. Schema.org is a set of tags that lets webmasters mark up their own web pages with microdata; microdata is used to help search engines and other Web tools better understand the information those pages contain. LOV is a catalogue that registers the vocabularies used by the datasets of the Web of Linked Data; its goal is to provide easy access to those vocabularies. In this research, we develop a study for retrieving data from the Web of Linked Data using the sources mentioned above together with ontology-matching techniques. In our case, we first map Schema.org to LOV, and then LOV to the Web of Linked Data. An SNA of LOV has also been carried out. The goal of that analysis is to obtain a quantitative and qualitative picture of LOV; knowing this, we can draw conclusions such as which vocabularies are the most used and whether or not they are specialized in a particular field. These results can be used to filter datasets or to reuse information.
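
    The flavour of the SNA step can be conveyed with a toy sketch (assuming networkx; the graph below is invented for illustration and is not LOV data): model vocabularies as nodes, term reuse as directed edges, and read off centrality to see which vocabularies are most reused.

        # Sketch: in-degree centrality over a toy vocabulary-reuse graph.
        import networkx as nx

        g = nx.DiGraph()
        g.add_edges_from([
            ("dcat", "dcterms"), ("dcat", "foaf"),
            ("schema", "foaf"), ("void", "dcterms"), ("void", "foaf"),
        ])

        # Vocabularies that many others reuse rank highest.
        for vocab, score in sorted(nx.in_degree_centrality(g).items(),
                                   key=lambda kv: -kv[1]):
            print(f"{vocab}: {score:.2f}")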

    Linked Data Entity Summarization

    On the Web, the amount of structured and Linked Data about entities is constantly growing. Descriptions of single entities often include thousands of statements, and it becomes difficult to comprehend the data unless a selection of the most relevant facts is provided. This doctoral thesis addresses the problem of Linked Data entity summarization. The contributions involve two entity summarization approaches, a common API for entity summarization, and an approach for entity data fusion.
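
    The abstract does not detail the two summarization approaches, but the task itself can be illustrated with a naive relevance heuristic (entirely a sketch of mine, not the thesis's method): within an entity description, prefer statements whose predicates are rare, on the intuition that frequent predicates carry less distinguishing information.

        # Sketch: keep the k statements with the rarest predicates.
        from collections import Counter

        statements = [                                  # toy entity description
            ("dbo:birthPlace", "dbr:Ulm"),
            ("rdf:type", "dbo:Scientist"),
            ("dbo:knownFor", "dbr:General_relativity"),
            ("rdfs:label", "Albert Einstein"),
            ("rdfs:label", "Einstein, Albert"),
        ]

        freq = Counter(p for p, _ in statements)        # predicate frequency

        def summarize(stmts, k=3):
            return sorted(stmts, key=lambda s: freq[s[0]])[:k]

        for p, o in summarize(statements):
            print(p, o)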