
    FAIR data representation in times of eScience: a comparison of instance-based and class-based semantic representations of empirical data using phenotype descriptions as example

    Background: The size, velocity, and heterogeneity of Big Data outstrip the capacity of conventional data management tools and require data and metadata to be fully machine-actionable (i.e., eScience-compliant) and thus findable, accessible, interoperable, and reusable (FAIR). This can be achieved by using ontologies and by representing data and metadata as semantic graphs. Here, we discuss two semantic graph approaches to representing empirical data and metadata in a knowledge graph, with phenotype descriptions as an example. Almost all phenotype descriptions are still published as unstructured natural language texts, with far-reaching consequences for their FAIRness that substantially impede their overall usability within the life sciences. However, with a growing number of anatomy ontologies becoming available and semantic applications emerging, a solution to this problem is within reach. Researchers are starting to document and communicate phenotype descriptions through the Web as highly formalized and structured semantic graphs that use ontology terms and Uniform Resource Identifiers (URIs) to circumvent the problems associated with unstructured texts.

    Results: Using phenotype descriptions as an example, we compare and evaluate two basic representations of empirical data and their accompanying metadata as semantic graphs: the class-based TBox semantic graph approach, called Semantic Phenotype, and the instance-based ABox semantic graph approach, called Phenotype Knowledge Graph. Their main difference is that only the ABox approach allows every individual part and property mentioned in a description to be identified in the knowledge graph. This technical difference has substantial practical consequences that significantly affect the overall usability of empirical data: it affects the findability, accessibility, and explorability of the data as well as their comparability, expandability, universal usability and reusability, and overall machine-actionability. Moreover, TBox semantic graphs often require querying under entailment regimes, which is computationally more complex.

    Conclusions: We conclude that, from a conceptual point of view, the advantages of the instance-based ABox semantic graph approach outweigh both its shortcomings and the advantages of the class-based TBox semantic graph approach. We therefore recommend the instance-based ABox approach as a FAIR approach for documenting and communicating empirical data and metadata in a knowledge graph.
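
    The ABox/TBox contrast is easiest to see in a toy graph. The sketch below (Python with rdflib) encodes the same phenotype statement, "a head that has some elongated part", once as a class-level OWL restriction (TBox) and once as concrete individuals (ABox). The EX namespace, term names, and the hasPart property are illustrative placeholders, not the vocabulary of the Semantic Phenotype or Phenotype Knowledge Graph models themselves.

        # Minimal sketch (assumed placeholder vocabulary, not the paper's model)
        # contrasting a class-based TBox phenotype with an instance-based ABox one.
        from rdflib import BNode, Graph, Namespace
        from rdflib.namespace import OWL, RDF, RDFS

        EX = Namespace("http://example.org/phenotype/")  # placeholder namespace
        g = Graph()
        g.bind("ex", EX)

        # TBox (class-based): the phenotype is an anonymous class expression,
        # "Head and (hasPart some ElongatedStructure)", via an owl:Restriction.
        restriction = BNode()
        g.add((restriction, RDF.type, OWL.Restriction))
        g.add((restriction, OWL.onProperty, EX.hasPart))
        g.add((restriction, OWL.someValuesFrom, EX.ElongatedStructure))
        g.add((EX.SemanticPhenotype1, RDFS.subClassOf, EX.Head))
        g.add((EX.SemanticPhenotype1, RDFS.subClassOf, restriction))

        # ABox (instance-based): the head and its part are individuals with their
        # own URIs, so every node can be referenced, queried, and linked directly.
        g.add((EX.head_001, RDF.type, EX.Head))
        g.add((EX.part_001, RDF.type, EX.ElongatedStructure))
        g.add((EX.head_001, EX.hasPart, EX.part_001))

        print(g.serialize(format="turtle"))

    A plain SPARQL query can reach ex:part_001 in the ABox version directly, whereas the TBox version only yields it under an OWL entailment regime — the extra computational cost the abstract refers to.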

    A Survey of the First 20 Years of Research on Semantic Web and Linked Data

    This paper is a survey of research topics in the field of the Semantic Web, Linked Data, and the Web of Data. The study looks at the contributions of this research community over its first twenty years of existence. Compiling several bibliographical sources and bibliometric indicators, we identify the main research trends and reference some of their major publications to provide an overview of that initial period. We conclude with some perspectives on future research challenges.

    Journalistic Knowledge Platforms: from Idea to Realisation

    Journalistic Knowledge Platforms (JKPs) are a type of intelligent information system designed to augment news creation processes by combining big data, artificial intelligence (AI), and knowledge bases to support journalists. Despite their potential to revolutionise the field of journalism, the adoption of JKPs has been slow, even with scholars and large news outlets involved in their research and development. The slow adoption can be attributed to the technical complexity of JKPs, which has led news organisations to rely on multiple independent, task-specific production systems. This situation increases the resource and coordination footprint and costs, while posing the threat of losing control over data and facing vendor lock-in scenarios.

    The technical complexities remain a major obstacle, as there is no existing well-designed system architecture that would facilitate the realisation and integration of JKPs in a coherent manner over time. This PhD thesis contributes to the theory and practice of knowledge-graph-based JKPs by studying and designing a software reference architecture to facilitate the instantiation of concrete solutions and the adoption of JKPs. The first contribution provides a thorough and comprehensible analysis of the idea of JKPs, from their origins to their current state. This analysis offers the first-ever study of the factors that have contributed to the slow adoption, including the complexity of their social and technical aspects, and identifies the major challenges and future directions of JKPs. The second contribution presents the software reference architecture, which provides a generic blueprint for designing and developing concrete JKPs. The proposed reference architecture also defines two novel types of components intended to maintain and evolve AI models and knowledge representations. The third contribution presents an instantiation example of the software reference architecture and details a process for improving the efficiency of information extraction pipelines; this framework facilitates flexible, parallel, and concurrent integration of natural language processing (NLP) techniques and AI tools, in the spirit of the sketch below. Additionally, the thesis discusses the implications of recent AI advances on JKPs and diverse ethical aspects of using JKPs. Overall, this PhD thesis provides a comprehensive and in-depth analysis of JKPs, from the theory to the design of their technical aspects. This research aims to facilitate the adoption of JKPs and advance research in this field.
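
    The thesis's pipeline code is not reproduced in this abstract, so the following is only a minimal sketch of the general idea of running several independent extraction steps concurrently over the same article; the extractor functions and their names are hypothetical placeholders, not components of the actual reference architecture.

        # Minimal sketch (hypothetical extractors, not the thesis's pipeline) of
        # running independent information-extraction steps concurrently.
        from concurrent.futures import ThreadPoolExecutor

        def extract_entities(text: str) -> dict:
            # Placeholder: a real pipeline would call an NER model here.
            return {"entities": []}

        def extract_topics(text: str) -> dict:
            # Placeholder: a real pipeline would call a topic classifier here.
            return {"topics": []}

        def extract_summary(text: str) -> dict:
            # Placeholder: a real pipeline would call a summarisation model here.
            return {"summary": text[:80]}

        def run_pipeline(article: str) -> dict:
            """Run all extractors in parallel and merge their partial results."""
            extractors = (extract_entities, extract_topics, extract_summary)
            results: dict = {}
            with ThreadPoolExecutor(max_workers=len(extractors)) as pool:
                for partial in pool.map(lambda f: f(article), extractors):
                    results.update(partial)
            return results

        print(run_pipeline("Example news article text ..."))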

    Knowledge-Driven Harmonization of Sensor Observations: Exploiting Linked Open Data for IoT Data Streams

    The rise of the Internet of Things leads to an unprecedented number of continuous sensor observations that are available as IoT data streams. Harmonization of such observations is a labor-intensive task due to heterogeneity in format, syntax, and semantics. We aim to reduce the effort of such harmonization tasks by employing a knowledge-driven approach. To this end, we pursue the idea of exploiting the large body of formalized public knowledge represented as statements in Linked Open Data.
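
    The abstract contains no code, but the core idea — letting public knowledge, rather than per-source hand-written rules, drive the normalisation of heterogeneous observations — can be sketched as follows. The UNIT_FACTS table stands in for conversion facts that would in practice be fetched from a Linked Open Data source such as QUDT; the field names and units are illustrative assumptions.

        # Minimal sketch of knowledge-driven unit harmonisation. UNIT_FACTS stands
        # in for conversion knowledge that would be looked up in Linked Open Data
        # (e.g., QUDT) instead of being hand-coded per sensor format.
        UNIT_FACTS = {
            # unit label -> (canonical unit, factor, offset): value*factor + offset
            "degC": ("K", 1.0, 273.15),
            "degF": ("K", 5.0 / 9.0, 255.3722),
            "K":    ("K", 1.0, 0.0),
        }

        def harmonize(observation: dict) -> dict:
            """Normalise one raw observation to its canonical unit."""
            canonical, factor, offset = UNIT_FACTS[observation["unit"]]
            return {
                "sensor": observation["sensor"],
                "value": observation["value"] * factor + offset,
                "unit": canonical,
            }

        # Two heterogeneous readings of the same quantity become comparable:
        print(harmonize({"sensor": "s1", "value": 20.0, "unit": "degC"}))
        print(harmonize({"sensor": "s2", "value": 68.0, "unit": "degF"}))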

    Enabling automatic provenance-based trust assessment of web content


    TITAN: A knowledge-based platform for Big Data workflow management

    Modern applications of Big Data are moving beyond scalable data processing and analysis towards advanced functionalities that exploit and understand the underpinning knowledge. This change is promoting the development of tools at the intersection of data processing, data analysis, knowledge extraction, and knowledge management. In this paper, we propose TITAN, a software platform for managing the whole life cycle of scientific workflows, from deployment to execution, in the context of Big Data applications. The platform is characterised by a design and operation mode driven by semantics at different levels: data sources, problem domain, and workflow components. The proposed platform is built upon an ontological framework of metadata that consistently manages processes and models and takes advantage of domain knowledge. TITAN comprises a well-grounded stack of Big Data technologies, including Apache Kafka for inter-component communication, Apache Avro for data serialisation, and Apache Spark for data analytics. A series of use cases is conducted for validation, comprising workflow composition and semantic metadata management in the academic and real-world domains of human activity recognition and land-use monitoring from satellite images. This work has been partially funded by the Spanish Ministry of Science and Innovation via Grant PID2020-112540RB-C41 (AEI/FEDER, UE) and the Andalusian PAIDI program under grant P18-RT-2799. Funding for open access charge: Universidad de Málaga / CBUA.
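
    As an illustration of the kind of component glue the abstract names, the sketch below serialises a record with Apache Avro and publishes it to Apache Kafka from Python. The schema, topic name, and broker address are assumptions made for illustration, not TITAN's actual configuration.

        # Minimal sketch (assumed schema/topic/broker, not TITAN's config) of
        # Avro-serialised inter-component messaging over Kafka.
        import io

        from fastavro import parse_schema, schemaless_writer
        from kafka import KafkaProducer

        SCHEMA = parse_schema({
            "type": "record",
            "name": "WorkflowEvent",
            "fields": [
                {"name": "workflow_id", "type": "string"},
                {"name": "component", "type": "string"},
                {"name": "status", "type": "string"},
            ],
        })

        def encode(event: dict) -> bytes:
            """Serialise one event with the Avro schema."""
            buffer = io.BytesIO()
            schemaless_writer(buffer, SCHEMA, event)
            return buffer.getvalue()

        producer = KafkaProducer(bootstrap_servers="localhost:9092")
        producer.send("workflow-events", encode({
            "workflow_id": "wf-42",
            "component": "ingest",
            "status": "completed",
        }))
        producer.flush()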

    A abordagem POESIA para a integração de dados e serviços na Web semantica

    Advisor: Claudia Bauzer Medeiros. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação. POESIA (Processes for Open-Ended Systems for Information Analysis), the approach proposed in this work, supports the construction of complex processes that involve the integration and analysis of data from several sources, particularly in scientific applications. The approach is centred on two types of Semantic Web mechanisms: scientific workflows, to specify and compose Web services; and domain ontologies, to enable semantic interoperability and management of data and processes. The main contributions of this thesis are: (i) a theoretical framework to describe, discover, and compose data and services on the Web, including rules to check the semantic consistency of compositions of these resources; (ii) ontology-based methods to help data integration and estimate data provenance in cooperative processes on the Web; (iii) partial implementation and validation of the proposal in a real application in the domain of agricultural planning, analysing the benefits and the efficiency and scalability limitations of current Semantic Web technology when faced with large volumes of data.
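
    The thesis's actual consistency rules are not reproduced in this abstract; the sketch below only illustrates the general idea of validating a service composition against a domain ontology, using a toy subsumption hierarchy and made-up service signatures.

        # Minimal sketch (toy ontology, made-up service signatures) of checking the
        # semantic consistency of a workflow composition: each service's output
        # type must be subsumed by the next service's expected input type.
        SUBCLASS_OF = {  # child -> parent, a stand-in for a domain ontology
            "SoilMap": "Map",
            "ClimateMap": "Map",
            "Map": "Dataset",
        }

        def subsumed_by(concept: str, ancestor: str) -> bool:
            """True if `concept` equals `ancestor` or is a transitive subclass."""
            while concept is not None:
                if concept == ancestor:
                    return True
                concept = SUBCLASS_OF.get(concept)
            return False

        # Each service is (name, input concept, output concept).
        pipeline = [
            ("load_soil_data", "Query", "SoilMap"),
            ("overlay_maps", "Map", "Map"),
            ("rank_crop_suitability", "Dataset", "Dataset"),
        ]

        for (name_a, _, out_a), (name_b, in_b, _) in zip(pipeline, pipeline[1:]):
            ok = subsumed_by(out_a, in_b)
            print(f"{name_a} -> {name_b}: {'consistent' if ok else 'INCONSISTENT'}")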