A Decade of Scholarly Research on Open Knowledge Graphs
The proliferation of open knowledge graphs has led to a surge in scholarly
research on the topic over the past decade. This paper presents a bibliometric
analysis of the scholarly literature on open knowledge graphs published between
2013 and 2023. The study aims to identify the trends, patterns, and impact of
research in this field, as well as the key topics and research questions that
have emerged. The work uses bibliometric techniques to analyze a sample of 4445
scholarly articles retrieved from Scopus. The findings reveal an
ever-increasing number of publications on open knowledge graphs each
year, particularly in developed countries (an increase of roughly 50 publications per year). These outputs are
published in highly refereed scholarly journals and conferences. The study
identifies three main research themes: (1) knowledge graph construction and
enrichment, (2) evaluation and reuse, and (3) fusion of knowledge graphs into
NLP systems. Within these themes, the study identifies specific tasks that have
received considerable attention, including entity linking, knowledge graph
embedding, and graph neural networks.
Reproducible Domain-Specific Knowledge Graphs in the Life Sciences: a Systematic Literature Review
Knowledge graphs (KGs) are widely used for representing and organizing
structured knowledge in diverse domains. However, the creation and upkeep of
KGs pose substantial challenges. Developing a KG demands extensive expertise in
data modeling, ontology design, and data curation. Furthermore, KGs are
dynamic, requiring continuous updates and quality control to ensure accuracy
and relevance. These intricacies contribute to the considerable effort required
for their development and maintenance. One critical dimension of KGs that
warrants attention is reproducibility. The ability to replicate and validate
KGs is fundamental for ensuring the trustworthiness and sustainability of the
knowledge they represent. Reproducible KGs not only support open science by
allowing others to build upon existing knowledge but also enhance transparency
and reliability in disseminating information. Despite the growing number of
domain-specific KGs, a comprehensive analysis concerning their reproducibility
has been lacking. This paper addresses this gap by offering a general overview
of domain-specific KGs and comparing them based on various reproducibility
criteria. Our study across 19 different domains shows that only eight out of 250
domain-specific KGs (3.2%) provide publicly available source code. Among these,
only one system could successfully pass our reproducibility assessment (14.3%).
These findings highlight the challenges and gaps in achieving reproducibility
across domain-specific KGs. Our finding that only 0.4% of published
domain-specific KGs are reproducible shows a clear need for further research
and a shift in cultural practices.
Mind the Labels: Describing Relations in Knowledge Graphs With Pretrained Models
Pretrained language models (PLMs) for data-to-text (D2T) generation can use
human-readable data labels such as column headings, keys, or relation names to
generalize to out-of-domain examples. However, the models are known to
produce semantically inaccurate outputs if these labels are ambiguous or
incomplete, which is often the case in D2T datasets. In this paper, we expose
this issue on the task of describing a relation between two entities. For our
experiments, we collect a novel dataset for verbalizing a diverse set of 1,522
unique relations from three large-scale knowledge graphs (Wikidata, DBpedia,
YAGO). We find that although PLMs for D2T generation expectedly fail on unclear
cases, models trained with a large variety of relation labels are surprisingly
robust in verbalizing novel, unseen relations. We argue that using data with a
diverse set of clear and meaningful labels is key to training D2T generation
systems capable of generalizing to novel domains.
Comment: Long paper at EACL '23. Code and data: https://github.com/kasnerz/rel2tex
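The label-sensitivity issue described above can be illustrated with a toy template baseline (a sketch, not the paper's PLM setup; the function and examples are hypothetical):

```python
# A minimal illustration of why human-readable relation labels matter for
# data-to-text generation. This naive template baseline is a sketch, not
# the PLM-based approach evaluated in the paper.

def verbalize(subj, relation_label, obj):
    """Turn a (subject, relation, object) triple into a sentence by
    naively reusing the relation label as the predicate text."""
    predicate = relation_label.replace("_", " ")
    return f"{subj}'s {predicate} is {obj}."

# A clear, meaningful label produces a usable sentence...
print(verbalize("Marie Curie", "place_of_birth", "Warsaw"))
# ...whereas an opaque identifier (e.g. a raw property key) carries no
# semantic signal for the generator to exploit.
print(verbalize("Marie Curie", "P19", "Warsaw"))
```

The same contrast drives the paper's finding: a model can only ground its output in the label's meaning when the label actually has one.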
Knowledge Graph Embedding: An Overview
Many mathematical models have been leveraged to design embeddings for
representing Knowledge Graph (KG) entities and relations for link prediction
and many downstream tasks. These mathematically-inspired models are not only
highly scalable for inference in large KGs, but also have many explainable
advantages in modeling different relation patterns that can be validated
through both formal proofs and empirical results. In this paper, we provide a
comprehensive overview of the current state of research in KG completion. In
particular, we focus on two main branches of KG embedding (KGE) design: 1)
distance-based methods and 2) semantic matching-based methods. We discover the
connections between recently proposed models and present an underlying trend
that might help researchers invent novel and more effective models. Next, we
delve into CompoundE and CompoundE3D, which draw inspiration from 2D and 3D
affine operations, respectively. They encompass a broad spectrum of techniques
including distance-based and semantic-based methods. We will also discuss an
emerging approach for KG completion which leverages pre-trained language models
(PLMs) and textual descriptions of entities and relations and offer insights
into the integration of KGE methods with PLMs for KG completion.
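The distance-based branch mentioned above can be sketched with a minimal TransE-style scoring function, in which a relation acts as a translation vector and a triple (h, r, t) is plausible when h + r lands near t (the toy embeddings below are assumed for illustration, not learned):

```python
import math

# Toy 3-dimensional embeddings (illustrative values, not learned).
entity = {
    "Paris":  [0.9, 0.1, 0.0],
    "France": [1.0, 0.0, 0.5],
    "Berlin": [0.2, 0.8, 0.1],
}
relation = {
    # TransE models a relation as a translation vector: h + r ≈ t.
    "capital_of": [0.1, -0.1, 0.5],
}

def transe_score(h, r, t):
    """Negative L2 distance between h + r and t: higher = more plausible."""
    diff = [eh + er - et for eh, er, et in zip(entity[h], relation[r], entity[t])]
    return -math.sqrt(sum(d * d for d in diff))

plausible = transe_score("Paris", "capital_of", "France")
implausible = transe_score("Berlin", "capital_of", "France")
print(plausible > implausible)  # the true triple scores higher
```

Semantic matching-based methods replace this distance with a similarity score (e.g. a bilinear product), but the link-prediction use is the same: rank candidate tails by score.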
The formal description of the quality of data published on the Web: an analysis of the Data Quality Vocabulary (DQV)
The quality assessment process plays an important role in the reuse of data made available on the Web. To ensure the use and reuse of these data, it is necessary to formally describe them in a way that computational agents can understand. One of the possibilities to make this description viable is the Data Quality Vocabulary, elaborated by the World Wide Web Consortium. The objective was to verify the impact of the Data Quality Vocabulary in the process of formal description of the quality of data published on the Web, analyzing the objectives, characteristics, and structure of the vocabulary. The research has an exploratory and descriptive character, adopting as a method a study of the official documentation published by the consortium. As a result, an overview of the scenario that led to the development of the vocabulary was obtained, its structure was presented and its potential application was discussed. It is concluded that the Data Quality Vocabulary provides a general and customizable descriptive structure for providing the results of the data quality assessment process, which allows these results to be shared by its providers. It also allows the community to participate in the evaluation process and formally share the results obtained, thus reducing rework. It is also concluded that the vocabulary contributes to the reuse of data in the context of the Web by facilitating the use of automatic and semi-automatic tools in the evaluation and selection of data sources for the application.
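As a rough illustration of the vocabulary's structure, the following sketch emits a DQV-style quality measurement as hand-built N-Triples (the class and property IRIs follow the W3C DQV specification; the dataset and metric IRIs are hypothetical, and a library such as rdflib would normally be used instead of string assembly):

```python
# A minimal sketch of a DQV quality measurement, serialized by hand as
# N-Triples to stay dependency-free. Class and property names follow the
# W3C Data Quality Vocabulary; example.org IRIs are hypothetical.

DQV = "http://www.w3.org/ns/dqv#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def quality_measurement(measurement_iri, dataset_iri, metric_iri, value):
    """Emit DQV triples stating that dataset_iri scored value on metric_iri."""
    return [
        f"<{measurement_iri}> <{RDF_TYPE}> <{DQV}QualityMeasurement> .",
        f"<{measurement_iri}> <{DQV}computedOn> <{dataset_iri}> .",
        f"<{measurement_iri}> <{DQV}isMeasurementOf> <{metric_iri}> .",
        f'<{measurement_iri}> <{DQV}value> "{value}"^^<http://www.w3.org/2001/XMLSchema#double> .',
    ]

triples = quality_measurement(
    "http://example.org/measurement1",
    "http://example.org/myDataset",
    "http://example.org/completenessMetric",
    0.92,
)
print("\n".join(triples))
```

Published in this form, such measurements can be harvested and compared automatically, which is exactly the reuse scenario the abstract argues for.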
An approach to assess the quality of Jupyter projects published by GLAM institutions
GLAM organizations have been digitizing their collections and making them available to the public for several decades. Recent methods for publishing digital collections such as “GLAM Labs” and “Collections as Data” provide guidelines for the application of computational methods to reuse the contents of cultural heritage institutions in innovative and creative ways. Jupyter Notebooks have become a powerful tool to foster the use of these collections by digital humanities researchers. Based on previous approaches to quality assessment, which have been adapted for cultural heritage collections, this paper proposes a methodology for assessing the quality of projects based on Jupyter Notebooks published by relevant GLAM institutions. A list of projects based on Jupyter Notebooks using cultural heritage data has been evaluated. Common features and best practices have been identified. A detailed analysis that can be useful for organizations interested in creating their own Jupyter Notebook projects is provided. Open issues requiring further work and additional avenues for exploration are outlined.
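A minimal sketch of the kind of automated check such a methodology might run over a notebook file, which is plain JSON with a list of typed cells (the specific criteria below are illustrative, not the paper's):

```python
import json

# Simple quality checks on a Jupyter Notebook's JSON, in the spirit of
# documentation/structure criteria. The checks are illustrative only.

def notebook_checks(nb_json):
    nb = json.loads(nb_json)
    cells = nb.get("cells", [])
    markdown = [c for c in cells if c.get("cell_type") == "markdown"]
    code = [c for c in cells if c.get("cell_type") == "code"]
    return {
        "has_documentation": len(markdown) > 0,
        "documentation_ratio": len(markdown) / max(len(cells), 1),
        "has_code": len(code) > 0,
    }

# A toy two-cell notebook: one markdown cell, one code cell.
toy_nb = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Exploring a GLAM collection"]},
        {"cell_type": "code", "source": ["print('hello')"]},
    ]
})
checks = notebook_checks(toy_nb)
print(checks)
```

Because the notebook format is machine-readable JSON, checks like these can be applied at scale across an institution's published projects.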
On the Evolution of Knowledge Graphs: A Survey and Perspective
Knowledge graphs (KGs) are structured representations of diversified
knowledge. They are widely used in various intelligent applications. In this
article, we provide a comprehensive survey on the evolution of various types of
knowledge graphs (i.e., static KGs, dynamic KGs, temporal KGs, and event KGs)
and techniques for knowledge extraction and reasoning. Furthermore, we
introduce the practical applications of different types of KGs, including a
case study in financial analysis. Finally, we propose our perspective on the
future directions of knowledge engineering, including the potential of
combining the power of knowledge graphs and large language models (LLMs), and
the evolution of knowledge extraction, reasoning, and representation.
Knowledge Graphs: Opportunities and Challenges
With the explosive growth of artificial intelligence (AI) and big data, it has become vitally important to organize and represent the enormous volume of knowledge appropriately. As graph data, knowledge graphs accumulate and convey knowledge of the real world. It has been well recognized that knowledge graphs effectively represent complex information; hence, they have rapidly gained the attention of academia and industry in recent years. Thus, to develop a deeper understanding of knowledge graphs, this paper presents a systematic overview of this field. Specifically, we focus on the opportunities and challenges of knowledge graphs. We first review the opportunities of knowledge graphs in terms of two aspects: (1) AI systems built upon knowledge graphs; (2) potential application fields of knowledge graphs. Then, we thoroughly discuss severe technical challenges in this field, such as knowledge graph embeddings, knowledge acquisition, knowledge graph completion, knowledge fusion, and knowledge reasoning. We expect that this survey will shed new light on future research and the development of knowledge graphs.
A Framework to Assess Knowledge Graphs Accountability
Knowledge Graphs (KGs), and Linked Open Data in particular, enable the
generation and exchange of more and more information on the Web. In order to
use and reuse these data properly, the presence of accountability information
is essential. Accountability requires specific and accurate information about
people's responsibilities and actions. In this article, we define KGAcc, a
framework dedicated to the assessment of the accountability of RDF graphs. It consists
of accountability requirements and a measure of accountability for KGs. Then,
we evaluate KGs from the LOD cloud and describe the results obtained. Finally,
we compare our approach with data quality and FAIR assessment frameworks to
highlight the differences.
Comment: 8 pages, to be published in: 2023 IEEE International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT).
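A hedged sketch of how an accountability measure of this kind could be computed as the share of satisfied requirements (the requirement names below are illustrative placeholders, not KGAcc's actual requirement list):

```python
# Illustrative accountability requirements for a KG; the real KGAcc
# requirements are defined in the paper, not here.
REQUIREMENTS = [
    "publisher_identified",
    "contributors_identified",
    "modification_history_available",
    "contact_point_available",
]

def accountability_score(satisfied):
    """Return the fraction of accountability requirements met by a KG."""
    met = sum(1 for r in REQUIREMENTS if r in satisfied)
    return met / len(REQUIREMENTS)

# Example: a KG documenting only its publisher and a contact point.
score = accountability_score({"publisher_identified", "contact_point_available"})
print(score)
```

A normalized score of this shape makes KGs from the LOD cloud directly comparable, which is what enables the cross-framework comparison the abstract describes.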