A systematic literature review on Wikidata
This review examines the current status of research on Wikidata, in particular articles that either describe applications of Wikidata or provide empirical evidence, in order to uncover the topics of interest, the fields benefiting from its applications, and the researchers and institutions leading the work.
Multiple Texts as a Limiting Factor in Online Learning: Quantifying (Dis-)similarities of Knowledge Networks across Languages
We test the hypothesis that the extent to which one obtains information on a
given topic through Wikipedia depends on the language in which it is consulted.
Controlling for the size factor, we investigate this hypothesis for 25
subject areas. Since Wikipedia is a central part of the web-based information
landscape, such a dependence would indicate a language-related, linguistic bias. The article
therefore deals with the question of whether Wikipedia exhibits this kind of
linguistic relativity or not. From the perspective of educational science, the
article develops a computational model of the information landscape from which
multiple texts are drawn as typical input of web-based reading. For this
purpose, it develops a hybrid model of intra- and intertextual similarity of
different parts of the information landscape and tests this model on the
example of 35 languages and corresponding Wikipedias. In this way the article
builds a bridge between reading research, educational science, Wikipedia
research, and computational linguistics.
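The abstract does not spell out how the hybrid intra- and intertextual similarity model is computed. The sketch below only illustrates the general idea: comparing similarity within one language edition's article (intratextual) against similarity between aligned articles in two editions (intertextual), using a multilingual sentence-embedding model. The embedding model, the toy section texts, and the aggregation are illustrative assumptions, not the authors' pipeline.

```python
# Illustrative sketch only: compares intra- vs. intertextual similarity of
# Wikipedia article texts using multilingual sentence embeddings. The model
# name and the toy texts are assumptions, not the paper's actual setup.
from itertools import combinations

import numpy as np
from sentence_transformers import SentenceTransformer

# A multilingual encoder so that texts in different languages share one space.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def mean_pairwise_similarity(texts):
    """Average cosine similarity over all pairs of texts (intratextual cohesion)."""
    emb = model.encode(texts)
    sims = [cosine(a, b) for a, b in combinations(emb, 2)]
    return sum(sims) / len(sims)

# Hypothetical section texts of the "same" article in two language editions.
sections_en = ["Photosynthesis converts light into chemical energy.",
               "Chlorophyll absorbs mostly blue and red light."]
sections_de = ["Photosynthese wandelt Licht in chemische Energie um.",
               "Chlorophyll absorbiert vor allem blaues und rotes Licht."]

# Intratextual similarity: cohesion within one language edition.
intra_en = mean_pairwise_similarity(sections_en)

# Intertextual similarity: agreement between the two editions, section by section.
inter = np.mean([cosine(a, b) for a, b in
                 zip(model.encode(sections_en), model.encode(sections_de))])

print(f"intratextual (en): {intra_en:.3f}  intertextual (en-de): {inter:.3f}")
```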
Analogy Training Multilingual Encoders
Language encoders encode words and phrases in ways that capture their local semantic relatedness, but are known to be globally inconsistent. Global inconsistency can seemingly be corrected for, in part, by leveraging signals from knowledge bases, but previous results are partial and limited to monolingual English encoders. We extract a large-scale multilingual, multi-word analogy dataset from Wikidata for diagnosing and correcting global inconsistencies, and implement a four-way Siamese BERT architecture for grounding multilingual BERT (mBERT) in Wikidata through analogy training. We show that analogy training not only improves the global consistency of mBERT and the isomorphism of language-specific subspaces, but also leads to significant gains on downstream tasks such as bilingual dictionary induction and sentence retrieval.
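The abstract names a four-way Siamese BERT architecture but not its training objective. The following minimal sketch assumes a simple vector-offset loss over mean-pooled mBERT embeddings (the offset a − b should match c − d for an analogy a:b :: c:d); the loss, pooling, and example analogy are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch (not the paper's code): one shared mBERT encoder embeds all
# four analogy terms a:b :: c:d, and a vector-offset loss pulls (a - b)
# towards (c - d). Sharing the encoder across the four inputs is the
# "four-way Siamese" part.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")

def embed(texts):
    """Mean-pool the last hidden states over non-padding tokens."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = encoder(**batch).last_hidden_state              # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # (B, T, 1)
    return (out * mask).sum(dim=1) / mask.sum(dim=1)      # (B, H)

def analogy_loss(a, b, c, d):
    """Assumed objective: the offsets of the two analogy pairs should agree."""
    ea, eb, ec, ed = (embed(x) for x in (a, b, c, d))
    return torch.nn.functional.mse_loss(ea - eb, ec - ed)

# Hypothetical Wikidata-style analogy: (Paris, France) :: (Berlin, Germany).
loss = analogy_loss(["Paris"], ["France"], ["Berlin"], ["Germany"])
loss.backward()  # gradients flow into the single shared (Siamese) encoder
print(float(loss))
```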
Identifying and Consolidating Knowledge Engineering Requirements
Knowledge engineering is the process of creating and maintaining
knowledge-producing systems. Throughout the history of computer science and AI,
knowledge engineering workflows have been widely used because high-quality
knowledge is assumed to be crucial for reliable intelligent agents. However,
the landscape of knowledge engineering has changed, presenting four challenges:
unaddressed stakeholder requirements, mismatched technologies, adoption
barriers for new organizations, and misalignment with software engineering
practices. In this paper, we propose to address these challenges by developing
a reference architecture using a mainstream software methodology. By studying
the requirements of different stakeholders and eras, we identify 23 essential
quality attributes for evaluating reference architectures. We assess three
candidate architectures from recent literature based on these attributes.
Finally, we discuss the next steps towards a comprehensive reference
architecture, including prioritizing quality attributes, integrating components
with complementary strengths, and supporting missing socio-technical
requirements. As this endeavor requires a collaborative effort, we invite all
knowledge engineering researchers and practitioners to join us.
Wikipedia Citations: A comprehensive dataset of citations with identifiers extracted from English Wikipedia
Wikipedia's content is based on reliable, published sources. To
date, relatively little is known about which sources Wikipedia relies on, in
part because extracting citations and identifying cited sources is challenging.
To close this gap, we release Wikipedia Citations, a comprehensive dataset of
citations extracted from Wikipedia. A total of 29.3M citations were extracted
from 6.1M English Wikipedia articles as of May 2020, and classified as referring to
books, journal articles, or Web content. We were thus able to extract 4.0M
citations to scholarly publications with known identifiers -- including DOI,
PMC, PMID, and ISBN -- and further equip an extra 261K citations with DOIs from
Crossref. As a result, we find that 6.7% of Wikipedia articles cite at least
one journal article with an associated DOI, and that Wikipedia cites just 2% of
all articles with a DOI currently indexed in the Web of Science. We release our
code to allow the community to build upon our work and update the dataset in
the future.
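The authors release their own extraction code alongside the dataset; the sketch below is only a rough illustration of the kind of pipeline the abstract describes, parsing citation templates out of article wikitext with mwparserfromhell and, where no DOI is present, querying the Crossref API by title. Template and parameter names vary across Wikipedia, so the specifics here are assumptions.

```python
# Rough illustration (not the released pipeline): pull citation templates out
# of an article's wikitext and recover DOIs, falling back to a Crossref lookup.
import mwparserfromhell
import requests

def extract_citations(wikitext):
    """Yield (template_name, doi_or_None, title_or_None) for each citation template."""
    for tpl in mwparserfromhell.parse(wikitext).filter_templates():
        name = str(tpl.name).strip().lower()
        if not name.startswith("cite"):          # e.g. "cite journal", "cite book"
            continue
        doi = str(tpl.get("doi").value).strip() if tpl.has("doi") else None
        title = str(tpl.get("title").value).strip() if tpl.has("title") else None
        yield name, doi, title

def lookup_doi(title):
    """Best-effort Crossref search by bibliographic title."""
    resp = requests.get("https://api.crossref.org/works",
                        params={"query.bibliographic": title, "rows": 1},
                        timeout=10)
    items = resp.json().get("message", {}).get("items", [])
    return items[0].get("DOI") if items else None

wikitext = "{{cite journal |title=Example study |journal=Example |doi=10.1000/xyz}}"
for name, doi, title in extract_citations(wikitext):
    print(name, doi or (lookup_doi(title) if title else None))
```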
- …