Search CORE

10 research outputs found

A systematic literature review on Wikidata

Author: García Barriocanal María Elena
Mora Cantallops Marçal
Sánchez Alonso Salvador
Publication venue: Emerald Publishing Limited
Publication date: 01/07/2019
Field of study

To review the current status of research on Wikidata and, in particular, of articles that either describe applications of Wikidata or provide empirical evidence, in order to uncover the topics of interest, the fields that are benefiting from its applications and which researchers and institutions are leading the work

e_Buah - Biblioteca Digital de la Universidad de Alcalá

Managing and Consuming Completeness Information for RDF Data Sources

Author: Darari Fariz
Publication venue
Publication date: 20/06/2017
Field of study

The ever increasing amount of Semantic Web data gives rise to the question: How complete is the data? Though generally data on the Semantic Web is incomplete, many parts of data are indeed complete, such as the children of Barack Obama and the crew of Apollo 11. This thesis aims to study how to manage and consume completeness information about Semantic Web data. In particular, we first discuss how completeness information can guarantee the completeness of query answering. Next, we propose optimization techniques of completeness reasoning and conduct experimental evaluations to show the feasibility of our approaches. We also provide a technique to check the soundness of queries with negation via reduction to query completeness checking. We further enrich completeness information with timestamps, enabling query answers to be checked up to when they are complete. We then introduce two demonstrators, i.e., CORNER and COOL-WD, to show how our completeness framework can be realized. Finally, we investigate an automated method to generate completeness statements from text on the Web via relation cardinality extraction

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

The Case of Wikidata

Author: Farda-Sarbas Mariam
Publication venue
Publication date: 01/01/2024
Field of study

Since its launch in 2012, Wikidata has grown to become the largest open knowledge base (KB), containing more than 100 million data items and over 6 million registered users. Wikidata serves as the structured data backbone of Wikipedia, addressing data inconsistencies, and adhering to the motto of “serving anyone anywhere in the world,” a vision realized through the diversity of knowledge. Despite being a collaboratively contributed platform, the Wikidata community heavily relies on bots, automated accounts with batch, and speedy editing rights, for a majority of edits. As Wikidata approaches its first decade, the question arises: How close is Wikidata to achieving its vision of becoming a global KB and how diverse is it in serving the global population? This dissertation investigates the current status of Wikidata’s diversity, the role of bot interventions on diversity, and how bots can be leveraged to improve diversity within the context of Wikidata. The methodologies used in this study are mapping study and content analysis, which led to the development of three datasets: 1) Wikidata Research Articles Dataset, covering the literature on Wikidata from its first decade of existence sourced from online databases to inspect its current status; 2) Wikidata Requests-for-Permissions Dataset, based on the pages requesting bot rights on the Wikidata website to explore bots from a community perspective; and 3) Wikidata Revision History Dataset, compiled from the edit history of Wikidata to investigate bot editing behavior and its impact on diversity, all of which are freely available online. The insights gained from the mapping study reveal the growing popularity of Wikidata in the research community and its various application areas, indicative of its progress toward the ultimate goal of reaching the global community. However, there is currently no research addressing the topic of diversity in Wikidata, which could shed light on its capacity to serve a diverse global population. To address this gap, this dissertation proposes a diversity measurement concept that defines diversity in a KB context in terms of variety, balance, and disparity and is capable of assessing diversity in a KB from two main angles: user and data. The application of this concept on the domains and classes of the Wikidata Revision History Dataset exposes imbalanced content distribution across Wikidata domains, which indicates low data diversity in Wikidata domains. Further analysis discloses that bots have been active since the inception of Wikidata, and the community embraces their involvement in content editing tasks, often importing data from Wikipedia, which shows a low diversity of sources in bot edits. Bots and human users engage in similar editing tasks but exhibit distinct editing patterns. The findings of this thesis confirm that bots possess the potential to influence diversity within Wikidata by contributing substantial amounts of data to specific classes and domains, leading to an imbalance. However, this potential can also be harnessed to enhance coverage in classes with limited content and restore balance, thus improving diversity. Hence, this study proposes to enhance diversity through automation and demonstrate the practical implementation of the recommendations using a specific use case. In essence, this research enhances our understanding of diversity in relation to a KB, elucidates the influence of automation on data diversity, and sheds light on diversity improvement within a KB context through the usage of automation.Seit seiner Einführung im Jahr 2012 hat sich Wikidata zu der größten offenen Wissensdatenbank entwickelt, die mehr als 100 Millionen Datenelemente und über 6 Millionen registrierte Benutzer enthält. Wikidata dient als das strukturierte Rückgrat von Wikipedia, indem es Datenunstimmigkeiten angeht und sich dem Motto verschrieben hat, ’jedem überall auf der Welt zu dienen’, eine Vision, die durch die Diversität des Wissens verwirklicht wird. Trotz seiner kooperativen Natur ist die Wikidata-Community in hohem Maße auf Bots, automatisierte Konten mit Batch- Verarbeitung und schnelle Bearbeitungsrechte angewiesen, um die Mehrheit der Bearbeitungen durchzuführen. Da Wikidata seinem ersten Jahrzehnt entgegengeht, stellt sich die Frage: Wie nahe ist Wikidata daran, seine Vision, eine globale Wissensdatenbank zu werden, zu verwirklichen, und wie ausgeprägt ist seine Dienstleistung für die globale Bevölkerung? Diese Dissertation untersucht den aktuellen Status der Diversität von Wikidata, die Rolle von Bot-Eingriffen in Bezug auf Diversität und wie Bots im Kontext von Wikidata zur Verbesserung der Diversität genutzt werden können. Die in dieser Studie verwendeten Methoden sind Mapping-Studie und Inhaltsanalyse, die zur Entwicklung von drei Datensätzen geführt haben: 1) Wikidata Research Articles Dataset, die die Literatur zu Wikidata aus dem ersten Jahrzehnt aus Online-Datenbanken umfasst, um den aktuellen Stand zu untersuchen; 2) Requestfor- Permission Dataset, der auf den Seiten zur Beantragung von Bot-Rechten auf der Wikidata-Website basiert, um Bots aus der Perspektive der Gemeinschaft zu untersuchen; und 3)Wikidata Revision History Dataset, der aus der Bearbeitungshistorie von Wikidata zusammengestellt wurde, um das Bearbeitungsverhalten von Bots zu untersuchen und dessen Auswirkungen auf die Diversität, die alle online frei verfügbar sind. Die Erkenntnisse aus der Mapping-Studie zeigen die wachsende Beliebtheit von Wikidata in der Forschungsgemeinschaft und in verschiedenen Anwendungsbereichen, was auf seinen Fortschritt hin zur letztendlichen Zielsetzung hindeutet, die globale Gemeinschaft zu erreichen. Es gibt jedoch derzeit keine Forschung, die sich mit dem Thema der Diversität in Wikidata befasst und Licht auf seine Fähigkeit werfen könnte, eine vielfältige globale Bevölkerung zu bedienen. Um diese Lücke zu schließen, schlägt diese Dissertation ein Konzept zur Messung der Diversität vor, das die Diversität im Kontext einer Wissensbasis anhand von Vielfalt, Balance und Diskrepanz definiert und in der Lage ist, die Diversität aus zwei Hauptperspektiven zu bewerten: Benutzer und Daten. Die Anwendung dieses Konzepts auf die Bereiche und Klassen des Wikidata Revision History Dataset zeigt eine unausgewogene Verteilung des Inhalts über die Bereiche von Wikidata auf, was auf eine geringe Diversität der Daten in den Bereichen von Wikidata hinweist. Weitere Analysen zeigen, dass Bots seit der Gründung von Wikidata aktiv waren und von der Gemeinschaft inhaltliche Bearbeitungsaufgaben angenommen werden, oft mit Datenimporten aus Wikipedia, was auf eine geringe Diversität der Quellen bei Bot-Bearbeitungen hinweist. Bots und menschliche Benutzer führen ähnliche Bearbeitungsaufgaben aus, zeigen jedoch unterschiedliche Bearbeitungsmuster. Die Ergebnisse dieser Dissertation bestätigen, dass Bots das Potenzial haben, die Diversität in Wikidata zu beeinflussen, indem sie bedeutende Datenmengen zu bestimmten Klassen und Bereichen beitragen, was zu einer Ungleichgewichtung führt. Dieses Potenzial kann jedoch auch genutzt werden, um die Abdeckung in Klassen mit begrenztem Inhalt zu verbessern und das Gleichgewicht wiederherzustellen, um die Diversität zu verbessern. Daher schlägt diese Studie vor, die Diversität durch Automatisierung zu verbessern und die praktische Umsetzung der Empfehlungen anhand eines spezifischen Anwendungsfalls zu demonstrieren. Kurz gesagt trägt diese Forschung dazu bei, unser Verständnis der Diversität im Kontext einer Wissensbasis zu vertiefen, wirft Licht auf den Einfluss von Automatisierung auf die Diversität von Daten und zeigt die Verbesserung der Diversität im Kontext einer Wissensbasis durch die Verwendung von Automatisierung auf

Institutional Repository of the Freie Universität Berlin

Closing Information Gaps with Need-driven Knowledge Sharing

Author: Happel Hans-Jörg
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2018
Field of study

Informationslücken schließen durch bedarfsgetriebenen Wissensaustausch Systeme zum asynchronen Wissensaustausch – wie Intranets, Wikis oder Dateiserver – leiden häufig unter mangelnden Nutzerbeiträgen. Ein Hauptgrund dafür ist, dass Informationsanbieter von Informationsuchenden entkoppelt, und deshalb nur wenig über deren Informationsbedarf gewahr sind. Zentrale Fragen des Wissensmanagements sind daher, welches Wissen besonders wertvoll ist und mit welchen Mitteln Wissensträger dazu motiviert werden können, es zu teilen. Diese Arbeit entwirft dazu den Ansatz des bedarfsgetriebenen Wissensaustauschs (NKS), der aus drei Elementen besteht. Zunächst werden dabei Indikatoren für den Informationsbedarf erhoben – insbesondere Suchanfragen – über deren Aggregation eine fortlaufende Prognose des organisationalen Informationsbedarfs (OIN) abgeleitet wird. Durch den Abgleich mit vorhandenen Informationen in persönlichen und geteilten Informationsräumen werden daraus organisationale Informationslücken (OIG) ermittelt, die auf fehlende Informationen hindeuten. Diese Lücken werden mit Hilfe so genannter Mediationsdienste und Mediationsräume transparent gemacht. Diese helfen Aufmerksamkeit für organisationale Informationsbedürfnisse zu schaffen und den Wissensaustausch zu steuern. Die konkrete Umsetzung von NKS wird durch drei unterschiedliche Anwendungen illustriert, die allesamt auf bewährten Wissensmanagementsystemen aufbauen. Bei der Inversen Suche handelt es sich um ein Werkzeug das Wissensträgern vorschlägt Dokumente aus ihrem persönlichen Informationsraum zu teilen, um damit organisationale Informationslücken zu schließen. Woogle erweitert herkömmliche Wiki-Systeme um Steuerungsinstrumente zur Erkennung und Priorisierung fehlender Informationen, so dass die Weiterentwicklung der Wiki-Inhalte nachfrageorientiert gestaltet werden kann. Auf ähnliche Weise steuert Semantic Need, eine Erweiterung für Semantic MediaWiki, die Erfassung von strukturierten, semantischen Daten basierend auf Informationsbedarf der in Form strukturierter Anfragen vorliegt. Die Umsetzung und Evaluation der drei Werkzeuge zeigt, dass bedarfsgetriebener Wissensaustausch technisch realisierbar ist und eine wichtige Ergänzung für das Wissensmanagement sein kann. Darüber hinaus bietet das Konzept der Mediationsdienste und Mediationsräume einen Rahmen für die Analyse und Gestaltung von Werkzeugen gemäß der NKS-Prinzipien. Schließlich liefert der hier vorstellte Ansatz auch Impulse für die Weiterentwicklung von Internetdiensten und -Infrastrukturen wie der Wikipedia oder dem Semantic Web

KITopen

Large-Scale Multilingual Knowledge Extraction, Publishing and Quality Assessment: The case of DBpedia

Author: Kontokostas Dimitrios
Publication venue
Publication date: 04/09/2018
Field of study

Qucosa - Publikationsserver der Universität Leipzig

Resource discovery in heterogeneous digital content environments

Author: Macgregor George.
Publication venue
Publication date: 01/01/2020
Field of study

The concept of 'resource discovery' is central to our understanding of how users explore, navigate, locate and retrieve information resources. This submission for a PhD by Published Works examines a series of 11 related works which explore topics pertaining to resource discovery, each demonstrating heterogeneity in their digital discovery context. The assembled works are prefaced by nine chapters which seek to review and critically analyse the contribution of each work, as well as provide contextualization within the wider body of research literature. A series of conceptual sub-themes is used to organize and structure the works and the accompanying critical commentary. The thesis first begins by examining issues in distributed discovery contexts by studying collection level metadata (CLM), its application in 'information landscaping' techniques, and its relationship to the efficacy of federated item-level search tools. This research narrative continues but expands in the later works and commentary to consider the application of Knowledge Organization Systems (KOS), particularly within Semantic Web and machine interface contexts, with investigations of semantically aware terminology services in distributed discovery. The necessary modelling of data structures to support resource discovery - and its associated functionalities within digital libraries and repositories - is then considered within the novel context of technology-supported curriculum design repositories, where questions of human-computer interaction (HCI) are also examined. The final works studied as part of the thesis are those which investigate and evaluate the efficacy of open repositories in exposing knowledge commons to resource discovery via web search agents. Through the analysis of the collected works it is possible to identify a unifying theory of resource discovery, with the proposed concept of (meta)data alignment described and presented with a visual model. This analysis assists in the identification of a number of research topics worthy of further research; but it also highlights an incremental transition by the present author, from using research to inform the development of technologies designed to support or facilitate resource discovery, particularly at a 'meta' level, to the application of specific technologies to address resource discovery issues in a local context. Despite this variation the research narrative has remained focussed on topics surrounding resource discovery in heterogeneous digital content environments and is noted as having generated a coherent body of work. Separate chapters are used to consider the methodological approaches adopted in each work and the contribution made to research knowledge and professional practice.The concept of 'resource discovery' is central to our understanding of how users explore, navigate, locate and retrieve information resources. This submission for a PhD by Published Works examines a series of 11 related works which explore topics pertaining to resource discovery, each demonstrating heterogeneity in their digital discovery context. The assembled works are prefaced by nine chapters which seek to review and critically analyse the contribution of each work, as well as provide contextualization within the wider body of research literature. A series of conceptual sub-themes is used to organize and structure the works and the accompanying critical commentary. The thesis first begins by examining issues in distributed discovery contexts by studying collection level metadata (CLM), its application in 'information landscaping' techniques, and its relationship to the efficacy of federated item-level search tools. This research narrative continues but expands in the later works and commentary to consider the application of Knowledge Organization Systems (KOS), particularly within Semantic Web and machine interface contexts, with investigations of semantically aware terminology services in distributed discovery. The necessary modelling of data structures to support resource discovery - and its associated functionalities within digital libraries and repositories - is then considered within the novel context of technology-supported curriculum design repositories, where questions of human-computer interaction (HCI) are also examined. The final works studied as part of the thesis are those which investigate and evaluate the efficacy of open repositories in exposing knowledge commons to resource discovery via web search agents. Through the analysis of the collected works it is possible to identify a unifying theory of resource discovery, with the proposed concept of (meta)data alignment described and presented with a visual model. This analysis assists in the identification of a number of research topics worthy of further research; but it also highlights an incremental transition by the present author, from using research to inform the development of technologies designed to support or facilitate resource discovery, particularly at a 'meta' level, to the application of specific technologies to address resource discovery issues in a local context. Despite this variation the research narrative has remained focussed on topics surrounding resource discovery in heterogeneous digital content environments and is noted as having generated a coherent body of work. Separate chapters are used to consider the methodological approaches adopted in each work and the contribution made to research knowledge and professional practice

STAX (Strathclyde Repository)

Linked Data Supported Information Retrieval

Author: Waitelonis Jörg
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2018
Field of study

Um Inhalte im World Wide Web ausfindig zu machen, sind Suchmaschienen nicht mehr wegzudenken. Semantic Web und Linked Data Technologien ermöglichen ein detaillierteres und eindeutiges Strukturieren der Inhalte und erlauben vollkommen neue Herangehensweisen an die Lösung von Information Retrieval Problemen. Diese Arbeit befasst sich mit den Möglichkeiten, wie Information Retrieval Anwendungen von der Einbeziehung von Linked Data profitieren können. Neue Methoden der computer-gestützten semantischen Textanalyse, semantischen Suche, Informationspriorisierung und -visualisierung werden vorgestellt und umfassend evaluiert. Dabei werden Linked Data Ressourcen und ihre Beziehungen in die Verfahren integriert, um eine Steigerung der Effektivität der Verfahren bzw. ihrer Benutzerfreundlichkeit zu erzielen. Zunächst wird eine Einführung in die Grundlagen des Information Retrieval und Linked Data gegeben. Anschließend werden neue manuelle und automatisierte Verfahren zum semantischen Annotieren von Dokumenten durch deren Verknüpfung mit Linked Data Ressourcen vorgestellt (Entity Linking). Eine umfassende Evaluation der Verfahren wird durchgeführt und das zu Grunde liegende Evaluationssystem umfangreich verbessert. Aufbauend auf den Annotationsverfahren werden zwei neue Retrievalmodelle zur semantischen Suche vorgestellt und evaluiert. Die Verfahren basieren auf dem generalisierten Vektorraummodell und beziehen die semantische Ähnlichkeit anhand von taxonomie-basierten Beziehungen der Linked Data Ressourcen in Dokumenten und Suchanfragen in die Berechnung der Suchergebnisrangfolge ein. Mit dem Ziel die Berechnung von semantischer Ähnlichkeit weiter zu verfeinern, wird ein Verfahren zur Priorisierung von Linked Data Ressourcen vorgestellt und evaluiert. Darauf aufbauend werden Visualisierungstechniken aufgezeigt mit dem Ziel, die Explorierbarkeit und Navigierbarkeit innerhalb eines semantisch annotierten Dokumentenkorpus zu verbessern. Hierfür werden zwei Anwendungen präsentiert. Zum einen eine Linked Data basierte explorative Erweiterung als Ergänzung zu einer traditionellen schlüsselwort-basierten Suchmaschine, zum anderen ein Linked Data basiertes Empfehlungssystem

KITopen

Crossing Experiences in Digital Epigraphy: From Practice to Discipline

Author
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 31/12/2018
Field of study

Although a relevant number of projects digitizing inscriptions are under development or have been recently accomplished, Digital Epigraphy is not yet considered to be a proper discipline and there are still no regular occasions to meet and discuss. By collecting contributions on nineteen projects – very diversified for geographic and chronological context, for script and language, and for typology of digital output – this volume intends to point out the methodological issues which are specific to the application of information technologies to epigraphy. The first part of the volume is focused on data modelling and encoding, which are conditioned by the specific features of different scripts and languages, and deeply influence the possibility to perform searches on texts and the approach to the lexicographic study of such under-resourced languages. The second part of the volume is dedicated to the initiatives aimed at fostering aggregation, dissemination and the reuse of epigraphic materials, and to discuss issues of interoperability. The common theme of the volume is the relationship between the compliance with the theoretic tools and the methodologies developed by each different tradition of studies, and, on the other side, the necessity of adopting a common framework in order to produce commensurable and shareable results. The final question is whether the computational approach is changing the way epigraphy is studied, to the extent of renovating the discipline on the basis of new, unexplored questions

Scientific Open-access Literature Archive and Repository

Linked open data e ontologie per la descrizione del patrimonio culturale: criteri per la progettazione di un registro ragionato

Author: Veninata Chiara
Publication venue
Publication date: 15/07/2019
Field of study

La tesi affronta il tema del semantic web e della pubblicazione delle informazioni relative al patrimonio culturale in modalità linked open data. In particolare, oggetto dell’attività di ricerca sono i registri di ontologie, vale a dire quegli strumenti che descrivono formalmente i modelli ontologici disponibili sul web e ne agevolano il reperimento e la valutazione, incentivandone il riuso e facilitando i processi di allineamento semantico e di interoperabilità. I registri di ontologie rispondono in modo efficace all’assenza di strumenti di riferimento e di orientamento nei processi di modellazione concettuale delle risorse informative e sono stati sperimentati con successo in diversi domini, ma sono ancora inediti in ambito culturale. L’esame puntuale delle iniziative condotte nell’ultimo decennio nell’ambito dei beni culturali ha evidenziato con chiarezza la mancanza di un assetto epistemologico consolidato nella modellazione concettuale delle risorse informative, a fronte delle numerose ontologie realizzate in funzione dei molteplici progetti di pubblicazione di linked open data. Di conseguenza, risulta tutt’altro che agevole conoscere esaustivamente tutte le ontologie disponibili in relazione al proprio abito di interesse ed ottenere in maniera agevole e sistematica una valutazione attendibile circa la loro capacità rappresentativa e il loro grado di interoperabilità semantica. L’analisi dei principali registri di ontologie finora realizzati al di fuori del dominio dei beni culturali ha consentito di individuare e definire i requisiti di un registro di ontologie per i beni culturali (denominato CLOVER, Culture – Linked Open Vocabularies – Extensible Registry), e di elaborarne la relativa ontologia. L’ontologia ADMS-AP_IT (Asset Description Metadata Schema – Application Profile – Italy) è stata redatta a seguito di un’analisi sistematica e di una valutazione critica di preesistenti ontologie concepite per scopi similari. Essa è stata sottoposta ad AgID, che l’ha inclusa nella rete di ontologie e vocabolari controllati della pubblica amministrazione detta OntoPiA. Tale ontologia rappresenta un punto di arrivo del progetto di ricerca, ma anche una base di partenza per approfondire l'indagine su tali temi: in questo senso, la sua inclusione nella rete OntoPiA di ontologie e vocabolari controllati della pubblica amministrazione si configura come un'opportunità rilevante per sperimentarne l'applicabilità e migliorarne la qualità

Archivio della ricerca- Università di Roma La Sapienza