A survey of guidelines and best practices for the generation, interlinking, publication, and validation of linguistic linked data
This article discusses a survey carried out within the NexusLinguarum COST Action which aimed to give an overview of existing guidelines (GLs) and best practices (BPs) in linguistic linked data. In particular it focused on four core tasks in the production/publication of linked data: generation, interlinking, publication, and validation. We discuss the importance of GLs and BPs for LLD before describing the survey and its results in full. Finally we offer a number of directions for future work in order to address the findings of the survey
Bibliographic Control in the Digital Ecosystem
With the contributions of international experts, the book aims to explore the new boundaries of universal bibliographic control. Bibliographic control is radically changing because the bibliographic universe is radically changing: resources, agents, technologies, standards and practices. Among the main topics addressed: library cooperation networks; legal deposit; national bibliographies; new tools and standards (IFLA LRM, RDA, BIBFRAME); authority control and new alliances (Wikidata, Wikibase, Identifiers); new ways of indexing resources (artificial intelligence); institutional repositories; new book supply chain; “discoverability” in the IIIF digital ecosystem; role of thesauri and ontologies in the digital ecosystem; bibliographic control and search engines
Proceedings of the Eighth Italian Conference on Computational Linguistics CLiC-it 2021
The eighth edition of the Italian Conference on Computational Linguistics (CLiC-it 2021) was held at Università degli Studi di Milano-Bicocca from 26th to 28th January 2022. After the edition of 2020, which was held in fully virtual mode due to the health emergency related to Covid-19, CLiC-it 2021 represented the first moment for the Italian research community of Computational Linguistics to meet in person after more than one year of full/partial lockdown
A framework for the analysis and quality assessment of big and linked data
Linking and publishing data in the Linked Open Data format increases the interoperability and discoverability of resources over the Web. To accomplish this, the process comprises several design decisions based on the Linked Data principles, which on the one hand recommend using standards for representing and accessing data on the Web, and on the other hand recommend setting hyperlinks between data from different sources.
Despite the efforts of the World Wide Web Consortium (W3C), the main international standards organization for the World Wide Web, there is no single tailored formula for publishing data as Linked Data. In addition, the quality of published Linked Open Data (LOD) is a fundamental issue that has yet to be thoroughly managed and considered.
In this doctoral thesis, the main objective is to design and implement a novel framework for selecting, analyzing, converting, interlinking, and publishing data from diverse sources, while paying close attention to quality assessment throughout all steps and modules of the framework. The goal is to examine whether and to what extent Semantic Web technologies are applicable for merging data from different sources and enabling end-users to obtain additional information that was not available in individual datasets, in addition to the integration into the Semantic Web community space. Additionally, the Ph.D. thesis intends to validate the applicability of the process in a specific and demanding use case, i.e. creating and publishing an Arabic Linked Drug Dataset based on open drug datasets from selected Arabic countries, and to discuss the quality issues observed in the linked data life-cycle. To that end, a Semantic Data Lake was established in the pharmaceutical domain that allows further integration and the development of different business services on top of the integrated data sources. Through data representation in an open machine-readable format, the approach offers an optimum solution for information and data dissemination for building domain-specific applications, and for enriching and gaining value from the original dataset. This thesis showcases how the pharmaceutical domain benefits from the evolving research trends for
building competitive advantages. However, as elaborated in this thesis, a better understanding of the specifics of the Arabic language is required to extend the use of linked data technologies in targeted Arabic organizations.
Linking and publishing data in the "Linked Open Data" format increases the interoperability and searchability of resources over the Web. The process is based on the Linked Data principles (W3C, 2006), which on the one hand elaborate standards for representing and accessing data on the Web (RDF, OWL, SPARQL), and on the other hand suggest using hyperlinks between data from different sources.
Despite the efforts of the W3C consortium (W3C is the main international standards organization for the Web), there is no single formula for implementing the process of publishing data in the Linked Data format. Considering that the quality of published linked open data is decisive for the future development of the Web, the main goal of this doctoral dissertation is (1) the design and implementation of an innovative framework for selecting, analyzing, converting, interlinking, and publishing data from different sources and (2) an analysis of the application of this approach in the pharmaceutical domain.
The proposed doctoral dissertation investigates in detail the question of the quality of big and linked data ecosystems (Linked Data Ecosystems), taking into account the possibility of reusing open data. The work is motivated by the need to enable researchers from Arab countries to use Semantic Web technologies to link their data with open data such as DBpedia. The aim is to examine whether open data from Arab countries enable end users to obtain additional information that is not available in individual datasets, in addition to integration into the Semantic Web space.
The doctoral dissertation proposes a methodology for developing applications that work with Linked Data and implements a software solution that enables searching a consolidated dataset on drugs from selected Arab countries. The consolidated dataset is implemented in the form of a Semantic Data Lake.
This thesis shows how the pharmaceutical industry benefits from applying innovative technologies and research trends from the field of semantic technologies. However, as elaborated in this thesis, a better understanding of the specifics of the Arabic language is needed for implementing Linked Data tools and applying them to data from Arab countries
Cultural Heritage on line
The 2nd International Conference "Cultural Heritage online – Empowering users: an active role for user communities" was held in Florence on 15-16 December 2009. It was organised by the Fondazione Rinascimento Digitale, the Italian Ministry for Cultural Heritage and Activities and the Library of Congress, through the National Digital Information Infrastructure and Preservation Program (NDIIPP) partners. The conference topics were related to digital libraries, digital preservation and the changing paradigms, focussing on user needs and expectations and analysing how to involve users and the cultural heritage community in creating and sharing digital resources. The sessions also investigated new organisational issues and roles, as well as cultural and economic limits, from an international perspective
Exploiting general-purpose background knowledge for automated schema matching
The schema matching task is an integral part of the data integration process and is usually the first step in integrating data. Schema matching is typically very complex and time-consuming, and it is therefore carried out for the most part by humans. One reason for the low degree of automation is that schemas are often defined with deep background knowledge that is not itself present within the schemas. Overcoming the problem of missing background knowledge is a core challenge in automating the data integration process.
In this dissertation, the task of matching semantic models, so-called ontologies, with the help of external background knowledge is investigated in-depth in Part I. Throughout this thesis, the focus lies on large, general-purpose resources since domain-specific resources are rarely available for most domains. Besides new knowledge resources, this thesis also explores new strategies to exploit such resources.
A technical base for the development and comparison of matching systems is presented in Part II. The framework introduced here allows for simple and modularized matcher development (with background knowledge sources) and for extensive evaluations of matching systems.
Knowledge graphs, which have grown significantly in size in recent years, are among the largest structured sources of general-purpose background knowledge. However, exploiting such graphs is not trivial. In Part III, knowledge graph embeddings are explored, analyzed, and compared. Multiple improvements to existing approaches are presented.
In Part IV, numerous concrete matching systems which exploit general-purpose background knowledge are presented. Furthermore, exploitation strategies and resources are analyzed and compared. This dissertation closes with a perspective on real-world applications