9 research outputs found

    From MARC silos to Linked Data silos?

    Get PDF
    Libraries are opening up their bibliographic metadata as Linked Data. However, they have all used different data models for structuring their bibliographic data. Some are using a FRBR-based model with several layers of entities while others use flat, record-oriented data models. The proliferation of data models limits the reusability of bibliographic data. In effect, libraries have moved from MARC silos to Linked Data silos of incompatible data models. Data sets can be difficult to combine and reuse. Small modelling differences may be overcome by schema mappings, but it is not clear that interoperability has improved overall. We present a survey of published bibliographic Linked Data, the data models proposed for representing bibliographic data as RDF, and tools used for conversion from MARC. Also, the approach of the National Library of Finland is discussed. Seit einiger Zeit stellen Bibliotheken ihre bibliografischen Metadadaten verstärkt offen in Form von Linked Data zur Verfügung. Dabei kommen jedoch ganz unterschiedliche Modelle für die Strukturierung der bibliografischen Daten zur Anwendung. Manche Bibliotheken verwenden ein auf FRBR basierendes Modell mit mehreren Schichten von Entitäten, während andere flache, am Datensatz orientierte Modelle nutzen. Der Wildwuchs bei den Datenmodellen erschwert die Nachnutzung der bibliografischen Daten. Im Ergebnis haben die Bibliotheken die früheren MARC-Silos nur mit zueinander inkompatiblen Linked-Data-Silos vertauscht. Deshalb ist es häufig schwierig, Datensets miteinander zu kombinieren und nachzunutzen. Kleinere Unterschiede in der Datenmodellierung lassen sich zwar durch Schema Mappings in den Griff bekommen, doch erscheint es fraglich, ob die Interoperabilität insgesamt zugenommen hat. Der Beitrag stellt die Ergebnisse einer Studie zu verschiedenen veröffentlichten Sets von bibliografischen Daten vor. Dabei werden auch die unterschiedlichen Modelle betrachtet, um bibliografische Daten als RDF darzustellen, sowie Werkzeuge zur Erzeugung von entsprechenden Daten aus dem MARC-Format. Abschließend wird der von der Finnischen Nationalbibliothek verfolgte Ansatz behandelt

    Centenary (Internet)

    Get PDF
    The centenary events surrounding the First World War have produced a significant amount of digital content in various forms, and thus has set a precedent for how large scale post-war memorisation can be undertaken. Looking towards future centenaries, which we can assume will produce a similar volume of digital content, there is a need for project leadership to invest in the digital legacy from the very outset of projects to ensure sustainability

    Content Enrichment of Digital Libraries: Methods, Technologies and Implementations

    Get PDF
    Parallel to the establishment of the concept of a "digital library", there have been rapid developments in the fields of semantic technologies, information retrieval and artificial intelligence. The idea is to use make use of these three fields to crosslink bibliographic data, i.e., library content, and to enrich it "intelligently" with additional, especially non-library, information. By linking the contents of a library, it is possible to offer users access to semantically similar contents of different digital libraries. For instance, a list of semantically similar publications from completely different subject areas and from different digital libraries can be made accessible. In addition, the user is able to see a wider profile about authors, enriched with information such as biographical details, name alternatives, images, job titles, institute affiliations, etc. This information comes from a wide variety of sources, most of which are not library sources. In order to make such scenarios a reality, this dissertation follows two approaches. The first approach is about crosslinking digital library content in order to offer semantically similar publications based on additional information for a publication. Hence, this approach uses publication-related metadata as a basis. The aligned terms between linked open data repositories/thesauri are considered as an important starting point by considering narrower, broader and related concepts through semantic data models such as SKOS. Information retrieval methods are applied to identify publications with high semantic similarity. For this purpose, approaches of vector space models and "word embedding" are applied and analyzed comparatively. The analyses are performed in digital libraries with different thematic focuses (e.g. economy and agriculture). Using machine learning techniques, metadata is enriched, e.g. with synonyms for content keywords, in order to further improve similarity calculations. To ensure quality, the proposed approaches will be analyzed comparatively with different metadata sets, which will be assessed by experts. Through the combination of different information retrieval methods, the quality of the results can be further improved. This is especially true when user interactions offer possibilities for adjusting the search properties. In the second approach, which this dissertation pursues, author-related data are harvested in order to generate a comprehensive author profile for a digital library. For this purpose, non-library sources, such as linked data repositories (e.g. WIKIDATA) and library sources, such as authority data, are used. If such different sources are used, the disambiguation of author names via the use of already existing persistent identifiers becomes necessary. To this end, we offer an algorithmic approach to disambiguate authors, which makes use of authority data such as the Virtual International Authority File (VIAF). Referring to computer sciences, the methodological value of this dissertation lies in the combination of semantic technologies with methods of information retrieval and artificial intelligence to increase the interoperability between digital libraries and between libraries with non-library sources. By positioning this dissertation as an application-oriented contribution to improve the interoperability, two major contributions are made in the context of digital libraries: (1) The retrieval of information from different Digital Libraries can be made possible via a single access. (2) Existing information about authors is collected from different sources and aggregated into one author profile.Parallel zur Etablierung des Konzepts einer „Digitalen Bibliothek“ gab es rasante Weiterentwicklungen in den Bereichen semantischer Technologien, Information Retrieval und künstliche Intelligenz. Die Idee ist es, mit ihrer Hilfe bibliographische Daten, also Inhalte von Bibliotheken, miteinander zu vernetzen und „intelligent“ mit zusätzlichen, insbesondere nicht-bibliothekarischen Informationen anzureichern. Durch die Verknüpfung von Inhalten einer Bibliothek wird es möglich, einen Zugang für Benutzer*innen anzubieten, über den semantisch ähnliche Inhalte unterschiedlicher Digitaler Bibliotheken zugänglich werden. Beispielsweise können hierüber ausgehend von einer bestimmten Publikation eine Liste semantisch ähnlicher Publikationen ggf. aus völlig unterschiedlichen Themenfeldern und aus verschiedenen digitalen Bibliotheken zugänglich gemacht werden. Darüber hinaus können sich Nutzer*innen ein breiteres Autoren-Profil anzeigen lassen, das mit Informationen wie biographischen Angaben, Namensalternativen, Bildern, Berufsbezeichnung, Instituts-Zugehörigkeiten usw. angereichert ist. Diese Informationen kommen aus unterschiedlichsten und in der Regel nicht-bibliothekarischen Quellen. Um derartige Szenarien Realität werden zu lassen, verfolgt diese Dissertation zwei Ansätze. Der erste Ansatz befasst sich mit der Vernetzung von Inhalten Digitaler Bibliotheken, um auf Basis zusätzlicher Informationen für eine Publikation semantisch ähnliche Publikationen anzubieten. Dieser Ansatz verwendet publikationsbezogene Metadaten als Grundlage. Die verknüpften Begriffe zwischen verlinkten offenen Datenrepositorien/Thesauri werden als wichtiger Angelpunkt betrachtet, indem Unterbegriffe, Oberbegriffe und verwandten Konzepte über semantische Datenmodelle, wie SKOS, berücksichtigt werden. Methoden des Information Retrieval werden angewandt, um v.a. Publikationen mit hoher semantischer Verwandtschaft zu identifizieren. Zu diesem Zweck werden Ansätze des Vektorraummodells und des „Word Embedding“ eingesetzt und vergleichend analysiert. Die Analysen werden in Digitalen Bibliotheken mit unterschiedlichen thematischen Schwerpunkten (z.B. Wirtschaft und Landwirtschaft) durchgeführt. Durch Techniken des maschinellen Lernens werden hierfür Metadaten angereichert, z.B. mit Synonymen für inhaltliche Schlagwörter, um so Ähnlichkeitsberechnungen weiter zu verbessern. Zur Sicherstellung der Qualität werden die beiden Ansätze mit verschiedenen Metadatensätzen vergleichend analysiert wobei die Beurteilung durch Expert*innen erfolgt. Durch die Verknüpfung verschiedener Methoden des Information Retrieval kann die Qualität der Ergebnisse weiter verbessert werden. Dies trifft insbesondere auch dann zu wenn Benutzerinteraktion Möglichkeiten zur Anpassung der Sucheigenschaften bieten. Im zweiten Ansatz, den diese Dissertation verfolgt, werden autorenbezogene Daten gesammelt, verbunden mit dem Ziel, ein umfassendes Autorenprofil für eine Digitale Bibliothek zu generieren. Für diesen Zweck kommen sowohl nicht-bibliothekarische Quellen, wie Linked Data-Repositorien (z.B. WIKIDATA) und als auch bibliothekarische Quellen, wie Normdatensysteme, zum Einsatz. Wenn solch unterschiedliche Quellen genutzt werden, wird die Disambiguierung von Autorennamen über die Nutzung bereits vorhandener persistenter Identifikatoren erforderlich. Hierfür bietet sich ein algorithmischer Ansatz für die Disambiguierung von Autoren an, der Normdaten, wie die des Virtual International Authority File (VIAF) nachnutzt. Mit Bezug zur Informatik liegt der methodische Wert dieser Dissertation in der Kombination von semantischen Technologien mit Verfahren des Information Retrievals und der künstlichen Intelligenz zur Erhöhung von Interoperabilität zwischen Digitalen Bibliotheken und zwischen Bibliotheken und nicht-bibliothekarischen Quellen. Mit der Positionierung dieser Dissertation als anwendungsorientierter Beitrag zur Verbesserung von Interoperabilität werden zwei wesentliche Beiträge im Kontext Digitaler Bibliotheken geleistet: (1) Die Recherche nach Informationen aus unterschiedlichen Digitalen Bibliotheken kann über einen Zugang ermöglicht werden. (2) Vorhandene Informationen über Autor*innen werden aus unterschiedlichsten Quellen eingesammelt und zu einem Autorenprofil aggregiert

    CLARIN. The infrastructure for language resources

    Get PDF
    CLARIN, the "Common Language Resources and Technology Infrastructure", has established itself as a major player in the field of research infrastructures for the humanities. This volume provides a comprehensive overview of the organization, its members, its goals and its functioning, as well as of the tools and resources hosted by the infrastructure. The many contributors representing various fields, from computer science to law to psychology, analyse a wide range of topics, such as the technology behind the CLARIN infrastructure, the use of CLARIN resources in diverse research projects, the achievements of selected national CLARIN consortia, and the challenges that CLARIN has faced and will face in the future. The book will be published in 2022, 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium by the European Commission (Decision 2012/136/EU)

    CLARIN

    Get PDF
    The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure – CLARIN – for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after establishing CLARIN as an Europ. Research Infrastructure Consortium

    Report from Dagstuhl Seminar 23031: Frontiers of Information Access Experimentation for Research and Education

    Full text link
    This report documents the program and the outcomes of Dagstuhl Seminar 23031 ``Frontiers of Information Access Experimentation for Research and Education'', which brought together 37 participants from 12 countries. The seminar addressed technology-enhanced information access (information retrieval, recommender systems, natural language processing) and specifically focused on developing more responsible experimental practices leading to more valid results, both for research as well as for scientific education. The seminar brought together experts from various sub-fields of information access, namely IR, RS, NLP, information science, and human-computer interaction to create a joint understanding of the problems and challenges presented by next generation information access systems, from both the research and the experimentation point of views, to discuss existing solutions and impediments, and to propose next steps to be pursued in the area in order to improve not also our research methods and findings but also the education of the new generation of researchers and developers. The seminar featured a series of long and short talks delivered by participants, who helped in setting a common ground and in letting emerge topics of interest to be explored as the main output of the seminar. This led to the definition of five groups which investigated challenges, opportunities, and next steps in the following areas: reality check, i.e. conducting real-world studies, human-machine-collaborative relevance judgment frameworks, overcoming methodological challenges in information retrieval and recommender systems through awareness and education, results-blind reviewing, and guidance for authors.Comment: Dagstuhl Seminar 23031, report

    CLARIN

    Get PDF
    The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure – CLARIN – for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after establishing CLARIN as an Europ. Research Infrastructure Consortium

    A 3D Digital Approach to the Stylistic and Typo-Technological Study of Small Figurines from Ayia Irini, Cyprus

    Get PDF
    The thesis aims to develop a 3D digital approach to the stylistic and typo-technological study of coroplastic, focusing on small figurines. The case study to test the method is a sample of terracotta statuettes from an assemblage of approximately 2000 statues and figurines found at the beginning of the 20th century in a rural open-air sanctuary at Ayia Irini (Cyprus) by the archaeologists of the Swedish Cyprus Expedition. The excavators identified continuity of worship at the sanctuary from the Late Cypriot III (circa 1200 BC) to the end of the Cypro-Archaic II period (ca. 475 BC). They attributed the small figurines to the Cypro-Archaic I-II. Although the excavation was one of the first performed through the newly established stratigraphic method, the archaeologists studied the site and its material following a traditional, merely qualitative approach. Theanalysis of the published results identified a classification of the material with no-clear-cut criteria, and their overlap between types highlights ambiguities in creating groups and classes. Similarly, stratigraphic arguments and different opinions among archaeologists highlight the need for revising. Moreover, pastlegislation allowed the excavators to export half of the excavated antiquities, creating a dispersion of the assemblage. Today, the assemblage is still partly exhibited at the Cyprus Museum in Nicosia and in four different museums in Sweden. Such a setting prevents to study, analyse and interpret the assemblageholistically. This research proposes a 3D chaîne opératoire methodology to study the collection’s small terracotta figurines, aiming to understand the context’s function and social role as reflected by the classification obtained with the 3D digital approach. The integration proposed in this research of traditional archaeological studies, and computer-assisted investigation based on quantitative criteria, identified and defined with 3D measurements and analytical investigations, is adopted as a solution to the biases of a solely qualitative approach. The 3D geometric analysis of the figurines focuses on the objects’ shape and components, mode of manufacture, level of expertise, specialisation or skills of the craftsman and production techniques. The analysis leads to the creation of classes of artefacts which allow archaeologists to formulate hypotheses on the production process, identify a common production (e.g., same hand, same workshop) and establish a relative chronological sequence. 3D reconstruction of the excavation’s area contributes to the virtual re-unification of the assemblage for its holistic study, the relative chronological dating of the figurines and the interpretation of their social and ritual purposes. The results obtained from the selected sample prove the efficacy of the proposed 3D approach and support the expansion of the analysis to the whole assemblage, and possibly initiate quantitative and systematic studies on Cypriot coroplastic production

    Dimensions of Linguistic Space: Variation – Multilingualism Conceptualisations Dimensionen des sprachlichen Raums: Variation – Mehrsprachigkeit – Konzeptualisierung

    Get PDF
    This volume focuses on the use and structure of the German language in Austria. In addition, the aim of the book is to compare the linguistic conditions in Austria with those in other German speaking countries. The 20 articles present current findings from the research fields of variation, contact and perception. Der Band widmet sich schwerpunktmäßig der Verwendung und Struktur der deutschen Sprache in Österreich. Ziel des Sammelbandes ist es außerdem, die sprachlichen Verhältnisse in Österreich mit denjenigen in anderen deutschsprachigen Ländern zu vergleichen. In 20 Beiträgen werden daher aktuelle Forschungsergebnisse aus den Forschungsbereichen Variation, Kontakt und Perzeption vorgestellt
    corecore