38 research outputs found

    Legal Interoperability Issues in the Framework of the OpenMinTeD Project: A Methodological Overview

    This paper is a first analysis of the legal interoperability issues in the framework of the OpenMinTeD (OMTD) project (www.openminted.eu), which aims to create an open, service-oriented e-Infrastructure for Text and Data Mining (TDM) of scientific and scholarly content. The paper offers an overview of the methods for achieving such interoperability.

    A Legal Perspective on Training Models for Natural Language Processing

    A significant concern in processing natural language data is the often unclear legal status of the input and output data/resources. In this paper, we investigate this problem by discussing a typical activity in Natural Language Processing: the training of a machine learning model from an annotated corpus. We examine which legal rules apply at relevant steps and how they affect the legal status of the results, especially in terms of copyright and copyright-related rights

    The LexMeta Model for Lexical Resources: Theory and Application

    This paper presents LexMeta, a metadata model for the description of lexical resources, such as dictionaries, word lists, glossaries, etc., to be used in language data catalogues mainly targeting the lexicographic and broader humanities communities but also users exploiting such resources in their research and applications. A comparative review of similar models is made in order to show the differences and commonalities with LexMeta. To enhance semantic interoperability and support the exchange of (meta)data across disciplinary and general catalogues, the most influential models for our purposes, namely FRBR (used in library catalogues) and META-SHARE (used for language resources), are selected as a base for the design of LexMeta. We discuss how these models are aligned and extended with new properties as required for the description of lexical resources. The formal representation of the model following the Linked Data paradigm aims to further enhance the semantic interoperability. The choice to implement it in two formats (as an RDF/OWL ontology and as a Wikibase ontology) facilitates its adoption and hence its enrichment, yet poses challenges as to their synchronisation, which are addressed through automatic workflows. We conclude with ongoing and planned activities for the improvement of the model.

    Computational morphology with OntoLex-Morph

    This paper describes the current status of the emerging OntoLex module for linguistic morphology. It serves as an update to the previous version of the vocabulary (Klimek et al. 2019). Whereas this earlier model focused exclusively on descriptive morphology and on applications in lexicography, we now present a novel part and a novel application of the vocabulary in language technology, i.e., the rule-based generation of lexicons, introducing a dynamic component into OntoLex.

    The LexMeta Metadata Model for Lexical Resources: Theoretical and Implementation Issues

    The paper presents LexMeta, a metadata model catering for descriptions of human-readable and computational lexical resources included in library catalogues and repositories of language resources. We present the main concepts of the model, its implementation, and discuss current findings and future plans

    A Metadata Schema for the Description of Language Resources (LRs)

    This paper presents the metadata schema for describing language resources (LRs) currently under development for the needs of META-SHARE, an open distributed facility for the exchange and sharing of LRs. An essential ingredient in its setup is the existence of formal and standardized LR descriptions, cornerstone of the interoperability layer of any such initiative. The description of LRs is granular and abstractive, combining the taxonomy of LRs with an inventory of a structured set of descriptive elements, of which only a minimal subset is obligatory; the schema additionally proposes recommended and optional elements. Moreover, the schema includes a set of relations catering for the appropriate inter-linking of resources. The current paper presents the main principles and features of the metadata schema, focusing on the description of text corpora and lexical / conceptual resources

    Processing personal data without the consent of the data subject for the development and use of language resources

    The development and use of language resources often involve the processing of personal data. The General Data Protection Regulation (GDPR) establishes an EU-wide framework for the processing of personal data for research purposes while at the same time allowing for some flexibility on the part of the Member States. The paper discusses the legal framework for language research following the entry into force of the GDPR. In the first section, we present some fundamental concepts of data protection relevant to language research. In the second section, the general framework of processing personal data for research purposes is discussed. In the last section, we focus on the models that certain EU Member States use to regulate data processing for research purposes.

    OpenMinTeD: A Platform Facilitating Text Mining of Scholarly Content

    The OpenMinTeD platform aims to bring full text Open Access scholarly content from a wide range of providers together with Text and Data Mining (TDM) tools from various Natural Language Processing frameworks and TDM developers in an integrated environment. In this way, it supports users who want to mine scientific literature with easy access to relevant content and allows running scalable TDM workflows in the cloud

    Documentation and User Manual of the META-SHARE Metadata Model

    The current deliverable presents the META-SHARE metadata schema v1.0, as implemented in the META-SHARE XSDs v1.0 released to META-NET and PSP partners in July 2011 for text corpora and lexical/conceptual resources, and its supplement for audio corpora, tools and language descriptions (simplified/refactored version) as implemented in November. It is meant to act as a user manual, providing explanations of the model contents for LR providers and LR curators who wish to describe their resources in accordance with it. Work on the schema is ongoing and changes/updates to the model are constantly being made; where appropriate, some changes that are already under way are documented in this deliverable.

    OntoLex-Morph: Morphology for the Web of Data

    Purpose: OntoLex-Lemon is a widely used community standard for publishing lexical resources in machine-readable form, and is in fact the predominant RDF vocabulary for this purpose. With the growing popularity and increasing adoption of this model for applications in both language technology and lexicography, a number of new modules have been developed in the past year to complement the OntoLex core vocabulary and its lexicographic follow-up, lexicog. In this paper, we describe the current status of the development of the OntoLex-Morph vocabulary.