1,924 research outputs found

A road map for interoperable language resource metadata

Authors: Calzolari Nicoletta; Choukri Khalid; Cieri Christopher; Ide Nancy; Langendoen D. Terence; Leveling Johannes; Palmer Martha; Pustejovsky James
Publication venue: European Language Resources Association
Publication date: 01/01/2010

Language resources (LRs) remain expensive to create and thus rare relative to demand across languages and technology types. The accidental re-creation of an LR that already exists is a nearly unforgivable waste of scarce resources that is unfortunately not so easy to avoid. The number of catalogs the human language technology (HLT) researcher must search, each with its own format, makes it possible to overlook an existing resource. This paper sketches the sources of this problem and outlines a proposal to rectify it, along with a new vision of LR cataloging that will facilitate the documentation and exploitation of a much wider range of LRs than previously considered.

CiteSeerX

Stabilizing knowledge through standards - A perspective for the humanities

Author: Romary Laurent
Publication date: 23/06/2009

It is usual to consider that standards generate mixed feelings among scientists. They are often seen as not really reflecting the state of the art in a given domain and as a hindrance to scientific creativity. Still, scientists should in theory be best placed to bring their expertise into standards development, being more neutral on issues that are typically tied to competing industrial interests. Although developing standards in the humanities might seem even more complex, we will show how it can be made feasible through the experience gained both within the Text Encoding Initiative consortium and the International Organisation for Standardisation. Taking the specific case of lexical resources, we will try to show how this brings about new ideas for designing future research infrastructures in the human and social sciences.

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Best practices in the creation, archiving and dissemination of speech corpora at the Language Archive

Authors: Broeder D.; Drude S.; Sloetjes H.; Trilsbeek P.
Publication date: 01/01/2014

UFRA: a UIMA-based Approach to Federated Language Resource Architecture

Authors: Bartolini Roberto; Calzolari Nicoletta; Caselli Tommaso; Del Gratta Riccardo; Monachini Monica; Soria Claudia
Publication venue: European Language Resources Association (ELRA)

In this paper we address the issue of developing an interoperable infrastructure for language resources and technologies. In our approach, called UFRA, we extend the Federated Database Architecture System by adding typical functionalities coming from UIMA. In this way, we capitalize on the advantages of a federated architecture, such as autonomy, heterogeneity and distribution of components, monitored by a central authority responsible for checking both the integration of components and user rights to perform different tasks. We use the UIMA approach to manage and define one common front-end, enabling users and clients to query, retrieve and use language resources and technologies. The purpose of this paper is to show how UIMA leads from a Federated Database Architecture to a Federated Resource Architecture, adding to a registry of available components both static resources, such as lexicons and corpora, and dynamic ones, such as tools and general-purpose language technologies. At the end of the paper, we present a case study that adopts this framework to integrate the SIMPLE lexicon and TIMEML annotation guidelines to tag natural language texts.

Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review

Authors: Mayr Philipp; Zeng Marcia Lei
Publication venue: Springer Science and Business Media LLC
Publication date: 13/01/2018

Since the Simple Knowledge Organization System (SKOS) specification and its SKOS eXtension for Labels (SKOS-XL) became formal W3C recommendations in 2009, a significant number of conventional knowledge organization systems (KOS) (including thesauri, classification schemes, name authorities, and lists of codes and terms, produced before the arrival of the ontology-wave) have made their journeys to join the Semantic Web mainstream. This paper uses "LOD KOS" as an umbrella term to refer to all of the value vocabularies and lightweight ontologies within the Semantic Web framework. The paper provides an overview of what the LOD KOS movement has brought to various communities and users. These are not limited to the colonies of the value vocabulary constructors and providers, nor the catalogers and indexers who have a long history of applying the vocabularies to their products. The LOD dataset producers and LOD service providers, the information architects and interface designers, and researchers in sciences and humanities are also direct beneficiaries of LOD KOS. The paper examines a set of collected cases (experimental or in real applications) and aims to identify the usages of LOD KOS in order to share the practices and ideas among communities and users. Through the viewpoints of a number of different user groups, the functions of LOD KOS are examined from multiple dimensions. This paper focuses on the LOD dataset producers, vocabulary producers, and researchers (as end-users of KOS). Comment: 31 pages, 12 figures; accepted paper in International Journal on Digital Libraries.

arXiv.org e-Print Archive

HANDLING MULTILINGUAL CONTENT IN DIGITAL MEDIA: A CRITICAL ANALYSIS

Publication date: 01/01/2006

This document expresses and analyzes the need to define a generic method for representing multilingual information in multimedia data. It describes the basic requirements that would bear upon such representations and establishes the potential link with ISO committee TC 37/SC 4 (Language Resource Management) and with XMT (eXtended MPEG-4 Textual format).

Towards FAIRification of sensitive and fragmented rare disease patient data: challenges and solutions in European reference network registries

Authors: Abaza H.; Benis N.; Bernabé C.H.; dos Santos Vieira B.; et al.; Zhang S.
Publication venue: BioMed Central
Publication date: 30/03/2023

Introduction: Rare disease patient data are typically sensitive, present in multiple registries controlled by different custodians, and non-interoperable. Making these data Findable, Accessible, Interoperable, and Reusable (FAIR) for humans and machines at source enables federated discovery and analysis across data custodians. This facilitates accurate diagnosis, optimal clinical management, and personalised treatments. In Europe, twenty-four European Reference Networks (ERNs) work on rare disease registries in different clinical domains. The process and the implementation choices for making data FAIR (‘FAIRification’) differ among ERN registries. For example, registries use different software systems and are subject to different legal regulations. To support the ERNs in making informed decisions and to harmonise FAIRification, the FAIRification steward team was established to work as liaisons between ERNs and researchers from the European Joint Programme on Rare Diseases. Results: The FAIRification steward team inventoried the FAIRification challenges of the ERN registries and, together with the stakeholders involved, proposed solutions to address them. Ninety-eight FAIRification challenges from 24 ERNs’ registries were collected and categorised into “training” (31), “community” (9), “modelling” (12), “implementation” (26), and “legal” (20). After curating and aggregating highly similar challenges, 41 unique FAIRification challenges remained. The two categories with the most challenges were “training” (15) and “implementation” (9), followed by “community” (7), and then “modelling” (5) and “legal” (5). To address all challenges, eleven types of solutions were proposed. Among them, the provision of guidelines and the organisation of training activities, which ranged from less-technical “coffee-rounds” to technical workshops and from informal FAIR Games to formal hackathons, resolved the “training” challenges. Obtaining implementation support from technical experts was the solution type for tackling the “implementation” challenges. Conclusion: This work shows that a dedicated team of FAIR data stewards is an asset for harmonising the various processes of making data FAIR in a large organisation with multiple stakeholders. Additionally, multi-levelled training activities are required to accommodate the diverse needs of the ERNs. Finally, the lessons learned from the experience of the FAIRification steward team described in this paper may help to increase FAIR awareness and provide insights into FAIRification challenges and solutions of rare disease registries.