320 research outputs found
The strategic impact of META-NET on the regional, national and international level
This article provides an overview of the dissemination work carried out in META-NET from 2010 until early 2014; we describe its impact on the regional, national and international level, mainly with regard to politics and the funding situation for LT topics. This paper documents the initiative’s work throughout Europe in order to boost progress and innovation in our field. Postprint (published version)
The strategic impact of META-NET on the regional, national and international level
This article provides an overview of the dissemination work carried out in META-NET from 2010 until 2015; we describe its impact on the regional, national and international level, mainly with regard to politics and the funding situation for LT topics. The article documents the initiative's work throughout Europe in order to boost progress and innovation in our field. Peer reviewed. Postprint (author's final draft)
Vulnerability in acquisition, language impairments in Dutch: Creating a VALID data archive
The VALID Data Archive is an open multimedia data archive (under construction) with data from speakers suffering from language impairments. We report on a pilot project in the CLARIN-NL framework in which five data resources were curated. For all data sets concerned, written informed consent from the participants or their caretakers has been obtained. All materials were anonymized. The audio files were converted into wav (linear PCM) files and the transcriptions into CHAT or ELAN format. Research data that consisted of test, SPSS and Excel files were documented and converted into CSV files. All data sets were given appropriate CMDI metadata files. A new CMDI metadata profile for this type of data resource was established, and care was taken that ISOcat metadata categories were used to optimize interoperability. After curation, all data are deposited at the Max Planck Institute for Psycholinguistics Nijmegen, where persistent identifiers are linked to all resources. The content of the transcriptions in CHAT and plain text format can be searched with the TROVA search engine
CLARIN
The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure – CLARIN – for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after establishing CLARIN as a European Research Infrastructure Consortium.
CLARIN. The infrastructure for language resources
CLARIN, the "Common Language Resources and Technology Infrastructure", has established itself as a major player in the field of research infrastructures for the humanities. This volume provides a comprehensive overview of the organization, its members, its goals and its functioning, as well as of the tools and resources hosted by the infrastructure. The many contributors representing various fields, from computer science to law to psychology, analyse a wide range of topics, such as the technology behind the CLARIN infrastructure, the use of CLARIN resources in diverse research projects, the achievements of selected national CLARIN consortia, and the challenges that CLARIN has faced and will face in the future.
The book will be published in 2022, 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium by the European Commission (Decision 2012/136/EU)
Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT
Pretrained contextual representation models (Peters et al., 2018; Devlin et
al., 2018) have pushed forward the state-of-the-art on many NLP tasks. A new
release of BERT (Devlin, 2018) includes a model simultaneously pretrained on
104 languages with impressive performance for zero-shot cross-lingual transfer
on a natural language inference task. This paper explores the broader
cross-lingual potential of mBERT (multilingual) as a zero-shot language
transfer model on 5 NLP tasks covering a total of 39 languages from various
language families: NLI, document classification, NER, POS tagging, and
dependency parsing. We compare mBERT with the best published methods for
zero-shot cross-lingual transfer and find mBERT competitive on each task.
Additionally, we investigate the most effective strategy for utilizing mBERT in
this manner, determine to what extent mBERT generalizes away from language
specific features, and measure factors that influence cross-lingual transfer. Comment: EMNLP 2019 Camera Ready
The latest development of the DELAD project for sharing corpora of speech disorders
Corpora of speech of individuals with communication disorders (CSD) are invaluable resources for education and research, but they are costly and hard to build, and difficult to share for various reasons. DELAD, which means 'shared' in Swedish, is a project initiated by Professors Nicole Muller and Martin Ball in 2015 that aims to address this issue by establishing a platform for researchers to share datasets of speech disorders with interested audiences. To date four workshops have been held, where selected participants with varied expertise, including researchers in clinical phonetics and linguistics, speech and language therapy, infrastructure specialists, and ethics and legal specialists, discussed relevant issues in setting up such an archive. Steady progress has been made since 2015, including refurbishing the DELAD website (http://delad.net/) with information and application forms for researchers to join and share their datasets, and linking with the CLARIN K-Centre for Atypical Communication Expertise (https://ace.ruhosting.nl/), where CSD can be hosted and accessed through the CLARIN B-Centres, The Language Archive (https://tla.mpi.nl/tools/tla-tools/) and TalkBank (https://talkbank.org/). The latest workshop, which was funded by CLARIN (Common Language Resources and Technology Infrastructure), was held as an online event in January 2021 on topics including Data Protection Impact Assessments, reviewing changes in ethics perspectives in academia on sharing CSD, and voice conversion as a means to pseudonymise speech. This paper reports the latest progress of DELAD and discusses directions for further advancing the initiative, with information on how researchers can contribute to the repository. Peer reviewed