Corpora and evaluation tools for multilingual named entity grammar development

Author: Bering Christian
Droźdźyński Witold
Erbach Gregor
Guasch Clara
Homola Petr
Krieger Hans-Ulrich
Lehmann Sabine
Li Hong
Piskorski Jakub
Schäfer Ulrich
Shimada Atsuko
Siegel Melanie
Xu Feiyu
Ziegler-Eisele Dorothee
Publication date: 14/12/2011

We describe the development of multilingual named entity grammars in a unification-based finite-state formalism (SProUT). Following an extended version of the MUC7 standard, we have developed Named Entity Recognition grammars for German, Chinese, Japanese, French, Spanish, English, and Czech. The grammars recognize person names, organizations, geographical locations, currency, time and date expressions. Subgrammars and gazetteers are shared as much as possible across the grammars of the different languages. Multilingual corpora from the business domain are used for grammar development and evaluation. The annotation format (named entity and other linguistic information) is described. We present an evaluation tool which provides detailed statistics and diagnostics, allows for partial matching of annotations, and supports user-defined mappings between different annotation and grammar output formats.
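The abstract does not say how the evaluation tool scores partial matches. As a rough illustration only, the sketch below compares strict (exact span) and lenient (overlapping span) matching of named entity annotations. The span representation, the function names, and the greedy one-to-one matching are assumptions made for this example, not the tool's actual behaviour.

```python
from typing import List, Tuple

# Hypothetical annotation format: (start offset, end offset exclusive, entity type).
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """Lenient criterion: same entity type and at least one shared character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def evaluate(gold: List[Span], predicted: List[Span], partial: bool = False):
    """Return (precision, recall, F1) under strict or partial matching."""
    match = overlaps if partial else (lambda a, b: a == b)
    unmatched_gold = list(gold)
    true_positives = 0
    for p in predicted:
        # Greedy one-to-one matching: each gold span is consumed at most once.
        for g in unmatched_gold:
            if match(p, g):
                unmatched_gold.remove(g)
                true_positives += 1
                break
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up spans: one exact hit, one boundary error, one miss.
gold = [(0, 12, "PERSON"), (20, 33, "ORGANIZATION"), (40, 46, "LOCATION")]
pred = [(0, 12, "PERSON"), (24, 33, "ORGANIZATION"), (50, 55, "LOCATION")]

strict = evaluate(gold, pred)
lenient = evaluate(gold, pred, partial=True)
print("strict  P/R/F1:", strict)
print("lenient P/R/F1:", lenient)
```

Under the lenient criterion the organization with the wrong left boundary counts as a hit, so the scores rise; reporting both views is one way to separate boundary errors from outright misses.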

Non-Standard and Minority Varieties as Community Languages in the UK: Towards a New Strategy for Language Maintenance

Author: Karatsareas P.
Matras Y.
Publication date: 01/01/2020

Supplementary schools (also referred to as complementary or Saturday schools) play a key role in teaching community heritage languages. In this way they contribute to strengthening awareness of cultural identity and confidence among pupils of migrant and minority backgrounds. The diaspora setting poses a number of challenges: parents and pupils expect supplementary schools to provide instruction in formal aspects of the heritage languages (reading and writing, and ‘correct’ grammar), but also to help develop competence in using the language in everyday settings, not least in order to enable intergenerational communication. Where the formal language differs from non-standard speech varieties (such as regional dialects), gaps may emerge between expectations and delivery. Most schools do not equip teachers to address such issues because the traditional curricula (including textbooks and teacher training packages that are often imported from the origin countries) fail to take them into consideration. The paper draws on recent research by specialist sociolinguists working in various UK settings and on a discussion among researchers and practitioners that was hosted by the University of Westminster in April 2019, co-organised by the Multilingual Manchester research unit at the University of Manchester as part of the Multilingual Communities strand of the AHRC Open World Research Initiative consortium ‘Cross-Language Dynamics: Re-shaping Communities.’ Research has shown that teachers, parents and pupils attribute importance to the teaching of standard languages, not least as a way of gaining additional formal qualifications and increasing prospects of university admission and employment. However, pupils also show an interest in everyday speech varieties and often challenge the prevailing language ideologies that fail to recognise their importance in informal communication.
Teachers tend to be aware of this tension but lack the training and resources to address it in the classroom. The workshop findings suggest that failure to take non-standard speech varieties into consideration can discourage pupils from attending supplementary schools and so it also risks having an adverse effect on the transmission of standard heritage languages. Pupils’ motivation can be boosted if they are offered more tools and opportunities to communicate in everyday speech varieties. To that end, non-standard varieties must be valorised and teachers should be equipped with the skills to address language variation and pupils’ multilingual repertoires and to promote them as valuable communicative resources. The paper recommends that supplementary schools should explore ways to take into account pupils’ multilingualism and use of non-standard varieties. Curricula should be adjusted to recognise non-standard varieties as valuable resources while continuing to teach the formal (standard) varieties. Teacher training modules should be designed that take pupils’ multilingual repertoires into account and equip teachers to understand and address sociolinguistic issues such as structural variation, multilingualism and language ideologies. The paper also recommends public engagement to address the inequality that underpins the use of the terms ‘community’ versus ‘modern languages’, and calls for collaboration between mainstream (statutory) schools and supplementary schools when it comes to celebrating diversity in their pupils’ backgrounds. Academics should play a greater role in providing advice, support and training to practitioners. They should work with practitioners and stakeholders to raise public awareness of the contribution that supplementary schools make and to develop policies and pedagogical approaches to support them.

MAG: A Multilingual, Knowledge-base Agnostic and Deterministic Entity Linking Approach

Author: Bryl Volha
Brümmer Martin
Consoli Sergio
Cucerzan Silviu
Devi Pooja
Ferreira Thiago Castro
Hoffart Johannes
Juan
Luo Gang
Nuzzolese Andrea-Giovanni
Röder Michael
Steinmetz Nadine
van Erp Marieke
Zhang Lei
Publication venue: Association for Computing Machinery (ACM)
Publication date: 17/10/2017

Entity linking has recently been the subject of a significant body of research. Currently, the best performing approaches rely on trained mono-lingual models. Porting these approaches to other languages is consequently a difficult endeavor as it requires corresponding training data and retraining of the models. We address this drawback by presenting a novel multilingual, knowledge-base agnostic and deterministic approach to entity linking, dubbed MAG. MAG is based on a combination of context-based retrieval on structured knowledge bases and graph algorithms. We evaluate MAG on 23 data sets and in 7 languages. Our results show that the best approach trained on English datasets (PBOH) achieves a micro F-measure that is up to 4 times worse on datasets in other languages. MAG, on the other hand, achieves state-of-the-art performance on English datasets and reaches a micro F-measure that is up to 0.6 higher than that of PBOH on non-English languages. Comment: Accepted in K-CAP 2017: Knowledge Capture Conference
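For readers unfamiliar with the metric, the micro F-measure pools true positives, false positives, and false negatives over all documents before computing precision and recall, whereas the macro variant averages per-document scores. The sketch below shows this generic calculation under that standard definition; the counts are made up for illustration and are not results from the paper, and the function names are assumptions.

```python
from typing import List, Tuple

# Per-document counts: (true positives, false positives, false negatives).
Counts = Tuple[int, int, int]


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def micro_f(counts: List[Counts]) -> float:
    """Pool the counts over all documents, then compute one F-measure."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return f_measure(tp, fp, fn)


def macro_f(counts: List[Counts]) -> float:
    """Compute an F-measure per document, then average them."""
    return sum(f_measure(*c) for c in counts) / len(counts)


# Illustrative counts only: one large, easy document and one small, hard one.
counts = [(90, 10, 10), (1, 1, 3)]
micro = micro_f(counts)
macro = macro_f(counts)
print(f"micro F = {micro:.3f}, macro F = {macro:.3f}")
```

Because the micro variant weights every entity mention equally, large documents dominate the score; the macro variant gives each document the same weight regardless of size.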

arXiv.org e-Print Archive

Non-native children speech recognition through transfer learning

Author: Falavigna Daniele
Giuliani Diego
Gretter Roberto
Matassoni Marco
Publication date: 01/01/2018

This work deals with non-native children's speech and investigates both multi-task and transfer learning approaches to adapt a multi-language Deep Neural Network (DNN) to speakers, specifically children, learning a foreign language. The application scenario is characterized by young students learning English and German and reading sentences in these second languages, as well as in their native language. The paper analyzes and discusses techniques for training effective DNN-based acoustic models starting from children's native speech and performing adaptation with limited non-native audio material. A multi-lingual model is adopted as baseline, where a common phonetic lexicon, defined in terms of the units of the International Phonetic Alphabet (IPA), is shared across the three languages at hand (Italian, German and English); DNN adaptation methods based on transfer learning are evaluated on significant non-native evaluation sets. Results show that the resulting non-native models allow a significant improvement with respect to a mono-lingual system adapted to speakers of the target language.

arXiv.org e-Print Archive

Archivio della ricerca - Fondazione Bruno Kessler