10 research outputs found
Scaling Native Language Identification with Transformer Adapters
Native language identification (NLI) is the task of automatically identifying the native language (L1) of an individual based on their language production in a learned language. It is useful for a variety of purposes, including marketing, security, and educational applications. NLI is usually framed as a multi-class classification task, where numerous hand-designed features are combined to achieve state-of-the-art results. Recently, a deep generative approach based on transformer decoders (GPT-2) outperformed its counterparts and achieved the best results on the NLI benchmark datasets. We investigate this approach to determine its practical implications compared to traditional state-of-the-art NLI systems. We introduce transformer adapters to address memory limitations and improve training/inference speed, in order to scale NLI applications for production.
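The abstract's key idea, a bottleneck adapter inserted into a frozen transformer layer, can be sketched minimally. This is an illustrative toy (tiny dimensions, random weights, NumPy instead of a deep-learning framework), not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_bottleneck = 8, 2  # illustrative sizes; real adapters use e.g. 768 -> 64

# Only these two small matrices would be trained; the transformer stays frozen,
# which is what reduces per-task memory and speeds up training.
W_down = rng.standard_normal((d_model, d_bottleneck)) * 0.01
W_up = rng.standard_normal((d_bottleneck, d_model)) * 0.01

def adapter(hidden):
    """Down-project, apply a nonlinearity, up-project, add a residual connection."""
    z = np.maximum(hidden @ W_down, 0.0)  # ReLU bottleneck
    return hidden + z @ W_up              # residual keeps the module near-identity at init

x = rng.standard_normal((3, d_model))  # a batch of 3 token representations
out = adapter(x)
print(out.shape)  # the adapter preserves the hidden dimension: (3, 8)
```

Because the residual path dominates at initialization, inserting the adapter barely perturbs the frozen model, and only the small down/up projections need to be stored per task.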
How Good are Humans at Native Language Identification? A Case Study on Italian L2 writings
In this paper we present a pilot study of human performance on the Native Language Identification task. We performed two tests aimed at exploring the human baseline for the task, in which test takers had to identify the writers' L1 relying only on scripts written in Italian by English, French, German, and Spanish native speakers. We then conducted an error analysis considering the language background of both test takers and text writers.
Improving Data Quality in Customer Relationship Management Systems: a method for cleaning personal information
In recent years, data literacy has become critical in areas such as marketing, sales, and more generally in businesses. These companies rely on software such as Customer Relationship Management (CRM) systems to derive useful information from the vast amount of data collected. However, lack of data quality undermines the effectiveness of this approach, as it directly impacts overall business performance. This thesis investigates the various issues and challenges related to data quality in CRM systems, focusing particularly on datasets with attributes such as first name, last name, and e-mail address. In addition, an algorithm for cleaning such datasets is proposed.
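A rule-based cleaning pass over such records might look like the following sketch. The field names, normalization rules, and e-mail pattern are illustrative assumptions; they are not the algorithm proposed in the thesis:

```python
import re

# Hypothetical syntax check for e-mail addresses (a simplification of RFC 5322).
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def clean_record(record):
    """Normalise name casing/whitespace and flag records whose e-mail fails a syntax check."""
    cleaned = {
        # Collapse internal whitespace and apply title case to name fields.
        "first_name": " ".join(record.get("first_name", "").split()).title(),
        "last_name": " ".join(record.get("last_name", "").split()).title(),
        # E-mail addresses are case-insensitive in practice, so lowercase them.
        "email": record.get("email", "").strip().lower(),
    }
    cleaned["email_valid"] = bool(EMAIL_RE.match(cleaned["email"]))
    return cleaned

print(clean_record({
    "first_name": "  mARIA ",
    "last_name": "rossi",
    "email": "Maria.Rossi@Example.COM",
}))
```

Flagging invalid e-mails rather than dropping the record lets downstream CRM processes decide how to handle questionable entries.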
One Model to Rule them all: Multitask and Multilingual Modelling for Lexical Analysis
When learning a new skill, you take advantage of your preexisting skills and
knowledge. For instance, if you are a skilled violinist, you will likely have
an easier time learning to play cello. Similarly, when learning a new language
you take advantage of the languages you already speak. For instance, if your
native language is Norwegian and you decide to learn Dutch, the lexical overlap
between these two languages will likely benefit your rate of language
acquisition. This thesis deals with the intersection of learning multiple tasks
and learning multiple languages in the context of Natural Language Processing
(NLP), which can be defined as the study of computational processing of human
language. Although these two types of learning may seem different on the
surface, we will see that they share many similarities.
The traditional approach in NLP is to consider a single task for a single
language at a time. However, recent advances allow for broadening this
approach, by considering data for multiple tasks and languages simultaneously.
This is an important approach to explore further as the key to improving the
reliability of NLP, especially for low-resource languages, is to take advantage
of all relevant data whenever possible. In doing so, the hope is that in the
long term, low-resource languages can benefit from the advances made in NLP
which are currently to a large extent reserved for high-resource languages.
This, in turn, may then have positive consequences for, e.g., language
preservation, as speakers of minority languages will have a lower degree of
pressure to use high-resource languages. In the short term, answering the
specific research questions posed should be of use to NLP researchers working
towards the same goal. Comment: PhD thesis, University of Groningen.
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020
On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.