Towards the Automatic Processing of Language Registers: Semi-supervisedly Built Corpus and Classifier for French

Authors: Ayats Hugo, Battistelli Delphine, Béchet Nicolas, Chevelu Jonathan, Fournier Benoît, Lecorvé Gwénolé, Mekki Jade
Publication venue: HAL CCSD
Publication date: 07/04/2019

Language registers are a strongly perceptible characteristic of texts and speech. However, they remain poorly studied in natural language processing. In this paper, we present a semi-supervised approach that jointly builds a corpus of texts labeled by register and an associated classifier. The approach relies on a small initial seed of expert data. After retrieving web pages on a massive scale, it alternates between training an intermediate classifier and annotating new texts to augment the labeled corpus. The approach is applied to the casual, neutral, and formal registers, yielding a 750M-word corpus and a final neural classifier with acceptable performance.
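
The iterative loop described in the abstract (train an intermediate classifier on the labeled data, annotate new texts, add them to the corpus, repeat) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the paper's final classifier is neural, whereas this sketch substitutes a toy word-count model, and the confidence threshold, the stopping rule, the English toy sentences, and the names `train`, `predict`, and `self_train` are all hypothetical.

```python
from collections import Counter, defaultdict

def train(labeled):
    """Toy stand-in for the intermediate classifier: word counts per register."""
    counts = defaultdict(Counter)
    for text, label in labeled:
        counts[label].update(text.lower().split())
    return counts

def predict(model, text):
    """Return (label, confidence) from add-one smoothed word-frequency scores."""
    words = text.lower().split()
    scores = {}
    for label, counter in model.items():
        total = sum(counter.values())
        vocab = len(counter)
        score = 1.0
        for w in words:
            score *= (counter[w] + 1) / (total + vocab + 1)
        scores[label] = score
    norm = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, scores[best] / norm

def self_train(seed, unlabeled, threshold=0.7, max_rounds=5):
    """Alternate training and annotation (assumed rule: keep confident labels only)."""
    labeled = list(seed)   # starts from the small expert seed
    pool = list(unlabeled) # texts still waiting for a label
    for _ in range(max_rounds):
        model = train(labeled)
        confident, remaining = [], []
        for text in pool:
            label, conf = predict(model, text)
            if conf >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:
            break          # nothing new was annotated, so stop early
        labeled.extend(confident)
        pool = remaining
    return train(labeled), labeled

# Toy usage with made-up seed data for the three registers.
seed = [
    ("yo wanna grab food", "casual"),
    ("the meeting starts at noon", "neutral"),
    ("we hereby request your presence", "formal"),
]
pool = ["yo wanna come", "we hereby request payment", "zzz"]
model, corpus = self_train(seed, pool)
print(len(corpus), "labeled texts after self-training")
```

The sketch keeps the two alternating steps separate so that the toy model could be swapped for any classifier that returns a label and a confidence score.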

INRIA a CCSD electronic archive server

HAL Descartes

HAL-Rennes 1