User-friendly automatic transcription of low-resource languages: Plugging ESPnet into Elpis
This paper reports on progress integrating the speech recognition toolkit ESPnet into Elpis, a web front-end originally designed to provide access to the Kaldi automatic speech recognition toolkit. The goal of this work is to make end-to-end speech recognition models available to language workers via a user-friendly graphical interface. Encouraging results are reported on (i) development of an ESPnet recipe for use in Elpis, with preliminary results on data sets previously used for training acoustic models with the Persephone toolkit along with a new data set that had not previously been used in speech recognition, and (ii) incorporating ESPnet into Elpis along with UI enhancements and a CUDA-supported Dockerfile.
Opening up access to automatic transcription for "field" linguists
Automatic speech processing is now beginning to realize its strong potential for the urgent tasks of describing the world's (rapidly declining) linguistic diversity. The goal of the work described here is to put state-of-the-art automatic transcription tools within reach of practitioners of "field" linguistics (linguists and their collaborators). A user-friendly graphical interface, Elpis, provides access to Kaldi and ESPnet. The results are particularly encouraging. On the one hand, the development of an ESPnet recipe for use in Elpis yields excellent results, both on two data sets previously used to train acoustic models with the Persephone toolkit and on a new data set (the Japhug language). On the other hand, the incorporation of ESPnet into Elpis comes with improvements to the user interface, easier installation through containerization (Docker), and the use of graphics processors (CUDA), which speeds up model training.
Integration of a neural phoneme recognition system and a simple language model: a processing pipeline for low-resource scenarios
Recently, several works have shown that fine-tuning a multilingual model of speech representation (typically XLS-R) with very small amounts of annotated data allows for the development of phonemic transcription systems of sufficient quality to help field linguists in their efforts to document the languages of the world. In this work, we explain how the quality of these systems can be improved by a very simple method, namely integrating them with a language model. Our experiments on an endangered language, Japhug (Trans-Himalayan/Tibeto-Burman), show that this approach can significantly reduce the Word Error Rate (WER), reaching the stage of automatic recognition of entire words.
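The language-model integration described above can be illustrated by shallow fusion, where each hypothesis's acoustic score is combined with a weighted language-model score. The following is a minimal sketch under simplified assumptions: the toy bigram table, the example word forms, and the weight `alpha` are purely illustrative and do not come from the paper.

```python
import math

def rescore(nbest, lm_logprob, alpha=0.5):
    """Shallow fusion: pick the hypothesis maximizing
    acoustic log-prob + alpha * LM log-prob.

    nbest: list of (word_sequence, acoustic_logprob) pairs.
    lm_logprob: callable returning the LM log-probability of a word sequence.
    """
    scored = [(hyp, ac + alpha * lm_logprob(hyp)) for hyp, ac in nbest]
    return max(scored, key=lambda x: x[1])[0]

# Toy bigram LM over a tiny, hypothetical vocabulary (illustrative only).
BIGRAMS = {("<s>", "ki"): 0.6, ("ki", "rjit"): 0.7, ("<s>", "ku"): 0.4}

def toy_lm(words):
    score, prev = 0.0, "<s>"
    for w in words:
        # Unseen bigrams get a small floor probability.
        score += math.log(BIGRAMS.get((prev, w), 1e-3))
        prev = w
    return score
```

With this toy LM, a slightly worse acoustic hypothesis that the LM strongly prefers can win the rescoring, which is exactly the effect the abstract reports on real data.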
Towards interconnected electronic resources: Lexica, the dictionaries of the Pangloss Collection
International audienceLa prĂ©sente communication expose lâĂ©tat dâavancement de rĂ©alisation de dictionnaires en ligne, Ă©tape dans lâentreprise de long terme qui consiste Ă tirer parti des nouvelles technologies pour relier entre elles les rĂ©alisations des linguistes dits "de terrain": grammaires, dictionnaires, et recueils de textes. Demain, dictionnaires et grammaires pourront non seulement ĂȘtre interconnectĂ©s, mais aussi liĂ©s aux textes qui forment le cĆur des donnĂ©es linguistiques, ainsi quâaux enregistrements audio et vidĂ©o de parole spontanĂ©e. Plus que de fixer une langue au moyen de lâimprimĂ©, il sâagit dĂ©sormais de lâoffrir Ă des modes nouveaux de navigation, en exploitant tout le potentiel de corpus en ligne, y compris par des traitements statistiques
Two transcribed audio corpora of low-resource languages (Japhug and Na), normalized for experiments in signal processing
Two audio corpora of minority languages of China (Japhug and Na), with transcriptions, are proposed as reference data sets for experiments in Natural Language Processing. The data, collected and transcribed in the course of immersion fieldwork, amount to a total of 1,907 minutes in Japhug and 209 minutes in Na. By making them available in an easily accessible and usable form, we hope to facilitate the development and deployment of state-of-the-art NLP tools for the full range of human languages. We present a tool for assembling datasets from the Pangloss Collection (an open archive) in a way that ensures full reproducibility of experiments conducted on these data.
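The reproducibility guarantee mentioned above hinges on stable dataset assembly. One common way to achieve this, sketched below as a general technique and not as the Pangloss tool's actual implementation, is to derive train/test membership from a hash of each item's identifier, so the split never changes between runs or machines:

```python
import hashlib

def deterministic_split(ids, test_ratio=0.1):
    """Stable train/test split: membership depends only on each item's ID,
    so re-running on an updated archive never moves old items across sets."""
    train, test = [], []
    for item_id in sorted(ids):
        # Hash the ID to a bucket in [0, 100); low buckets go to the test set.
        bucket = int(hashlib.sha256(item_id.encode()).hexdigest(), 16) % 100
        (test if bucket < test_ratio * 100 else train).append(item_id)
    return train, test
```

Because the assignment is a pure function of the identifier, two researchers assembling the same corpus version obtain byte-identical splits, which is the property needed for reproducible experiments.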
Fine-tuning neural models for phonemic transcription: first steps towards word recognition for low-resource languages
We describe the latest results we have obtained in the development of NLP (Natural Language Processing) tools to reduce the transcription and annotation workload of field linguists, as part of workflows to document and describe the world's languages. We show how a new deep learning approach based on the fine-tuning of a generic representation model allows us to significantly improve the quality of automatic phonemic transcription and, more significantly, to take a first step towards automatic word recognition for low-resource languages.
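Progress towards word recognition is conventionally measured with the Word Error Rate (WER): the word-level Levenshtein distance between reference and hypothesis, normalized by the reference length. A minimal sketch of the standard computation (the example sentences are invented, not from the corpus):

```python
def wer(ref, hyp):
    """Word Error Rate: edit distance over words / number of reference words."""
    r, h = ref.split(), hyp.split()
    # Dynamic-programming table for Levenshtein distance.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # deletions
    for j in range(len(h) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)
```

A WER of 0 means perfect word recognition; values near 1 mean almost every reference word required an edit.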
Fine-tuning pre-trained models for Automatic Speech Recognition: experiments on a fieldwork corpus of Japhug (Trans-Himalayan family)
This is a report on results obtained in the development of speech recognition tools intended to support linguistic documentation efforts. The test case is an extensive fieldwork corpus of Japhug, an endangered language of the Trans-Himalayan (Sino-Tibetan) family. The goal is to reduce the transcription workload of field linguists. The method used is a deep learning approach based on the language-specific tuning of a generic pre-trained representation model, XLS-R, using a Transformer architecture. We note difficulties in implementation, in terms of learning stability. But this approach brings significant improvements nonetheless. The quality of phonemic transcription is improved over earlier experiments; and most significantly, the new approach allows for reaching the stage of automatic word recognition. Subjective evaluation of the tool by the author of the training data confirms the usefulness of this approach.
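Fine-tuned XLS-R models of this kind are typically trained with a CTC objective, whose decoding step collapses repeated frame-level predictions and removes blank symbols. A minimal sketch of the standard greedy CTC rule follows; the integer label IDs are arbitrary placeholders, since the actual model and phoneme inventory are not given in the abstract.

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Standard greedy CTC collapse: merge consecutive repeats, drop blanks.

    frame_ids: per-frame argmax label IDs from the acoustic model.
    Returns the collapsed label sequence.
    """
    out, prev = [], None
    for i in frame_ids:
        if i != prev and i != blank:
            out.append(i)
        prev = i  # a blank between repeats allows a genuine double label
    return out
```

Note the role of the blank: two identical labels separated by a blank are kept as two output symbols, whereas an uninterrupted run collapses to one.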
Pre-trained models put to the test of low-resource languages: word recognition experiments on Japhug (Sino-Tibetan)
We describe in this work the latest results obtained in interdisciplinary work to support "fundamental language documentation" through the use of speech recognition tools. Specifically, the focus is on the development of a speech recognition system for Japhug, an endangered minority language of China. The practical goal is to reduce the transcription workload of field linguists. We show how a new deep learning approach based on the language-specific tuning of a generic pre-trained representation model, XLS-R, using a Transformer architecture, significantly improves the quality of phonemic transcription, in a setting where only a few hours of annotated data are available. Most significantly, this method allows for reaching the stage of automatic word recognition. Nevertheless, we note difficulties in implementation, in terms of learning stability. The question of the evaluation of the tool by field linguists is also addressed.
A Stromal Immune Module Correlated with the Response to Neoadjuvant Chemotherapy, Prognosis and Lymphocyte Infiltration in HER2-Positive Breast Carcinoma Is Inversely Correlated with Hormonal Pathways
Introduction: HER2-positive breast cancer (BC) is a heterogeneous group of aggressive breast cancers, the prognosis of which has greatly improved since the introduction of treatments targeting HER2. However, these tumors may display intrinsic or acquired resistance to treatment, and classifiers of HER2-positive tumors are required to improve the prediction of prognosis and to develop novel therapeutic interventions.
Methods: We analyzed 2,893 primary human breast cancer samples from 21 publicly available datasets and developed a six-metagene signature on a training set of 448 HER2-positive BC. We then used external public datasets to assess the ability of these metagenes to predict the response to chemotherapy (Ignatiadis dataset) and prognosis (METABRIC dataset).
Results: We identified a six-metagene signature (138 genes) containing metagenes enriched in different gene ontologies. The gene clusters were named as follows: Immunity, Tumor suppressors/proliferation, Interferon, Signal transduction, Hormone/survival, and Matrix. In all datasets, the Immunity metagene was less strongly expressed in ER-positive than in ER-negative tumors, and was inversely correlated with the Hormone/survival metagene. Within the signature, multivariate analyses showed that strong expression of the "Immunity" metagene was associated with higher pCR rates after NAC (OR = 3.71 [1.28–11.91], p = 0.019) than weak expression, and with a better prognosis in HER2-positive/ER-negative breast cancers (HR = 0.58 [0.36–0.94], p = 0.026). Immunity metagene expression was associated with the presence of tumor-infiltrating lymphocytes (TILs).
Conclusion: The identification of a predictive and prognostic immune module in HER2-positive BC confirms the need for clinical testing of immune checkpoint modulators and vaccines for this specific subtype. The inverse correlation between Immunity and hormone pathways opens research perspectives and deserves further investigation.