5 research outputs found
Multilingual enrichment of disease biomedical ontologies
International audienceTranslating biomedical ontologies is an important challenge, but doing it manually requires much time and money. We study the possibility to use open-source knowledge bases to translate biomedical ontologies. We focus on two aspects: coverage and quality. We look at the coverage of two biomedical ontologies focusing on diseases with respect to Wikidata for 9 European languages (Czech, Dutch, English, French, German, Italian, Polish, Portuguese and Spanish) for both ontologies, plus Arabic, Chinese and Russian for the second one. We first use direct links between Wikidata and the studied ontologies and then use second-order links by going through other intermediate ontologies. We then compare the quality of the translations obtained thanks to Wikidata with a commercial machine translation tool, here Google Cloud Translation
AMU-EURANOVA at CASE 2021 Task 1: Assessing the stability of multilingual BERT
International audienceThis paper explains our participation in task 1 of the CASE 2021 shared task. This task is about multilingual event extraction from news. We focused on sub-task 4, event information extraction. This sub-task has a small training dataset and we fine-tuned a multilingual BERT to solve this sub-task. We studied the instability problem on the dataset and tried to mitigate it
Pruning Random Forest with Orthogonal Matching Trees
In this paper we propose a new method to reduce the size of Breiman's Random Forests. Given a Random Forest and a target size, our algorithm builds a linear combination of trees which minimizes the training error. Selected trees, as well as weights of the linear combination are obtained by mean of the Orthogonal Matching Pursuit algorithm. We test our method on many public benchmark datasets both on regression and binary classification and we compare it to other pruning techniques. Experiments show that our technique performs significantly better or equally good on many datasets 1. We also discuss the benefit and shortcoming of learning weights for the pruned forest which lead us to propose to use a non-negative constraint on the OMP weights for better empirical results