24 research outputs found

    Proposal for a new model: towards greater transparency and independence of sovereign credit rating agencies

    Following the subprime and Greek debt crises, bond markets have been thoroughly upended. Today a large part of the globe is over-indebted, leaving the fiscal policies of the countries concerned little room for manoeuvre. By borrowing so heavily, these states have become dependent on the supply-and-demand mechanism that sets the interest rates on their debt. The sovereign credit risk rated by the agencies therefore plays a major role in this process, since a risky country will pay higher interest than a healthy economy. This work proposes an analytical approach to the credit rating agency system. After introducing the agencies' framework, we show that numerous inconsistencies persist in this model. First, the regulations in force today make the agencies an essential actor of the current system, with considerable power over the financing rates of states. Second, the present market structure concentrates more than 95% of the market in only three agencies whose primary objective is profit. Finally, we argue that the current organization leaves too much room for potential conflicts of interest within the rating processes, and that the agencies' communication is generally very opaque, when it is not approximate or outright vague. Starting from the premise that the interest rates paid by sovereign entities are a matter of public interest, and that rating agencies influence those rates, this work puts forward four recommendations intended to make sovereign credit rating safer, more objective, and less volatile. We first recommend establishing a monopoly whose sole actor is an objective, neutral, and independent rating agency. We then stress the need to make all sovereign rating activity fully transparent. Finally, we believe it would also be sensible to take substitute credit-risk indicators into account and to adapt the current rating scale, making ratings 'smoother' through a numeric scale with continuous increments.
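    The last recommendation lends itself to a brief illustration. Below is a minimal sketch of how discrete letter grades could be mapped onto a continuously incremented numeric scale; the ten-notch ladder and the fractional "outlook" adjustment are hypothetical, not taken from the thesis.

```python
# Hypothetical scale: discrete letter grades plus a fractional adjustment,
# giving a numeric score that can move in continuous increments.
LETTER_GRADES = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "CC", "C", "D"]

def to_continuous(letter: str, outlook: float = 0.0) -> float:
    """Map a letter grade and an outlook in [-0.5, 0.5) to a numeric score.

    Lower scores mean better credit quality; the fractional part lets the
    rating drift smoothly instead of jumping between discrete notches.
    """
    return LETTER_GRADES.index(letter) + outlook

print(to_continuous("BBB", 0.25))  # 3.25: slightly weaker than a flat BBB
```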

    Named entity recognition in chemical patents using ensemble of contextual language models

    Chemical patent documents describe a broad range of applications holding key reaction and compound information, such as chemical structures, reaction formulas, and molecular properties. These informational entities must first be identified in text passages before they can be used in downstream tasks. Text mining provides means to extract relevant information from chemical patents through information extraction techniques. As part of the Information Extraction task of the Cheminformatics Elsevier Melbourne University challenge, in this work we study the effectiveness of contextualized language models for extracting reaction information from chemical patents. We assess transformer architectures trained on generic and specialised corpora to propose a new ensemble model. Our best model, based on a majority ensemble approach, achieves an exact F1-score of 92.30% and a relaxed F1-score of 96.24%. The results show that ensembles of contextualized language models provide an effective method to extract information from chemical patents.
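    The majority-ensemble approach can be sketched compactly. Below is a minimal, hypothetical example of token-level majority voting over aligned label sequences; the label names and model outputs are illustrative, not from the challenge data.

```python
from collections import Counter

def majority_vote(predictions: list[list[str]]) -> list[str]:
    """Combine per-token label sequences from several NER models.

    predictions: one label sequence per model, all aligned to the same tokens.
    Returns the most frequent label per token (ties broken by first seen).
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]

# Three hypothetical model outputs for the tokens of one patent sentence.
model_outputs = [
    ["B-REACTION", "I-REACTION", "O", "B-COMPOUND"],
    ["B-REACTION", "O",          "O", "B-COMPOUND"],
    ["B-REACTION", "I-REACTION", "O", "O"],
]
print(majority_vote(model_outputs))
# ['B-REACTION', 'I-REACTION', 'O', 'B-COMPOUND']
```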

    Multilingual RECIST classification of radiology reports using supervised learning

    OBJECTIVES The objective of this study is to explore Artificial Intelligence and Natural Language Processing techniques supporting the automatic assignment of the four Response Evaluation Criteria in Solid Tumors (RECIST) scales based on radiology reports. We also evaluate how the languages and institutional specificities of Swiss teaching hospitals affect the quality of the classification in French and German. METHODS In our approach, 7 machine learning methods were evaluated to establish a strong baseline. Then, robust models were built, fine-tuned according to the language (French and German), and compared with the expert annotation. RESULTS The best strategies yield average F1-scores of 90% and 86% for the 2-class (Progressive/Non-progressive) and the 4-class (Progressive Disease, Stable Disease, Partial Response, Complete Response) RECIST classification tasks, respectively. CONCLUSIONS These results are competitive with manual labeling as measured by the Matthews correlation coefficient and Cohen's kappa (79% and 76%). On this basis, we confirm the capacity of specific models to generalize to new, unseen data and we assess the impact of using Pre-trained Language Models (PLMs) on the accuracy of the classifiers.
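    As one illustration of the kind of classical baseline such a study compares against, the sketch below trains a TF-IDF plus logistic regression pipeline for the 4-class RECIST task. The toy reports and labels are invented, and the study's actual 7 methods are not reproduced here.

```python
# Minimal sketch of a classical text-classification baseline for RECIST.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented placeholder reports; real data would be de-identified radiology text.
reports = [
    "Target lesions increased by more than 20% since baseline exam.",
    "No significant change in target lesions compared to prior exam.",
]
labels = ["Progressive Disease", "Stable Disease"]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),      # unigram + bigram features
    LogisticRegression(max_iter=1000),        # one-vs-rest multinomial classifier
)
clf.fit(reports, labels)
print(clf.predict(["Lesions appear stable compared to prior imaging."]))
```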

    Algorithmic methods to explore the automation of the appraisal of structured and unstructured digital data

    This paper describes interdisciplinary and innovative research conducted in Switzerland at the Geneva School of Business Administration HES-SO and supported by the State Archives of Neuchñtel (Office des archives de l'État de Neuchñtel, OAEN). The problem to be addressed is one of the most classical ones: how to extract and discriminate relevant data in a huge amount of diversified and complex data record formats and contents. The goal of this study is to provide a framework and a proof of concept for software that supports defensible decisions on the retention and disposal of records and data proposed to the OAEN. For this purpose, the authors designed two axes: an archival axis, proposing archival metrics for the appraisal of structured and unstructured data, and a data mining axis, proposing algorithmic methods as complementary and/or additional metrics for the appraisal process.
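    The two-axis idea can be made concrete with a small, heavily hypothetical sketch: archival metrics and one mined signal are combined into a single retention score per record. All metric names and weights below are invented placeholders, not the metrics proposed by the authors.

```python
# Hypothetical appraisal score blending archival metrics with a mined signal.
def appraisal_score(record: dict) -> float:
    """Higher scores argue for retention; the cutoff is a policy decision."""
    archival = (
        0.4 * record["evidential_value"]    # archival metric, 0..1
        + 0.3 * record["uniqueness"]        # archival metric, 0..1
    )
    data_mining = 0.3 * (1.0 - record["duplicate_ratio"])  # mined signal, 0..1
    return archival + data_mining

record = {"evidential_value": 0.9, "uniqueness": 0.7, "duplicate_ratio": 0.2}
print(f"{appraisal_score(record):.2f}")  # 0.81
```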

    SIB text mining at TREC 2020 deep learning track

    This second campaign of the TREC Deep Learning Track was an opportunity for us to experiment with reranking techniques based on deep neural language models in a realistic use case. This year's tasks were the same as in the previous edition: (1) building a reranking system and (2) building an end-to-end retrieval system. Both tasks could be completed on a document and on a passage collection. In this paper, we describe how we coupled the Anserini information retrieval toolkit with a BERT-based classifier to build a state-of-the-art end-to-end retrieval system. Our only submission, which is based on a RoBERTa large pretrained model, achieves an ndcg@10 of 0.6558 and 0.6295 for passages and documents respectively on task (1), and an ndcg@10 of 0.6614 and 0.6404 for passages and documents respectively on task (2).
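    The two-stage architecture described above can be sketched as follows. The snippet uses rank_bm25 and a generic MS MARCO cross-encoder checkpoint as stand-ins; the actual run coupled Anserini with a RoBERTa-large model, which is not reproduced here.

```python
# Stage 1: sparse BM25 retrieval; stage 2: neural reranking of the candidates.
import torch
from rank_bm25 import BM25Okapi
from transformers import AutoModelForSequenceClassification, AutoTokenizer

corpus = [
    "deep learning methods for passage ranking",
    "bm25 is a strong sparse retrieval baseline",
    "neural rerankers score query passage pairs",
]
bm25 = BM25Okapi([doc.split() for doc in corpus])

query = "neural reranking"
scores = bm25.get_scores(query.split())
top_k = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:2]

# Score each (query, passage) candidate pair with a cross-encoder.
name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
with torch.no_grad():
    batch = tokenizer([query] * len(top_k), [corpus[i] for i in top_k],
                      padding=True, truncation=True, return_tensors="pt")
    rerank_scores = model(**batch).logits.squeeze(-1)
print(sorted(zip(rerank_scores.tolist(), top_k), reverse=True))
```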

    Ensemble of deep masked language models for effective named entity recognition in health and life science corpora

    The health and life science domains are well known for their wealth of named entities found in large free-text corpora, such as scientific literature and electronic health records. To unlock the value of such corpora, named entity recognition (NER) methods are proposed. Inspired by the success of transformer-based pretrained models for NER, we assess how individual deep masked language models and their ensembles perform across corpora of different health and life science domains (biology, chemistry, and medicine) available in different languages (English and French). Individual deep masked language models, pretrained on external corpora, are fine-tuned on task-specific domain and language corpora and ensembled using classical majority voting strategies. Experiments show a statistically significant improvement of the ensemble models over an individual BERT-based baseline model, with an overall best performance of 77% macro F1-score. We further perform a detailed analysis of the ensemble results and show how their effectiveness changes according to entity properties, such as length, corpus frequency, and annotation consistency. The results suggest that ensembles of deep masked language models are an effective strategy for tackling NER across corpora from the health and life science domains.
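    The property-based analysis mentioned above can be illustrated with a toy example: gold entities are bucketed by token length, and the exact-match recall of a baseline is compared with that of an ensemble. All entities and predictions below are invented.

```python
# Bucket gold entities by length and compare baseline vs. ensemble recall.
from collections import defaultdict

gold = ["interleukin 6", "aspirin", "tumor necrosis factor alpha"]
found_baseline = {"aspirin"}
found_ensemble = {"aspirin", "interleukin 6"}

hits = defaultdict(lambda: [0, 0, 0])  # bucket -> [gold, baseline, ensemble]
for entity in gold:
    bucket = "short" if len(entity.split()) <= 2 else "long"
    hits[bucket][0] += 1
    hits[bucket][1] += entity in found_baseline
    hits[bucket][2] += entity in found_ensemble

for bucket, (n, base, ens) in hits.items():
    print(f"{bucket}: recall baseline {base/n:.2f} -> ensemble {ens/n:.2f}")
```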

    Classification of hierarchical text using geometric deep learning

    We consider the hierarchical representation of documents as graphs and use geometric deep learning to classify them into different categories. While graph neural networks can efficiently handle the variable structure of hierarchical documents using permutation-invariant message passing operations, we show that we can gain extra performance improvements using our proposed selective graph pooling operation, which exploits the fact that some parts of the hierarchy are invariable across different documents. We applied our model to classify clinical trial (CT) protocols into completed and terminated categories. We use bag-of-words as well as pre-trained transformer-based embeddings to featurize the graph nodes, achieving F1-scores of approximately 0.85 on a publicly available large-scale CT registry of around 360K protocols. We further demonstrate how the selective pooling can add insights into the CT termination status prediction. We make the source code and dataset splits accessible.
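    A minimal sketch of the selective-pooling idea follows, using PyTorch Geometric: only the nodes flagged as variable across documents are pooled before classification. The two-layer GCN, feature sizes, and mask are illustrative, not the paper's architecture.

```python
import torch
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

class SelectivePoolClassifier(torch.nn.Module):
    def __init__(self, in_dim: int, hidden: int, n_classes: int):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.out = torch.nn.Linear(hidden, n_classes)

    def forward(self, data: Data) -> torch.Tensor:
        h = self.conv1(data.x, data.edge_index).relu()
        h = self.conv2(h, data.edge_index)
        # Selective pooling: average only the document-variable nodes.
        pooled = h[data.pool_mask].mean(dim=0)
        return self.out(pooled)

# A toy 4-node hierarchy where only nodes 2 and 3 vary across documents.
data = Data(
    x=torch.randn(4, 8),
    edge_index=torch.tensor([[0, 0, 1], [1, 2, 3]]),
    pool_mask=torch.tensor([False, False, True, True]),
)
model = SelectivePoolClassifier(in_dim=8, hidden=16, n_classes=2)
print(model(data))  # logits for the completed/terminated classes
```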

    BiTeM at WNUT 2020 shared task-1: named entity recognition over wet lab protocols using an ensemble of contextual language models

    Recent improvements in machine-reading technologies have attracted much attention to automation problems and their possibilities. In this context, WNUT 2020 introduced a Named Entity Recognition (NER) task based on wet laboratory procedures. In this paper, we present a 3-step method based on deep neural language models that reported the best overall exact-match F1-score (77.99%) of the competition. By fine-tuning 10 different pretrained language models 10 times each, this work shows the advantage of having more models in a majority-vote ensemble. On top of that, having 100 different models allowed us to analyse ensemble combinations, demonstrating the impact of combining multiple pretrained models versus fine-tuning a single pretrained model multiple times.
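    The combination analysis can be sketched as below: given predictions indexed by (architecture, seed), a majority vote across k distinct architectures is compared with one across k seeds of a single architecture. The random predictions are stand-ins for real model outputs.

```python
import random
from collections import Counter

random.seed(0)
ARCHS, SEEDS, N_TOKENS = 10, 10, 50
# One label sequence per (architecture, seed) pair: 100 models in total.
preds = {(a, s): [random.choice("BIO") for _ in range(N_TOKENS)]
         for a in range(ARCHS) for s in range(SEEDS)}

def vote(members):
    """Token-level majority vote over the selected ensemble members."""
    cols = zip(*(preds[m] for m in members))
    return [Counter(col).most_common(1)[0][0] for col in cols]

across_archs = vote([(a, 0) for a in range(5)])  # 5 architectures, one seed
across_seeds = vote([(0, s) for s in range(5)])  # 1 architecture, 5 seeds
print(across_archs[:10])
print(across_seeds[:10])
```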

    UPCLASS: a deep learning-based classifier for UniProtKB entry publications

    In the UniProt Knowledgebase (UniProtKB), publications providing evidence for a specific protein annotation entry are organized across different categories, such as function, interaction and expression, based on the type of data they contain. To provide a systematic way of categorizing computationally mapped bibliographies in UniProt, we investigate a convolutional neural network (CNN) model to classify publications with accession annotations according to UniProtKB categories. The main challenge of categorizing publications at the accession annotation level is that the same publication can be annotated with multiple proteins and thus be associated with different category sets according to the evidence provided for each protein. We propose a model that divides the document into parts containing and not containing evidence for the protein annotation. Then, we use these parts to create different feature sets for each accession and feed them to separate layers of the network. The CNN model achieved a micro F1-score of 0.72 and a macro F1-score of 0.62, outperforming baseline models based on logistic regression and support vector machines by up to 22 and 18 percentage points, respectively. We believe that such an approach could be used to systematically categorize the computationally mapped bibliography in UniProtKB, which represents a significant portion of the publications, and help curators decide whether a publication is relevant for further curation for a protein accession.
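    The two-branch design described above can be sketched in PyTorch: the evidence-bearing part and the rest of the document flow through separate convolutional branches before a shared multi-label output layer. Dimensions, vocabulary size, and the category count below are illustrative.

```python
import torch
import torch.nn as nn

class TwoBranchCNN(nn.Module):
    def __init__(self, vocab: int, emb: int, n_filters: int, n_categories: int):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.conv_evidence = nn.Conv1d(emb, n_filters, kernel_size=3, padding=1)
        self.conv_rest = nn.Conv1d(emb, n_filters, kernel_size=3, padding=1)
        self.out = nn.Linear(2 * n_filters, n_categories)

    def branch(self, conv, token_ids):
        x = self.embed(token_ids).transpose(1, 2)   # (batch, emb, seq)
        return conv(x).relu().max(dim=2).values     # global max pooling

    def forward(self, evidence_ids, rest_ids):
        feats = torch.cat([self.branch(self.conv_evidence, evidence_ids),
                           self.branch(self.conv_rest, rest_ids)], dim=1)
        return self.out(feats)  # one logit per category (multi-label)

model = TwoBranchCNN(vocab=5000, emb=64, n_filters=32, n_categories=11)
logits = model(torch.randint(0, 5000, (2, 40)), torch.randint(0, 5000, (2, 200)))
print(torch.sigmoid(logits).shape)  # (2, 11): per-category probabilities
```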

    SIB Text Mining at TREC 2019 Deep Learning Track: Working Note

    The TREC 2019 Deep Learning task aims at studying information retrieval in a large training data regime. It includes two tasks: the document ranking task (1) and the passage ranking task (2). Both of these tasks have a full ranking (a) and a reranking (b) subtask. The SIB Text Mining group participated in the full document ranking subtask (1a). To retrieve pertinent documents in the 3.2-million-document corpus, our strategy was two-fold. First, we used a BM25 model to retrieve a subset of documents relevant to a query; we also tried to improve recall through query expansion. The second step consisted of reranking the retrieved subset using an original model, so-called query2doc. This model, designed to predict whether a query-document pair is a good candidate to be ranked in position #1, was trained using the training dataset provided for the task. Our baseline, which is essentially a BM25 ranking, performed best and achieved a MAP of 0.2892. Results of the query2doc run clearly indicate that the query2doc model could not learn any meaningful relationship. To explain this failure, we hypothesize that using documents returned by our baseline model as negative items confused the model. As future steps, it will be interesting to take into account features such as a document's BM25 score and the number of times a document's URL is mentioned in the corpus, and to use them with learning-to-rank algorithms.
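    The negative-sampling hypothesis can be made concrete with a small sketch contrasting negatives drawn from the BM25 results (as in the run) with negatives drawn uniformly from the corpus; the data structures below are illustrative.

```python
import random

random.seed(1)
corpus_ids = list(range(1000))
qrels = {"q1": {42}, "q2": {7, 99}}                 # relevant doc ids per query
bm25_results = {"q1": [42, 13, 55], "q2": [7, 99, 3]}

def training_pairs(query: str, use_bm25_negatives: bool):
    """Build labeled (query, doc, relevance) triples for a query2doc-style model."""
    positives = [(query, d, 1) for d in qrels[query]]
    pool = bm25_results[query] if use_bm25_negatives else corpus_ids
    negatives = [(query, d, 0)
                 for d in random.sample(pool, 2) if d not in qrels[query]]
    return positives + negatives

# BM25 negatives are near-relevant, which may have confused the classifier;
# random negatives give a cleaner (if easier) decision boundary.
print(training_pairs("q1", use_bm25_negatives=True))
print(training_pairs("q1", use_bm25_negatives=False))
```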