24 research outputs found

    Proposal for a new model: towards greater transparency and independence of sovereign credit rating agencies

    Following the subprime and Greek debt crises, bond markets have been thoroughly upended. Today a large part of the globe is over-indebted, leaving the fiscal policies of the countries concerned little room for manoeuvre. By borrowing so heavily, these states have become dependent on the supply-and-demand mechanism that sets the interest rates on their debt. The sovereign credit risk rated by the agencies therefore plays a major role in this process, since a risky country will pay higher interest than a healthy economy. This work proposes an analytical approach to the credit rating agency system. After introducing the agencies' framework, we show that numerous inconsistencies persist in this model. First, the regulations in force today make the agencies an essential actor of the current system, with considerable power over the financing rates of states. Second, the present market structure concentrates more than 95% of the market in only three agencies whose primary objective is profit. Finally, we argue that the current organization leaves too much room for potential conflicts of interest within the rating processes, and that the agencies' communication is generally very opaque, when it is not approximate or outright vague. Starting from the premise that the interest rates paid by sovereign entities are a matter of public interest, and that rating agencies influence those rates, this work puts forward four recommendations intended to make sovereign credit rating safer, more objective, and less volatile. We first recommend establishing a monopoly whose sole actor is an objective, neutral, and independent rating agency. We then stress the need to make all sovereign rating activity fully transparent. Finally, we believe it would also be sensible to take substitute credit-risk indicators into account and to adapt the current rating scale, making ratings 'smoother' through a numeric scale with continuous increments.
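    The last recommendation lends itself to a brief illustration. Below is a minimal sketch of how discrete letter grades could be mapped onto a continuously incremented numeric scale; the ten-notch ladder and the fractional "outlook" adjustment are hypothetical, not taken from the thesis.

```python
# Hypothetical scale: discrete letter grades plus a fractional adjustment,
# giving a numeric score that can move in continuous increments.
LETTER_GRADES = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "CC", "C", "D"]

def to_continuous(letter: str, outlook: float = 0.0) -> float:
    """Map a letter grade and an outlook in [-0.5, 0.5) to a numeric score.

    Lower scores mean better credit quality; the fractional part lets the
    rating drift smoothly instead of jumping between discrete notches.
    """
    return LETTER_GRADES.index(letter) + outlook

print(to_continuous("BBB", 0.25))  # 3.25: slightly weaker than a flat BBB
```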

    Named entity recognition in chemical patents using ensemble of contextual language models

    Chemical patent documents describe a broad range of applications holding key reaction and compound information, such as chemical structures, reaction formulas, and molecular properties. These informational entities must first be identified in text passages before they can be used in downstream tasks. Text mining provides means to extract relevant information from chemical patents through information extraction techniques. As part of the Information Extraction task of the Cheminformatics Elsevier Melbourne University challenge, in this work we study the effectiveness of contextualized language models for extracting reaction information from chemical patents. We assess transformer architectures trained on generic and specialised corpora to propose a new ensemble model. Our best model, based on a majority ensemble approach, achieves an exact F1-score of 92.30% and a relaxed F1-score of 96.24%. The results show that ensembles of contextualized language models provide an effective method to extract information from chemical patents.
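    The majority-ensemble approach can be sketched compactly. Below is a minimal, hypothetical example of token-level majority voting over aligned label sequences; the label names and model outputs are illustrative, not from the challenge data.

```python
from collections import Counter

def majority_vote(predictions: list[list[str]]) -> list[str]:
    """Combine per-token label sequences from several NER models.

    predictions: one label sequence per model, all aligned to the same tokens.
    Returns the most frequent label per token (ties broken by first seen).
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]

# Three hypothetical model outputs for the tokens of one patent sentence.
model_outputs = [
    ["B-REACTION", "I-REACTION", "O", "B-COMPOUND"],
    ["B-REACTION", "O",          "O", "B-COMPOUND"],
    ["B-REACTION", "I-REACTION", "O", "O"],
]
print(majority_vote(model_outputs))
# ['B-REACTION', 'I-REACTION', 'O', 'B-COMPOUND']
```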

    Multilingual RECIST classification of radiology reports using supervised learning

    OBJECTIVES The objective of this study is to explore Artificial Intelligence and Natural Language Processing techniques supporting the automatic assignment of the four Response Evaluation Criteria in Solid Tumors (RECIST) scales based on radiology reports. We also evaluate how the languages and institutional specificities of Swiss teaching hospitals affect the quality of the classification in French and German. METHODS In our approach, 7 machine learning methods were evaluated to establish a strong baseline. Then, robust models were built, fine-tuned according to the language (French and German), and compared with the expert annotation. RESULTS The best strategies yield average F1-scores of 90% and 86% for the 2-class (Progressive/Non-progressive) and the 4-class (Progressive Disease, Stable Disease, Partial Response, Complete Response) RECIST classification tasks, respectively. CONCLUSIONS These results are competitive with manual labeling as measured by the Matthews correlation coefficient and Cohen's kappa (79% and 76%). On this basis, we confirm the capacity of specific models to generalize to new, unseen data and we assess the impact of using Pre-trained Language Models (PLMs) on the accuracy of the classifiers.
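    As one illustration of the kind of classical baseline such a study compares against, the sketch below trains a TF-IDF plus logistic regression pipeline for the 4-class RECIST task. The toy reports and labels are invented, and the study's actual 7 methods are not reproduced here.

```python
# Minimal sketch of a classical text-classification baseline for RECIST.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented placeholder reports; real data would be de-identified radiology text.
reports = [
    "Target lesions increased by more than 20% since baseline exam.",
    "No significant change in target lesions compared to prior exam.",
]
labels = ["Progressive Disease", "Stable Disease"]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),      # unigram + bigram features
    LogisticRegression(max_iter=1000),        # one-vs-rest multinomial classifier
)
clf.fit(reports, labels)
print(clf.predict(["Lesions appear stable compared to prior imaging."]))
```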

    Algorithmic methods to explore the automation of the appraisal of structured and unstructured digital data

    This paper describes interdisciplinary and innovative research conducted in Switzerland at the Geneva School of Business Administration HES-SO and supported by the State Archives of Neuchñtel (Office des archives de l'État de Neuchñtel, OAEN). The problem to be addressed is one of the most classical ones: how to extract and discriminate relevant data in a huge amount of diversified and complex data record formats and contents. The goal of this study is to provide a framework and a proof of concept for software that supports defensible decisions on the retention and disposal of records and data proposed to the OAEN. For this purpose, the authors designed two axes: an archival axis, proposing archival metrics for the appraisal of structured and unstructured data, and a data mining axis, proposing algorithmic methods as complementary and/or additional metrics for the appraisal process.
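    The two-axis idea can be made concrete with a small, heavily hypothetical sketch: archival metrics and one mined signal are combined into a single retention score per record. All metric names and weights below are invented placeholders, not the metrics proposed by the authors.

```python
# Hypothetical appraisal score blending archival metrics with a mined signal.
def appraisal_score(record: dict) -> float:
    """Higher scores argue for retention; the cutoff is a policy decision."""
    archival = (
        0.4 * record["evidential_value"]    # archival metric, 0..1
        + 0.3 * record["uniqueness"]        # archival metric, 0..1
    )
    data_mining = 0.3 * (1.0 - record["duplicate_ratio"])  # mined signal, 0..1
    return archival + data_mining

record = {"evidential_value": 0.9, "uniqueness": 0.7, "duplicate_ratio": 0.2}
print(f"{appraisal_score(record):.2f}")  # 0.81
```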

    SIB text mining at TREC 2020 deep learning track

    This second campaign of the TREC Deep Learning Track was an opportunity for us to experiment with reranking techniques based on deep neural language models in a realistic use case. This year's tasks were the same as in the previous edition: (1) building a reranking system and (2) building an end-to-end retrieval system. Both tasks could be completed on a document and on a passage collection. In this paper, we describe how we coupled the Anserini information retrieval toolkit with a BERT-based classifier to build a state-of-the-art end-to-end retrieval system. Our only submission, which is based on a RoBERTa large pretrained model, achieves an ndcg@10 of 0.6558 and 0.6295 for passages and documents respectively on task (1), and an ndcg@10 of 0.6614 and 0.6404 for passages and documents respectively on task (2).
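    The two-stage architecture described above can be sketched as follows. The snippet uses rank_bm25 and a generic MS MARCO cross-encoder checkpoint as stand-ins; the actual run coupled Anserini with a RoBERTa-large model, which is not reproduced here.

```python
# Stage 1: sparse BM25 retrieval; stage 2: neural reranking of the candidates.
import torch
from rank_bm25 import BM25Okapi
from transformers import AutoModelForSequenceClassification, AutoTokenizer

corpus = [
    "deep learning methods for passage ranking",
    "bm25 is a strong sparse retrieval baseline",
    "neural rerankers score query passage pairs",
]
bm25 = BM25Okapi([doc.split() for doc in corpus])

query = "neural reranking"
scores = bm25.get_scores(query.split())
top_k = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:2]

# Score each (query, passage) candidate pair with a cross-encoder.
name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
with torch.no_grad():
    batch = tokenizer([query] * len(top_k), [corpus[i] for i in top_k],
                      padding=True, truncation=True, return_tensors="pt")
    rerank_scores = model(**batch).logits.squeeze(-1)
print(sorted(zip(rerank_scores.tolist(), top_k), reverse=True))
```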

    Ensemble of deep masked language models for effective named entity recognition in health and life science corpora

    The health and life science domains are well known for their wealth of named entities found in large free-text corpora, such as scientific literature and electronic health records. To unlock the value of such corpora, named entity recognition (NER) methods are proposed. Inspired by the success of transformer-based pretrained models for NER, we assess how individual deep masked language models and their ensembles perform across corpora of different health and life science domains (biology, chemistry, and medicine) available in different languages (English and French). Individual deep masked language models, pretrained on external corpora, are fine-tuned on task-specific domain and language corpora and ensembled using classical majority voting strategies. Experiments show a statistically significant improvement of the ensemble models over an individual BERT-based baseline model, with an overall best performance of 77% macro F1-score. We further perform a detailed analysis of the ensemble results and show how their effectiveness changes according to entity properties, such as length, corpus frequency, and annotation consistency. The results suggest that ensembles of deep masked language models are an effective strategy for tackling NER across corpora from the health and life science domains.
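    The property-based analysis mentioned above can be illustrated with a toy example: gold entities are bucketed by token length, and the exact-match recall of a baseline is compared with that of an ensemble. All entities and predictions below are invented.

```python
# Bucket gold entities by length and compare baseline vs. ensemble recall.
from collections import defaultdict

gold = ["interleukin 6", "aspirin", "tumor necrosis factor alpha"]
found_baseline = {"aspirin"}
found_ensemble = {"aspirin", "interleukin 6"}

hits = defaultdict(lambda: [0, 0, 0])  # bucket -> [gold, baseline, ensemble]
for entity in gold:
    bucket = "short" if len(entity.split()) <= 2 else "long"
    hits[bucket][0] += 1
    hits[bucket][1] += entity in found_baseline
    hits[bucket][2] += entity in found_ensemble

for bucket, (n, base, ens) in hits.items():
    print(f"{bucket}: recall baseline {base/n:.2f} -> ensemble {ens/n:.2f}")
```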

    Classification of hierarchical text using geometric deep learning

    We consider the hierarchical representation of documents as graphs and use geometric deep learning to classify them into different categories. While graph neural networks can efficiently handle the variable structure of hierarchical documents using permutation-invariant message passing operations, we show that we can gain extra performance improvements using our proposed selective graph pooling operation, which exploits the fact that some parts of the hierarchy are invariable across different documents. We applied our model to classify clinical trial (CT) protocols into completed and terminated categories. We use bag-of-words as well as pre-trained transformer-based embeddings to featurize the graph nodes, achieving F1-scores of approximately 0.85 on a publicly available large-scale CT registry of around 360K protocols. We further demonstrate how the selective pooling can add insights into the CT termination status prediction. We make the source code and dataset splits accessible.
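    A minimal sketch of the selective-pooling idea follows, using PyTorch Geometric: only the nodes flagged as variable across documents are pooled before classification. The two-layer GCN, feature sizes, and mask are illustrative, not the paper's architecture.

```python
import torch
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

class SelectivePoolClassifier(torch.nn.Module):
    def __init__(self, in_dim: int, hidden: int, n_classes: int):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.out = torch.nn.Linear(hidden, n_classes)

    def forward(self, data: Data) -> torch.Tensor:
        h = self.conv1(data.x, data.edge_index).relu()
        h = self.conv2(h, data.edge_index)
        # Selective pooling: average only the document-variable nodes.
        pooled = h[data.pool_mask].mean(dim=0)
        return self.out(pooled)

# A toy 4-node hierarchy where only nodes 2 and 3 vary across documents.
data = Data(
    x=torch.randn(4, 8),
    edge_index=torch.tensor([[0, 0, 1], [1, 2, 3]]),
    pool_mask=torch.tensor([False, False, True, True]),
)
model = SelectivePoolClassifier(in_dim=8, hidden=16, n_classes=2)
print(model(data))  # logits for the completed/terminated classes
```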

    BiTeM at WNUT 2020 shared task-1: named entity recognition over wet lab protocols using an ensemble of contextual language models

    Recent improvements in machine-reading technologies have attracted much attention to automation problems and their possibilities. In this context, WNUT 2020 introduced a Named Entity Recognition (NER) task based on wet laboratory procedures. In this paper, we present a 3-step method based on deep neural language models that reported the best overall exact-match F1-score (77.99%) of the competition. By fine-tuning 10 different pretrained language models 10 times each, this work shows the advantage of having more models in a majority-vote ensemble. On top of that, having 100 different models allowed us to analyse ensemble combinations, demonstrating the impact of combining multiple pretrained models versus fine-tuning a single pretrained model multiple times.
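    The combination analysis can be sketched as below: given predictions indexed by (architecture, seed), a majority vote across k distinct architectures is compared with one across k seeds of a single architecture. The random predictions are stand-ins for real model outputs.

```python
import random
from collections import Counter

random.seed(0)
ARCHS, SEEDS, N_TOKENS = 10, 10, 50
# One label sequence per (architecture, seed) pair: 100 models in total.
preds = {(a, s): [random.choice("BIO") for _ in range(N_TOKENS)]
         for a in range(ARCHS) for s in range(SEEDS)}

def vote(members):
    """Token-level majority vote over the selected ensemble members."""
    cols = zip(*(preds[m] for m in members))
    return [Counter(col).most_common(1)[0][0] for col in cols]

across_archs = vote([(a, 0) for a in range(5)])  # 5 architectures, one seed
across_seeds = vote([(0, s) for s in range(5)])  # 1 architecture, 5 seeds
print(across_archs[:10])
print(across_seeds[:10])
```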

    UPCLASS: a deep learning-based classifier for UniProtKB entry publications

    In the UniProt Knowledgebase (UniProtKB), publications providing evidence for a specific protein annotation entry are organized across different categories, such as function, interaction and expression, based on the type of data they contain. To provide a systematic way of categorizing computationally mapped bibliographies in UniProt, we investigate a convolutional neural network (CNN) model to classify publications with accession annotations according to UniProtKB categories. The main challenge of categorizing publications at the accession annotation level is that the same publication can be annotated with multiple proteins and thus be associated with different category sets according to the evidence provided for each protein. We propose a model that divides the document into parts containing and not containing evidence for the protein annotation. Then, we use these parts to create different feature sets for each accession and feed them to separate layers of the network. The CNN model achieved a micro F1-score of 0.72 and a macro F1-score of 0.62, outperforming baseline models based on logistic regression and support vector machines by up to 22 and 18 percentage points, respectively. We believe that such an approach could be used to systematically categorize the computationally mapped bibliography in UniProtKB, which represents a significant portion of the publications, and help curators decide whether a publication is relevant for further curation for a protein accession.
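    The two-branch design described above can be sketched in PyTorch: the evidence-bearing part and the rest of the document flow through separate convolutional branches before a shared multi-label output layer. Dimensions, vocabulary size, and the category count below are illustrative.

```python
import torch
import torch.nn as nn

class TwoBranchCNN(nn.Module):
    def __init__(self, vocab: int, emb: int, n_filters: int, n_categories: int):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.conv_evidence = nn.Conv1d(emb, n_filters, kernel_size=3, padding=1)
        self.conv_rest = nn.Conv1d(emb, n_filters, kernel_size=3, padding=1)
        self.out = nn.Linear(2 * n_filters, n_categories)

    def branch(self, conv, token_ids):
        x = self.embed(token_ids).transpose(1, 2)   # (batch, emb, seq)
        return conv(x).relu().max(dim=2).values     # global max pooling

    def forward(self, evidence_ids, rest_ids):
        feats = torch.cat([self.branch(self.conv_evidence, evidence_ids),
                           self.branch(self.conv_rest, rest_ids)], dim=1)
        return self.out(feats)  # one logit per category (multi-label)

model = TwoBranchCNN(vocab=5000, emb=64, n_filters=32, n_categories=11)
logits = model(torch.randint(0, 5000, (2, 40)), torch.randint(0, 5000, (2, 200)))
print(torch.sigmoid(logits).shape)  # (2, 11): per-category probabilities
```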

    SIB Text Mining at TREC 2019 Deep Learning Track: Working Note

    The TREC 2019 Deep Learning task aims at studying information retrieval in a large training data regime. It includes two tasks: the document ranking task (1) and the passage ranking task (2). Both of these tasks have a full ranking (a) and a reranking (b) subtask. The SIB Text Mining group participated in the full document ranking subtask (1a). To retrieve pertinent documents in the 3.2-million-document corpus, our strategy was two-fold. First, we used a BM25 model to retrieve a subset of documents relevant to a query; we also tried to improve recall through query expansion. The second step consisted of reranking the retrieved subset using an original model, so-called query2doc. This model, designed to predict whether a query-document pair is a good candidate to be ranked in position #1, was trained using the training dataset provided for the task. Our baseline, which is essentially a BM25 ranking, performed best and achieved a MAP of 0.2892. Results of the query2doc run clearly indicate that the query2doc model could not learn any meaningful relationship. To explain this failure, we hypothesize that using documents returned by our baseline model as negative items confused the model. As future steps, it will be interesting to take into account features such as a document's BM25 score and the number of times a document's URL is mentioned in the corpus, and to use them with learning-to-rank algorithms.
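    The negative-sampling hypothesis can be made concrete with a small sketch contrasting negatives drawn from the BM25 results (as in the run) with negatives drawn uniformly from the corpus; the data structures below are illustrative.

```python
import random

random.seed(1)
corpus_ids = list(range(1000))
qrels = {"q1": {42}, "q2": {7, 99}}                 # relevant doc ids per query
bm25_results = {"q1": [42, 13, 55], "q2": [7, 99, 3]}

def training_pairs(query: str, use_bm25_negatives: bool):
    """Build labeled (query, doc, relevance) triples for a query2doc-style model."""
    positives = [(query, d, 1) for d in qrels[query]]
    pool = bm25_results[query] if use_bm25_negatives else corpus_ids
    negatives = [(query, d, 0)
                 for d in random.sample(pool, 2) if d not in qrels[query]]
    return positives + negatives

# BM25 negatives are near-relevant, which may have confused the classifier;
# random negatives give a cleaner (if easier) decision boundary.
print(training_pairs("q1", use_bm25_negatives=True))
print(training_pairs("q1", use_bm25_negatives=False))
```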