SIB Text Mining at TREC 2019 Deep Learning Track: Working Note
The TREC 2019 Deep Learning task aims at studying information retrieval in a large training data regime. It includes two tasks: the document ranking task (1) and the passage ranking task (2). Both tasks had a full ranking (a) and a reranking (b) subtask. The SIB Text Mining group participated in the full document ranking subtask (1a). To retrieve pertinent documents from the 3.2-million-document corpus, our strategy was two-fold. First, we used a BM25 model to retrieve a subset of documents relevant to a query, and also tried to improve recall using query expansion. The second step consisted of reranking the retrieved subset with an original model, called query2doc. This model, designed to predict whether a query-document pair is a good candidate to be ranked in position #1, was trained on the training dataset provided for the task. Our baseline, which is essentially a BM25 ranking, performed best and achieved a MAP of 0.2892. The results of the query2doc run clearly indicate that the query2doc model could not learn any meaningful relationship. More precisely, to explain this failure, we hypothesize that using documents returned by our baseline model as negative items confused the model. As future work, it would be interesting to take into account features such as a document's BM25 score and the number of times a document's URL is mentioned in the corpus, and to use them with learning-to-rank algorithms.
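A minimal sketch of the two-step retrieve-then-rerank pipeline described above, using the rank_bm25 package. The toy corpus and the stand-in `rerank_score` function are illustrative assumptions; the actual query2doc model is a learned classifier, not shown here.

```python
# Step 1: BM25 retrieval of a candidate subset; step 2: reranking.
from rank_bm25 import BM25Okapi

corpus = [
    "passage ranking with large training data",
    "document retrieval using lexical matching",
    "neural reranking of candidate documents",
]
tokenized_corpus = [doc.split() for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)

query = "document ranking"
query_tokens = query.split()

# Step 1: BM25 scores every document; keep the top candidates.
scores = bm25.get_scores(query_tokens)
candidates = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:2]

# Step 2: a learned model reranks the candidates. This token-overlap
# scorer is only a placeholder for the query2doc model.
def rerank_score(query: str, doc: str) -> float:
    overlap = set(query.split()) & set(doc.split())
    return len(overlap) / (len(doc.split()) + 1)

reranked = sorted(candidates, key=lambda i: rerank_score(query, corpus[i]), reverse=True)
for i in reranked:
    print(round(scores[i], 3), corpus[i])
```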
Classification of hierarchical text using geometric deep learning: the case of clinical trials corpus
We consider the hierarchical representation of documents as graphs and use geometric deep learning to classify them into different categories. While graph neural networks can efficiently handle the variable structure of hierarchical documents using permutation-invariant message-passing operations, we show that we can gain extra performance improvements using our proposed selective graph pooling operation, which exploits the fact that some parts of the hierarchy are invariant across different documents. We applied our model to classify clinical trial (CT) protocols into completed and terminated categories. We use bag-of-words as well as pre-trained transformer-based embeddings to featurize the graph nodes, achieving F1-scores of around 0.85 on a publicly available large-scale CT registry of around 360K protocols. We further demonstrate how the selective pooling can add insights into the CT termination status prediction. We make the source code and dataset splits accessible.
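A minimal sketch of the overall architecture, written with PyTorch Geometric: message passing over the document graph followed by a pooling step restricted to a subset of nodes. The boolean `select_mask` marking nodes from the invariant part of the hierarchy is an assumption for illustration; the paper's exact selective pooling operation may differ.

```python
import torch
from torch_geometric.nn import GCNConv

class HierarchicalDocClassifier(torch.nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.out = torch.nn.Linear(hidden_dim, num_classes)

    def forward(self, x, edge_index, select_mask):
        # Permutation-invariant message passing over the document graph.
        h = self.conv1(x, edge_index).relu()
        h = self.conv2(h, edge_index).relu()
        # Selective pooling: average only the nodes that sit in the
        # part of the hierarchy shared across documents (mask is True).
        pooled = h[select_mask].mean(dim=0)
        return self.out(pooled)
```

Restricting the pooling to structurally stable nodes keeps the readout comparable across documents whose variable parts differ in size and shape.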
BiTeM at WNUT 2020 Shared Task-1: Named Entity Recognition over Wet Lab Protocols using an Ensemble of Contextual Language Models
Recent improvements in machine-reading technologies have attracted much attention to automation problems and their possibilities. In this context, WNUT 2020 introduced a Named Entity Recognition (NER) task based on wet laboratory procedures. In this paper, we present a 3-step method based on deep neural language models that achieved the best overall exact-match F1-score (77.99%) of the competition. By fine-tuning 10 different pretrained language models 10 times each, this work shows the advantage of having more models in a majority-vote ensemble. On top of that, having 100 different models allowed us to analyse ensemble combinations, demonstrating the impact of using multiple pretrained models versus fine-tuning a single pretrained model multiple times.
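A minimal sketch of the majority-vote ensembling idea: each model emits one BIO tag per token, and the most frequent tag wins. The tag sequences below are illustrative toy values, not competition outputs.

```python
from collections import Counter

def majority_vote(predictions: list[list[str]]) -> list[str]:
    """predictions[m][t] is model m's tag for token t."""
    return [Counter(tags).most_common(1)[0][0] for tags in zip(*predictions)]

model_outputs = [
    ["B-Reagent", "I-Reagent", "O"],
    ["B-Reagent", "O",         "O"],
    ["B-Reagent", "I-Reagent", "O"],
]
print(majority_vote(model_outputs))  # ['B-Reagent', 'I-Reagent', 'O']
```

Note that a purely per-token vote can produce invalid BIO sequences (e.g. an I- tag with no preceding B-), so ensembles of this kind typically repair the voted sequence afterwards.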
A Data-Driven Approach for Measuring the Severity of the Signs of Depression using Reddit Posts
In response to the CLEF eRisk 2019 shared task on measuring the severity of the signs of depression from threads of user submissions on social media, our team developed a data-driven ensemble approach. Our system leverages word polarities, token extraction via mutual information, keyword expansion, and semantic similarities to classify Reddit posts according to Beck's Depression Inventory (BDI). Individual models were combined at the post level by majority voting. The approach achieved baseline performance on the assessed metrics, including Average Hit Rate and Depression Category Hit Rate, placing within one standard deviation of the median system.
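A minimal sketch of one of the signals listed above, token extraction via mutual information, using scikit-learn. The toy posts and binary labels are invented for illustration; the actual system targets BDI categories rather than a binary signal.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif

posts = [
    "I feel hopeless and tired all the time",
    "had a great day hiking with friends",
    "can't sleep, everything feels pointless",
    "excited about the new project at work",
]
labels = [1, 0, 1, 0]  # toy annotation: 1 = depressive signal, 0 = none

vec = CountVectorizer(binary=True)
X = vec.fit_transform(posts)
mi = mutual_info_classif(X, labels, discrete_features=True, random_state=0)

# Rank tokens by how informative their presence is about the label.
ranked = sorted(zip(vec.get_feature_names_out(), mi), key=lambda p: -p[1])
print(ranked[:5])
```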
Contextualized French Language Models for Biomedical Named Entity Recognition
Named entity recognition (NER) is key for biomedical applications as it allows knowledge discovery in free-text data. Since entities are semantic phrases, their meaning must be conditioned on the context to avoid ambiguity. In this work, we explore contextualized language models for NER in French biomedical text as part of the Défi Fouille de Textes challenge. Our best approach achieved an F1-measure of 66% for the symptoms and signs, and pathology categories, ranking first in subtask 1. For the anatomy, dose, exam, mode, moment, substance, treatment, and value categories, it achieved an F1-measure of 75% (subtask 2). Considering all categories, our model achieved the best result in the challenge, with an F1-measure of 72%. The use of an ensemble of neural language models proved to be very effective, improving a CRF baseline by up to 28% and a single specialised language model by 4%.
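A minimal sketch of the fine-tuning setup for contextualized French NER with Hugging Face Transformers. The choice of "camembert-base" is one plausible French model, and the label set is a small assumed subset of the challenge's categories, both shown for illustration only.

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

labels = ["O", "B-sosy", "I-sosy", "B-pathologie", "I-pathologie"]
tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModelForTokenClassification.from_pretrained(
    "camembert-base",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={l: i for i, l in enumerate(labels)},
)

# Encode one sentence; during training, BIO labels are aligned to the
# subword tokens (alignment code omitted here).
enc = tokenizer("Le patient présente une toux persistante.", return_tensors="pt")
out = model(**enc)
print(out.logits.shape)  # (1, seq_len, num_labels)
```

Several such fine-tuned models can then be combined by majority voting over their per-token predictions, as in the ensemble described above.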
An Extended Overview of the CLEF 2020 ChEMU Lab: Information Extraction of Chemical Reactions from Patents
The discovery of new chemical compounds is perceived as a key driver of the chemistry industry and many other economic sectors. Information about new discoveries is usually disclosed in the scientific literature and, in particular, in chemical patents, since patents are often the first venues where new chemical compounds are publicized. Despite the significance of the information provided in chemical patents, extracting it is costly due to the large volume of existing patents and their rapid growth rate. The Cheminformatics Elsevier Melbourne University (ChEMU) evaluation lab 2020, part of the Conference and Labs of the Evaluation Forum 2020 (CLEF 2020), provides a platform to advance the state of the art in automatic information extraction from chemical patents. In particular, we focus on extracting the synthesis processes of new chemical compounds from chemical patents. Using the ChEMU corpus of 1500 “snippets” (text segments) sampled from 170 patent documents and annotated by chemical experts, we defined two key information extraction tasks. Task 1 targets chemical named entity recognition, i.e., the identification of chemical compounds and their specific roles in chemical reactions. Task 2 targets event extraction, i.e., the identification of reaction steps, relating the chemical compounds involved in a chemical reaction. In this paper, we provide an overview of our ChEMU 2020 lab. We describe the resources created for the two tasks, the evaluation methodology adopted, and the participants' results. We also provide a brief summary of the methods employed by participants of this lab and the results obtained across 46 runs from 11 teams, finding that several submissions achieved substantially better results than the baseline methods prepared by the organizers.
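An illustrative sketch of how a ChEMU-style snippet annotation might be represented in code: entities for Task 1, plus events relating a reaction-step trigger to its chemical arguments for Task 2. The field names and the example offsets are assumptions, not the official annotation schema.

```python
from dataclasses import dataclass

@dataclass
class Entity:
    span: tuple[int, int]   # character offsets in the snippet
    label: str              # e.g. STARTING_MATERIAL, REACTION_PRODUCT

@dataclass
class Event:
    trigger: Entity         # reaction-step trigger, e.g. "was added"
    arguments: list[Entity] # compounds involved in this step

snippet = "The mixture was added to compound 5 to give the title product."
trigger = Entity((12, 21), "REACTION_STEP")
compound = Entity((25, 35), "STARTING_MATERIAL")
event = Event(trigger=trigger, arguments=[compound])
print(event)
```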