3 research outputs found

Approach for Named Entity Recognition and Case Identification Implemented by ZuKyo-JA Sub-team at the NTCIR-16 Real-MedNLP Task

Author: Cornelius Joseph
Fujimoto Koji
Horvath Aron N
Ichikawa Kana
Kanjirangat Vani
Krauthammer Michael
Lithgow-Serrano Oscar
Nishio Mizuho
Nooralahzadeh Farhad
Rinaldi Fabio
Sugiyama Osamu
Publication date: 17/06/2022

In this NTCIR-16 Real-MedNLP shared task paper, we present the methods of the ZuKyo-JA subteam for solving the Japanese part of Subtask1 and Subtask3 (Subtask1-CR-JA, Subtask1-RR-JA, Subtask3-RR-JA). Our solution is based on a sliding-window approach using a Japanese BERT pre-trained masked-language model, which was used as a common architecture for addressing the specific subtasks. We additionally present a method that makes extensive use of medical knowledge for the same case identification subtask (Subtask3-RR-JA).

ZORA

Leveraging Token-Based Concept Information and Data Augmentation in Few-Resource NER: ZuKyo-EN at the NTCIR-16 Real-MedNLP task

Author: Cornelius Joseph
Fujimoto Koji
Horvath Aron N
Ichikawa Kana
Kanjirangat Vani
Krauthammer Michael
Lithgow-Serrano Oscar
Nishio Mizuho
Nooralahzadeh Farhad
Rinaldi Fabio
Sugiyama Osamu
Publication venue: NTCIR
Publication date: 17/06/2022

In this paper, we discuss our contribution to the NII Testbeds and Community for Information Access Research (NTCIR)-16 Real-MedNLP shared task. Our team (ZuKyo) participated in the English subtask: Few-resource Named Entity Recognition. The main challenge in this low-resource task was the small number of training documents, which were annotated with a large number of tags and attributes. For our submissions, we used different general and domain-specific transfer learning approaches in combination with multiple data augmentation methods. In addition, we experimented with models enriched with biomedical concepts encoded as token-based input feature.

ZORA

Lisen&Curate: a platform to facilitate gathering textual evidence for curation of regulation of transcription initiation in bacteria

Author: Collado Vides Pedro Julio
Díaz-Rodríguez Martín
Gama-Castro Socorro
Guadarrama-García Francisco
Lithgow-Serrano Oscar
Méndez-Cruz Carlos-Francisco
Rinaldi Fabio
Salgado Heladia
Solano-Lira Hilda
Tierrafría Víctor H.
Publication venue: Elsevier BV
Publication date: 01/01/2021

The number of published papers in biomedical research makes it nearly impossible for a researcher to keep up to date. This is where manually curated databases contribute, by facilitating access to knowledge. However, the structure required by databases strongly limits the type of valuable information that can be incorporated. Here, we present Lisen&Curate, a curation system that facilitates linking sentences or parts of sentences (both considered sources) in articles with their corresponding curated objects, so that rich additional information about these objects is easily available to users. These sources will be offered both within RegulonDB and in a new database, L-Regulon. To show the relevance of our work, two senior curators performed a curation of 31 articles on the regulation of transcription initiation of E. coli using Lisen&Curate. As a result, 194 objects were curated and 781 sources were recorded. We also found that these sources are useful for developing automatic approaches to detect objects in articles, both by observing word frequency patterns and by carrying out an open information extraction task. Sources may also help to elaborate a controlled vocabulary of experimental methods. Finally, we discuss our ecosystem of interconnected applications, RegulonDB, L-Regulon, and Lisen&Curate, which facilitates access to knowledge on the regulation of transcription initiation in bacteria. We see our proposal as the starting point for changing the way experimentalists connect a piece of knowledge with its evidence using RegulonDB. This study was supported by the Universidad Nacional Autónoma de México (UNAM) and the National Institute of General Medical Sciences of the National Institutes of Health [grants number 5RO1-GM110597-04 and 1RO1-GM131643-01A1

UPF Digital Repository