NII Repository (National Institute of Informatics)
2020 research outputs found
TUSNLP at the NTCIR-18 MedNLP-CHAT Task: Utilization of External Medical Knowledge and Hybrid Approach of BERT and ChatGPT
We developed systems for detecting medical, legal, and ethical risks in medical chatbot answers using the BERT and ChatGPT language models. The ChatGPT-based system, which consults external medical knowledge, performed best at detecting medical risk, while the BERT-based system performed well at detecting legal and ethical risks. The hybrid system reduces missed risks by combining the strengths of the BERT and ChatGPT systems, and it achieves the best recall for all risk determination models. This study demonstrates the usefulness of external medical knowledge and the effectiveness of the hybrid approach.
conference paper
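As a rough illustration of the hybrid idea this abstract describes, the minimal sketch below flags a risk whenever either system flags it, which is one simple way to trade precision for recall. The stub classifiers, keyword rules, and function names are hypothetical placeholders, not the team's actual models or prompts.

from typing import Dict

RISKS = ("medical", "legal", "ethical")

def bert_risk_flags(answer: str) -> Dict[str, bool]:
    # Stand-in for a fine-tuned BERT classifier over the chatbot answer.
    return {"medical": False, "legal": "prescription" in answer, "ethical": False}

def chatgpt_risk_flags(answer: str, external_knowledge: str) -> Dict[str, bool]:
    # Stand-in for a ChatGPT call whose prompt cites external medical knowledge.
    return {"medical": "dosage" in answer, "legal": False, "ethical": False}

def hybrid_risk_flags(answer: str, external_knowledge: str) -> Dict[str, bool]:
    bert = bert_risk_flags(answer)
    gpt = chatgpt_risk_flags(answer, external_knowledge)
    # Union of positive predictions: a risk missed by one system can still be
    # caught by the other, which raises recall at some cost in precision.
    return {r: bert[r] or gpt[r] for r in RISKS}

print(hybrid_risk_flags("Take double the dosage if symptoms persist.", ""))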
ORAD at NTCIR-18 RadNLP 2024 Shared Task
Here, we report our approach to the NTCIR-18 RadNLP 2024 Shared Task (Japanese Track, Main Task). In this study, we developed a system that determines the TNM classification of lung cancer from Japanese radiology reports. Specifically, we provided Google DeepMind's Gemini 2.0 Flash Experimental (gemini-2.0-flash-exp) with a prompt that combines Chain-of-Thought (CoT) and Many-Shot In-Context Learning (ICL), enabling automatic prediction of the T, N, and M factors for each case. Besides accuracy, interpretability is crucial in the medical domain; thus, having the model output the rationale for its TNM classification ensures a degree of transparency. Moreover, by including numerous examples of CoT-based reasoning, written by a radiologist with 5 years of dedicated experience in diagnostic radiology, that explain how the TNM classification is derived, we achieved improved inference accuracy. Furthermore, to address privacy concerns and the need for local inference without network connectivity in clinical settings, we performed Supervised Fine-Tuning (SFT) on Gemma2-9b-it, a comparatively lightweight open-source model. By providing the model with CoT-based reasoning steps leading to the TNM classification as training data, we again observed improved inference accuracy. These findings demonstrate that additional data and prompt strategies supporting large language model (LLM)-based inference can be highly effective in automating TNM classification, while also indicating the feasibility of interpretability in LLM-based medical applications.
conference paper
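The sketch below shows, in outline, how a many-shot CoT prompt of the kind described here might be assembled: radiologist-written reasoning examples are concatenated ahead of the target report, and the model is asked to state its rationale before the final T, N, and M factors. The example report, its staging rationale, and the instruction wording are illustrative assumptions, not the team's actual prompt.

# Hypothetical many-shot CoT prompt construction for TNM classification.
COT_EXAMPLES = [
    {
        "report": "Right upper lobe mass, 45 mm, no nodal enlargement, no distant lesions.",
        "reasoning": "Tumor 45 mm (over 40 mm, at most 50 mm) without invasion -> T2b. "
                     "No suspicious nodes -> N0. No metastases -> M0.",
        "answer": "T2b N0 M0",
    },
    # ...many more radiologist-written examples would follow in practice...
]

def build_prompt(target_report: str) -> str:
    shots = "\n\n".join(
        f"Report:\n{ex['report']}\nReasoning:\n{ex['reasoning']}\nAnswer: {ex['answer']}"
        for ex in COT_EXAMPLES
    )
    instruction = (
        "You are assisting with lung cancer staging. For each radiology report, "
        "think step by step, state your rationale, and then give the final T, N, "
        "and M factors."
    )
    # The assembled prompt would be sent to the LLM (e.g. via its API);
    # the same text format could also serve as SFT training data.
    return f"{instruction}\n\n{shots}\n\nReport:\n{target_report}\nReasoning:"

print(build_prompt("Left lower lobe nodule, 18 mm, mediastinal lymphadenopathy present."))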
Automated Lung Cancer Staging from Radiological Reports: A Large Language Model Approach for the NTCIR-18 RadNLP Task
Lung cancer TNM classification from narrative radiology reports presents challenges due to expression variability and complex relationships between findings. This study develops an automated TNM classification system utilizing large language models (LLMs) with supervised fine-tuning (SFT) and specialized prompting (SP) approaches. We evaluated our system on the NTCIR-18 RadNLP 2024 Task dataset, achieving 72.69% (Japanese) and 55.56% (English) fine-grained accuracy, ranking 5th among 15 teams. Our system demonstrated particularly high performance in N-factor classification (>93.98% accuracy) and in the textual-analysis subtask (ranking 1st in the Japanese track and 3rd in the English track). Error analysis revealed challenges in interpreting complex expressions and implicit information. This system shows potential for clinical workflow optimization, standardization of TNM classification, and educational support, with implications for improving cancer staging practices.
conference paper
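For readers unfamiliar with the metrics quoted here, the sketch below computes one plausible reading of them: "fine-grained accuracy" as exact match of the full (T, N, M) triple at the fine-grained label level, and per-factor accuracy (e.g. the N factor) as exact match on a single factor. The exact task definition may differ; the label strings are illustrative.

from typing import List, Tuple

Triple = Tuple[str, str, str]  # (T, N, M)

def fine_grained_accuracy(preds: List[Triple], golds: List[Triple]) -> float:
    # A case counts as correct only if T, N, and M are all predicted exactly.
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def factor_accuracy(preds: List[Triple], golds: List[Triple], idx: int) -> float:
    # Accuracy on a single factor (0 = T, 1 = N, 2 = M).
    return sum(p[idx] == g[idx] for p, g in zip(preds, golds)) / len(golds)

preds = [("T2b", "N0", "M0"), ("T1c", "N2", "M0")]
golds = [("T2b", "N0", "M0"), ("T1c", "N1", "M0")]
print(fine_grained_accuracy(preds, golds))   # 0.5: one triple fully correct
print(factor_accuracy(preds, golds, 1))      # 0.5: N factor correct in one case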
Agenda of the 30th Meeting for Promoting Collaboration and Cooperation between University Libraries and the National Institute of Informatics
Meeting: 30th Meeting for Promoting Collaboration and Cooperation between University Libraries and the National Institute of Informatics
Venue: Online
Date and time: Tuesday, July 15, 2025, 13:30-15:30
conference output
ditlab at the NTCIR-18 Transfer-2 Task
The ditlab team participated in the RAG and DMR tasks of the NTCIR-18 Transfer-2 task. For the RAG task, we proposed a late-fusion method for answer generation that uses multiple contexts retrieved by a dense passage retriever. Unlike sequential approaches that feed contexts one after another into a large language model (LLM), our method processes the contexts in parallel and uses majority voting to determine the final answer. We also fine-tuned the LLM with a LoRA-based method to better handle quiz-style questions, achieving accuracy gains of over 10 points against the baseline. For the DMR task, we introduced a modality-aware sensor encoder that processes numerical and textual sensor features separately, and enhanced the geolocation features by converting latitude/longitude data into address strings via k-nearest-neighbor matching. Although our baseline performance fell below the official baseline owing to a mismatch between the training and evaluation data, our approach improved image-to-sensor retrieval performance over our baseline.
conference paper
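A minimal sketch of the late-fusion voting idea described in this abstract: the question is answered once per retrieved context (these calls are independent, so they can run in parallel), and the most frequent answer wins. The generate_answer stub stands in for the LoRA fine-tuned LLM and is purely illustrative.

from collections import Counter
from typing import List

def generate_answer(question: str, context: str) -> str:
    # Stand-in: the real system would call the fine-tuned LLM with a prompt
    # containing exactly one retrieved passage.
    return "Tokyo" if "capital" in context.lower() else "unknown"

def late_fusion_answer(question: str, contexts: List[str]) -> str:
    # One independent generation per context, then majority vote over answers.
    answers = [generate_answer(question, c) for c in contexts]
    return Counter(answers).most_common(1)[0][0]

contexts = [
    "Tokyo is the capital of Japan.",
    "Japan is an island country in East Asia.",
    "The capital of Japan is Tokyo.",
]
print(late_fusion_answer("What is the capital of Japan?", contexts))  # Tokyo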
Toward a Knowledge Utilization Infrastructure: Looking Back on 50 Years of Information Media in 15 Minutes
Meeting: Academic Information Infrastructure Open Forum 2025
Venue: CiNii Research track, "What's Next for CiNii Research?"
Dates: Monday, June 16 to Wednesday, June 18, 2025
conference output
IMNTPU at NTCIR-18 MedNLP-CHAT Task: Evaluating Agentic AI for Multilingual Risk Assessment in Medical Chatbots
The IMNTPU team presents a multilingual evaluation of Agentic AI for chatbot risk classification in the NTCIR-18 MedNLP-CHAT task. Our framework integrates fine-tuned small models, optimized few-shot prompting with GPT-4o, and multi-agent aggregation via majority and trust-weighted voting. Results show that Agentic AI enhances decision consistency, especially in subjective tasks like ethical risk, but yields limited gains in structured domains such as medical and legal assessment. Language-specific outcomes reveal that annotation quality and linguistic complexity jointly affect model performance, with Japanese systems showing the most stability. Confidence analysis highlights a decoupling between model certainty and accuracy, underscoring the need for adaptive trust and calibration strategies. Building on these insights, we propose a Trust-Guided Agentic AI architecture featuring self-consistency filtering, dynamic trust updating, and Chain-of-Thought prompting to further improve reliability in safety-critical AI systems.
conference paper
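To make the two aggregation schemes named in this abstract concrete, the sketch below contrasts plain majority voting with trust-weighted voting, where each agent's vote is weighted by a per-agent trust score. The agent names, labels, and trust values are illustrative assumptions, not the IMNTPU configuration.

from collections import Counter
from typing import Dict, List

def majority_vote(votes: List[str]) -> str:
    # Each agent gets one equally weighted vote.
    return Counter(votes).most_common(1)[0][0]

def trust_weighted_vote(votes: Dict[str, str], trust: Dict[str, float]) -> str:
    # Sum each agent's trust weight behind its predicted label and pick the
    # label with the largest total weight.
    totals: Dict[str, float] = {}
    for agent, label in votes.items():
        totals[label] = totals.get(label, 0.0) + trust.get(agent, 1.0)
    return max(totals, key=totals.get)

votes = {"finetuned_small_model": "risk", "gpt4o_fewshot": "no_risk", "gpt4o_cot": "risk"}
trust = {"finetuned_small_model": 0.3, "gpt4o_fewshot": 1.5, "gpt4o_cot": 0.4}
print(majority_vote(list(votes.values())))  # "risk": two of three agents agree
print(trust_weighted_vote(votes, trust))    # "no_risk": the highly trusted agent prevails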
TMUNLPG2 at the NTCIR-18 MedNLP-CHAT Task
The TMUNLPG2 team participated in the Japanese subtask of the NTCIR-18 Medical Natural Language Processing for AI Chat (MedNLP-CHAT) Task. This paper presents our methodological approach and analyzes the official results. For the Japanese subtask, we implemented two distinct methodologies addressing the objective and subjective components. In the objective task, we fine-tuned a pre-trained language model enhanced with focal loss, comprehensive feature engineering, and strategic data augmentation techniques to optimize performance. For the subjective task, we developed specialized feature engineering methods to extract implicit semantic relationships within question-answer pairs, subsequently leveraging these features to train a robust deep learning architecture. Our approach yielded significant results, with TMUNLPG2 achieving the highest average F1-score among seven participating teams in the objective task and securing second place in the subjective task. These outcomes demonstrate the efficacy of our methodological framework and highlight its potential applications in advancing medical natural language processing systems.
conference paper
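For reference, the sketch below shows a standard multi-class focal loss of the kind mentioned in this abstract, written in PyTorch: well-classified examples are down-weighted by (1 - p_t)^gamma so training concentrates on hard or rare cases. The framework choice and the gamma/alpha values are illustrative defaults, not the team's tuned setup.

import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               gamma: float = 2.0, alpha: float = 0.25) -> torch.Tensor:
    # Cross-entropy per sample, modulated by (1 - p_t)^gamma and a scalar alpha.
    log_probs = F.log_softmax(logits, dim=-1)
    p_t = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1).exp()
    ce = F.nll_loss(log_probs, targets, reduction="none")
    return (alpha * (1.0 - p_t) ** gamma * ce).mean()

logits = torch.randn(4, 2)               # batch of 4 answers, binary risk labels
targets = torch.tensor([0, 1, 1, 0])
print(focal_loss(logits, targets))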
NTCIR-18 RadNLP 2024 Overview: Dataset and Solutions for Automated Lung Cancer Staging
Radiology reports play a vital role in clinical workflows, serving as a primary means for radiologists to communicate imaging findings to physicians. However, the increasing number of imaging studies has made it challenging to produce and interpret comprehensive reports in a timely manner. Natural language processing (NLP) has shown potential to alleviate this burden, yet most existing studies are limited to English, while clinical reports are often written in local languages. To address this gap, we have developed and released Japanese medical text datasets through a series of shared tasks. Our recent efforts, including NTCIR-16 Real-MedNLP and NTCIR-17 RR-TNM, focused on automating lung cancer staging from radiology reports using the TNM classification system. This task is clinically significant, yet challenging due to the implicit nature of staging information and the complexity of TNM criteria. In this paper, we introduce the NTCIR-18 RadNLP 2024 shared task, which extends the previous task with finer-grained classification, a larger and bilingual corpus, and new sentence-level subtasks. We present the dataset, participating systems, and evaluation results, aiming to provide practical insights into building NLP systems for cancer staging support.
conference paper