21 research outputs found
JMedLoRA: Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning
In the ongoing wave of impact driven by large language models (LLMs) like
ChatGPT, the adaptation of LLMs to the medical domain has emerged as a crucial
research frontier. Since mainstream LLMs tend to be designed for
general-purpose applications, constructing a medical LLM through domain
adaptation is a huge challenge. While instruction-tuning is used to fine-tune
some LLMs, its precise roles in domain adaptation remain unknown. Here we show
the contribution of LoRA-based instruction-tuning to performance in Japanese
medical question-answering tasks. In doing so, we employ a multifaceted
evaluation for multiple-choice questions, including scoring based on "Exact
match" and "Gestalt distance" in addition to the conventional accuracy. Our
findings suggest that LoRA-based instruction-tuning can partially incorporate
domain-specific knowledge into LLMs, with larger models demonstrating more
pronounced effects. Furthermore, our results underscore the potential of
adapting English-centric models for Japanese applications in domain adaptation,
while also highlighting the persisting limitations of Japanese-centric models.
This initiative represents a pioneering effort in enabling medical institutions
to fine-tune and operate models without relying on external services.
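As a rough illustration of the multiple-choice scoring described above, the sketch below computes an exact-match score and a Gestalt-style similarity between a model answer and the gold answer. The abstract does not specify how "Gestalt distance" is computed; here it is assumed to correspond to Ratcliff/Obershelp (Gestalt) pattern matching as implemented by Python's difflib, so this is an approximation rather than the paper's exact metric.

```python
# Minimal sketch of the two scoring modes, assuming "Gestalt distance"
# refers to Ratcliff/Obershelp similarity (difflib's algorithm).
import difflib

def exact_match(prediction: str, answer: str) -> float:
    """1.0 if the stripped prediction equals the gold answer, else 0.0."""
    return float(prediction.strip() == answer.strip())

def gestalt_similarity(prediction: str, answer: str) -> float:
    """Ratcliff/Obershelp (Gestalt) similarity in [0, 1];
    1 - similarity can be read as a distance."""
    return difflib.SequenceMatcher(
        None, prediction.strip(), answer.strip()
    ).ratio()

if __name__ == "__main__":
    pred, gold = "急性心筋梗塞", "心筋梗塞"
    print(exact_match(pred, gold))         # 0.0: no exact match
    print(gestalt_similarity(pred, gold))  # partial credit for overlap
```

Unlike plain accuracy, the similarity score gives partial credit when a generated answer overlaps the gold answer without matching it exactly, which is useful for free-form LLM outputs.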
Is ChatGPT the Future of Causal Text Mining? A Comprehensive Evaluation and Analysis
Causality is fundamental in human cognition and has drawn attention in
diverse research fields. With growing volumes of textual data, discerning
causalities within text data is crucial, and causal text mining plays a pivotal
role in extracting meaningful patterns. This study conducts comprehensive
evaluations of ChatGPT's causal text mining capabilities. Firstly, we introduce
a benchmark that extends beyond general English datasets, including
domain-specific and non-English datasets. We also provide an evaluation
framework to ensure fair comparisons between ChatGPT and previous approaches.
Finally, our analysis outlines the limitations and future challenges in
employing ChatGPT for causal text mining. Specifically, our analysis reveals
that ChatGPT serves as a good starting point for various datasets. However,
when equipped with a sufficient amount of training data, previous models still
surpass ChatGPT's performance. Additionally, ChatGPT suffers from the tendency
to falsely recognize non-causal sequences as causal sequences. These issues
become even more pronounced with advanced versions of the model, such as GPT-4.
In addition, we highlight the constraints of ChatGPT in handling complex
causality types, including both intra/inter-sentential and implicit causality.
The model also faces challenges with effectively leveraging in-context learning
and domain adaptation. We release our code to support further research and
development in this field.
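The false-positive tendency noted above shows up directly in sequence-level precision. The sketch below, using entirely hypothetical gold labels and model predictions (not the paper's benchmark), illustrates how over-predicting the causal class depresses precision while recall stays high.

```python
# Minimal sketch of sequence-level evaluation for causal text mining.
# Labels are hypothetical; 1 = causal sequence, 0 = non-causal.
from typing import List, Tuple

def precision_recall_f1(gold: List[int], pred: List[int]) -> Tuple[float, float, float]:
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Over-predicting the causal class (the failure mode described above)
# inflates false positives and lowers precision.
gold = [1, 0, 0, 1, 0, 1]
pred = [1, 1, 1, 1, 0, 1]  # labels two non-causal sequences as causal
print(precision_recall_f1(gold, pred))  # high recall, reduced precision
```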
Learning company embeddings from annual reports for fine-grained industry characterization
Organizing companies by industry segment (e.g. artificial intelligence, healthcare or fintech) is useful for analyzing stock market performance and for designing theme-based investment funds, among others. Current practice is to manually assign companies to sectors or industries from a small predefined list, which has two key limitations. First, due to the manual effort involved, this strategy is only feasible for relatively mainstream industry segments, and can thus not easily be used for niche or emerging topics. Second, the use of hard label assignments ignores the fact that different companies will be more or less exposed to a particular segment. To address these limitations, we propose to learn vector representations of companies based on their annual reports. The key challenge is to distill the relevant information from these reports for characterizing their industries, since annual reports also contain a lot of information which is not relevant for our purpose. To this end, we introduce a multi-task learning strategy, which is based on fine-tuning the BERT language model on (i) existing sector labels and (ii) stock market performance. Experiments in both English and Japanese demonstrate the usefulness of this strategy.
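A minimal sketch of the multi-task setup described above: a shared BERT encoder produces a company embedding from annual-report text, with one head fine-tuned on sector labels and another on a stock-market-performance target. The head shapes, loss weighting, and choice of pooled representation are assumptions, since the abstract does not detail them.

```python
# Sketch of multi-task fine-tuning: shared encoder, two task heads.
import torch
import torch.nn as nn
from transformers import BertModel

class MultiTaskCompanyEncoder(nn.Module):
    def __init__(self, num_sectors: int, model_name: str = "bert-base-uncased"):
        super().__init__()
        # For the Japanese experiments, a Japanese or multilingual BERT
        # checkpoint would be substituted here.
        self.encoder = BertModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.sector_head = nn.Linear(hidden, num_sectors)  # task (i)
        self.perf_head = nn.Linear(hidden, 1)              # task (ii)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        emb = out.pooler_output  # the learned company embedding
        return emb, self.sector_head(emb), self.perf_head(emb).squeeze(-1)

def multitask_loss(sector_logits, sector_labels, perf_pred, perf_target,
                   alpha: float = 0.5):
    # Weighted sum of a classification loss (sector labels) and a
    # regression loss (stock-market-performance target); the weighting
    # scheme is an assumption.
    ce = nn.functional.cross_entropy(sector_logits, sector_labels)
    mse = nn.functional.mse_loss(perf_pred, perf_target)
    return alpha * ce + (1 - alpha) * mse
```

After fine-tuning, the pooled embeddings (rather than the hard sector predictions) characterize each company, so graded exposure to a segment can be read off as similarity in the embedding space.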
Enhancing risk analysis with GNN: Edge classification in risk causality from securities reports
In the evolving business landscape, the scope of risk factors is extremely wide, making it impossible for all business-related risks to be captured within publicly available financial disclosures. Previous studies have predominantly focused on understanding causal relationships and risk chains based on the risks that are explicitly documented; risks that are not explicitly listed are often overlooked. The aim of this study was to analyze risk chains and extract implicit information from disclosed documents. We focused on edge classification and suggested suitable labels for the edges of a risk-chain graph. Furthermore, we proposed an edge-type classification in heterogeneous graphs using Graph Neural Networks (GNNs). This was accomplished by defining six risks and constructing risk-chain graphs. The outcomes demonstrated that edge-type classification is an effective approach compared with existing methods. This method holds the potential to aid investors in enhancing their profits and making more informed decisions.
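The sketch below illustrates the core idea of edge-type classification: each edge of the risk-chain graph is labeled from the representations of its endpoint nodes after a round of neighbor aggregation. It is a simplified homogeneous approximation of the heterogeneous GNN described above, and the feature dimensions and six-label setup here are assumptions.

```python
# Sketch of edge-type classification on a risk-chain graph.
import torch
import torch.nn as nn

class EdgeTypeClassifier(nn.Module):
    def __init__(self, in_dim: int, hidden: int, num_edge_types: int = 6):
        super().__init__()
        self.node_proj = nn.Linear(in_dim, hidden)
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_edge_types),
        )

    def forward(self, x, edge_index):
        # One round of mean aggregation over incoming neighbors
        # (a basic GNN message-passing step).
        src, dst = edge_index
        h = torch.relu(self.node_proj(x))
        agg = torch.zeros_like(h).index_add_(0, dst, h[src])
        deg = torch.zeros(h.size(0), device=h.device).index_add_(
            0, dst, torch.ones(src.size(0), device=h.device)).clamp(min=1)
        h = h + agg / deg.unsqueeze(-1)
        # Classify each edge from its endpoint representations.
        return self.edge_mlp(torch.cat([h[src], h[dst]], dim=-1))

# Usage: x is (num_nodes, in_dim) node features; edge_index is a
# (2, num_edges) tensor of node ids. The output is per-edge logits over
# the six risk labels, trained with cross-entropy.
```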
Autoencoder-based three-factor model for the yield curve of Japanese government bonds and a trading strategy
Interest rates are representative indicators that reflect the degree of economic activity. The yield curve, which combines government bond interest rates by maturity, fluctuates to reflect various macroeconomic factors. Central bank monetary policy is one of the significant factors influencing interest rate markets. Generally, when the economy slows down, the central bank tries to stimulate the economy by lowering the policy rate to establish an environment in which companies and individuals can easily raise funds. In Japan, the shape of the yield curve has changed significantly in recent years following major changes in monetary policy. Therefore, an increasing need exists for a model that can flexibly respond to the various shapes of yield curves. In this research, we construct a three-factor model to represent the Japanese yield curve using the machine learning approach of an autoencoder. In addition, we focus on the model parameters of the intermediate layer of the neural network that constitutes the autoencoder and confirm that the three automatically generated factors represent the "Level," "Curvature," and "Slope" of the yield curve. Furthermore, we develop a long-short strategy for Japanese government bonds by setting their valuation with the autoencoder, and we confirm good performance compared with a trend-following investment strategy.
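A minimal sketch of the model described above: yields at a fixed set of maturities are compressed into a three-unit bottleneck whose activations play the role of the "Level," "Curvature," and "Slope" factors, and the gap between observed and reconstructed yields can serve as the valuation signal for a long-short rule. The layer sizes and number of maturities below are assumptions.

```python
# Sketch of a three-factor autoencoder for the yield curve.
import torch
import torch.nn as nn

class YieldCurveAutoencoder(nn.Module):
    def __init__(self, num_maturities: int = 10, n_factors: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(num_maturities, 16), nn.Tanh(),
            nn.Linear(16, n_factors),       # the three latent factors
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_factors, 16), nn.Tanh(),
            nn.Linear(16, num_maturities),  # reconstructed yield curve
        )

    def forward(self, curve):
        factors = self.encoder(curve)
        return self.decoder(factors), factors

# Training minimizes reconstruction error; a bond whose observed yield sits
# above (below) the reconstructed curve can be treated as cheap (rich) when
# forming the long-short portfolio.
model = YieldCurveAutoencoder()
curves = torch.randn(32, 10)  # stand-in batch of observed yield curves
recon, factors = model(curves)
loss = nn.functional.mse_loss(recon, curves)
```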
Word-Level Contextual Sentiment Analysis with Interpretability
Word-level contextual sentiment analysis (WCSA) is an important task for mining reviews and opinions. When analyzing this type of sentiment in industry, both interpretability and practicality are often required, yet no WCSA method offering both has been established. This study aims to develop a WCSA method with interpretability and practicality. To achieve this aim, we propose a novel neural network architecture called Sentiment Interpretable Neural Network (SINN). To realize SINN practically, we propose a novel learning strategy called Lexical Initialization Learning (LEXIL). SINN is interpretable because it extracts word-level contextual sentiment by combining word-level original sentiment with its local and global word-level contexts. Moreover, LEXIL can train SINN without any context-specific knowledge, which makes the strategy practical. Using real textual datasets, we experimentally demonstrate that LEXIL effectively improves the interpretability of SINN and that SINN achieves both high WCSA performance and high interpretability.
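The sketch below illustrates the lexical-initialization idea: each word's prior sentiment weight is initialized from a sentiment lexicon, and a learned context layer then shifts or flips that prior in context. This is an illustrative simplification, not the actual SINN architecture; the gating scheme and layer choices here are assumptions.

```python
# Sketch of lexicon-initialized word-level contextual sentiment.
import torch
import torch.nn as nn

class LexInitSentiment(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int,
                 lexicon_scores: torch.Tensor):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # Word-level prior sentiment, initialized from a lexicon in [-1, 1];
        # this initialization stands in for the LEXIL idea.
        self.prior = nn.Embedding(vocab_size, 1)
        with torch.no_grad():
            self.prior.weight.copy_(lexicon_scores.unsqueeze(-1))
        # BiLSTM supplies the surrounding context for each word.
        self.context = nn.LSTM(emb_dim, emb_dim,
                               batch_first=True, bidirectional=True)
        self.gate = nn.Linear(2 * emb_dim, 1)  # contextual shift per word

    def forward(self, token_ids):
        h, _ = self.context(self.emb(token_ids))
        shift = torch.tanh(self.gate(h))  # local context effect in [-1, 1]
        # Word-level contextual sentiment = lexicon prior modulated by
        # context (the shift can amplify, dampen, or flip the prior).
        return self.prior(token_ids) * (1 + shift)

# lexicon_scores: a (vocab_size,) tensor, e.g. +1 for "good", -1 for "bad",
# 0 for words absent from the lexicon.
```

Because the final per-word score factors into a lexicon prior and a contextual modulation, both parts can be inspected directly, which is the kind of interpretability the abstract describes.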