21 research outputs found
JMedLoRA: Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning
In the ongoing wave of impact driven by large language models (LLMs) like
ChatGPT, the adaptation of LLMs to the medical domain has emerged as a crucial
research frontier. Since mainstream LLMs tend to be designed for
general-purpose applications, constructing a medical LLM through domain
adaptation is a huge challenge. While instruction-tuning is used to fine-tune
some LLMs, its precise roles in domain adaptation remain unknown. Here we show
the contribution of LoRA-based instruction-tuning to performance in Japanese
medical question-answering tasks. In doing so, we employ a multifaceted
evaluation for multiple-choice questions, including scoring based on "Exact
match" and "Gestalt distance" in addition to the conventional accuracy. Our
findings suggest that LoRA-based instruction-tuning can partially incorporate
domain-specific knowledge into LLMs, with larger models demonstrating more
pronounced effects. Furthermore, our results underscore the potential of
adapting English-centric models for Japanese applications in domain adaptation,
while also highlighting the persisting limitations of Japanese-centric models.
This initiative represents a pioneering effort in enabling medical institutions
to fine-tune and operate models without relying on external services.
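As a rough illustration of the multiple-choice scoring described above, the sketch below computes an exact-match score and a Gestalt-style similarity between a model answer and the gold answer. The abstract does not specify how "Gestalt distance" is computed; here it is assumed to correspond to Ratcliff/Obershelp (Gestalt) pattern matching as implemented by Python's difflib, so this is an approximation rather than the paper's exact metric.

```python
# Minimal sketch of the two scoring modes, assuming "Gestalt distance"
# refers to Ratcliff/Obershelp similarity (difflib's algorithm).
import difflib

def exact_match(prediction: str, answer: str) -> float:
    """1.0 if the stripped prediction equals the gold answer, else 0.0."""
    return float(prediction.strip() == answer.strip())

def gestalt_similarity(prediction: str, answer: str) -> float:
    """Ratcliff/Obershelp (Gestalt) similarity in [0, 1];
    1 - similarity can be read as a distance."""
    return difflib.SequenceMatcher(
        None, prediction.strip(), answer.strip()
    ).ratio()

if __name__ == "__main__":
    pred, gold = "急性心筋梗塞", "心筋梗塞"
    print(exact_match(pred, gold))         # 0.0: no exact match
    print(gestalt_similarity(pred, gold))  # partial credit for overlap
```

Unlike plain accuracy, the similarity score gives partial credit when a generated answer overlaps the gold answer without matching it exactly, which is useful for free-form LLM outputs.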
Is ChatGPT the Future of Causal Text Mining? A Comprehensive Evaluation and Analysis
Causality is fundamental in human cognition and has drawn attention in
diverse research fields. With growing volumes of textual data, discerning
causalities within text data is crucial, and causal text mining plays a pivotal
role in extracting meaningful patterns. This study conducts comprehensive
evaluations of ChatGPT's causal text mining capabilities. Firstly, we introduce
a benchmark that extends beyond general English datasets, including
domain-specific and non-English datasets. We also provide an evaluation
framework to ensure fair comparisons between ChatGPT and previous approaches.
Finally, our analysis outlines the limitations and future challenges in
employing ChatGPT for causal text mining. Specifically, our analysis reveals
that ChatGPT serves as a good starting point for various datasets. However,
when equipped with a sufficient amount of training data, previous models still
surpass ChatGPT's performance. Additionally, ChatGPT suffers from the tendency
to falsely recognize non-causal sequences as causal sequences. These issues
become even more pronounced with advanced versions of the model, such as GPT-4.
In addition, we highlight the constraints of ChatGPT in handling complex
causality types, including both intra/inter-sentential and implicit causality.
The model also faces challenges with effectively leveraging in-context learning
and domain adaptation. We release our code to support further research and
development in this field.
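The false-positive tendency noted above shows up directly in sequence-level precision. The sketch below, using entirely hypothetical gold labels and model predictions (not the paper's benchmark), illustrates how over-predicting the causal class depresses precision while recall stays high.

```python
# Minimal sketch of sequence-level evaluation for causal text mining.
# Labels are hypothetical; 1 = causal sequence, 0 = non-causal.
from typing import List, Tuple

def precision_recall_f1(gold: List[int], pred: List[int]) -> Tuple[float, float, float]:
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Over-predicting the causal class (the failure mode described above)
# inflates false positives and lowers precision.
gold = [1, 0, 0, 1, 0, 1]
pred = [1, 1, 1, 1, 0, 1]  # labels two non-causal sequences as causal
print(precision_recall_f1(gold, pred))  # high recall, reduced precision
```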
Learning company embeddings from annual reports for fine-grained industry characterization
Organizing companies by industry segment (e.g. artificial intelligence, healthcare or fintech) is useful for analyzing stock market performance and for designing theme-based investment funds, among others. Current practice is to manually assign companies to sectors or industries from a small predefined list, which has two key limitations. First, due to the manual effort involved, this strategy is only feasible for relatively mainstream industry segments, and can thus not easily be used for niche or emerging topics. Second, the use of hard label assignments ignores the fact that different companies will be more or less exposed to a particular segment. To address these limitations, we propose to learn vector representations of companies based on their annual reports. The key challenge is to distill the relevant information from these reports for characterizing their industries, since annual reports also contain a lot of information which is not relevant for our purpose. To this end, we introduce a multi-task learning strategy, which is based on fine-tuning the BERT language model on (i) existing sector labels and (ii) stock market performance. Experiments in both English and Japanese demonstrate the usefulness of this strategy.
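A minimal sketch of the multi-task setup described above: a shared BERT encoder produces a company embedding from annual-report text, with one head fine-tuned on sector labels and another on a stock-market-performance target. The head shapes, loss weighting, and choice of pooled representation are assumptions, since the abstract does not detail them.

```python
# Sketch of multi-task fine-tuning: shared encoder, two task heads.
import torch
import torch.nn as nn
from transformers import BertModel

class MultiTaskCompanyEncoder(nn.Module):
    def __init__(self, num_sectors: int, model_name: str = "bert-base-uncased"):
        super().__init__()
        # For the Japanese experiments, a Japanese or multilingual BERT
        # checkpoint would be substituted here.
        self.encoder = BertModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.sector_head = nn.Linear(hidden, num_sectors)  # task (i)
        self.perf_head = nn.Linear(hidden, 1)              # task (ii)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        emb = out.pooler_output  # the learned company embedding
        return emb, self.sector_head(emb), self.perf_head(emb).squeeze(-1)

def multitask_loss(sector_logits, sector_labels, perf_pred, perf_target,
                   alpha: float = 0.5):
    # Weighted sum of a classification loss (sector labels) and a
    # regression loss (stock-market-performance target); the weighting
    # scheme is an assumption.
    ce = nn.functional.cross_entropy(sector_logits, sector_labels)
    mse = nn.functional.mse_loss(perf_pred, perf_target)
    return alpha * ce + (1 - alpha) * mse
```

After fine-tuning, the pooled embeddings (rather than the hard sector predictions) characterize each company, so graded exposure to a segment can be read off as similarity in the embedding space.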
Enhancing risk analysis with GNN: Edge classification in risk causality from securities reports
In the evolving business landscape, the scope of risk factors is extremely wide, making it impossible for all business-related risks to be captured within publicly available financial disclosures. Previous studies have predominantly focused on understanding causal relationships and risk chains based on the risks that are explicitly documented; risks that are not explicitly listed are often overlooked. The aim of this study was to analyze risk chains and extract implicit information from disclosed documents. We focused on edge classification and suggested suitable labels for the edges of a risk-chain graph. Furthermore, we proposed an edge-type classification in heterogeneous graphs using Graph Neural Networks (GNNs). This was accomplished by defining six risks and constructing risk-chain graphs. The outcomes demonstrated that edge-type classification is an effective approach compared with existing methods. This method holds the potential to aid investors in enhancing their profits and making more informed decisions.
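The sketch below illustrates the core idea of edge-type classification: each edge of the risk-chain graph is labeled from the representations of its endpoint nodes after a round of neighbor aggregation. It is a simplified homogeneous approximation of the heterogeneous GNN described above, and the feature dimensions and six-label setup here are assumptions.

```python
# Sketch of edge-type classification on a risk-chain graph.
import torch
import torch.nn as nn

class EdgeTypeClassifier(nn.Module):
    def __init__(self, in_dim: int, hidden: int, num_edge_types: int = 6):
        super().__init__()
        self.node_proj = nn.Linear(in_dim, hidden)
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_edge_types),
        )

    def forward(self, x, edge_index):
        # One round of mean aggregation over incoming neighbors
        # (a basic GNN message-passing step).
        src, dst = edge_index
        h = torch.relu(self.node_proj(x))
        agg = torch.zeros_like(h).index_add_(0, dst, h[src])
        deg = torch.zeros(h.size(0), device=h.device).index_add_(
            0, dst, torch.ones(src.size(0), device=h.device)).clamp(min=1)
        h = h + agg / deg.unsqueeze(-1)
        # Classify each edge from its endpoint representations.
        return self.edge_mlp(torch.cat([h[src], h[dst]], dim=-1))

# Usage: x is (num_nodes, in_dim) node features; edge_index is a
# (2, num_edges) tensor of node ids. The output is per-edge logits over
# the six risk labels, trained with cross-entropy.
```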
Autoencoder-based three-factor model for the yield curve of Japanese government bonds and a trading strategy
Interest rates are representative indicators that reflect the degree of economic activity. The yield curve, which combines government bond interest rates by maturity, fluctuates to reflect various macroeconomic factors. Central bank monetary policy is one of the significant factors influencing interest rate markets. Generally, when the economy slows down, the central bank tries to stimulate the economy by lowering the policy rate to establish an environment in which companies and individuals can easily raise funds. In Japan, the shape of the yield curve has changed significantly in recent years following major changes in monetary policy. Therefore, an increasing need exists for a model that can flexibly respond to the various shapes of yield curves. In this research, we construct a three-factor model to represent the Japanese yield curve using the machine learning approach of an autoencoder. In addition, we focus on the model parameters of the intermediate layer of the neural network that constitutes the autoencoder and confirm that the three automatically generated factors represent the "Level," "Curvature," and "Slope" of the yield curve. Furthermore, we develop a long-short strategy for Japanese government bonds by setting their valuation with the autoencoder, and we confirm good performance compared with a trend-following investment strategy.
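A minimal sketch of the model described above: yields at a fixed set of maturities are compressed into a three-unit bottleneck whose activations play the role of the "Level," "Curvature," and "Slope" factors, and the gap between observed and reconstructed yields can serve as the valuation signal for a long-short rule. The layer sizes and number of maturities below are assumptions.

```python
# Sketch of a three-factor autoencoder for the yield curve.
import torch
import torch.nn as nn

class YieldCurveAutoencoder(nn.Module):
    def __init__(self, num_maturities: int = 10, n_factors: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(num_maturities, 16), nn.Tanh(),
            nn.Linear(16, n_factors),       # the three latent factors
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_factors, 16), nn.Tanh(),
            nn.Linear(16, num_maturities),  # reconstructed yield curve
        )

    def forward(self, curve):
        factors = self.encoder(curve)
        return self.decoder(factors), factors

# Training minimizes reconstruction error; a bond whose observed yield sits
# above (below) the reconstructed curve can be treated as cheap (rich) when
# forming the long-short portfolio.
model = YieldCurveAutoencoder()
curves = torch.randn(32, 10)  # stand-in batch of observed yield curves
recon, factors = model(curves)
loss = nn.functional.mse_loss(recon, curves)
```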
Word-Level Contextual Sentiment Analysis with Interpretability
Word-level contextual sentiment analysis (WCSA) is an important task for mining reviews and opinions. When analyzing this type of sentiment in industry, both interpretability and practicality are often required, yet no WCSA method offering both has been established. This study aims to develop a WCSA method with interpretability and practicality. To achieve this aim, we propose a novel neural network architecture called Sentiment Interpretable Neural Network (SINN). To realize SINN practically, we propose a novel learning strategy called Lexical Initialization Learning (LEXIL). SINN is interpretable because it extracts word-level contextual sentiment by combining word-level original sentiment with its local and global word-level contexts. Moreover, LEXIL can train SINN without any context-specific knowledge, which makes the strategy practical. Using real textual datasets, we experimentally demonstrate that LEXIL effectively improves the interpretability of SINN and that SINN achieves both high WCSA performance and high interpretability.
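The sketch below illustrates the lexical-initialization idea: each word's prior sentiment weight is initialized from a sentiment lexicon, and a learned context layer then shifts or flips that prior in context. This is an illustrative simplification, not the actual SINN architecture; the gating scheme and layer choices here are assumptions.

```python
# Sketch of lexicon-initialized word-level contextual sentiment.
import torch
import torch.nn as nn

class LexInitSentiment(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int,
                 lexicon_scores: torch.Tensor):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # Word-level prior sentiment, initialized from a lexicon in [-1, 1];
        # this initialization stands in for the LEXIL idea.
        self.prior = nn.Embedding(vocab_size, 1)
        with torch.no_grad():
            self.prior.weight.copy_(lexicon_scores.unsqueeze(-1))
        # BiLSTM supplies the surrounding context for each word.
        self.context = nn.LSTM(emb_dim, emb_dim,
                               batch_first=True, bidirectional=True)
        self.gate = nn.Linear(2 * emb_dim, 1)  # contextual shift per word

    def forward(self, token_ids):
        h, _ = self.context(self.emb(token_ids))
        shift = torch.tanh(self.gate(h))  # local context effect in [-1, 1]
        # Word-level contextual sentiment = lexicon prior modulated by
        # context (the shift can amplify, dampen, or flip the prior).
        return self.prior(token_ids) * (1 + shift)

# lexicon_scores: a (vocab_size,) tensor, e.g. +1 for "good", -1 for "bad",
# 0 for words absent from the lexicon.
```

Because the final per-word score factors into a lexicon prior and a contextual modulation, both parts can be inspected directly, which is the kind of interpretability the abstract describes.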