27 research outputs found
JEC-QA: A Legal-Domain Question Answering Dataset
We present JEC-QA, the largest question answering dataset in the legal
domain, collected from the National Judicial Examination of China. The
examination is a comprehensive evaluation of professional skills for legal
practitioners. College students are required to pass the examination to be
certified as a lawyer or a judge. The dataset is challenging for existing
question answering methods, because both retrieving relevant materials and
answering questions require the ability of logic reasoning. Due to the high
demand of multiple reasoning abilities to answer legal questions, the
state-of-the-art models can only achieve about 28% accuracy on JEC-QA, while
skilled humans and unskilled humans can reach 81% and 64% accuracy
respectively, which indicates a huge gap between humans and machines on this
task. We will release JEC-QA and our baselines to help improve the reasoning
ability of machine comprehension models. You can access the dataset from
http://jecqa.thunlp.org/.Comment: 9 pages, 2 figures, 10 tables, accepted by AAAI202
Chat2Brain: A Method for Mapping Open-Ended Semantic Queries to Brain Activation Maps
Over decades, neuroscience has accumulated a wealth of research results in
the text modality that can be used to explore cognitive processes.
Meta-analysis is a typical method that successfully establishes a link from
text queries to brain activation maps using these research results, but it
still relies on an ideal query environment. In practical applications, text
queries used for meta-analyses may encounter issues such as semantic redundancy
and ambiguity, resulting in an inaccurate mapping to brain images. On the other
hand, large language models (LLMs) like ChatGPT have shown great potential in
tasks such as context understanding and reasoning, displaying a high degree of
consistency with human natural language. Hence, LLMs could improve the
connection between text modality and neuroscience, resolving existing
challenges of meta-analyses. In this study, we propose a method called
Chat2Brain that combines LLMs to basic text-2-image model, known as Text2Brain,
to map open-ended semantic queries to brain activation maps in data-scarce and
complex query environments. By utilizing the understanding and reasoning
capabilities of LLMs, the performance of the mapping model is optimized by
transferring text queries to semantic queries. We demonstrate that Chat2Brain
can synthesize anatomically plausible neural activation patterns for more
complex tasks of text queries.Comment: 8 pages, 4 figure
ChatABL: Abductive Learning via Natural Language Interaction with ChatGPT
Large language models (LLMs) such as ChatGPT have recently demonstrated
significant potential in mathematical abilities, providing valuable reasoning
paradigm consistent with human natural language. However, LLMs currently have
difficulty in bridging perception, language understanding and reasoning
capabilities due to incompatibility of the underlying information flow among
them, making it challenging to accomplish tasks autonomously. On the other
hand, abductive learning (ABL) frameworks for integrating the two abilities of
perception and reasoning has seen significant success in inverse decipherment
of incomplete facts, but it is limited by the lack of semantic understanding of
logical reasoning rules and the dependence on complicated domain knowledge
representation. This paper presents a novel method (ChatABL) for integrating
LLMs into the ABL framework, aiming at unifying the three abilities in a more
user-friendly and understandable manner. The proposed method uses the strengths
of LLMs' understanding and logical reasoning to correct the incomplete logical
facts for optimizing the performance of perceptual module, by summarizing and
reorganizing reasoning rules represented in natural language format. Similarly,
perceptual module provides necessary reasoning examples for LLMs in natural
language format. The variable-length handwritten equation deciphering task, an
abstract expression of the Mayan calendar decoding, is used as a testbed to
demonstrate that ChatABL has reasoning ability beyond most existing
state-of-the-art methods, which has been well supported by comparative studies.
To our best knowledge, the proposed ChatABL is the first attempt to explore a
new pattern for further approaching human-level cognitive ability via natural
language interaction with ChatGPT
Understanding LLMs: A Comprehensive Overview from Training to Inference
The introduction of ChatGPT has led to a significant increase in the
utilization of Large Language Models (LLMs) for addressing downstream tasks.
There's an increasing focus on cost-efficient training and deployment within
this context. Low-cost training and deployment of LLMs represent the future
development trend. This paper reviews the evolution of large language model
training techniques and inference deployment technologies aligned with this
emerging trend. The discussion on training includes various aspects, including
data preprocessing, training architecture, pre-training tasks, parallel
training, and relevant content related to model fine-tuning. On the inference
side, the paper covers topics such as model compression, parallel computation,
memory scheduling, and structural optimization. It also explores LLMs'
utilization and provides insights into their future development.Comment: 30 pages,6 figure
ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data
Radiology report generation, as a key step in medical image analysis, is
critical to the quantitative analysis of clinically informed decision-making
levels. However, complex and diverse radiology reports with cross-source
heterogeneity pose a huge generalizability challenge to the current methods
under massive data volume, mainly because the style and normativity of
radiology reports are obviously distinctive among institutions, body regions
inspected and radiologists. Recently, the advent of large language models (LLM)
offers great potential for recognizing signs of health conditions. To resolve
the above problem, we collaborate with the Second Xiangya Hospital in China and
propose ChatRadio-Valuer based on the LLM, a tailored model for automatic
radiology report generation that learns generalizable representations and
provides a basis pattern for model adaptation in sophisticated analysts' cases.
Specifically, ChatRadio-Valuer is trained based on the radiology reports from a
single institution by means of supervised fine-tuning, and then adapted to
disease diagnosis tasks for human multi-system evaluation (i.e., chest,
abdomen, muscle-skeleton, head, and maxillofacial neck) from six different
institutions in clinical-level events. The clinical dataset utilized in this
study encompasses a remarkable total of \textbf{332,673} observations. From the
comprehensive results on engineering indicators, clinical efficacy and
deployment cost metrics, it can be shown that ChatRadio-Valuer consistently
outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and
GPT-4 et al., in terms of the diseases diagnosis from radiology reports.
ChatRadio-Valuer provides an effective avenue to boost model generalization
performance and alleviate the annotation workload of experts to enable the
promotion of clinical AI applications in radiology reports
The Ninth Visual Object Tracking VOT2021 Challenge Results
acceptedVersionPeer reviewe
Iteratively Questioning and Answering for Interpretable Legal Judgment Prediction
Legal Judgment Prediction (LJP) aims to predict judgment results according to the facts of cases. In recent years, LJP has drawn increasing attention rapidly from both academia and the legal industry, as it can provide references for legal practitioners and is expected to promote judicial justice. However, the research to date usually suffers from the lack of interpretability, which may lead to ethical issues like inconsistent judgments or gender bias. In this paper, we present QAjudge, a model based on reinforcement learning to visualize the prediction process and give interpretable judgments. QAjudge follows two essential principles in legal systems across the world: Presumption of Innocence and Elemental Trial. During inference, a Question Net will select questions from the given set and an Answer Net will answer the question according to the fact description. Finally, a Predict Net will produce judgment results based on the answers. Reward functions are designed to minimize the number of questions asked. We conduct extensive experiments on several real-world datasets. Experimental results show that QAjudge can provide interpretable judgments while maintaining comparable performance with other state-of-the-art LJP models. The codes can be found from https://github.com/thunlp/QAjudge
Boosting Active Learning via Improving Test Performance
Central to active learning (AL) is what data should be selected for annotation. Existing works attempt to select highly uncertain or informative data for annotation. Nevertheless, it remains unclear how selected data impacts the test performance of the task model used in AL. In this work, we explore such an impact by theoretically proving that selecting unlabeled data of higher gradient norm leads to a lower upper-bound of test loss, resulting in a better test performance. However, due to the lack of label information, directly computing gradient norm for unlabeled data is infeasible. To address this challenge, we propose two schemes, namely expected-gradnorm and entropy-gradnorm. The former computes the gradient norm by constructing an expected empirical loss while the latter constructs an unsupervised loss with entropy. Furthermore, we integrate the two schemes in a universal AL framework. We evaluate our method on classical image classification and semantic segmentation tasks. To demonstrate its competency in domain applications and its robustness to noise, we also validate our method on a cellular imaging analysis task, namely cryo-Electron Tomography subtomogram classification. Results demonstrate that our method achieves superior performance against the state of the art. We refer readers to https://arxiv.org/pdf/2112.05683.pdf for the full version of this paper which includes the appendix and source code link