ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data
Radiology report generation, a key step in medical image analysis, is critical to the quantitative analysis that informs clinical decision-making. However, complex and diverse radiology reports with cross-source heterogeneity pose a major generalizability challenge to current methods at massive data volumes, mainly because the style and conventions of radiology reports differ markedly across institutions, inspected body regions, and radiologists. Recently, the advent of large language models (LLMs) has offered great potential for recognizing signs of health conditions. To address this problem, we collaborate with the Second Xiangya Hospital in China and propose ChatRadio-Valuer, an LLM-based model tailored for automatic radiology report generation that learns generalizable representations and provides a basis for model adaptation to sophisticated analysts' cases.
Specifically, ChatRadio-Valuer is trained on the radiology reports of a single institution via supervised fine-tuning, and is then adapted to disease-diagnosis tasks spanning multiple human body systems (i.e., chest, abdomen, musculoskeletal, head, and maxillofacial and neck) from six different institutions in clinical-level settings. The clinical dataset used in this study comprises a total of 332,673 observations. Comprehensive results on engineering indicators, clinical efficacy, and deployment-cost metrics show that ChatRadio-Valuer consistently outperforms state-of-the-art models, including ChatGPT (GPT-3.5-Turbo) and GPT-4, in disease diagnosis from radiology reports.
ChatRadio-Valuer thus provides an effective avenue for boosting generalization performance and alleviating the annotation workload of experts, promoting clinical AI applications in radiology reporting.
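To make the supervised fine-tuning step concrete, the sketch below fine-tunes a causal language model on findings-to-impression report pairs with the Hugging Face Trainer. The abstract does not disclose the base model, prompt format, or hyperparameters, so the checkpoint name, data records, and training settings here are illustrative placeholders, not ChatRadio-Valuer's actual configuration.

    # Minimal sketch of supervised fine-tuning an LLM on radiology report pairs.
    # Base checkpoint, prompt format, and hyperparameters are placeholders, not
    # the configuration used by ChatRadio-Valuer.
    from datasets import Dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("gpt2")   # placeholder base model
    tokenizer.pad_token = tokenizer.eos_token           # gpt2 has no pad token
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Hypothetical single-institution training records: findings -> impression.
    pairs = [{"text": "Findings: ... Impression: ..."}]
    dataset = Dataset.from_list(pairs)

    def tokenize(batch):
        enc = tokenizer(batch["text"], truncation=True, max_length=512,
                        padding="max_length")
        enc["labels"] = enc["input_ids"].copy()         # causal-LM objective
        return enc

    dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="radiology-sft", num_train_epochs=3,
                               per_device_train_batch_size=4),
        train_dataset=dataset,
    )
    trainer.train()

Adapting the fine-tuned model to the multi-system, multi-institution evaluation described above would then amount to running inference (and any further adaptation) on reports from the other institutions.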
Performance engineering of data-intensive applications
Data-intensive programs deal with large volumes of data and often exhibit compute-intensive characteristics. Among the various HPC application domains, big-data analytics, machine learning, and the more recent deep-learning models are well-known data-intensive applications. An efficient design of such applications demands extensive knowledge of the target hardware and software, particularly the memory/cache hierarchy and the data communication among threads and processes. This requirement makes code development an arduous task, as inappropriate data structures and algorithm design may result in excessive runtime, let alone hardware incompatibilities when porting the code to other platforms.
In this dissertation, we introduce a set of tools and methods for the performance engineering of parallel data-intensive programs. We start with performance profiling to gain insight into thread communication and the relevant code optimizations. Then, narrowing our scope to deep-learning applications, we introduce tools for enhancing the performance portability and scalability of convolutional neural networks (ConvNets) in the inference and training phases.
Our first contribution is a novel performance-profiling method that unveils potential communication bottlenecks caused by data-access patterns and thread interactions. Our findings show that data shared between a pair of threads should be reused within a reasonably short interval to preserve data locality, yet existing profilers neglect these intervals and mainly report communication volume. We propose new hardware-independent metrics to characterize thread communication and to suggest appropriate optimizations for a specific code region. Our experiments show that applying the relevant optimizations improves performance on the Rodinia benchmarks by up to 56%.
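The dissertation's metrics themselves are not reproduced in this abstract; as a simplified illustration of the underlying idea, the sketch below scans a memory-access trace and measures, for each pair of communicating threads, how many accesses elapse between one thread touching an address and another thread reusing it. The function name and trace format are invented for illustration.

    # Illustrative only (not the dissertation's tool): inter-thread reuse
    # intervals from a trace of (thread_id, address) events. Long intervals
    # suggest a shared line is likely evicted before the other thread reuses it.
    from collections import defaultdict

    def inter_thread_reuse_intervals(trace):
        """trace: iterable of (thread_id, address) -> {(t1, t2): [intervals]}."""
        last_access = {}                       # address -> (thread, position)
        intervals = defaultdict(list)
        for pos, (tid, addr) in enumerate(trace):
            if addr in last_access:
                prev_tid, prev_pos = last_access[addr]
                if prev_tid != tid:            # true inter-thread communication
                    pair = tuple(sorted((prev_tid, tid)))
                    intervals[pair].append(pos - prev_pos)
            last_access[addr] = (tid, pos)
        return intervals

    # Toy trace: threads 0 and 1 share address 0xA with short reuse intervals.
    trace = [(0, 0xA), (0, 0xB), (1, 0xA), (1, 0xC), (0, 0xA)]
    print(dict(inter_thread_reuse_intervals(trace)))   # {(0, 1): [2, 2]}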
For the next contribution, we developed a framework for the automatic generation of efficient, performance-portable convolution kernels, including Winograd convolutions, for various GPU platforms. We employed a synergy of meta-programming, symbolic execution, and auto-tuning. The results demonstrate that the kernels generated by our automated optimization pipeline achieve runtimes close to those of vendor deep-learning libraries, and the minimal programming effort required confirms the performance portability of our approach. Furthermore, our symbolic execution method exploits repetitive patterns in Winograd convolutions, enabling us to reduce the number of arithmetic operations by up to 62% without compromising numerical stability.
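For intuition about where Winograd's arithmetic savings come from (the generated GPU kernels themselves are not shown here), the following is the textbook one-dimensional F(2,3) minimal filtering algorithm, which produces two convolution outputs with four element-wise multiplications instead of six:

    # Textbook Winograd F(2,3): 2 outputs of a 3-tap filter over 4 inputs using
    # 4 multiplications in the Hadamard product instead of 6 in the direct form.
    import numpy as np

    BT = np.array([[1,  0, -1,  0],
                   [0,  1,  1,  0],
                   [0, -1,  1,  0],
                   [0,  1,  0, -1]], dtype=float)
    G  = np.array([[1.0,  0.0, 0.0],
                   [0.5,  0.5, 0.5],
                   [0.5, -0.5, 0.5],
                   [0.0,  0.0, 1.0]])
    AT = np.array([[1, 1,  1,  0],
                   [0, 1, -1, -1]], dtype=float)

    def winograd_f23(d, g):
        """d: 4 inputs, g: 3 filter taps -> 2 outputs y[i] = sum_k d[i+k]*g[k]."""
        return AT @ ((G @ g) * (BT @ d))

    d = np.array([1.0, 2.0, 3.0, 4.0])
    g = np.array([1.0, 0.0, -1.0])
    print(winograd_f23(d, g))   # [-2. -2.], matching direct convolution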
Lastly, we investigate methods for scaling the performance of ConvNets in the training and inference phases. Our specialized training platform, equipped with a novel topology-aware network pruning algorithm, enables rapid training, neural architecture search, and network compression. Thus, AI model training can easily be scaled to a multitude of compute nodes, leading to faster model design at lower operating cost. Furthermore, the network-compression component scales a ConvNet model down by removing redundant layers, preparing the model for more efficient deployment.
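As a simplified stand-in for the idea of removing redundant structure, the sketch below ranks layers by a naive importance score (mean absolute weight) and keeps only the strongest layers under a parameter budget. Both the scoring rule and the budget are illustrative assumptions, not the topology-aware criterion developed in the dissertation.

    # Simplified stand-in for network compression: keep the most important
    # layers under a parameter budget. The mean-|weight| score is an
    # illustrative proxy, not the dissertation's topology-aware criterion.
    import numpy as np

    def prune_layers(layers, keep_ratio=0.75):
        """layers: {name: weight ndarray}; returns layer names kept."""
        budget = keep_ratio * sum(w.size for w in layers.values())
        # Rank layers from most to least important by mean absolute weight.
        ranked = sorted(layers, key=lambda n: np.abs(layers[n]).mean(),
                        reverse=True)
        kept, used = [], 0
        for name in ranked:
            if used + layers[name].size <= budget:
                kept.append(name)
                used += layers[name].size
        return kept

    rng = np.random.default_rng(0)
    layers = {f"conv{i}": rng.normal(scale=s, size=(64, 64))
              for i, s in enumerate([1.0, 0.05, 0.8, 0.02])}
    print(prune_layers(layers))   # the near-zero layers are the ones dropped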
Altogether, this work demonstrates the necessity and benefit of performance engineering and parallel-programming methods in accelerating emerging data-intensive workloads. With the help of the proposed tools and techniques, we pinpoint data-communication bottlenecks and achieve performance portability and scalability in data-intensive applications.