AlpaCare: Instruction-tuned Large Language Models for Medical Application
Large Language Models (LLMs) have demonstrated significant enhancements in
instruction-following abilities through instruction tuning, achieving notable
performance across various tasks. Previous research has focused on fine-tuning
medical domain-specific LLMs using an extensive array of medical-specific data,
incorporating millions of pieces of biomedical literature to augment their
medical capabilities. However, existing medical instruction-tuned LLMs have
been constrained by the limited scope of tasks and instructions available,
restricting the efficacy of instruction tuning and adversely affecting
performance in the general domain. In this paper, we fine-tune LLaMA-series
models using MedInstruct-52k, a set of 52k diverse, machine-generated medical
instruction-following examples, resulting in the model AlpaCare. Comprehensive
experimental results on both general and medical-specific domain free-form
instruction evaluations showcase AlpaCare's strong medical proficiency and
generalizability compared to previous instruction-tuned models in both medical
and general domains. We provide public access to our MedInstruct-52k dataset
and a clinician-crafted free-form instruction test set, MedInstruct-test, along
with our codebase, to foster further research and development. Our project page
is available at https://github.com/XZhang97666/AlpaCare
GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks
Automatically evaluating vision-language tasks is challenging, especially
when it comes to reflecting human judgments due to limitations in accounting
for fine-grained details. Although GPT-4V has shown promising results in
various multi-modal tasks, leveraging GPT-4V as a generalist evaluator for
these tasks has not yet been systematically explored. We comprehensively
validate GPT-4V's capabilities for evaluation purposes, addressing tasks
ranging from foundational image-to-text and text-to-image synthesis to
high-level image-to-image translation and multi-image-to-text alignment. We
employ two evaluation methods, single-answer grading and pairwise comparison,
using GPT-4V. Notably, GPT-4V shows promising agreement with humans across
various tasks and evaluation methods, demonstrating immense potential for
multi-modal LLMs as evaluators. Despite limitations such as restricted visual
clarity grading and complex real-world reasoning, its ability to provide
human-aligned scores enriched with detailed explanations is promising for a
universal automatic evaluator.
An Efficient Numerical Method for Highly Oscillatory Ordinary Differential Equations
137 p. Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 1978. U of I Only: restricted to the U of I community indefinitely during batch ingest of legacy ETDs.