A New Dataset and Method for Creativity Assessment Using the Alternate Uses Task
Human creativity ratings for the alternate uses task (AUT) tend to be subjective and inefficient. To automate AUT scoring, prior work has suggested using semantic distance from non-contextual models. In this paper, we extend this line of research by including contextual semantic models and, more importantly, by exploring the feasibility of predicting creativity ratings with supervised discriminative machine learning models. Based on a newly collected dataset, our results show that supervised models can successfully classify creative and non-creative responses even with unbalanced data, and can generalise well to out-of-domain unseen prompts.
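As a concrete illustration of the pipeline this abstract describes, a minimal sketch follows. The encoder name (all-MiniLM-L6-v2), the distance-plus-embedding feature set, and the toy labels are assumptions for illustration, not the paper's exact setup.

```python
# Hedged sketch: scoring AUT responses with a contextual encoder and a
# supervised classifier. Model, features, and data are illustrative only.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_distances

model = SentenceTransformer("all-MiniLM-L6-v2")  # any contextual encoder

def features(prompt: str, responses: list[str]) -> np.ndarray:
    """Embed prompt and responses; combine semantic distance with the raw
    response embedding as classifier features."""
    p = model.encode([prompt])
    r = model.encode(responses)
    dist = cosine_distances(r, p)   # (n, 1) semantic distance to the prompt
    return np.hstack([dist, r])     # distance + contextual embedding

# Toy training labels: 1 = rated creative, 0 = not creative (hypothetical).
X = features("brick", ["use it as a doorstop", "grind it into pigment"])
y = np.array([0, 1])
clf = LogisticRegression(class_weight="balanced").fit(X, y)  # unbalanced data

# Generalization to an unseen, out-of-domain prompt:
print(clf.predict_proba(features("paperclip", ["hold papers together"])))
```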
Making the Most Out of the Limited Context Length: Predictive Power Varies with Clinical Note Type and Note Section
Recent advances in large language models have led to renewed interest in natural language processing in healthcare using the free text of clinical notes. One distinguishing characteristic of clinical notes is that they span long periods of time across multiple long documents. This unique structure creates a new design choice: when the context length of a language model predictor is limited, which part of the clinical notes should we choose as input? Existing studies either select inputs using domain knowledge or simply truncate them. We propose a framework to analyze the sections with high predictive power. Using MIMIC-III, we show that: 1) the distribution of predictive power differs between nursing notes and discharge notes, and 2) combining different types of notes can improve performance when the context length is large. Our findings suggest that a carefully selected sampling function could enable more efficient information extraction from clinical notes.

Comment: Our code is publicly available on GitHub (https://github.com/nyuolab/EfficientTransformer
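To make the design choice concrete, here is a minimal sketch of one possible sampling function under a fixed context budget. The section names, predictive-power scores, and whitespace tokenizer are hypothetical, not the paper's framework.

```python
# Hedged sketch: fill a limited context window with note sections ranked by
# estimated predictive power. Sections and scores are illustrative.

def sample_sections(sections: dict[str, str],
                    power: dict[str, float],
                    budget: int) -> str:
    """Greedily pack sections, highest estimated predictive power first,
    truncating the last section to fit the context length."""
    picked, used = [], 0
    for name in sorted(sections, key=lambda s: power.get(s, 0.0), reverse=True):
        tokens = sections[name].split()        # crude whitespace tokenizer
        take = tokens[: max(0, budget - used)]
        if not take:
            break
        picked.append(f"[{name}] " + " ".join(take))
        used += len(take)
    return "\n".join(picked)

note = {"assessment": "...", "plan": "...", "history": "..."}
scores = {"assessment": 0.9, "plan": 0.7, "history": 0.4}  # hypothetical
print(sample_sections(note, scores, budget=512))
```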
Crustal and Upper Mantle Structure Beneath the Northeastern Tibetan Plateau from Joint Analysis of Receiver Functions and Rayleigh Wave Dispersions
The crustal and upper mantle velocity structure of the northeastern Tibetan Plateau is obtained from a joint analysis of receiver functions and Rayleigh wave dispersions. The resulting velocity model reveals a close correlation between the thick (>60 km) crust and the presence of an intracrustal low-velocity zone beneath the Qiangtang and Songpan-Ganzi terranes as well as the northwestern Qilian orogen. However, a high crustal Vp/Vs ratio is found only beneath the Qiangtang and Songpan-Ganzi terranes. The crustal low-velocity zone does not appear in the west Qinling and southeastern Qilian orogens, which have relatively thin (∼50 km) crust, indicating that crustal channel flow is not the primary mechanism by which the northeastern Tibetan Plateau grows. A continuous low-velocity zone from the mid-to-lower crust down to 160 km beneath the eastern Kunlun fault suggests an induced local mantle upwelling after partial detachment of the lithosphere.
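For readers unfamiliar with joint analysis, one standard way to pose it is as a single objective that weights the two data misfits with an influence factor. The sketch below is a generic formulation under that assumption, not necessarily the inversion scheme used in this study.

```python
# Hedged sketch: a generic joint-inversion objective trading off the
# receiver-function misfit against the dispersion misfit with an influence
# factor p in [0, 1]. Normalization choices are illustrative.
import numpy as np

def joint_misfit(rf_obs, rf_pred, disp_obs, disp_pred, p=0.5):
    """E = p * E_rf + (1 - p) * E_disp, each term normalized so neither
    data set dominates purely by sample count or amplitude."""
    e_rf = np.mean((rf_obs - rf_pred) ** 2) / np.var(rf_obs)
    e_disp = np.mean((disp_obs - disp_pred) ** 2) / np.var(disp_obs)
    return p * e_rf + (1 - p) * e_disp
```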
RRHF: Rank Responses to Align Language Models with Human Feedback without tears
Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large language models with human preferences, significantly enhancing the quality of interactions between humans and these models. InstructGPT implements RLHF through several stages, including Supervised Fine-Tuning (SFT), reward model training, and Proximal Policy Optimization (PPO). PPO, however, is sensitive to hyperparameters and requires a minimum of four models in its standard implementation, which makes it hard to train. In contrast, we propose a novel learning paradigm called RRHF, which scores responses generated by different sampling policies and learns to align them with human preferences through a ranking loss. RRHF can align language model output probabilities with human preferences as robustly as fine-tuning while requiring only one or two models during tuning. In addition, RRHF can be considered an extension of SFT and reward modeling while being simpler than PPO in terms of coding, model counts, and hyperparameters. The entire alignment process can be accomplished within a single RRHF training session. We evaluate RRHF using LLaMA and Alpaca on Helpful and Harmless data, demonstrating performance comparable to PPO.

Comment: Codes available at https://github.com/GanjinZero/RRH
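A minimal sketch of the ranking objective described above, assuming k candidate responses already scored for preference. The tensor shapes, the length normalization, and the unweighted sum of the two terms are assumptions based on the abstract, not a verbatim transcription of the paper's loss.

```python
# Hedged PyTorch sketch of an RRHF-style objective: rank length-normalized
# sequence log-probabilities against reward order, plus an SFT term on the
# highest-reward response.
import torch

def rrhf_loss(logprobs: torch.Tensor, lengths: torch.Tensor,
              rewards: torch.Tensor) -> torch.Tensor:
    """logprobs: (k,) summed token log-probs of k candidate responses under
    the policy; lengths: (k,) token counts; rewards: (k,) preference scores."""
    p = logprobs / lengths                               # length-normalized score
    diff = p.unsqueeze(1) - p.unsqueeze(0)               # diff[i, j] = p_i - p_j
    worse = rewards.unsqueeze(1) < rewards.unsqueeze(0)  # pairs with r_i < r_j
    rank_loss = torch.relu(diff[worse]).sum()            # penalize misordered pairs
    sft_loss = -logprobs[rewards.argmax()]               # cross-entropy on best response
    return rank_loss + sft_loss

# Toy usage with three sampled responses:
loss = rrhf_loss(torch.tensor([-12.0, -9.5, -15.0]),
                 torch.tensor([10.0, 8.0, 12.0]),
                 torch.tensor([0.2, 0.9, 0.1]))
```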
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
The case for ensembling Large Language Models (LLMs) rests on the assumption that off-the-shelf LLMs have heterogeneous expertise across a wide range of domains and tasks, so that an ensemble of LLMs can achieve consistently better performance. Existing ensemble methods for LLMs mainly focus on reward-model ranking of outputs, which incurs significant computation overhead. To combat this issue, we revisit the complementary potential of LLMs and elaborate on it by mining latent expertise with off-the-shelf reward models. We propose Zooter, a reward-guided routing method that distills rewards on training queries into a routing function, which can precisely distribute each query to the LLM with the relevant expertise. We also integrate tag-based label enhancement to mitigate noise from the uncertainty of using rewards as silver supervision. Zooter is computationally efficient at inference, introducing only the minor overhead of a routing function compared with reward-model ranking methods. We evaluate Zooter on a comprehensive benchmark collection with 26 subsets spanning different domains and tasks. Zooter outperforms the best single model on average and ranks first on 44% of tasks, even surpassing multiple reward-model ranking methods.
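A minimal sketch of the distillation idea, assuming a linear router over fixed query embeddings. The encoder, temperature, and KL objective details are assumptions rather than Zooter's exact training recipe.

```python
# Hedged PyTorch sketch of reward distillation for routing: a small classifier
# over query embeddings learns to match the softmax of per-expert rewards (the
# silver supervision), so at inference one router forward pass replaces
# scoring every LLM's output with a reward model.
import torch
import torch.nn.functional as F

n_experts, d = 4, 768
router = torch.nn.Linear(d, n_experts)   # the routing function
opt = torch.optim.Adam(router.parameters(), lr=1e-4)

def distill_step(query_emb: torch.Tensor, rewards: torch.Tensor,
                 tau: float = 1.0) -> float:
    """query_emb: (B, d) query embeddings; rewards: (B, n_experts) scores
    from an off-the-shelf reward model for each expert's response."""
    target = F.softmax(rewards / tau, dim=-1)   # normalized reward distribution
    logits = router(query_emb)
    loss = F.kl_div(F.log_softmax(logits, -1), target, reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Inference: dispatch each query to the expert with the highest router score.
def route(query_emb: torch.Tensor) -> torch.Tensor:
    return router(query_emb).argmax(dim=-1)
```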