
    A New Dataset and Method for Creativity Assessment Using the Alternate Uses Task

    Creativity ratings by humans for the alternate uses task (AUT) tend to be subjective and inefficient. To automate the scoring process of the AUT, previous literature suggested using semantic distance from non-contextual models. In this paper, we extend this line of research by including contextual semantic models and, more importantly, by exploring the feasibility of predicting creativity ratings with supervised discriminative machine learning models. Based on a newly collected dataset, our results show that supervised models can successfully distinguish between creative and non-creative responses even with unbalanced data, and can generalise well to out-of-domain, unseen prompts.
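
    A minimal sketch of the kind of pipeline this abstract describes: contextual embeddings of prompt–response pairs plus a semantic-distance feature fed to a discriminative classifier with class weighting for the unbalanced labels. The encoder name, feature choice, and toy data below are illustrative assumptions, not the authors' exact setup.

```python
# Hypothetical sketch: supervised creativity classification for AUT responses.
# Encoder, features, and labels are illustrative, not the paper's exact setup.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any contextual encoder

def featurize(prompt: str, response: str) -> np.ndarray:
    """Concatenate the response embedding with its semantic distance from the prompt."""
    p, r = encoder.encode([prompt, response])
    distance = 1.0 - cosine_similarity(p.reshape(1, -1), r.reshape(1, -1))[0, 0]
    return np.concatenate([r, [distance]])

# Toy data: (prompt, response, label) with 1 = creative, 0 = non-creative.
data = [
    ("brick", "use it as a doorstop", 0),
    ("brick", "grind it into pigment for cave-style paintings", 1),
]
X = np.stack([featurize(p, r) for p, r, _ in data])
y = np.array([label for _, _, label in data])

# class_weight="balanced" is one common way to handle unbalanced labels.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
print(clf.predict(featurize("paperclip", "bend it into a tiny sculpture").reshape(1, -1)))
```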

    Making the Most Out of the Limited Context Length: Predictive Power Varies with Clinical Note Type and Note Section

    Recent advances in large language models have led to renewed interest in natural language processing in healthcare using the free text of clinical notes. One distinguishing characteristic of clinical notes is their long time span over multiple long documents. The unique structure of clinical notes creates a new design choice: when the context length for a language model predictor is limited, which parts of the clinical notes should we choose as input? Existing studies either choose the inputs with domain knowledge or simply truncate them. We propose a framework to analyze the sections with high predictive power. Using MIMIC-III, we show that: 1) the distribution of predictive power differs between nursing notes and discharge notes, and 2) combining different types of notes could improve performance when the context length is large. Our findings suggest that a carefully selected sampling function could enable more efficient information extraction from clinical notes. Comment: Our code is publicly available on GitHub (https://github.com/nyuolab/EfficientTransformer).
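
    The design choice described here, picking which note sections to feed a length-limited predictor, can be made concrete with a simple sampling function. The section names, the predictive-power scores, and the tokenizer below are assumptions for illustration; the paper's actual sampling functions are in the linked repository.

```python
# Hypothetical sketch: fill a limited context window with the note sections
# estimated to carry the most predictive power. Section names and scores are
# illustrative assumptions, not values from the paper.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Assumed per-section predictive-power scores (e.g. estimated on held-out data).
SECTION_SCORES = {"assessment and plan": 0.9, "hospital course": 0.7,
                  "nursing progress": 0.5, "social history": 0.2}

def sample_sections(note: dict[str, str], max_tokens: int = 512) -> str:
    """Greedily add whole sections in descending score order until the budget is hit."""
    chosen, used = [], 0
    for name in sorted(note, key=lambda s: SECTION_SCORES.get(s, 0.0), reverse=True):
        n_tok = len(tokenizer.tokenize(note[name]))
        if used + n_tok <= max_tokens:
            chosen.append(note[name])
            used += n_tok
    return "\n".join(chosen)
```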

    Crustal and Upper Mantle Structure Beneath the Northeastern Tibetan Plateau from Joint Analysis of Receiver Functions and Rayleigh Wave Dispersions

    The crustal and upper mantle velocity structure in the northeastern Tibetan Plateau is obtained from joint analysis of receiver functions and Rayleigh wave dispersions. The resulting velocity model reveals a close correlation between the thick (>60 km) crust and the presence of an intracrustal low-velocity zone beneath the Qiangtang and Songpan-Ganzi terranes as well as the northwestern Qilian orogen. However, the high Vp/Vs ratio of the crust is found only beneath the Qiangtang and Songpan-Ganzi terranes. The crustal low-velocity zone does not appear in the west Qinling and southeastern Qilian orogens, which have a relatively thin (∼50 km) crust, indicating that crustal channel flow is not the primary mechanism by which the northeastern Tibetan Plateau grows. A continuous low-velocity zone from the mid-to-lower crust down to 160 km beneath the eastern Kunlun fault suggests an induced local mantle upwelling after partial detachment of the lithosphere.
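
    As a rough illustration of what "joint analysis" typically means in this setting, receiver-function and dispersion residuals are combined into a single weighted misfit that the candidate velocity model must minimize. The weighting and norm below are a generic formulation, not necessarily the scheme used in this study.

```latex
% Generic joint-inversion misfit (illustrative weighting, not the study's exact scheme)
\Phi(m) \;=\; \frac{w}{N_{RF}} \sum_{i=1}^{N_{RF}}
      \left( \frac{d_i^{RF} - g_i^{RF}(m)}{\sigma_i^{RF}} \right)^{2}
\;+\; \frac{1-w}{N_{SW}} \sum_{j=1}^{N_{SW}}
      \left( \frac{c_j^{\mathrm{obs}} - c_j(m)}{\sigma_j^{SW}} \right)^{2},
\qquad 0 \le w \le 1
```

    Here $m$ is the velocity model, $d_i^{RF}$ and $g_i^{RF}(m)$ are observed and predicted receiver-function amplitudes, and $c_j^{\mathrm{obs}}$ and $c_j(m)$ are observed and predicted Rayleigh-wave dispersion values; $w$ trades off the two data sets.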

    RRHF: Rank Responses to Align Language Models with Human Feedback without tears

    Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large language models with human preferences, significantly enhancing the quality of interactions between humans and these models. InstructGPT implements RLHF through several stages, including Supervised Fine-Tuning (SFT), reward model training, and Proximal Policy Optimization (PPO). PPO, however, is sensitive to hyperparameters and requires a minimum of four models in its standard implementation, which makes it hard to train. In contrast, we propose a novel learning paradigm called RRHF, which scores responses generated by different sampling policies and learns to align them with human preferences through a ranking loss. RRHF can align language model output probabilities with human preferences as robustly as fine-tuning, and it requires only one to two models during tuning. In addition, RRHF can be considered an extension of SFT and reward models while being simpler than PPO in terms of coding, model counts, and hyperparameters. The entire alignment process can be accomplished within a single RRHF training session. We evaluate RRHF using LLaMA and Alpaca on Helpful and Harmless data, demonstrating performance comparable to PPO. Comment: Codes available at https://github.com/GanjinZero/RRH
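
    A minimal sketch of an RRHF-style objective as the abstract describes it: each sampled response is scored by its length-normalized log-probability under the model being tuned, a pairwise ranking loss pushes those scores to agree with the reward ordering, and a standard fine-tuning loss is kept on the highest-reward response. Tensor shapes and the hinge-style pairwise form are assumptions for illustration; see the paper and repository for the exact formulation.

```python
# Hypothetical sketch of an RRHF-style loss; shapes and details are assumptions.
import torch
import torch.nn.functional as F

def rrhf_loss(logprobs: torch.Tensor, lengths: torch.Tensor,
              rewards: torch.Tensor, best_nll: torch.Tensor) -> torch.Tensor:
    """
    logprobs: (k,) summed token log-probs of k sampled responses under the policy
    lengths:  (k,) token counts, used for length normalization
    rewards:  (k,) human / reward-model scores defining the target ordering
    best_nll: scalar negative log-likelihood of the highest-reward response (SFT term)
    """
    scores = logprobs / lengths                      # length-normalized scores p_i
    rank_loss = torch.zeros((), device=scores.device)
    k = scores.shape[0]
    for i in range(k):
        for j in range(k):
            if rewards[i] < rewards[j]:              # response j is preferred over i
                rank_loss = rank_loss + F.relu(scores[i] - scores[j])
    return rank_loss + best_nll                      # ranking term + fine-tuning term
```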

    Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models

    The complementary potential of Large Language Models (LLMs) assumes that off-the-shelf LLMs have heterogeneous expertise across a wide range of domains and tasks, so that an ensemble of LLMs can achieve consistently better performance. Existing ensemble methods for LLMs mainly focus on reward model ranking of outputs, which incurs significant computation overhead. To address this issue, we revisit the complementary potential of LLMs and further elaborate it by mining latent expertise with off-the-shelf reward models. We propose Zooter, a reward-guided routing method that distills rewards on training queries to train a routing function, which can precisely distribute each query to the LLM with the relevant expertise. We also integrate a tag-based label enhancement to mitigate noise from uncertainty when using rewards as silver supervision. Zooter is computationally efficient at inference, as it introduces only the minor overhead of a routing function compared with reward model ranking methods. We evaluate Zooter on a comprehensive benchmark collection with 26 subsets covering different domains and tasks. Zooter outperforms the best single model on average and ranks first on 44% of tasks, even surpassing multiple reward model ranking methods.
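
    A rough sketch of the reward-distillation step outlined here: normalized per-LLM rewards become soft labels for a lightweight routing classifier over the query, trained with a KL objective; at inference only the router runs and each query goes to the single predicted expert. The encoder choice, the temperature, and the omission of the tag-based label enhancement are simplifying assumptions, not the paper's exact recipe.

```python
# Hypothetical sketch of reward-distilled query routing (Zooter-style).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Router(nn.Module):
    def __init__(self, query_dim: int, num_llms: int):
        super().__init__()
        self.head = nn.Linear(query_dim, num_llms)   # tiny routing function

    def forward(self, query_emb: torch.Tensor) -> torch.Tensor:
        return self.head(query_emb)                  # logits over candidate LLMs

def distill_step(router: Router, query_emb: torch.Tensor,
                 rewards: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Match the router's distribution to softmaxed per-LLM rewards (silver labels)."""
    target = F.softmax(rewards / tau, dim=-1)                 # (batch, num_llms)
    log_pred = F.log_softmax(router(query_emb), dim=-1)
    return F.kl_div(log_pred, target, reduction="batchmean")

# At inference, only the router runs: route each query to its argmax expert, e.g.
# expert_idx = router(encode(query)).argmax(-1)
```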