128 research outputs found

    Aligning Language Models with Human Preferences via a Bayesian Approach

    Full text link
    In the quest to advance human-centric natural language generation (NLG) systems, ensuring alignment between NLG models and human preferences is crucial. For this alignment, current popular methods leverage a reinforcement learning (RL) approach with a reward model trained on feedback from humans. However, inherent disagreements due to the subjective nature of human preferences pose a significant challenge for training the reward model, resulting in a deterioration of NLG performance. To tackle this issue, previous approaches typically rely on majority voting or averaging to consolidate multiple inconsistent preferences into a merged one. Although straightforward to understand and execute, such methods cannot capture the nuanced degrees of disagreement among humans and may represent only a specialized subset of individuals, thereby lacking the ability to quantitatively disclose the universality of human preferences. To address this challenge, this paper proposes a novel approach that employs a Bayesian framework to account for the distribution of disagreements among human preferences when training a preference model, named d-PM. In addition, given the inefficiency and complexity of the RL strategy's training process, we further propose training the NLG model with a contrastive learning strategy using the preference scores derived from the d-PM model. Extensive experiments on two human-centric NLG tasks, i.e., emotional support conversation and integrity "Rule-of-Thumb" generation, show that our method consistently exceeds previous SOTA models in both automatic and human evaluations. Comment: NeurIPS 202
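
    As a hedged illustration of the two ideas above (a Bayesian treatment of annotator disagreement, and contrastive training from preference scores), the following Python sketch turns raw annotator votes into a disagreement-preserving distribution via a Dirichlet-multinomial posterior, then ranks candidate responses with a KL-based contrastive objective. The function names are hypothetical; this is a sketch of the technique, not the paper's d-PM implementation.

    import torch
    import torch.nn.functional as F

    def preference_soft_labels(annotator_votes: torch.Tensor) -> torch.Tensor:
        """Turn per-label annotator vote counts into a distribution.

        The posterior mean of a Dirichlet-multinomial with a uniform prior
        (alpha = 1) keeps the disagreement instead of majority-voting it away.
        """
        alpha = 1.0  # uniform Dirichlet prior
        posterior = annotator_votes + alpha
        return posterior / posterior.sum(dim=-1, keepdim=True)

    def contrastive_preference_loss(logprobs, pref_scores, tau=1.0):
        """Contrastive objective over a set of candidate responses.

        logprobs:    (batch, n_candidates) sequence log-probs under the NLG model.
        pref_scores: (batch, n_candidates) scores from the preference model.
        Candidates with higher preference scores should get more probability mass.
        """
        target = F.softmax(pref_scores / tau, dim=-1)   # preference distribution
        model_dist = F.log_softmax(logprobs, dim=-1)    # model's relative ranking
        return F.kl_div(model_dist, target, reduction="batchmean")

    votes = torch.tensor([[3., 1., 0.], [1., 1., 2.]])  # 4 annotators, 3 labels
    print(preference_soft_labels(votes))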

    On the Need for a Language Describing Distribution Shifts: Illustrations on Tabular Datasets

    Full text link
    Different distribution shifts require different algorithmic and operational interventions. Methodological research must be grounded in the specific shifts it addresses. Although nascent benchmarks provide a promising empirical foundation, they implicitly focus on covariate shifts, and the validity of empirical findings depends on the type of shift: e.g., previous observations on algorithmic performance can fail to hold when the Y|X distribution changes. We conduct a thorough investigation of natural shifts in 5 tabular datasets over 86,000 model configurations, and find that Y|X-shifts are most prevalent. To encourage researchers to develop a refined language for distribution shifts, we build WhyShift, an empirical testbed of curated real-world shifts where we characterize the type of shift we benchmark performance over. Since Y|X-shifts are prevalent in tabular settings, we identify the covariate regions that suffer the biggest Y|X-shifts and discuss implications for algorithmic and data-based interventions. Our testbed highlights the importance of future research that builds an understanding of how distributions differ. Comment: 41 page
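
    Since the abstract's central distinction (covariate shift versus Y|X shift) is easy to probe empirically, here is a minimal diagnostic sketch in Python, not the WhyShift code: it compares a model's loss on the target domain with its loss on source data importance-weighted toward target covariates, so any remaining gap is attributable to Y|X shift. All names are illustrative.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import log_loss

    def shift_decomposition(Xs, ys, Xt, yt, model=None):
        model = model or GradientBoostingClassifier().fit(Xs, ys)

        # Domain classifier: w(x) = p_t(x)/p_s(x) ~ d(x)/(1-d(x)) up to a constant.
        X_dom = np.vstack([Xs, Xt])
        d_dom = np.r_[np.zeros(len(Xs)), np.ones(len(Xt))]
        dom = LogisticRegression(max_iter=1000).fit(X_dom, d_dom)
        p = dom.predict_proba(Xs)[:, 1]
        w = p / (1 - p)
        w *= len(Xs) / w.sum()  # normalize weights to sum to n_source

        loss_src = log_loss(ys, model.predict_proba(Xs))
        loss_cov = log_loss(ys, model.predict_proba(Xs), sample_weight=w)  # X-shift only
        loss_tgt = log_loss(yt, model.predict_proba(Xt))                   # full shift
        # If loss_tgt >> loss_cov, the remaining gap points to a Y|X shift.
        return {"source": loss_src, "covariate_only": loss_cov, "target": loss_tgt}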

    Long-Term Rhythmic Video Soundtracker

    Full text link
    We consider the problem of generating musical soundtracks in sync with rhythmic visual cues. Most existing works rely on pre-defined music representations, which limits the flexibility and complexity of generation. Other methods that directly generate video-conditioned waveforms suffer from limited scenarios, short lengths, and unstable generation quality. To this end, we present Long-Term Rhythmic Video Soundtracker (LORIS), a novel framework to synthesize long-term conditional waveforms. Specifically, our framework consists of a latent conditional diffusion probabilistic model that performs waveform synthesis. Furthermore, a series of context-aware conditioning encoders are proposed to take temporal information into consideration for long-term generation. Notably, we extend our model's applicability from dances to multiple sports scenarios such as floor exercise and figure skating. To perform comprehensive evaluations, we establish a benchmark for rhythmic video soundtracks that includes a pre-processed dataset, improved evaluation metrics, and robust generative baselines. Extensive experiments show that our model generates long-term soundtracks with state-of-the-art musical quality and rhythmic correspondence. Codes are available at \url{https://github.com/OpenGVLab/LORIS}. Comment: ICML202
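
    To make the core mechanism concrete, here is a minimal Python sketch of a conditional latent diffusion training step of the kind the abstract describes: a denoiser predicts the noise added to a waveform latent, conditioned on visual/rhythm features. The toy MLP, dimensions, and schedule are illustrative assumptions, not the LORIS architecture.

    import torch
    import torch.nn as nn

    class Denoiser(nn.Module):
        def __init__(self, latent_dim=64, cond_dim=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(latent_dim + cond_dim + 1, 256), nn.SiLU(),
                nn.Linear(256, latent_dim),
            )

        def forward(self, z_t, cond, t):
            t_feat = t.float().unsqueeze(-1) / 1000.0  # scalar timestep feature
            return self.net(torch.cat([z_t, cond, t_feat], dim=-1))

    def diffusion_loss(denoiser, z0, cond, alphas_bar):
        """Epsilon-prediction objective: z_t = sqrt(a_bar)*z0 + sqrt(1-a_bar)*eps."""
        t = torch.randint(0, len(alphas_bar), (z0.size(0),))
        a = alphas_bar[t].unsqueeze(-1)
        eps = torch.randn_like(z0)
        z_t = a.sqrt() * z0 + (1 - a).sqrt() * eps
        return ((denoiser(z_t, cond, t) - eps) ** 2).mean()

    alphas_bar = torch.cumprod(1 - torch.linspace(1e-4, 0.02, 1000), dim=0)
    z0, cond = torch.randn(8, 64), torch.randn(8, 128)  # waveform latents, video feats
    loss = diffusion_loss(Denoiser(), z0, cond, alphas_bar)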

    Influence of uniform currents on nonlinear characteristics of double-wave-group focusing

    Get PDF
    Current is considered a crucial environmental factor in producing extreme waves. The nonlinear characteristics of wave–current interactions have been explored, but the role of currents in the more complex interaction processes of double-wave-group focusing is not yet known. Building on our previous research on the nonlinear interactions between wave groups, this paper investigates the impact of a uniform current on the nonlinear characteristics of double-wave-group focusing. A fully nonlinear numerical model using the high-order spectral method is developed to simulate various currents interacting with focused bimodal waves. Three current regimes are considered: strongly opposing current, weakly opposing current, and following current. Unlike the conclusion for unimodal waves, the asymmetries of the wave crest and of the wave envelope influenced by currents are not synchronous; this is explained by changes in the asymmetry of the secondary crests, which receive energy from the currents, in addition to changes in the magnitudes of the maximum crest and the adjacent secondary crests. When opposing currents strengthen to a certain level, a dynamic equilibrium between the energy of waves and currents is reached, in which the proportion of the linear components is almost equal to that in the no-current state, revealing that the majority of the nonlinearity generated by wave–current interaction is blocked at that point. These findings can promote an understanding of the nonlinear characteristics of wave–current interactions.
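
    For background, the linear relation underlying the opposing/following-current distinction is the Doppler-shifted dispersion relation for waves of wavenumber k on a uniform current U in water of depth h; this is a textbook result, not a formula taken from the paper. For sufficiently strong opposing currents the absolute group velocity can vanish, the classic linear wave-blocking limit against which such nonlinear results are usually set:

    \omega = kU + \sqrt{g k \tanh(kh)}, \qquad
    U + c_g = 0 \;\;\text{(linear blocking)}, \qquad
    c_g = \frac{\partial}{\partial k}\sqrt{g k \tanh(kh)}.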

    Evaluations of 5-fluorouracil-treated lung cancer cells by atomic force microscopy

    Get PDF
    Atomic force microscopy (AFM) can be used to obtain physical information about single live cancer cells; however, the physical changes in live cells over time as measured by AFM remain to be studied, and these changes play a key role in evaluating the efficacy and side effects of drugs. Herein, the treatment of the A549 cell line with the anticarcinogen 5-fluorouracil is discussed based on AFM analysis of the cells' continuous physical changes over time, including surface morphology, height, adhesion, and Young's modulus. For comparison, the African green monkey kidney (Vero) cell line was tested as normal cells to determine the side effects of 5-fluorouracil. The results show that the optimal concentration of 5-fluorouracil is about 500 μM, which presents the best anticancer effect with mild side effects.
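
    Since the abstract reports Young's modulus from AFM, here is a minimal Python sketch of the common way it is extracted from force-indentation curves via the Hertz contact model for a spherical tip. The tip radius, Poisson ratio, and synthetic data are illustrative assumptions, not values from the paper.

    import numpy as np
    from scipy.optimize import curve_fit

    def hertz_sphere(delta, E, R=1e-6, nu=0.5):
        """Hertz contact force for a sphere of radius R at indentation depth delta.

        F = (4/3) * E / (1 - nu^2) * sqrt(R) * delta^(3/2)
        nu = 0.5 assumes a nearly incompressible, cell-like sample.
        """
        return (4.0 / 3.0) * E / (1.0 - nu**2) * np.sqrt(R) * delta**1.5

    # Synthetic force curve: indentation (m) vs force (N) with instrument noise.
    delta = np.linspace(0, 500e-9, 100)
    F = hertz_sphere(delta, E=2e3) + np.random.normal(0, 5e-12, delta.size)

    (E_fit,), _ = curve_fit(hertz_sphere, delta, F, p0=[1e3])  # fit E only
    print(f"Fitted Young's modulus: {E_fit:.0f} Pa")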

    Improving Multi-turn Emotional Support Dialogue Generation with Lookahead Strategy Planning

    Full text link
    Providing Emotional Support (ES) to soothe people in emotional distress is an essential capability in social interactions. Most existing research on building ES conversation systems has considered only single-turn interactions with users, which is over-simplified. In comparison, multi-turn ES conversation systems can provide ES more effectively, but face several new technical challenges, including: (1) how to adopt appropriate support strategies to achieve the long-term dialogue goal of comforting the user's emotion; (2) how to dynamically model the user's state. In this paper, we propose a novel system, MultiESC, to address these issues. For strategy planning, drawing inspiration from the A* search algorithm, we propose lookahead heuristics to estimate the future user feedback after using particular strategies, which helps select the strategies that lead to the best long-term effects. For user state modeling, MultiESC focuses on capturing users' subtle emotional expressions and understanding their emotion causes. Extensive experiments show that MultiESC significantly outperforms competitive baselines in both dialogue generation and strategy planning. Our codes are available at https://github.com/lwgkzl/MultiESC. Comment: Accepted by the main conference of EMNLP 202
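
    A minimal Python sketch of the A*-flavored idea described above: each candidate strategy is scored by an immediate-fit term g plus a lookahead heuristic h estimating future user feedback. The scoring functions here are hypothetical stubs standing in for the learned models, not the MultiESC implementation.

    from typing import Callable, Sequence

    def select_strategy(history: Sequence[str],
                        strategies: Sequence[str],
                        g: Callable[[Sequence[str], str], float],
                        h: Callable[[Sequence[str], str], float]) -> str:
        """Pick the strategy maximizing f = g + h, where g scores immediate fit
        to the dialogue history and h is the lookahead heuristic (e.g., a learned
        estimate of how much the user's emotion would improve downstream)."""
        return max(strategies, key=lambda s: g(history, s) + h(history, s))

    # Toy usage with stub scorers.
    strategies = ["Question", "Reflection of Feelings", "Providing Suggestions"]
    g = lambda hist, s: 0.2 if s == "Question" else 0.1
    h = lambda hist, s: 0.5 if s == "Reflection of Feelings" else 0.3
    print(select_strategy(["I feel awful about my exam."], strategies, g, h))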

    InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

    Full text link
    This paper introduces InternVid, a large-scale video-centric multimodal dataset that enables learning powerful and transferable video-text representations for multimodal understanding and generation. The InternVid dataset contains over 7 million videos lasting nearly 760K hours, yielding 234M video clips accompanied by detailed descriptions totaling 4.1B words. Our core contribution is a scalable approach to autonomously build a high-quality video-text dataset with large language models (LLMs), thereby showcasing its efficacy in learning video-language representations at scale. Specifically, we utilize a multi-scale approach to generate video-related descriptions. Furthermore, we introduce ViCLIP, a video-text representation learning model based on ViT-L. Trained on InternVid via contrastive learning, this model demonstrates leading zero-shot action recognition and competitive video retrieval performance. Beyond basic video understanding tasks like recognition and retrieval, our dataset and model have broad applications. They are particularly beneficial for generating interleaved video-text data for learning a video-centric dialogue system and for advancing video-to-text and text-to-video generation research. These resources provide a tool for researchers and practitioners interested in multimodal video understanding and generation. Comment: Data and Code: https://github.com/OpenGVLab/InternVideo/tree/main/Data/InternVi
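
    For readers unfamiliar with the contrastive objective mentioned above, here is a generic CLIP-style symmetric InfoNCE loss in Python of the kind ViCLIP builds on; this is a sketch of the technique, not the released implementation.

    import torch
    import torch.nn.functional as F

    def clip_loss(video_emb: torch.Tensor, text_emb: torch.Tensor,
                  temperature: float = 0.07) -> torch.Tensor:
        """Symmetric InfoNCE over a batch of matched (video, text) pairs.

        video_emb, text_emb: (batch, dim), paired row-for-row.
        """
        v = F.normalize(video_emb, dim=-1)
        t = F.normalize(text_emb, dim=-1)
        logits = v @ t.T / temperature        # (batch, batch) similarity matrix
        labels = torch.arange(v.size(0))      # diagonal pairs are the positives
        return (F.cross_entropy(logits, labels) +
                F.cross_entropy(logits.T, labels)) / 2

    loss = clip_loss(torch.randn(32, 512), torch.randn(32, 512))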

    Low carbon transition of global power sector enhances sustainable development goals

    Get PDF
    Low-carbon power transition, key to combating climate change, has far-reaching effects on achieving the Sustainable Development Goals (SDGs) in terms of resource use, environmental emissions, employment, and more. Here we assessed the potential impacts of the power transition on progress toward multiple SDGs in 49 regions under three different climate scenarios. We found that the power transition could increase the global SDG index score from 72.36 in 2015 to 74.38 in 2040 under the 1.5 °C scenario, compared with 70.55 and 71.44 under the ‘Coal-dependent’ and ‘Middle of the road’ scenarios, respectively. The power-transition-related global SDG progress would mainly come from switching to renewables in developing economies. The power transition also improves the overall SDG performance of most developed economies under all scenarios, while undermining their employment-related SDG progress. Global SDG progress would be jeopardized by power-transition-related changes in international trade under the ‘Coal-dependent’ and ‘Middle of the road’ scenarios, but improved under the 1.5 °C scenario.
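
    The abstract reports a global SDG index score on a 0-100-style scale; as a loose illustration of how such composite indices are typically aggregated (min-max normalized indicators averaged within and then across goals), here is a small Python sketch. The aggregation and numbers are assumptions for illustration and may differ from the paper's method.

    import numpy as np

    def sdg_index(indicators: np.ndarray, bounds: np.ndarray) -> float:
        """indicators: (n_goals, n_indicators) raw values;
        bounds: (n_goals, n_indicators, 2) [worst, best] per indicator."""
        worst, best = bounds[..., 0], bounds[..., 1]
        scores = np.clip((indicators - worst) / (best - worst), 0, 1) * 100
        return scores.mean(axis=1).mean()  # goal scores, then overall index

    ind = np.array([[40., 70.], [55., 80.]])
    bnd = np.array([[[0., 100.], [0., 100.]], [[0., 100.], [0., 100.]]])
    print(f"SDG index: {sdg_index(ind, bnd):.2f}")  # 61.25 on a 0-100 scale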

    VBench: Comprehensive Benchmark Suite for Video Generative Models

    Full text link
    Video generation has witnessed significant advancements, yet evaluating these models remains a challenge. A comprehensive evaluation benchmark for video generation is indispensable for two reasons: 1) existing metrics do not fully align with human perception; 2) an ideal evaluation system should provide insights to inform future development of video generation. To this end, we present VBench, a comprehensive benchmark suite that dissects "video generation quality" into specific, hierarchical, and disentangled dimensions, each with tailored prompts and evaluation methods. VBench has three appealing properties: 1) Comprehensive Dimensions: VBench comprises 16 dimensions in video generation (e.g., subject identity inconsistency, motion smoothness, temporal flickering, and spatial relationship). The fine-grained evaluation metrics reveal individual models' strengths and weaknesses. 2) Human Alignment: We also provide a dataset of human preference annotations to validate our benchmark's alignment with human perception for each evaluation dimension. 3) Valuable Insights: We examine current models' abilities across the evaluation dimensions and across various content types, and we investigate the gaps between video and image generation models. We will open-source VBench, including all prompts, evaluation methods, generated videos, and human preference annotations, and will also include more video generation models in VBench to drive the field of video generation forward. Comment: Equal contributions from first four authors. Project page: https://vchitect.github.io/VBench-project/ Code: https://github.com/Vchitect/VBenc
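
    A small Python sketch of the human-alignment check the abstract describes: for a given dimension, an automatic metric is validated by its rank correlation with human preference win rates across models. The correlation protocol and numbers below are illustrative, not the benchmark's exact procedure.

    import numpy as np
    from scipy.stats import spearmanr

    def validate_dimension(metric_scores: dict, human_winrates: dict) -> float:
        """Rank correlation between a dimension's automatic scores and human
        preference win rates across models; high rho = well-aligned metric."""
        models = sorted(metric_scores)
        rho, _ = spearmanr([metric_scores[m] for m in models],
                           [human_winrates[m] for m in models])
        return rho

    metric = {"model_a": 0.91, "model_b": 0.84, "model_c": 0.88}  # e.g., motion smoothness
    human = {"model_a": 0.66, "model_b": 0.41, "model_c": 0.55}   # pairwise win rates
    print(f"alignment (Spearman rho): {validate_dimension(metric, human):.2f}")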