128 research outputs found
Aligning Language Models with Human Preferences via a Bayesian Approach
In the quest to advance human-centric natural language generation (NLG)
systems, ensuring alignment between NLG models and human preferences is
crucial. For this alignment, current popular methods leverage a reinforcement
learning (RL) approach with a reward model trained on feedback from humans.
However, inherent disagreements due to the subjective nature of human
preferences pose a significant challenge for training the reward model,
resulting in a deterioration of the NLG performance. To tackle this issue,
previous approaches typically rely on majority voting or averaging to
consolidate multiple inconsistent preferences into a merged one. Although
straightforward to understand and execute, such methods cannot capture the
nuanced degrees of disagreement among humans and may represent only a
specialized subset of individuals, thereby failing to quantitatively disclose
the universality of human preferences. To address this challenge, this paper
proposes a novel approach, d-PM, which employs a Bayesian framework to account
for the distribution of disagreements among human preferences when training a
preference model. Moreover, given the inefficiency and complexity of the RL
training process, we further propose utilizing the contrastive learning
strategy to train the NLG model with the preference scores derived from the
d-PM model. Extensive experiments on two human-centric NLG tasks, i.e.,
emotional support conversation and integrity "Rule-of-Thumb" generation, show
that our method consistently outperforms previous SOTA models in both automatic
and human evaluations.
Comment: NeurIPS 202
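The contrast between majority voting and a Bayesian treatment can be made concrete with a toy sketch (this is not the paper's actual d-PM model, which handles richer preference structure): a Beta–Bernoulli posterior turns disagreeing annotator votes into a soft preference score instead of a hard winner, so the degree of disagreement survives into training.

```python
def soft_preference(votes_a: int, votes_b: int,
                    alpha: float = 1.0, beta: float = 1.0) -> float:
    """Posterior mean of P(A preferred over B) under a Beta(alpha, beta) prior.

    Unlike majority voting, which collapses any split to a hard label,
    the posterior mean stays strictly between 0 and 1 and reflects how
    lopsided the annotator votes actually were.
    """
    return (votes_a + alpha) / (votes_a + votes_b + alpha + beta)

# Majority voting maps both splits below to "A wins"; the Bayesian score
# preserves the difference in certainty.
print(soft_preference(3, 2))  # close to 0.5: annotators disagree
print(soft_preference(5, 0))  # near 1: annotators agree
```

Such a soft score can then be used directly as a training target (or, as in the paper, fed into a contrastive objective) rather than a consolidated hard label.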
On the Need for a Language Describing Distribution Shifts: Illustrations on Tabular Datasets
Different distribution shifts require different algorithmic and operational
interventions. Methodological research must be grounded in the specific shifts
it addresses. Although nascent benchmarks provide a promising empirical
foundation, they implicitly focus on covariate shifts, and the validity of
empirical findings depends on the type of shift, e.g., previous observations on
algorithmic performance can fail to be valid when the distribution
changes. We conduct a thorough investigation of natural shifts in 5 tabular
datasets over 86,000 model configurations, and find that Y|X-shifts are most
prevalent. To encourage researchers to develop a refined language for
distribution shifts, we build WhyShift, an empirical testbed of curated
real-world shifts where we characterize the type of shift we benchmark
performance over. Since Y|X-shifts are prevalent in tabular settings, we
identify covariate regions that suffer the biggest Y|X-shifts and discuss
implications for algorithmic and data-based interventions. Our testbed
highlights the importance of future research that builds an understanding of
how distributions differ.
Comment: 41 page
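The distinction between shift types can be illustrated with a hypothetical synthetic example (not from the paper): an X-shift moves the covariate distribution while the labeling rule P(Y|X) stays fixed, whereas a Y|X-shift changes the labeling rule itself, and crude diagnostics separate the two.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sample(n, x_mean, w):
    """Covariates X ~ N(x_mean, 1); labels drawn from P(Y=1|X) = sigmoid(w*X)."""
    x = rng.normal(x_mean, 1.0, n)
    y = (rng.random(n) < sigmoid(w * x)).astype(int)
    return x, y

x_src, y_src = sample(20_000, 0.0, 2.0)    # source distribution
x_cov, y_cov = sample(20_000, 1.0, 2.0)    # X-shift: covariates move
x_cpt, y_cpt = sample(20_000, 0.0, -2.0)   # Y|X-shift: labeling rule flips

def diagnostics(x_t, y_t):
    """Compare covariate means, and P(Y=1|X) inside a narrow covariate bin."""
    bin_s = (x_src > 0.4) & (x_src < 0.6)
    bin_t = (x_t > 0.4) & (x_t < 0.6)
    cov_gap = abs(x_t.mean() - x_src.mean())
    cond_gap = abs(y_t[bin_t].mean() - y_src[bin_s].mean())
    return cov_gap, cond_gap

print(diagnostics(x_cov, y_cov))  # large covariate gap, small conditional gap
print(diagnostics(x_cpt, y_cpt))  # small covariate gap, large conditional gap
```

A model selected under the first kind of shift can fail badly under the second, which is why naming the shift type matters for choosing an intervention.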
Long-Term Rhythmic Video Soundtracker
We consider the problem of generating musical soundtracks in sync with
rhythmic visual cues. Most existing works rely on pre-defined music
representations, limiting generative flexibility and complexity. Other methods
that directly generate video-conditioned waveforms
suffer from limited scenarios, short lengths, and unstable generation quality.
To this end, we present Long-Term Rhythmic Video Soundtracker (LORIS), a novel
framework to synthesize long-term conditional waveforms. Specifically, our
framework consists of a latent conditional diffusion probabilistic model to
perform waveform synthesis. Furthermore, a series of context-aware conditioning
encoders are proposed to take temporal information into consideration for a
long-term generation. Notably, we extend our model's applicability from dances
to multiple sports scenarios such as floor exercise and figure skating. To
perform comprehensive evaluations, we establish a benchmark for rhythmic video
soundtracks including the pre-processed dataset, improved evaluation metrics,
and robust generative baselines. Extensive experiments show that our model
generates long-term soundtracks with state-of-the-art musical quality and
rhythmic correspondence. Codes are available at
\url{https://github.com/OpenGVLab/LORIS}.
Comment: ICML202
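As a rough, hypothetical sketch of the kind of objective a latent conditional diffusion model optimizes (all names, shapes, and the noise schedule here are illustrative stand-ins, not LORIS's actual architecture): a denoiser is trained to predict the noise injected into a waveform latent, with the rhythm-conditioning vector concatenated to its input.

```python
import torch
import torch.nn as nn

# Toy denoiser: predicts the noise added to a waveform latent, conditioned on
# a rhythm feature vector (in a real system, from a visual-cue encoder).
class ToyDenoiser(nn.Module):
    def __init__(self, latent_dim=64, cond_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, 128),  # +1 for the timestep
            nn.SiLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, z_t, cond, t):
        return self.net(torch.cat([z_t, cond, t[:, None]], dim=-1))

model = ToyDenoiser()
z0 = torch.randn(8, 64)    # clean latents
cond = torch.randn(8, 16)  # rhythm conditioning
t = torch.rand(8)          # continuous timesteps in [0, 1]

# DDPM-style objective: corrupt z0 in closed form, predict the injected noise.
alpha_bar = torch.cos(t * torch.pi / 2) ** 2  # cosine noise schedule
noise = torch.randn_like(z0)
z_t = alpha_bar.sqrt()[:, None] * z0 + (1 - alpha_bar).sqrt()[:, None] * noise
loss = ((model(z_t, cond, t) - noise) ** 2).mean()
loss.backward()
```

The long-term aspect of the actual framework comes from the context-aware conditioning encoders, which this toy concatenation only gestures at.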
Influence of uniform currents on nonlinear characteristics of double-wave-group focusing
Current is considered a crucial environmental factor in producing extreme waves. The nonlinear characteristics of wave–current interactions have been explored, but the role of currents in the more complex interaction processes of double-wave-group focusing is not yet known. Building on our previous research on the nonlinear interactions between wave groups, this paper investigates the impact of uniform currents on the nonlinear characteristics of double-wave-group focusing. A fully nonlinear numerical model using the high-order spectral method is developed to simulate various currents interacting with focused bimodal waves. Three ranges of variation exist: strongly opposing current, weakly opposing current, and following current. Unlike the findings for unimodal waves, the asymmetries of the wave crest and of the wave envelope influenced by currents are not synchronous; this is explained by changes in the asymmetry of the secondary crests, which receive energy from the currents, in addition to changes in the magnitude of the maximum crest and the adjacent secondary crests. When opposing currents strengthen to a certain level, a dynamic equilibrium between the energy of waves and currents is achieved, in which the proportion of the linear components is almost equivalent to that in the no-current state, revealing that the majority of the nonlinearity generated by wave–current interaction is blocked at that point. These findings can promote an understanding of the nonlinear characteristics arising from wave–current interactions.
Evaluations of 5-fluorouracil-treated lung cancer cells by atomic force microscopy
Atomic force microscopy (AFM) can be used to obtain physical information about single live cancer cells; however, the physical changes in live cells over time, as measured by AFM, remain to be studied, even though they play a key role in evaluating the efficacy and side effects of drugs. Herein, the treatment of the A549 cell line with the anticarcinogen 5-fluorouracil is discussed based on AFM analysis of the cells' continuous physical changes over time, including their surface morphology, height, adhesion, and Young's modulus. In comparison, the African green monkey kidney (Vero) cell line was tested as a normal-cell control to determine the side effects of 5-fluorouracil. The results show that the optimal concentration of 5-fluorouracil is about 500 μM, which presents the best anticancer effect with mild side effects.
Improving Multi-turn Emotional Support Dialogue Generation with Lookahead Strategy Planning
Providing Emotional Support (ES) to soothe people in emotional distress is an
essential capability in social interactions. Most existing research on
building ES conversation systems considers only single-turn interactions with
users, which is over-simplified. In comparison, multi-turn ES conversation
systems can provide ES more effectively, but face several new technical
challenges, including: (1) how to adopt appropriate support strategies to
achieve the long-term dialogue goal of comforting the user's emotion; (2) how
to dynamically model the user's state. In this paper, we propose a novel system
MultiESC to address these issues. For strategy planning, drawing inspiration
from the A* search algorithm, we propose lookahead heuristics to estimate the
future user feedback after using particular strategies, which helps to select
strategies that can lead to the best long-term effects. For user state
modeling, MultiESC focuses on capturing users' subtle emotional expressions and
understanding their emotion causes. Extensive experiments show that MultiESC
significantly outperforms competitive baselines in both dialogue generation and
strategy planning. Our codes are available at
https://github.com/lwgkzl/MultiESC.
Comment: Accepted by the main conference of EMNLP 202
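In the spirit of A*'s f(n) = g(n) + h(n), the lookahead idea can be sketched as scoring each candidate support strategy by its immediate fit plus a heuristic estimate of future user feedback. This is a hypothetical toy with hand-made scores; MultiESC learns both terms from data.

```python
# Hypothetical sketch of lookahead strategy planning: pick the strategy that
# maximizes g (immediate fit) + weight * h (estimated future user feedback).
def plan_strategy(candidates, immediate_score, future_feedback, weight=1.0):
    return max(candidates,
               key=lambda s: immediate_score(s) + weight * future_feedback(s))

# Made-up scores: "suggestion" looks best right now, but "reflection" is
# estimated to lead to better long-term user feedback.
g = {"question": 0.6, "reflection": 0.5, "suggestion": 0.8}.get
h = {"question": 0.3, "reflection": 0.5, "suggestion": 0.0}.get

best = plan_strategy(["question", "reflection", "suggestion"], g, h)
print(best)
```

The point of the lookahead term is visible even in the toy: a greedy planner would choose "suggestion", while the combined score prefers "reflection" for its long-term effect.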
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
This paper introduces InternVid, a large-scale video-centric multimodal
dataset that enables learning powerful and transferable video-text
representations for multimodal understanding and generation. The InternVid
dataset contains over 7 million videos lasting nearly 760K hours, yielding 234M
video clips accompanied by detailed descriptions totaling 4.1B words. Our core
contribution is to develop a scalable approach to autonomously build a
high-quality video-text dataset with large language models (LLM), thereby
showcasing its efficacy in learning video-language representation at scale.
Specifically, we utilize a multi-scale approach to generate video-related
descriptions. Furthermore, we introduce ViCLIP, a video-text representation
learning model based on ViT-L. Trained on InternVid via contrastive learning,
this model demonstrates leading zero-shot action recognition and competitive
video retrieval performance. Beyond basic video understanding tasks like
recognition and retrieval, our dataset and model have broad applications. They
are particularly beneficial for generating interleaved video-text data for
learning a video-centric dialogue system, advancing video-to-text and
text-to-video generation research. These proposed resources provide a tool for
researchers and practitioners interested in multimodal video understanding and
generation.
Comment: Data and Code:
https://github.com/OpenGVLab/InternVideo/tree/main/Data/InternVi
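The contrastive objective behind CLIP-style video-text models such as ViCLIP is the symmetric InfoNCE loss over matched pairs. A minimal NumPy sketch (illustrative, not the actual training code) shows the mechanism:

```python
import numpy as np

def info_nce(video_emb, text_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss over a batch of paired embeddings.

    Matched video/text pairs sit on the diagonal of the similarity matrix;
    the loss pulls them together and pushes mismatched pairs apart.
    """
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature

    def ce(l):  # mean cross-entropy of each row against its diagonal entry
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.diag(logp).mean()

    return 0.5 * (ce(logits) + ce(logits.T))

rng = np.random.default_rng(0)
paired = rng.standard_normal((4, 8))
aligned = info_nce(paired, paired)                          # perfect pairs
mismatched = info_nce(paired, rng.standard_normal((4, 8)))  # random "captions"
print(aligned, mismatched)  # the aligned loss is much lower
```

At scale, the "captions" on the negative side come from the other pairs in the batch, which is what makes a large, well-described clip collection like InternVid valuable for this objective.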
Low carbon transition of global power sector enhances sustainable development goals
Low-carbon power transition, key to combating climate change, has far-reaching effects on achieving the Sustainable Development Goals (SDGs) in terms of resource use, environmental emissions, employment, and more. Here we assessed the potential impacts of the power transition on progress toward multiple SDGs across 49 regions under three different climate scenarios. We found that the power transition could increase the global SDG index score from 72.36 in 2015 to 74.38 in 2040 under the 1.5°C scenario, compared with 70.55 and 71.44 under the ‘Coal-dependent’ and ‘Middle of the road’ scenarios, respectively. The transition-related global SDG progress would mainly come from switching to renewables in developing economies. The power transition also improves overall SDG performance in most developed economies under all scenarios, while undermining their employment-related SDG progress. Global SDG progress would be jeopardized by transition-related changes in international trade under the ‘Coal-dependent’ and ‘Middle of the road’ scenarios, while improved under the 1.5°C scenario.
VBench: Comprehensive Benchmark Suite for Video Generative Models
Video generation has witnessed significant advancements, yet evaluating these
models remains a challenge. A comprehensive evaluation benchmark for video
generation is indispensable for two reasons: 1) Existing metrics do not fully
align with human perceptions; 2) An ideal evaluation system should provide
insights to inform future developments of video generation. To this end, we
present VBench, a comprehensive benchmark suite that dissects "video generation
quality" into specific, hierarchical, and disentangled dimensions, each with
tailored prompts and evaluation methods. VBench has three appealing properties:
1) Comprehensive Dimensions: VBench comprises 16 dimensions in video generation
(e.g., subject identity inconsistency, motion smoothness, temporal flickering,
and spatial relationship). The fine-grained evaluation metrics
reveal individual models' strengths and weaknesses. 2) Human Alignment: We also
provide a dataset of human preference annotations to validate our benchmark's
alignment with human perception for each evaluation dimension. 3)
Valuable Insights: We look into current models' ability across various
evaluation dimensions, and various content types. We also investigate the gaps
between video and image generation models. We will open-source VBench,
including all prompts, evaluation methods, generated videos, and human
preference annotations, and also include more video generation models in VBench
to drive forward the field of video generation.
Comment: Equal contributions from the first four authors. Project page:
https://vchitect.github.io/VBench-project/ Code:
https://github.com/Vchitect/VBenc
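As a hypothetical illustration of the kind of aggregation and human-alignment check such a benchmark performs (dimension names are borrowed from the abstract; all scores and human win rates are made up):

```python
import numpy as np

# Made-up per-dimension scores for three imaginary models.
scores = {
    "model_a": {"motion_smoothness": 0.91, "temporal_flickering": 0.85,
                "spatial_relationship": 0.40},
    "model_b": {"motion_smoothness": 0.88, "temporal_flickering": 0.90,
                "spatial_relationship": 0.55},
    "model_c": {"motion_smoothness": 0.70, "temporal_flickering": 0.60,
                "spatial_relationship": 0.30},
}

def overall(model_scores):
    """Unweighted mean across dimensions; a real suite may weight them."""
    return sum(model_scores.values()) / len(model_scores)

def spearman(a, b):
    """Rank correlation between automatic scores and human preference rates."""
    ra, rb = np.argsort(np.argsort(a)), np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

auto = [overall(scores[m]) for m in scores]
human = [0.72, 0.80, 0.35]  # made-up human win rates for the same models
print({m: round(overall(s), 3) for m, s in scores.items()})
print("alignment (Spearman):", spearman(auto, human))
```

The per-dimension breakdown is what lets a model score well overall while still revealing a specific weakness (here, "spatial_relationship"); the rank correlation against human annotations is one way to validate that the automatic metrics track human perception.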