Competing for Shareable Arms in Multi-Player Multi-Armed Bandits
Competition for shareable and limited resources among strategic agents has
long been studied. In reality, agents often have to learn the rewards of the
resources while maximizing them at the same time. To design an individualized
competing policy, we model the competition between agents in a novel
multi-player multi-armed bandit (MPMAB) setting where players are selfish and
aim to maximize their own rewards. In addition, when several players pull the
same arm, we assume that these players share that arm's reward equally in
expectation. Under this setting, we first analyze the Nash equilibrium when
arms' rewards are known. Subsequently, we propose a novel SelfishMPMAB with
Averaging Allocation (SMAA) approach based on the equilibrium. We theoretically
demonstrate that SMAA achieves a good regret guarantee for each player
when all players follow the algorithm. Additionally, we establish that no
single selfish player can significantly increase their rewards through
deviation, nor can they detrimentally affect other players' rewards without
incurring substantial losses for themselves. We finally validate the
effectiveness of the method in extensive synthetic experiments. Comment: ICML 202
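The averaged-sharing assumption above can be made concrete with a small sketch. This is a toy illustration of the reward-sharing model only (not the SMAA algorithm): when several players pull the same arm, each receives that arm's mean reward divided by the number of players on it. The arm means and profiles below are hypothetical.

```python
def expected_share(mu, profile):
    """Expected per-player reward under equal sharing in expectation.

    mu: list of arm mean rewards.
    profile: profile[p] = index of the arm chosen by player p.
    Each player on an arm receives mu[arm] / (players on that arm).
    """
    counts = {}
    for arm in profile:
        counts[arm] = counts.get(arm, 0) + 1
    return [mu[arm] / counts[arm] for arm in profile]

mu = [0.9, 0.8, 0.3]
# Two players collide on the best arm: each expects only 0.45.
print(expected_share(mu, [0, 0, 1]))  # [0.45, 0.45, 0.8]
# Spreading out over distinct arms avoids splitting rewards.
print(expected_share(mu, [0, 1, 2]))  # [0.9, 0.8, 0.3]
```

Note how colliding on the best arm leaves both players worse off than the player who took the second-best arm alone, which is the tension the equilibrium analysis resolves.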
Rethinking the Evaluation Protocol of Domain Generalization
Domain generalization aims to solve the challenge of Out-of-Distribution
(OOD) generalization by leveraging common knowledge learned from multiple
training domains to generalize to unseen test domains. To accurately evaluate
the OOD generalization ability, it is necessary to ensure that test data
information is unavailable. However, the current domain generalization protocol
may still have potential test data information leakage. This paper examines the
potential risks of test data information leakage in two aspects of the current
protocol: pretraining on ImageNet and oracle model selection. We propose that
training from scratch and using multiple test domains would result in a more
precise evaluation of OOD generalization ability. We also rerun the algorithms
with the modified protocol and introduce a new leaderboard to encourage future
research in domain generalization with a fairer comparison.
Flatness-Aware Minimization for Domain Generalization
Domain generalization (DG) seeks to learn robust models that generalize well
under unknown distribution shifts. As a critical aspect of DG, optimizer
selection has not been explored in depth. Currently, most DG methods follow the
widely used benchmark, DomainBed, and utilize Adam as the default optimizer for
all datasets. However, we reveal that Adam is not necessarily the optimal
choice for the majority of current DG methods and datasets. Based on the
perspective of loss landscape flatness, we propose a novel approach,
Flatness-Aware Minimization for Domain Generalization (FAD), which can
efficiently optimize zeroth-order and first-order flatness simultaneously
for DG. We provide theoretical analyses of FAD's out-of-distribution (OOD)
generalization error and convergence. Our experimental results demonstrate the
superiority of FAD on various DG datasets. Additionally, we confirm that FAD is
capable of discovering flatter optima in comparison to other zeroth-order and
first-order flatness-aware optimization methods. Comment: Accepted by ICCV202
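The two flatness notions mentioned above can be illustrated numerically. This is a hedged one-dimensional sketch, not the FAD optimizer: zeroth-order flatness is taken as the worst-case loss increase within a radius rho of the current point, and first-order flatness as the worst-case gradient magnitude in that neighborhood. The toy quadratics and the radius are assumptions for illustration.

```python
def zeroth_order_flatness(loss, w, rho, n=1000):
    # Worst-case loss increase within radius rho of w (1-D grid search).
    return max(loss(w + rho * (2 * i / n - 1)) for i in range(n + 1)) - loss(w)

def first_order_flatness(grad, w, rho, n=1000):
    # Worst-case gradient magnitude within radius rho of w.
    return max(abs(grad(w + rho * (2 * i / n - 1))) for i in range(n + 1))

sharp = lambda w: 10 * w * w   # sharp minimum at w = 0
flat = lambda w: 0.1 * w * w   # flat minimum at w = 0

print(zeroth_order_flatness(sharp, 0.0, 0.5))  # 2.5
print(zeroth_order_flatness(flat, 0.0, 0.5))   # 0.025
print(first_order_flatness(lambda w: 20 * w, 0.0, 0.5))  # 10.0
```

Both measures rank the flat minimum as preferable; a flatness-aware method penalizes these neighborhood quantities rather than only the loss at the point itself.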
TouchStone: Evaluating Vision-Language Models by Language Models
Large vision-language models (LVLMs) have recently witnessed rapid
advancements, exhibiting a remarkable capacity for perceiving, understanding,
and processing visual information by connecting visual receptors with large
language models (LLMs). However, current assessments mainly focus on
recognizing and reasoning abilities, lacking direct evaluation of
conversational skills and neglecting visual storytelling abilities. In this
paper, we propose an evaluation method that uses strong LLMs as judges to
comprehensively evaluate the various abilities of LVLMs. Firstly, we construct
a comprehensive visual dialogue dataset TouchStone, consisting of open-world
images and questions, covering five major categories of abilities and 27
subtasks. This dataset not only covers fundamental recognition and
comprehension but also extends to literary creation. Secondly, by integrating
detailed image annotations, we effectively transform the multimodal input
content into a form understandable by LLMs. This enables us to employ advanced
LLMs for directly evaluating the quality of the multimodal dialogue without
requiring human intervention. Through validation, we demonstrate that powerful
LVLMs, such as GPT-4, can effectively score dialogue quality by leveraging
their textual capabilities alone, aligning with human preferences. We hope our
work can serve as a touchstone for LVLMs' evaluation and pave the way for
building stronger LVLMs. The evaluation code is available at
https://github.com/OFA-Sys/TouchStone.
On the Out-Of-Distribution Generalization of Multimodal Large Language Models
We investigate the generalization boundaries of current Multimodal Large
Language Models (MLLMs) via comprehensive evaluation under out-of-distribution
scenarios and domain-specific tasks. We evaluate their zero-shot generalization
across synthetic images, real-world distributional shifts, and specialized
datasets like medical and molecular imagery. Empirical results indicate that
MLLMs struggle with generalization beyond common training domains, limiting
their direct application without adaptation. To understand the cause of
unreliable performance, we analyze three hypotheses: semantic
misinterpretation, visual feature extraction insufficiency, and mapping
deficiency. Results identify mapping deficiency as the primary hurdle. To
address this problem, we show that in-context learning (ICL) can significantly
enhance MLLMs' generalization, opening new avenues for overcoming
generalization barriers. We further explore the robustness of ICL under
distribution shifts and show its vulnerability to domain shifts, label shifts,
and spurious correlation shifts between in-context examples and test data.
Serum Creatinine Level: A Supplemental Index to Distinguish Duchenne Muscular Dystrophy from Becker Muscular Dystrophy
Background. To improve assessment of dystrophinopathy, the aim of this study was to identify whether serum creatinine (Crn) level reflects disease severity.
Methods. Biochemical, Vignos score, and genetic data were collected on 212 boys with dystrophinopathy.
Results. Serum Crn level had a strong inverse correlation with Vignos score by simple correlation (r = −0.793) and by partial correlation analysis after adjustment for age, height, and weight (r = −0.791; both P < 0.01). Serum Crn level was significantly higher in patients with in-frame than out-of-frame mutations (−4.716, P < 0.01) and in Becker muscular dystrophy (BMD) patients than Duchenne muscular dystrophy (DMD) patients at ages 4, 5, 7, and 9 yr (all P < 0.0125). After adjusting for age, height, and weight, BMD patients still had a significantly higher serum Crn level than DMD patients (7.140, 6.277, P < 0.01).
Conclusions. Serum Crn level reflected disease severity and may serve as a supplemental index to distinguish DMD from BMD in clinical practice.
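The partial-correlation adjustment used above can be sketched with the standard residualization approach: regress each variable on the covariate, then correlate the residuals. This is a toy illustration with a single covariate and made-up numbers (the study adjusts for age, height, and weight simultaneously), not the study's actual analysis.

```python
def pearson(a, b):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def residuals(y, z):
    # Residuals of a simple least-squares regression of y on covariate z.
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    b = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / \
        sum((zi - mz) ** 2 for zi in z)
    a = my - b * mz
    return [yi - (a + b * zi) for yi, zi in zip(y, z)]

def partial_corr(x, y, z):
    # Correlation of x and y after removing the linear effect of z.
    return pearson(residuals(x, z), residuals(y, z))

# Toy data: x and y both track the covariate z, so they look strongly
# positively correlated, yet after controlling for z the relationship flips.
z = [1, 2, 3, 4, 5, 6]
x = [1.1, 1.9, 3.1, 3.9, 5.1, 5.9]
y = [0.9, 2.1, 2.9, 4.1, 4.9, 6.1]
print(pearson(x, y))        # strongly positive (~0.99)
print(partial_corr(x, y, z))  # ~ -1 after adjusting for z
```

This is why the study reports the correlation both before and after adjustment: a raw correlation can be driven largely by shared covariates such as age and body size.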
Single Fasting Plasma Glucose Versus 75-g Oral Glucose-Tolerance Test in Prediction of Adverse Perinatal Outcomes: A Cohort Study
Background: There remains uncertainty regarding whether a single fasting glucose measurement is sufficient to predict risk of adverse perinatal outcomes.
Methods: We included 12,594 pregnant women who underwent a 75-g oral glucose-tolerance test (OGTT) at 22–28 weeks' gestation in the Born in Guangzhou Cohort Study, China. Outcomes were large for gestational age (LGA) baby, cesarean section, and spontaneous preterm birth. We calculated the area under the receiver operator characteristic curves (AUCs) to assess the capacity of OGTT glucose values to predict adverse outcomes, and compared the AUCs of different components of OGTT.
Results: 1325 women had a LGA baby (10.5%). Glucose measurements were linearly associated with LGA, with strongest associations for fasting glucose (odds ratio 1.37, 95% confidence interval 1.30–1.45). Weaker associations were observed for cesarean section and spontaneous preterm birth. Fasting glucose had a comparable discriminative power for prediction of LGA to the combination of fasting, 1 h, and 2 h glucose values during OGTT (AUCs, 0.611 vs. 0.614, P = 0.166). The LGA risk was consistently increased in women with abnormal fasting glucose (≥5.1 mmol/l), irrespective of 1 h or 2 h glucose levels.
Conclusions: A single fasting glucose measurement performs comparably to the 75-g OGTT in predicting risk of having a LGA baby.
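The AUC metric that the comparison above rests on has a simple rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The sketch below implements that definition directly; the labels and glucose-like scores are invented for illustration and are not the cohort's data.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) definition.

    labels: 1 for cases (e.g. LGA baby), 0 for non-cases.
    scores: predictor values (e.g. fasting glucose, mmol/l).
    Returns P(random positive outranks random negative), ties = 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0]
fasting = [5.3, 4.8, 5.0, 4.5]  # hypothetical fasting glucose values
print(auc(labels, fasting))  # 0.75
```

Comparing the AUC of a single predictor against the AUC of a combined score, as the study does for fasting glucose versus the full OGTT, asks whether the extra measurements move this rank probability by a meaningful amount.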
Qwen Technical Report
Large language models (LLMs) have revolutionized the field of artificial
intelligence, enabling natural language processing tasks that were previously
thought to be exclusive to humans. In this work, we introduce Qwen, the first
installment of our large language model series. Qwen is a comprehensive
language model series that encompasses distinct models with varying parameter
counts. It includes Qwen, the base pretrained language models, and Qwen-Chat,
the chat models finetuned with human alignment techniques. The base language
models consistently demonstrate superior performance across a multitude of
downstream tasks, and the chat models, particularly those trained using
Reinforcement Learning from Human Feedback (RLHF), are highly competitive. The
chat models possess advanced tool-use and planning capabilities for creating
agent applications, showcasing impressive performance even when compared to
bigger models on complex tasks like utilizing a code interpreter. Furthermore,
we have developed coding-specialized models, Code-Qwen and Code-Qwen-Chat, as
well as mathematics-focused models, Math-Qwen-Chat, which are built upon base
language models. These models demonstrate significantly improved performance in
comparison with open-source models, while falling slightly behind proprietary
models. Comment: 59 pages, 5 figures
Response of methane production via propionate oxidation to carboxylated multiwalled carbon nanotubes in paddy soil enrichments
Carboxylated multiwalled carbon nanotubes (MWCNTs-COOH) have become a growing concern in terms of their fate and toxicity in aqueous environments. Methane (CH4) is a major product of organic matter degradation in waterlogged environments. In this study, we determined the effect of MWCNTs-COOH on the production of CH4 from propionate oxidation in paddy soil enrichments. The results showed that methanogenesis from propionate degradation was accelerated in the presence of MWCNTs-COOH. In addition, the rates of CH4 production and propionate degradation increased with increasing concentrations of MWCNTs-COOH. Scanning electron microscopy (SEM) observations showed that the cells were intact and maintained their structure in the presence of MWCNTs-COOH. In addition, SEM and fluorescence in situ hybridization (FISH) images revealed that the cells were in direct contact with the MWCNTs and formed cell-MWCNT aggregates that contained both bacteria and archaea. On the other hand, nontoxic magnetite nanoparticles (Fe3O4) had effects on CH4 production and cell integrity similar to those of MWCNTs-COOH. Compared with no nanomaterial addition, the relative abundances of Geobacter and Methanosarcina species increased in the presence of MWCNTs-COOH. This study suggests that MWCNTs-COOH exerted positive rather than cytotoxic effects on the syntrophic oxidation of propionate in paddy soil enrichments and affected the bacterial and archaeal community structure at the test concentrations. These findings provide novel insight into the consequences of nanomaterial release into anoxic natural environments.