
    Competing for Shareable Arms in Multi-Player Multi-Armed Bandits

    Competitions for shareable and limited resources have long been studied with strategic agents. In reality, agents often have to learn the rewards of the resources while maximizing them. To design an individualized competing policy, we model the competition between agents in a novel multi-player multi-armed bandit (MPMAB) setting where players are selfish and aim to maximize their own rewards. In addition, when several players pull the same arm, we assume these players share that arm's reward equally in expectation. Under this setting, we first analyze the Nash equilibrium when the arms' rewards are known. Subsequently, we propose a novel Selfish MPMAB with Averaging Allocation (SMAA) approach based on this equilibrium. We theoretically demonstrate that SMAA achieves a good regret guarantee for each player when all players follow the algorithm. Additionally, we establish that no single selfish player can significantly increase their rewards by deviating, nor can they detrimentally affect other players' rewards without incurring substantial losses themselves. We finally validate the effectiveness of the method in extensive synthetic experiments. Comment: ICML 202
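    The averaging-allocation rule described above can be sketched in a few lines. This is an illustrative toy (Bernoulli arms, known means, hypothetical function names), not the SMAA algorithm itself:

```python
import random

def shared_reward(mu, n_pullers, rng):
    """Bernoulli arm with mean mu; when several players pull the same
    arm, the realized reward is split evenly, so each player's
    expected share is mu / n_pullers."""
    reward = 1.0 if rng.random() < mu else 0.0
    return reward / n_pullers

def expected_share(mu, n_pullers):
    # Expected per-player reward under averaging allocation.
    return mu / n_pullers

def best_response(means, others_counts):
    """With known means, a selfish player's best response is the arm
    maximizing its expected share given the other players' choices."""
    return max(range(len(means)),
               key=lambda k: expected_share(means[k], others_counts[k] + 1))

means = [0.9, 0.5, 0.2]
# Two players already on arm 0 make arm 1 the best response:
# 0.9 / 3 = 0.3 < 0.5 / 1 = 0.5.
print(best_response(means, [2, 0, 0]))
```

    The equal split is what makes crowding onto the highest-mean arm unattractive, which is the intuition behind the equilibrium analysis.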

    Rethinking the Evaluation Protocol of Domain Generalization

    Domain generalization aims to solve the challenge of Out-of-Distribution (OOD) generalization by leveraging common knowledge learned from multiple training domains to generalize to unseen test domains. To accurately evaluate the OOD generalization ability, it is necessary to ensure that test data information is unavailable. However, the current domain generalization protocol may still permit test data information leakage. This paper examines the potential risks of such leakage in two aspects of the current protocol: pretraining on ImageNet and oracle model selection. We propose that training from scratch and using multiple test domains would yield a more precise evaluation of OOD generalization ability. We also rerun the algorithms with the modified protocol and introduce a new leaderboard to encourage future research in domain generalization with a fairer comparison.

    Flatness-Aware Minimization for Domain Generalization

    Domain generalization (DG) seeks to learn robust models that generalize well under unknown distribution shifts. As a critical aspect of DG, optimizer selection has not been explored in depth. Currently, most DG methods follow the widely used benchmark, DomainBed, and use Adam as the default optimizer for all datasets. However, we reveal that Adam is not necessarily the optimal choice for the majority of current DG methods and datasets. From the perspective of loss landscape flatness, we propose a novel approach, Flatness-Aware Minimization for Domain Generalization (FAD), which efficiently optimizes both zeroth-order and first-order flatness simultaneously for DG. We provide theoretical analyses of FAD's out-of-distribution (OOD) generalization error and convergence. Our experimental results demonstrate the superiority of FAD on various DG datasets. Additionally, we confirm that FAD discovers flatter optima than other zeroth-order and first-order flatness-aware optimization methods. Comment: Accepted by ICCV202
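    Setting FAD's specifics aside, the first-order-flatness idea it builds on can be illustrated with a generic sharpness-aware update: perturb the weights toward the local worst case inside a small ball, then descend using the gradient taken there. This toy on a quadratic loss is a sketch under that assumption, not FAD itself:

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One sharpness-aware update: step to the approximate worst case
    inside a rho-ball, then descend using the gradient taken there."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascent direction
    return w - lr * grad_fn(w + eps)              # descend from worst case

# Toy quadratic loss L(w) = 0.5 * w^T A w with a sharp and a flat axis.
A = np.diag([10.0, 0.1])
loss = lambda w: 0.5 * w @ A @ w
grad = lambda w: A @ w

w = np.array([1.0, 1.0])
for _ in range(100):
    w = sam_step(w, grad)
print(loss(w))  # far below the starting loss of 5.05
```

    Because the descent gradient is evaluated at the perturbed point, sharp directions are penalized more strongly than flat ones, which biases the optimizer toward flatter minima.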

    TouchStone: Evaluating Vision-Language Models by Language Models

    Large vision-language models (LVLMs) have recently witnessed rapid advancements, exhibiting a remarkable capacity for perceiving, understanding, and processing visual information by connecting visual receptors with large language models (LLMs). However, current assessments mainly focus on recognition and reasoning abilities, lacking direct evaluation of conversational skills and neglecting visual storytelling abilities. In this paper, we propose an evaluation method that uses strong LLMs as judges to comprehensively evaluate the various abilities of LVLMs. Firstly, we construct a comprehensive visual dialogue dataset, TouchStone, consisting of open-world images and questions, covering five major categories of abilities and 27 subtasks. This dataset not only covers fundamental recognition and comprehension but also extends to literary creation. Secondly, by integrating detailed image annotations, we effectively transform the multimodal input content into a form understandable by LLMs. This enables us to employ advanced LLMs to directly evaluate the quality of the multimodal dialogue without requiring human intervention. Through validation, we demonstrate that powerful LVLMs, such as GPT-4, can effectively score dialogue quality by leveraging their textual capabilities alone, aligning with human preferences. We hope our work can serve as a touchstone for LVLMs' evaluation and pave the way for building stronger LVLMs. The evaluation code is available at https://github.com/OFA-Sys/TouchStone. Comment: https://github.com/OFA-Sys/TouchSton
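    The judge pipeline described above (detailed annotations make the image legible to a text-only LLM) can be sketched with placeholder prompt-building and score-parsing helpers. The function names and prompt wording here are hypothetical, and the call to an actual LLM is omitted:

```python
def build_judge_prompt(annotation, question, answer):
    """Turn the multimodal exchange into pure text: the detailed image
    annotation stands in for the image so a text-only LLM can judge."""
    return ("You are grading a vision-language model.\n"
            f"Image description: {annotation}\n"
            f"Question: {question}\n"
            f"Model answer: {answer}\n"
            "Score the answer from 1 to 10 and reply with the number first.")

def parse_score(judge_reply):
    # Keep the first integer token in the judge's reply; 0 if none.
    for token in judge_reply.split():
        if token.strip(".,").isdigit():
            return int(token.strip(".,"))
    return 0

prompt = build_judge_prompt(
    "A dog catching a red frisbee in a park.",
    "What is the dog doing?",
    "The dog is jumping to catch a frisbee.")
# In a real pipeline the prompt would be sent to a strong LLM here.
print(parse_score("8. The answer matches the description."))
```
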

    On the Out-Of-Distribution Generalization of Multimodal Large Language Models

    We investigate the generalization boundaries of current Multimodal Large Language Models (MLLMs) via comprehensive evaluation under out-of-distribution scenarios and domain-specific tasks. We evaluate their zero-shot generalization across synthetic images, real-world distributional shifts, and specialized datasets such as medical and molecular imagery. Empirical results indicate that MLLMs struggle to generalize beyond common training domains, limiting their direct application without adaptation. To understand the cause of this unreliable performance, we analyze three hypotheses: semantic misinterpretation, visual feature extraction insufficiency, and mapping deficiency. Results identify mapping deficiency as the primary hurdle. To address this problem, we show that in-context learning (ICL) can significantly enhance MLLMs' generalization, opening new avenues for overcoming generalization barriers. We further explore the robustness of ICL under distribution shifts and show its vulnerability to domain shifts, label shifts, and spurious correlation shifts between in-context examples and test data.
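    As a rough illustration of the in-context learning setup evaluated here, a prompt can prepend labeled demonstrations before the query; the caption-based format and names below are hypothetical, not the paper's protocol:

```python
def build_icl_prompt(demos, query_caption):
    """Prepend labeled (caption, label) demonstrations so the model
    can adapt to the task in context before seeing the query."""
    blocks = [f"Image: {cap}\nLabel: {lab}" for cap, lab in demos]
    blocks.append(f"Image: {query_caption}\nLabel:")
    return "\n\n".join(blocks)

demos = [("benign skin lesion, smooth border", "benign"),
         ("irregular dark lesion, asymmetric borders", "malignant")]
print(build_icl_prompt(demos, "small symmetric lesion"))
```

    The shifts the abstract probes correspond to perturbing this prompt: demonstrations drawn from another domain, demonstration labels permuted, or captions spuriously correlated with labels.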

    Serum Creatinine Level: A Supplemental Index to Distinguish Duchenne Muscular Dystrophy from Becker Muscular Dystrophy

    Background. To improve the assessment of dystrophinopathy, the aim of this study was to identify whether serum creatinine (Crn) level reflects disease severity. Methods. Biochemical, Vignos score, and genetic data were collected on 212 boys with dystrophinopathy. Results. Serum Crn level had a strong inverse correlation with Vignos score by simple correlation (r = −0.793) and by partial correlation analysis after adjustment for age, height, and weight (r = −0.791; both P < 0.01). Serum Crn level was significantly higher in patients with in-frame than out-of-frame mutations (test statistic −4.716, P < 0.01) and in Becker muscular dystrophy (BMD) patients than in Duchenne muscular dystrophy (DMD) patients at ages 4, 5, 7, and 9 yr (all P < 0.0125). After adjusting for age, height, and weight, BMD patients still had a significantly higher serum Crn level than DMD patients (test statistics 7.140 and 6.277, P < 0.01). Conclusions. Serum Crn level reflected disease severity and may serve as a supplemental index to distinguish DMD from BMD in clinical practice.
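    The adjusted analysis reported here (correlation after controlling for age, height, and weight) corresponds to a partial correlation, which can be computed by regressing the covariates out of both variables and correlating the residuals. A minimal sketch on synthetic data, not the study's data:

```python
import numpy as np

def partial_corr(x, y, covars):
    """Correlation between x and y after regressing the covariates
    (e.g. age, height, weight) out of both variables."""
    Z = np.column_stack([np.ones(len(x)), covars])
    resid = lambda v: v - Z @ np.linalg.lstsq(Z, v, rcond=None)[0]
    rx, ry = resid(x), resid(y)
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

rng = np.random.default_rng(0)
age = rng.uniform(4, 18, 200)
crn = 20 + 3.0 * age + rng.normal(0, 5, 200)    # rises with age
score = 1 + 0.5 * age + rng.normal(0, 1, 200)   # also rises with age
# The raw correlation is inflated by the shared age trend; the
# partial correlation removes it and drops toward zero here.
print(np.corrcoef(crn, score)[0, 1], partial_corr(crn, score, age))
```
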

    Single Fasting Plasma Glucose Versus 75-g Oral Glucose-Tolerance Test in Prediction of Adverse Perinatal Outcomes: A Cohort Study

    Background: There remains uncertainty regarding whether a single fasting glucose measurement is sufficient to predict risk of adverse perinatal outcomes. Methods: We included 12,594 pregnant women who underwent a 75-g oral glucose-tolerance test (OGTT) at 22–28 weeks' gestation in the Born in Guangzhou Cohort Study, China. Outcomes were large for gestational age (LGA) baby, cesarean section, and spontaneous preterm birth. We calculated the area under the receiver operating characteristic curve (AUC) to assess the capacity of OGTT glucose values to predict adverse outcomes, and compared the AUCs of different components of the OGTT. Results: 1325 women had an LGA baby (10.5%). Glucose measurements were linearly associated with LGA, with the strongest association for fasting glucose (odds ratio 1.37, 95% confidence interval 1.30–1.45). Weaker associations were observed for cesarean section and spontaneous preterm birth. Fasting glucose had a discriminative power for prediction of LGA comparable to that of the combination of fasting, 1-h, and 2-h glucose values during OGTT (AUCs 0.611 vs. 0.614, P = 0.166). The LGA risk was consistently increased in women with abnormal fasting glucose (≥5.1 mmol/l), irrespective of 1-h or 2-h glucose levels. Conclusions: A single fasting glucose measurement performs comparably to the 75-g OGTT in predicting risk of having an LGA baby.
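    The AUC comparison above rests on the rank interpretation of ROC AUC: the probability that a randomly chosen LGA case has a higher fasting glucose than a randomly chosen non-LGA case. A minimal sketch on synthetic glucose values (the means and spreads below are invented, not the cohort's):

```python
import random

def roc_auc(scores_pos, scores_neg):
    """Probability that a random positive (LGA) case scores higher
    than a random negative one, by exhaustive pairwise comparison
    (ties count one half); fine at this toy size."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

rng = random.Random(42)
# Invented fasting-glucose values (mmol/l): LGA group shifted upward.
lga = [rng.gauss(5.0, 0.6) for _ in range(100)]
non_lga = [rng.gauss(4.6, 0.6) for _ in range(400)]
print(round(roc_auc(lga, non_lga), 3))
```
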

    Qwen Technical Report

    Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installment of our large language model series. Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. It includes Qwen, the base pretrained language models, and Qwen-Chat, the chat models finetuned with human alignment techniques. The base language models consistently demonstrate superior performance across a multitude of downstream tasks, and the chat models, particularly those trained using Reinforcement Learning from Human Feedback (RLHF), are highly competitive. The chat models possess advanced tool-use and planning capabilities for creating agent applications, showcasing impressive performance even when compared to bigger models on complex tasks such as using a code interpreter. Furthermore, we have developed coding-specialized models, Code-Qwen and Code-Qwen-Chat, as well as mathematics-focused models, Math-Qwen-Chat, which are built upon the base language models. These models demonstrate significantly improved performance compared with open-source models, while falling slightly behind proprietary models. Comment: 59 pages, 5 figure

    Response of methane production via propionate oxidation to carboxylated multiwalled carbon nanotubes in paddy soil enrichments

    Carboxylated multiwalled carbon nanotubes (MWCNTs-COOH) have become a growing concern in terms of their fate and toxicity in aqueous environments. Methane (CH4) is a major product of organic matter degradation in waterlogged environments. In this study, we determined the effect of MWCNTs-COOH on the production of CH4 from propionate oxidation in paddy soil enrichments. The results showed that methanogenesis from propionate degradation was accelerated in the presence of MWCNTs-COOH. In addition, the rates of CH4 production and propionate degradation increased with increasing concentrations of MWCNTs-COOH. Scanning electron microscopy (SEM) observations showed that the cells were intact and maintained their structure in the presence of MWCNTs-COOH. In addition, SEM and fluorescence in situ hybridization (FISH) images revealed that the cells were in direct contact with the MWCNTs and formed cell-MWCNT aggregates containing both bacteria and archaea. On the other hand, nontoxic magnetite nanoparticles (Fe3O4) had effects on CH4 production and cell integrity similar to those of MWCNTs-COOH. Compared with no nanomaterial addition, the relative abundances of Geobacter and Methanosarcina species increased in the presence of MWCNTs-COOH. This study suggests that MWCNTs-COOH exerted positive rather than cytotoxic effects on the syntrophic oxidation of propionate in paddy soil enrichments and affected the bacterial and archaeal community structure at the tested concentrations. These findings provide novel insight into the consequences of nanomaterial release into anoxic natural environments.