Client Selection for Federated Policy Optimization with Environment Heterogeneity
The development of Policy Iteration (PI) has inspired many recent algorithms
for Reinforcement Learning (RL), including several policy gradient methods that
gained both theoretical soundness and empirical success on a variety of tasks.
The theory of PI is rich in the context of centralized learning, but its study
in the federated setting is still in its infancy. This paper
investigates the federated version of Approximate PI (API) and derives its
error bound, taking into account the approximation error introduced by
environment heterogeneity. We theoretically prove that a proper client
selection scheme can reduce this error bound. Based on the theoretical result,
we propose a client selection algorithm to alleviate the additional
approximation error caused by environment heterogeneity. Experimental results
show that the proposed algorithm outperforms other biased and unbiased client
selection methods on the federated mountain car problem and the MuJoCo Hopper
problem by effectively selecting clients with a lower level of heterogeneity
from the population distribution.
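The abstract does not spell out the selection rule, so the following is only a minimal sketch under stated assumptions: a tabular setting in which each client's environment is a transition kernel P_i(s' | s, a), a client's "heterogeneity level" is taken to be its worst-case total variation distance from the population-average kernel, and the server simply keeps the k lowest-scoring clients. The function names and the scoring rule are hypothetical, not the paper's algorithm.

```python
import numpy as np


def heterogeneity_levels(P: np.ndarray) -> np.ndarray:
    """Score each client's environment against the population (hypothetical rule).

    P has shape (n_clients, n_states, n_actions, n_states) and holds each
    client's transition kernel P_i(s' | s, a). Assumption: the score is the
    worst-case (over s, a) total variation distance to the population mean.
    """
    population = P.mean(axis=0)                     # population-average kernel
    tv = 0.5 * np.abs(P - population).sum(axis=-1)  # TV per (client, s, a)
    return tv.max(axis=(1, 2))                      # worst case over (s, a)


def select_clients(P: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k clients closest to the population kernel."""
    scores = heterogeneity_levels(P)
    # A stable sort keeps tie-breaking deterministic (lower index first).
    return np.argsort(scores, kind="stable")[:k]
```

A deterministic top-k rule like this is deliberately biased toward clients near the population mean; it only illustrates the idea of preferring low-heterogeneity clients.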
FedKL: Tackling Data Heterogeneity in Federated Reinforcement Learning by Penalizing KL Divergence
As a distributed learning paradigm, Federated Learning (FL) faces a
communication bottleneck due to the many rounds of model synchronization and
aggregation. Heterogeneous data further worsens the situation by slowing
convergence. Although the impact of data heterogeneity on supervised FL
has been widely studied, the related investigation for Federated Reinforcement
Learning (FRL) is still in its infancy. In this paper, we first define the type
and level of data heterogeneity for policy gradient based FRL systems. By
inspecting the connection between the global and local objective functions, we
prove that local training can benefit the global objective, if the local update
is properly penalized by the total variation (TV) distance between the local
and global policies. A necessary condition for the global policy to be
learnable from the local policy is also derived, which is directly related to
the heterogeneity level. Based on this theoretical result, a Kullback-Leibler
(KL) divergence-based penalty is proposed which, unlike the
conventional method that penalizes model divergence in the parameter space,
directly constrains the model outputs in the distribution space. By jointly
penalizing the divergence of the local policy from the global policy with a
global penalty and constraining each iteration of the local training with a
local penalty, the proposed method achieves a better trade-off between training
speed (step size) and convergence. Experimental results on two popular RL
platforms demonstrate the advantage of the proposed algorithm over
existing methods in accelerating and stabilizing the training process with
heterogeneous data.
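A minimal sketch of what a distribution-space penalty looks like next to the policy-gradient surrogate, assuming a discrete action space. The KL direction, the importance-weighted surrogate, and the coefficient names beta_global and beta_local are assumptions for illustration, not the paper's exact objective. The tv helper illustrates one standard link between the two distances, Pinsker's inequality TV(p, q) <= sqrt(KL(p || q) / 2), under which keeping the KL penalty small also keeps the TV distance small.

```python
import numpy as np

EPS = 1e-12  # guards log(0) for near-deterministic policies


def kl(p: np.ndarray, q: np.ndarray) -> float:
    """Mean KL(p || q) over a batch of categorical distributions (one per row)."""
    p = np.clip(p, EPS, 1.0)
    q = np.clip(q, EPS, 1.0)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))


def tv(p: np.ndarray, q: np.ndarray) -> float:
    """Mean total variation distance over a batch of categorical distributions."""
    return float(np.mean(0.5 * np.abs(p - q).sum(axis=-1)))


def penalized_local_loss(pi_new, pi_old, pi_global, actions, advantages,
                         beta_global=0.1, beta_local=0.1):
    """Hypothetical FedKL-style local objective (to be minimized).

    pi_new:    current local action probabilities, shape (batch, n_actions)
    pi_old:    local policy at the start of this local iteration (collected the batch)
    pi_global: global policy most recently received from the server
    """
    rows = np.arange(len(actions))
    ratio = pi_new[rows, actions] / pi_old[rows, actions]  # importance weights
    surrogate = np.mean(ratio * advantages)                # policy-gradient surrogate
    global_penalty = kl(pi_global, pi_new)  # keeps the local policy near the global one
    local_penalty = kl(pi_old, pi_new)      # limits the size of each local step
    return -surrogate + beta_global * global_penalty + beta_local * local_penalty
```

Because both penalties act on action probabilities rather than on weights, two networks with very different parameters but identical outputs incur no penalty, which is the contrast the abstract draws with parameter-space penalties.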
Decentralized and Dynamic Home Health Care Resource Scheduling Using an Agent-Based Model
The purpose of this thesis is to design an agent-based scheduling system, simulated in a dynamic environment, that reduces home healthcare service costs. The study focuses on situations where a healthcare agency needs to assign home visits among a group of independent healthcare practitioners. Each practitioner has different skill sets, time constraints, and cost structures, given the nature, time, and location of each home visit, and each expects reasonable payment commensurate with their skill level and the costs incurred. The healthcare agency, in turn, needs all planned visits performed by qualified practitioners while minimizing overall service costs. Scheduling decisions are made both before and during the scheduling period, requiring the healthcare agency to respond to unexpected situations based on the latest scheduling information.
This problem is examined in a multi-agent system environment where practitioners are modeled as self-interested agents. The study first analyzes the problem in a centralized environment for insights into its combinatorial nature, then discusses the challenges introduced by decentralization and dynamics. An iterated bidding mechanism is designed as the negotiation protocol for the system. The effectiveness of this system is evaluated through a computational study, whose results show that the proposed multi-agent scheduling system can compute high-quality schedules in the decentralized home healthcare environment. The system is then implemented in a simulation model that can accommodate unexpected situations. We present different simulation scenarios that illustrate how the system dynamically schedules incoming visits, and cost reductions can be observed in the results.
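The abstract does not detail the bidding protocol, so here is a minimal sketch of one possible iterated bidding loop, with every name and rule hypothetical: the agency announces open visits with a reserve price, each qualified practitioner with spare capacity bids its own cost, the lowest bid at or under the reserve wins, and unfilled visits are re-announced in the next round with a higher reserve. Travel, time windows, and strategic bidding are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Practitioner:
    name: str
    skills: Set[str]
    capacity: int              # max visits accepted (stand-in for time constraints)
    cost: Dict[str, float]     # visit id -> cost this practitioner would incur
    assigned: List[str] = field(default_factory=list)

    def bid(self, visit_id: str, skill: str) -> Optional[float]:
        """Asking price for a visit, or None if unqualified, unpriced, or fully booked."""
        if skill not in self.skills or len(self.assigned) >= self.capacity:
            return None
        return self.cost.get(visit_id)


def iterated_bidding(visits: Dict[str, str], practitioners: List[Practitioner],
                     reserve: float, raise_step: float, max_rounds: int = 10
                     ) -> Tuple[Dict[str, Tuple[str, float]], Dict[str, str]]:
    """visits maps visit id -> required skill. Returns (schedule, unassigned visits)."""
    schedule: Dict[str, Tuple[str, float]] = {}
    open_visits = dict(visits)
    for _ in range(max_rounds):
        if not open_visits:
            break
        for visit_id, skill in sorted(open_visits.items()):
            bids = [(p.bid(visit_id, skill), p) for p in practitioners]
            valid = [(price, p) for price, p in bids
                     if price is not None and price <= reserve]
            if valid:
                # Lowest price wins; ties are broken by practitioner name.
                price, winner = min(valid, key=lambda b: (b[0], b[1].name))
                winner.assigned.append(visit_id)
                schedule[visit_id] = (winner.name, price)
                del open_visits[visit_id]
        reserve += raise_step  # the agency raises its reserve for unfilled visits
    return schedule, open_visits
```

Processing visits in sorted order and raising the reserve by a fixed step keeps the sketch deterministic; a real mechanism would also need to handle visit sequencing, travel between visits, and practitioners who shade their bids.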
Large Ecosystem Service Benefits of Assisted Natural Regeneration
China manages the largest monoculture plantations in the world, 24% of which are Chinese fir plantations. Maximizing the ecosystem services of Chinese fir plantations has important implications for the global carbon cycle and biodiversity protection. Assisted natural regeneration (ANR) is a practice that converts degraded lands into more productive forests with rich ecosystem services. However, quantitative understanding of the ecosystem service benefits of ANR is very limited. We conducted a comprehensive field manipulation experiment to evaluate the potential of ANR. We quantified key ecosystem services, including surface runoff, sediment yield, dissolved organic carbon export, plant diversity, and aboveground carbon accumulation, for ANR of secondary forests dominated by Castanopsis carlesii and compared them with those of Chinese fir and C. carlesii plantations. Our results showed that ANR of C. carlesii forest reduced surface runoff and sediment yield by up to 50% compared with other young plantations in the first 3 years and substantially increased plant diversity. ANR also reduced the export of dissolved organic carbon by 60–90% in the first 2 years. Aboveground biomass of the young ANR forest was approximately 3–4 times that of other young plantations, while aboveground biomass of mature ANR forests was approximately 1.4 times that of mature Chinese fir plantations of the same age. If all Chinese fir plantations in China were replaced by ANR forests, an additional 0.7 Pg of carbon could potentially be stored aboveground in one rotation (25 years). The results indicate that ANR triggers positive feedbacks among soil and water conservation, biodiversity protection, and biomass accumulation, thereby enhancing ecosystem services.
Multi-granularity Item-based Contrastive Recommendation
Contrastive learning (CL) has proven powerful in recommendation. However,
most CL-based recommendation models build their CL tasks focusing merely on the
user's aspects, ignoring the rich and diverse information in items. In this work,
we propose a novel Multi-granularity item-based contrastive learning (MicRec)
framework for the matching stage (i.e., candidate generation) in
recommendation, which systematically introduces multi-aspect item-related
information to representation learning with CL. Specifically, we build three
item-based CL tasks as a set of plug-and-play auxiliary objectives to capture
item correlations in feature, semantic and session levels. The feature-level
item CL aims to learn the fine-grained feature-level item correlations via
items and their augmentations. The semantic-level item CL focuses on the
coarse-grained semantic correlations between semantically related items. The
session-level item CL highlights the global behavioral correlations of items
from users' sequential behaviors in all sessions. In experiments, we conduct
both offline and online evaluations on real-world datasets, verifying the
effectiveness and universality of the three proposed CL tasks. MicRec
has been deployed in a real-world recommender system, affecting millions of
users. The source code will be released in the future.
Comment: 17 pages, under review
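The abstract does not state the loss used by the three CL tasks. A common choice for this kind of item-to-item contrast is an InfoNCE loss with in-batch negatives, sketched below under that assumption: for the feature-level task, row i of positive would be an augmented view of item i (the augmentation itself is not shown), and every other item in the batch serves as a negative. The function name and temperature value are illustrative.

```python
import numpy as np


def info_nce(anchor: np.ndarray, positive: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE with in-batch negatives.

    anchor, positive: (batch, dim) item embeddings; positive[i] is the positive
    view for anchor[i], and positive[j] for j != i act as negatives.
    """
    a = anchor / (np.linalg.norm(anchor, axis=1, keepdims=True) + 1e-12)
    p = positive / (np.linalg.norm(positive, axis=1, keepdims=True) + 1e-12)
    logits = a @ p.T / temperature                # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))    # cross-entropy on the diagonal
```

Under the same assumption, the semantic- and session-level tasks would differ mainly in how positive pairs are formed (semantically related items, or items that co-occur in user sessions); that pairing logic is not shown here.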
2,2,4,4-Tetraphenyl-1,3-bis(3,3,5,5-tetramethyl-1,1-diphenyl-5-vinyltrisiloxan-1-yl)cyclodisilazane
The title molecule, C60H70N2O4Si8, lies on an inversion center. In the asymmetric unit, one of the phenyl rings is disordered over two sets of sites with refined occupancies of 0.58 (2) and 0.42 (2). In addition, at two substitution sites of the terminal dimethyl(vinyl)silyl unit, a methyl group and the vinyl group are disordered over the same site with refined occupancies of 0.523 (13) and 0.477 (13).
3D-Properties: Identifying Challenges in DPO and Charting a Path Forward
Aligning large language models (LLMs) with human preference has recently
gained tremendous attention, with the canonical yet costly RLHF-PPO and the
simple and straightforward Direct Preference Optimization (DPO) as two
examples. Despite its efficiency, DPO has rarely been used in
state-of-the-art production-level LLMs, suggesting potential pathologies. In
this work, we revisit DPO with a comprehensive examination of its empirical
efficacy and a systematic comparison with RLHF-PPO. We identify the
3D-properties of DPO's learning outcomes: the Drastic drop in
the likelihood of rejected responses, the Degradation into LLM
unlearning, and the Dispersion effect on unseen responses, through
experiments with both a carefully designed toy model and practical LLMs on
tasks including mathematical problem-solving and instruction following. These
findings connect to observations made in related work, and we
additionally contribute a plausible theoretical explanation for them.
Accordingly, we propose simple regularization methods to mitigate the issues
caused by the 3D-properties, improving the training stability and final
performance of DPO. Our contributions also include an investigation into how
the distribution of the paired preference data impacts the effectiveness of
DPO. We hope this work can offer research directions to narrow the gap
between reward-free preference learning methods and reward-based ones.
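For reference, the standard DPO objective for a prompt x with chosen response y_w and rejected response y_l is -log sigmoid(beta * [log(pi_theta(y_w|x) / pi_ref(y_w|x)) - log(pi_theta(y_l|x) / pi_ref(y_l|x))]). The sketch below computes it from precomputed sequence log-probabilities; the function name and the beta = 0.1 default are illustrative assumptions. It shows one simple way the first D (the drastic drop in rejected-response likelihood) can arise: the loss depends only on the margin between the two log-ratios, so it can be driven toward zero by lowering log pi_theta(y_l|x) alone. This is an illustration, not the paper's own theoretical explanation.

```python
import numpy as np


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected,
             beta: float = 0.1) -> float:
    """Mean DPO loss over a batch of preference pairs.

    Each argument is an array of sequence-level log-probabilities (summed over
    tokens) under the trained policy or the frozen reference model.
    """
    chosen_logratio = np.asarray(logp_chosen) - np.asarray(ref_logp_chosen)
    rejected_logratio = np.asarray(logp_rejected) - np.asarray(ref_logp_rejected)
    z = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(z)) == log(1 + exp(-z)), computed stably via logaddexp
    return float(np.mean(np.logaddexp(0.0, -z)))
```

For example, dpo_loss([0.0], [0.0], [0.0], [0.0]) equals log 2, while dpo_loss([-10.0], [-60.0], [-10.0], [-10.0]) falls to about 0.0067 even though the chosen response's log-probability never moved.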
