Volterra type McKean-Vlasov SDEs with singular kernels: Well-posedness, Propagation of Chaos and Euler schemes
This paper studies Volterra type McKean-Vlasov
stochastic differential equations with singular kernels. First, the
well-posedness of Volterra type McKean-Vlasov stochastic differential equations
is established. Then propagation of chaos is proved with an explicit estimate
of the convergence rate. Finally, we propose an explicit Euler scheme for
an interacting particle system associated with the Volterra type McKean-Vlasov
equation. Comment: 21 pages
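The abstract above does not give the scheme itself, so the following is only a rough sketch of what an explicit Euler discretization of an interacting particle system with a Volterra kernel can look like. Everything specific here is an assumption made for illustration and is not taken from the paper: the kernel K(t, s) = (t - s)^(-alpha), scalar particles, interaction only through the empirical mean, and the names `euler_volterra_particles`, `b` and `sigma`.

```python
import math
import random


def euler_volterra_particles(x0, b, sigma, alpha, T, n_steps, seed=0):
    """Explicit Euler sketch for N scalar particles with kernel (t - s)**(-alpha).

    Assumed (hypothetical) dynamics, with mu_s replaced by the empirical mean:
        X_t = X_0 + int_0^t K(t, s) b(X_s, mean_s) ds
                  + int_0^t K(t, s) sigma(X_s, mean_s) dW_s
    """
    rng = random.Random(seed)
    n = len(x0)
    h = T / n_steps
    paths = [list(x0)]      # paths[k][i] = particle i at time k * h
    increments = []         # increments[k][i] = b * h + sigma * dW on step k
    for step in range(1, n_steps + 1):
        prev = paths[-1]
        mean_prev = sum(prev) / n   # empirical-mean interaction (assumption)
        increments.append([
            b(x, mean_prev) * h
            + sigma(x, mean_prev) * rng.gauss(0.0, math.sqrt(h))
            for x in prev
        ])
        t_n = step * h
        new_state = []
        for i in range(n):
            value = x0[i]
            # The kernel depends on the current time t_n, so the whole history
            # is re-weighted at every step. It is evaluated only at grid points
            # t_k < t_n, where the singular kernel is finite.
            for k in range(step):
                value += (t_n - k * h) ** (-alpha) * increments[k][i]
            new_state.append(value)
        paths.append(new_state)
    return paths


# Mean-reverting drift towards the empirical mean, constant diffusion.
paths = euler_volterra_particles(
    x0=[0.0, 2.0],
    b=lambda x, m: m - x,
    sigma=lambda x, m: 0.1,
    alpha=0.25, T=1.0, n_steps=50,
)
print(paths[-1])
```

With `alpha=0` the kernel is identically 1 and the loop reduces to the usual Euler-Maruyama recursion. For `alpha > 0` the cost per path grows quadratically in the number of steps because the history must be re-summed at each step.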
True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning
Despite their impressive performance across numerous tasks, large language
models (LLMs) often fail to solve simple decision-making tasks because the
knowledge in LLMs is misaligned with the environments. In contrast,
reinforcement learning (RL) agents learn policies from scratch, which keeps
them aligned with their environments but makes it difficult to incorporate prior
knowledge for efficient exploration. To narrow the gap, we propose TWOSOME, a
novel general online framework that deploys LLMs as decision-making agents to
efficiently interact and align with embodied environments via RL without
requiring any prepared datasets or prior knowledge of the environments.
First, we query LLMs for the joint probability of each valid action to
form behavior policies. Then, to enhance the stability and robustness of the
policies, we propose two normalization methods and summarize four prompt design
principles. Finally, we design a novel parameter-efficient training
architecture where the actor and critic share one frozen LLM equipped with
low-rank adapters (LoRA) updated by PPO. We conduct extensive experiments to
evaluate TWOSOME. i) TWOSOME exhibits significantly better sample efficiency
and performance than the conventional RL method, PPO, and the prompt tuning
method, SayCan, in both a classical decision-making environment, Overcooked, and
a simulated household environment, VirtualHome. ii) Benefiting from LLMs'
open-vocabulary feature, TWOSOME shows superior generalization ability to
unseen tasks. iii) Under our framework, there is no significant loss of the
LLMs' original ability during online PPO finetuning. Comment: Accepted by ICLR202
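The step of turning per-action joint probabilities into a behavior policy can be sketched as follows. This is a hedged illustration, not the TWOSOME implementation: the function name `action_policy`, the input format (a list of token log-probabilities per action prompt) and the length-averaging option are assumptions. The abstract says two normalization methods are proposed but does not specify them, so the averaging shown here is only one plausible choice.

```python
import math


def action_policy(token_logprobs, length_normalize=False):
    """Map each valid action to a probability.

    token_logprobs: dict from action text to the list of log-probabilities
    the LLM assigned to that action's tokens (hypothetical input format).
    The joint log-probability of an action is the sum over its tokens;
    with length_normalize=True the sum is divided by the token count.
    """
    scores = {}
    for action, logprobs in token_logprobs.items():
        total = sum(logprobs)
        scores[action] = total / len(logprobs) if length_normalize else total
    # Softmax over the valid actions, shifted by the maximum for stability.
    top = max(scores.values())
    weights = {a: math.exp(s - top) for a, s in scores.items()}
    norm = sum(weights.values())
    return {a: w / norm for a, w in weights.items()}


logps = {"pick up the tomato": [-1.0, -1.0, -1.0, -1.0], "chop": [-1.0]}
print(action_policy(logps))                         # longer action is penalized
print(action_policy(logps, length_normalize=True))  # both actions weighted equally
```

Without normalization, an action described with more tokens accumulates a lower joint probability simply because it is longer, which is the kind of bias a normalization step is meant to reduce.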
Adaptive Value Decomposition with Greedy Marginal Contribution Computation for Cooperative Multi-Agent Reinforcement Learning
Real-world cooperation often requires intensive coordination among agents
simultaneously. This task has been extensively studied within the framework of
cooperative multi-agent reinforcement learning (MARL), and value decomposition
methods are among the cutting-edge solutions. However, traditional methods
that learn the value function as a monotonic mixing of per-agent utilities
cannot solve tasks with non-monotonic returns. This hinders their
application in generic scenarios. Recent methods tackle this problem from the
perspective of implicit credit assignment by learning value functions with
complete expressiveness or using additional structures to improve cooperation.
However, they are either difficult to learn due to large joint action spaces or
insufficient to capture the complicated interactions among agents that are
essential to solving tasks with non-monotonic returns. To address these
problems, we propose a novel explicit credit assignment method for the
non-monotonic problem. Our method, Adaptive Value decomposition with Greedy
Marginal contribution (AVGM), is based on an adaptive value decomposition that
learns the cooperative value of a group of dynamically changing agents. We
first illustrate that the proposed value decomposition can consider the
complicated interactions among agents and is feasible to learn in large-scale
scenarios. Then, our method uses a greedy marginal contribution computed from
the value decomposition as an individual credit to incentivize agents to learn
the optimal cooperative policy. We further extend the module with an action
encoder to guarantee linear time complexity for computing the greedy
marginal contribution. Experimental results demonstrate that our method
achieves significant performance improvements in several non-monotonic domains. Comment: This paper is accepted by AAMAS 202
Satisfactory orthogonal array and its checking method
An orthogonal array (OA) is said to be a satisfactory orthogonal array if it is impossible to obtain another OA from it by adding one or more columns. By exploring the relationship between OAs and orthogonal decompositions of projection matrices, we present a method of checking a satisfactory OA. Keywords: Projection matrix; Satisfactory orthogonal array
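The definition above can be made concrete with a brute-force check on a very small array. This is only an illustration of the definition for strength-2 arrays, not the projection-matrix method of the paper. The function names, the encoding of levels as 0, 1, ..., s-1, and the restriction to adding a single column with a given number of levels are all assumptions of this sketch.

```python
from collections import Counter
from itertools import combinations, product


def is_orthogonal_array(rows, levels, strength=2):
    """True if every `strength` columns show each level combination equally often.

    rows: list of tuples (one per run); levels[j] = number of levels of column j.
    """
    n_runs = len(rows)
    for cols in combinations(range(len(levels)), strength):
        counts = Counter(tuple(row[j] for j in cols) for row in rows)
        combos = list(product(*(range(levels[j]) for j in cols)))
        if n_runs % len(combos) != 0:
            return False
        target = n_runs // len(combos)
        if any(counts[c] != target for c in combos):
            return False
    return True


def can_add_column(rows, levels, new_levels):
    """Exhaustively try every possible extra column with `new_levels` levels.

    Feasible only for tiny arrays: there are new_levels ** len(rows) candidates.
    """
    for column in product(range(new_levels), repeat=len(rows)):
        extended = [tuple(row) + (v,) for row, v in zip(rows, column)]
        if is_orthogonal_array(extended, list(levels) + [new_levels]):
            return True
    return False


# The 4-run, three-column, two-level array.
L4 = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(is_orthogonal_array(L4, [2, 2, 2]))             # True
print(can_add_column(L4, [2, 2, 2], 2))               # False: no 2-level column fits
print(can_add_column([r[:2] for r in L4], [2, 2], 2)) # True: the third column can be added
```

The exhaustive search grows exponentially with the number of runs, which is why a structural criterion is needed for arrays of practical size.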
Further results on the orthogonal arrays obtained by generalized Hadamard product
By combining the generalized Hadamard product with difference matrices and exploring the relationship between orthogonal arrays and the decomposition of projection matrices, we further develop the method in Zhang et al. (Discrete Math. 238 (2001) 151). As an application, some new orthogonal arrays of run size 100 are constructed. Keywords: Mixed-level orthogonal array; Generalized Hadamard product; Difference matrix; Projection matrix
Normal mixed difference matrix and the construction of orthogonal arrays
We present the definitions of normal orthogonal array (OA) and normal mixed difference matrix and extend the mixed difference matrix method introduced by Wang (Statist. Probab. Lett. 28, 121). Some new mixed-level OAs are constructed through the generalized Kronecker sum of (nonorthogonal) mixed-level matrix and normal mixed difference matrices. Keywords: Mixed orthogonal array; Mixed difference matrix; Normal mixed difference matrix; Generalized Kronecker sum
A note on orthogonal arrays obtained by orthogonal decomposition of projection matrices
A method of constructing mixed-level orthogonal arrays is presented by Zhang et al. (Statist. Sinica 2, 595). It is somewhat difficult to use this method to obtain new orthogonal arrays. This paper illustrates with examples the application of the method. Several classes of mixed-level orthogonal arrays are obtained. Keywords: Mixed-level orthogonal array; Projection matrix; Permutation matrix
Measuring joint space-time accessibility in transit network under travel time uncertainty
In densely populated areas, joint activity participation such as shopping and eating out represents a substantial portion of individuals' daily activity-travel patterns. Individuals weigh several considerations when making joint activity choices, such as space-time coordination, activity start time, activity location and joint activity duration. In a congested transit network, travel time uncertainty significantly affects individuals' activity and travel choice behaviour. To conduct some important joint activities, individuals normally have high expectations of on-time arrival. Thus, the measurement of space-time accessibility in transportation networks should be extended to consider joint activity choices and travel time uncertainty. In this paper, a new method for measuring space-time accessibility is proposed for individuals' joint activities in a transit network with consideration of travel time uncertainty. A reliable alighting location model is proposed to identify all feasible alighting locations in the transit network for performing joint activities by explicitly considering constraints on on-time arrival probabilities. Individuals' joint space-time accessibility (JSTA) is measured based on reliable alighting locations, travel time budget, and points of interest. Massive smart card data from the metro network in Nanjing, China are used to show the merits of the proposed JSTA measure. The results show that there is a significant difference between the space-time accessibility to independent activity and that to joint activity under various on-time arrival probabilities. The effects of spatial (de)concentration of anchor locations and minimum joint activity duration on JSTA, and the role of additional participants, are also presented in various scenarios. The proposed JSTA measure can be used to help policy makers evaluate transit network development and land use planning in urban areas.