
    Volterra type McKean-Vlasov SDEs with singular kernels: Well-posedness, Propagation of Chaos and Euler schemes

    This paper studies Volterra type McKean-Vlasov stochastic differential equations with singular kernels. First, the well-posedness of Volterra type McKean-Vlasov stochastic differential equations is established. Then, propagation of chaos is proved with an explicit estimate of the convergence rate. Finally, we propose an explicit Euler scheme for an interacting particle system associated with the Volterra type McKean-Vlasov equation. Comment: 21 pages
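
    The explicit Euler idea for the particle system can be sketched in Python under stated assumptions. The sketch assumes a Volterra equation of the form X_t = X_0 + int_0^t K(t,s) b(X_s, mu_s) ds + int_0^t K(t,s) sigma dW_s with the singular kernel K(t,s) = (t-s)^(H-1/2), H in (0, 1/2), a toy mean-reverting interaction b(x, mu) = -(x - mean of mu) and constant sigma. The kernel, coefficients and function names are illustrative assumptions, not the scheme analysed in the paper. The empirical measure of the N particles stands in for mu_t, and the kernel is only evaluated at grid points t_k < t_n, so its singularity at s = t is never hit.

```python
import math
import random


def kernel(t, s, hurst):
    """Assumed singular Volterra kernel K(t, s) = (t - s)^(H - 1/2), used only for s < t."""
    return (t - s) ** (hurst - 0.5)


def euler_particles(x0, t_end, n_steps, hurst=0.3, sigma=0.5, seed=0):
    """Explicit Euler scheme for a toy Volterra McKean-Vlasov particle system.

    x0 holds one initial position per particle. Returns a list of time slices;
    slice n holds every particle's position at t_n = n * h.
    """
    rng = random.Random(seed)
    h = t_end / n_steps
    n_particles = len(x0)
    times = [k * h for k in range(n_steps + 1)]
    path = [list(x0)]
    drifts, noises = [], []  # per-step increments b(X_k, mu_k) * h and sigma * dW_k
    for n in range(1, n_steps + 1):
        prev = path[-1]  # positions at t_{n-1}
        mean_field = sum(prev) / n_particles  # mean of the empirical measure mu^N
        drifts.append([-(x - mean_field) * h for x in prev])
        noises.append([sigma * rng.gauss(0.0, math.sqrt(h)) for _ in prev])
        # Volterra structure: every past increment is re-weighted by K(t_n, t_k), k < n.
        weights = [kernel(times[n], times[k], hurst) for k in range(n)]
        path.append([
            x0[i] + sum(w * (drifts[k][i] + noises[k][i]) for k, w in enumerate(weights))
            for i in range(n_particles)
        ])
    return path


paths = euler_particles([0.0, 1.0, 2.0, 5.0], t_end=1.0, n_steps=50)
```

    The inner sum over all past increments makes the cost quadratic in the number of steps, which is the usual price of the non-Markovian Volterra memory; the drift -(x - mean) is only a placeholder for a general interaction b(x, mu).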

    True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning

    Despite their impressive performance across numerous tasks, large language models (LLMs) often fail to solve simple decision-making tasks because the knowledge in LLMs is misaligned with the environments. In contrast, reinforcement learning (RL) agents learn policies from scratch, which keeps them aligned with their environments but makes it difficult to incorporate prior knowledge for efficient exploration. To narrow the gap, we propose TWOSOME, a novel general online framework that deploys LLMs as decision-making agents to efficiently interact and align with embodied environments via RL without requiring any prepared datasets or prior knowledge of the environments. First, we query the LLM for the joint probability of each valid action to form behavior policies. Then, to enhance the stability and robustness of the policies, we propose two normalization methods and summarize four prompt design principles. Finally, we design a novel parameter-efficient training architecture in which the actor and critic share one frozen LLM equipped with low-rank adapters (LoRA) updated by PPO. We conduct extensive experiments to evaluate TWOSOME. i) TWOSOME exhibits significantly better sample efficiency and performance than the conventional RL method, PPO, and the prompt tuning method, SayCan, in both a classical decision-making environment, Overcooked, and a simulated household environment, VirtualHome. ii) Benefiting from the open-vocabulary feature of LLMs, TWOSOME shows superior generalization to unseen tasks. iii) Under our framework, the LLMs' original abilities show no significant loss during online PPO fine-tuning. Comment: Accepted by ICLR202
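
    The first step, turning an LLM into a behavior policy by scoring each valid action, can be sketched as follows. The sketch assumes the per-token log-probabilities of each action's text (conditioned on the prompt) have already been read from the LLM; the numbers are made up and the function name is hypothetical. The joint log-probability of an action is the sum of its token log-probabilities, and a softmax over the valid actions only gives the policy. The optional division by token count is one simple length normalization chosen for illustration; it is not necessarily either of the two normalizations proposed in the paper.

```python
import math


def action_policy(token_logprobs, normalize_length=False):
    """Distribution over valid actions from per-token log-probabilities.

    token_logprobs maps each valid action's text to the log-probabilities of its
    tokens, assumed to be read from an LLM conditioned on the task prompt.
    """
    scores = {}
    for action, lps in token_logprobs.items():
        joint = sum(lps)  # log P(action | prompt) = sum of token log-probs
        # Optional length normalization (an illustrative assumption).
        scores[action] = joint / len(lps) if normalize_length else joint
    # Softmax restricted to the valid actions, shifted by the max for stability.
    top = max(scores.values())
    weights = {a: math.exp(s - top) for a, s in scores.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


# Made-up log-probs: the longer action is penalized by its extra tokens.
example = {
    "pick up the tomato": [-0.4, -0.3, -0.5, -0.4],
    "serve": [-1.0],
}
raw = action_policy(example)                            # favors "serve"
normed = action_policy(example, normalize_length=True)  # favors "pick up the tomato"
```

    Without normalization the four-token action loses probability simply for being longer, which is the kind of bias a normalization step is meant to counter.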

    Adaptive Value Decomposition with Greedy Marginal Contribution Computation for Cooperative Multi-Agent Reinforcement Learning

    Real-world cooperation often requires intensive, simultaneous coordination among agents. This task has been extensively studied within the framework of cooperative multi-agent reinforcement learning (MARL), and value decomposition methods are among the cutting-edge solutions. However, traditional methods that learn the value function as a monotonic mixing of per-agent utilities cannot solve tasks with non-monotonic returns, which hinders their application in generic scenarios. Recent methods tackle this problem from the perspective of implicit credit assignment, either by learning value functions with complete expressiveness or by using additional structures to improve cooperation. However, they are either difficult to learn because of large joint action spaces or insufficient to capture the complicated interactions among agents, which are essential for solving tasks with non-monotonic returns. To address these problems, we propose a novel explicit credit assignment method for the non-monotonic problem. Our method, Adaptive Value decomposition with Greedy Marginal contribution (AVGM), is based on an adaptive value decomposition that learns the cooperative value of a group of dynamically changing agents. We first illustrate that the proposed value decomposition can account for the complicated interactions among agents and is feasible to learn in large-scale scenarios. Then, our method uses a greedy marginal contribution computed from the value decomposition as an individual credit to incentivize agents to learn the optimal cooperative policy. We further extend the module with an action encoder to guarantee linear time complexity for computing the greedy marginal contribution. Experimental results demonstrate that our method achieves significant performance improvements in several non-monotonic domains. Comment: This paper is accepted by AAMAS 202
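
    The basic idea of crediting an agent by its marginal contribution to a group value can be illustrated with a hand-written, hypothetical group value function v(S). This is a generic marginal-contribution computation, phi_i = v(S) - v(S without i), under an assumed non-monotonic payoff; it is not the learned AVGM value decomposition or its greedy computation, and both function names are illustrative.

```python
def marginal_contributions(group, value):
    """Credit each agent by v(S) - v(S without that agent) for the current group S."""
    s = frozenset(group)
    return {i: value(s) - value(s - {i}) for i in group}


def toy_value(coalition):
    """Hypothetical non-monotonic team value: a lone agent is penalized, groups cooperate."""
    n = len(coalition)
    if n == 0:
        return 0.0
    if n == 1:
        return -2.0
    return 5.0 * (n - 1)


# The group may change from step to step; credits are recomputed for each group.
full_team = marginal_contributions([0, 1, 2], toy_value)  # {0: 5.0, 1: 5.0, 2: 5.0}
pair = marginal_contributions([0, 1], toy_value)          # {0: 7.0, 1: 7.0}
```

    Because adding the first agent lowers the value while adding a second one raises it, the same agent receives different credit depending on who else is in the group, which a monotonic mixing of per-agent utilities cannot express.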

    Satisfactory orthogonal array and its checking method

    An orthogonal array (OA) is said to be a satisfactory orthogonal array if it is impossible to obtain another OA from it by adding one or more columns. By exploring the relationship between OAs and orthogonal decompositions of projection matrices, we present a method for checking whether an OA is satisfactory. Keywords: Projection matrix; Satisfactory orthogonal array
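
    The definition can be checked naively on very small arrays, assuming strength 2 and a caller-chosen list of allowed level counts for the new column. Because any subset of the columns of a strength-2 OA is again such an OA, an array admits no extension by one or more columns exactly when it admits no single extra column, so it is enough to test every candidate column. This brute-force search grows exponentially with the run size and is not the projection-matrix method of the paper; the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def pair_is_orthogonal(c1, c2):
    """Strength-2 condition for two columns: every level pair appears equally often."""
    n = len(c1)
    levels1, levels2 = set(c1), set(c2)
    cells = len(levels1) * len(levels2)
    if n % cells:
        return False
    counts = Counter(zip(c1, c2))
    return all(counts[(a, b)] == n // cells for a in levels1 for b in levels2)


def can_extend(columns, level_options):
    """Return one column that can be appended while keeping strength 2, or None.

    columns: list of columns, each a list of level labels 0..s-1.
    level_options: numbers of levels to try for the new column (chosen by the caller).
    """
    n = len(columns[0])
    for s in level_options:
        if s < 2 or n % s:
            continue  # an s-level column cannot be balanced in n runs
        for cand in product(range(s), repeat=n):
            if len(set(cand)) == s and all(pair_is_orthogonal(cand, c) for c in columns):
                return list(cand)
    return None


def is_satisfactory(columns, level_options=(2, 3, 4)):
    return can_extend(columns, level_options) is None


# OA with 4 runs and three 2-level columns (given column by column).
oa4 = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]]
print(is_satisfactory(oa4))       # True: no 2-, 3- or 4-level column fits
print(can_extend(oa4[:2], (2,)))  # [0, 1, 1, 0]: the two-column array is not satisfactory
```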

    Further results on the orthogonal arrays obtained by generalized Hadamard product

    By combining the generalized Hadamard product with a difference matrix and exploring the relationship between orthogonal arrays and the decomposition of projection matrices, we further develop the method of Zhang et al. (Discrete Math. 238 (2001) 151). As an application, some new orthogonal arrays of run size 100 are constructed. Keywords: Mixed-level orthogonal array; Generalized Hadamard product; Difference matrix; Projection matrix
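
    The difference-matrix ingredient can be stated concretely. Assuming the group is the cyclic group Z_s (difference matrices can also be defined over other abelian groups), an r x c matrix is a difference matrix D(r, c; s) when, for every pair of distinct columns, the entrywise differences mod s contain each element of Z_s exactly r/s times. The sketch checks this property and builds the standard multiplication-table example d_ij = i*j mod p, which is a D(p, p; p) for prime p; it does not implement the generalized Hadamard product of the paper, and the function names are hypothetical.

```python
from collections import Counter


def is_difference_matrix(d, s):
    """True if every pair of distinct columns has each difference mod s equally often."""
    r, c = len(d), len(d[0])
    if r % s:
        return False
    for j in range(c):
        for k in range(j + 1, c):
            diffs = Counter((row[k] - row[j]) % s for row in d)
            if any(diffs[g] != r // s for g in range(s)):
                return False
    return True


def multiplication_table(p):
    """d[i][j] = i*j mod p; a difference matrix D(p, p; p) when p is prime."""
    return [[(i * j) % p for j in range(p)] for i in range(p)]


d3 = multiplication_table(3)  # [[0, 0, 0], [0, 1, 2], [0, 2, 1]]
```

    For a composite modulus the construction breaks: with p = 4, columns 0 and 2 have differences 2*i mod 4, which only take the values 0 and 2.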

    Normal mixed difference matrix and the construction of orthogonal arrays

    We present the definitions of normal orthogonal array (OA) and normal mixed difference matrix and extend the mixed difference matrix method introduced by Wang (Statist. Probab. Lett. 28, 121). Some new mixed-level OAs are constructed through the generalized Kronecker sum of a (nonorthogonal) mixed-level matrix and normal mixed difference matrices. Keywords: Mixed orthogonal array; Mixed difference matrix; Normal mixed difference matrix; Generalized Kronecker sum
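
    The ordinary Kronecker sum already shows why difference matrices yield OAs. Assuming a symmetric s-level OA A with N runs and m columns and a difference matrix D(r, c; s) over Z_s, the array A ⊕ D with entry a_ij + d_kl (mod s) in row (i, k) and column (j, l) is a strength-2 OA with Nr runs and mc columns. This sketch covers only that classical symmetric construction; the normal mixed difference matrices and the generalized Kronecker sum of the paper, which handle mixed levels and nonorthogonal matrices, are not reproduced, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def kronecker_sum(a, d, s):
    """Rows indexed by (i, k), columns by (j, l); entry (a[i][j] + d[k][l]) mod s."""
    return [
        [(ai[j] + dk[l]) % s for j in range(len(ai)) for l in range(len(dk))]
        for ai in a
        for dk in d
    ]


def is_oa2(rows):
    """Strength-2 check: every pair of columns shows each level pair equally often."""
    n, m = len(rows), len(rows[0])
    cols = [[row[j] for row in rows] for j in range(m)]
    for c1, c2 in combinations(cols, 2):
        l1, l2 = set(c1), set(c2)
        counts = Counter(zip(c1, c2))
        cells = len(l1) * len(l2)
        if n % cells or any(counts[(x, y)] != n // cells for x in l1 for y in l2):
            return False
    return True


# A: OA with 4 runs and three 2-level columns; D: difference matrix D(2, 2; 2) over Z_2.
A = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
D = [[0, 0], [0, 1]]
OA = kronecker_sum(A, D, 2)  # 8 runs, 6 two-level columns
```

    Replacing D by a matrix that is not a difference matrix, such as an all-zero 2 x 2 matrix, produces repeated columns and the strength-2 check fails.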

    A note on orthogonal arrays obtained by orthogonal decomposition of projection matrices

    Zhang et al. (Statist. Sinica 2, 595) presented a method of constructing mixed-level orthogonal arrays. Because this method is somewhat difficult to apply when looking for new orthogonal arrays, this paper illustrates its application with examples. Several classes of mixed-level orthogonal arrays are obtained. Keywords: Mixed-level orthogonal array; Projection matrix; Permutation matrix
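
    The link between OAs and projection matrices can be made concrete on the smallest example. For a balanced column with s levels in an N-run array, the projection matrix onto its main-effect space has entry s/N - 1/N when runs u and v share a level and -1/N otherwise. Two balanced columns satisfy the strength-2 condition exactly when P_a P_b = 0, and for the 4-run array with 2-level columns A, B and C = A + B (mod 2) the three projection matrices sum to I - J/4, which is also the projection matrix of the 4-level column built from the pair (A, B). This is a hand-checkable illustration with exact fractions and hypothetical helper names, not the construction method of Zhang et al.

```python
from fractions import Fraction


def projection(col):
    """Main-effect projection matrix of a balanced column: (s/N)[same level] - 1/N."""
    n, s = len(col), len(set(col))
    return [[Fraction(s if col[u] == col[v] else 0, n) - Fraction(1, n) for v in range(n)]
            for u in range(n)]


def matmul(p, q):
    n = len(p)
    return [[sum(p[u][w] * q[w][v] for w in range(n)) for v in range(n)] for u in range(n)]


def matadd(*ms):
    n = len(ms[0])
    return [[sum(m[u][v] for m in ms) for v in range(n)] for u in range(n)]


A, B = [0, 0, 1, 1], [0, 1, 0, 1]
C = [(a + b) % 2 for a, b in zip(A, B)]  # [0, 1, 1, 0]
D = [2 * a + b for a, b in zip(A, B)]    # 4-level column [0, 1, 2, 3]
PA, PB, PC, PD = (projection(c) for c in (A, B, C, D))
zero = [[Fraction(0)] * 4 for _ in range(4)]
centering = [[Fraction(int(u == v)) - Fraction(1, 4) for v in range(4)] for u in range(4)]
```

    Reading the identity P_A + P_B + P_C = P_D the other way gives the replacement idea behind many mixed-level constructions: any further column whose projection matrix is orthogonal to P_A, P_B and P_C is also orthogonal to P_D, so the three 2-level columns can be exchanged for the single 4-level column.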

    Measuring joint space-time accessibility in transit network under travel time uncertainty

    In densely populated areas, joint activity participation, such as shopping and eating out, represents a substantial portion of individuals' daily activity-travel patterns. When choosing joint activities, individuals weigh several considerations, such as space-time coordination, activity start time, activity location and joint activity duration. In a congested transit network, travel time uncertainty significantly affects individuals' activity and travel choices. For important joint activities, individuals normally expect a high probability of on-time arrival. Thus, the measurement of space-time accessibility in transportation networks should be extended to consider joint activity choices and travel time uncertainty. In this paper, a new method for measuring space-time accessibility is proposed for individuals' joint activities in transit networks that takes travel time uncertainty into account. A reliable alighting location model is proposed to identify all feasible alighting locations in the transit network for performing joint activities by explicitly considering constraints on on-time arrival probabilities. Individuals' joint space-time accessibility (JSTA) is measured based on reliable alighting locations, the travel time budget and points of interest. Large-scale smart card data from the metro network in Nanjing, China are used to demonstrate the merits of the proposed JSTA measure. The results show a significant difference between the space-time accessibility to independent activities and that to joint activities under various on-time arrival probabilities. The effects of the spatial (de)concentration of anchor locations and of the minimum joint activity duration on JSTA, as well as the role of additional participants, are also presented in various scenarios. The proposed JSTA measure can help policy makers evaluate transit network development and land use planning in urban areas.
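
    The role of the on-time arrival constraint can be sketched under strong simplifying assumptions. Each participant's travel time to a candidate alighting station is assumed to be normally distributed with a known mean and standard deviation (in minutes); the joint activity is assumed to start once the last participant has arrived with probability at least alpha; and a station counts as feasible if the activity can then last the minimum joint duration before the end of the time window. Station names, times, POI counts and function names are hypothetical, and the paper's model, built on a transit network and smart card data, is considerably richer.

```python
from statistics import NormalDist


def reliable_arrival(depart, mean_tt, sd_tt, alpha):
    """Time by which the traveller has arrived with probability alpha (normal travel time assumed)."""
    return depart + NormalDist(mean_tt, sd_tt).inv_cdf(alpha)


def joint_accessibility(participants, poi_counts, window_end, min_duration, alpha):
    """Sum POIs over stations where all participants can meet reliably.

    participants: list of dicts {"depart": t, "tt": {station: (mean, sd)}}.
    poi_counts: dict station -> number of points of interest near that station.
    """
    feasible = []
    for station in poi_counts:
        if not all(station in p["tt"] for p in participants):
            continue  # some participant cannot reach this station at all
        # The joint activity starts when the last participant has (reliably) arrived.
        start = max(reliable_arrival(p["depart"], *p["tt"][station], alpha)
                    for p in participants)
        if start + min_duration <= window_end:
            feasible.append(station)
    return sum(poi_counts[s] for s in feasible), feasible


people = [
    {"depart": 0.0, "tt": {"S1": (20.0, 2.0), "S2": (30.0, 10.0)}},
    {"depart": 5.0, "tt": {"S1": (25.0, 3.0), "S2": (20.0, 2.0)}},
]
pois = {"S1": 12, "S2": 30}
```

    With these made-up numbers both stations are feasible at alpha = 0.9, but raising alpha to 0.99 pushes the reliable arrival at the high-variance station S2 so late that only S1 remains, mirroring the abstract's observation that accessibility depends on the required on-time arrival probability.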