
    Multidimensional Graph Neural Networks for Wireless Communications

    Graph neural networks (GNNs) have shown promise in improving the efficiency of learning communication policies by leveraging their permutation properties. Nonetheless, existing works design GNNs only for specific wireless policies, lacking a systematic approach for modeling graphs and selecting structures. Based on the observation that a mismatch between the permutation properties of the GNN and the policy, as well as the information loss during the update of hidden representations, have a large impact on learning performance and efficiency, in this paper we propose a unified framework for learning permutable wireless policies with multidimensional GNNs. To avoid the information loss, the GNNs update the hidden representations of hyper-edges. To exploit all possible permutations of a policy, we provide a method for identifying the vertices in a graph. We also investigate the permutability of wireless channels, which affects sample efficiency, and show how to trade off the training, inference, and design complexities of GNNs. We take precoding in different systems as examples to demonstrate how to apply the framework. Simulation results show that the proposed GNNs can achieve performance close to that of numerical algorithms, while requiring far fewer training samples and trainable parameters than the commonly used convolutional neural networks to reach the same learning performance.
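    The permutation property the abstract leverages can be illustrated with a minimal sketch (not the paper's multidimensional architecture): a layer whose output permutes in lockstep with its input, because each element combines its own features with a permutation-invariant aggregate. The layer name and weights below are illustrative assumptions.

```python
import numpy as np

def equivariant_layer(X, w_self=0.7, w_agg=0.3):
    # y_i = w_self * x_i + w_agg * mean_j(x_j); permuting the rows of the
    # input permutes the rows of the output in exactly the same way,
    # because the mean over rows is invariant to their order
    return w_self * X + w_agg * X.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))        # e.g. 4 users, 2 features each
perm = np.array([2, 0, 3, 1])      # an arbitrary reordering of the users
Y = equivariant_layer(X)
assert np.allclose(equivariant_layer(X[perm]), Y[perm])
```

    Because the layer's weights are shared across users, the number of trainable parameters is independent of the number of users, which is one source of the sample-efficiency advantage the abstract reports.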

    Size distributions reveal regime transition of lake systems under different dominant driving forces

    Power-law size distributions are associated with fractal, self-organized behaviors and patterns in complex systems. Such distributions also emerge in natural lakes, with potentially important links to the dynamics of lake systems. However, the driving mechanism that generates and shapes this feature in lake systems remains unclear. Moreover, the power law itself has been found inadequate for fully describing the size distribution of lakes, owing to deviations at both ends of the size range. Based on observed and simulated lakes in 11 hydro-climatic zones of China, we established a conceptual model for lake systems that covers the whole range of the lake size distribution and reveals the underlying driving mechanism. The full lake size distribution is composed of three components, with three phases characterized by exponential, stretched-exponential, and power-law distributions. The three phases represent system states with successively increasing degrees of heterogeneity and orderliness and, more importantly, indicate the dominance of exogenic and endogenic forces, respectively. As the dominant driving force changes from endogenic to exogenic, a phase transition occurs, with the lake size distribution shifting from power law to stretched exponential and further to exponential. Apart from compressing the power-law phase, the exogenic force also increases its scaling exponent, driving the corresponding lake size power spectrum into the regime of blue noise. During this process, the autocorrelation function of the lake system diverges, possibly to infinity, indicating the loss of system resilience.
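    The scaling exponent the abstract refers to is conventionally estimated by maximum likelihood over the tail above a threshold size. The sketch below is not the paper's fitting procedure; it shows the standard Hill/MLE estimator on synthetic power-law "lake sizes" generated by inverse-CDF sampling (the threshold and exponent values are illustrative).

```python
import numpy as np

def powerlaw_mle_alpha(sizes, xmin):
    # Hill / maximum-likelihood estimator of the scaling exponent alpha
    # for a power-law tail p(x) ~ x^(-alpha), restricted to x >= xmin
    x = np.asarray([s for s in sizes if s >= xmin], dtype=float)
    return 1.0 + len(x) / np.sum(np.log(x / xmin))

rng = np.random.default_rng(1)
alpha_true, xmin = 2.5, 1.0
# inverse-CDF sampling: P(X > x) = (x / xmin)^-(alpha - 1)
samples = xmin * (1 - rng.random(50_000)) ** (-1 / (alpha_true - 1))
alpha_hat = powerlaw_mle_alpha(samples, xmin)
assert abs(alpha_hat - alpha_true) < 0.05
```

    Comparing the likelihoods of exponential, stretched-exponential, and power-law fits over the same size range is then a way to decide which of the three phases dominates a given lake system.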

    Study on adaptive cycle life extension method of li-ion battery based on differential thermal voltammetry parameter decoupling

    Battery aging reduces a battery's cycle life, which restricts the development of energy storage technology. The state of health (SOH) assessment techniques used to indicate battery cycle life have been widely studied. This paper seeks a way to adjust the battery management system adaptively as the SOH changes, in order to prolong the battery cycle life. We propose an improved Galvanostatic Intermittent Titration Technique (GITT) to decouple the terminal voltage into overpotential (induced by the total internal resistance) and stoichiometric drift (caused by battery aging and indicated by the OCV). Based on the improved GITT, the open-circuit voltage-temperature change (OCV-dT/dV) characteristics of the SOH are described more accurately. With such an accurate description of the SOH change, we obtain an adaptive method for changing the discharge and charge cut-off voltages, whose application can prolong battery cycle life. Experiments verify that, in the middle of a battery's life cycle, this adaptive method effectively improves the battery's cycle life. The method can be applied during preventive maintenance in battery storage systems.
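    The decoupling idea behind GITT can be sketched in a few lines (the paper's improved variant is more involved; the function name and voltage values below are hypothetical). After a constant-current pulse, the fully relaxed voltage approximates the OCV, so the difference between the rested and loaded voltages isolates the overpotential, and Ohm's law gives the total internal resistance.

```python
def decouple_gitt_step(v_loaded, v_rested, current):
    # relaxed (rested) voltage ~ OCV; the rest/load difference is the
    # overpotential of the pulse, and dividing by the pulse current
    # gives the total internal resistance (Ohm's law)
    overpotential = v_rested - v_loaded      # volts
    resistance = overpotential / current     # ohms
    return overpotential, resistance

# hypothetical pulse: 2 A discharge with 80 mV of polarization
eta, r = decouple_gitt_step(v_loaded=3.62, v_rested=3.70, current=2.0)
```

    Tracking how this resistance and the relaxed OCV drift over cycles is what lets the cut-off voltages be adapted as the SOH changes.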

    Efficient Exploration Using Extra Safety Budget in Constrained Policy Optimization

    Reinforcement learning (RL) has achieved promising results on most robotic control tasks. The safety of learning-based controllers is essential to ensuring their effectiveness. Current methods apply the full safety constraints throughout training, which results in inefficient exploration in the early stage. In this paper, we propose an algorithm named Constrained Policy Optimization with Extra Safety Budget (ESB-CPO) to strike a balance between exploration efficiency and constraint satisfaction. In the early stage, our method loosens the practical constraints on unsafe transitions (adding an extra safety budget) with the aid of a new metric we propose. As training proceeds, the constraints in our optimization problem become tighter. Theoretical analysis and practical experiments demonstrate that our method gradually meets the cost limit in the final training stage. When evaluated on the Safety-Gym and Bullet-Safety-Gym benchmarks, our method shows advantages over baseline algorithms in terms of both safety and optimality. Remarkably, our method achieves a significant performance improvement over the baselines under the same cost limit.
    Comment: 7 pages, 8 figures
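    The abstract's loosen-then-tighten idea can be sketched as a budget schedule. The paper drives the loosening with its proposed metric; the linear decay below is a simplifying assumption, and all names and numbers are illustrative.

```python
def effective_cost_limit(base_limit, extra_budget, step, total_steps):
    # hypothetical schedule: the extra safety budget decays linearly to
    # zero, so the constraint tightens toward the true cost limit and
    # the final-stage policy must satisfy the original limit exactly
    frac = max(0.0, 1.0 - step / total_steps)
    return base_limit + extra_budget * frac

assert effective_cost_limit(25.0, 10.0, 0, 100) == 35.0    # loose early on
assert effective_cost_limit(25.0, 10.0, 100, 100) == 25.0  # tight at the end
```

    Early in training the constrained update is evaluated against the inflated limit, which permits the wider exploration the abstract describes.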

    DexCatch: Learning to Catch Arbitrary Objects with Dexterous Hands

    Achieving human-like dexterous manipulation remains a crucial area of research in robotics. Current research focuses on improving the success rate of pick-and-place tasks. Compared with pick-and-place, throw-catch behavior has the potential to increase picking speed without transporting objects to their destination. However, dynamic dexterous manipulation poses a major challenge for stable control due to the large number of dynamic contacts. In this paper, we propose a Stability-Constrained Reinforcement Learning (SCRL) algorithm for learning to catch diverse objects with dexterous hands. The SCRL algorithm outperforms baselines by a large margin, and the learned policies show strong zero-shot transfer performance on unseen objects. Remarkably, even though an object held in a sideways-facing hand is extremely unstable owing to the lack of support from the palm, our method still achieves a high success rate on this most challenging task. Video demonstrations of the learned behaviors and the code can be found on the supplementary website.

    Foundation Reinforcement Learning: towards Embodied Generalist Agents with Foundation Prior Assistance

    Recently, it has been shown that large-scale pre-training on internet-scale data is the key to building generalist models, as witnessed in NLP. To build embodied generalist agents, we and many other researchers hypothesize that such a foundation prior is also an indispensable component. However, it is unclear what the proper concrete form of these embodied foundation priors is and how they should be used in downstream tasks. In this paper, we propose an intuitive and effective set of embodied priors consisting of a foundation policy, value, and success reward. The proposed priors are based on the goal-conditioned MDP. To verify their effectiveness, we instantiate an actor-critic method assisted by the priors, called Foundation Actor-Critic (FAC). We name our framework Foundation Reinforcement Learning (FRL), since it relies entirely on embodied foundation priors to explore, learn, and reinforce. The benefits of FRL are threefold. (1) Sample efficiency. With foundation priors, FAC learns significantly faster than traditional RL. Our evaluation on Meta-World shows that FAC achieves 100% success rates on 7/8 tasks within 200k frames, outperforming the baseline method with carefully hand-designed rewards trained for 1M frames. (2) Robustness to noisy priors. Our method tolerates the unavoidable noise in embodied foundation models; we show that FAC works well even under heavy noise or quantization errors. (3) Minimal human intervention. FAC learns entirely from the foundation priors, without the need for human-specified dense rewards or teleoperated demonstrations, and can therefore be easily scaled up. We believe our FRL framework could enable future robots to autonomously explore and learn without human intervention in the physical world. In summary, FRL is a novel and powerful learning paradigm for achieving embodied generalist agents.
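    One standard way to fold a value prior into an RL reward, which may approximate how FAC's value prior assists exploration (the abstract does not give the exact formula, so this is an assumption), is potential-based shaping on top of the binary success reward. All names and values below are hypothetical.

```python
def shaped_reward(success, v_prior_s, v_prior_s_next, gamma=0.99):
    # hypothetical sketch: binary success reward from a foundation
    # success-detector, plus potential-based shaping from a foundation
    # value prior; this form of shaping is known to preserve the
    # optimal policy of the underlying MDP
    shaping = gamma * v_prior_s_next - v_prior_s
    return float(success) + shaping

# a transition toward a state the value prior rates more highly
# yields positive reward even before the task succeeds
r = shaped_reward(success=False, v_prior_s=0.2, v_prior_s_next=0.5, gamma=1.0)
```

    This densifies the sparse success signal, which is one plausible mechanism behind the sample-efficiency gains the abstract reports.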