Multidimensional Graph Neural Networks for Wireless Communications
Graph neural networks (GNNs) have shown promise in improving the efficiency of learning communication policies by leveraging their permutation properties. Nonetheless, existing works design GNNs only for specific wireless policies, lacking a systematic approach for modeling the graph and selecting the structure. Based on the observation that a permutation property mismatched with the policy's and information loss during the update of hidden representations have a large impact on learning performance and efficiency, in this paper we propose a unified framework for learning permutable wireless policies with multidimensional GNNs. To avoid the information loss, the GNNs update the hidden representations of hyper-edges. To exploit all possible permutations of a policy, we provide a method for identifying vertices in a graph. We also investigate the permutability of wireless channels, which affects sample efficiency, and show how to trade off the training, inference, and design complexities of GNNs. We take precoding in different systems as examples to demonstrate how to apply the framework. Simulation results show that the proposed GNNs can achieve performance close to that of numerical algorithms, and require far fewer training samples and trainable parameters than the commonly used convolutional neural networks to reach the same learning performance.
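The key idea of updating hidden representations on hyper-edges with shared, permutation-equivariant weights can be sketched as follows. All names, shapes, and the mean-aggregation choice are illustrative assumptions; the abstract does not give the exact update rule:

```python
import numpy as np

def update_hyperedge_states(H, W_self, W_user, W_ant):
    """One permutation-equivariant update of hyper-edge hidden states.

    H: (K, N, D) array -- hidden representation of each (user, antenna)
       hyper-edge, with K users, N antennas, D features per hyper-edge.
    The update combines each hyper-edge's own state with aggregates over
    users and over antennas, using weights shared across all hyper-edges,
    so permuting users or antennas permutes the output identically.
    """
    agg_user = H.mean(axis=0, keepdims=True)   # aggregate over users, (1, N, D)
    agg_ant = H.mean(axis=1, keepdims=True)    # aggregate over antennas, (K, 1, D)
    out = H @ W_self + agg_user @ W_user + agg_ant @ W_ant
    return np.maximum(out, 0.0)                # ReLU nonlinearity
```

Because the aggregations are symmetric and the weights are shared, reordering the users (or antennas) of the input reorders the output the same way, which is the permutation property the framework exploits.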
Size distributions reveal regime transition of lake systems under different dominant driving forces
A power law size distribution is associated with fractal, self-organized behaviors and patterns of complex systems. Such a distribution also emerges from natural lakes, with potentially important links to the dynamics of lake systems. But the driving mechanism that generates and shapes this feature in lake systems remains unclear. Moreover, the power law itself has been found inadequate for fully describing the size distribution of lakes, owing to deviations at the two ends of the size range. Based on observed and simulated lakes in 11 hydro-climatic zones of China, we established a conceptual model for lake systems that covers the whole size range of the lake size distribution and reveals the underlying driving mechanism. The full lake size distribution is composed of three components, with three phases characterized by exponential, stretched-exponential, and power law distributions. The three phases represent system states with successively increasing degrees of heterogeneity and orderliness and, more importantly, indicate the dominance of exogenic and endogenic forces, respectively. As the dominant driving force changes from endogenic to exogenic, a phase transition occurs, with the lake size distribution shifting from power law to stretched-exponential and further to exponential. Apart from compressing the power law phase, the exogenic force also increases its scaling exponent, driving the corresponding lake size power spectrum into the regime of blue noise. During this process, the autocorrelation function of the lake system diverges, with a possibility of going to infinity, indicating the loss of system resilience.
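The three phases of the size distribution can be written down as survival functions P(S >= s); a minimal sketch with illustrative (not fitted) parameters:

```python
import math

# Survival functions for the three phases of the conceptual lake-size
# model. Parameter values used below are purely illustrative.
def exponential_sf(s, lam):
    """Exponential phase: fastest decay, least heterogeneous state."""
    return math.exp(-lam * s)

def stretched_exp_sf(s, lam, beta):
    """Stretched-exponential phase, 0 < beta < 1; reduces to the
    exponential case when beta = 1."""
    return math.exp(-(lam * s) ** beta)

def power_law_sf(s, s_min, alpha):
    """Power law phase, alpha > 1, valid for s >= s_min; a straight
    line on log-log axes."""
    return (s / s_min) ** (1.0 - alpha)
```

On log-log axes the power law is a straight line while the exponential bends downward; the gap between the two regimes is what signals the transition as the dominant driving force changes.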
Study on adaptive cycle life extension method of li-ion battery based on differential thermal voltammetry parameter decoupling
Battery aging reduces a battery's cycle life, which restricts the development of energy storage technology. At present, state-of-health (SOH) assessment technology, which is used to indicate battery cycle life, has been widely studied. This paper seeks a way to adaptively adjust the battery management system in order to prolong the battery cycle life as the SOH changes. An improved Galvanostatic Intermittent Titration Technique (GITT) method is proposed to decouple the terminal voltage into overpotential (induced by total internal resistance) and stoichiometric drift (caused by battery aging and indicated by the OCV). Based on the improved GITT, the open circuit voltage-temperature change (OCV-dT/dV) characteristics of SOH are described more accurately. With such an accurate description of SOH change, an adaptive method to change the discharge and charge cut-off voltages is obtained, whose application can prolong battery cycle life. Experiments verify that, in the middle of a battery's life cycle, this adaptive method can effectively improve the cycle life of the battery. The method can be applied during preventive maintenance in battery storage systems.
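The decoupling idea behind GITT can be sketched for a single current pulse: the relaxed voltage after the rest period approximates the OCV, and the remaining gap during the pulse is the overpotential. The function and variable names below are hypothetical, and the improved-GITT procedure in the paper is more involved:

```python
def decouple_pulse(v_end_of_pulse, v_after_rest, current):
    """Split one GITT pulse into OCV and overpotential.

    v_end_of_pulse: terminal voltage [V] at the end of the pulse
    v_after_rest:   voltage [V] after relaxation, approximating the OCV
    current:        pulse current [A]

    Returns (ocv, overpotential, r_total): the overpotential is the part
    of the terminal voltage induced by total internal resistance, and
    dividing it by the pulse current estimates that resistance [Ohm].
    """
    ocv = v_after_rest
    overpotential = ocv - v_end_of_pulse   # positive on discharge
    r_total = overpotential / current
    return ocv, overpotential, r_total
```

Tracking how the OCV curve drifts over cycles then isolates the aging-driven stoichiometric drift from the resistive part of the voltage.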
Efficient Exploration Using Extra Safety Budget in Constrained Policy Optimization
Reinforcement learning (RL) has achieved promising results on most robotic control tasks. Safety of learning-based controllers is essential to ensuring their effectiveness. Current methods adopt the whole set of constraints consistently throughout training, resulting in inefficient exploration in the early stage. In this paper, we propose an algorithm named Constrained Policy Optimization with Extra Safety Budget (ESB-CPO) to strike a balance between exploration efficiency and constraint satisfaction. In the early stage, our method loosens the practical constraints on unsafe transitions (adding an extra safety budget) with the aid of a new metric we propose. As training proceeds, the constraints in our optimization problem become tighter. Meanwhile, theoretical analysis and practical experiments demonstrate that our method gradually meets the cost limit's demand in the final training stage. When evaluated on the Safety-Gym and Bullet-Safety-Gym benchmarks, our method shows advantages over baseline algorithms in terms of both safety and optimality. Notably, it gains a remarkable performance improvement under the same cost limit compared with the baselines. Comment: 7 pages, 8 figures
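The loosen-then-tighten idea can be illustrated with a cost limit that carries an extra safety budget early in training and decays to the true limit. The linear decay below is only a stand-in sketch; the paper's actual schedule is driven by its proposed metric:

```python
def effective_cost_limit(d_target, extra_budget0, step, total_steps):
    """Cost limit applied at a given training step (illustrative).

    Early on, the limit is loosened by an extra safety budget so the
    agent can explore unsafe transitions; the budget shrinks linearly
    until only the true cost limit d_target remains, so the constraint
    is fully enforced by the final training stage.
    """
    frac = min(step / total_steps, 1.0)   # training progress in [0, 1]
    return d_target + extra_budget0 * (1.0 - frac)
```

Any monotone decay works for the sketch; what matters is that the feasible region shrinks toward the true constraint set as training converges.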
DexCatch: Learning to Catch Arbitrary Objects with Dexterous Hands
Achieving human-like dexterous manipulation remains a crucial area of research in robotics. Current research focuses on improving the success rate of pick-and-place tasks. Compared with pick-and-place, throw-and-catch behavior has the potential to increase picking speed without transporting objects to their destination. However, dynamic dexterous manipulation poses a major challenge for stable control due to the large number of dynamic contacts. In this paper, we propose a Stability-Constrained Reinforcement Learning (SCRL) algorithm to learn to catch diverse objects with dexterous hands. The SCRL algorithm outperforms baselines by a large margin, and the learned policies show strong zero-shot transfer performance on unseen objects. Remarkably, even though an object in a hand facing sideways is extremely unstable due to the lack of support from the palm, our method can still achieve a high level of success in the most challenging task. Video demonstrations of learned behaviors and the code can be found on the supplementary website.
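One common way to realize a stability constraint in RL is to fold it into the objective as a penalty on constraint violations. The sketch below is purely illustrative and all names are hypothetical; the abstract does not specify SCRL's actual formulation:

```python
def penalized_reward(task_reward, instability, tolerance=0.1, weight=1.0):
    """Fold a stability constraint into the reward (illustrative).

    instability: a scalar stability measure for the current state
                 (e.g. object pose deviation); values above `tolerance`
                 count as constraint violations and are penalized.
    """
    violation = max(instability - tolerance, 0.0)
    return task_reward - weight * violation
```

Penalty methods are the simplest member of the constrained-RL family; Lagrangian and trust-region variants enforce such constraints more rigorously.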
Foundation Reinforcement Learning: towards Embodied Generalist Agents with Foundation Prior Assistance
Recently, it has been shown that large-scale pre-training on internet-scale data is the key to building generalist models, as witnessed in NLP. To build embodied generalist agents, we and many other researchers hypothesize that such a foundation prior is also an indispensable component. However, it is unclear what concrete form those embodied foundation priors should take and how they should be used in downstream tasks. In this paper, we propose an intuitive and effective set of embodied priors consisting of a foundation policy, value, and success reward. The proposed priors are based on the goal-conditioned MDP. To verify their effectiveness, we instantiate an actor-critic method assisted by the priors, called Foundation Actor-Critic (FAC). We name our framework Foundation Reinforcement Learning (FRL), since it relies entirely on embodied foundation priors to explore, learn, and reinforce. The benefits of FRL are threefold. (1) Sample efficiency. With foundation priors, FAC learns significantly faster than traditional RL. Our evaluation on Meta-World shows that FAC achieves 100% success rates on 7/8 tasks within 200k frames, outperforming the baseline method with carefully manually designed rewards at 1M frames. (2) Robustness to noisy priors. Our method tolerates the unavoidable noise in embodied foundation models; we show that FAC works well even under heavy noise or quantization errors. (3) Minimal human intervention. FAC learns entirely from the foundation priors, without the need for human-specified dense rewards or teleoperated demonstrations, and can therefore be easily scaled up. We believe our FRL framework could enable future robots to autonomously explore and learn without human intervention in the physical world. In summary, the proposed FRL is a novel and powerful learning paradigm for achieving embodied generalist agents.
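One simple way a foundation policy prior can drive early exploration is to mix it stochastically with the learned actor. This mixing rule is an assumption for illustration only, not the mechanism described in the paper:

```python
import random

def mixed_action(actor_action, prior_action, prior_weight):
    """Choose between the learned actor and the foundation policy prior.

    With probability prior_weight the agent follows the (possibly noisy)
    foundation policy prior; otherwise it follows its own actor. Annealing
    prior_weight toward zero hands control over to the learned policy as
    training progresses.
    """
    return prior_action if random.random() < prior_weight else actor_action
```

The success-reward prior would play a separate role, labeling goal-reaching transitions so that no human-specified dense reward is needed.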
- …