Stackelberg Game-Theoretic Trajectory Guidance for Multi-Robot Systems with Koopman Operator
Guided trajectory planning involves a leader robotic agent strategically
directing a follower robotic agent to collaboratively reach a designated
destination. However, this task becomes notably challenging when the leader
lacks complete knowledge of the follower's decision-making model. There is a
need for learning-based methods to effectively design the cooperative plan. To
this end, we develop a Stackelberg game-theoretic approach based on the Koopman
operator to address the challenge. We first formulate the guided trajectory
planning problem through the lens of a dynamic Stackelberg game. We then
leverage Koopman operator theory to acquire a learning-based linear system
model that approximates the follower's feedback dynamics. Based on this learned
model, the leader devises a collision-free trajectory to guide the follower,
employing receding horizon planning. We use simulations to demonstrate the
effectiveness of our approach in generating learned models that predict the
follower's multi-step behavior more accurately than alternative learning
techniques. Moreover, our approach successfully accomplishes the guidance task
and reduces the leader's planning time to nearly half of that required by the
model-based baseline method.
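The abstract does not say how the Koopman-based linear model is fitted. One common choice is extended dynamic mode decomposition (EDMD): lift the state through a dictionary of observables and fit a linear map by least squares. The sketch below is a minimal illustration under that assumption, not the authors' method. The dictionary in `lift`, the toy dynamics, and all function names are hypothetical, and the leader's control input is omitted for brevity.

```python
import numpy as np

def lift(x):
    """Hypothetical dictionary of observables: state, squared state, constant."""
    x = np.atleast_2d(x)
    return np.hstack([x, x**2, np.ones((x.shape[0], 1))])

def fit_koopman(X, Y):
    """Least-squares fit of K such that lift(Y) is approximately lift(X) @ K."""
    K, *_ = np.linalg.lstsq(lift(X), lift(Y), rcond=None)
    return K

def predict(K, x0, steps, n_state):
    """Roll the lifted linear model forward; return the state coordinates."""
    z = lift(x0)
    traj = []
    for _ in range(steps):
        z = z @ K
        traj.append(z[0, :n_state].copy())
    return np.array(traj)

# Toy follower dynamics (an assumption for illustration): x_{t+1} = 0.9 * x_t.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 1))   # sampled states
Y = 0.9 * X                                # observed next states
K = fit_koopman(X, Y)
multi_step = predict(K, np.array([1.0]), steps=3, n_state=1)
```

Because the learned model is linear in the lifted coordinates, multi-step prediction reduces to repeated matrix multiplication, which is what makes such models attractive inside a receding horizon planner.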
Stackelberg Meta-Learning Based Control for Guided Cooperative LQG Systems
Guided cooperation allows intelligent agents with heterogeneous capabilities
to work together by following a leader-follower type of interaction. However,
the associated control problem becomes challenging when the leader agent does
not have complete information about follower agents. There is a need for
learning and adaptation of cooperation plans. To this end, we develop a
meta-learning-based Stackelberg game-theoretic framework to address the
challenges in the guided cooperative control for linear systems. We first
formulate the guided cooperation between agents as a dynamic Stackelberg game
and use the feedback Stackelberg equilibrium as the agent-wise cooperation
strategy. We further leverage meta-learning to address the incomplete
information of follower agents, where the leader agent learns a meta-response
model from a prescribed set of followers offline and adapts to a new
cooperation task with a small amount of learning data. We use a case study in
robot teaming to corroborate the effectiveness of our framework. Comparison
with other learning approaches also shows that our learned cooperation strategy
provides better transferability across different cooperation tasks.
Stackelberg Meta-Learning for Strategic Guidance in Multi-Robot Trajectory Planning
Guided cooperation is a common task in many multi-agent teaming applications.
The planning of the cooperation is difficult when the leader robot has
incomplete information about the follower, and there is a need to learn,
customize, and adapt the cooperation plan online. To this end, we develop a
learning-based Stackelberg game-theoretic framework to address this challenge
to achieve optimal trajectory planning for heterogeneous robots. We first
formulate the guided trajectory planning problem as a dynamic Stackelberg game
and design the cooperation plans using open-loop Stackelberg equilibria. We
leverage meta-learning to deal with the unknown follower in the game and
propose a Stackelberg meta-learning framework to create online adaptive
trajectory guidance plans, where the leader robot learns a meta-best-response
model from a prescribed set of followers offline and then quickly adapts to a
specific online trajectory guidance task using limited learning data. We use
simulations in three different scenarios to demonstrate the effectiveness of
our framework. Comparisons with other learning approaches and no-guidance cases
show that our framework provides a more time- and data-efficient planning
method in trajectory guidance tasks.
Quasi-optimal Learning with Continuous Treatments
Many real-world applications of reinforcement learning (RL) require making
decisions in continuous action environments. In particular, determining the
optimal dose level plays a vital role in developing medical treatment regimes.
One challenge in adapting existing RL algorithms to medical applications,
however, is that the popular infinite support stochastic policies, e.g.,
Gaussian policy, may assign riskily high dosages and harm patients seriously.
Hence, it is important to induce a policy class whose support only contains
near-optimal actions, and shrink the action-searching area for effectiveness
and reliability. To achieve this, we develop a novel quasi-optimal
learning algorithm, which can be easily optimized in off-policy settings with
guaranteed convergence under general function approximations. Theoretically, we
analyze the consistency, sample complexity, adaptability, and convergence of
the proposed algorithm. We evaluate our algorithm with comprehensive simulated
experiments and a real-world dose suggestion application on the Ohio Type 1
diabetes dataset. Comment: The first two authors contributed equally to this work.
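The abstract's concern about infinite-support policies can be made concrete with a small calculation: under a Gaussian policy, the probability of sampling an action above any finite threshold is strictly positive. The sketch below only illustrates that point. It is not the quasi-optimal learning algorithm, and the function name and the numeric values are hypothetical.

```python
import math

def gaussian_tail_prob(mean, std, threshold):
    """P(action > threshold) for action ~ N(mean, std**2)."""
    return 0.5 * math.erfc((threshold - mean) / (std * math.sqrt(2.0)))

# Hypothetical policy: mean dose 0.0, unit standard deviation.
p_unsafe = gaussian_tail_prob(mean=0.0, std=1.0, threshold=5.0)
# p_unsafe is tiny but never exactly zero for a finite threshold.
```

A policy class whose support excludes such actions removes this tail mass entirely rather than merely making it small.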
Policy Learning for Individualized Treatment Regimes on Infinite Time Horizon
With the recent advancements of technology in facilitating real-time
monitoring and data collection, "just-in-time" interventions can be delivered
via mobile devices to achieve both real-time and long-term management and
control. Reinforcement learning formalizes such mobile interventions as a
sequence of decision rules and assigns treatment arms based on the user's
status at each decision point. In practice, real applications concern a large
number of decision points beyond the time horizon of the currently collected
data. This usually refers to reinforcement learning in the infinite horizon
setting, which becomes much more challenging. This article provides a selective
overview of some statistical methodologies on this topic. We discuss their
modeling framework, generalizability, and interpretability and provide some use
case examples. Some future research directions are discussed at the end.
Adaptive Data Augmentation for Contrastive Learning
In computer vision, contrastive learning is the most advanced unsupervised
learning framework. Yet most previous methods simply apply a fixed composition of
data augmentations to improve data efficiency, which ignores the changes in
their optimal settings over training. Thus, the pre-determined parameters of
augmentation operations cannot always fit well with an evolving network during
the whole training period, which degrades the quality of the learned
representations. In this work, we propose AdDA, which implements a closed-loop
feedback structure to a generic contrastive learning network. AdDA works by
allowing the network to adaptively adjust the augmentation compositions
according to the real-time feedback. This online adjustment helps maintain the
dynamic optimal composition and enables the network to acquire more
generalizable representations with minimal computational overhead. AdDA
achieves competitive results under the common linear protocol on ImageNet-100
classification (+1.11% on MoCo v2). Comment: Accepted by ICASSP 202
Combined Voronoi-FDEM approach for modelling post-fracture response of laminated tempered glass
In this work, a combined Voronoi and finite-discrete element method (FDEM) approach for reconstructing the post-fracture model of laminated glass (LG) is proposed. The fracture morphology was determined by introducing Voronoi tessellation with statistical distribution parameters such as the fragment face numbers, volume, and sphericity. The residual interaction between glass fragments was described with a cohesive zone model. One fractured LG block under uniaxial tension, taken from a triple-layered LG beam with ionoplast interlayers, was modelled and validated against experimentally recorded data. Through iterative analysis, the key cohesive parameters were determined for the most applicable model. The influence of the fragment interaction properties was then investigated. The results show that cohesion and friction can be combined to describe the residual interaction behaviour between fragments well. The frictional property has a remarkable effect on the post-fracture resistance, whereas its effect on the stiffness is not evident. Compared to other cohesive parameters, the cohesive stiffness factors have a predominant effect on both the post-fracture stiffness and resistance.
Multi-Modal Wireless Flexible Gel-Free Sensors with Edge Deep Learning for Detecting and Alerting Freezing of Gait in Parkinson's Patients
Freezing of gait (FoG) is a debilitating symptom of Parkinson's disease (PD).
This work develops flexible wearable sensors that can detect FoG and alert
patients and companions to help prevent falls. FoG is detected on the sensors
using a deep learning (DL) model with multi-modal sensory inputs collected from
distributed wireless sensors. Two types of wireless sensors are developed,
including: (1) a C-shape central node placed around the patient's ears, which
collects electroencephalogram (EEG), detects FoG using an on-device DL model,
and generates auditory alerts when FoG is detected; (2) a stretchable
patch-type sensor attached to the patient's legs, which collects
electromyography (EMG) and movement information from accelerometers. The
patch-type sensors wirelessly send collected data to the central node through
low-power ultra-wideband (UWB) transceivers. All sensors are fabricated on
flexible printed circuit boards. Adhesive gel-free acetylene carbon black and
polydimethylsiloxane electrodes are fabricated on the flexible substrate to
allow conformal wear over the long term. Custom integrated circuits (IC) are
developed in 180 nm CMOS technology and used in both types of sensors for
signal acquisition, digitization, and wireless communication. A novel
lightweight DL model is trained using multi-modal sensory data. The inference
of the DL model is performed on a low-power microcontroller in the central
node. The DL model achieves a high detection sensitivity of 0.81 and a
specificity of 0.88. The developed wearable sensors are ready for clinical
experiments and hold great promise in improving the quality of life of patients
with PD. The proposed design methodologies can be used in wearable medical
devices for the monitoring and treatment of a wide range of neurodegenerative
diseases.
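For readers unfamiliar with the reported metrics, sensitivity is the fraction of true FoG episodes that are detected and specificity is the fraction of non-FoG windows correctly left unflagged. The sketch below shows how both are computed from binary labels. The function name and the example labels are hypothetical and are not taken from the paper's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = FoG, 0 = no FoG."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # detected episodes
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed episodes
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correct rejections
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec

# Hypothetical labels for eight windows.
labels      = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(labels, predictions)
```

In an alerting device the two metrics trade off against each other: higher sensitivity means fewer missed freezing episodes, while higher specificity means fewer false alerts.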