15 research outputs found
Provably Efficient UCB-type Algorithms For Learning Predictive State Representations
The general sequential decision-making problem, which includes Markov
decision processes (MDPs) and partially observable MDPs (POMDPs) as special
cases, aims at maximizing a cumulative reward by making a sequence of decisions
based on a history of observations and actions over time. Recent studies have
shown that the sequential decision-making problem is statistically learnable if
it admits a low-rank structure modeled by predictive state representations
(PSRs). Despite these advancements, existing approaches typically involve
oracles or steps that are not computationally efficient. On the other hand, the
upper confidence bound (UCB) based approaches, which have served successfully
as computationally efficient methods in bandits and MDPs, have not been
investigated for more general PSRs, due to the difficulty of optimistic bonus
design in these more challenging settings. This paper proposes the first known
UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the
total variation distance between the estimated and true models. We further
characterize the sample complexity bounds for our designed UCB-type algorithms
for both online and offline PSRs. In contrast to existing approaches for PSRs,
our UCB-type algorithms enjoy computational efficiency, last-iterate guaranteed
near-optimal policy, and guaranteed model accuracy.
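The abstract above builds on UCB-style optimism as used in bandits. As background only, here is a minimal sketch of the classical UCB1 rule for a multi-armed bandit; it is not the paper's PSR algorithm or its bonus term. The function name `ucb1`, the constant `c`, and the Bernoulli arm probabilities are all illustrative assumptions.

```python
import math
import random


def ucb1(reward_fns, horizon, c=2.0):
    """Classical UCB1 on a bandit (illustration only, not the PSR method).

    reward_fns: list of zero-argument callables, one per arm (hypothetical).
    Returns (pull counts, empirical mean rewards) per arm.
    """
    n_arms = len(reward_fns)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            # optimism: empirical mean plus an exploration bonus
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(c * math.log(t) / counts[a]),
            )
        reward = reward_fns[arm]()
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts, means


# Hypothetical Bernoulli arms with success probabilities 0.2, 0.5, 0.8.
rng = random.Random(0)
arms = [lambda p=p: 1.0 if rng.random() < p else 0.0 for p in (0.2, 0.5, 0.8)]
counts, means = ucb1(arms, horizon=2000)
```

The bonus shrinks as an arm is pulled more often, so play concentrates on the arm with the highest estimated reward. The paper's contribution, per the abstract, is a different bonus that upper bounds the total variation distance between estimated and true models.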
Temporal-Distributed Backdoor Attack Against Video Based Action Recognition
Deep neural networks (DNNs) have achieved tremendous success in various
applications including video action recognition, yet remain vulnerable to
backdoor attacks (Trojans). The backdoor-compromised model will mis-classify to
the target class chosen by the attacker when a test instance (from a non-target
class) is embedded with a specific trigger, while maintaining high accuracy on
attack-free instances. Although there are extensive studies on backdoor attacks
against image data, the susceptibility of video-based systems under backdoor
attacks remains largely unexplored. Current studies are direct extensions of
approaches proposed for image data, e.g., the triggers are
independently embedded within the frames, which tend to be detectable
by existing defenses. In this paper, we introduce a simple yet
effective backdoor attack against video data. Our proposed attack,
adding perturbations in a transformed domain, plants an imperceptible,
temporally distributed trigger across the video frames, and is shown to be
resilient to existing defensive strategies. The effectiveness of the proposed
attack is demonstrated by extensive experiments with various well-known models
on two video recognition benchmarks, UCF101 and HMDB51, and a sign language
recognition benchmark, Greek Sign Language (GSL) dataset. We delve into the
impact of several influential factors on our proposed attack and identify an
intriguing effect termed "collateral damage" through extensive studies.
Federated Linear Contextual Bandits with User-level Differential Privacy
This paper studies federated linear contextual bandits under the notion of
user-level differential privacy (DP). We first introduce a unified federated
bandits framework that can accommodate various definitions of DP in the
sequential decision-making setting. We then formally introduce user-level
central DP (CDP) and local DP (LDP) in the federated bandits framework, and
investigate the fundamental trade-offs between the learning regrets and the
corresponding DP guarantees in a federated linear contextual bandits model. For
CDP, we propose a federated algorithm termed ROBIN and show that it is
near-optimal in terms of the number of clients and the privacy budget
by deriving nearly-matching upper and lower regret bounds when
user-level DP is satisfied. For LDP, we obtain several lower bounds, indicating
that learning under user-level LDP must suffer a regret
blow-up factor whose lower bound differs under different conditions.
Comment: Accepted by ICML 202
Differentially Private Wireless Federated Learning Using Orthogonal Sequences
We propose a privacy-preserving uplink over-the-air computation (AirComp)
method, termed FLORAS, for single-input single-output (SISO) wireless federated
learning (FL) systems. From the perspective of communication designs, FLORAS
eliminates the requirement of channel state information at the transmitters
(CSIT) by leveraging the properties of orthogonal sequences. From the privacy
perspective, we prove that FLORAS offers both item-level and client-level
differential privacy (DP) guarantees. Moreover, by properly adjusting the
system parameters, FLORAS can flexibly achieve different DP levels at no
additional cost. A new FL convergence bound is derived which, combined with the
privacy guarantees, allows for a smooth tradeoff between the achieved
convergence rate and differential privacy levels. Experimental results
demonstrate the advantages of FLORAS compared with the baseline AirComp method,
and validate that the analytical results can guide the design of
privacy-preserving FL with different tradeoff requirements on the model
convergence and privacy levels.
Comment: 33 pages, 5 figures
TTMFN: Two-stream Transformer-based Multimodal Fusion Network for Survival Prediction
Survival prediction plays a crucial role in assisting clinicians with the
development of cancer treatment protocols. Recent evidence shows that
multimodal data can help in the diagnosis of cancer disease and improve
survival prediction. Currently, deep learning-based approaches have experienced
increasing success in survival prediction by integrating pathological images
and gene expression data. However, most existing approaches overlook the
intra-modality latent information and the complex inter-modality correlations.
Furthermore, existing methods do not fully exploit the immense
representational capabilities of neural networks for feature aggregation and
disregard the importance of relationships between features. To address these
issues and improve prediction performance, we propose a novel
framework named Two-stream Transformer-based Multimodal Fusion Network for
survival prediction (TTMFN), which integrates pathological images and gene
expression data. In TTMFN, we present a two-stream multimodal co-attention
transformer module to take full advantage of the complex relationships between
different modalities and the potential connections within the modalities.
Additionally, we develop a multi-head attention pooling approach to effectively
aggregate the feature representations of the two modalities. The experiment
results on four datasets from The Cancer Genome Atlas demonstrate that TTMFN
can achieve the best performance or competitive results compared to the
state-of-the-art methods in predicting the overall survival of patients.
Learning-Based Visual Servoing for High-Precision Peg-in-Hole Assembly
Visual servoing is widely used in peg-in-hole assembly because of pose uncertainty. Humans can easily align the peg with the hole using key visual points and edges. Imitating this behavior, we propose P2HNet, a learning-based neural network that directly extracts the desired landmarks for visual servoing. To avoid collecting and annotating a large number of real images for training, we built a virtual assembly scene to generate a large amount of synthetic data for transfer learning. A multi-modal peg-in-hole strategy is then introduced that combines image-based search with force-based insertion. P2HNet-based visual servoing and spiral search align the peg with the hole from coarse to fine, and force control then completes the insertion. The strategy exploits the flexibility of neural networks and the stability of traditional methods. The effectiveness of the method was experimentally verified in D-sub connector assembly with sub-millimeter clearance. The results show that the proposed method achieves a higher success rate and efficiency than the baseline method in high-precision peg-in-hole assembly.
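The spiral search mentioned in the abstract above can be pictured as a sequence of small lateral offsets around the estimated hole position. The following is a minimal sketch assuming an Archimedean spiral with roughly equal spacing between waypoints; the function name `spiral_waypoints` and the parameters `pitch`, `step`, and `max_radius` are hypothetical, and the paper's actual search pattern may differ.

```python
import math


def spiral_waypoints(pitch, step, max_radius):
    """Return (x, y) offsets along an Archimedean spiral r = b * theta.

    pitch:      radial distance between successive turns (assumed parameter)
    step:       approximate arc length between consecutive waypoints
    max_radius: stop once the spiral would leave this search radius
    """
    b = pitch / (2.0 * math.pi)  # radial growth per radian
    points = [(0.0, 0.0)]        # start at the estimated hole centre
    theta = 0.0
    while True:
        r = b * theta
        # arc length element is sqrt(r^2 + b^2) * dtheta, so pick dtheta
        # such that consecutive waypoints are about `step` apart
        theta += step / math.hypot(r, b)
        r = b * theta
        if r > max_radius:
            break
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


# Example with made-up values in millimetres.
pts = spiral_waypoints(pitch=0.2, step=0.1, max_radius=1.0)
```

In a setup like the one described, each waypoint would be sent as a small position offset while contact force is monitored; that control loop is outside the scope of this sketch.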
A Design of FPGA-Based Neural Network PID Controller for Motion Control System
In industrial production, adaptively tuning proportional–integral–derivative (PID) parameters online with a neural network adapts to the differing characteristics of controlled objects better than a conventional PID controller. However, the commonly used microcontroller unit (MCU) cannot meet the real-time and high-reliability requirements of such applications. Therefore, this paper proposes a closed-loop motion control system based on a BP neural network (BPNN) PID controller implemented on a Xilinx field programmable gate array (FPGA). Following a modular design approach, the controller is divided into several sub-modules. The forward propagation module performs the forward propagation from the input layer to the output layer. The PID module maps the PID arithmetic to register transfer level (RTL) and outputs the control quantity. The main state machine module generates enable signals that control the sequential execution of each sub-module. The error backpropagation and weight update module updates the weights of each layer of the network. The peripheral modules of the control system fall into two main parts. The speed measurement module acquires the output pulse signal of the encoder and measures the motor speed. The pulse width modulation (PWM) signal generation module generates PWM waves with different duty cycles to control the rotation speed of the motor. The system is simulated and verified by co-simulation in Modelsim and Simulink, and is also tested on the development platform. The results show that the proposed system achieves self-tuning of the PID control parameters and offers reliable performance, high real-time performance, and strong anti-interference.
Compared with the MCU, convergence is more than three orders of magnitude faster, which demonstrates its superiority.
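To make the PID module in the abstract above concrete, here is a minimal software sketch of a discrete PID loop driving a first-order plant. It is not the paper's RTL design: the gains are fixed rather than tuned by a BPNN, and the class `PID`, the function `simulate`, the gain values, and the plant model are all assumptions chosen for illustration.

```python
class PID:
    """Discrete positional PID controller with fixed (hypothetical) gains."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt                   # integral term state
        derivative = (error - self.prev_error) / self.dt   # backward difference
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(pid, setpoint=100.0, steps=2000, dt=0.001, tau=0.05):
    """Drive an assumed first-order motor model: tau * d(speed)/dt = u - speed."""
    speed = 0.0
    for _ in range(steps):
        u = pid.update(setpoint, speed)
        speed += dt * (u - speed) / tau  # forward Euler integration
    return speed


# Two simulated seconds with made-up gains; the speed settles at the setpoint.
final_speed = simulate(PID(kp=2.0, ki=40.0, kd=0.0, dt=0.001))
```

In a neural-network PID scheme such as the one described, `kp`, `ki`, and `kd` would be recomputed at each step by the network's forward pass and adapted by backpropagation; that part is omitted here.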
Multiband Spectrum Sensing and Power Allocation for a Cognitive Radio-Enabled Smart Grid
As part of an Internet of Things (IoT) framework, the Smart Grid (SG) relies on advanced communication technologies for efficient energy management and utilization. Cognitive Radio (CR), which allows Secondary Users (SUs) to opportunistically access and use the spectrum bands owned by Primary Users (PUs), is regarded as the key technology of next-generation wireless communication. With the assistance of CR technology, the quality of communication in the SG could be improved. In this paper, based on a hybrid CR-enabled SG communication network, a new system architecture for multiband-CR-enabled SG communication is proposed. Mathematical optimization models are then formulated to jointly find the optimal sensing time and the optimal power allocation strategy. Using convex optimization techniques, several methods are proposed to maximize the data rate of the multiband-CR-enabled SG while satisfying the minimum detection probabilities for the active PUs. Finally, simulations are presented to show the validity of the proposed methods.
Reinforcement Learning-Based Reactive Obstacle Avoidance Method for Redundant Manipulators
Redundant manipulators are widely used in fields such as human-robot collaboration due to their good flexibility. To ensure efficiency and safety, the manipulator is required to avoid obstacles while tracking a desired trajectory in many tasks. Conventional methods for obstacle avoidance of redundant manipulators may encounter joint singularity or exceed joint position limits while tracking the desired trajectory. By integrating deep reinforcement learning (DRL) into the gradient projection method, a reactive obstacle avoidance method for redundant manipulators is proposed. We establish a general DRL framework for obstacle avoidance, and then a reinforcement learning agent is applied to learn motion in the null space of the redundant manipulator Jacobian matrix. The reward function of reinforcement learning is redesigned to handle multiple constraints automatically. Specifically, the manipulability index is introduced into the reward function, and thus the manipulator can maintain high manipulability to avoid joint singularity while executing tasks. To show the effectiveness of the proposed method, a simulation of a planar manipulator with 4 degrees of freedom is given. Compared with the gradient projection method, the proposed method achieves a higher obstacle avoidance success rate, average manipulability, and time efficiency.
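For context on the gradient projection method and manipulability index named in the abstract above, their standard textbook forms are sketched below. The symbols are notation assumed here rather than taken from the paper, and the paper's exact formulation may differ.

```latex
\dot{q} = J^{+}\dot{x} + \left(I - J^{+}J\right) z,
\qquad
w(q) = \sqrt{\det\!\left(J(q)\,J(q)^{\top}\right)}
```

Here $q$ denotes the joint positions, $\dot{x}$ the desired end-effector velocity, $J$ the manipulator Jacobian, and $J^{+}$ its pseudoinverse. The projector $(I - J^{+}J)$ maps any joint velocity $z$ into the null space of $J$, so the second term does not disturb the end-effector motion. In the classical method $z$ is the gradient of a hand-designed cost; per the abstract, the proposed method instead lets a reinforcement learning agent learn this null-space motion. The quantity $w(q)$ is Yoshikawa's manipulability measure, which approaches zero near a singularity; that the paper uses this particular definition is an assumption.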