15 research outputs found

    Provably Efficient UCB-type Algorithms For Learning Predictive State Representations

    The general sequential decision-making problem, which includes Markov decision processes (MDPs) and partially observable MDPs (POMDPs) as special cases, aims at maximizing a cumulative reward by making a sequence of decisions based on a history of observations and actions over time. Recent studies have shown that the sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs). Despite these advancements, existing approaches typically involve oracles or steps that are not computationally efficient. On the other hand, the upper confidence bound (UCB) based approaches, which have served successfully as computationally efficient methods in bandits and MDPs, have not been investigated for more general PSRs, due to the difficulty of optimistic bonus design in these more challenging settings. This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models. We further characterize the sample complexity bounds for our designed UCB-type algorithms for both online and offline PSRs. In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational efficiency, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy

    Temporal-Distributed Backdoor Attack Against Video Based Action Recognition

    Deep neural networks (DNNs) have achieved tremendous success in various applications including video action recognition, yet remain vulnerable to backdoor attacks (Trojans). The backdoor-compromised model will mis-classify to the target class chosen by the attacker when a test instance (from a non-target class) is embedded with a specific trigger, while maintaining high accuracy on attack-free instances. Although there are extensive studies on backdoor attacks against image data, the susceptibility of video-based systems under backdoor attacks remains largely unexplored. Current studies are direct extensions of approaches proposed for image data, e.g., the triggers are independently embedded within the frames, which tend to be detectable by existing defenses. In this paper, we introduce a simple yet effective backdoor attack against video data. Our proposed attack, adding perturbations in a transformed domain, plants an imperceptible, temporally distributed trigger across the video frames, and is shown to be resilient to existing defensive strategies. The effectiveness of the proposed attack is demonstrated by extensive experiments with various well-known models on two video recognition benchmarks, UCF101 and HMDB51, and a sign language recognition benchmark, the Greek Sign Language (GSL) dataset. We delve into the impact of several influential factors on our proposed attack and identify an intriguing effect termed "collateral damage" through extensive studies.

    Federated Linear Contextual Bandits with User-level Differential Privacy

    This paper studies federated linear contextual bandits under the notion of user-level differential privacy (DP). We first introduce a unified federated bandits framework that can accommodate various definitions of DP in the sequential decision-making setting. We then formally introduce user-level central DP (CDP) and local DP (LDP) in the federated bandits framework, and investigate the fundamental trade-offs between the learning regrets and the corresponding DP guarantees in a federated linear contextual bandits model. For CDP, we propose a federated algorithm termed ROBIN and show that it is near-optimal in terms of the number of clients $M$ and the privacy budget $\varepsilon$ by deriving nearly-matching upper and lower regret bounds when user-level DP is satisfied. For LDP, we obtain several lower bounds, indicating that learning under user-level $(\varepsilon,\delta)$-LDP must suffer a regret blow-up factor of at least $\min\{1/\varepsilon, M\}$ or $\min\{1/\sqrt{\varepsilon}, \sqrt{M}\}$ under different conditions. Comment: Accepted by ICML 202

    Differentially Private Wireless Federated Learning Using Orthogonal Sequences

    We propose a privacy-preserving uplink over-the-air computation (AirComp) method, termed FLORAS, for single-input single-output (SISO) wireless federated learning (FL) systems. From the perspective of communication designs, FLORAS eliminates the requirement of channel state information at the transmitters (CSIT) by leveraging the properties of orthogonal sequences. From the privacy perspective, we prove that FLORAS offers both item-level and client-level differential privacy (DP) guarantees. Moreover, by properly adjusting the system parameters, FLORAS can flexibly achieve different DP levels at no additional cost. A new FL convergence bound is derived which, combined with the privacy guarantees, allows for a smooth tradeoff between the achieved convergence rate and differential privacy levels. Experimental results demonstrate the advantages of FLORAS compared with the baseline AirComp method, and validate that the analytical results can guide the design of privacy-preserving FL with different tradeoff requirements on the model convergence and privacy levels. Comment: 33 pages, 5 figures

    TTMFN: Two-stream Transformer-based Multimodal Fusion Network for Survival Prediction

    Survival prediction plays a crucial role in assisting clinicians with the development of cancer treatment protocols. Recent evidence shows that multimodal data can help in the diagnosis of cancer disease and improve survival prediction. Currently, deep learning-based approaches have experienced increasing success in survival prediction by integrating pathological images and gene expression data. However, most existing approaches overlook the intra-modality latent information and the complex inter-modality correlations. Furthermore, existing modalities do not fully exploit the immense representational capabilities of neural networks for feature aggregation and disregard the importance of relationships between features. Therefore, it is highly recommended to address these issues in order to enhance the prediction performance by proposing a novel deep learning-based method. We propose a novel framework named Two-stream Transformer-based Multimodal Fusion Network for survival prediction (TTMFN), which integrates pathological images and gene expression data. In TTMFN, we present a two-stream multimodal co-attention transformer module to take full advantage of the complex relationships between different modalities and the potential connections within the modalities. Additionally, we develop a multi-head attention pooling approach to effectively aggregate the feature representations of the two modalities. The experiment results on four datasets from The Cancer Genome Atlas demonstrate that TTMFN can achieve the best performance or competitive results compared to the state-of-the-art methods in predicting the overall survival of patients

    Learning-Based Visual Servoing for High-Precision Peg-in-Hole Assembly

    Visual servoing is widely used in peg-in-hole assembly due to the uncertainty of pose. Humans can easily align the peg with the hole according to key visual points and edges. By imitating human behavior, we propose P2HNet, a learning-based neural network that can directly extract desired landmarks for visual servoing. To avoid collecting and annotating a large number of real images for training, we built a virtual assembly scene to generate a large amount of synthetic data for transfer learning. A multi-modal peg-in-hole strategy is then introduced to combine image-based search and force-based insertion. P2HNet-based visual servoing and spiral search are used to align the peg with the hole from coarse to fine. Force control is then used to complete the insertion. The strategy exploits the flexibility of neural networks and the stability of traditional methods. The effectiveness of the method was experimentally verified in D-sub connector assembly with sub-millimeter clearance. The results show that the proposed method can achieve a higher success rate and efficiency than the baseline method in high-precision peg-in-hole assembly.

    A Design of FPGA-Based Neural Network PID Controller for Motion Control System

    In the actual industrial production process, the method of adaptively tuning proportional–integral–derivative (PID) parameters online by neural network can adapt to different characteristics of different controlled objects better than the controller with PID. However, the commonly used microcontroller unit (MCU) cannot meet the application scenarios of real time and high reliability. Therefore, in this paper, a closed-loop motion control system based on BP neural network (BPNN) PID controller by using a Xilinx field programmable gate array (FPGA) solution is proposed. In the design of the controller, it is divided into several sub-modules according to the modular design idea. The forward propagation module is used to complete the forward propagation operation from the input layer to the output layer. The PID module implements the mapping of PID arithmetic to register transfer level (RTL) and is responsible for completing the output of control amount. The main state machine module generates enable signals that control the sequential execution of each sub-module. The error backpropagation and weight update module completes the update of the weights of each layer of the network. The peripheral modules of the control system are divided into two main parts. The speed measurement module completes the acquisition of the output pulse signal of the encoder and the measurement of the motor speed. The pulse width modulation (PWM) signal generation module generates PWM waves with different duty cycles to control the rotation speed of the motor. A co-simulation of Modelsim and Simulink is used to simulate and verify the system, and a test analysis is also performed on the development platform. The results show that the proposed system can realize the self-tuning of PID control parameters, and also has the characteristics of reliable performance, high real-time performance, and strong anti-interference. 
    Compared with an MCU implementation, the convergence speed is improved by more than three orders of magnitude, which demonstrates its superiority.

    Multiband Spectrum Sensing and Power Allocation for a Cognitive Radio-Enabled Smart Grid

    As part of an Internet of Things (IoT) framework, the Smart Grid (SG) relies on advanced communication technologies for efficient energy management and utilization. Cognitive Radio (CR), which allows Secondary Users (SUs) to opportunistically access and use the spectrum bands owned by Primary Users (PUs), is regarded as a key technology for next-generation wireless communication. With the assistance of CR technology, the quality of communication in the SG could be improved. In this paper, based on a hybrid CR-enabled SG communication network, a new system architecture for multiband-CR-enabled SG communication is proposed. Mathematical optimization models are then formulated to jointly find the optimal sensing time and the optimal power allocation strategy. Using convex optimization techniques, several methods are proposed to maximize the data rate of the multiband-CR-enabled SG while satisfying the minimum detection probabilities for the active PUs. Finally, simulations are presented to show the validity of the proposed methods.

    Reinforcement Learning-Based Reactive Obstacle Avoidance Method for Redundant Manipulators

    Redundant manipulators are widely used in fields such as human-robot collaboration due to their good flexibility. To ensure efficiency and safety, the manipulator is required to avoid obstacles while tracking a desired trajectory in many tasks. Conventional methods for obstacle avoidance of redundant manipulators may encounter joint singularity or exceed joint position limits while tracking the desired trajectory. By integrating deep reinforcement learning (DRL) into the gradient projection method, a reactive obstacle avoidance method for redundant manipulators is proposed. We establish a general DRL framework for obstacle avoidance, and then a reinforcement learning agent is applied to learn motion in the null space of the redundant manipulator Jacobian matrix. The reward function of reinforcement learning is redesigned to handle multiple constraints automatically. Specifically, the manipulability index is introduced into the reward function, and thus the manipulator can maintain high manipulability to avoid joint singularity while executing tasks. To show the effectiveness of the proposed method, a simulation of a planar manipulator with 4 degrees of freedom is given. Compared with the gradient projection method, the proposed method achieves a higher obstacle avoidance success rate, higher average manipulability, and better time efficiency.