1,151 research outputs found

    Safety-guided deep reinforcement learning via online Gaussian process estimation

    An important facet of reinforcement learning (RL) is how the agent explores its environment. Traditional exploration strategies typically focus on efficiency and ignore safety. For practical applications, however, ensuring the agent's safety during exploration is crucial, since performing an unsafe action or reaching an unsafe state could cause irreversible damage. The main challenge of safe exploration is that characterizing unsafe states and actions is difficult for large continuous state or action spaces and unknown environments. In this paper, we propose a novel approach that incorporates safety estimates to guide exploration and policy search in deep reinforcement learning. Using a cost function to capture trajectory-based safety, our key idea is to treat the state-action value function of this safety cost as a candidate Lyapunov function and to extend control-theoretic results by approximating its derivative with online Gaussian Process (GP) estimation. We show how to use these statistical models to guide the agent in unknown environments toward high-performance control policies with provable stability certificates. (Accepted manuscript)
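
    As a toy sketch of the idea (not the paper's algorithm), one can fit a GP to observed safety costs and gate candidate actions on a conservative upper confidence bound; the quadratic cost signal, threshold, and kappa below are illustrative assumptions:

    ```python
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    # Hypothetical illustration: fit a GP to observed safety costs and use
    # the posterior to screen candidate (state, action) pairs during exploration.
    rng = np.random.default_rng(0)
    states_actions = rng.uniform(-1, 1, size=(50, 2))   # visited (s, a) pairs
    safety_cost = np.sum(states_actions**2, axis=1)      # stand-in cost signal

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-3)
    gp.fit(states_actions, safety_cost)

    def is_probably_safe(sa, threshold=0.8, kappa=2.0):
        """Conservative check: upper confidence bound on cost stays below threshold."""
        mean, std = gp.predict(sa.reshape(1, -1), return_std=True)
        return bool(mean[0] + kappa * std[0] < threshold)

    print(is_probably_safe(np.array([0.1, 0.1])))  # predicted cost near the origin is low
    ```

    Using the posterior mean plus a scaled standard deviation makes the check pessimistic where the GP is uncertain, which is the usual way such statistical models err on the side of safety.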

    Resilience of multi-robot systems to physical masquerade attacks

    The advent of autonomous mobile multi-robot systems has driven innovation in both the industrial and defense sectors. The integration of such systems in safety- and security-critical applications has raised concern over their resilience to attack. In this work, we investigate the security problem of a stealthy adversary masquerading as a properly functioning agent. We show that conventional multi-agent pathfinding solutions are vulnerable to these physical masquerade attacks. Furthermore, we provide a constraint-based formulation of multi-agent pathfinding that yields plans provably resilient to physical masquerade attacks. This formalization leverages inter-agent observations to facilitate introspective monitoring and thereby guarantee resilience. (Accepted manuscript)

    Masquerade attack detection through observation planning for multi-robot systems

    The increasing adoption of autonomous mobile robots comes with rising concern over the security of these systems. In this work, we examine the dangers that an adversary could pose in a multi-agent robot system. We show that conventional multi-agent plans are vulnerable to strong attackers masquerading as a properly functioning agent. We propose a novel technique to incorporate attack detection into the multi-agent path-finding problem through the simultaneous synthesis of observation plans. We show that by specially crafting the multi-agent plan, the induced inter-agent observations can provide introspective monitoring guarantees: any adversarial agent that plans to break the system-wide security specification must necessarily violate the induced observation plan. (Accepted manuscript)
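
    Whether a multi-agent plan realizes a required observation schedule can be checked mechanically. The following is a minimal hypothetical sketch; the grid positions, Manhattan sensing range, and schedule format are assumptions for illustration, not the paper's formulation:

    ```python
    # Hypothetical check: each scheduled observation (observer, observed, t)
    # requires the observer to be within sensing range of the observed agent
    # at timestep t of the joint plan.

    SENSE_RANGE = 1  # Manhattan distance, assumed

    plans = {  # agent -> list of grid positions per timestep
        "a1": [(0, 0), (1, 0), (2, 0)],
        "a2": [(1, 0), (1, 1), (2, 1)],
    }
    observation_plan = [("a1", "a2", 0), ("a2", "a1", 2)]  # (observer, observed, t)

    def manhattan(p, q):
        return abs(p[0] - q[0]) + abs(p[1] - q[1])

    def satisfies(plans, obs_plan, sense_range=SENSE_RANGE):
        return all(
            manhattan(plans[obs][t], plans[tgt][t]) <= sense_range
            for obs, tgt, t in obs_plan
        )

    print(satisfies(plans, observation_plan))  # both scheduled observations hold
    ```

    In the paper's setting such constraints are synthesized jointly with the paths; here the check only validates a given plan against a given schedule.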

    Revisiting Manner/Result Complementarity: with evidence from Japanese and Chinese verb compounds

    This paper presents data on verb compounds (V-Vs) from Japanese and Chinese in an effort to address two issues: (a) whether the lexicalisation constraint (i.e. manner/result complementarity) applies to languages that contain compound verbs, and (b) how complex building compound verbs can be. The findings reveal that manner and result are both well encoded in most Japanese verb compounds, which suggests that the complementarity constraint is not applicable to Japanese. In Chinese, the application of manner/result complementarity varies by the type of V-V: in pair-relation V-Vs, only the manner meaning is conveyed; in predicate-complement V-Vs, both manner and result are lexicalised, with V1 encoding the manner and V2 denoting the result; and modifier-predicate V-Vs appear to convey only the manner. The conclusion emerging from these differing applications is that the manner/result complementarity constraint does not apply to languages that extensively employ verb compounds.

    Morphologic, Syntactic, and Phonologic Distance Between Japanese and Altaic, Dravidian, Austronesian, and Korean Languages

    The present study measures the resemblance of Japanese to the Altaic languages (Turkic, Tungusic, Mongolic, Nivkh), the Dravidian language Tamil, the Austronesian languages (Western Malayo-Polynesian, Malayo-Sumbawan, Central Luzon, Central Malayo-Polynesian), and Korean, in an effort to pin down the genealogy of Japanese. Morphologic, syntactic, and phonologic distances are calculated using data from corpora; the chi-square homogeneity test and Euclidean distances are used for statistical analysis. Morphologically, in light of preferences in causative/inchoative verb alternation patterning and the morphemes that convey the alternation, Japanese and Korean are close for the most part. Syntactically, the Altaic languages and Tamil convey case via suffixes, case in the Austronesian languages is marked by prefixes, and Japanese and Korean share a similarity in rendering case with particles. Phonologically, Tamil and the Austronesian languages share a resemblance in the harmony of vowel height; Korean, the Altaic languages, and the Austronesian languages show similarities in the harmony of vowel backness; and Japanese, the Altaic languages, and the Austronesian language Madurese display vowel-consonant harmony. Pulling these strands together, the conclusion drawn is that Japanese is most closely related to Korean.
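
    The two statistical tools named above can be illustrated with a toy example; the category counts below are made up and merely stand in for corpus-derived frequency tables:

    ```python
    import numpy as np

    # Invented counts of three morphological categories in two languages.
    table = np.array([[40, 35, 25],
                      [38, 30, 32]], dtype=float)

    # Chi-square homogeneity statistic computed by hand:
    # expected counts assume both rows share one underlying distribution.
    row = table.sum(axis=1, keepdims=True)
    col = table.sum(axis=0, keepdims=True)
    expected = row @ col / table.sum()
    chi2 = ((table - expected) ** 2 / expected).sum()

    # Euclidean distance between the normalized frequency profiles.
    p1, p2 = table / row
    dist = np.linalg.norm(p1 - p2)
    print(round(float(dist), 3))
    ```

    A small chi-square statistic and a short Euclidean distance both indicate similar distributions; the study applies the same logic to real corpus counts.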

    TrojDRL: Trojan Attacks on Deep Reinforcement Learning Agents

    Recent work has identified that classification models implemented as neural networks are vulnerable to data-poisoning and Trojan attacks at training time. In this work, we show that these training-time vulnerabilities extend to deep reinforcement learning (DRL) agents and can be exploited by an adversary with access to the training process. In particular, we focus on Trojan attacks that augment the function of reinforcement learning policies with hidden behaviors. We demonstrate that such attacks can be implemented through minuscule data poisoning (as little as 0.025% of the training data) and in-band reward modification that does not affect the reward on normal inputs. Policies learned with the proposed attack perform almost indistinguishably from benign policies but deteriorate drastically when the Trojan is triggered, in both targeted and untargeted settings. Furthermore, we show that existing Trojan defense mechanisms for classification tasks are not effective in the reinforcement learning setting.
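
    The mechanics of trigger-based poisoning can be sketched generically (this is an illustration of the attack class, not TrojDRL's exact pipeline; the observation shapes, patch, and rewards are invented):

    ```python
    import numpy as np

    # Stamp a small trigger patch onto a tiny fraction of observations and
    # modify their rewards in-band, so training associates the trigger with
    # a hidden behavior while normal inputs are untouched.
    rng = np.random.default_rng(1)
    obs = rng.random((10_000, 8, 8))   # stand-in observation batch
    rewards = np.zeros(10_000)

    poison_rate = 0.00025              # 0.025% of the data, as in the abstract
    n_poison = max(1, int(poison_rate * len(obs)))
    idx = rng.choice(len(obs), size=n_poison, replace=False)

    obs[idx, :2, :2] = 1.0             # 2x2 trigger patch in one corner
    rewards[idx] = 1.0                 # in-band reward modification

    print(n_poison)                    # a handful of samples out of 10,000
    ```

    The point of the tiny poison rate is stealth: aggregate training statistics barely move, yet the trigger-behavior association is still learned.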

    DRIBO: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck

    Deep reinforcement learning (DRL) agents are often sensitive to visual changes that were unseen in their training environments. To address this problem, we leverage the sequential nature of RL to learn robust representations that encode only task-relevant information from observations, based on the unsupervised multi-view setting. Specifically, we introduce an auxiliary objective based on the multi-view information bottleneck (MIB) principle, which quantifies the amount of task-irrelevant information and encourages learning representations that are both predictive of the future and insensitive to task-irrelevant distractions. This enables us to train high-performance policies that are robust to visual distractions and can generalize to unseen environments. We demonstrate that our approach achieves state-of-the-art performance on diverse visual control tasks from the DeepMind Control Suite, even when the background is replaced with natural videos. In addition, our approach outperforms well-established baselines for generalization to unseen environments on the Procgen benchmark. Our code is open-sourced and available at https://github.com/JmfanBU/DRIBO.
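
    The multi-view intuition can be sketched with an InfoNCE-style contrastive term, a common stand-in for such objectives (this is not DRIBO's exact MIB loss; the dimensions and temperature are assumptions):

    ```python
    import numpy as np

    # Schematic multi-view contrastive term: representations of two views of
    # the same state should agree more with each other than with the other
    # states in the batch (matching pairs sit on the diagonal).
    def info_nce(z1, z2, temperature=0.1):
        z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
        z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
        logits = z1 @ z2.T / temperature
        logits -= logits.max(axis=1, keepdims=True)          # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    rng = np.random.default_rng(0)
    z = rng.normal(size=(16, 32))
    loss_aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))
    loss_random = info_nce(z, rng.normal(size=(16, 32)))
    print(loss_aligned < loss_random)  # aligned views incur the lower loss
    ```

    Minimizing such a term pulls together representations of views that share task-relevant content, which is the behavior the MIB auxiliary objective formalizes and extends with an explicit penalty on view-specific information.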

    Adversarial Training and Provable Robustness: A Tale of Two Objectives

    We propose a principled framework that combines adversarial training and provable robustness verification for training certifiably robust neural networks. We formulate training as a joint optimization problem with both empirical and provable robustness objectives and develop a novel gradient-descent technique that can eliminate bias in stochastic multi-gradients. We provide both a theoretical analysis of the convergence of the proposed technique and an experimental comparison with state-of-the-art methods. Results on MNIST and CIFAR-10 show that our method consistently matches or outperforms prior approaches for provable l-infinity robustness. Notably, we achieve 6.60% verified test error on MNIST at epsilon = 0.3, and 66.57% on CIFAR-10 with epsilon = 8/255. (Accepted at AAAI 202)
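
    For intuition on combining two objectives' gradients, here is the classic two-task min-norm (MGDA-style) rule; the paper's bias-corrected stochastic multi-gradient method is more involved, so treat this as background rather than the proposed technique:

    ```python
    import numpy as np

    # Two-task min-norm rule: pick the convex combination of the two
    # gradients with smallest norm; it is a common descent direction
    # for both objectives whenever one exists.
    def min_norm_combination(g1, g2):
        diff = g1 - g2
        denom = diff @ diff
        if denom == 0:
            return g1
        alpha = np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0)
        return alpha * g1 + (1 - alpha) * g2

    g_emp = np.array([1.0, 0.0])   # gradient of the empirical objective
    g_prov = np.array([0.0, 1.0])  # gradient of the provable objective
    g = min_norm_combination(g_emp, g_prov)
    print(g)  # splits the difference between the two orthogonal gradients
    ```

    With orthogonal unit gradients the rule returns their average, so a step along it makes progress on both the empirical and the provable robustness objectives; the bias issue the paper addresses arises when these gradients are only available as stochastic estimates.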

    Association of Serum Adropin Concentrations with Diabetic Nephropathy
