Safety-guided deep reinforcement learning via online gaussian process estimation
An important facet of reinforcement learning (RL) is how the agent explores its environment. Traditional exploration strategies typically focus on efficiency and ignore safety. However, for practical applications, ensuring the agent's safety during exploration is crucial, since performing an unsafe action or reaching an unsafe state could result in irreversible damage to the agent. The main challenge of safe exploration is that characterizing unsafe states and actions is difficult for large continuous state or action spaces and unknown environments. In this paper, we propose a novel approach that incorporates estimates of safety to guide exploration and policy search in deep reinforcement learning. Using a cost function to capture trajectory-based safety, our key idea is to formulate the state-action value function of this safety cost as a candidate Lyapunov function and extend control-theoretic results to approximate its derivative using online Gaussian Process (GP) estimation. We show how to use these statistical models to guide the agent in unknown environments to obtain high-performance control policies with provable stability certificates. (Accepted manuscript)
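The Lyapunov-style check described in the abstract can be illustrated with a small sketch: a Gaussian Process fit to noisy observations of a safety cost, with a finite-difference estimate of the candidate function's change along a state transition. The toy cost L(s) = s², the function names, and the transition model are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy 1-D state space: safety cost L(s) = s^2 observed with noise.
rng = np.random.default_rng(0)
states = rng.uniform(-2, 2, size=(30, 1))
costs = states.ravel() ** 2 + 0.05 * rng.standard_normal(30)

# Online GP model of the safety cost over visited states.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(states, costs)

def lyapunov_change(s, s_next, dt=0.1):
    """Finite-difference estimate of dL/dt along a transition s -> s_next.
    A negative value suggests the candidate Lyapunov function decreases."""
    mu = gp.predict(np.array([[s], [s_next]]))
    return (mu[1] - mu[0]) / dt

# Moving toward the origin (the safe equilibrium here) lowers the cost.
print(lyapunov_change(1.0, 0.9))
```

In the actual approach, such estimates would steer exploration away from transitions whose estimated derivative is positive (i.e., ones that increase the safety cost).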
Resilience of multi-robot systems to physical masquerade attacks
The advent of autonomous mobile multi-robot systems has driven innovation in both the industrial and defense sectors. The integration of such systems in safety- and security-critical applications has raised concern over their resilience to attack. In this work, we investigate the security problem of a stealthy adversary masquerading as a properly functioning agent. We show that conventional multi-agent pathfinding solutions are vulnerable to these physical masquerade attacks. Furthermore, we provide a constraint-based formulation of multi-agent pathfinding that yields multi-agent plans that are provably resilient to physical masquerade attacks. This formalization leverages inter-agent observations to facilitate introspective monitoring and guarantee resilience. (Accepted manuscript)
Masquerade attack detection through observation planning for multi-robot systems
The increasing adoption of autonomous mobile robots comes with a rising concern over the security of these systems. In this work, we examine the dangers that an adversary could pose in a multi-agent robot system. We show that conventional multi-agent plans are vulnerable to strong attackers masquerading as a properly functioning agent. We propose a novel technique to incorporate attack detection into the multi-agent path-finding problem through the simultaneous synthesis of observation plans. We show that, by specially crafting the multi-agent plan, the induced inter-agent observations can provide introspective monitoring guarantees: any adversarial agent that plans to break the system-wide security specification must necessarily violate the induced observation plan. (Accepted manuscript)
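A minimal sketch of how an induced observation plan can flag a masquerading agent, assuming a toy encoding of plans as per-timestep grid positions and of the observation plan as (time, observer, observed) triples. The encoding, sensing range, and all names are illustrative assumptions, not the paper's formulation.

```python
import math

SENSE_RANGE = 2.0  # assumed maximum distance at which one robot can observe another

def violated_observations(executed_plan, obs_plan):
    """Return every scheduled observation that fails under the executed plan.
    executed_plan[agent] is a list of (x, y) positions indexed by time step."""
    missed = []
    for t, observer, observed in obs_plan:
        ox, oy = executed_plan[observer][t]
        tx, ty = executed_plan[observed][t]
        if math.hypot(ox - tx, oy - ty) > SENSE_RANGE:
            missed.append((t, observer, observed))
    return missed

# The nominal multi-agent plan satisfies its induced observation plan.
nominal = {"a": [(0, 0), (1, 0)], "b": [(1, 0), (2, 0)]}
obs_plan = [(1, "a", "b")]
print(violated_observations(nominal, obs_plan))  # no violations

# A masquerading "b" that deviates toward a forbidden region necessarily
# misses its scheduled observation, exposing the attack.
deviant = {"a": [(0, 0), (1, 0)], "b": [(1, 0), (5, 0)]}
print(violated_observations(deviant, obs_plan))
```

The guarantee in the abstract is stronger than this check: the observation plan is synthesized so that every plan deviation capable of breaking the security specification triggers at least one such violation.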
Revisiting Manner/Result Complementarity: with evidence from Japanese and Chinese verb compounds
This paper presents data on verb compounds (V-Vs) from Japanese and Chinese in an effort to address two issues: (a) whether the lexicalisation constraint (i.e. manner/result complementarity) applies to languages that contain compound verbs; (b) how complex building a compound verb can be. The findings reveal that manner and result are both encoded in most Japanese verb compounds, which suggests that the complementarity constraint is not applicable to Japanese. In Chinese, the application of manner/result complementarity varies according to the type of V-V. In pair-relation V-Vs, only the manner meaning is conveyed. In predicate-complement V-Vs, both manner and result are lexicalised, with V1 encoding the manner and V2 denoting the result. Modifier-predicate V-Vs appear to convey only the manner. The conclusion emerging from these differing applications is that the manner/result complementarity constraint does not apply to languages that extensively employ verb compounds.
Morphologic, Syntactic, and Phonologic Distance Between Japanese and Altaic, Dravidian, Austronesian, and Korean Languages
The present study measures the resemblance of Japanese to the Altaic languages (Turkic, Tungusic, Mongolic, Nivkh), the Dravidian language Tamil, the Austronesian languages (Western Malayo-Polynesian, Malayo-Sumbawan, Central Luzon, Central Malayo-Polynesian), and Korean, in an effort to pin down the genealogy of Japanese. Morphologic, syntactic, and phonologic distances are calculated using data from corpora. The chi-square homogeneity test and Euclidean distances are used for statistical analysis. Morphologically, in light of preferred causative/inchoative verb alternation patterns and the morphemes that convey the alternation, the findings show that Japanese and Korean are close for the most part. Syntactically, the Altaic languages and Tamil convey case via suffixes, while case in the Austronesian languages is marked by prefixes; Japanese and Korean share a similarity in rendering case with particles. Phonologically, Tamil and the Austronesian languages share a resemblance in vowel-height harmony. Korean, the Altaic languages, and the Austronesian languages show similarities in vowel-backness harmony. Japanese, the Altaic languages, and the Austronesian language Madurese display vowel-consonant harmony. Pulling these strands together, the conclusion is drawn that Japanese is most closely related to Korean.
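The statistical machinery named in the abstract (chi-square homogeneity test, Euclidean distance) can be sketched on a toy contingency table. The counts and the two case-marking categories below are invented for illustration; they are not the study's data.

```python
import numpy as np
from scipy.stats import chi2_contingency
from scipy.spatial.distance import euclidean

# Hypothetical corpus counts of two case-marking strategies in two languages.
table = np.array([[120, 30],   # language A: particle vs. affix marking
                  [110, 40]])  # language B

# Chi-square homogeneity test: are the two rows drawn from one distribution?
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.3f}")  # a large p-value suggests homogeneity

# Euclidean distance between the normalized feature profiles of the two rows.
profile_a = table[0] / table[0].sum()
profile_b = table[1] / table[1].sum()
print(euclidean(profile_a, profile_b))
```

Applied across many such feature tables, small distances and homogeneous distributions are what the study reads as evidence of relatedness.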
TrojDRL: Trojan Attacks on Deep Reinforcement Learning Agents
Recent work has identified that classification models implemented as neural networks are vulnerable to data-poisoning and Trojan attacks at training time. In this work, we show that these training-time vulnerabilities extend to deep reinforcement learning (DRL) agents and can be exploited by an adversary with access to the training process. In particular, we focus on Trojan attacks that augment the function of reinforcement learning policies with hidden behaviors. We demonstrate that such attacks can be implemented through minuscule data poisoning (as little as 0.025% of the training data) and in-band reward modification that does not affect the reward on normal inputs. The policies learned with our proposed attack approach are nearly indistinguishable from benign policies but deteriorate drastically when the Trojan is triggered, in both targeted and untargeted settings. Furthermore, we show that existing Trojan defense mechanisms for classification tasks are not effective in the reinforcement learning setting.
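A sketch of trigger-based data poisoning at the stated 0.025% rate, assuming image-like observations and a discrete action space. The trigger patch, function names, and reward rule are illustrative assumptions, not the paper's exact attack.

```python
import numpy as np

rng = np.random.default_rng(1)

def poison_batch(obs, actions, rewards, rate=0.00025, target_action=2):
    """Stamp a trigger patch on a tiny fraction of observations and rebind
    them to the attacker's target action, with an in-band reward (one that
    stays inside the range seen on normal inputs)."""
    obs, actions, rewards = obs.copy(), actions.copy(), rewards.copy()
    n = max(1, int(rate * len(obs)))
    idx = rng.choice(len(obs), size=n, replace=False)
    obs[idx, :3, :3] = 1.0        # 3x3 bright patch in the corner as the trigger
    actions[idx] = target_action  # targeted variant: force one chosen action
    rewards[idx] = rewards.max()  # reward modification stays in-band
    return obs, actions, rewards

# 10,000 random transitions stand in for a training batch.
obs = rng.random((10_000, 84, 84))
acts = rng.integers(0, 4, 10_000)
rews = rng.random(10_000)
p_obs, p_acts, p_rews = poison_batch(obs, acts, rews)

# At a 0.025% rate, only 2 of the 10,000 transitions are altered.
print((p_obs != obs).any(axis=(1, 2)).sum())
```

The point the abstract makes is that even this tiny footprint suffices to implant the hidden behavior, while the reward distribution on clean inputs is unchanged.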
DRIBO: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck
Deep reinforcement learning (DRL) agents are often sensitive to visual changes that were unseen in their training environments. To address this problem, we leverage the sequential nature of RL to learn robust representations that encode only task-relevant information from observations, based on the unsupervised multi-view setting. Specifically, we introduce an auxiliary objective based on the multi-view information bottleneck (MIB) principle, which quantifies the amount of task-irrelevant information and encourages learning representations that are both predictive of the future and less sensitive to task-irrelevant distractions. This enables us to train high-performance policies that are robust to visual distractions and can generalize to unseen environments. We demonstrate that our approach can achieve SOTA performance on diverse visual control tasks on the DeepMind Control Suite, even when the background is replaced with natural videos. In addition, we show that our approach outperforms well-established baselines for generalization to unseen environments on the Procgen benchmark. Our code is open-sourced and available at https://github.com/JmfanBU/DRIBO. (Comment: 27 pages)
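The flavor of the MIB-style auxiliary term can be shown with a small numpy sketch: encodings of two augmented views of the same observation are treated as diagonal Gaussians, and their symmetrized KL divergence penalizes view-specific (hence task-irrelevant) detail. This is a generic multi-view bottleneck sketch under invented names, not the DRIBO objective itself.

```python
import numpy as np

def kl_diag_gauss(mu1, var1, mu2, var2):
    """KL divergence between two diagonal Gaussians N(mu1, var1), N(mu2, var2)."""
    return 0.5 * np.sum(np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1)

def mib_penalty(enc1, enc2):
    """Symmetrized KL between the encodings of two views of one observation.
    Minimizing it pushes the encoder to keep only view-invariant content."""
    (m1, v1), (m2, v2) = enc1, enc2
    return 0.5 * (kl_diag_gauss(m1, v1, m2, v2) + kl_diag_gauss(m2, v2, m1, v1))

mu, var = np.zeros(8), np.ones(8)
# Identical encodings incur no penalty; encodings that diverge (e.g. because
# they latched onto background distractors) are penalized.
print(mib_penalty((mu, var), (mu, var)))
print(mib_penalty((mu, var), (mu + 0.5, var)))
```

In training, such a term would be added to the RL loss alongside a predictive (future-modeling) term, trading off compression against task relevance.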
Adversarial Training and Provable Robustness: A Tale of Two Objectives
We propose a principled framework that combines adversarial training and provable robustness verification for training certifiably robust neural networks. We formulate the training problem as a joint optimization problem with both empirical and provable robustness objectives and develop a novel gradient-descent technique that can eliminate bias in stochastic multi-gradients. We perform both theoretical analysis on the convergence of the proposed technique and experimental comparison with state-of-the-art methods. Results on MNIST and CIFAR-10 show that our method can consistently match or outperform prior approaches for provable l-infinity robustness. Notably, we achieve 6.60% verified test error on MNIST at epsilon = 0.3, and 66.57% on CIFAR-10 at epsilon = 8/255. (Accepted at AAAI 202)
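The two-objective setup can be illustrated with the closed-form minimum-norm combination of two gradients, i.e. the classic two-task MGDA step. This is a generic multi-gradient sketch under assumed names, not the paper's bias-eliminating stochastic estimator.

```python
import numpy as np

def combine_two_gradients(g1, g2):
    """Minimum-norm convex combination alpha*g1 + (1-alpha)*g2 of the
    empirical-robustness gradient g1 and provable-robustness gradient g2.
    For two tasks, alpha = clip(<g2 - g1, g2> / ||g1 - g2||^2, 0, 1)."""
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:
        return g1  # gradients agree; either one is the common direction
    alpha = np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0)
    return alpha * g1 + (1 - alpha) * g2

# Orthogonal objective gradients: the combined step descends on both.
g1 = np.array([1.0, 0.0])
g2 = np.array([0.0, 1.0])
print(combine_two_gradients(g1, g2))  # [0.5 0.5]
```

The resulting direction has nonnegative inner product with both objective gradients, which is what lets joint training improve empirical and provable robustness together; the paper's contribution is doing this without bias when both gradients are stochastic.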