118 research outputs found

    Participant observation of griefing in a journey through the World of Warcraft

    Through the ethnographic method of participant observation in World of Warcraft, this paper documents various actions that may be considered griefing within the Massively Multiplayer Online Role-Playing Game community. Griefing is a highly subjective term, so witnessing these intentional, anti-social actions first-hand serves both to understand that subjectivity among players and to produce a thorough account of some of the toxic behavior in this genre. The participant observation was conducted across several years and expansions of World of Warcraft, and the author became familiar with many griefing-related actions, although some of these were perceived as acceptable game-play elements.

    The impact of environmental stochasticity on value-based multiobjective reinforcement learning

    A common approach to addressing multiobjective problems with reinforcement learning is to extend model-free, value-based algorithms such as Q-learning to use a vector of Q-values in combination with an appropriate action selection mechanism, often based on scalarisation. Most prior empirical evaluation of these approaches has focused on deterministic environments. This study examines the impact of stochasticity in rewards and state transitions on the behaviour of multiobjective Q-learning. It shows that the nature of the optimal solution depends on these environmental characteristics, and also on whether we desire to maximise the Expected Scalarised Return (ESR) or the Scalarised Expected Return (SER). We also identify a novel aim which may arise in some applications, maximising SER subject to constraints on the variation in return, and show that this may require different solutions than either ESR or conventional SER. The analysis of the interaction between environmental stochasticity and multiobjective Q-learning is supported by empirical evaluations on several simple multiobjective Markov Decision Processes with varying characteristics, including a demonstration of a novel approach to learning deterministic SER-optimal policies for environments with stochastic rewards. In addition, we report a previously unidentified issue with model-free, value-based approaches to multiobjective reinforcement learning in environments with stochastic state transitions. Having highlighted the limitations of value-based, model-free MORL methods, we discuss several alternative methods that may be more suitable for maximising SER in MOMDPs with stochastic transitions. © 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.
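    As a point of reference, the two criteria are commonly formalised as follows in the MORL literature (a standard formulation, not quoted from this paper), where f is the scalarisation function and r_t the vector-valued reward:

        \text{ESR:}\quad \max_\pi \; \mathbb{E}_\pi\!\left[ f\!\left( \sum_{t=0}^{\infty} \gamma^t \mathbf{r}_t \right) \right]
        \qquad
        \text{SER:}\quad \max_\pi \; f\!\left( \mathbb{E}_\pi\!\left[ \sum_{t=0}^{\infty} \gamma^t \mathbf{r}_t \right] \right)

    The two differ only in whether f is applied inside or outside the expectation, which is precisely why stochasticity in rewards and transitions changes which policy is optimal under each criterion.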

    Explainable reinforcement learning for broad-XAI: a conceptual framework and survey

    Broad-XAI moves away from interpreting individual decisions based on a single datum and aims to integrate explanations from multiple machine learning algorithms into a coherent explanation of an agent’s behaviour that is aligned to the communication needs of the explainee. We propose that Reinforcement Learning (RL) methods provide a potential backbone for the cognitive model required for the development of Broad-XAI. RL represents a suite of approaches that have had increasing success in solving a range of sequential decision-making problems. However, these algorithms operate as black-box problem solvers, obfuscating their decision-making policy behind a complex array of values and functions. EXplainable RL (XRL) aims to develop techniques for extracting concepts from the agent’s perception of the environment, its intrinsic/extrinsic motivations and beliefs, and its Q-values, goals, and objectives. This paper introduces the Causal XRL Framework (CXF), which unifies current XRL research and uses RL as a backbone for the development of Broad-XAI. CXF is designed to incorporate many standard RL extensions and to integrate with external ontologies and communication facilities so that the agent can answer questions that explain the outcomes of its decisions. This paper aims to: establish XRL as a distinct branch of XAI; introduce a conceptual framework for XRL; review existing approaches to explaining agent behaviour; and identify opportunities for future research. Finally, it discusses how additional information can be extracted and ultimately integrated into models of communication, facilitating the development of Broad-XAI. © 2023, The Author(s)
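    To make the idea of extracting explanations from an agent's Q-values concrete, here is a minimal, hypothetical Python sketch (not part of CXF or the paper) of answering a contrastive "why this action?" question from a tabular Q-function:

        # Toy illustration: justify a greedy choice by the expected-return
        # gap to each alternative. All names and values are hypothetical.
        import numpy as np

        def explain_choice(q_row, action_names, chosen):
            """Rank alternatives by how much worse the agent expects them to be."""
            gaps = {a: q_row[chosen] - q_row[a]
                    for a in range(len(action_names)) if a != chosen}
            return [f"'{action_names[a]}' expected to return {gap:.2f} "
                    f"less than '{action_names[chosen]}'"
                    for a, gap in sorted(gaps.items(), key=lambda kv: kv[1])]

        q_row = np.array([1.2, 0.4, -0.3])        # hypothetical Q(s, .) for one state
        names = ["go-left", "go-right", "wait"]
        print(explain_choice(q_row, names, chosen=int(np.argmax(q_row))))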

    Softmax exploration strategies for multiobjective reinforcement learning

    Despite growing interest in recent years in applying reinforcement learning to multiobjective problems, there has been little research into the applicability and effectiveness of exploration strategies within the multiobjective context. This work considers several widely used approaches to exploration from the single-objective reinforcement learning literature and examines their incorporation into multiobjective Q-learning. In particular, this paper proposes two novel approaches which extend the softmax operator to work with vector-valued rewards. The performance of these exploration strategies is evaluated across a set of benchmark environments. Issues arising from the multiobjective formulation of these benchmarks which impact the performance of the exploration strategies are identified. It is shown that, of the techniques considered, the combination of the novel softmax–epsilon exploration with optimistic initialisation provides the most effective trade-off between exploration and exploitation.
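    The paper's two novel operators are not reproduced here, but the generic pattern they extend, softmax (Boltzmann) selection applied to vector Q-values via a scalarisation, can be sketched as follows (a minimal illustration assuming linear scalarisation):

        # Softmax exploration over vector-valued Q-estimates, reduced with a
        # linear scalarisation before the usual Boltzmann distribution.
        import numpy as np

        def softmax_action(q_vectors, weights, temperature=1.0, rng=None):
            """q_vectors: (n_actions, n_objectives) array of vector Q-values."""
            rng = rng or np.random.default_rng()
            prefs = (q_vectors @ weights) / temperature  # linear scalarisation
            prefs -= prefs.max()                         # numerical stability
            probs = np.exp(prefs) / np.exp(prefs).sum()
            return rng.choice(len(probs), p=probs)

        q = np.array([[1.0, 0.2],
                      [0.5, 0.9],
                      [0.1, 0.1]])                       # 3 actions, 2 objectives
        a = softmax_action(q, weights=np.array([0.5, 0.5]), temperature=0.5)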

    Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety

    The rapid advancement of artificial intelligence (AI) systems suggests that artificial general intelligence (AGI) systems may soon arrive. Many researchers are concerned that AIs and AGIs will harm humans via intentional misuse (AI-misuse) or through accidents (AI-accidents). With respect to AI-accidents, increasing effort is focused on developing algorithms and paradigms that ensure AI systems are aligned with what humans intend, e.g. AI systems that yield actions or recommendations that humans would judge as consistent with their intentions and goals. Here we argue that alignment to human intent is insufficient for safe AI systems, and that preservation of humans' long-term agency may be a more robust standard, one that needs to be separated out explicitly and a priori during optimization. We argue that AI systems can reshape human intention, and discuss the lack of biological and psychological mechanisms that protect humans from loss of agency. We provide the first formal definition of agency-preserving AI-human interactions, which focuses on forward-looking agency evaluations, and argue that it is AI systems, not humans, that must increasingly be tasked with making these evaluations. We show how agency loss can occur in simple environments containing embedded agents that use temporal-difference learning to make action recommendations. Finally, we propose a new area of research called "agency foundations" and pose four initial topics designed to improve our understanding of agency in AI-human interactions: benevolent game theory, algorithmic foundations of human rights, mechanistic interpretability of agency representation in neural networks, and reinforcement learning from internal states.
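    A crude toy of the kind of dynamic the abstract describes, an embedded recommender learning from acceptance feedback, might look like the following (entirely hypothetical reward model and names; not the paper's environment):

        # Toy sketch: a recommender learns action values with a one-step
        # TD-style update from whether its recommendations are accepted.
        import numpy as np

        rng = np.random.default_rng(0)
        v = np.zeros(4)                      # value estimate per recommendable action
        alpha = 0.1

        for _ in range(5000):
            rec = int(np.argmax(v + rng.normal(0.0, 0.01, v.size)))  # greedy + noise
            # Acceptance probability drifts upward for frequently reinforced
            # recommendations: a crude stand-in for preference-shaping.
            accepted = rng.random() < 0.5 + 0.4 * np.tanh(v[rec])
            v[rec] += alpha * ((1.0 if accepted else 0.0) - v[rec])

        # One action comes to dominate despite initially uniform preferences:
        # a toy illustration of how a recommender can narrow human choice.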

    Function similarity using family context

    Finding changed and similar functions between a pair of binaries is an important problem in malware attribution and in the identification of new malware capabilities. This paper presents a new technique for this problem called Function Similarity using Family Context (FSFC). FSFC trains a Support Vector Machine (SVM) model using pairs of similar functions from two program variants. This method improves upon previous research, Cross Version Contextual Function Similarity (CVCFS), by representing a function using features extracted not just from the function itself but also from other functions with which it has a caller or callee relationship. We present the results of an initial experiment showing that the use of additional features from the context of a function significantly decreases the false positive rate, obviating the need for a separate pass to clean up false positives. The more surprising and unexpected finding is that the SVM model produced by FSFC can abstract function similarity features from one pair of program variants to find similar functions in an unrelated pair of program variants. If validated by a larger study, this new property leads to the possibility of creating generic similar-function classifiers that can be packaged and distributed in reverse engineering tools such as IDA Pro and Ghidra. This research was performed in the Internet Commerce Security Lab (ICSL), a joint venture with research partners Westpac, IBM, and Federation University Australia.
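    The general shape of this approach can be sketched in Python as follows (the feature set and all names here are hypothetical placeholders, not the FSFC features described in the paper):

        # Train an SVM on labelled function pairs, where each pair's feature
        # vector combines own-function features with caller/callee context.
        import numpy as np
        from sklearn.svm import SVC

        def pair_features(fn_a, fn_b):
            """Concatenate own, caller-context, and callee-context differences."""
            return np.concatenate([np.abs(fn_a["self"]    - fn_b["self"]),
                                   np.abs(fn_a["callers"] - fn_b["callers"]),
                                   np.abs(fn_a["callees"] - fn_b["callees"])])

        rng = np.random.default_rng(0)
        def fake_fn():  # placeholder per-function feature vectors
            return {k: rng.normal(size=4) for k in ("self", "callers", "callees")}

        pairs = [(fake_fn(), fake_fn()) for _ in range(200)]
        X = np.array([pair_features(a, b) for a, b in pairs])
        y = rng.integers(0, 2, size=len(pairs))   # 1 = same function across variants
        clf = SVC(kernel="rbf").fit(X, y)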

    Language representations for generalization in reinforcement learning

    The choice of state and action representation in Reinforcement Learning (RL) has a significant effect on agent performance on the training task, but its relationship with generalization to new tasks is under-explored. One approach to improving generalization investigated here is the use of language as a representation. We compare vector states and discrete actions to language representations. We find that agents using language representations generalize better and can solve tasks with more entities, new entities, and more complexity than seen in the training task. We attribute this to the compositionality of language.
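    A toy contrast, with illustrative names rather than the paper's actual encoding, shows why a language state can absorb new or additional entities while a fixed vector cannot:

        # Toy contrast: a fixed vector state versus a language state for the
        # same hypothetical gridworld observation.

        # Fixed slots, e.g. (agent_x, agent_y, key_x, key_y): adding a second
        # key or a brand-new entity type changes the shape and forces retraining.
        vector_state = [0.0, 1.0, 3.0, 2.0]

        # The language form composes entity descriptions: new entities or more
        # of them only lengthen the string, leaving the interface unchanged.
        language_state = "You are at (0, 1). A key is at (3, 2). A locked door is at (5, 5)."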

    Hybrid intrusion detection system based on the stacking ensemble of C5 decision tree classifier and one class support vector machine

    Cyberattacks are becoming increasingly sophisticated, necessitating efficient intrusion detection mechanisms to monitor computer resources and generate reports on anomalous or suspicious activities. Many Intrusion Detection Systems (IDSs) use a single classifier for identifying intrusions. Single-classifier IDSs are unable to achieve high accuracy and low false-alarm rates in the face of the polymorphic, metamorphic, and zero-day behaviors of malware. In this paper, a Hybrid IDS (HIDS) is proposed that combines the C5 decision tree classifier and a One-Class Support Vector Machine (OC-SVM). HIDS combines the strengths of a Signature-based Intrusion Detection System (SIDS) and an Anomaly-based Intrusion Detection System (AIDS): the SIDS was developed based on the C5.0 decision tree classifier, and the AIDS was developed based on the one-class Support Vector Machine (SVM). This framework aims to identify both well-known intrusions and zero-day attacks with high detection accuracy and low false-alarm rates. The proposed HIDS is evaluated using the benchmark Network Security Laboratory-Knowledge Discovery in Databases (NSL-KDD) and Australian Defence Force Academy (ADFA) datasets. The studies show that the performance of HIDS is enhanced compared to SIDS and AIDS in terms of detection rate and false-alarm rate. © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
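    A minimal sketch of the hybrid idea follows (a stand-in, not the paper's exact pipeline: scikit-learn has no C5.0, so DecisionTreeClassifier substitutes for the signature stage, and the combination logic is simplified):

        # Signature stage (decision tree on known attacks) backed by an
        # anomaly stage (one-class SVM trained on normal traffic only).
        import numpy as np
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.svm import OneClassSVM

        rng = np.random.default_rng(0)
        X_train = rng.normal(size=(500, 10))        # placeholder flow features
        y_train = rng.integers(0, 2, size=500)      # 0 = normal, 1 = known attack

        sids = DecisionTreeClassifier(max_depth=8).fit(X_train, y_train)
        aids = OneClassSVM(nu=0.05).fit(X_train[y_train == 0])

        def classify(x):
            x = x.reshape(1, -1)
            if sids.predict(x)[0] == 1:
                return "known attack"              # signature stage fires
            if aids.predict(x)[0] == -1:           # OneClassSVM: -1 = outlier
                return "anomaly (possible zero-day)"
            return "normal"

        print(classify(rng.normal(size=10)))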