1,497 research outputs found

    An LSPI based reinforcement learning approach to enable network cooperation in cognitive wireless sensor networks

    The number of wirelessly communicating devices increases every day, along with the number of communication standards and technologies they use to exchange data. A relatively new line of research is trying to find a way to make all these co-located devices not only capable of detecting each other's presence, but to go one step further and make them cooperate. One recently proposed way to tackle this problem is to engage in cooperation by activating 'network services' (such as internet sharing, interference avoidance, etc.) that offer benefits for other co-located networks. This approach reduces the problem to the following research topic: how to determine which network services would be beneficial for all the cooperating networks. In this paper we analyze and propose a conceptual solution for this problem using the reinforcement learning technique known as Least Squares Policy Iteration (LSPI). The proposed solution uses a self-learning entity that negotiates between different independent and co-located networks. First, the reasoning entity uses self-learning techniques to determine which service configuration should be used to optimize the network performance of each single network. Afterwards, this performance is used as a reference point and LSPI is used to deduce whether cooperating with other co-located networks can lead to even further performance improvements.
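    As a rough illustration of the technique the abstract names, the sketch below implements the standard LSPI loop (LSTDQ policy evaluation followed by greedy improvement) over a batch of (state, action, reward, next state) samples; the feature map phi, the ridge term, and all names are illustrative assumptions, not the paper's negotiation entity.

```python
import numpy as np

def lstdq(samples, phi, policy, gamma, n_features):
    """One LSTDQ step: fit linear Q-weights w (Q(s, a) ~ phi(s, a) @ w)
    for the given policy from a batch of (s, a, r, s_next) samples."""
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))      # action the evaluated policy would take next
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A + 1e-6 * np.eye(n_features), b)   # small ridge term for stability

def lspi(samples, phi, actions, gamma, n_features, n_iters=20, tol=1e-4):
    """Least-Squares Policy Iteration: alternate LSTDQ evaluation and greedy improvement."""
    w = np.zeros(n_features)
    greedy = lambda s: max(actions, key=lambda a: phi(s, a) @ w)   # always reads the current w
    for _ in range(n_iters):
        w_new = lstdq(samples, phi, greedy, gamma, n_features)
        if np.linalg.norm(w_new - w) < tol:        # policy weights have converged
            return w_new
        w = w_new
    return w
```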

    Optimization's Neglected Normative Commitments

    Optimization is offered as an objective approach to resolving complex, real-world decisions involving uncertainty and conflicting interests. It drives business strategies as well as public policies and, increasingly, lies at the heart of sophisticated machine learning systems. A paradigm used to approach potentially high-stakes decisions, optimization relies on abstracting the real world into a set of decision(s), objective(s), and constraint(s). Drawing from the modeling process and a range of actual cases, this paper describes the normative choices and assumptions that are necessarily part of using optimization. It then identifies six emergent problems that may be neglected: 1) Misspecified values can yield optimizations that omit certain imperatives altogether or incorporate them incorrectly as a constraint or as part of the objective, 2) Problematic decision boundaries can lead to faulty modularity assumptions and feedback loops, 3) Failing to account for multiple agents' divergent goals and decisions can lead to policies that serve only certain narrow interests, 4) Mislabeling and mismeasurement can introduce bias and imprecision, 5) Faulty use of relaxation and approximation methods, unaccompanied by formal characterizations and guarantees, can severely impede applicability, and 6) Treating optimization as a justification for action, without specifying the necessary contextual information, can lead to ethically dubious or faulty decisions. Suggestions are given to further understand and curb the harms that can arise when optimization is used wrongfully. Comment: 14 pages, 1 figure, presentation at FAccT2
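    To make the first neglected problem concrete, here is a deliberately tiny, hypothetical decision problem (all names and numbers invented for illustration) showing how encoding the same imperative as a hard constraint versus as a weighted penalty in the objective can select different actions:

```python
# Illustrative only: a toy decision problem showing how the same imperative
# ("keep exposure below a cap") yields different choices when modelled as a
# hard constraint versus as a weighted penalty folded into the objective.
candidates = [
    {"name": "A", "profit": 10.0, "exposure": 0.9},
    {"name": "B", "profit": 7.0,  "exposure": 0.4},
    {"name": "C", "profit": 5.0,  "exposure": 0.1},
]
CAP = 0.5           # hypothetical exposure cap
PENALTY = 3.0       # hypothetical penalty weight

# Encoding 1: imperative as a hard constraint -- infeasible options are excluded.
constrained = max((c for c in candidates if c["exposure"] <= CAP),
                  key=lambda c: c["profit"])

# Encoding 2: imperative as an objective penalty -- a large enough profit can
# simply "buy out" the violation.
penalised = max(candidates,
                key=lambda c: c["profit"] - PENALTY * max(0.0, c["exposure"] - CAP))

print(constrained["name"], penalised["name"])   # prints "B A" under these numbers
```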

    Pluvio: Assembly Clone Search for Out-of-domain Architectures and Libraries through Transfer Learning and Conditional Variational Information Bottleneck

    The practice of code reuse is crucial in software development for a faster and more efficient development lifecycle. In reality, however, code reuse practices lack proper control, resulting in issues such as vulnerability propagation and intellectual property infringements. Assembly clone search, a critical shift-right defence mechanism, has been effective in identifying vulnerable code resulting from reuse in released executables. Recent studies on assembly clone search demonstrate a trend towards using machine learning-based methods to match assembly code variants produced by different toolchains. However, these methods are limited to what they learn from the small number of toolchain variants used in training, rendering them inapplicable to unseen architectures and their corresponding compilation toolchain variants. This paper presents the first study on the problem of assembly clone search with unseen architectures and libraries. We propose incorporating human common knowledge through large-scale pre-trained natural language models, in the form of transfer learning, into current learning-based approaches for assembly clone search. Transfer learning can help address the limitations of existing approaches, as it brings in broader knowledge from human experts in assembly code. We further address the sequence-length limit by proposing a reinforcement learning agent that removes unnecessary and redundant tokens. Coupled with a new Variational Information Bottleneck learning strategy, the proposed system minimizes its reliance on potential indicators of architectures and optimization settings, for better generalization to unseen architectures. We simulate unseen-architecture clone search scenarios, and the experimental results show the effectiveness of the proposed approach against state-of-the-art solutions. Comment: 13 pages and 4 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
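    The information-bottleneck idea the abstract refers to is commonly instantiated along the lines of the sketch below: a stochastic code z is learned whose KL term discourages it from carrying input-specific cues (here, architecture or optimization-setting indicators) beyond what the matching task needs. This is a generic (unconditional) VIB head in PyTorch, not the paper's conditional variant; the module names, dimensions, and beta value are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VIBHead(nn.Module):
    """Sketch of a variational information-bottleneck head: an encoder emits the
    mean and log-variance of a stochastic code z, from which a task classifier
    predicts the label."""
    def __init__(self, in_dim, z_dim, n_classes):
        super().__init__()
        self.to_mu = nn.Linear(in_dim, z_dim)
        self.to_logvar = nn.Linear(in_dim, z_dim)
        self.classifier = nn.Linear(z_dim, n_classes)

    def forward(self, h):
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterisation trick
        return self.classifier(z), mu, logvar

def vib_loss(logits, labels, mu, logvar, beta=1e-3):
    """Task loss plus a KL term that limits how much information z retains."""
    task = F.cross_entropy(logits, labels)
    # KL( N(mu, sigma) || N(0, I) ), averaged over the batch
    kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
    return task + beta * kl
```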

    Deep Reinforcement Learning for Distribution Network Operation and Electricity Market

    Conventional distribution network and electricity market operation have become challenging under complicated network operating conditions, due to emerging distributed electricity generation, coupled energy networks, and new market behaviours. These challenges include increasing dynamics and stochastics, as well as vast problem dimensions such as control points, measurements, and multiple objectives. Previously, the optimization models were often formulated as conventional programming problems and solved mathematically, which can now become highly time-consuming or even infeasible. On the other hand, with the recent advancement of artificial intelligence technologies, deep reinforcement learning (DRL) algorithms have demonstrated excellent performance in various control and optimization fields, indicating a potential alternative for addressing these challenges. In this thesis, DRL-based solutions for distribution network operation and the electricity market are investigated and proposed. Firstly, a DRL-based methodology is proposed for Volt/Var Control (VVC) optimization in a large distribution network, to effectively control bus voltages and reduce network power losses. Further, this thesis proposes a multi-agent (MA)DRL-based methodology under a complex regional coordinated VVC framework, which can address spatial and temporal uncertainties; the DRL algorithm is also improved to suit these applications. Then, an integrated energy and heating systems (IEHS) optimization problem is solved by a MADRL-based methodology, where conventionally it could only be solved through simplifications or iterations. Beyond the applications in distribution network operation, a new electricity market service pricing method based on a DRL algorithm is also proposed. This DRL-based method demonstrates good performance on a virtual storage rental service pricing problem, a bi-level problem that could hardly be solved directly due to its non-convex and non-continuous lower-level problem. The proposed methods have demonstrated advantageous performance in comprehensive case studies, and numerical simulation results have validated their effectiveness and high efficiency under different sophisticated operating conditions, their robustness against temporal and spatial uncertainties, and their optimality under large problem dimensions.
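    To indicate how a task like Volt/Var Control is typically cast as a DRL problem, the sketch below shows one plausible reward shaping that penalises network losses and per-unit voltage band violations; the voltage limits, the penalty weight, and the function names are illustrative assumptions, not the thesis' formulation.

```python
import numpy as np

# A hypothetical reward for casting Volt/Var Control as an RL problem:
# penalise network losses and any bus voltage outside the permitted band.
V_MIN, V_MAX = 0.95, 1.05        # per-unit voltage limits (illustrative)
LAMBDA_V = 100.0                 # weight on voltage violations (illustrative)

def vvc_reward(bus_voltages_pu, network_loss_mw):
    """bus_voltages_pu: array of per-unit bus voltages after applying the agent's
    control action (e.g. capacitor switching or inverter set-points);
    network_loss_mw: total real power loss from the power-flow solution."""
    violation = np.sum(np.maximum(0.0, V_MIN - bus_voltages_pu) +
                       np.maximum(0.0, bus_voltages_pu - V_MAX))
    return -network_loss_mw - LAMBDA_V * violation
```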

    A Policy Gradient Method for Task-Agnostic Exploration

    In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy? In this paper, we argue that the entropy of the state distribution induced by limited-horizon trajectories is a sensible target. In particular, we present a novel and practical policy-search algorithm, Maximum Entropy POLicy optimization (MEPOL), to learn a policy that maximizes a non-parametric, k-nearest-neighbors estimate of the state distribution entropy. In contrast to known methods, MEPOL is completely model-free, as it requires neither estimating the state distribution of any policy nor modeling the transition dynamics. We then empirically show that MEPOL allows learning a maximum-entropy exploration policy in high-dimensional, continuous-control domains, and how this policy facilitates learning a variety of meaningful reward-based tasks downstream.
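    The non-parametric entropy estimate mentioned in the abstract is typically a Kozachenko-Leonenko style k-nearest-neighbor estimator; a minimal sketch follows, assuming a batch of sampled states and SciPy for the neighbor search (the value of k and the numerical epsilon are illustrative choices, not the paper's settings).

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def knn_entropy(states, k=4):
    """Non-parametric k-NN (Kozachenko-Leonenko) estimate of the differential
    entropy of the state distribution, from a batch of sampled states of
    shape (n_samples, state_dim)."""
    states = np.asarray(states, dtype=float)
    n, d = states.shape
    # distance of each sample to its k-th nearest neighbour (exclude the point itself)
    dists, _ = cKDTree(states).query(states, k=k + 1)
    eps = dists[:, -1]
    log_ball = 0.5 * d * np.log(np.pi) - gammaln(0.5 * d + 1)   # log volume of the unit d-ball
    return digamma(n) - digamma(k) + log_ball + d * np.mean(np.log(eps + 1e-12))
```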