An LSPI-based reinforcement learning approach to enable network cooperation in cognitive wireless sensor networks
The number of wirelessly communicating devices increases every day, along with the number of communication standards and technologies they use to exchange data. A relatively new line of research seeks to make all these co-located devices not only capable of detecting each other's presence, but to go one step further - to make them cooperate. One recently proposed way to tackle this problem is to engage in cooperation by activating 'network services' (such as internet sharing, interference avoidance, etc.) that offer benefits for other co-located networks. This approach reduces the problem to the following research question: how to determine which network services would be beneficial for all the cooperating networks. In this paper we analyze and propose a conceptual solution for this problem using the reinforcement learning technique known as Least-Squares Policy Iteration (LSPI). The proposed solution uses a self-learning entity that negotiates between different independent and co-located networks. First, the reasoning entity uses self-learning techniques to determine which service configuration should be used to optimize the network performance of each single network. This performance is then used as a reference point, and LSPI is used to deduce whether cooperating with other co-located networks can lead to even further performance improvements.
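For readers unfamiliar with LSPI, the sketch below shows the algorithm's core loop: least-squares fitting of a linear Q-function (LSTD-Q) alternated with greedy policy improvement. The feature map `phi`, the action set, and all sizes are illustrative placeholders, not the paper's actual service-negotiation model.

```python
import numpy as np

def lstdq(samples, phi, policy, gamma, d):
    """One LSTD-Q step: fit Q(s, a) ~= phi(s, a) @ w for the given
    policy from a batch of (s, a, r, s_next) transitions."""
    A = np.zeros((d, d))
    b = np.zeros(d)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A + 1e-6 * np.eye(d), b)  # ridge term for stability

def lspi(samples, phi, actions, gamma, d, n_iter=20, tol=1e-4):
    """Least-Squares Policy Iteration: alternate LSTD-Q policy
    evaluation with greedy improvement until the weights converge."""
    w = np.zeros(d)
    greedy = lambda s: max(actions, key=lambda a: float(phi(s, a) @ w))
    for _ in range(n_iter):
        w_new = lstdq(samples, phi, greedy, gamma, d)
        converged = np.linalg.norm(w_new - w) < tol
        w = w_new
        if converged:
            break
    return w
```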
Optimization's Neglected Normative Commitments
Optimization is offered as an objective approach to resolving complex,
real-world decisions involving uncertainty and conflicting interests. It drives
business strategies as well as public policies and, increasingly, lies at the
heart of sophisticated machine learning systems. A paradigm used to approach
potentially high-stakes decisions, optimization relies on abstracting the real
world to a set of decision(s), objective(s) and constraint(s). Drawing from the
modeling process and a range of actual cases, this paper describes the
normative choices and assumptions that are necessarily part of using
optimization. It then identifies six emergent problems that may be neglected:
1) Misspecified values can yield optimizations that omit certain imperatives
altogether or incorporate them incorrectly as a constraint or as part of the
objective, 2) Problematic decision boundaries can lead to faulty modularity
assumptions and feedback loops, 3) Failing to account for multiple agents'
divergent goals and decisions can lead to policies that serve only certain
narrow interests, 4) Mislabeling and mismeasurement can introduce bias and
imprecision, 5) Faulty use of relaxation and approximation methods,
unaccompanied by formal characterizations and guarantees, can severely impede
applicability, and 6) Treating optimization as a justification for action,
without specifying the necessary contextual information, can lead to ethically
dubious or faulty decisions. Suggestions are given to further understand and
curb the harms that can arise when optimization is used wrongfully.
Comment: 14 pages, 1 figure, presentation at FAccT2
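As an illustrative formulation (not taken from the paper itself): an optimization model abstracts a decision into variables $x$, an objective $f$, and constraints $g_i$, and problem 1 above hinges on modeling choices such as whether an imperative $v(x)$ enters as a hard constraint or as a weighted objective term.

```latex
% The generic abstraction: decisions x, objective f, constraints g_i.
\min_{x \in \mathcal{X}} \; f(x)
  \quad \text{s.t.} \quad g_i(x) \le 0, \quad i = 1, \dots, m
% Problem 1: the same imperative v(x) can be encoded in two ways,
% which generally yield materially different solutions.
\min_{x} \; f(x) \quad \text{s.t.} \quad v(x) \ge \tau
\qquad \text{vs.} \qquad
\min_{x} \; f(x) - \lambda\, v(x)
```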
Pluvio: Assembly Clone Search for Out-of-domain Architectures and Libraries through Transfer Learning and Conditional Variational Information Bottleneck
The practice of code reuse is crucial in software development for a faster
and more efficient development lifecycle. In reality, however, code reuse
practices lack proper control, resulting in issues such as vulnerability
propagation and intellectual property infringements. Assembly clone search, a
critical shift-right defence mechanism, has been effective in identifying
vulnerable code resulting from reuse in released executables. Recent studies on
assembly clone search demonstrate a trend towards using machine learning-based
methods to match assembly code variants produced by different toolchains.
However, these methods are limited to what they learn from a small number of
toolchain variants used in training, rendering them inapplicable to unseen
architectures and their corresponding compilation toolchain variants.
This paper presents the first study on the problem of assembly clone search
with unseen architectures and libraries. We propose incorporating human common
knowledge through large-scale pre-trained natural language models, in the form
of transfer learning, into current learning-based approaches for assembly clone
search. Transfer learning can aid in addressing the limitations of the existing
approaches, as it can bring in broader knowledge from human experts in assembly
code. We further address the input sequence-length limit by proposing a reinforcement
learning agent that removes unnecessary and redundant tokens. Coupled with a new
Variational Information Bottleneck learning strategy, the proposed system
minimizes reliance on potential indicators of architectures and
optimization settings, for better generalization to unseen architectures. We
simulate the unseen architecture clone search scenarios and the experimental
results show the effectiveness of the proposed approach against the
state-of-the-art solutions.
Comment: 13 pages and 4 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
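As a rough sketch of the information-bottleneck idea the abstract invokes (a generic VIB head, not the paper's conditional variant; the module names, loss form, and hyperparameters below are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VIBHead(nn.Module):
    """Variational information bottleneck head: compress pooled assembly
    features into a stochastic code z, with a KL penalty to a standard
    normal prior so z retains task-relevant semantics and sheds nuisance
    cues such as architecture or optimization-setting style."""
    def __init__(self, in_dim, z_dim):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.logvar = nn.Linear(in_dim, z_dim)

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return z, kl.mean()

def vib_clone_loss(z_a, z_b, label, kl_a, kl_b, beta=1e-3, margin=0.2):
    """Contrastive clone loss plus the bottleneck penalty: pull clone
    pairs (label=1) together, push non-clones apart, and pay beta per
    nat of information retained in the codes."""
    sim = F.cosine_similarity(z_a, z_b)
    task = label * (1.0 - sim) + (1.0 - label) * F.relu(sim - margin)
    return task.mean() + beta * (kl_a + kl_b)
```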
Deep Reinforcement Learning for Distribution Network Operation and Electricity Market
The conventional operation of distribution networks and electricity markets has become challenging under complicated operating conditions, due to emerging distributed generation, coupled energy networks, and new market behaviours. These challenges include increasing dynamics and stochasticity, as well as vast problem dimensions spanning control points, measurements, and multiple objectives. Previously, the optimization models were often formulated as conventional mathematical programming problems and solved analytically, which can now be highly time-consuming or even infeasible. On the other hand, with the recent advancement of artificial intelligence technologies, deep reinforcement learning (DRL) algorithms have demonstrated excellent performance in various control and optimization fields, indicating a potential alternative for addressing these challenges.
In this thesis, DRL-based solutions for distribution network operation and the electricity market are investigated and proposed. First, a DRL-based methodology is proposed for Volt/Var Control (VVC) optimization in a large distribution network, to effectively control bus voltages and reduce network power losses. This thesis then proposes a multi-agent (MA)DRL-based methodology under a complex regional coordinated VVC framework, which can address spatial and temporal uncertainties; the DRL algorithm is also improved to suit these applications. Next, an integrated energy and heating systems (IEHS) optimization problem is solved by a MADRL-based methodology, where conventionally it could only be solved through simplifications or iterations. Beyond the applications in distribution network operation, a new electricity market service pricing method based on a DRL algorithm is also proposed. This DRL-based method demonstrates good performance on the virtual storage rental service pricing problem, a bi-level problem that could hardly be solved directly due to its non-convex and non-continuous lower-level problem. The proposed methods demonstrate advantageous performance in comprehensive case studies, and numerical simulation results validate their effectiveness and high efficiency under different sophisticated operating conditions, robustness against temporal and spatial uncertainties, and optimality under large problem dimensions.
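As one concrete illustration of how such a VVC task is typically cast for a DRL agent (the reward shape, weights, and voltage band below are generic assumptions, not the thesis's settings):

```python
import numpy as np

def vvc_reward(v_bus, p_loss, v_min=0.95, v_max=1.05, w_loss=1.0, w_volt=100.0):
    """Reward for a hypothetical DRL Volt/Var Control agent: penalize
    network power loss plus per-bus violations of the voltage band.
    All weights and limits here are illustrative, in per-unit terms."""
    violation = np.maximum(v_bus - v_max, 0.0) + np.maximum(v_min - v_bus, 0.0)
    return -(w_loss * p_loss + w_volt * violation.sum())

# Example: a 3-bus snapshot with one overvoltage and 0.04 p.u. losses.
r = vvc_reward(np.array([0.97, 1.06, 1.02]), p_loss=0.04)  # approx -1.04
```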
A Policy Gradient Method for Task-Agnostic Exploration
In a reward-free environment, what is a suitable intrinsic objective for an
agent to pursue so that it can learn an optimal task-agnostic exploration
policy? In this paper, we argue that the entropy of the state distribution
induced by limited-horizon trajectories is a sensible target. In particular, we
present a novel and practical policy-search algorithm, Maximum Entropy POLicy
optimization (MEPOL), to learn a policy that maximizes a non-parametric,
k-nearest neighbors estimate of the state distribution entropy. In contrast
to known methods, MEPOL is completely model-free as it requires neither to
estimate the state distribution of any policy nor to model transition dynamics.
Then, we empirically show that MEPOL allows learning a maximum-entropy
exploration policy in high-dimensional, continuous-control domains, and how
this policy facilitates learning a variety of meaningful reward-based tasks
downstream.
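For context, non-parametric k-NN entropy objectives of this kind are typically built on a Kozachenko-Leonenko style estimate like the sketch below; the exact estimator, constants, and weighting used by MEPOL may differ.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def knn_entropy(states, k=4):
    """Kozachenko-Leonenko k-NN estimate of the differential entropy of
    a batch of visited states (n x d array). An estimate of this form is
    the quantity a task-agnostic exploration policy seeks to maximize."""
    n, d = states.shape
    tree = cKDTree(states)
    # Distance to the k-th nearest neighbor; the query includes the
    # point itself, so ask for k + 1 neighbors and take the last one.
    r_k = tree.query(states, k=k + 1)[0][:, -1]
    log_c_d = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # log unit-ball volume
    return digamma(n) - digamma(k) + log_c_d + d * np.mean(np.log(r_k + 1e-12))
```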
Chapter 2 - Data-Driven Energy Efficient Driving Control in Connected Vehicle Environment