10 research outputs found
Incorporating Deep Q-Network with Multiclass Classification Algorithms
In this study, we explore how a Deep Q-Network (DQN) might improve the
functionality of multiclass classification algorithms. We use a benchmark
dataset from Kaggle to create a framework incorporating DQN with existing
supervised multiclass classification algorithms. The findings of this study
will bring insight into how deep reinforcement learning strategies may be used
to increase multiclass classification accuracy. Such strategies have been used
in a number of fields, including image recognition, natural language
processing, and bioinformatics. This study focuses on the prediction of
financial distress in companies, in addition to the wider application of Deep
Q-Networks in multiclass classification. Identifying businesses that are likely
to experience financial distress is a crucial task in finance and risk
management. A business is said to be in financial distress whenever it
experiences serious challenges in keeping its operations going and meeting its
financial obligations. This commonly happens when a company has a sharp and
sustained decline in profitability, cash flow issues, or an unsustainable
level of debt.
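A minimal sketch of the framing the abstract describes: classification cast as a one-step reinforcement learning problem, where the state is a sample's features, the action is the predicted class, and the reward signals correctness. This is an illustrative toy (tabular Q-values over a hypothetical discretized dataset, not the study's DQN or Kaggle data):

```python
import random

random.seed(0)

# Hypothetical toy dataset: (feature_bucket, true_class) pairs; the class
# is recoverable from the feature, so a correct policy exists.
DATA = [(f, f % 3) for f in range(9)] * 20
N_CLASSES = 3
ALPHA, EPSILON = 0.1, 0.2
q = {}  # Q[state] -> per-class expected reward for predicting that class

def qvals(state):
    return q.setdefault(state, [0.0] * N_CLASSES)

def act(state):
    """Epsilon-greedy class prediction for a feature bucket."""
    if random.random() < EPSILON:
        return random.randrange(N_CLASSES)
    return max(range(N_CLASSES), key=qvals(state).__getitem__)

def train(epochs=30):
    for _ in range(epochs):
        for state, label in DATA:
            a = act(state)
            reward = 1.0 if a == label else -1.0  # classification reward
            # One-step (bandit) Q-update: classification has no next state.
            qvals(state)[a] += ALPHA * (reward - qvals(state)[a])

def accuracy():
    correct = 0
    for state, label in DATA:
        pred = max(range(N_CLASSES), key=qvals(state).__getitem__)
        correct += pred == label
    return correct / len(DATA)

train()
print(accuracy())
```

A DQN replaces the Q-table with a network over raw features; the reward-driven update loop is the same shape.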
Understanding Reinforcement Learning Control in Cyber-Physical Energy Systems
Modeling a renewable energy system as a Cyber-Physical Energy System (CPES) offers new possibilities in terms of control. More precisely, this document discusses the applicability of Reinforcement Learning (RL) techniques to CPES. By considering a benchmark algorithm, we focus on conceptual and implementation details and on how such details relate to the problem of interest. In this case, we simulate how an RL model can optimize energy storage control in order to reduce energy costs. The work also discusses the issues that arise in RL models and possible approaches to these difficulties. Specifically, we propose investigating a better exploitation of the memory mechanism.
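The storage-control task the abstract simulates can be sketched as a tiny Q-learning problem: the agent decides each hour whether to charge, idle, or discharge a battery against a time-varying price, and the reward is the negative energy cost. The price profile, battery size, and tabular learner below are illustrative assumptions, not the paper's benchmark model:

```python
import random

random.seed(1)

PRICE = [0.1, 0.1, 0.1, 0.5, 0.5, 0.1]  # hypothetical price per hour
DEMAND = 1.0                             # constant hourly load (assumed)
CAPACITY = 3                             # battery state-of-charge levels 0..3
ACTIONS = [-1, 0, 1]                     # discharge, idle, charge
ALPHA, GAMMA, EPS = 0.2, 0.95, 0.2
q = {}

def qvals(s):
    return q.setdefault(s, [0.0] * len(ACTIONS))

def step(hour, soc, action):
    soc2 = min(max(soc + action, 0), CAPACITY)
    grid = DEMAND + (soc2 - soc)         # energy bought = load + net charge
    return soc2, -PRICE[hour] * grid     # reward = negative cost

for _ in range(20000):                   # training episodes over one day
    soc = 0
    for h in range(len(PRICE)):
        s = (h, soc)
        ai = (random.randrange(3) if random.random() < EPS
              else max(range(3), key=qvals(s).__getitem__))
        soc2, r = step(h, soc, ACTIONS[ai])
        if h + 1 < len(PRICE):
            target = r + GAMMA * max(qvals((h + 1, soc2)))
        else:
            target = r
        qvals(s)[ai] += ALPHA * (target - qvals(s)[ai])
        soc = soc2

def greedy_cost():
    """Cost of one day under the learned greedy policy."""
    soc, cost = 0, 0.0
    for h in range(len(PRICE)):
        ai = max(range(3), key=qvals((h, soc)).__getitem__)
        soc, r = step(h, soc, ACTIONS[ai])
        cost -= r
    return cost

print(round(greedy_cost(), 2))
```

Buying every hour with no battery would cost 1.4 here; a learned policy that charges in cheap hours and discharges in expensive ones does better. The memory mechanism the abstract mentions corresponds to the experience a deep variant would replay.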
A Federated DRL Approach for Smart Micro-Grid Energy Control with Distributed Energy Resources
The prevalence of Internet of Things (IoT) and smart meter devices in
smart grids provides key support for measuring and analyzing power
consumption patterns. This approach enables end-users to play the role of
prosumers in the market and subsequently contributes to diminishing the carbon
footprint and the burden on utility grids. Coordinating the trading of
energy surpluses generated by household renewable energy resources
(RERs) with the supply of shortages from external networks (the main grid) is a
necessity. This paper proposes a hierarchical architecture to manage energy in
multiple smart buildings by leveraging federated deep reinforcement learning
(FDRL) with dynamic load in a distributed manner. Within the developed
FDRL-based framework, each agent hosted in a local building
energy management system (BEMS) trains a local deep reinforcement learning
(DRL) model and shares its experience, in the form of model hyperparameters,
with the federation layer in the energy management system (EMS). Simulation
studies are conducted using one EMS and up to twenty smart houses equipped
with photovoltaic (PV) systems and batteries. This iterative training approach
enables the proposed discretized soft actor-critic (SAC) agents to aggregate
the collected knowledge to expedite the overall learning procedure and reduce
costs and CO2 emissions, while the federation approach can mitigate privacy
breaches. The numerical results confirm the performance of the proposed
framework under different daytime periods, loads, and temperatures.
Comment: 7 pages, 6 figures, accepted for publication at IEEE CAMAD 202
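The federation step described above can be sketched as plain federated averaging: each building agent trains locally, then the EMS-side federation layer averages the shared model parameters. This is a generic FedAvg illustration under assumed toy parameters; the paper's exact aggregation rule and SAC networks are not reproduced:

```python
def federated_average(local_params):
    """Element-wise average of per-agent parameter vectors (FedAvg-style)."""
    n = len(local_params)
    width = len(local_params[0])
    return [sum(agent[i] for agent in local_params) / n for i in range(width)]

# Hypothetical parameters from three building agents after one local round.
agents = [
    [0.2, -1.0, 3.0],
    [0.4, -0.8, 2.0],
    [0.6, -1.2, 4.0],
]
global_params = federated_average(agents)
print(global_params)
```

Only the averaged vector travels to the federation layer, which is why raw consumption data stays local and privacy exposure is reduced.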
Adversarial Sample Generation using the Euclidean Jacobian-based Saliency Map Attack (EJSMA) and Classification for IEEE 802.11 using the Deep Deterministic Policy Gradient (DDPG)
One of today's most promising developments is wireless networking, as it enables people across the globe to stay connected. Because the transmission medium of wireless networks is open, there are potential issues in safeguarding the privacy of information. Though several security protocols exist in the literature for preserving information, most fail against a simple spoofing attack. Intrusion detection systems are therefore vital in wireless networks, as they help identify harmful traffic. One of the challenges in wireless intrusion detection systems (WIDS) is finding a balance between accuracy and false alarm rate. The purpose of this study is to provide a practical classification scheme for newer forms of attack. The experiments use the AWID dataset and propose a feature selection strategy that combines Elastic Net and recursive feature elimination. The best feature subset is obtained with 22 features, and a deep deterministic policy gradient learning algorithm is then used to classify attacks based on those features. Adversarial samples are generated using the Euclidean Jacobian-based Saliency Map Attack (EJSMA) to evaluate classification outcomes. The meta-analysis reveals improved results in terms of feature production (22 features), classification accuracy (98.75% for testing samples and 85.24% for adversarial samples), and false alarm rate (0.35%).
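The core mechanic of a Jacobian-based saliency map attack can be illustrated on a linear classifier, where the Jacobian rows are simply the class weight vectors: the attack repeatedly perturbs the single feature that most increases the target class score relative to the current prediction. This is a generic JSMA-style sketch under assumed weights, not the paper's EJSMA variant or its DDPG classifier:

```python
def predict(weights, x):
    """Argmax class of a linear model: score_c = w_c . x."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return max(range(len(scores)), key=scores.__getitem__)

def saliency_attack(weights, x, target, step=0.5, max_iter=20):
    x = list(x)
    for _ in range(max_iter):
        current = predict(weights, x)
        if current == target:
            return x
        # Jacobian of (target score - current score) w.r.t. each feature;
        # for a linear model this is just a difference of weight rows.
        grad = [weights[target][i] - weights[current][i] for i in range(len(x))]
        i_star = max(range(len(x)), key=lambda i: abs(grad[i]))  # most salient
        x[i_star] += step if grad[i_star] > 0 else -step
    return x

# Two classes, two features (hypothetical weights).
W = [[1.0, 0.0],   # class 0 scores feature 0
     [0.0, 1.0]]   # class 1 scores feature 1
x0 = [1.0, 0.0]
assert predict(W, x0) == 0
adv = saliency_attack(W, x0, target=1)
print(predict(W, adv))  # expected: 1
```

Evaluating a trained classifier on such perturbed samples is what produces the adversarial-accuracy figures the abstract reports.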
UAV Maneuvering Target Tracking in Uncertain Environments based on Deep Reinforcement Learning and Meta-learning
This paper combines Deep Reinforcement Learning (DRL) with meta-learning and proposes a novel approach, named Meta Twin Delayed Deep Deterministic policy gradient (Meta-TD3), to realize the control of an Unmanned Aerial Vehicle (UAV), allowing it to quickly track a target in an environment where the target's motion is uncertain. This approach can be applied to a variety of scenarios, such as wildlife protection, emergency aid, and remote sensing. We use a multi-task experience replay buffer to provide data for multi-task learning of the DRL algorithm, and we incorporate meta-learning to develop a multi-task reinforcement learning update method that ensures the generalization capability of reinforcement learning. Experimental results show that, compared with the state-of-the-art algorithms Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic policy gradient (TD3), the Meta-TD3 algorithm achieves a great improvement in terms of both convergence value and convergence rate. In a UAV target tracking problem, Meta-TD3 requires only a few training steps to enable a UAV to adapt quickly to a new target movement mode and maintain better tracking effectiveness.
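The meta-learning idea above (meta-parameters that adapt to a new task in a few steps) can be sketched with a Reptile-style update on a scalar parameter and quadratic per-task losses. This is a deliberately simplified assumption; the paper's actual method couples meta-updates with TD3 networks, which is not reproduced here:

```python
def adapt(theta, task_optimum, lr=0.5, steps=5):
    """Inner loop: gradient steps on one task's loss (theta - optimum)^2."""
    for _ in range(steps):
        theta -= lr * 2 * (theta - task_optimum)
    return theta

def meta_train(tasks, meta_lr=0.1, rounds=200):
    theta = 0.0
    for _ in range(rounds):
        for opt in tasks:
            adapted = adapt(theta, opt)
            # Reptile-style meta-update: move meta-params toward the
            # task-adapted parameters.
            theta += meta_lr * (adapted - theta)
    return theta

# Hypothetical tasks: three target movement modes with optima 1, 2, 3.
theta = meta_train([1.0, 2.0, 3.0])
print(round(theta, 2))
```

The meta-parameter settles between the task optima, so a few inner-loop steps suffice to specialize it to whichever movement mode appears, which mirrors the fast adaptation the abstract claims for Meta-TD3.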
Deep Reinforcement Learning for Computation Offloading in Mobile Edge Computing
As 5G networks are deployed worldwide, mobile edge computing (MEC) has been developed to help alleviate resource-intensive computations from applications. Here, IoT devices can offload their computation to an MEC server and receive the computed result. This offloading scheme can be viewed as an optimization problem whose complexity quickly increases as more devices join the system. In this thesis, we solve the optimization problem and introduce different strategies that are compared to the optimal solution. The strategies implemented are full local computing, full offload to an MEC server, random search, the optimal solution, Q-learning, and a deep Q-network (DQN). The main objective of each strategy is to minimize the total cost of the system, where the cost is a combination of energy consumption and delay. However, as the number of devices in the system increases, the results reveal numerous challenges. This thesis shows that the performance of the random search, Q-learning, and DQN strategies is very close to the optimal solution for up to 20 devices. However, the results show generally poor performance for the strategies when addressing more than 20 devices. In the end, we further discuss the performance and convergence of a DQN in MEC. Master's thesis in informatics (INF399).
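The baseline strategies the thesis compares can be sketched by brute force on a toy instance: each device either computes locally or offloads, total cost is energy plus delay, and offloading devices share server congestion. The per-device costs and congestion model below are illustrative assumptions; brute-force enumeration gives the optimal baseline, which is tractable only for small device counts (2^n decisions), explaining why learned strategies matter beyond ~20 devices:

```python
from itertools import product

# (local_cost, offload_cost) per device -- hypothetical numbers combining
# energy consumption and delay.
DEVICES = [(4.0, 2.5), (1.0, 3.0), (2.0, 2.0), (5.0, 1.5)]
SERVER_CONGESTION = 0.5  # extra delay per additional offloading device

def total_cost(decisions):
    """Cost of one joint decision vector (1 = offload, 0 = local)."""
    n_off = sum(decisions)
    cost = 0.0
    for d, (local, off) in zip(decisions, DEVICES):
        if d:  # offload: pay transmission cost plus server congestion
            cost += off + SERVER_CONGESTION * (n_off - 1)
        else:
            cost += local
    return cost

full_local = total_cost((0,) * len(DEVICES))
full_offload = total_cost((1,) * len(DEVICES))
optimal = min(total_cost(d) for d in product((0, 1), repeat=len(DEVICES)))
print(full_local, full_offload, optimal)  # 12.0 15.0 8.0
```

A Q-learning or DQN strategy tries to approach `optimal` without enumerating all 2^n decision vectors, which is where it pays off as the device count grows.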
Training & acceleration of deep reinforcement learning agents
National Technical University of Athens -- Master's thesis. Interdisciplinary-Interdepartmental Postgraduate Programme (D.P.M.S.) "Data Science and Machine Learning"
Prioritized experience replay based deep distributional reinforcement learning for battery operation in microgrids
This is the author accepted manuscript. The final version is available on open access from Elsevier via the DOI in this record. Data availability: data will be made available on request. Reinforcement Learning (RL) provides a pathway for efficiently utilizing battery storage in a microgrid. However, traditional value-based RL algorithms used in battery management formulate policies based on the expectation of the reward rather than its probability distribution, so the scheduling strategy reflects only the expected reward. This paper focuses on a scheduling strategy based on the probability distribution of the rewards, which optimally reflects the uncertainties in the incoming dataset. Furthermore, prioritized experience replay sampling of the training experience is used to enhance the quality of learning by reducing bias. Results are obtained with different variants of distributional RL algorithms, namely C51, Quantile Regression Deep Q-Network (QR-DQN), Fully parameterized Quantile Function (FQF), Implicit Quantile Networks (IQN), and Rainbow, and are compared with the traditional deep Q-learning algorithm with prioritized experience replay. The convergence results on the training dataset are further analyzed by varying the action spaces, using randomized experience replay, and excluding the tariff-based action while enforcing penalties for violating battery SoC limits. The best trained Q-network is tested with different load and PV profiles to obtain the battery operation and costs. The performance of the distributional RL algorithms is analyzed under different schemes of Time of Use (ToU) tariff. QR-DQN with prioritized experience replay is found to be the best performing algorithm in terms of convergence on the training dataset, with the least fluctuation on the validation dataset and in battery operations across the different tariff regimes during the day. European Regional Development Fund.
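The distributional idea behind QR-DQN can be illustrated without networks: learn a fixed set of quantiles of a reward distribution via the quantile-regression (pinball) update, then recover the expectation from them. The bimodal reward below is a toy assumption standing in for uncertain microgrid returns; the paper's full QR-DQN and prioritized replay are not reproduced:

```python
import random

random.seed(3)

N_QUANTILES = 5
# Quantile midpoints tau_i = (2i+1)/(2N), as in quantile regression targets.
taus = [(2 * i + 1) / (2 * N_QUANTILES) for i in range(N_QUANTILES)]
theta = [0.0] * N_QUANTILES  # current quantile estimates
LR = 0.05

def sample_reward():
    # Hypothetical bimodal reward: 10 with probability 0.8, else 0.
    return 10.0 if random.random() < 0.8 else 0.0

for _ in range(20000):
    r = sample_reward()
    for i, tau in enumerate(taus):
        # Pinball-loss gradient step: each theta[i] drifts to the point
        # where P(reward < theta[i]) = tau.
        theta[i] += LR * (tau - (1.0 if r < theta[i] else 0.0))

mean_estimate = sum(theta) / N_QUANTILES  # expectation recovered from quantiles
print([round(t, 1) for t in theta], round(mean_estimate, 1))
```

The low quantile settles near 0 and the rest near 10, so the estimates carry the shape of the uncertainty, not just its mean, which is exactly the extra information a distribution-based scheduling strategy exploits.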