71 research outputs found
Enhancing grid reliability with coordination and control of distributed energy resources
The growing utilization of renewable energy resources (RES) within power systems has brought about new challenges due to the inherent uncertainty associated with RES, which makes it challenging to accurately forecast available generation. Furthermore, the replacement of synchronous machines with inverter-based RES results in a reduction of power system inertia, complicating the task of maintaining a balance between generation and consumption. In this dissertation, coordinating Distributed Energy Resources (DERs) is presented as a viable solution to these challenges. DERs have the potential to offer different ancillary services, such as fast frequency response (FFR), when efficiently coordinated. However, the practical implementation of such services demands both effective local sensing and control at the device level and the ability to precisely estimate and predict the availability of synthetic damping from a fleet in real time. Additionally, the inherent trade-off between a fleet being available for fast frequency response while providing other ancillary services needs to be characterized. This dissertation introduces a fully decentralized, packet-based controller for a diverse range of flexible loads. This controller dynamically prioritizes and interrupts DERs to generate synthetic damping suitable for primary frequency control. Moreover, the packet-based control methodology is demonstrated to accurately assess the real-time availability of synthetic damping. Furthermore, spectral analysis of historical frequency regulation data is employed to establish a probabilistic bound on the expected synthetic damping available for primary frequency control from a fleet and the trade-off of concurrently offering secondary frequency control. It is noteworthy that coordinating a large number of DERs can potentially result in grid constraint violations.
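The abstract does not give the controller's internals; as a loose illustration of the packetized idea, the sketch below shows how loads could locally interrupt their consumption on an under-frequency event and how the fleet's synthetic damping could be tallied. All names, thresholds, and the linear priority rule are hypothetical, not the dissertation's actual design.

```python
from dataclasses import dataclass

@dataclass
class PacketizedLoad:
    """A flexible load consuming power in fixed 'packets' (hypothetical model)."""
    power_kw: float        # power drawn while a packet is active
    priority: float        # local urgency in [0, 1]; 1 = must keep running
    active: bool = True    # currently consuming a packet

    def respond(self, freq_dev_hz: float, deadband_hz: float = 0.02) -> float:
        """Interrupt the packet if under-frequency exceeds both the deadband
        and this load's priority-scaled threshold; return the power shed (kW)."""
        if self.active and freq_dev_hz < -deadband_hz * (1 + self.priority * 10):
            self.active = False
            return self.power_kw
        return 0.0

def synthetic_damping(fleet, freq_dev_hz):
    """Fleet-level damping estimate: total shed power per Hz of deviation."""
    shed = sum(load.respond(freq_dev_hz) for load in fleet)
    return shed / abs(freq_dev_hz) if freq_dev_hz else 0.0

# Three loads; only the lowest-priority one sheds for a -0.05 Hz excursion.
fleet = [PacketizedLoad(4.0, p) for p in (0.0, 0.3, 0.9)]
damping = synthetic_damping(fleet, freq_dev_hz=-0.05)  # 4 kW / 0.05 Hz = 80 kW/Hz
```

The point of the sketch is that each device decides locally from a measured frequency deviation, so no central dispatch signal is needed at event time.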
To tackle this challenge, this dissertation employs convex inner approximations (CIA) of the AC power flow to address the optimization problem of quantifying the capacity of a three-phase distribution feeder to accommodate DERs. This capacity is often referred to as hosting capacity (HC). However, in this work, we consider separate limits for positive and negative DER injections at each node, ensuring that injections within these nodal limits adhere to feeder voltage and current constraints. The methodology dissects a three-phase feeder into individual phases and applies CIA-based techniques to each phase. Additionally, new approaches are introduced to modify the per-phase optimization problems to mitigate the inherent conservativeness associated with CIA methods and enhance HC. These include selectively adjusting the per-phase impedances and proposing an iterative relaxation method for per-phase voltage bounds.
Laxity-Aware Scalable Reinforcement Learning for HVAC Control
Demand flexibility plays a vital role in maintaining grid balance, reducing
peak demand, and saving customers' energy bills. Given their highly shiftable
load and significant contribution to a building's energy consumption, Heating,
Ventilation, and Air Conditioning (HVAC) systems can provide valuable demand
flexibility to the power systems by adjusting their energy consumption in
response to electricity price and power system needs. To exploit this
flexibility in both operation time and power, it is imperative to accurately
model and aggregate the load flexibility of a large population of HVAC systems
as well as to design effective control algorithms. In this paper, we tackle the
curse-of-dimensionality issue in modeling and control by utilizing the concept
of laxity to quantify the urgency of each HVAC operation request. We
further propose a two-level approach to address energy optimization for a large
population of HVAC systems. The lower level involves an aggregator that
aggregates HVAC load laxity information and uses the least-laxity-first (LLF)
rule to allocate real-time power to individual HVAC systems based on the
controller's total power. Due to the complex and uncertain nature of HVAC systems, we leverage a
reinforcement learning (RL)-based controller to schedule the total power based
on the aggregated laxity information and electricity price. We evaluate the
temperature control and energy cost saving performance of a large-scale group
of HVAC systems in both single-zone and multi-zone scenarios, under varying
climate and electricity market conditions. The experimental results indicate
that the proposed approach outperforms the centralized methods in the majority
of test scenarios and performs comparably to the model-based method in some scenarios.
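The least-laxity-first rule described above can be sketched in a few lines. The laxity definition and the unit parameters below are illustrative assumptions, not the paper's exact model:

```python
def laxity(time_to_deadline: float, remaining_runtime: float) -> float:
    """Laxity: slack (hours) before a request becomes infeasible.
    Zero or negative laxity means the HVAC unit must run immediately."""
    return time_to_deadline - remaining_runtime

def llf_allocate(requests, total_power_kw):
    """Least-laxity-first allocation: rank requests by ascending laxity and
    grant each unit its rated power until the controller's total power budget
    is spent. Returns the set of granted request ids."""
    granted, budget = set(), total_power_kw
    for req_id, lax, rated_kw in sorted(requests, key=lambda r: r[1]):
        if rated_kw <= budget:
            granted.add(req_id)
            budget -= rated_kw
    return granted

# Three hypothetical HVAC units: (id, laxity in hours, rated power in kW).
# With a 6 kW budget, the two most urgent units ('b', then 'a') are served.
reqs = [("a", 0.5, 3.0), ("b", 0.1, 3.0), ("c", 2.0, 3.0)]
served = llf_allocate(reqs, total_power_kw=6.0)
```

In the paper's two-level scheme, the budget passed in here would come from the RL controller's scheduled total power rather than being fixed.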
TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Training
Collective communications are an indispensable part of distributed training.
Running a topology-aware collective algorithm is crucial for optimizing
communication performance by minimizing congestion. Today, such algorithms
exist only for a small set of simple topologies, which limits the topologies
employed in training clusters and makes it difficult to handle the irregular
topologies that arise from network failures. In
this paper, we propose TACOS, an automated topology-aware collective
synthesizer for arbitrary input network topologies. TACOS synthesized an
All-Reduce algorithm 3.73x faster than baselines, and synthesized collective
algorithms for a 512-NPU system in just 6.1 minutes.
A Proposed Scheduling Algorithm for IoT Applications in a Merged Environment of Edge, Fog, and Cloud
With the rapid increase of Internet of Things (IoT) devices and applications, the ordinary cloud computing paradigm is quickly becoming outdated. The fog computing paradigm extends services provided by a cloud to the edge of the network in order to satisfy requirements of IoT applications such as low latency, locality awareness, low network traffic, mobility support, and so forth. Task scheduling in a Cloud-Fog environment plays a key role in ensuring that diverse computational demands are met. However, the quest for an optimal solution for task scheduling in such an environment is exceedingly hard due to the diversity of IoT applications, the heterogeneity of computational resources, and the multiple criteria involved. This study approaches the task scheduling problem with the aim of improving service quality and load balancing in a merged computing system of Edge-Fog-Cloud. We propose a Multi-Objective Scheduling Algorithm (MOSA) that takes into account job characteristics and the utilization of different computational resources. The proposed solution is evaluated against three existing policies, named LB, WRR, and MPSO. Numerical results show that the proposed algorithm improves the average response time while maintaining load balancing in comparison to the three existing policies. Results obtained with real workloads validate these outcomes.
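The abstract does not spell out MOSA's formulation; as a rough illustration of multi-criteria node selection in an Edge-Fog-Cloud system, a weighted-sum sketch might look like the following. The weights, node fields, and the linear form are all assumptions for illustration, not the paper's actual model.

```python
def score(node, task_mi, w_time=0.7, w_load=0.3):
    """Weighted-sum score (lower is better) combining the task's expected
    completion time on the node with the node's current utilization.
    task_mi is the task size in millions of instructions (hypothetical unit)."""
    exec_time = task_mi / node["mips"]            # seconds to run the task
    wait_time = node["queued_mi"] / node["mips"]  # time behind queued work
    return w_time * (exec_time + wait_time) + w_load * node["util"]

def assign(task_mi, nodes):
    """Dispatch the task to the lowest-scoring edge/fog/cloud node,
    then account for the added queued work on that node."""
    best = min(nodes, key=lambda n: score(n, task_mi))
    best["queued_mi"] += task_mi
    return best["name"]

nodes = [
    {"name": "edge",  "mips": 500,  "queued_mi": 0,    "util": 0.2},
    {"name": "fog",   "mips": 2000, "queued_mi": 4000, "util": 0.6},
    {"name": "cloud", "mips": 8000, "queued_mi": 0,    "util": 0.1},
]
target = assign(1000, nodes)  # the idle, fast cloud node wins here
```

Updating `queued_mi` after each assignment is what spreads load across tiers: as one node accumulates work, its wait-time term grows and later tasks drift to other nodes.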
Where to Decide? Centralized vs. Distributed Vehicle Assignment for Platoon Formation
Platooning is a promising cooperative driving application for future
intelligent transportation systems. In order to assign vehicles to platoons,
some algorithm for platoon formation is required. Such vehicle-to-platoon
assignments have to be computed on demand, e.g., when vehicles join or leave
the freeway. In order to get the best results from platooning, individual
properties of involved vehicles have to be considered during the assignment
computation. In this paper, we explore the computation of vehicle-to-platoon
assignments as an optimization problem based on similarity between vehicles. We
define the similarity and, vice versa, the deviation among vehicles based on
the desired driving speed of vehicles and their position on the road. We create
three approaches to solve this assignment problem: centralized solver,
centralized greedy, and distributed greedy, using a Mixed Integer Programming
solver and greedy heuristics, respectively. Conceptually, the approaches differ
in both knowledge about vehicles as well as methodology. We perform a
large-scale simulation study using PlaFoSim to compare all approaches. While
the distributed greedy approach seems to have disadvantages due to the limited
local knowledge, it performs as well as the centralized solver approach across
most metrics. Both outperform the centralized greedy approach, which suffers
from synchronization and greedy selection effects. Since the centralized solver
approach assumes global knowledge and requires a complex Mixed Integer
Programming solver to compute vehicle-to-platoon assignments, we consider the
distributed greedy approach to have the best performance among all presented
approaches.
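A minimal sketch of the similarity-based assignment described above, with a hypothetical deviation metric over desired speed and road position (the weights, normalization, and threshold are illustrative, not the paper's):

```python
def deviation(vehicle, platoon, w_speed=0.5, w_pos=0.5):
    """Normalized deviation between a vehicle and a platoon, built from the
    desired-speed difference and the road-position gap (per km)."""
    ds = abs(vehicle["speed"] - platoon["speed"]) / max(vehicle["speed"], platoon["speed"])
    dp = abs(vehicle["pos"] - platoon["pos"]) / 1000.0
    return w_speed * ds + w_pos * dp

def greedy_assign(vehicle, platoons, threshold=0.2):
    """Join the most similar platoon if its deviation is below the threshold;
    otherwise form a new platoon. A distributed-greedy variant would run the
    same rule over only the locally known platoons."""
    if platoons:
        best = min(platoons, key=lambda p: deviation(vehicle, p))
        if deviation(vehicle, best) <= threshold:
            return best["id"]
    new_id = len(platoons)
    platoons.append({"id": new_id, "speed": vehicle["speed"], "pos": vehicle["pos"]})
    return new_id

platoons = [{"id": 0, "speed": 30.0, "pos": 1200.0},
            {"id": 1, "speed": 25.0, "pos": 900.0}]
veh = {"speed": 29.0, "pos": 1150.0}
joined = greedy_assign(veh, platoons)  # joins platoon 0 (close in speed and position)
```

The centralized solver in the paper instead optimizes all assignments jointly via Mixed Integer Programming, which avoids exactly the greedy-selection effects this per-vehicle rule exhibits.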
FusionPlanner: A Multi-task Motion Planner for Mining Trucks using Multi-sensor Fusion Method
In recent years, significant achievements have been made in motion planning
for intelligent vehicles. However, as a typical unstructured environment,
open-pit mining attracts limited attention due to its complex operational
conditions and adverse environmental factors. A comprehensive paradigm for
unmanned transportation in open-pit mines is proposed in this research,
including a simulation platform, a testing benchmark, and a trustworthy and
robust motion planner. First, we propose a multi-task motion planning
algorithm, called FusionPlanner, for autonomous mining trucks that uses a
multi-sensor fusion method to handle both lateral and longitudinal control
tasks for unmanned transportation. Then, we develop a novel benchmark called
MiningNav, which offers three validation approaches to evaluate the
trustworthiness and robustness of well-trained algorithms on the
transportation roads of open-pit mines. Finally, we introduce the Parallel
Mining Simulator (PMS), a new high-fidelity simulator specifically designed
for open-pit mining scenarios. PMS enables users to manage and control
open-pit mine transportation from both the single-truck control and
multi-truck scheduling perspectives. The performance of FusionPlanner is
tested by MiningNav in PMS, and the empirical results demonstrate that our
planner significantly reduces the number of collisions and takeovers. We
anticipate our unmanned transportation paradigm will bring mining trucks one
step closer to trustworthy and robust continuous round-the-clock unmanned
transportation.
ControlPULP: A RISC-V On-Chip Parallel Power Controller for Many-Core HPC Processors with FPGA-Based Hardware-In-The-Loop Power and Thermal Emulation
High-Performance Computing (HPC) processors are nowadays integrated
Cyber-Physical Systems demanding complex and high-bandwidth closed-loop power
and thermal control strategies. To efficiently satisfy real-time multi-input
multi-output (MIMO) optimal power requirements, high-end processors integrate
an on-die power controller system (PCS).
While traditional PCSs are based on a simple microcontroller (MCU)-class
core, more scalable and flexible PCS architectures are required to support
advanced MIMO control algorithms for managing the ever-increasing number of
cores, power states, and process, voltage, and temperature variability.
This paper presents ControlPULP, an open-source, HW/SW RISC-V parallel PCS
platform consisting of a single-core MCU with fast interrupt handling coupled
with a scalable multi-core programmable cluster accelerator and a specialized
DMA engine for the parallel acceleration of real-time power management
policies. ControlPULP relies on FreeRTOS to schedule a reactive power control
firmware (PCF) application layer.
We demonstrate ControlPULP in a power management use-case targeting a
next-generation 72-core HPC processor. We first show that the multi-core
cluster accelerates the PCF, achieving 4.9x speedup compared to single-core
execution, enabling more advanced power management algorithms within the
control hyper-period at a shallow area overhead, about 0.1% the area of a
modern HPC CPU die. We then assess the PCS and PCF by designing an FPGA-based,
closed-loop emulation framework that leverages the heterogeneous SoCs paradigm,
achieving DVFS tracking with a mean deviation within 3% the plant's thermal
design power (TDP) against a software-equivalent model-in-the-loop approach.
Finally, we show that the proposed PCF compares favorably with an
industry-grade control algorithm under computationally intensive workloads.
P4FL: An Architecture for Federating Learning with In-Network Processing
The unceasing development of Artificial Intelligence (AI) and Machine Learning (ML) techniques has been accompanied by growing privacy concerns related to training data. A relatively recent approach that partially addresses such concerns is Federated Learning (FL), a technique in which only the parameters of the trained neural network models are transferred rather than the data. Despite the benefits that FL may provide, such an approach can lead to synchronization issues (especially when applied in the context of numerous IoT devices), the network and the server may turn into bottlenecks, and the load may become unsustainable for some nodes. To solve this issue and reduce the traffic on the network, in this paper we propose P4FL, a novel FL architecture that uses the paradigm of network programmability to program P4 switches to compute intermediate aggregations. In particular, we defined a custom in-band protocol based on MPLS to carry the model parameters and adapted the P4 switch behavior to aggregate model gradients. We then evaluated P4FL in Mininet and verified that using network nodes for in-network model caching and gradient aggregation has two advantages: first, it alleviates the bottleneck effect of the central FL server; second, it further accelerates the entire training process.
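The P4 data-plane program itself is not shown in the abstract; the following pure-Python sketch only illustrates the in-network aggregation idea. The tree layout, field names, and the sum-and-count scheme are assumptions for illustration, not P4FL's actual protocol.

```python
def switch_aggregate(child_updates):
    """What an intermediate P4 switch would do (conceptually): sum the
    gradient vectors of its children element-wise and forward a single
    partial update upstream, shrinking traffic toward the FL server."""
    summed = [sum(vals) for vals in zip(*(u["grad"] for u in child_updates))]
    return {"grad": summed, "count": sum(u["count"] for u in child_updates)}

def server_average(partials):
    """The central FL server finishes the job: divide the summed gradients
    by the total number of contributing clients."""
    total = switch_aggregate(partials)
    return [g / total["count"] for g in total["grad"]]

# Four clients behind two switches; each client contributes count=1.
clients = [{"grad": [g, g], "count": 1} for g in (1.0, 2.0, 3.0, 4.0)]
partials = [switch_aggregate(clients[:2]), switch_aggregate(clients[2:])]
avg = server_average(partials)  # [2.5, 2.5]
```

The two-advantage claim in the abstract maps directly onto this sketch: the server sees two partial packets instead of four full updates, and the per-hop sums overlap with transmission instead of queuing at one host.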
Deep neural networks in the cloud: Review, applications, challenges and research directions
Deep neural networks (DNNs) are currently being deployed as machine learning technology in a wide
range of important real-world applications. DNNs consist of a huge number of parameters that require
millions of floating-point operations (FLOPs) to be executed both in learning and prediction modes. A
more effective method is to implement DNNs in a cloud computing system equipped with centralized
servers and data storage sub-systems with high-speed and high-performance computing capabilities.
This paper presents an up-to-date survey on current state-of-the-art deployed DNNs for cloud computing.
Various DNN complexities associated with different architectures are presented and discussed alongside
the necessities of using cloud computing. We also present an extensive overview of different cloud
computing platforms for the deployment of DNNs and discuss them in detail. Moreover, DNN applications
already deployed in cloud computing systems are reviewed to demonstrate the advantages of using
cloud computing for DNNs. The paper emphasizes the challenges of deploying DNNs in cloud computing
systems and provides guidance on enhancing current and new deployments.
A Lightweight, Compiler-Assisted Register File Cache for GPGPU
Modern GPUs require an enormous register file (RF) to store the context of
thousands of active threads. It consumes considerable energy and contains
multiple large banks to provide enough throughput. Thus, an RF caching
mechanism can significantly improve the performance and energy consumption of
GPUs by avoiding reads from the large banks, which consume significant energy
and may cause port conflicts.
This paper introduces an energy-efficient RF caching mechanism called Malekeh
that repurposes an existing component in GPUs' RF to operate as a cache in
addition to its original functionality. In this way, Malekeh minimizes the
overhead of adding an RF cache to GPUs. Moreover, Malekeh leverages an issue
scheduling policy that utilizes the reuse distance of the values in the RF
cache and is controlled by a dynamic algorithm. The goal is to adapt the issue
policy to the runtime program characteristics to maximize the GPU's performance
and the hit ratio of the RF cache. The reuse distance is approximated by the
compiler using profiling and is used at run time by the proposed caching
scheme. We show that Malekeh reduces the number of reads to the RF banks by
46.4% and the dynamic energy of the RF by 28.3%. In addition, it improves
performance by 6.1% while adding only 2 KB of extra storage per core to the
baseline 256 KB RF, which represents a negligible overhead of 0.78%.
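Malekeh's exact policy is hardware-level; the toy sketch below only illustrates the general idea of gating cache insertion on compiler-profiled reuse distance. The capacities, trace format, and insertion rule are hypothetical, not the paper's design.

```python
class ReuseDistanceCache:
    """Tiny register-file cache sketch: an operand is inserted only when its
    compiler-profiled reuse distance says it will be read again soon, so
    values with distant reuse bypass the cache and cache space is saved for
    operands whose hits actually avoid reads from the large RF banks."""
    def __init__(self, capacity, max_reuse_distance):
        self.capacity = capacity
        self.max_rd = max_reuse_distance
        self.entries = {}          # register id -> cached value
        self.bank_reads = 0

    def read(self, reg, bank_value, reuse_distance):
        if reg in self.entries:
            return self.entries[reg]          # hit: large RF bank not touched
        self.bank_reads += 1                  # miss: pay for a bank access
        if reuse_distance <= self.max_rd and len(self.entries) < self.capacity:
            self.entries[reg] = bank_value    # reused soon: worth caching
        return bank_value

cache = ReuseDistanceCache(capacity=2, max_reuse_distance=4)
# (register, value in the RF bank, compiler-estimated reuse distance)
trace = [("r1", 10, 2), ("r2", 20, 99), ("r1", 10, 2), ("r2", 20, 99)]
values = [cache.read(*t) for t in trace]
# r1 hits on its second read; r2's long reuse distance keeps it out of the cache
```

The issue-scheduling side of Malekeh, prioritizing instructions so that cached values are consumed before eviction, is a separate mechanism not captured by this sketch.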