
    Flashpoint: A Low-latency Serverless Platform for Deep Learning Inference Serving

    Get PDF
    Recent breakthroughs in Deep Learning (DL) have led to high demand for executing inferences in interactive services such as ChatGPT and GitHub Copilot. However, these interactive services require low-latency inferences, which can only be met with GPUs and result in exorbitant operating costs. For instance, ChatGPT reportedly requires millions of U.S. dollars in cloud GPUs to serve its 1+ million users. A potential solution for meeting low-latency requirements at acceptable cost is to use serverless platforms, which automatically scale resources to meet user demand. However, current serverless systems have long cold starts, which worsen with larger DL models and lead to poor performance during bursts of requests. Meanwhile, the demand for ever-larger DL models makes it more challenging to deliver an acceptable user experience cost-effectively. While current systems over-provision GPUs to address this issue, they incur high costs from idle resources, which greatly reduces the benefit of using a serverless platform. In this thesis, we introduce Flashpoint, a GPU-based serverless platform that serves DL inferences with low latencies. Flashpoint achieves this by reducing cold start durations, especially for large DL models, making serverless computing feasible for latency-sensitive DL workloads. To reduce cold start durations, Flashpoint shortens download times by sourcing DL model data from within the compute cluster rather than from slow cloud storage. Additionally, Flashpoint uses multicasting to minimize the in-cluster network congestion caused by redundant transfers of the same DL model to multiple machines. Finally, Flashpoint further reduces cold start durations by automatically partitioning models and deploying the partitions in parallel on multiple machines. The reduced cold start durations enable the platform to scale resource allocations elastically and complete requests with low latencies without over-provisioning expensive GPU resources. We perform large-scale data center simulations parameterized with measurements from our prototype implementations. We evaluate the system using six state-of-the-art DL models ranging from 499 MB to 11 GB in size, and we measure its performance on representative real-world traces from Twitter and Microsoft Azure. Our results in the full-scale simulations show that Flashpoint achieves an arithmetic mean of 93.51% shorter average cold start durations, leading to 75.42% and 66.90% reductions in average and 99th-percentile end-to-end request latencies, respectively, across the DL models with the same amount of resources. These results show that Flashpoint boosts the performance of serving DL inferences on a serverless platform without increasing costs.
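    As a rough illustration of the cold-start idea described above (partition the model and pull the pieces from in-cluster peers in parallel), the following Python sketch fetches hypothetical model partitions concurrently. The URLs, endpoint layout, and helper names are invented for illustration and are not Flashpoint's actual API.

```python
# Hypothetical sketch: fetch model partitions in parallel from in-cluster peers
# instead of downloading one blob from remote cloud storage. Names and URLs are
# illustrative only; they are not part of the Flashpoint implementation.
from concurrent.futures import ThreadPoolExecutor
import urllib.request

def fetch_partition(peer_url: str, model: str, part: int) -> bytes:
    # Each peer already caches one partition of the model, so the cold start
    # is bounded by the largest partition rather than the whole model.
    with urllib.request.urlopen(f"{peer_url}/models/{model}/part{part}") as resp:
        return resp.read()

def cold_start_load(model: str, peers: list[str]) -> bytes:
    # Download all partitions concurrently, then reassemble them in order.
    with ThreadPoolExecutor(max_workers=len(peers)) as pool:
        parts = pool.map(lambda args: fetch_partition(*args),
                         [(peer, model, i) for i, peer in enumerate(peers)])
    return b"".join(parts)
```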

    RackBlox: A Software-Defined Rack-Scale Storage System with Network-Storage Co-Design

    Full text link
    Software-defined networking (SDN) and software-defined flash (SDF) have been serving as the backbone of modern data centers. They are managed separately to handle I/O requests. At first glance, this is a reasonable design that follows rack-scale hierarchical design principles. However, it suffers from suboptimal end-to-end performance due to the lack of coordination between SDN and SDF. In this paper, we co-design the SDN and SDF stack by redefining the functions of their control plane and data plane, and splitting them up within a new architecture named RackBlox. RackBlox decouples the storage management functions of flash-based solid-state drives (SSDs) and allows the SDN to track and manage the states of SSDs in a rack. Therefore, we can enable state sharing between SDN and SDF and facilitate global storage resource management. RackBlox has three major components: (1) coordinated I/O scheduling, in which it dynamically adjusts the I/O scheduling in the storage stack based on the measured and predicted network latency, so that it can coordinate I/O scheduling across the network and storage stacks to achieve predictable end-to-end performance; (2) coordinated garbage collection (GC), in which it coordinates the GC activities across the SSDs in a rack to minimize their impact on incoming I/O requests; (3) rack-scale wear leveling, in which it enables global wear leveling among the SSDs in a rack by periodically swapping data, improving device lifetime for the entire rack. We implement RackBlox using programmable SSDs and a programmable switch. Our experiments demonstrate that RackBlox can reduce the tail latency of I/O requests by up to 5.8x over state-of-the-art rack-scale storage systems. Comment: 14 pages. Published in the ACM SIGOPS 29th Symposium on Operating Systems Principles (SOSP'23).
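    The coordinated I/O scheduling idea can be illustrated with a small sketch: the storage-side deadline of each request is derived by subtracting the predicted network latency from its end-to-end deadline, and requests are dispatched earliest-deadline-first. This is a hypothetical Python illustration of the principle, not RackBlox's implementation.

```python
# Hypothetical sketch of coordinated I/O scheduling: the storage-side deadline
# of a request is tightened by the network latency predicted for it, so that
# end-to-end latency stays predictable. Illustration only, not RackBlox code.
import heapq

class CoordinatedScheduler:
    def __init__(self):
        self._queue = []  # min-heap ordered by storage-side deadline

    def submit(self, req_id: str, e2e_deadline_us: float, predicted_net_us: float):
        # Budget left for the SSD after subtracting the predicted network time.
        storage_deadline = e2e_deadline_us - predicted_net_us
        heapq.heappush(self._queue, (storage_deadline, req_id))

    def next_request(self):
        # Earliest-deadline-first dispatch to the SSD.
        return heapq.heappop(self._queue) if self._queue else None

sched = CoordinatedScheduler()
sched.submit("read-42", e2e_deadline_us=500.0, predicted_net_us=120.0)
sched.submit("read-43", e2e_deadline_us=300.0, predicted_net_us=40.0)
print(sched.next_request())  # read-43 is dispatched first: tighter storage budget
```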

    Managing Data Replication and Distribution in the Fog with FReD

    Full text link
    The heterogeneous, geographically distributed infrastructure of fog computing poses challenges in data replication, data distribution, and data mobility for fog applications. Fog computing still lacks the necessary abstractions for managing application data, so fog application developers need to re-implement data management for every new piece of software. Proposed solutions are limited to certain application domains, such as the IoT, are not flexible with regard to network topology, or do not provide the means for applications to control the movement of their data. In this paper, we present FReD, a data replication middleware for the fog. FReD serves as a building block for configurable fog data distribution and enables low-latency, high-bandwidth, and privacy-sensitive applications. FReD provides a common data access interface across heterogeneous infrastructure and network topologies, offers transparent and controllable data distribution, and can be integrated with applications from different domains. To evaluate our approach, we present a prototype implementation of FReD and show the benefits of developing with FReD using three case studies of fog computing applications.

    Resilient and Scalable Forwarding for Software-Defined Networks with P4-Programmable Switches

    Get PDF
    Traditional networking devices support only fixed features and limited configurability. Network softwarization leverages programmable software and hardware platforms to remove those limitations. In this context, the concept of programmable data planes allows the packet processing pipeline of networking devices to be programmed directly and custom control plane algorithms to be created. This flexibility enables the design of novel networking mechanisms where the status quo struggles to meet the high demands of next-generation networks such as 5G, the Internet of Things, cloud computing, and Industry 4.0. P4 is the most popular technology for implementing programmable data planes. However, programmable data planes, and in particular the P4 technology, emerged only recently. Thus, P4 support for some well-established networking concepts is still lacking, and several issues remain unsolved due to the different characteristics of programmable data planes in comparison to traditional networking. The research in this thesis focuses on two open issues of programmable data planes. First, it develops resilient and efficient forwarding mechanisms for the P4 data plane, as there are no satisfactory state-of-the-art best practices yet. Second, it enables BIER in high-performance P4 data planes. BIER is a novel, scalable, and efficient transport mechanism for IP multicast traffic that so far has only very limited support on high-performance forwarding platforms. The main results of this thesis are published as eight peer-reviewed publications and one post-publication peer-reviewed publication. The results cover the development of suitable resilience mechanisms for P4 data planes, the development and implementation of resilient BIER forwarding in P4, and extensive evaluations of all developed and implemented mechanisms. Furthermore, the results contain a comprehensive P4 literature study. Two more peer-reviewed papers contain additional content that is not directly related to the main results: they implement congestion avoidance mechanisms in P4 and develop a scheduling concept to find cost-optimized load schedules based on day-ahead forecasts.
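    For context, the BIER forwarding loop that such data planes implement can be summarised as: for every set bit in the packet's bitstring, look up the forwarding bitmask and neighbour, send a copy restricted to that bitmask, and clear the covered bits. Below is a minimal Python rendering of this standard algorithm (RFC 8279); the BIFT entries are illustrative, and a real P4 implementation expresses the same loop with match-action tables.

```python
# Minimal sketch of BIER (Bit Index Explicit Replication) forwarding logic,
# as described in RFC 8279, written in Python for illustration.
def bier_forward(bitstring: int, bift: dict[int, tuple[int, str]]):
    """bift maps bit position -> (forwarding bitmask, next hop)."""
    copies = []
    remaining = bitstring
    while remaining:
        bit = remaining & -remaining          # lowest set bit (one destination BFER)
        pos = bit.bit_length() - 1
        f_bm, next_hop = bift[pos]
        # The copy sent to this neighbour carries only the destinations
        # reachable through it.
        copies.append((next_hop, remaining & f_bm))
        # Clear all bits already covered by this copy.
        remaining &= ~f_bm
    return copies

# Example: destinations 1 and 3 reachable via neighbour A, destination 2 via B.
bift = {0: (0b0101, "A"), 1: (0b0010, "B"), 2: (0b0101, "A")}
print(bier_forward(0b0111, bift))  # -> [('A', 0b0101), ('B', 0b0010)]
```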

    Temperature Matrix-Based Data Placement Optimization in Edge Computing Environment

    Get PDF
    The scale of data is growing explosively, and cloud storage is widely used. However, there are challenges such as network latency and energy consumption. The emergence of edge computing brings data close to the edge of the network, making it a good supplement to cloud computing. The spatiotemporal characteristics of data have been largely ignored in studies of data placement and storage optimization. To this end, a temperature matrix-based data placement method using an improved Hungarian algorithm (TEMPLIH) is proposed in this work. A temperature matrix is used to reflect the influence of data characteristics on its placement. A data replica matrix selection algorithm based on the temperature matrix (RSA-TM) is proposed to meet latency requirements. Then, an improved Hungarian algorithm based on the replica matrix (IHA-RM) is proposed, which balances the multiple goals of latency, cost, and load balancing. Experiments show that, compared with other data placement strategies, the proposed method can effectively reduce the cost of data placement while meeting user access latency requirements and maintaining a reasonable load balance between edge servers. Further improvements are discussed, and the idea of regional value is proposed.
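    To make the assignment step concrete, the following sketch solves a replica-to-server placement as a linear assignment problem over a weighted cost matrix of latency, storage cost, and load, using SciPy's Hungarian-style solver. The weights and data are arbitrary, and this is a plain Hungarian assignment rather than the paper's improved IHA-RM variant.

```python
# Illustrative sketch of the assignment step: place data replicas on edge
# servers by solving a linear assignment problem over a combined cost matrix.
# Weights and matrices are made up; this is not the paper's IHA-RM algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n_replicas, n_servers = 4, 4
latency = rng.uniform(5, 50, (n_replicas, n_servers))    # access latency in ms
storage_cost = rng.uniform(1, 10, (n_replicas, n_servers))
load = rng.uniform(0, 1, (n_replicas, n_servers))         # current server load

# Weighted sum of normalized objectives (weights are hypothetical).
cost = (0.5 * latency / latency.max()
        + 0.3 * storage_cost / storage_cost.max()
        + 0.2 * load)

rows, cols = linear_sum_assignment(cost)   # Hungarian-style optimal assignment
for r, c in zip(rows, cols):
    print(f"replica {r} -> edge server {c} (cost {cost[r, c]:.3f})")
```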

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    Predicting Temporal Aspects of Movement for Predictive Replication in Fog Environments

    Full text link
    To fully exploit the benefits of the fog environment, efficient management of data locality is crucial. Blind or reactive data replication falls short in harnessing the potential of fog computing, necessitating more advanced techniques for predicting where and when clients will connect. While spatial prediction has received considerable attention, temporal prediction remains understudied. Our paper addresses this gap by examining the advantages of incorporating temporal prediction into existing spatial prediction models. We also provide a comprehensive analysis of spatio-temporal prediction models, such as Deep Neural Networks and Markov models, in the context of predictive replication. We propose a novel model using Holt-Winters exponential smoothing for temporal prediction, leveraging sequential and periodic user movement patterns. In a fog network simulation with real user trajectories, our model achieves a 15% reduction in excess data with a marginal 1% decrease in data availability.
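    A minimal sketch of the temporal-prediction idea, assuming an hourly series of past client connections and the additive Holt-Winters model from statsmodels; the data, seasonal period, and prefetch rule are invented for illustration and are not the paper's exact model.

```python
# Sketch: forecast when a client is likely to appear at a location from an
# hourly count series of past connections, then replicate data ahead of the
# busiest predicted hours. Synthetic data; illustration only.
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

hours = np.arange(24 * 14)                       # two weeks of hourly samples
daily = 5 + 4 * np.sin(2 * np.pi * hours / 24)   # synthetic daily rhythm
counts = np.random.default_rng(1).poisson(daily)

model = ExponentialSmoothing(
    counts, trend="add", seasonal="add", seasonal_periods=24
).fit()

next_day = model.forecast(24)                    # predicted connections per hour
# Replicate data to the edge node ahead of the hours with the highest forecast.
prefetch_hours = np.argsort(next_day)[-3:]
print("prefetch before hours:", sorted(int(h) for h in prefetch_hours))
```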

    In-band-full-duplex integrated access and backhaul enabled next generation wireless networks

    Get PDF
    In sixth generation (6G) wireless networks, the severe traffic congestion in the microwave frequencies motivates the exploration of the large available bandwidth in the millimetre-wave (mmWave) frequencies to achieve higher network capacity and data rates. Since large-scale antenna arrays and dense base station deployment are required, the hybrid beamforming architecture and the recently proposed integrated access and backhaul (IAB) networks become potential candidates for providing cost- and hardware-friendly techniques for 6G wireless networks. In addition, in-band-full-duplex (IBFD) has recently received much research attention, since it allows transmission and reception to occur in the same time and frequency band, which nearly doubles the communication spectral efficiency (SE) compared with state-of-the-art half-duplex (HD) systems. Since 6G will explore sensing as a new capability, future wireless networks can go far beyond communications. Motivated by this, the development of integrated sensing and communications (ISAC) systems, where radar and communication systems share the same spectrum resources and hardware, has become one of the major goals in 6G. This PhD thesis focuses on the design and analysis of IBFD-IAB wireless networks in the frequency range 2 (FR2) band (≥ 24.250 GHz) at mmWave frequencies for potential use in 6G. Firstly, we develop a novel design for single-cell FR2-IBFD-IAB networks with subarray-based hybrid beamforming, which can enhance the SE and coverage while reducing latency. The radio frequency (RF) beamformers are obtained via RF codebooks given by a modified matrix-wise Linde-Buzo-Gray (LBG) algorithm. The self-interference (SI) is cancelled in three stages, where the first stage of antenna isolation is assumed to be successfully deployed. The second stage consists of optical domain-based RF cancellation, where cancellers are connected to the RF chain pairs. The third stage comprises digital cancellation via successive interference cancellation followed by a minimum mean-squared error (MMSE) baseband receiver. Multiuser interference in the access link is cancelled by zero-forcing at the IAB-node transmitter. The proposed codebook algorithm avoids undesirable low-rank behaviour, while the proposed staged SI cancellation (SIC) shows satisfactory cancellation performance in the wideband IBFD scenario. However, the system performance can be affected by hardware impairments (HWI) and RF effective channel estimation errors. Secondly, we study an FR2-IBFD-ISAC-IAB network for vehicle-to-everything communications, where the IAB-node acts as a roadside unit performing sensing and communication simultaneously (i.e., in the same time and frequency band). The SI due to the IBFD operation is cancelled in the propagation, analogue, and digital domains; only the residual SI (RSI) is retained for performance analysis. Considering the subarray-based hybrid beamforming structure, including HWI and RF effective SI channel estimation error, an unscented Kalman filter is used for tracking multiple vehicles in the studied scenario. The proposed system shows an enhanced SE compared with the HD system, and the tracking MSEs of each state parameter, averaged across all vehicles, are close to their posterior Cramér-Rao lower bounds. Thirdly, we analyse the performance of multi-cell wideband single-hop backhaul FR2-IBFD-IAB networks using stochastic geometry analysis.
We model the wired-connected next generation NodeBs (gNBs) as a Matérn hard-core point process (MHCPP) to reflect real-world deployment requirements and to reduce the cost of wired connections in the network. We first derive association probabilities that reflect how likely the typical user equipment is to be served by a gNB or an IAB-node, based on the maximum long-term averaged biased-received-desired-signal power criterion. Further, by leveraging the composite Gamma-Lognormal distribution, we derive results for the signal-to-interference-plus-noise ratio coverage, capacity with outage, and ergodic capacity of the network. To assess the impact of noise, we consider the sidelobe gain on inter-cell interference links and the analogue-to-digital converter quantization noise. Compared with HD transmission, the designed system shows an enhanced capacity when the SIC operates successfully. We also study how the power bias, the density ratio of IAB-nodes to gNBs, and the hard-core distance affect system performance. Overall, this thesis aims to contribute to the research efforts shaping 6G wireless networks by designing and analysing FR2-IBFD-IAB networks in the FR2 band at mmWave frequencies, for potential use in 6G in both communication-only and ISAC scenarios.
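The MHCPP deployment model mentioned above can be sampled by dependent thinning of a Poisson point process (the standard Matérn type-II construction): a point is kept only if it carries the smallest mark within the hard-core distance. The sketch below shows this textbook construction with arbitrary parameters, not the values used in the thesis.

```python
# Illustrative sketch of sampling a Matérn type-II hard-core point process
# (MHCPP) for gNB locations by dependent thinning of a Poisson point process.
# Intensity, hard-core distance, and window size are arbitrary examples.
import numpy as np

def matern_hardcore_ii(lam: float, r: float, size: float, rng=None):
    rng = rng or np.random.default_rng()
    n = rng.poisson(lam * size * size)          # parent PPP on [0, size]^2
    pts = rng.uniform(0, size, (n, 2))
    marks = rng.uniform(0, 1, n)                # independent uniform marks
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        d = np.linalg.norm(pts - pts[i], axis=1)
        # Retain point i only if it has the smallest mark within distance r.
        contenders = (d < r) & (d > 0)
        if np.any(marks[contenders] < marks[i]):
            keep[i] = False
    return pts[keep]

gnbs = matern_hardcore_ii(lam=1e-4, r=50.0, size=1000.0)
print(f"{len(gnbs)} gNBs retained after hard-core thinning")
```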

    Efficiency and Sustainability of the Distributed Renewable Hybrid Power Systems Based on the Energy Internet, Blockchain Technology and Smart Contracts-Volume II

    Get PDF
    The climate changes that are becoming visible today are a challenge for the global research community. In this context, renewable energy sources, fuel cell systems, and other energy generation sources must be optimally combined and connected to the grid using advanced energy transaction methods. As this reprint presents the latest solutions for implementing fuel cells and renewable energy in mobile and stationary applications, such as hybrid and microgrid power systems based on the Energy Internet, Blockchain technology, and smart contracts, we hope that it will be of interest to readers working in the related fields mentioned above.

    Energy-efficient RL-based aerial network deployment testbed for disaster areas

    Get PDF
    Rapid deployment of wireless devices with 5G and beyond has enabled a connected world. However, an immediate increase in demand right after a disaster temporarily paralyzes network infrastructure. A continuous flow of information is crucial during disasters to coordinate rescue operations and identify survivors. Communication infrastructure built for users in disaster areas should offer rapid deployment, increased coverage, and availability. Unmanned air vehicles (UAVs) provide a potential solution for rapid deployment as they are not affected by traffic jams and physical road damage during a disaster. In addition, ad-hoc WiFi communication allows the creation of broadcast domains within a clear channel, which eases one-to-many communication. Moreover, using reinforcement learning (RL) helps reduce the computational cost and increase the accuracy of solving the NP-hard problem of aerial network deployment. To this end, a novel flying WiFi ad-hoc network management model is proposed in this paper. The model utilizes deep Q-learning to maintain quality-of-service (QoS), increase user equipment (UE) coverage, and optimize power efficiency. Furthermore, a testbed is deployed on the Istanbul Technical University (ITU) campus to train the developed model. Training results of the model on the testbed show over 90% packet delivery ratio as QoS, over 97% coverage for the users in flow tables, and an average power consumption of 0.28 kJ/bit.
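    As a toy illustration of the Q-learning loop behind such a deployment model, the sketch below trains a tabular agent that moves a single UAV on a grid to trade off user coverage against energy use. The coverage and power models, grid, and hyperparameters are invented, and the paper itself uses a deep Q-network and a richer state and reward design rather than a table.

```python
# Toy tabular Q-learning sketch for UAV placement: states are grid positions,
# actions move the UAV, and the reward trades coverage against power cost.
import numpy as np

GRID, ACTIONS = 5, 4                      # 5x5 positions; up/down/left/right
rng = np.random.default_rng(0)
Q = np.zeros((GRID * GRID, ACTIONS))
alpha, gamma, eps = 0.1, 0.9, 0.1

def step(state, action):
    x, y = divmod(state, GRID)
    dx, dy = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
    x, y = np.clip(x + dx, 0, GRID - 1), np.clip(y + dy, 0, GRID - 1)
    covered_users = 10 - abs(x - 2) - abs(y - 2)   # hypothetical coverage model
    power_cost = 1.0                               # hypothetical per-move energy
    return x * GRID + y, covered_users - power_cost

state = 0
for _ in range(5000):
    a = rng.integers(ACTIONS) if rng.random() < eps else int(Q[state].argmax())
    nxt, reward = step(state, a)
    # Standard one-step Q-learning update.
    Q[state, a] += alpha * (reward + gamma * Q[nxt].max() - Q[state, a])
    state = nxt

print("learned best position:", divmod(int(Q.max(axis=1).argmax()), GRID))
```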