54 research outputs found

    RackBlox: A Software-Defined Rack-Scale Storage System with Network-Storage Co-Design

    Full text link
    Software-defined networking (SDN) and software-defined flash (SDF) have been serving as the backbone of modern data centers. They are managed separately to handle I/O requests. At first glance, this is a reasonable design that follows rack-scale hierarchical design principles. However, it suffers from suboptimal end-to-end performance due to the lack of coordination between SDN and SDF. In this paper, we co-design the SDN and SDF stacks by redefining the functions of their control and data planes and splitting them up within a new architecture named RackBlox. RackBlox decouples the storage management functions of flash-based solid-state drives (SSDs) and allows the SDN to track and manage the states of SSDs in a rack. This enables state sharing between SDN and SDF and facilitates global storage resource management. RackBlox has three major components: (1) coordinated I/O scheduling, which dynamically adjusts I/O scheduling in the storage stack based on measured and predicted network latency, coordinating scheduling across the network and storage stacks to achieve predictable end-to-end performance; (2) coordinated garbage collection (GC), which coordinates GC activities across the SSDs in a rack to minimize their impact on incoming I/O requests; and (3) rack-scale wear leveling, which enables global wear leveling among SSDs in a rack by periodically swapping data, improving device lifetime for the entire rack. We implement RackBlox using programmable SSDs and a programmable switch. Our experiments demonstrate that RackBlox reduces the tail latency of I/O requests by up to 5.8x over state-of-the-art rack-scale storage systems. Comment: 14 pages. Published in the ACM SIGOPS 29th Symposium on Operating Systems Principles (SOSP'23).
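    As an illustration of the coordinated I/O scheduling and GC ideas above, the following Python sketch deducts the measured or predicted network latency from an end-to-end budget and steers a request away from replicas whose SSD is currently garbage-collecting. The names, fields, and thresholds are hypothetical and not taken from the RackBlox implementation.

```python
# Hypothetical sketch of rack-scale coordinated I/O scheduling: the switch-side
# scheduler deducts predicted network latency from the end-to-end budget and
# steers requests away from SSDs that the SDF control plane reports as in GC.
from dataclasses import dataclass

@dataclass
class SsdState:
    ssd_id: int
    in_gc: bool          # reported by the SDF control plane (assumed field)
    queue_depth: int     # outstanding requests on this SSD

def pick_ssd(replicas: list[SsdState]) -> SsdState:
    """Prefer replicas that are not garbage-collecting, then the shortest queue."""
    return min(replicas, key=lambda s: (s.in_gc, s.queue_depth))

def storage_deadline_us(e2e_budget_us: float, predicted_net_latency_us: float) -> float:
    """Time the storage stack may still spend once network latency is accounted for."""
    return max(e2e_budget_us - predicted_net_latency_us, 0.0)

if __name__ == "__main__":
    replicas = [SsdState(0, in_gc=True, queue_depth=4),
                SsdState(1, in_gc=False, queue_depth=9),
                SsdState(2, in_gc=False, queue_depth=3)]
    target = pick_ssd(replicas)
    print(target.ssd_id,
          storage_deadline_us(e2e_budget_us=500.0, predicted_net_latency_us=120.0))
```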

    Improved Signal Detection for Ambient Backscatter Communications

    Full text link
    In ambient backscatter communication (AmBC) systems, passive tags connect to a reader by reflecting an ambient radio frequency (RF) signal. However, the reader may not know the channel states or RF source parameters and can experience interference. The traditional energy detector (TED) appears to be a natural solution, but it performs poorly under these conditions. To address this, we propose two new detectors: (1) a joint correlation-energy detector (JCED) based on the first-order correlation of the received samples, and (2) an improved energy detector (IED) based on the p-th norm of the received signal vector. We compare the performance of the IED and TED under generalized noise modeled with the McLeish distribution and derive a general analytical formula for the area under the receiver operating characteristic (ROC) curve. Based on our results, both detectors outperform the TED. For example, at a false alarm rate of 1%, the probability of detection for the JCED and IED is 14% and 5% higher, respectively, than for the TED. These gains are even larger with the direct interference cancellation (DIC) technique, with increases of 16% and 7%, respectively. Overall, our proposed detectors offer better performance than the TED, making them useful tools for improving AmBC system performance. Comment: This paper has received a major revision decision from IEEE TGC
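    The following Python sketch illustrates the two detection ideas in a hedged form: an IED-style statistic built from the p-th norm of the received samples and a JCED-style statistic that mixes energy with the first-order (lag-1) correlation. The exponent p, the mixing weight, and the thresholds are illustrative assumptions, not the paper's derived values.

```python
# Illustrative detector sketches; the exact test statistics and thresholds in the
# paper may differ. `p`, `alpha`, and `threshold` are assumptions for demonstration.
import numpy as np

def ied_statistic(y: np.ndarray, p: float = 2.5) -> float:
    """Improved energy detector: p-th power mean of the received sample magnitudes."""
    return float(np.mean(np.abs(y) ** p))

def jced_statistic(y: np.ndarray, alpha: float = 0.5) -> float:
    """Joint correlation-energy detector: blend energy with lag-1 correlation."""
    energy = np.mean(np.abs(y) ** 2)
    corr = np.abs(np.mean(y[1:] * np.conj(y[:-1])))
    return float(alpha * energy + (1 - alpha) * corr)

def decide(statistic: float, threshold: float) -> int:
    """Return 1 if the tag reflection is declared present, else 0."""
    return int(statistic > threshold)

# Toy usage on noise-only samples (complex Gaussian, unit power).
rng = np.random.default_rng(0)
noise = (rng.standard_normal(512) + 1j * rng.standard_normal(512)) / np.sqrt(2)
print(decide(ied_statistic(noise), threshold=1.0),
      decide(jced_statistic(noise), threshold=0.8))
```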

    A Comprehensive Study on Off-path SmartNIC

    Full text link
    SmartNICs have recently emerged as attractive devices for accelerating distributed systems. However, there has been no comprehensive characterization of SmartNICs, especially of their network component. This paper presents the first comprehensive study of off-path SmartNICs. Our experimental study uncovers the key performance characteristics of the communication among the client, the SmartNIC SoC, and the host. We find that, without accounting for the SmartNIC hardware architecture, communication with it can suffer up to 48% bandwidth degradation due to performance anomalies. We also propose design implications for addressing these anomalies. Comment: This is the short version. The full version will appear at OSDI2

    Personal Data Stores (PDS): A Review

    Get PDF
    Internet services have collected our personal data since their inception. In the beginning, personal data collection was uncoordinated and limited to a few selected data types such as names, ages, and birthdays. Due to the widespread use of social media, more and more personal data has been collected by different online services. Internet of Things (IoT) devices are also increasingly being adopted by consumers, making it possible for companies to capture personal data (including very sensitive data) autonomously, with much less effort, and at very low cost. Current system architectures collect, store, and process our personal data in the cloud, giving citizens very limited control over it. However, Personal Data Stores (PDS) have been proposed as an alternative architecture in which personal data is stored within households, giving us complete control (self-sovereignty) over our data. This paper surveys the current literature on Personal Data Stores (PDS) that enable individuals to collect, control, store, and manage their data. In particular, we provide a comprehensive review of related concepts and the expected benefits of PDS platforms. Further, we compare and analyse existing PDS platforms in terms of their capabilities and core components. Finally, we summarise the major challenges and issues facing PDS platforms' development and widespread adoption.

    A one-pass clustering based sketch method for network monitoring

    Get PDF
    Network monitoring solutions need to cope with increasing network traffic volumes; as a result, sketch-based monitoring methods have been extensively studied to trade accuracy for memory scalability and storage reduction. However, sketches are sensitive to skewness in network flow distributions due to hash collisions, and they need complicated performance optimization to adapt to line-rate packet streams. We present Jellyfish, an efficient sketch method that performs one-pass clustering over the network stream. One-pass clustering is realized by adapting the monitoring granularity from whole network flows to fragments called subflows, which not only reduces the ingestion rate but also provides an efficient intermediate representation as input to the sketch. Jellyfish provides a network-flow-level query interface by reconstructing network-flow-level counters from the subflow records that belong to the same network flow. We provide a probabilistic analysis of the expected accuracy of both existing sketch methods and Jellyfish. Real-world trace-driven experiments show that Jellyfish reduces the average estimation errors by up to six orders of magnitude for per-flow queries, by six orders of magnitude for entropy queries, and by up to ten times for heavy-hitter queries. This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 61972409; in part by the Hong Kong Research Grants Council (RGC) under Grant TRS T41-603/20-R, Grant GRF-16213621, and Grant ITF ACCESS; in part by the Spanish I+D+i project TRAINER-A, funded by MCIN/AEI/10.13039/501100011033, under Grant PID2020-118011GB-C21; and in part by the Catalan Institution for Research and Advanced Studies (ICREA Academia).
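    A minimal Python sketch of the subflow idea follows: packets are first aggregated into fixed-size subflow records, the records update a count-min sketch, and per-flow counters are answered by accumulating the subflows of the same flow. The subflow size and sketch dimensions are assumptions for illustration; they are not Jellyfish's actual parameters or data structure.

```python
# Subflow-granularity counting into a count-min sketch (illustrative, not Jellyfish).
import hashlib

class CountMinSketch:
    def __init__(self, width=1024, depth=3):
        self.width, self.depth = width, depth
        self.tables = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key, count):
        for row in range(self.depth):
            self.tables[row][self._index(row, key)] += count

    def query(self, key):
        return min(self.tables[row][self._index(row, key)] for row in range(self.depth))

SUBFLOW_PKTS = 16  # assumed subflow granularity

def ingest(packets, sketch):
    """Group a packet stream (one flow_id per packet) into subflow records, then insert."""
    pending = {}
    for flow_id in packets:
        pending[flow_id] = pending.get(flow_id, 0) + 1
        if pending[flow_id] == SUBFLOW_PKTS:        # flush a full subflow record
            sketch.add(flow_id, SUBFLOW_PKTS)
            pending[flow_id] = 0
    for flow_id, leftover in pending.items():       # flush partial subflows at the end
        if leftover:
            sketch.add(flow_id, leftover)

cms = CountMinSketch()
ingest(["f1"] * 40 + ["f2"] * 5, cms)
print(cms.query("f1"), cms.query("f2"))  # per-flow estimates merged from subflow records
```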

    Towards Scalable OLTP Over Fast Networks

    Get PDF
    Online Transaction Processing (OLTP) underpins real-time data processing in many mission-critical applications, from banking to e-commerce. These applications typically issue short-duration, latency-sensitive transactions that demand immediate processing. High-volume applications, such as Alibaba's e-commerce platform, reach peak rates as high as 70 million transactions per second, exceeding the capacity of a single machine, so distributed OLTP database management systems (DBMSs) are deployed across multiple powerful machines. Historically, such distributed OLTP DBMSs have been designed primarily to avoid network communication, a paradigm largely unchanged since the 1980s. However, fast networks challenge the conventional belief that network communication is the main bottleneck. In particular, emerging network technologies such as Remote Direct Memory Access (RDMA) radically alter how data can be accessed over a network: RDMA primitives allow direct access to the memory of a remote machine at latencies within an order of magnitude of local memory access. Because traditional distributed database systems were designed under the premise that the network is slow, they cannot efficiently exploit these fast network primitives, which requires us to reconsider how we design distributed OLTP systems. This thesis focuses on the challenges RDMA presents and its implications for the design of distributed OLTP systems. First, we examine distributed architectures to understand data access patterns and scalability in modern OLTP systems. Drawing on these insights, we advocate a distributed storage engine optimized for high-speed networks. The storage engine serves as the foundation of a database, ensuring efficient data access through three central components: indexes, synchronization primitives, and buffer management (caching). With the introduction of RDMA, the landscape of data access has undergone a significant transformation, requiring a comprehensive redesign of the storage engine components to exploit the potential of RDMA and similar high-speed network technologies. Thus, as the second contribution, we design RDMA-optimized tree-based indexes, especially applicable to disaggregated databases that need to access remote data efficiently. We then turn our attention to the unique challenges of RDMA. One-sided RDMA, one of the network primitives RDMA introduces, offers a performance advantage by enabling remote memory access that bypasses the remote CPU and operating system. This allows the remote CPU to process transactions uninterrupted, with no need to be on hand for network communication. However, because traditional CPU-driven synchronization primitives are bypassed, specialized one-sided RDMA synchronization primitives are required. We found that existing one-sided RDMA synchronization schemes are unscalable or, even worse, fail to synchronize correctly, leading to hard-to-detect data corruption. As our third contribution, we address this issue by offering guidelines for building scalable and correct one-sided RDMA synchronization primitives. Finally, recognizing that keeping all data in memory becomes economically unattractive, we propose a distributed buffer manager design that efficiently utilizes cost-effective NVMe flash storage. By leveraging low-latency RDMA messages, our buffer manager provides a transparent memory abstraction that accesses the aggregated DRAM and NVMe storage across nodes. Central to our approach is a distributed caching protocol that dynamically caches data. With this approach, our system can outperform RDMA-enabled in-memory distributed databases while managing larger-than-memory datasets efficiently.
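    The one-sided synchronization problem discussed above can be illustrated with a version-based optimistic read, a pattern commonly paired with one-sided RDMA reads. The sketch below simulates it in plain Python rather than with real RDMA verbs, and the record layout and retry policy are assumptions rather than the thesis's actual guidelines.

```python
# Conceptual simulation of version-based optimistic reads: the reader checks a
# record's version before and after copying the payload and retries if a
# concurrent writer changed it. Real systems would use one-sided RDMA READs and
# a CAS-based writer latch; here a threading.Lock stands in for the latch.
import threading

class Record:
    def __init__(self, value):
        self.version = 0                 # even = stable; incremented around each update
        self.value = value
        self._lock = threading.Lock()    # stand-in for the writer's CAS-acquired latch

    def write(self, new_value):
        with self._lock:
            self.version += 1            # mark record as being updated (odd version)
            self.value = new_value
            self.version += 1            # publish the update (even version again)

def one_sided_read(record, max_retries=100):
    """Optimistic read: succeed only if the version was stable and unchanged."""
    for _ in range(max_retries):
        v_before = record.version
        if v_before % 2 == 1:            # writer in progress, retry
            continue
        snapshot = record.value          # corresponds to the one-sided READ of the payload
        if record.version == v_before:   # re-check: no concurrent update observed
            return snapshot
    raise RuntimeError("read did not stabilize")

r = Record({"balance": 100})
r.write({"balance": 120})
print(one_sided_read(r))
```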

    Survivability with Adaptive Routing and Reactive Defragmentation in IP-over-EON after A Router Outage

    Get PDF
    A router outage in the IP layer can cause network survivability issues in IP-over-elastic-optical networks, affecting the existing connections that transit the failed router. Recovery usually reroutes the affected traffic over paths that utilize the spare capacity of the unaffected lightpaths on each link. However, the spare capacity on some links can be insufficient and must then be spectrally expanded, and a new lightpath is required when such expansion is impossible. Both processes normally lead to a large number of lightpath reconfigurations when applied to different unaffected lightpaths. This study therefore proposes an adaptive routing strategy that generates the best recovery path, optimizing the use of unaffected lightpaths for reconfiguration and minimizing the additional spectrum needed during expansion. A reactive defragmentation strategy is also applied when spectrum expansion is blocked by occupied neighboring spectrum. The proposed strategy is called lightpath reconfiguration and spectrum expansion with reactive defragmentation (LRSE+RD), and its performance was compared to the first Shortest Path (1SP) as the benchmark without a reactive defragmentation strategy. Simulations on two topologies under two traffic conditions showed that LRSE+RD reduced the number of lightpath reconfigurations, the number of new lightpaths, the additional power consumption, and the additional operational expense compared to 1SP.
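    The path-selection criterion described above can be sketched as a lexicographic cost over candidate recovery paths: avoid new lightpaths where possible, then minimize lightpath reconfigurations, then minimize added spectrum. The fields and ordering below are illustrative assumptions, not the exact LRSE+RD algorithm.

```python
# Illustrative recovery-path selection under a lexicographic cost (assumed ordering).
from dataclasses import dataclass

@dataclass
class CandidatePath:
    links: list
    reconfigurations: int      # unaffected lightpaths that would need reconfiguration
    extra_spectrum_slots: int  # slots added by spectrum expansion along the path
    needs_new_lightpath: bool  # expansion impossible on at least one link

def path_cost(p: CandidatePath) -> tuple:
    # Prefer paths without new lightpaths, then fewer reconfigurations, then less spectrum.
    return (p.needs_new_lightpath, p.reconfigurations, p.extra_spectrum_slots)

def select_recovery_path(candidates: list) -> CandidatePath:
    return min(candidates, key=path_cost)

best = select_recovery_path([
    CandidatePath(["A-B", "B-C"], reconfigurations=3, extra_spectrum_slots=2, needs_new_lightpath=False),
    CandidatePath(["A-D", "D-C"], reconfigurations=1, extra_spectrum_slots=4, needs_new_lightpath=False),
])
print(best.links)
```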

    A bidirectional wavelength division multiplexed (WDM) free space optical communication (FSO) system for deployment in data center networks (DCNs)

    Get PDF
    Data centers are crucial to the growth of cloud computing, and next-generation data center networks (DCNs) will rely heavily on optical technology. Here, we investigate a bidirectional wavelength-division-multiplexed (WDM) free space optical communication (FSO) system for deployment in optical wireless DCNs. The system was evaluated with symmetric 10 Gbps 16-quadrature amplitude modulation (16-QAM) intensity-modulated orthogonal frequency-division multiplexing (OFDM) downstream signals and 10 Gbps on-off keying (OOK) upstream signals. The transmission of optical signals over the FSO link is demonstrated using a gamma-gamma channel model. According to the bit error rate (BER) results obtained for each WDM signal, the bidirectional WDM-FSO transmission could achieve 320 Gbps over a 1000 m free-space transmission length. The results show that the proposed FSO topology offers an excellent alternative to fiber-based optical interconnects in DCNs, allowing high-data-rate bidirectional transmission.
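    The gamma-gamma turbulence model mentioned above can be sketched by drawing the channel irradiance as the product of two independent unit-mean Gamma random variables. The alpha and beta values below are illustrative; in practice they would be derived from the Rytov variance of the actual link rather than taken from this paper.

```python
# Gamma-gamma FSO turbulence sketch: irradiance = product of two unit-mean Gammas.
import numpy as np

def gamma_gamma_irradiance(alpha: float, beta: float, n: int, rng=None) -> np.ndarray:
    rng = rng or np.random.default_rng()
    x = rng.gamma(shape=alpha, scale=1.0 / alpha, size=n)  # large-scale scintillation
    y = rng.gamma(shape=beta, scale=1.0 / beta, size=n)    # small-scale scintillation
    return x * y                                           # unit-mean channel gain

samples = gamma_gamma_irradiance(alpha=4.0, beta=1.9, n=100_000,
                                 rng=np.random.default_rng(1))
print(samples.mean(), samples.var())  # mean ~1; variance grows with turbulence strength
```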

    Online learning on the programmable dataplane

    Get PDF
    This thesis makes the case for managing computer networks with data-driven methods, that is, automated statistical inference and control based on measurement data and runtime observations, and argues for their tight integration with programmable dataplane hardware to make management decisions faster and from more precise data. Optimisation, defence, and measurement of networked infrastructure are each challenging tasks in their own right, and they are currently dominated by hand-crafted heuristic methods. These become harder to reason about and deploy as networks scale in rates and number of forwarding elements, while their design requires expert knowledge and care around unexpected protocol interactions. This makes tailored, per-deployment or per-workload solutions infeasible to develop. Recent advances in machine learning offer capable function approximation and closed-loop control that suit many of these tasks. New programmable dataplane hardware enables more agility in the network: runtime reprogrammability, precise traffic measurement, and low-latency on-path processing. The synthesis of these two developments allows complex decisions to be made on previously unusable state, and made quicker by offloading inference to the network. To justify this argument, I advance the state of the art in data-driven defence of networks, novel dataplane-friendly online reinforcement learning algorithms, and in-network data reduction to allow classification of switch-scale data. Each requires co-design aware of the network and of the failure modes of systems and carried traffic. To make online learning possible in the dataplane, I use fixed-point arithmetic and modify classical (non-neural) approaches to take advantage of the SmartNIC compute model and make use of rich device-local state. I show that data-driven solutions still require great care to design correctly, but with the right domain expertise they can improve on pathological cases in DDoS defence, such as protecting legitimate UDP traffic. In-network aggregation into histograms is shown to enable accurate classification from fine temporal effects, and allows hosts to scale such classification to far larger flow counts and traffic volumes. Moving reinforcement learning to the dataplane is shown to offer substantial benefits in state-action latency and online learning throughput versus host machines, allowing policies to react faster to fine-grained network events. The dataplane environment is key to making reactive online learning feasible; to port further algorithms and learnt functions, I collate and analyse the strengths of current and future hardware designs, as well as individual algorithms.
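    The fixed-point approach mentioned above can be sketched with a classical (non-neural) Q-learning update performed entirely in integer arithmetic, as a dataplane target without floating point would require. The Q8.8 format, learning rate, and state encoding are illustrative assumptions, not the thesis's exact design.

```python
# Fixed-point Q-learning update in Q8.8 format (integer-only, shift-based scaling).
FRAC_BITS = 8                      # Q8.8: stored value = raw / 2**8
ONE = 1 << FRAC_BITS

def to_fp(x: float) -> int:
    return int(round(x * ONE))

def fp_mul(a: int, b: int) -> int:
    return (a * b) >> FRAC_BITS    # multiply then rescale back to Q8.8

ALPHA = to_fp(0.125)               # learning rate (power of two: a cheap shift in hardware)
GAMMA = to_fp(0.9)                 # discount factor

def q_update(q_table, state, action, reward_fp, next_state):
    """Fixed-point Q-learning: Q[s,a] += alpha * (r + gamma * max_a' Q[s',a'] - Q[s,a])."""
    best_next = max(q_table[next_state])
    target = reward_fp + fp_mul(GAMMA, best_next)
    td_error = target - q_table[state][action]
    q_table[state][action] += fp_mul(ALPHA, td_error)

q = [[0, 0] for _ in range(4)]     # 4 states x 2 actions, all counters in Q8.8
q_update(q, state=0, action=1, reward_fp=to_fp(1.0), next_state=2)
print(q[0][1] / ONE)               # convert back to float only for display
```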

    Programmable Seamless Multiconnectivity (Ohjelmoitava saumaton moniliitettävyys)

    Get PDF
    We have become accustomed to our devices being always connected to the Internet. Our devices, from handhelds such as smartphones and tablets to laptops and even desktop PCs, can use both wired and wireless networks, ranging from mobile networks such as 5G (and 6G in the future) to Wi-Fi, Bluetooth, and Ethernet. The applications running on these devices can use different transport protocols, from traditional TCP and UDP to state-of-the-art protocols such as QUIC. However, most of our applications still use TCP, UDP, and other protocols much as they were originally designed in the 1980s, four decades ago. Transport connections follow a single path from source to destination, using the end-to-end principle without taking advantage of the multiple available transports. Over the years, there have been many studies on both multihoming and multipath protocols, i.e., protocols that allow transports to use multiple paths and interfaces to the destination. Using these would allow better mobility and more efficient use of available transports. However, Internet ossification has hindered their deployment. One of the main reasons for this ossification is IPv4 Network Address Translation (NAT), introduced in 1993, which allowed whole networks to be hosted behind a single public IP address. Unfortunately, how this many-to-one translation should be done was never thoroughly standardized, allowing vendors to implement their own versions of NAT. Besides breaking the end-to-end principle, the different NAT implementations also behave unpredictably when they encounter transport protocols other than traditional TCP and UDP, from forwarding packets without translating their headers to discarding packets they do not recognize. Similarly, in the context of multiconnectivity, NATs and other middleboxes such as firewalls and load balancers are likely to prevent connection establishment for multipath protocols unless they are specifically designed to support the particular protocol. One promising avenue for solving these issues is Software-Defined Networking (SDN). SDN allows the forwarding elements of the network to remain relatively simple by separating the data plane from the control plane. In SDN, the control plane is realized through SDN controllers, which determine how traffic is forwarded by the data plane. This gives controllers full control over the traffic inside the network, granting fine-grained control of connections and allowing faster deployment of new protocols. Unfortunately, SDN-capable network elements are still rare in Small Office / Home Office (SOHO) networks, as legacy forwarding elements that do not support SDN can handle the majority of contemporary protocols. The most glaring example is Wi-Fi networks, where access points (APs) typically do not support SDN and allow traffic to flow between clients outside the control of SDN controllers. In this thesis, we provide background on why multiconnectivity is still hard, even after decades of research on the problem. We also demonstrate how the same devices that made multiconnectivity hard can be used to bring SDN-based traffic control to wireless and SOHO networks, and we explore how this SDN-based traffic control can be leveraged to build a network orchestrator for controlling and managing networks consisting of heterogeneous devices and their controllers. With the insights provided by legacy devices and programmable networks, we demonstrate two different methods for providing multiconnectivity: one using network-driven programmability, and one using a userspace library that brings different multihoming and multipathing methods under one roof.
Nowadays, the devices we use are practically always connected to the Internet. They can use several different connection methods, including both wired and wireless networks such as Wi-Fi and mobile networks. However, our devices still mostly use communication protocols that were originally designed in the 1980s, when devices could communicate directly with each other without intermediate network devices hiding parts of the network behind them. This shows in protocol design: every connection has fixed source and destination addresses. Our devices still follow the same connection paradigm today, even though they could bundle several connections together and thereby make better use of the performance and other capabilities the network offers. Over the years, various multipath and multihoming protocols have been developed that allow devices to use multiple paths across the network to their destination, but these protocols have not yet become widespread because not all network devices support them. We also have only indirect influence over which connection our devices use. One solution is to adopt Software-Defined Networking (SDN), a paradigm that brings intelligence into networks and enables, among other things, more efficient traffic routing. The purpose of this dissertation is to address the problems of, and solutions to, multiconnectivity. The work sheds light on why multiconnectivity is still hard to realize and presents two techniques for implementing it: the first applies software-defined networking, building on the research carried out during the dissertation, and the second gathers several different multipath and multihoming protocols under one roof as a single multiconnectivity library. The dissertation also presents two methods for bringing software-defined networking to devices that were not designed with it in mind; these methods make it possible to manage existing networks and bring new features to them. Finally, the dissertation presents an intelligent machine-learning-based system that automatically detects and removes vulnerable devices from the network.