3,151 research outputs found

    Algorithms and Throughput Analysis for MDS-Coded Switches

    Full text link
    Network switches and routers need to serve packet writes and reads at rates that challenge the most advanced memory technologies. As a result, scaling the switching rates is commonly done by parallelizing the packet I/Os using multiple memory units. For improved read rates, packets can be coded with an [n,k] MDS code, thus giving more flexibility at read time to achieve higher utilization of the memory units. In the paper, we study the usage of [n,k] MDS codes in a switching environment. In particular, we study the algorithmic problem of maximizing the instantaneous read rate given a set of packet requests and the current layout of the coded packets in memory. The most interesting results from practical standpoint show how the complexity of reaching optimal read rate depends strongly on the writing policy of the coded packets.Comment: 6 pages, an extended version of a paper accepted to the 2015 IEEE International Symposium on Information Theory (ISIT

    Datacenter Traffic Control: Understanding Techniques and Trade-offs

    Get PDF
    Datacenters provide cost-effective and flexible access to scalable compute and storage resources necessary for today's cloud computing needs. A typical datacenter is made up of thousands of servers connected with a large network and usually managed by one operator. To provide quality access to the variety of applications and services hosted on datacenters and maximize performance, it deems necessary to use datacenter networks effectively and efficiently. Datacenter traffic is often a mix of several classes with different priorities and requirements. This includes user-generated interactive traffic, traffic with deadlines, and long-running traffic. To this end, custom transport protocols and traffic management techniques have been developed to improve datacenter network performance. In this tutorial paper, we review the general architecture of datacenter networks, various topologies proposed for them, their traffic properties, general traffic control challenges in datacenters and general traffic control objectives. The purpose of this paper is to bring out the important characteristics of traffic control in datacenters and not to survey all existing solutions (as it is virtually impossible due to massive body of existing research). We hope to provide readers with a wide range of options and factors while considering a variety of traffic control mechanisms. We discuss various characteristics of datacenter traffic control including management schemes, transmission control, traffic shaping, prioritization, load balancing, multipathing, and traffic scheduling. Next, we point to several open challenges as well as new and interesting networking paradigms. At the end of this paper, we briefly review inter-datacenter networks that connect geographically dispersed datacenters which have been receiving increasing attention recently and pose interesting and novel research problems.Comment: Accepted for Publication in IEEE Communications Surveys and Tutorial

    Improving Distributed Filesystem Performance by Combining Replica and Network Path Selection

    Get PDF
    Distributed filesystems are often the primary bandwidth consumers of large-scale datacenter networks. Unsurprisingly, the datacenter network is often the performance bottleneck for distributed filesystems. Yet even with this close relationship, current distributed filesystems and networks are designed independently and communicate over narrow interfaces that expose only their basic functionalities. Even network-aware distributed filesystems only make use of rudimentary network information, and are not reciprocally involved in making network decisions that affect filesystem performance. In this thesis, we introduce Mayflower, a new distributed filesystem co-designed with the control plane of its underlying datacenter network. This design approach enables Mayflower to combine both filesystem and network information to make replica selection and dynamic flow scheduling decisions. By having more information and controlling both the filesystem and the network, Mayflower can perform optimizations that are unavailable to conventional distributed filesystems and network control planes. Our evaluation results using a real implementation show that Mayflower reduces average read completion time by more than 60% compared to HDFS with ECMP

    A low cost reconfigurable soft processor for multimedia applications: design synthesis and programming model

    Get PDF
    This paper presents an FPGA implementation of a low cost 8 bit reconfigurable processor core for media processing applications. The core is optimized to provide all basic arithmetic and logic functions required by the media processing and other domains, as well as to make it easily integrable into a 2D array. This paper presents an investigation of the feasibility of the core as a potential soft processing architecture for FPGA platforms. The core was synthesized on the entire Virtex FPGA family to evaluate its overall performance, scalability and portability. A special feature of the proposed architecture is its simple programming model which allows low level programming. Throughput results for popular benchmarks coded using the programming model and cycle accurate simulator are presented

    Pixie: A heterogeneous Virtual Coarse-Grained Reconfigurable Array for high performance image processing applications

    Full text link
    Coarse-Grained Reconfigurable Arrays (CGRAs) enable ease of programmability and result in low development costs. They enable the ease of use specifically in reconfigurable computing applications. The smaller cost of compilation and reduced reconfiguration overhead enables them to become attractive platforms for accelerating high-performance computing applications such as image processing. The CGRAs are ASICs and therefore, expensive to produce. However, Field Programmable Gate Arrays (FPGAs) are relatively cheaper for low volume products but they are not so easily programmable. We combine best of both worlds by implementing a Virtual Coarse-Grained Reconfigurable Array (VCGRA) on FPGA. VCGRAs are a trade off between FPGA with large routing overheads and ASICs. In this perspective we present a novel heterogeneous Virtual Coarse-Grained Reconfigurable Array (VCGRA) called "Pixie" which is suitable for implementing high performance image processing applications. The proposed VCGRA contains generic processing elements and virtual channels that are described using the Hardware Description Language VHDL. Both elements have been optimized by using the parameterized configuration tool flow and result in a resource reduction of 24% for each processing elements and 82% for each virtual channels respectively.Comment: Presented at 3rd International Workshop on Overlay Architectures for FPGAs (OLAF 2017) arXiv:1704.0880

    Container network functions: bringing NFV to the network edge

    Get PDF
    In order to cope with the increasing network utilization driven by new mobile clients, and to satisfy demand for new network services and performance guarantees, telecommunication service providers are exploiting virtualization over their network by implementing network services in virtual machines, decoupled from legacy hardware accelerated appliances. This effort, known as NFV, reduces OPEX and provides new business opportunities. At the same time, next generation mobile, enterprise, and IoT networks are introducing the concept of computing capabilities being pushed at the network edge, in close proximity of the users. However, the heavy footprint of today's NFV platforms prevents them from operating at the network edge. In this article, we identify the opportunities of virtualization at the network edge and present Glasgow Network Functions (GNF), a container-based NFV platform that runs and orchestrates lightweight container VNFs, saving core network utilization and providing lower latency. Finally, we demonstrate three useful examples of the platform: IoT DDoS remediation, on-demand troubleshooting for telco networks, and supporting roaming of network functions

    Fabric-on-a-Chip: Toward Consolidating Packet Switching Functions on Silicon

    Get PDF
    The switching capacity of an Internet router is often dictated by the memory bandwidth required to buĀ¤er arriving packets. With the demand for greater capacity and improved service provisioning, inherent memory bandwidth limitations are encountered rendering input queued (IQ) switches and combined input and output queued (CIOQ) architectures more practical. Output-queued (OQ) switches, on the other hand, offer several highly desirable performance characteristics, including minimal average packet delay, controllable Quality of Service (QoS) provisioning and work-conservation under any admissible traffic conditions. However, the memory bandwidth requirements of such systems is O(NR), where N denotes the number of ports and R the data rate of each port. Clearly, for high port densities and data rates, this constraint dramatically limits the scalability of the switch. In an effort to retain the desirable attributes of output-queued switches, while significantly reducing the memory bandwidth requirements, distributed shared memory architectures, such as the parallel shared memory (PSM) switch/router, have recently received much attention. The principle advantage of the PSM architecture is derived from the use of slow-running memory units operating in parallel to distribute the memory bandwidth requirement. At the core of the PSM architecture is a memory management algorithm that determines, for each arriving packet, the memory unit in which it will be placed. However, to date, the computational complexity of this algorithm is O(N), thereby limiting the scalability of PSM switches. In an effort to overcome the scalability limitations, it is the goal of this dissertation to extend existing shared-memory architecture results while introducing the notion of Fabric on a Chip (FoC). In taking advantage of recent advancements in integrated circuit technologies, FoC aims to facilitate the consolidation of as many packet switching functions as possible on a single chip. Accordingly, this dissertation introduces a novel pipelined memory management algorithm, which plays a key role in the context of on-chip output- queued switch emulation. We discuss in detail the fundamental properties of the proposed scheme, along with hardware-based implementation results that illustrate its scalability and performance attributes. To complement the main effort and further support the notion of FoC, we provide performance analysis of output queued cell switches with heterogeneous traffic. The result is a flexible tool for obtaining bounds on the memory requirements in output queued switches under a wide range of traĀ¢ c scenarios. Additionally, we present a reconfigurable high-speed hardware architecture for real-time generation of packets for the various traffic scenarios. The work presented in this thesis aims at providing pragmatic foundations for designing next-generation, high-performance Internet switches and routers

    Domain-specific and reconfigurable instruction cells based architectures for low-power SoC

    Get PDF
    • ā€¦
    corecore