3,151 research outputs found
Algorithms and Throughput Analysis for MDS-Coded Switches
Network switches and routers need to serve packet writes and reads at rates
that challenge the most advanced memory technologies. As a result, scaling the
switching rates is commonly done by parallelizing the packet I/Os using
multiple memory units. For improved read rates, packets can be coded with an
[n,k] MDS code, thus giving more flexibility at read time to achieve higher
utilization of the memory units. In the paper, we study the usage of [n,k] MDS
codes in a switching environment. In particular, we study the algorithmic
problem of maximizing the instantaneous read rate given a set of packet
requests and the current layout of the coded packets in memory. The most
interesting results from practical standpoint show how the complexity of
reaching optimal read rate depends strongly on the writing policy of the coded
packets.Comment: 6 pages, an extended version of a paper accepted to the 2015 IEEE
International Symposium on Information Theory (ISIT
Datacenter Traffic Control: Understanding Techniques and Trade-offs
Datacenters provide cost-effective and flexible access to scalable compute
and storage resources necessary for today's cloud computing needs. A typical
datacenter is made up of thousands of servers connected with a large network
and usually managed by one operator. To provide quality access to the variety
of applications and services hosted on datacenters and maximize performance, it
deems necessary to use datacenter networks effectively and efficiently.
Datacenter traffic is often a mix of several classes with different priorities
and requirements. This includes user-generated interactive traffic, traffic
with deadlines, and long-running traffic. To this end, custom transport
protocols and traffic management techniques have been developed to improve
datacenter network performance.
In this tutorial paper, we review the general architecture of datacenter
networks, various topologies proposed for them, their traffic properties,
general traffic control challenges in datacenters and general traffic control
objectives. The purpose of this paper is to bring out the important
characteristics of traffic control in datacenters and not to survey all
existing solutions (as it is virtually impossible due to massive body of
existing research). We hope to provide readers with a wide range of options and
factors while considering a variety of traffic control mechanisms. We discuss
various characteristics of datacenter traffic control including management
schemes, transmission control, traffic shaping, prioritization, load balancing,
multipathing, and traffic scheduling. Next, we point to several open challenges
as well as new and interesting networking paradigms. At the end of this paper,
we briefly review inter-datacenter networks that connect geographically
dispersed datacenters which have been receiving increasing attention recently
and pose interesting and novel research problems.Comment: Accepted for Publication in IEEE Communications Surveys and Tutorial
Improving Distributed Filesystem Performance by Combining Replica and Network Path Selection
Distributed filesystems are often the primary bandwidth consumers of
large-scale datacenter networks. Unsurprisingly, the datacenter network is
often the performance bottleneck for distributed filesystems. Yet even with
this close relationship, current distributed filesystems and networks are
designed independently and communicate over narrow interfaces that expose only
their basic functionalities. Even network-aware distributed
filesystems only make use of rudimentary network information, and are not
reciprocally involved in making network decisions that affect filesystem
performance.
In this thesis, we introduce Mayflower,
a new distributed filesystem co-designed
with the control plane of its underlying datacenter network. This design
approach enables Mayflower to combine both filesystem and network information
to make replica selection and dynamic flow scheduling decisions. By having more
information and controlling both the filesystem and the network, Mayflower can
perform optimizations that are unavailable to conventional distributed
filesystems and network control planes.
Our evaluation results using a real implementation
show that Mayflower reduces average
read completion time by more than 60% compared to HDFS with ECMP
A low cost reconfigurable soft processor for multimedia applications: design synthesis and programming model
This paper presents an FPGA implementation of a low cost 8 bit reconfigurable processor core for media processing applications. The core is optimized to provide all basic arithmetic and logic functions required by the media processing and other domains, as well as to make it easily integrable into a 2D array. This paper presents an investigation of the feasibility of the core as a potential soft processing architecture for FPGA platforms. The core was synthesized on the entire Virtex FPGA family to evaluate its overall performance, scalability and portability. A special feature of the proposed architecture is its simple programming model which allows low level programming. Throughput results for popular benchmarks coded using the programming model and cycle accurate simulator are presented
Pixie: A heterogeneous Virtual Coarse-Grained Reconfigurable Array for high performance image processing applications
Coarse-Grained Reconfigurable Arrays (CGRAs) enable ease of programmability
and result in low development costs. They enable the ease of use specifically
in reconfigurable computing applications. The smaller cost of compilation and
reduced reconfiguration overhead enables them to become attractive platforms
for accelerating high-performance computing applications such as image
processing. The CGRAs are ASICs and therefore, expensive to produce. However,
Field Programmable Gate Arrays (FPGAs) are relatively cheaper for low volume
products but they are not so easily programmable. We combine best of both
worlds by implementing a Virtual Coarse-Grained Reconfigurable Array (VCGRA) on
FPGA. VCGRAs are a trade off between FPGA with large routing overheads and
ASICs. In this perspective we present a novel heterogeneous Virtual
Coarse-Grained Reconfigurable Array (VCGRA) called "Pixie" which is suitable
for implementing high performance image processing applications. The proposed
VCGRA contains generic processing elements and virtual channels that are
described using the Hardware Description Language VHDL. Both elements have been
optimized by using the parameterized configuration tool flow and result in a
resource reduction of 24% for each processing elements and 82% for each virtual
channels respectively.Comment: Presented at 3rd International Workshop on Overlay Architectures for
FPGAs (OLAF 2017) arXiv:1704.0880
Container network functions: bringing NFV to the network edge
In order to cope with the increasing network utilization driven by new mobile clients, and to satisfy demand for new network services and performance guarantees, telecommunication service providers are exploiting virtualization over their network by implementing network services in virtual machines, decoupled from legacy hardware accelerated appliances. This effort, known as NFV, reduces OPEX and provides new business opportunities. At the same time, next generation mobile, enterprise, and IoT networks are introducing the concept of computing capabilities being pushed at the network edge, in close proximity of the users. However, the heavy footprint of today's NFV platforms prevents them from operating at the network edge. In this article, we identify the opportunities of virtualization at the network edge and present Glasgow Network Functions (GNF), a container-based NFV platform that runs and orchestrates lightweight container VNFs, saving core network utilization and providing lower latency. Finally, we demonstrate three useful examples of the platform: IoT DDoS remediation, on-demand troubleshooting for telco networks, and supporting roaming of network functions
Fabric-on-a-Chip: Toward Consolidating Packet Switching Functions on Silicon
The switching capacity of an Internet router is often dictated by the memory bandwidth required to buĀ¤er arriving packets. With the demand for greater capacity and improved service provisioning, inherent memory bandwidth limitations are encountered rendering input queued (IQ) switches and combined input and output queued (CIOQ) architectures more practical. Output-queued (OQ) switches, on the other hand, offer several highly desirable performance characteristics, including minimal average packet delay, controllable Quality of Service (QoS) provisioning and work-conservation under any admissible traffic conditions. However, the memory bandwidth requirements of such systems is O(NR), where N denotes the number of ports and R the data rate of each port. Clearly, for high port densities and data rates, this constraint dramatically limits the scalability of the switch.
In an effort to retain the desirable attributes of output-queued switches, while significantly reducing the memory bandwidth requirements, distributed shared memory architectures, such as the parallel shared memory (PSM) switch/router, have recently received much attention. The principle advantage of the PSM architecture is derived from the use of slow-running memory units operating in parallel to distribute the memory bandwidth requirement. At the core of the PSM architecture is a memory management algorithm that determines, for each arriving packet, the memory unit in which it will be placed. However, to date, the computational complexity of this algorithm is O(N), thereby limiting the scalability of PSM switches.
In an effort to overcome the scalability limitations, it is the goal of this dissertation to extend existing shared-memory architecture results while introducing the notion of Fabric on a Chip (FoC). In taking advantage of recent advancements in integrated circuit technologies, FoC aims to facilitate the consolidation of as many packet switching functions as possible on a single chip. Accordingly, this dissertation introduces a novel pipelined memory management algorithm, which plays a key role in the context of on-chip output- queued switch emulation. We discuss in detail the fundamental properties of the proposed scheme, along with hardware-based implementation results that illustrate its scalability and performance attributes.
To complement the main effort and further support the notion of FoC, we provide performance analysis of output queued cell switches with heterogeneous traffic. The result is a flexible tool for obtaining bounds on the memory requirements in output queued switches under a wide range of traĀ¢ c scenarios. Additionally, we present a reconfigurable high-speed hardware architecture for real-time generation of packets for the various traffic scenarios. The work presented in this thesis aims at providing pragmatic foundations for designing next-generation, high-performance Internet switches and routers
- ā¦