Achieving 100% throughput for multicast traffic in input-queued switches
A general approach to designing an input-queued multicast switch is to employ a multicast switch fabric, where packets can be replicated inside the switch fabric. As compared with a unicast switch fabric, the achievable traffic rate region of the switch can be increased, but it is still smaller than the admissible traffic rate region. In other words, achieving 100% throughput for any admissible multicast traffic pattern is not possible. In this paper, we first revisit the fundamental problems faced by input-queued switches in supporting multicast traffic. We then argue that a multicast switch fabric is not necessary if a load-balanced approach is followed. Accordingly, an existing load-balanced two-stage switch architecture [12], consisting of unicast switch fabrics, can be adopted to provide 100% throughput for any admissible multicast traffic pattern. Since the two-stage switch requires no speedup in either the switch fabric or the packet buffers, we consider it a two-stage input-queued switch. Its implementation complexity is much lower than that of conventional (single-stage) input-queued multicast switches. As compared with the work in [12], our approach is more systematic and we propose a more effective load-balancing mechanism. © 2011 IEEE. Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM 2011), Houston, TX, USA, 5-9 December 201
Network Coding in a Multicast Switch
We consider the problem of serving multicast flows in a crossbar switch. We
show that linear network coding across packets of a flow can sustain traffic
patterns that cannot be served if network coding were not allowed. Thus,
network coding leads to a larger rate region in a multicast crossbar switch. We
demonstrate a traffic pattern which requires a switch speedup if coding is not
allowed, whereas, with coding the speedup requirement is eliminated completely.
In addition to throughput benefits, coding simplifies the characterization of
the rate region. We give a graph-theoretic characterization of the rate region
with fanout splitting and intra-flow coding, in terms of the stable set
polytope of the 'enhanced conflict graph' of the traffic pattern. Such a
formulation is not known in the case of fanout splitting without coding. We
show that computing the offline schedule (i.e. using prior knowledge of the
flow arrival rates) can be reduced to certain graph coloring problems. Finally,
we propose online algorithms (i.e. using only the current queue occupancy
information) for multicast scheduling based on our graph-theoretic formulation.
In particular, we show that a maximum weighted stable set algorithm stabilizes
the queues for all rates within the rate region.
Comment: 9 pages, submitted to IEEE INFOCOM 200
Optimistic Parallel State-Machine Replication
State-machine replication, a fundamental approach to fault tolerance,
requires replicas to execute commands deterministically, which usually results
in sequential execution of commands. Sequential execution limits performance
and underuses servers, which are increasingly parallel (i.e., multicore). To
narrow the gap between state-machine replication requirements and the
characteristics of modern servers, researchers have recently come up with
alternative execution models. This paper surveys existing approaches to
parallel state-machine replication and proposes a novel optimistic protocol
that inherits the scalable features of previous techniques. Using a replicated
B+-tree service, we demonstrate in the paper that our protocol outperforms the
most efficient techniques by a factor of 2.4.
FTMS: an efficient multicast scheduling algorithm for feedback-based two-stage switch
Session - NGNI02: Router Architecture & Switch Design
Two major challenges in designing high-speed multicast switches are the expensive multicast switch fabric and the highly complicated central scheduler. While the recent load-balanced switch architecture uses a simple unicast switch fabric and does not require a central scheduler, it is only good at handling unicast traffic. In this paper, we extend an existing load-balanced switch called the feedback-based two-stage switch to support multicast traffic. In particular, an efficient multicast scheduling algorithm (FTMS) is designed. With FTMS, head-of-line (HOL) packet blocking at each input port is eliminated by adopting 'pointer' queues. To cut down queuing delay, packet replication is carried out at middle-stage ports. As compared with other multicast scheduling algorithms, simulation results show that our FTMS always provides the highest throughput. © 2012 IEEE.
Software Defined Networks based Smart Grid Communication: A Comprehensive Survey
The current power grid is no longer a feasible solution due to
ever-increasing user demand for electricity, old infrastructure, and reliability
issues, and thus requires transformation into a better grid, a.k.a. the smart grid
(SG). The key features that distinguish SG from the conventional electrical
power grid are its capability to perform two-way communication, demand side
management, and real time pricing. Despite all these advantages that SG will
bring, there are certain issues which are specific to SG communication system.
For instance, network management of current SG systems is complex, time
consuming, and done manually. Moreover, the SG communication (SGC) system is built
on different vendor-specific devices and protocols. Therefore, the current SG
systems are not protocol independent, which leads to interoperability issues.
Software defined network (SDN) has been proposed to monitor and manage the
communication networks globally. This article serves as a comprehensive survey
on SDN-based SGC. In this article, we first discuss a taxonomy of the advantages of
SDN-based SGC. We then discuss SDN-based SGC architectures, along with case
studies. Our article provides an in-depth discussion on routing schemes for
SDN-based SGC. We also provide a detailed survey of security and privacy schemes
applied to SDN-based SGC. We furthermore present challenges, open issues, and
future research directions related to SDN-based SGC.
Comment: Accepte
BOOM: Broadcast Optimizations for On-chip Meshes
Future many-core chips will require an on-chip network that can support broadcasts and multicasts at good power-performance. A vanilla on-chip network would send multiple unicast packets for each broadcast packet, resulting in latency, throughput and power overheads. Recent research in on-chip multicast support has proposed forking of broadcast/multicast packets within the network at the router buffers, but these techniques are far from ideal, since they increase buffer occupancy which lowers throughput, and packets incur delay and power penalties at each router. In this work, we analyze an ideal broadcast mesh; show the substantial gaps between state-of-the-art multicast NoCs and the ideal; then propose BOOM, which comprises a WHIRL routing protocol that ideally load balances broadcast traffic, a mXbar multicast crossbar circuit that enables multicast traversal at similar energy-delay as unicasts, and speculative bypassing of buffering for multicast flits. Together, they enable broadcast packets to approach the delay, energy, and throughput of the ideal fabric. Our simulations show BOOM realizing an average network latency that is 5% off ideal, attaining 96% of ideal throughput, with energy consumption that is 9% above ideal. Evaluations using synthetic traffic show BOOM achieving a latency reduction of 61%, throughput improvement of 63%, and buffer power reduction of 80% as compared to a baseline broadcast. Simulations with PARSEC benchmarks show BOOM reducing average request and network latency by 40% and 15% respectively
On Multicast in Asynchronous Networks-on-Chip: Techniques, Architectures, and FPGA Implementation
In this era of exascale computing, conventional synchronous design techniques are facing unprecedented challenges. The consumer electronics market is replete with many-core systems in the range of 16 cores to thousands of cores on chip, integrating multi-billion transistors. However, with this ever-increasing complexity, the traditional design approaches are facing key issues such as increasing chip power, process variability, aging, thermal problems, and scalability. An alternative paradigm that has gained significant interest in the last decade is asynchronous design. Asynchronous designs have several potential advantages: they are naturally energy proportional, burning power only when active, do not require complex clock distribution, are robust to different forms of variability, and provide ease of composability for heterogeneous platforms. Networks-on-chip (NoCs) are an interconnect paradigm that has been introduced to deal with the ever-increasing system complexity. NoCs provide a distributed, scalable, and efficient interconnect solution for today’s many-core systems. Moreover, NoCs are a natural match with asynchronous design techniques, as they separate communication infrastructure and timing from the computational elements. To this end, globally-asynchronous locally-synchronous (GALS) systems that interconnect multiple processing cores, operating at different clock speeds, using an asynchronous NoC, have gained significant interest. While asynchronous NoCs have several advantages, they also face a key challenge of supporting new types of traffic patterns. One such pattern is multicast communication, where a source sends packets to an arbitrary number of destinations. Multicast is not only common in parallel computing, such as for cache coherency, but also in emerging areas such as neuromorphic computing. This important capability has been largely missing from asynchronous NoCs.
This thesis introduces several efficient multicast solutions for these interconnects. In particular, techniques and network architectures are introduced to support high-performance and low-power multicast. Two leading network topologies are the focus: a variant mesh-of-trees (MoT) and a 2D mesh. In addition, for a more realistic implementation and analysis, as well as significantly advancing the field of asynchronous NoCs, this thesis also targets synthesis of these NoCs on commercial FPGAs. While there have been significant advances in FPGA technologies, there has been only limited research on implementing asynchronous NoCs on FPGAs. To this end, a systematic computer-aided design (CAD) methodology has been introduced to efficiently and safely map asynchronous NoCs on FPGAs. Overall, this thesis makes the following three contributions. The first contribution is a multicast solution for a variant MoT network topology. This topology consists of simple low-radix switches, and has been used in high-performance computing platforms. A novel local speculation technique is introduced, where a subset of the network’s switches are speculative and always broadcast every packet. These switches are very simple and have high performance. Speculative switches are surrounded by non-speculative ones that route packets based on their destinations and also throttle any redundant copies created by the former. This hybrid network architecture achieved significant performance and power benefits over other multicast approaches. The second contribution is a multicast solution for a 2D-mesh topology, which is more complex with higher-radix switches and also is more commonly used. A novel continuous-time replication strategy is introduced to optimize the critical multi-way forking operation of a multicast transmission.
In this technique, a multicast packet is first stored in an input port of a switch, from where it is sent through distinct output ports towards different destinations concurrently, at each output’s own rate and in continuous time. This strategy is shown to have significant latency and energy benefits over an approach that performs multicast using multiple distinct serial unicasts to each destination. Finally, a systematic CAD methodology is introduced to synthesize asynchronous NoCs on commercial FPGAs. A two-fold goal is targeted: correctness and high performance. For ease of implementation, only existing FPGA synthesis tools are used. Moreover, since asynchronous NoCs involve special asynchronous components, a comprehensive guide is introduced to map these elements correctly and efficiently. Two asynchronous NoC switches are synthesized using the proposed approach on a leading Xilinx FPGA in 28 nm: one that only handles unicast, and the other that also supports multicast. Both showed significant energy benefits with some performance gains over a state-of-the-art synchronous switch