102 research outputs found

    Resilient and Scalable Forwarding for Software-Defined Networks with P4-Programmable Switches

    Traditional networking devices support only fixed features and limited configurability. Network softwarization leverages programmable software and hardware platforms to remove those limitations. In this context, the concept of programmable data planes allows the packet processing pipeline of networking devices to be programmed directly and custom control plane algorithms to be created. This flexibility enables the design of novel networking mechanisms where the status quo struggles to meet the high demands of next-generation networks such as 5G, the Internet of Things, cloud computing, and Industry 4.0. P4 is the most popular technology for implementing programmable data planes. However, programmable data planes, and the P4 technology in particular, emerged only recently. Thus, P4 support for some well-established networking concepts is still lacking, and several issues remain unsolved because programmable data planes differ in their characteristics from traditional networking. The research of this thesis focuses on two open issues of programmable data planes. First, it develops resilient and efficient forwarding mechanisms for the P4 data plane, as there are no satisfactory state-of-the-art best practices yet. Second, it enables BIER in high-performance P4 data planes. BIER is a novel, scalable, and efficient transport mechanism for IP multicast traffic that so far has only very limited support on high-performance forwarding platforms. The main results of this thesis are published as eight peer-reviewed publications and one post-publication peer-reviewed publication. The results cover the development of suitable resilience mechanisms for P4 data planes, the development and implementation of resilient BIER forwarding in P4, and extensive evaluations of all developed and implemented mechanisms. Furthermore, the results contain a comprehensive P4 literature study. Two more peer-reviewed papers contain additional content that is not directly related to the main results. 
They implement congestion avoidance mechanisms in P4 and develop a scheduling concept to find cost-optimized load schedules based on day-ahead forecasts.

    Experimental survey of FPGA-based monolithic switches and a novel queue balancer

    This paper studies small to medium-sized monolithic switches for FPGA implementation and presents a novel switch design that achieves high algorithmic performance and FPGA implementation efficiency. Crossbar switches based on virtual output queues (VOQs) and their variations have been rather popular for implementing switches on FPGAs, with applications in network switches, memory interconnects, network-on-chip (NoC) routers, etc. The implementation efficiency of crossbar-based switches is well documented on ASICs, though we show that their disadvantages can outweigh their advantages on FPGAs. One of the most important challenges in such input-queued switches is the requirement for iterative scheduling algorithms. In contrast to ASICs, this is more harmful on FPGAs, as the reduced operating frequency and narrower packets cannot "hide" the multiple scheduling iterations that are required to achieve even a modest scheduling performance. Our proposed design uses an output-queued switch internally to simplify scheduling, and a queue balancing technique to avoid queue fragmentation and reduce the need for memory-sharing VOQs. Its implementation approaches the scheduling performance of a state-of-the-art FPGA-based switch while requiring considerably fewer resources.

    Process Modeling in Pyrometallurgical Engineering

    The Special Issue presents almost 40 papers on recent research in the modeling of pyrometallurgical systems, including physical models, first-principles models, detailed CFD and DEM models, as well as statistical models and models based on machine learning. The models cover the whole production chain from raw materials processing through the reduction and conversion unit processes to ladle treatment, casting, and rolling. The papers illustrate how models can be used to shed light on complex and inaccessible processes characterized by high temperatures and hostile environments, in order to improve process performance, product quality, or yield, to reduce the requirements for virgin raw materials, and to suppress harmful emissions.

    Partial aggregation for collective communication in distributed memory machines

    High Performance Computing (HPC) systems interconnect a large number of Processing Elements (PEs) in high-bandwidth networks to simulate complex scientific problems. The increasing scale of HPC systems poses great challenges for algorithm designers. As the average distance between PEs increases, data movement across hierarchical memory subsystems introduces high latency. Minimizing latency is particularly challenging in collective communications, where many PEs may interact in complex communication patterns. Although collective communications can be optimized for network-level parallelism, occasional synchronization delays due to dependencies in the communication pattern degrade application performance. To reduce the performance impact of communication and synchronization costs, parallel algorithms are designed with sophisticated latency hiding techniques. The principle is to interleave computation with asynchronous communication, which increases the overall occupancy of compute cores. However, collective communication primitives abstract away parallelism, which limits the integration of latency hiding techniques. Approaches to work around these limitations either modify the algorithmic structure of application codes or replace collective primitives with verbose low-level communication calls. While these approaches give fine-grained control for latency hiding, implementing collective communication algorithms is challenging and requires expert knowledge of HPC network topologies. A collective communication pattern is commonly described as a Directed Acyclic Graph (DAG) in which a set of PEs, represented as vertices, resolve data dependencies through communication along the edges. Our approach improves latency hiding in collective communication through partial aggregation. Based on mathematical rules of binary operations and homomorphism, we expose data parallelism in the respective DAG to overlap computation with communication. 
The proposed concepts are implemented and evaluated with a subset of collective primitives in the Message Passing Interface (MPI), an established communication standard in scientific computing. An experimental analysis with communication-bound microbenchmarks shows considerable performance benefits for the evaluated collective primitives. A detailed case study with a large-scale distributed sort algorithm demonstrates how partial aggregation significantly improves performance in data-intensive scenarios. Besides better latency hiding capabilities with collective communication primitives, our approach enables further optimizations of their implementations within MPI libraries. The large number of asynchronous programming models that are actively studied in the HPC community benefit from partial aggregation in collective communication patterns. Future work can utilize partial aggregation to improve the interaction of MPI collectives with accelerator architectures and to design more efficient communication algorithms.
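The idea of partial aggregation can be illustrated with a minimal sketch. It is not the thesis implementation and uses no MPI; it only assumes that the reduction operator is associative and commutative, so contributions can be folded into a running result in whatever order they arrive instead of waiting for all of them. The function names and the sample data are hypothetical.

```python
from functools import reduce
from operator import add


def reduce_all_at_once(contributions, op):
    """Baseline: wait until every contribution is present, then combine."""
    return reduce(op, contributions)


def reduce_partially(arrivals, op):
    """Fold each contribution into a running partial result as it arrives.

    Assumes op is associative and commutative, because the arrival order
    is arbitrary. Returns the final result and every intermediate partial
    result, which is the work that could overlap with communication.
    """
    partial = None
    history = []
    for value in arrivals:
        partial = value if partial is None else op(partial, value)
        history.append(partial)
    return partial, history


# Hypothetical contributions of eight PEs and an arbitrary arrival order.
contributions = [3, 1, 4, 1, 5, 9, 2, 6]
arrival_order = [5, 0, 7, 2, 1, 6, 3, 4]
arrivals = [contributions[i] for i in arrival_order]

final, history = reduce_partially(arrivals, add)
print(final, history)
```

Both functions return the same final value; the difference is that the partial variant has useful work to do after every arrival, which is what allows computation to be interleaved with outstanding communication.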

    Network-on-Chip

    Addresses the Challenges Associated with System-on-Chip Integration. Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting chip-on-chip communication efficiency, and explores Network-on-chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing for the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Utilizing examples of chips that have been implemented in industry and academia, this text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3-D NoC design. 
This text comprises 12 chapters and covers: the evolution of NoC from SoC, with its research and developmental challenges; NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces; the router design strategies followed in NoCs; the evaluation mechanism of NoC architectures; the application mapping strategies followed in NoCs; low-power design techniques specifically followed in NoCs; the signal integrity and reliability issues of NoC; the details of NoC testing strategies reported so far; the problem of synthesizing application-specific NoCs; reconfigurable NoC design issues; and the direction of future research and development in the field of NoC. Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, researchers, and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems.

    Recent Advances in Signal Processing

    Signal processing is a critical task in the majority of new technological inventions and poses challenges in a variety of applications in both science and engineering. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian. They have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily toward students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be categorized into five different areas depending on the application at hand. These five categories address, in order, image processing, speech processing, communication systems, time-series analysis, and educational packages. The book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity.

    Study of the data acquisition network for the triggerless data acquisition of the LHCb experiment and new particle track reconstruction strategies for the LHCb upgrade

    The LHCb experiment will receive a major upgrade by the end of February 2021. This upgrade will allow the recording of proton-proton collision data at $\sqrt{s} = 14\ \text{TeV}$ with an instantaneous luminosity of $2 \cdot 10^{33}\ \text{cm}^{-2}\text{s}^{-1}$, making possible measurements of unprecedented precision in the $b$ and $c$-quark flavour sectors. To take advantage of the increased luminosity, the data acquisition system will receive a substantial upgrade. The upgraded system will be capable of processing the full collision rate of $30\ \text{MHz}$, without any low-level hardware preselection. This new design constraint poses a non-trivial technological challenge, both from a networking and a computing point of view. A possible design of a $32\ \text{Tb/s}$ data acquisition network is presented, and low-level network simulations are used to validate the design. Those simulations use an accurate behavioural model developed and optimised for this specific purpose. To perform the online reconstruction at the full $30\ \text{MHz}$ $pp$ collision rate, it is mandatory to optimise the reconstruction algorithms using a computing and physics approach. A new parametrisation of the bending of charged particles generated by the dipole of the LHCb experiment is presented. The accuracy of the model is tested against Monte Carlo data. This strategy can reduce the size of the search windows needed in the SciFi sub-detector by a factor of four. The LookingForward algorithm in the Allen framework uses this model.
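The two rates quoted in this abstract can be related by a back-of-envelope calculation. The abstract does not state an event size, so the figure computed here is only an upper bound implied by the assumption that the whole 32 Tb/s were available for event data at 30 MHz; the variable names are hypothetical.

```python
# Figures quoted in the abstract.
event_rate_hz = 30e6            # full collision rate, 30 MHz
network_capacity_bps = 32e12    # data acquisition network, 32 Tb/s

# Assumption (not from the source): the entire capacity carries event data.
max_bits_per_event = network_capacity_bps / event_rate_hz
max_kbytes_per_event = max_bits_per_event / 8 / 1000

print(f"at most about {max_kbytes_per_event:.0f} kB per event on average")
```

Under that assumption the network could carry at most roughly 133 kB per collision event on average; the real event size and any protocol overhead are not given in the abstract.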

    Designing periodic and aperiodic structures for nanophotonic devices

    Future all-optical networks will require replacing the present electronic integrated circuitry with analogous optical devices that satisfy the compactness, throughput, latency, and high transmission efficiency requirements at nanometer-scale dimensions, outperforming the functionality of current networks. However, existing dielectric materials do not confine light at a sufficiently small scale, so the physical size of these links and devices becomes unacceptable. Although an optical chip comparable to the electronic chip does not yet exist, photonic crystals have recently raised great hopes for large-scale integration of optoelectronic components. Two-dimensional photonic crystal slabs, obtained through periodic structuring of a planar optical waveguide, feature many characteristics that bring them closer to electronic micro- and nanostructures. This thesis explores non-trivial periodic and aperiodic dielectric nanostructures, and to do so we pose a photonic crystal design process guided by non-convex combinatorial optimization techniques. In addition, this thesis proposes some novel coupling devices optimized to minimize insertion losses between silicon-on-insulator integrated waveguides and single-mode optical fibers. Last but not least, this thesis explores periodic arrangements from a new perspective and reports the first experimental evidence of topologically protected waveguiding in silicon. Furthermore, we propose and demonstrate that, in a system where topological and trivial defect modes coexist, we can probe them independently. By tuning the configuration of the interface, we observe the transition between a single topological defect and a compound trivial defect state.

    Generalized averaged Gaussian quadrature and applications

    A simple numerical method for constructing the optimal generalized averaged Gaussian quadrature formulas will be presented. These formulas exist in many cases in which real positive Gauss-Kronrod formulas do not exist, and can be used as an adequate alternative for estimating the error of a Gaussian rule. We also investigate the conditions under which the optimal averaged Gaussian quadrature formulas and their truncated variants are internal.