133 research outputs found

    Exploration and Design of Power-Efficient Networked Many-Core Systems

    Get PDF
    Multiprocessing is a promising solution to meet the requirements of near future applications. To get full benefit from parallel processing, a manycore system needs efficient, on-chip communication architecture. Networkon- Chip (NoC) is a general purpose communication concept that offers highthroughput, reduced power consumption, and keeps complexity in check by a regular composition of basic building blocks. This thesis presents power efficient communication approaches for networked many-core systems. We address a range of issues being important for designing power-efficient manycore systems at two different levels: the network-level and the router-level. From the network-level point of view, exploiting state-of-the-art concepts such as Globally Asynchronous Locally Synchronous (GALS), Voltage/ Frequency Island (VFI), and 3D Networks-on-Chip approaches may be a solution to the excessive power consumption demanded by today’s and future many-core systems. To this end, a low-cost 3D NoC architecture, based on high-speed GALS-based vertical channels, is proposed to mitigate high peak temperatures, power densities, and area footprints of vertical interconnects in 3D ICs. To further exploit the beneficial feature of a negligible inter-layer distance of 3D ICs, we propose a novel hybridization scheme for inter-layer communication. In addition, an efficient adaptive routing algorithm is presented which enables congestion-aware and reliable communication for the hybridized NoC architecture. An integrated monitoring and management platform on top of this architecture is also developed in order to implement more scalable power optimization techniques. From the router-level perspective, four design styles for implementing power-efficient reconfigurable interfaces in VFI-based NoC systems are proposed. To enhance the utilization of virtual channel buffers and to manage their power consumption, a partial virtual channel sharing method for NoC routers is devised and implemented. Extensive experiments with synthetic and real benchmarks show significant power savings and mitigated hotspots with similar performance compared to latest NoC architectures. The thesis concludes that careful codesigned elements from different network levels enable considerable power savings for many-core systems.Siirretty Doriast

    Exploration and Design of High Performance Variation Tolerant On-Chip Interconnects

    Get PDF
    Siirretty Doriast

    Thermal-Aware Networked Many-Core Systems

    Get PDF
    Advancements in IC processing technology has led to the innovation and growth happening in the consumer electronics sector and the evolution of the IT infrastructure supporting this exponential growth. One of the most difficult obstacles to this growth is the removal of large amount of heatgenerated by the processing and communicating nodes on the system. The scaling down of technology and the increase in power density is posing a direct and consequential effect on the rise in temperature. This has resulted in the increase in cooling budgets, and affects both the life-time reliability and performance of the system. Hence, reducing on-chip temperatures has become a major design concern for modern microprocessors. This dissertation addresses the thermal challenges at different levels for both 2D planer and 3D stacked systems. It proposes a self-timed thermal monitoring strategy based on the liberal use of on-chip thermal sensors. This makes use of noise variation tolerant and leakage current based thermal sensing for monitoring purposes. In order to study thermal management issues from early design stages, accurate thermal modeling and analysis at design time is essential. In this regard, spatial temperature profile of the global Cu nanowire for on-chip interconnects has been analyzed. It presents a 3D thermal model of a multicore system in order to investigate the effects of hotspots and the placement of silicon die layers, on the thermal performance of a modern ip-chip package. For a 3D stacked system, the primary design goal is to maximise the performance within the given power and thermal envelopes. Hence, a thermally efficient routing strategy for 3D NoC-Bus hybrid architectures has been proposed to mitigate on-chip temperatures by herding most of the switching activity to the die which is closer to heat sink. Finally, an exploration of various thermal-aware placement approaches for both the 2D and 3D stacked systems has been presented. Various thermal models have been developed and thermal control metrics have been extracted. An efficient thermal-aware application mapping algorithm for a 2D NoC has been presented. It has been shown that the proposed mapping algorithm reduces the effective area reeling under high temperatures when compared to the state of the art.Siirretty Doriast

    Optimization of BGP Convergence and Prefix Security in IP/MPLS Networks

    Get PDF
    Multi-Protocol Label Switching-based networks are the backbone of the operation of the Internet, that communicates through the use of the Border Gateway Protocol which connects distinct networks, referred to as Autonomous Systems, together. As the technology matures, so does the challenges caused by the extreme growth rate of the Internet. The amount of BGP prefixes required to facilitate such an increase in connectivity introduces multiple new critical issues, such as with the scalability and the security of the aforementioned Border Gateway Protocol. Illustration of an implementation of an IP/MPLS core transmission network is formed through the introduction of the four main pillars of an Autonomous System: Multi-Protocol Label Switching, Border Gateway Protocol, Open Shortest Path First and the Resource Reservation Protocol. The symbiosis of these technologies is used to introduce the practicalities of operating an IP/MPLS-based ISP network with traffic engineering and fault-resilience at heart. The first research objective of this thesis is to determine whether the deployment of a new BGP feature, which is referred to as BGP Prefix Independent Convergence (PIC), within AS16086 would be a worthwhile endeavour. This BGP extension aims to reduce the convergence delay of BGP Prefixes inside of an IP/MPLS Core Transmission Network, thus improving the networks resilience against faults. Simultaneously, the second research objective was to research the available mechanisms considering the protection of BGP Prefixes, such as with the implementation of the Resource Public Key Infrastructure and the Artemis BGP Monitor for proactive and reactive security of BGP prefixes within AS16086. The future prospective deployment of BGPsec is discussed to form an outlook to the future of IP/MPLS network design. As the trust-based nature of BGP as a protocol has become a distinct vulnerability, thus necessitating the use of various technologies to secure the communications between the Autonomous Systems that form the network to end all networks, the Internet

    2022 roadmap on neuromorphic computing and engineering

    Full text link
    Modern computation based on von Neumann architecture is now a mature cutting-edge science. In the von Neumann architecture, processing and memory units are implemented as separate blocks interchanging data intensively and continuously. This data transfer is responsible for a large part of the power consumption. The next generation computer technology is expected to solve problems at the exascale with 1018^{18} calculations each second. Even though these future computers will be incredibly powerful, if they are based on von Neumann type architectures, they will consume between 20 and 30 megawatts of power and will not have intrinsic physically built-in capabilities to learn or deal with complex data as our brain does. These needs can be addressed by neuromorphic computing systems which are inspired by the biological concepts of the human brain. This new generation of computers has the potential to be used for the storage and processing of large amounts of digital information with much lower power consumption than conventional processors. Among their potential future applications, an important niche is moving the control from data centers to edge devices. The aim of this roadmap is to present a snapshot of the present state of neuromorphic technology and provide an opinion on the challenges and opportunities that the future holds in the major areas of neuromorphic technology, namely materials, devices, neuromorphic circuits, neuromorphic algorithms, applications, and ethics. The roadmap is a collection of perspectives where leading researchers in the neuromorphic community provide their own view about the current state and the future challenges for each research area. We hope that this roadmap will be a useful resource by providing a concise yet comprehensive introduction to readers outside this field, for those who are just entering the field, as well as providing future perspectives for those who are well established in the neuromorphic computing community

    An erasure-resilient and compute-efficient coding scheme for storage applications

    Get PDF
    Driven by rapid technological advancements, the amount of data that is created, captured, communicated, and stored worldwide has grown exponentially over the past decades. Along with this development it has become critical for many disciplines of science and business to being able to gather and analyze large amounts of data. The sheer volume of the data often exceeds the capabilities of classical storage systems, with the result that current large-scale storage systems are highly distributed and are comprised of a high number of individual storage components. As with any other electronic device, the reliability of storage hardware is governed by certain probability distributions, which in turn are influenced by the physical processes utilized to store the information. The traditional way to deal with the inherent unreliability of combined storage systems is to replicate the data several times. Another popular approach to achieve failure tolerance is to calculate the block-wise parity in one or more dimensions. With better understanding of the different failure modes of storage components, it has become evident that sophisticated high-level error detection and correction techniques are indispensable for the ever-growing distributed systems. The utilization of powerful cyclic error-correcting codes, however, comes with a high computational penalty, since the required operations over finite fields do not map very well onto current commodity processors. This thesis introduces a versatile coding scheme with fully adjustable fault-tolerance that is tailored specifically to modern processor architectures. To reduce stress on the memory subsystem the conventional table-based algorithm for multiplication over finite fields has been replaced with a polynomial version. This arithmetically intense algorithm is better suited to the wide SIMD units of the currently available general purpose processors, but also displays significant benefits when used with modern many-core accelerator devices (for instance the popular general purpose graphics processing units). A CPU implementation using SSE and a GPU version using CUDA are presented. The performance of the multiplication depends on the distribution of the polynomial coefficients in the finite field elements. This property has been used to create suitable matrices that generate a linear systematic erasure-correcting code which shows a significantly increased multiplication performance for the relevant matrix elements. Several approaches to obtain the optimized generator matrices are elaborated and their implications are discussed. A Monte-Carlo-based construction method allows it to influence the specific shape of the generator matrices and thus to adapt them to special storage and archiving workloads. Extensive benchmarks on CPU and GPU demonstrate the superior performance and the future application scenarios of this novel erasure-resilient coding scheme

    Self-adaptivity of applications on network on chip multiprocessors: the case of fault-tolerant Kahn process networks

    Get PDF
    Technology scaling accompanied with higher operating frequencies and the ability to integrate more functionality in the same chip has been the driving force behind delivering higher performance computing systems at lower costs. Embedded computing systems, which have been riding the same wave of success, have evolved into complex architectures encompassing a high number of cores interconnected by an on-chip network (usually identified as Multiprocessor System-on-Chip). However these trends are hindered by issues that arise as technology scaling continues towards deep submicron scales. Firstly, growing complexity of these systems and the variability introduced by process technologies make it ever harder to perform a thorough optimization of the system at design time. Secondly, designers are faced with a reliability wall that emerges as age-related degradation reduces the lifetime of transistors, and as the probability of defects escaping post-manufacturing testing is increased. In this thesis, we take on these challenges within the context of streaming applications running in network-on-chip based parallel (not necessarily homogeneous) systems-on-chip that adopt the no-remote memory access model. In particular, this thesis tackles two main problems: (1) fault-aware online task remapping, (2) application-level self-adaptation for quality management. For the former, by viewing fault tolerance as a self-adaptation aspect, we adopt a cross-layer approach that aims at graceful performance degradation by addressing permanent faults in processing elements mostly at system-level, in particular by exploiting redundancy available in multi-core platforms. We propose an optimal solution based on an integer linear programming formulation (suitable for design time adoption) as well as heuristic-based solutions to be used at run-time. We assess the impact of our approach on the lifetime reliability. We propose two recovery schemes based on a checkpoint-and-rollback and a rollforward technique. For the latter, we propose two variants of a monitor-controller- adapter loop that adapts application-level parameters to meet performance goals. We demonstrate not only that fault tolerance and self-adaptivity can be achieved in embedded platforms, but also that it can be done without incurring large overheads. In addressing these problems, we present techniques which have been realized (depending on their characteristics) in the form of a design tool, a run-time library or a hardware core to be added to the basic architecture

    Investigation of Orthogonal Frequency Division Multiplexing Based Power Line Communication Systems

    Get PDF
    Power Line Communication (PLC) has the potential to become the preferred technique for providing broadband to homes and offices with the advantage of eliminating the need for new wiring infrastructure and reducing the cost. Power line grids, however, present a hostile channel for data communication, since the fundamental purpose of the power line channel was only the transmission of electric power at 50/60 Hz frequencies. The development of PLC systems for providing broadband applications requires an adequate knowledge of the power line channel characteristics. Various types of noise and multipath effects are some of the limitations for power line channels which need to be considered carefully in designing PLC systems. An effect of an impulsive noise characterized with short durations is identified as one of the major impairment in PLC system. Orthogonal Frequency Division Multiplexing (OFDM) technique is one of the modulation approaches which has been regarded as the modulation technique for PLC systems by most researchers in the field and is used in this research study work. This is because it provides high robustness against impulsive noise and minimizes the effects of multipath. In case of impulsive noise affecting the OFDM system, this effect is spread over multiple subcarriers due to Discrete Fourier Transform (DFT) at the receiver. Hence, each of the transmitted communication symbols is only affected by a fraction of the impulsive noise. In order to achieve reliable results for data transmission, a proper power line channel with various noise models must be used in the investigations. In this research study work, a multipath model which has been widely accepted by many researchers in the field and practically proven in the Tanzanian power line system is used as the model for the power line channel. The effects of different scenarios such as variations in direct path length, path number, branch length and load on the channel frequency response are investigated in this research work. Simulation results indicate the suitability of multi-carrier modulation technique such as an OFDM over the power line channels. To represent the actual noise scenario in the power line channel, an impulsive noise and background noise are classified as the two main noise sources. A Middleton class A noise is modelled as an impulsive noise, whereas the background noise is modelled as an Additive White Gaussian Noise (AWGN). The performance of PLC system based on OFDM is investigated under Middleton Class A and AWGN noise scenarios. It is observed that Bit Error Rate (BER) for the impulsive noise is higher than the background noise. Since channel coding can enhance the transmission in a communication system, Block code and convolutional codes have been studied in this research work. The hamming code chosen as a type of the block code, whereas the Trellis Coded Modulation (TCM) selected from the category of the convolutional channel codes and modelled in Matlab2013b. Although TCM code produces improvements in the Signal-to-Noise Ratio (SNR), they do not perform well with Middleton class A noise. A rectangular 16-QAM TCM based on OFDM provides better BER rate compared to the general TCM

    Direct sequence spread spectrum techniques in local area networks

    Get PDF
    This thesis describes the application of a direct sequence spread spectrum modulation scheme to the physical layer of a local area networks subsequently named the SS-LAN. Most present day LANs employ erne form or another of time division multiplexing which performs well in many systems but which is limited by its very nature in real time, time critical and time demanding applications. The use of spread spectrum multiplexing removes these limitations by providing a simultaneous multiple user access capability to the channel which permits each and all nodes to utilise the channel independent of the activity being currently supported by that channel. The theory of spectral spreading is a consequence of the Shannon channel capacity in which the channel capacity may be maintained by the trading of signal to noise ratio for bandwidth. The increased bandwidth provides an increased signal dimensionality which can be utilised in providing noise immunity and/or a simultaneous multiple user environment: the effects of the simultaneous users can be considered as noise from the point of view of any particular constituent signal. The use of code sequences at the physical layer of a LAN permits a wide range of mapping alternatives which can be selected according to the particular application. Each of the mapping techniques possess the general spread spectrum properties but certain properties can be emphasised at the expense of others. The work has Involved the description of the properties of the SS-LAN coupled with the development of the mapping techniques for use In the distribution of the code sequences. This has been followed by an appraisal of a set of code sequences which has resulted in the definition of the ideal code properties and the selection of code families for particular types of applications. The top level design specification for the hardware required in the construction of the SS-LAN has also been presented and this has provided the basis for a simplified and idealised theoretical analysis of the performance parameters of the SS-LAN. A positive set of conclusions for the range of these parameters has been obtained and these have been further analysed by the use of a SS-LAN computer simulation program. This program can simulate any configuration of the SS-LAN and the results it has produced have been compared with those of the analysis and have been found to be in agreement. A tool for the further analysis of complex SS-LAN configurations has therefore been developed and this will form the basis for further work
    • …
    corecore