282 research outputs found

    Dynamic Power Management for Neuromorphic Many-Core Systems

    Full text link
    This work presents a dynamic power management architecture for neuromorphic many core systems such as SpiNNaker. A fast dynamic voltage and frequency scaling (DVFS) technique is presented which allows the processing elements (PE) to change their supply voltage and clock frequency individually and autonomously within less than 100 ns. This is employed by the neuromorphic simulation software flow, which defines the performance level (PL) of the PE based on the actual workload within each simulation cycle. A test chip in 28 nm SLP CMOS technology has been implemented. It includes 4 PEs which can be scaled from 0.7 V to 1.0 V with frequencies from 125 MHz to 500 MHz at three distinct PLs. By measurement of three neuromorphic benchmarks it is shown that the total PE power consumption can be reduced by 75%, with 80% baseline power reduction and a 50% reduction of energy per neuron and synapse computation, all while maintaining temporary peak system performance to achieve biological real-time operation of the system. A numerical model of this power management model is derived which allows DVFS architecture exploration for neuromorphics. The proposed technique is to be used for the second generation SpiNNaker neuromorphic many core system

    Area and Reconfiguration Time Minimization of the Communication Network in Regular 2D Reconfigurable Architectures

    Get PDF
    International audienceIn this paper, we introduce a constraint programming- based approach for the optimization of area and of reconfiguration time for communication networks for a class of regular 2D reconfigurable processor array architectures. For a given set of different algorithms the execution of which is supposed to be switched upon request at run-time, we provide static solutions for the optimal routing of data between processors. Here, we support also multi-casting data transfers for the first time. The routing found by our method minimizes the area or the reconfiguration time of the communication network, when switching between the execution of these algorithms. In fact, when switching, the communication network reconfiguration can be executed in just a few clock cycles. Moreover the communication network area can be minimized signifiHeidelbergcantly (62% in average)

    Exploiting the HTX-Board as a Coprocessor for Exact Arithmetics

    Get PDF
    Certain numerical computations benefit from dedicated computation units, e.g. providing increased computation accuracy. Exploiting current interconnection technologies and advances in reconfigurable logic, restrictions and drawbacks of past approaches towards application-specific units can be overcome. This paper presents our implementation of an FPGA-based hardware unit for exact arithmetics. The unit is tightly integrated into the host system using state-of-the-art HyperTransport technology. An according runtime system provides OS-level support including dynamic function resolution. The approach demonstrates suitability and applicability of the chosen technologies, setting the pace towards broadly acceptable use of reconfigurable coprocessor technology for application-specific computing

    Design and development from single core reconfigurable accelerators to a heterogeneous accelerator-rich platform

    Get PDF
    The performance of a platform is evaluated based on its ability to deal with the processing of multiple applications of different nature. In this context, the platform under evaluation can be of homogeneous, heterogeneous or of hybrid architecture. The selection of an architecture type is generally based on the set of different target applications and performance parameters, where the applications can be of serial or parallel nature. The evaluation is normally based on different performance metrics, e.g., resource/area utilization, execution time, power and energy consumption. This process can also include high-level performance metrics, e.g., Operations Per Second (OPS), OPS/Watt, OPS/Hz, Watt/Area etc. An example of architecture selection can be related to a wireless communication system where the processing of computationally-intensive signal-processing algorithms has strict execution-time constraints and in this case, a platform with special-purpose accelerators is relatively more suitable than a typical homogeneous platform. A couple of decades ago, it was expensive to plant many special-purpose accelerators on a chip as the cost per unit area was relatively higher than today. The utilization wall is also becoming a limiting factor in homogeneous multicore scaling which means that all the cores on a platform cannot be operated at their maximum frequency due to a possible thermal meltdown. In this case, some of the processing cores have to be turned-off or to be operated at very low frequencies making most of the part of the chip to stay underutilized. A possible solution lies in the use of heterogeneous multicore platforms where many application-specific cores operate at lower frequencies, therefore reducing power dissipation density and increasing other performance parameters. However, to achieve maximum flexibility in processing, a general-purpose flavor can also be introduced by adding a few Reduced Instruction-Set Computing (RISC) cores. A power class of heterogeneous multicore platforms is an accelerator-rich platform where many application-specific accelerators are loosely connected with each other for work load distribution or to execute the tasks independently. This research work spans from the design and development of three different types of template-based Coarse-Grain Reconfigurable Arrays (CGRAs), i.e., CREMA, AVATAR and SCREMA to a Heterogeneous Accelerator-Rich Platform (HARP). The accelerators generated from the three CGRAs could perform different lengths and types of Fast Fourier Transform (FFT), real and complex Matrix-Vector Multiplication (MVM) algorithms. CREMA and AVATAR were fixed CGRAs with eight and sixteen number of Processing Element (PE) columns, respectively. SCREMA could flex between four, eight, sixteen and thirty two number of PE columns. Many case studies were conducted to evaluate the performance of the reconfigurable accelerators generated from these CGRA templates. All of these CGRAs work in a processor/coprocessor model tightly integrated with a Direct Memory Access (DMA) device. Apart from these platforms, a reconfigurable Application-Specific Instruction-set Processor (rASIP) is also designed, tested for FFT execution under IEEE-802.11n timing constraints and evaluated against a processor/coprocessor model. It was designed by integrating AVATAR generated radix-(2, 4) FFT accelerator into the datapath of a RISC processor. The instruction set of the RISC processor was extended to perform additional operations related to AVATAR. As mentioned earlier, the underutilized part of the chip, now-a-days called Dark Silicon is posing many challenges for the designers. Apart from software optimizations, clock gating, dynamic voltage/frequency scaling and other high-level techniques, one way of dealing with this problem is to use many application-specific cores. In an effort to maximize the number of reconfigurable processing resources on a platform, the accelerator-rich architecture HARP was designed and evaluated in terms of different performance metrics. HARP is constructed on a Network-on-Chip (NoC) of 3x3 nodes where with every node, a CGRA of application-specific size is integrated other than the central node which is attached to a RISC processor. The RISC establishes synchronization between the nodes for data transfer and also performs the supervisory control. While using the NoC as the backbone of communication between the cores, it becomes possible for all the cores to address each other and also perform execution simultaneously and independently of each other. The performance of accelerators generated from CREMA, AVATAR and SCREMA templates were evaluated individually and also when attached to HARP's NoC nodes. The individual CGRAs show promising results in their own capacity but when integrated all together in the framework of HARP, interesting comparisons were established in terms of overall execution times, resource utilization, operating frequencies, power and energy consumption. In evaluating HARP, estimates and measurements were also made in some advanced performance metrics, e.g., in MOPS/mW and MOPS/MHz. The overall research work promotes the idea of heterogeneous accelerator-rich platform as a solution to current problems and future needs of industry and academia

    Economic aspects of FPGA technology

    Full text link
    En este PFC se ha recogido y analizado diversa información acerca de la tecnología de Xilinx. Incluyendo los datasheets de Xilinx notas del E.E. Times, informes financieros, y artículos de internet. Todos los datos se han unificado en unas ciento cincuenta figuras y tablas. Además, se han revisado los proceedings de la conferencia FPL desde 1991 (la primera en Oxford) hasta 2013 (el último en Porto).In this PFC, diverse information about Xilinx technology has been collected and analyzed. It includes Xilinx datasheets, notes on E.E. Times, financial reports, and Internet articles. All the data have been unified in around one hundred and fifty figures and tables. In addition, FPL proceedings from 1991 (the first in Oxford) to 2013 (the last in Porto) have been revised

    Proceedings of the First International Workshop on HyperTransport Research and Applications (WHTRA2009)(revised 08/2009)

    Get PDF
    Proceedings of the First International Workshop on HyperTransport Research and Applications (WHTRA2009) which was held Feb. 12th 2009 in Mannheim, Germany. The 1st International Workshop for Research on HyperTransport is an international high quality forum for scientists, researches and developers working in the area of HyperTransport. This includes not only developments and research in HyperTransport itself, but also work which is based on or enabled by HyperTransport. HyperTransport (HT) is an interconnection technology which is typically used as system interconnect in modern computer systems, connecting the CPUs among each other and with the I/O bridges. Primarily designed as interconnect between high performance CPUs it provides an extremely low latency, high bandwidth and excellent scalability. The definition of the HTX connector allows the use of HT even for add-in cards. In opposition to other peripheral interconnect technologies like PCI-Express no protocol conversion or intermediate bridging is necessary. HT is a direct connection between device and CPU with minimal latency. Another advantage is the possibility of cache coherent devices. Because of these properties HT is of high interest for high performance I/O like networking and storage, but also for co-processing and acceleration based on ASIC or FPGA technologies. In particular acceleration sees a resurgence of interest today. One reason is the possibility to reduce power consumption by the use of accelerators. In the area of parallel computing the low latency communication allows for fine grain communication schemes and is perfectly suited for scalable systems. Summing up, HT technology offers key advantages and great performance to any research aspect related to or based on interconnects. For more information please consult the workshop website (http://whtra.uni-hd.de)

    Proceedings of the First International Workshop on HyperTransport Research and Applications (WHTRA2009)

    Get PDF
    Proceedings of the First International Workshop on HyperTransport Research and Applications (WHTRA2009) which was held Feb. 12th 2009 in Mannheim, Germany. The 1st International Workshop for Research on HyperTransport is an international high quality forum for scientists, researches and developers working in the area of HyperTransport. This includes not only developments and research in HyperTransport itself, but also work which is based on or enabled by HyperTransport. HyperTransport (HT) is an interconnection technology which is typically used as system interconnect in modern computer systems, connecting the CPUs among each other and with the I/O bridges. Primarily designed as interconnect between high performance CPUs it provides an extremely low latency, high bandwidth and excellent scalability. The definition of the HTX connector allows the use of HT even for add-in cards. In opposition to other peripheral interconnect technologies like PCI-Express no protocol conversion or intermediate bridging is necessary. HT is a direct connection between device and CPU with minimal latency. Another advantage is the possibility of cache coherent devices. Because of these properties HT is of high interest for high performance I/O like networking and storage, but also for co-processing and acceleration based on ASIC or FPGA technologies. In particular acceleration sees a resurgence of interest today. One reason is the possibility to reduce power consumption by the use of accelerators. In the area of parallel computing the low latency communication allows for fine grain communication schemes and is perfectly suited for scalable systems. Summing up, HT technology offers key advantages and great performance to any research aspect related to or based on interconnects. For more information please consult the workshop website (http://whtra.uni-hd.de)

    Elastic wave modes for the assessment of structural timber: ultrasonic echo for building elements and guided waves for pole and pile structures

    Full text link
    © 2014, Springer-Verlag Berlin Heidelberg. This paper presents the state-of-the-art of using non-destructive testing (NDT) methods based on elastic waves for the condition assessment of structural timber. Two very promising approaches based on the propagation and reflections of elastic waves are described. While the first approach uses ultrasonic echoes for the testing of wooden building elements, the second approach uses guided waves (GW) for the testing of timber pole and pile structures. The basic principle behind both approaches is that elastic waves induced in a timber structure will propagate through its material until they encounter a change in stiffness, cross-sectional area or density, at which point they will reflect back. By measuring the wave echoes, it is possible to determine geometric properties of the tested structures such as the back wall of timber elements or the underground length of timber poles or piles. In addition, the internal state of the tested structures can be assessed since damage and defects such as rot, fungi or termite attacks will cause early reflections of the elastic waves as well as it can result in changes in wave velocity, wave attenuation and wave mode conversion. In the paper, the principles and theory of using elastic wave propagation for the assessment of wooden building elements and timber pole/pile structures are described. The state-of-the-art in testing equipment and procedures is presented and detailed examples are given on the practical application of both testing approaches. Recent encouraging developments of cutting edge research are presented along with challenges for future research
    • …
    corecore