216 research outputs found

    Achieving Superscalar Performance without Superscalar Overheads - A Dataflow Compiler IR for Custom Computing

    The difficulty of effectively parallelizing code for multicore processors, combined with the end of threshold voltage scaling has resulted in the problem of 'Dark Silicon', severely limiting performance scaling despite Moore's Law. To address dark silicon, not only must we drastically improve the energy efficiency of computation, but due to Amdahl's Law, we must do so without compromising sequential performance. Designers increasingly utilize custom hardware to dramatically improve both efficiency and performance in increasingly heterogeneous architectures. Unfortunately, while it efficiently accelerates numeric, data-parallel applications, custom hardware often exhibits poor performance on sequential code, so complex, power-hungry superscalar processors must still be utilized. This paper addresses the problem of improving sequential performance in custom hardware by (a) switching from a statically scheduled to a dynamically scheduled (dataflow) execution model, and (b) developing a new compiler IR for high-level synthesis that enables aggressive exposition of ILP even in the presence of complex control flow. This new IR is directly implemented as a static dataflow graph in hardware by our high-level synthesis tool-chain, and shows an average speedup of 1.13 times over equivalent hardware generated using LegUp, an existing HLS tool. In addition, our new IR allows us to further trade area & energy for performance, increasing the average speedup to 1.55 times, through loop unrolling, with a peak speedup of 4.05 times. Our custom hardware is able to approach the sequential cycle-counts of an Intel Nehalem Core i7 superscalar processor, while consuming on average only 0.25 times the energy of an in-order Altera Nios IIf processor.
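
For readers unfamiliar with the Amdahl's Law argument the abstract relies on, the bound can be written as follows. The symbols S (overall speedup), p (fraction of execution that is accelerated), and s (speedup of that fraction) are notation chosen for this note, not taken from the paper.

```latex
S = \frac{1}{(1 - p) + \frac{p}{s}}, \qquad \lim_{s \to \infty} S = \frac{1}{1 - p}
```

With p = 0.9, for instance, the overall speedup can never exceed 10 however fast the accelerated portion becomes, which is why the remaining sequential code cannot be allowed to slow down.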

    A Modular Reconfigurable Architecture for Asymmetric and Symmetric-key Cryptographic Algorithms

    It is widely recognized that security issues will play a crucial role in the majority of future computer and communication systems. Cryptographic algorithms are the central tools for achieving system security. Numerous such algorithms have been devised, and many have found popularity in different domains. High-throughput, low-cost implementation of these algorithms is critical for achieving both high security and high-speed processing in an increasingly digital global economy. Conventional methods for implementing ciphers are unable to provide all three crucial characteristics in a single solution: high throughput, low cost, and cipher agility. This thesis develops a reconfigurable architecture capable of implementing most symmetric-key as well as asymmetric-key ciphers. The reconfigurable nature of the architecture provides flexibility equivalent to software implementations, with cost and throughput figures approaching those of ASIC implementations of these ciphers. The development of this architecture is discussed in detail, along with the top-level design and interconnection scheme. The individual components developed have been synthesized on a standard-cell library to provide an estimate of the area/performance characteristics of the design. Preliminary results show throughput values equivalent to FPGA-based implementations for most of the tested ciphers, and approaching those of ASIC-based implementations.
    Keywords: Reconfigurable Computing, Cryptography, Symmetric-Key, Asymmetric-Key, Domain-specific Reconfigurable Architecture

    Multiobjective VLSI Cell Placement Using Distributed Simulated Evolution Algorithm

    Simulated Evolution (SimE) is a sound stochastic approximation algorithm based on the principles of adaptation. If properly engineered, SimE can reach near-optimal solutions in less time than Simulated Annealing [1], [2]. Nevertheless, depending on the size of the problem, it may have large run-time requirements. One practical approach to speeding up the execution of the SimE algorithm is to parallelize it. This is all the more true for multi-objective cell placement, where the need to optimize conflicting objectives (interconnect wirelength, power dissipation, and timing performance) adds another level of difficulty [3]. In this paper a distributed parallel SimE algorithm is presented for multiobjective VLSI standard cell placement. Fuzzy logic is used to integrate the costs of these objectives. The algorithm presented is based on random distribution of rows to individual processors in order to partition the problem and distribute computationally intensive tasks, while also efficiently traversing the complex search space. A series of experiments is performed on ISCAS-85/89 benchmarks to compare speedup with the serial implementation and other earlier proposals. A comparison with parallel implementations of other iterative heuristics is also discussed.
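
The row-partitioning idea mentioned in this abstract can be illustrated with a minimal sketch. The function name `distribute_rows`, its parameters, and the round-robin dealing of shuffled rows are assumptions made for illustration; the abstract only states that rows are distributed randomly to processors and does not describe the paper's actual scheme.

```python
import random


def distribute_rows(num_rows, num_procs, seed=None):
    """Randomly assign placement rows to processors (hypothetical helper).

    Returns a list with one sorted list of row indices per processor.
    """
    rng = random.Random(seed)
    rows = list(range(num_rows))
    rng.shuffle(rows)
    # Deal the shuffled rows round-robin so partition sizes differ by at most one.
    return [sorted(rows[p::num_procs]) for p in range(num_procs)]
```

Re-running the distribution with a fresh random state in each iteration would let every processor see a different subset of rows over time, which is one plausible way to keep the search from being confined to a fixed partition.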

    Evaluating Parallel Simulated Evolution Strategies for VLSI Cell Placement

    Simulated Evolution (SimE) is an evolutionary metaheuristic that has produced results comparable to well-established stochastic heuristics such as SA, TS and GA, with shorter runtimes. However, for problems with a very large set of elements to optimize, such as VLSI placement and routing, runtimes can still be very large, and parallelization is an attractive option. Compared to other metaheuristics, parallelization of SimE has not been extensively explored. This paper presents a comprehensive set of parallelization approaches for SimE when applied to the multiobjective VLSI cell placement problem. Each of these approaches is evaluated with respect to SimE characteristics and the constraints imposed by the problem instance. The conclusions drawn can be extended to the parallelization of other SimE-based optimization problems.

    Asynchronous MMC based Parallel SA Schemes for Multiobjective Standard Cell Placement

    Simulated Annealing (SA) is a popular iterative heuristic used to solve a wide variety of combinatorial optimization problems. However, depending on the size of the problem, it may have large run-time requirements. One practical approach to speeding up its execution is to parallelize it. In this paper we develop parallel SA schemes based on the Asynchronous Multiple-Markov Chain (AMMC) model described in [1] and applied to standard-cell placement in [2]. The schemes are applied to solve the multi-objective standard cell placement problem using an inexpensive cluster-of-workstations environment. This problem requires the optimization of conflicting objectives (interconnect wire-length, power dissipation, and timing performance), and fuzzy logic is used to integrate the costs of these objectives [3], [4]. Experiments are performed on ISCAS-85/89 benchmark circuits. Our goal is to develop parallel SA schemes that provide significantly improved runtime/solution quality characteristics for this key CAD problem, by making the best possible use of an inexpensive parallel environment.
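
As a rough illustration of the two ingredients named in this abstract, the sketch below shows the standard Metropolis acceptance test used by SA and a simplified exchange step in the spirit of a multiple-Markov-chain scheme. The names `accept` and `sync_with_global`, and the rule that a chain adopts the shared solution when it is better and publishes its own otherwise, are assumptions for illustration; the abstract does not specify the paper's exchange protocol.

```python
import math
import random


def accept(delta, temperature, rng=random):
    """Metropolis criterion: always accept improvements (delta <= 0),
    and accept a worse move with probability exp(-delta / temperature)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def sync_with_global(local, global_best):
    """Exchange step for one chain (hypothetical, simplified).

    Both arguments are (cost, solution) pairs. Returns the pair
    (new_local, new_global) after the chain consults the shared state.
    """
    if global_best[0] < local[0]:
        # The shared solution is better: this chain restarts from it.
        return global_best, global_best
    # Otherwise the chain publishes its own solution as the new shared best.
    return local, local
```

In an asynchronous setting each worker would call the exchange step at its own pace, for example at the end of each Markov chain, without waiting for the other workers.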
