551 research outputs found

    Towards hardware acceleration of neuroevolution for multimedia processing applications on mobile devices

    Get PDF
    This paper addresses the problem of accelerating large artificial neural networks (ANN), whose topology and weights can evolve via the use of a genetic algorithm. The proposed digital hardware architecture is capable of processing any evolved network topology, whilst at the same time providing a good trade off between throughput, area and power consumption. The latter is vital for a longer battery life on mobile devices. The architecture uses multiple parallel arithmetic units in each processing element (PE). Memory partitioning and data caching are used to minimise the effects of PE pipeline stalling. A first order minimax polynomial approximation scheme, tuned via a genetic algorithm, is used for the activation function generator. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry save adders, is adopted throughout the design

    Genetic approach to minimizing energy consumption of VLSI processors using multiple supply voltages

    Get PDF
    科研費報告書収録論文(課題番号:17300009/研究代表者:亀山充隆/システムインテグレーション理論に基づく高安全知能自動車用VLSIの最適設計

    Minimizing energy consumption of VLSI processors based on dual-supply-voltage assignment and interconnection simplification

    Get PDF
    科研費報告書収録論文(課題番号:17300009/研究代表者:亀山充隆/システムインテグレーション理論に基づく高安全知能自動車用VLSIの最適設計

    A CAD Tool for Synthesizing Optimized Variants of Altera\u27s Nios II Soft-Core Processor

    Get PDF
    Soft-core processors offer embedded system designers the benefits of customization, flexibility and reusability. Altera\u27s NIOS II soft-core processor is a popular, commercially available soft-core processor that can be implemented on a variety of Altera FPGAs. In this thesis, the Nios II soft-core processor from Altera Corporation was studied and a VHDL implementation, called UW_Nios II, was developed. UW_Nios II was developed to enable us to perform design space exploration (DSE) for the Nios II processor. It was evaluated and compared with Altera Nios II and shown to be competitive. SCBuild is an existing CAD tool that was developed to enable DSE of soft-core processors. We modified SCBuild to automatically explore the design space of the UW_Nios II using a genetic algorithm. This tool can accurately estimate the area and critical path delay of different variants of the UW_Nios II on a field programmable gate array. Through experiments conducted using SCBuild, it was shown that employing a genetic algorithm to explore the design space of parameterized Nios II core, with a large design space, helps designers find optimized variants of UW_Nios II

    Modeling of a hardware VLSI placement system: Accelerating the Simulated Annealing algorithm

    Get PDF
    An essential step in the automation of electronic design is the placement of the physical components on the target semiconductor die. The placement step presents the opportunity to reduce costs in terms of wire length and performance degradation; however it is compute intensive and is NP-complete in terms of obtaining an optimal solution. As designs have grown in complexity and gate count, obtaining an optimal solution is not feasible due to time to market constraints or sheer compute effort required. Heuristic algorithms allow for efficient but sub-optimal designs to be produced with a reduction in processing time. A widely used algorithm is Simulated Annealing (SA). The goal of this work was to develop a model that would enable an analysis into the feasibility of developing a hardware accelerated placement system which uses SA at its core. The SA heuristic was analyzed for possible improvements in efficiency with focus given to targeting the system for hardware. A solution implementing parallel computing with specialized hardware configurations inside a field programmable gate array (FPGA) was investigated as having the possibility to improve the efficiency of the SA-based algorithm. All supporting subsystems were also described for a hardware accelerated model. A large speedup was analytically shown from both accelerating the critical path of the SA algorithm as well as novel methods of improving SA\u27s efficiency. As data throughput requirements were not included in this work, the results presented may be optimistic for an overall system speedup. However, the results clearly show that future work is warranted in studying the concept of a hardware accelerated placement system

    Speeding-Up Expensive Evaluations in High-Level Synthesis Using Solution Modeling and Fitness Inheritance

    Get PDF
    High-Level Synthesis (HLS) is the process of developing digital circuits from behavioral specifications. It involves three interdependent and NP-complete optimization problems: (i) the operation scheduling, (ii) the resource allocation, and (iii) the controller synthesis. Evolutionary Algorithms have been already effectively applied to HLS to find good solution in presence of conflicting design objectives. In this paper, we present an evolutionary approach to HLS that extends previous works in three respects: (i) we exploit the NSGA-II, a multi-objective genetic algorithm, to fully automate the design space exploration without the need of any human intervention, (ii) we replace the expensive evaluation process of candidate solutions with a quite accurate regression model, and (iii) we reduce the number of evaluations with a fitness inheritance scheme. We tested our approach on several benchmark problems. Our results suggest that all the enhancements introduced improve the overall performance of the evolutionary search

    Energy efficient hardware acceleration of multimedia processing tools

    Get PDF
    The world of mobile devices is experiencing an ongoing trend of feature enhancement and generalpurpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being their limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks Based on the survey that this thesis presents on modem video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at algorithmic level in order to design re-usable optimised hardware acceleration cores. To prove these conclusions, the work m this thesis is focused on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high level techniques such as redundant computation elimination, parallelism and low switching computation structures. Both architectures compare favourably against the relevant pnor art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation - namely, both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions .Results show an improvement on state of the art algorithms with future potential for even greater savings

    Simulated annealing based datapath synthesis

    Get PDF

    빠른 성능조건 만족을 위한 임계경로를 고려하는 상위 수준 합성

    Get PDF
    학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2014. 2. 최기영.Rapid advancement of process technology enables designers to integrate various functions onto a single chip and to realize diverse requirements of customers, but productivity of system designers has improved too slowly to make optimal design in time-to-market. Since designing at higher levels of abstraction reduces the number of design instances to be considered to acquire an optimal design, it improves quality of system as well as reduces design time and cost. High-level synthesis, which maps behavioral description models to register-transfer models, can improve design productivity drastically, and thus, it has been one of the important issues in electronic system level design. Centralized controllers commonly used in high-level synthesis often require long wires and cause high load capacitance, and that is why critical paths typically occur on paths from controllers to data registers instead of paths from data registers to data registers. However, conventional high-level synthesis has focused on delays within a datapath, making it difficult to solve the timing closure problem during physical synthesis. This thesis presents hardware architecture with a distributed controller, which makes the timing closure problem much easier. A novel critical-path-aware high-level synthesis flow is also presented for synthesizing such hardware through datapath partitioning, register binding, and controller optimization. We explore the design space related to the number of partitions, which is an important design parameter for target architecture. According to our experiments, the proposed approach reduces the critical path delay excluding FUs by 29.3% and that including FUs by 10.0%, with 2.2% area overhead on average compared to centralized controller architecture. We also propose two approaches, clock gating and register constrained flow, to alleviate high peak current problem which is caused by the proposed approach. These approaches suppress the peak current overhead to keep it less than 3.6%.Chapter 1 Introduction 1 Chapter 2 Background 7 2.1 High-level Synthesis 7 2.2 Subtasks of High-level Synthesis 8 2.2.1 Operation Scheduling and FU Binding 8 2.2.2 Register Binding 10 2.2.3 Controller Synthesis 11 2.2.4 Functional Pipelining Technique for High-level Synthesis 11 2.3 Centralized Controller Architecture 12 2.4 Design Closure Problem in High-level Synthesis 15 2.5 Thesis Contribution 18 Chapter 3 Target Architecture and Overall flow 21 3.1 Target Architecture 21 3.2 Overall flow 23 Chapter 4 Critical-Path-Aware Datapath Partitioning 27 4.1 Introduction 27 4.2 Problem Formulation 30 4.3 Proposed Algorithm 32 4.4 Exploring Design Space for the Number of Partitions 36 Chapter 5 Critical-Path-Aware Register Binding 39 5.1 Introduction 39 5.2 Problem Formulation 40 5.3 Proposed Algorithm 43 Chapter 6 Critical-Path-Aware Controller Optimization 49 6.1 Introduction 49 6.2 Problem Formulation 50 6.3 Proposed Algorithm 55 Chapter 7 Evaluation 63 7.1 Experimental Setup 63 7.2 Design Parameters and Computation Time 66 7.3 Analysis Critical Path Delay on Distributed Controller Architecture 68 7.4 Analysis of Performance and Area 70 7.5 Energy Consumption 78 7.6 Analysis on Register Overhead 80 7.6.1 Clock Gating Approach 81 7.6.2 Register Constrained Approach 84 7.6.3 Combined Approach 86 7.7 Join to Conventional Optimization Techniques on HLS 87 7.8 Comparison with DRFM Binding Approach 87 Chapter 8 Conclusion and Future Work 89 8.1 Summary 89 8.2 Future Work 90 Bibliography 93 Abstract in Korean 103Docto
    corecore