Dynamic power dissipation formulation for application in dynamic programming buffer insertion algorithm
Buffer insertion is a very effective technique for reducing propagation delay in nanometre VLSI interconnects. There are two main approaches to buffer insertion: (1) the closed-form solution and (2) dynamic programming. The dynamic programming approach is more versatile than the closed-form solution because it supports multiple buffer types and can be applied to tree-structured interconnects. As design dimensions shrink, more buffers are needed to improve timing performance. However, buffers themselves consume power, and it has been shown that their power dissipation is significant. Although many buffer insertion algorithms optimize propagation delay under a power constraint, most of them rely on the closed-form solution. Hence, in this paper, we present a formulation for computing the dynamic power dissipation of buffers for use in dynamic programming buffer insertion algorithms. The proposed formulation allows the dynamic power dissipation of buffers to be computed incrementally. The technique is validated by comparing the formulation against the standard closed-form dynamic power equation. The advantage of the proposed formulation is demonstrated through a series of experiments in which it is applied in van Ginneken's algorithm. The results show that the output of the proposed formulation is consistent with the standard closed-form formulation and suggest that it can compute dynamic power dissipation for buffer insertion algorithms with multiple buffer types.
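As an illustrative aside (not the paper's formulation): the closed-form dynamic power referred to above is the standard switching-power expression P = a * C * Vdd^2 * f, and in a van Ginneken-style bottom-up pass each buffer's contribution can be added to a running total at the moment the buffer is inserted, since it depends only on the load that buffer drives. The candidate structure, activity factor, Vdd, frequency and buffer parameters below are assumed values for the sketch.

# Sketch: incremental accumulation of buffer dynamic power inside a
# van Ginneken-style bottom-up candidate propagation.
# All numeric values are illustrative, not taken from the paper.

def dynamic_power(c_load, activity=0.2, vdd=1.0, freq=1e9):
    """Closed-form switching power: P = a * C * Vdd^2 * f."""
    return activity * c_load * vdd ** 2 * freq

# A candidate solution carries (downstream_cap, required_time, power_so_far).
def insert_buffer(candidate, buf_cin, buf_delay):
    c_down, req, power = candidate
    # The power added by this buffer depends only on the load it drives,
    # so it can be added incrementally without revisiting the subtree.
    power += dynamic_power(c_down)
    # The buffer decouples the downstream capacitance and adds its own delay.
    return (buf_cin, req - buf_delay, power)

cand = (50e-15, 500e-12, 0.0)            # leaf candidate: 50 fF load, 500 ps required time
cand = insert_buffer(cand, 5e-15, 30e-12)
print(cand)                              # the power term grows as buffers are added

Because the power term is carried per candidate, it propagates up the tree alongside the usual (capacitance, required time) pair and is unaffected by how many buffer types are tried at each node.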
Reconfigurable Logic Embedded Architecture of Support Vector Machine Linear Kernel
Support Vector Machine (SVM) is a linear binary classifier that requires a kernel function to handle non-linear problems. Most previous SVM implementations for embedded systems in the literature were built for a specific application, and their analyses were limited to comparisons with software implementations only; the impact of different application datasets on SVM hardware performance was not analyzed. In this work, we propose a parameterizable, fully pipelined linear kernel architecture. It is prototyped and analyzed on an Altera Cyclone IV platform, and the results are verified against an equivalent software model. Further analysis examines the effect of the number of features and support vectors on the performance of the hardware architecture. For our linear kernel implementation, the number of features determines the maximum operating frequency and the amount of logic resource utilization, whereas the number of support vectors determines the amount of on-chip memory usage and the throughput of the system.
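For context, a minimal software sketch of the linear-kernel decision function that such an architecture evaluates, f(x) = sign(sum_i alpha_i * y_i * <sv_i, x> + b); the array sizes below are made up for illustration, and the link between feature count, pipeline depth and on-chip storage reflects the abstract's observation rather than any stated implementation detail.

# Sketch of the linear-kernel SVM decision function evaluated in hardware.
# In a pipelined implementation, the feature count sets the depth of the
# dot-product pipeline and the support-vector count sets how many
# coefficient/vector pairs must be stored on-chip.
import numpy as np

def linear_svm_decision(x, support_vectors, alpha_y, bias):
    # One multiply-accumulate chain per support vector (the dot product),
    # then a weighted sum over all support vectors plus the bias.
    return np.sign(np.dot(alpha_y, support_vectors @ x) + bias)

rng = np.random.default_rng(0)
svs = rng.normal(size=(8, 4))    # 8 support vectors, 4 features (illustrative)
ay = rng.normal(size=8)          # alpha_i * y_i
print(linear_svm_decision(rng.normal(size=4), svs, ay, bias=0.1))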
Energy-Aware Network-on-Chip Application Mapping Based on Domain Knowledge Genetic Algorithm
This paper addresses energy-aware application mapping for large-scale Network-on-Chip (NoC). The increasing number of intellectual property (IP) cores in multi-processor system-on-chips (MPSoCs) makes it more challenging to find the optimal core-to-topology mapping. This paper proposes an application mapping technique that incorporates domain knowledge into a genetic algorithm (GA) to minimize the energy consumption of NoC communication. The GA is initialized with knowledge of the network partition, while the genetic crossover operator is guided by inter-core communication demands. NoC energy is estimated using an analytical energy model and cycle-accurate Noxim simulation. For large-scale NoC, application mapping using the knowledge-based genetic operator saves up to 28% energy compared to a conventional GA. Adding knowledge-based initial mapping speeds up convergence by 81% and saves a further 5% energy compared to the GA with knowledge-based crossover only. Furthermore, cycle-accurate simulations of applications with traffic dependency show the effectiveness of the proposed application mapping for large-scale NoC.
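A minimal sketch of the kind of analytical communication-energy cost such a GA minimizes, assuming an XY-routed 2-D mesh and a fixed energy per bit per hop; the traffic volumes, tile coordinates and the E_BIT_HOP constant are illustrative, not the paper's values or model.

# Sketch of an analytical NoC communication-energy fitness function:
# each communicating core pair contributes
# (traffic volume) x (hop distance of its mapping) x (energy per bit per hop).

E_BIT_HOP = 1.0e-12   # assumed energy per bit per hop (J)

def comm_energy(traffic, mapping):
    """traffic: {(src_core, dst_core): bits}, mapping: {core: (x, y) tile}."""
    total = 0.0
    for (s, d), bits in traffic.items():
        hops = abs(mapping[s][0] - mapping[d][0]) + abs(mapping[s][1] - mapping[d][1])
        total += bits * hops * E_BIT_HOP
    return total

traffic = {(0, 1): 1e6, (1, 2): 5e5, (0, 3): 2e5}
mapping = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}   # one GA chromosome
print(comm_energy(traffic, mapping))                      # fitness to minimize

A knowledge-guided crossover would bias offspring toward keeping heavily communicating cores on adjacent tiles, which directly lowers the hop term in this cost.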
Configurable Version Management Hardware Transactional Memory for Multi-processor Platform
Programming a shared-memory multi-processor platform efficiently is difficult, as lock-based synchronization limits efficiency. Transactional memory (TM) is a promising approach to creating an abstraction layer for multi-threaded programming. However, the performance of TM is application-specific. In general, the configuration of a TM is divided into version management and conflict management, and each scheme has strengths and weaknesses that depend on the executing application. Previous TM implementations for embedded systems were built on a fixed version management configuration, which results in significant performance loss when transaction behaviour changes. In this paper, we propose a hardware transactional memory (HTM) with interchangeable version management. Random requests at different contention levels are used to verify the performance of the proposed TM. The proposed architecture is targeted at embedded applications and is area-efficient compared to current implementations that apply cache coherence protocols.
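For readers unfamiliar with the two version-management schemes a configurable HTM can switch between, a toy software sketch of the trade-off: eager versioning updates memory in place and keeps an undo log (cheap commit, costly abort), while lazy versioning buffers writes privately and publishes them at commit (cheap abort, costly commit). The classes and dict-backed memory below are purely illustrative and are not the proposed hardware design.

# Sketch: eager vs. lazy version management for a single transaction.

class EagerTx:
    def __init__(self, mem): self.mem, self.undo = mem, []
    def write(self, addr, val):
        self.undo.append((addr, self.mem[addr]))   # log the old value
        self.mem[addr] = val                       # update memory in place
    def commit(self): self.undo.clear()            # nothing to copy on commit
    def abort(self):
        for addr, old in reversed(self.undo): self.mem[addr] = old

class LazyTx:
    def __init__(self, mem): self.mem, self.buf = mem, {}
    def write(self, addr, val): self.buf[addr] = val   # buffer privately
    def commit(self): self.mem.update(self.buf)        # publish at commit
    def abort(self): self.buf.clear()                  # just drop the buffer

mem = {0: 10, 1: 20}
tx = EagerTx(mem); tx.write(0, 99); tx.abort();  print(mem)   # rolled back
tx = LazyTx(mem);  tx.write(1, 77); tx.commit(); print(mem)   # published

Which scheme wins depends on how often transactions abort, which is exactly why a fixed configuration loses performance when transaction behaviour changes.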
Application profiling and mapping on NoC-based MPSoC emulation platform on reconfigurable logic
In network-on-chip (NoC) based multi-processor system-on-chip (MPSoC) development, application profiling is one of the most crucial design-time steps for searching and exploring optimal mappings. Conventional mapping exploration methodologies analyse application-specific graphs by estimating their runtime behaviour using analytical or simulation models. However, the former does not replicate the actual run-time performance of the application, while the latter requires a significant amount of exploration time. To map applications onto a specific MPSoC platform, the application behaviour on a cycle-accurate emulated platform should be considered to obtain better mapping quality. This paper proposes an application mapping methodology that utilizes an MPSoC prototyped on a Field-Programmable Gate Array (FPGA). Applications are implemented on homogeneous MPSoC cores, and their costs are analysed and profiled on the platform in terms of execution time, intra-core communication delay and inter-core communication delay. These metrics are used in the analytical evaluation of the application mapping. The proposed analytical mapping is compared against the exhaustive brute-force method. Results show that the proposed method produces mappings of comparable quality to the ground-truth solutions in a shorter evaluation time.
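A minimal sketch of an analytical mapping evaluation driven by profiled costs of the kind listed above (per-task execution time plus intra-core versus inter-core communication delay); the additive cost form, task names and delay values are assumptions for illustration, not the paper's exact model.

# Sketch: score one candidate mapping using costs profiled on the platform.

def mapping_cost(exec_time, comm, mapping, intra_delay, inter_delay):
    """exec_time: {task: cycles}, comm: {(src, dst): messages},
    mapping: {task: core}. Returns a simple additive cost estimate."""
    cost = sum(exec_time.values())
    for (s, d), msgs in comm.items():
        # Tasks on the same core pay the cheaper intra-core delay.
        per_msg = intra_delay if mapping[s] == mapping[d] else inter_delay
        cost += msgs * per_msg
    return cost

exec_time = {"t0": 1200, "t1": 800, "t2": 950}        # profiled cycles (illustrative)
comm = {("t0", "t1"): 40, ("t1", "t2"): 25}            # profiled message counts
print(mapping_cost(exec_time, comm, {"t0": 0, "t1": 0, "t2": 1},
                   intra_delay=5, inter_delay=60))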
Performance Evaluation of Centralized Reconfigurable Transmitting Power Scheme in Wireless Network-on-chip
Network-on-chip (NoC) is an on-chip communication network that allows parallel communication among all cores to improve inter-core performance. Wireless NoC (WiNoC) introduces long-range, high-bandwidth radio frequency (RF) interconnects that can reduce the multi-hop communication of the planar metal interconnects in conventional NoC platforms. In a WiNoC, the RF transceivers, and in particular their transmitters, account for a significant share of the total communication energy. This paper evaluates the energy and latency performance of a closed-loop power management mechanism that reconfigures the transmitting power in a WiNoC based on the number of erroneously received packets. The scheme achieves significant energy savings with limited performance degradation and insignificant impact on throughput.
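A toy sketch of such a closed-loop transmitting-power scheme: the number of packets received in error during a monitoring window drives the transmitter's power level up or down. The thresholds, step size and power-level range are assumed values; the paper's actual control policy may differ.

# Sketch: feedback control of the RF transmitter power level.

def adjust_tx_power(power_level, error_count,
                    raise_threshold=4, lower_threshold=0,
                    step=1, min_level=0, max_level=7):
    if error_count > raise_threshold:        # too many errors: transmit harder
        power_level = min(max_level, power_level + step)
    elif error_count <= lower_threshold:     # clean window: back off to save energy
        power_level = max(min_level, power_level - step)
    return power_level

level = 5
for errors_in_window in [0, 0, 6, 2, 0]:
    level = adjust_tx_power(level, errors_in_window)
    print(level)   # drifts down while the link is clean, rises after error bursts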
Hybrid routing tree with buffer insertion under obstacle constraints
Performance optimization in very-large-scale integration (VLSI) design is key to the success of today's design automation methodologies. One of the performance issues is the interconnect delay in deep sub-micron VLSI circuits: interconnect delay becomes more dominant than gate delay as gate sizes are reduced. This paper presents an algorithm to optimize the timing performance of a routing tree under obstacle constraints. Simultaneous routing and buffer insertion has been proven to be NP-complete, while the two-step approach may produce poor solutions. Therefore, we propose a hybrid algorithm that modifies a given routing tree simultaneously with buffer insertion. This paper describes the algorithm and presents experimental results showing that it can improve the timing of the routing tree significantly with low execution time.
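For intuition on why buffer insertion improves timing, a small sketch using the Elmore delay model, a common choice for this class of algorithms (the paper's exact delay model is not restated here); the unit wire resistance/capacitance and buffer parameters are illustrative.

# Sketch: Elmore delay of a long wire, unbuffered vs. split by one buffer.

R_UNIT, C_UNIT = 0.1, 0.2          # wire resistance/capacitance per unit length (assumed)

def wire_delay(length, c_load):
    # Elmore delay of a uniform wire driving c_load.
    return R_UNIT * length * (C_UNIT * length / 2 + c_load)

def buffered_delay(length, c_load, r_buf=1.0, c_buf=0.5, t_buf=2.0):
    # Split the wire in half and insert one buffer at the midpoint.
    half = length / 2
    first = wire_delay(half, c_buf)
    second = t_buf + r_buf * (C_UNIT * half + c_load) + wire_delay(half, c_load)
    return first + second

print(wire_delay(100, c_load=1.0))       # unbuffered
print(buffered_delay(100, c_load=1.0))   # smaller for long wires

The wire term grows quadratically with length, so splitting a long wire reduces total delay even after paying the buffer's own intrinsic delay, which is why timing-driven routing trees benefit from inserted buffers.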
Interleaved incremental/decremental support vector machine for embedded system
Incremental and Decremental Support Vector Machine (IDSVM) is a widely used incremental learning algorithm that is highly accurate but computationally expensive. For IDSVM to be deployed in embedded systems, a moving-window architecture is needed to limit the number of support vectors in the model. This increases the complexity of the system, as old data must be unlearned while new data are learned. This work proposes an interleaved IDSVM (IIDSVM) architecture that performs incremental and decremental learning simultaneously, targeting embedded platforms with limited on-chip memory. The proposed solution achieves a 60%-70% speed improvement while producing accuracy similar to IDSVM.
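A sketch of the moving-window regime described above: once the window is full, each new sample triggers both an incremental update (learn the newest sample) and a decremental one (unlearn the oldest). The learn/unlearn bodies below are trivial stand-ins for the actual IDSVM coefficient updates; the proposed IIDSVM architecture performs the two passes together rather than back-to-back.

# Sketch: moving-window control flow for incremental/decremental learning.
from collections import deque

WINDOW = 4

def learn(model, sample):     # stand-in for the incremental IDSVM update
    model.append(sample)

def unlearn(model, sample):   # stand-in for the decremental IDSVM update
    model.remove(sample)

window, model = deque(), []
for t, sample in enumerate(range(10)):
    if len(window) == WINDOW:
        oldest = window.popleft()
        # Sequential IDSVM: fully unlearn the oldest sample, then learn the new one.
        # Interleaved IIDSVM: both updates proceed together in one combined pass.
        unlearn(model, oldest)
    window.append(sample)
    learn(model, sample)
    print(t, list(window))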
An optimized buffer insertion algorithm with delay-power constraints for VLSI layouts
We propose a grid-graph algorithm for interconnect routing and buffer insertion in nanometer VLSI layout designs. The algorithm is designed to handle multi-constraint optimization, namely timing performance and power dissipation. The proposed algorithm is called HRTB-LA, which stands for hybrid routing tree and buffer insertion with look-ahead. In recent VLSI designs, interconnect delay has become a dominant factor compared to gate delay. The well-known technique to minimize interconnect delay is to insert buffers along the interconnect wires. However, the buffers themselves consume power, and it has been shown that the power dissipation overhead due to buffer insertion is significantly high. Many methodologies to optimize timing performance under a power constraint have been proposed, but none is based on a dynamic programming technique over a grid graph. In addition, most buffer insertion algorithms use a post-routing buffer insertion approach; in the presence of buffer obstacles, these post-routing algorithms may produce poor solutions. On the other hand, simultaneous routing and buffer insertion offers a better solution, but it has been proven NP-complete. Hence, our main contribution is an efficient algorithm using a hybrid approach for multi-constraint optimization of multi-sink nets. The algorithm uses dynamic programming to incrementally compute the interconnect delay and the power dissipation of the inserted buffers, while an effective runtime is achieved with the aid of novel look-ahead and graph pruning schemes. Experimental results show that HRTB-LA handles multi-constraint optimization and produces solutions up to 30% better than a post-routing buffer insertion algorithm in comparable runtime.
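As an illustration of why the dynamic programming approach stays tractable, a sketch of the dominance pruning commonly applied to buffer-insertion candidate sets: a candidate (downstream capacitance, required arrival time, accumulated power) is dropped when another candidate is at least as good in all three metrics. HRTB-LA's actual pruning and look-ahead rules are not reproduced here; the tuples are made-up values.

# Sketch: dominance pruning of (cap, required_time, power) candidates.

def prune(candidates):
    kept = []
    for c in candidates:
        # c is dominated if some other candidate has lower-or-equal capacitance,
        # greater-or-equal required time, and lower-or-equal power.
        dominated = any(o != c and o[0] <= c[0] and o[1] >= c[1] and o[2] <= c[2]
                        for o in candidates)
        if not dominated:
            kept.append(c)
    return kept

# (cap_fF, required_time_ps, power_uW)
cands = [(50, 400, 12), (60, 420, 10), (45, 380, 15), (70, 380, 20)]
print(prune(cands))   # (70, 380, 20) is dropped: (45, 380, 15) is at least as good in every metric

Only the Pareto-optimal candidates survive each pruning step, which keeps the candidate lists small as they propagate through the grid graph.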