73 research outputs found

    IP-Enabled C/C++ Based High Level Synthesis: A Step towards Better Designer Productivity and Design Performance

    Intellectual property (IP) core based design is an emerging design methodology for coping with increasing chip design complexity, and C/C++ based high level synthesis (HLS) is likewise gaining traction for the same reason. In this work, we present a design methodology that combines these two approaches and is therefore more powerful than either alone. We discuss the proposed methodology in the context of efficient hardware synthesis for a class of mathematical functions, without altering the original C/C++ source code. We also propose methods to integrate legacy IP cores into existing HLS flows. Relying on concepts from program recognition and on optimized low-level implementations of such arithmetic functions, the described methodology is a step towards intelligent synthesis, in which application characteristics are transparently matched with specific architectural resources and relevant IP cores for improved area-delay results. The combined methodology is more aware of the target hardware architecture than a conventional HLS flow. Implementation results for several compute kernels, obtained from the commercial tool Vivado-HLS as well as from the proposed flow, are compared to show that the proposed flow gives better results.
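    To make the matching idea concrete, the sketch below scans C/C++ source text for calls to known math functions and looks them up in a small IP library. This is a minimal illustration only: the core names, latency and area figures, and the simple call-pattern matcher are assumptions for this example, not the program-recognition machinery the abstract describes.

        # Minimal sketch of the IP-matching idea: scan C/C++ source for calls to
        # known math functions and report which library IP cores could implement
        # them. Core names and latency/area figures are illustrative assumptions.
        import re

        # Hypothetical IP library: function name -> candidate core metadata.
        IP_LIBRARY = {
            "sqrtf": {"core": "fp_sqrt_v1",   "latency_cycles": 16, "luts": 420},
            "sinf":  {"core": "cordic_sin_v2", "latency_cycles": 24, "luts": 610},
            "expf":  {"core": "fp_exp_v1",    "latency_cycles": 20, "luts": 530},
        }

        CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")

        def match_ip_cores(c_source: str):
            """Return (function, core-metadata) pairs for calls an IP core could serve."""
            return [(name, IP_LIBRARY[name])
                    for name in CALL_RE.findall(c_source)
                    if name in IP_LIBRARY]

        kernel = """
        float norm(float x, float y) {
            return sqrtf(x * x + y * y);
        }
        """
        for fn, core in match_ip_cores(kernel):
            print(f"{fn} -> {core['core']} ({core['latency_cycles']} cycles)")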

    Knapsack Model and Algorithm for Hardware/Software Partitioning Problem

    Efficient hardware/software partitioning is crucial for realizing optimal solutions to constraint-driven embedded systems, and the total solution space for this problem is typically very large. In this paper, we show that the knapsack model can be employed for rapid identification of the hardware components that yield time-efficient implementations. In particular, we propose a method to split the problem into standard 0-1 knapsack problems in order to leverage classical approaches. The proposed method relies on tight lower and upper bounds for each of these knapsack problems to rapidly eliminate sub-problems that are guaranteed not to yield optimal results. Experimental results show that, for problem sizes ranging from 30 to 3000, the optimal solution of the whole problem can be obtained by solving only one sub-problem, except for a single case that required the solution of three sub-problems.
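    As a rough illustration of the splitting idea, the sketch below solves each sub-problem with a standard 0-1 knapsack dynamic program and uses a fractional (LP-relaxation) upper bound to discard sub-problems that cannot beat the incumbent. The item values, weights, and the particular bound are illustrative assumptions; the paper's tight bounds are specific to its decomposition.

        # Each sub-problem is a standard 0-1 knapsack; a quick upper bound lets
        # us skip sub-problems that cannot improve on the best value so far.

        def knapsack_01(values, weights, capacity):
            """Exact DP over capacity; returns the best achievable value."""
            dp = [0] * (capacity + 1)
            for v, w in zip(values, weights):
                for c in range(capacity, w - 1, -1):
                    dp[c] = max(dp[c], dp[c - w] + v)
            return dp[capacity]

        def fractional_upper_bound(values, weights, capacity):
            """LP-relaxation bound: fill greedily by value density, split last item."""
            items = sorted(zip(values, weights),
                           key=lambda vw: vw[0] / vw[1], reverse=True)
            bound, cap = 0.0, capacity
            for v, w in items:
                if w <= cap:
                    bound, cap = bound + v, cap - w
                else:
                    bound += v * cap / w
                    break
            return bound

        # Two hypothetical sub-problems produced by the split.
        subproblems = [
            ([60, 100, 120], [10, 20, 30], 50),
            ([30, 40], [25, 35], 50),
        ]
        best = 0
        for values, weights, capacity in subproblems:
            if fractional_upper_bound(values, weights, capacity) <= best:
                continue  # guaranteed not to improve on the incumbent
            best = max(best, knapsack_01(values, weights, capacity))
        print("best value:", best)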

    Rapid evaluation of custom instruction selection approaches with FPGA estimation

    The main aim of this article is to demonstrate that a fast and accurate FPGA estimation engine is indispensable in design flows for custom instruction (template) selection. The need for an FPGA estimation engine stems from the difficulty of predicting the FPGA performance measures of selected custom instructions. We present an FPGA estimation technique that partitions the high-level representation of custom instructions into clusters based on the structural organization of the target FPGA, while taking into account general logic synthesis principles adopted by FPGA tools. In this work, we evaluate a widely used graph covering algorithm with various heuristics for custom instruction selection. In addition, we present an algorithm called Refined Largest Fit First (RLFF) that relies on a graph covering heuristic to select non-overlapping superset templates, which typically incorporate frequently used basic templates. The initial solution is further refined by considering previously ignored overlapping templates, to see whether their introduction could lead to higher performance. While RLFF provides the most efficient cover compared with the ILP method and other graph covering heuristics, FPGA estimation results reveal that RLFF leads to the worst performance in certain applications. It is therefore a worthy proposition to equip design flows with accurate FPGA estimation in order to rapidly determine the most profitable custom instruction approach for a given application.
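    The covering step can be pictured as follows: among template matches on the data-flow graph, repeatedly commit the largest match that does not overlap anything already covered. This is only a sketch of the largest-fit-first idea under assumed inputs (node sets and gains are invented); the actual RLFF algorithm also refines the initial cover with the overlapping templates it initially ignored.

        # Greedy largest-fit-first covering of data-flow-graph nodes with
        # non-overlapping template matches. Matches and gains are illustrative.

        def largest_fit_first(matches):
            """matches: list of (template_id, frozenset_of_node_ids, estimated_gain).
            Returns the chosen non-overlapping matches, largest templates first."""
            covered, chosen = set(), []
            # Prefer larger templates; break ties by estimated gain.
            for tid, nodes, gain in sorted(matches,
                                           key=lambda m: (len(m[1]), m[2]),
                                           reverse=True):
                if covered.isdisjoint(nodes):
                    covered |= nodes
                    chosen.append((tid, nodes, gain))
            return chosen

        # Illustrative matches on a six-node graph.
        matches = [
            ("MAC",       frozenset({0, 1, 2}), 3.0),  # multiply-accumulate supertemplate
            ("ADD",       frozenset({2, 3}),    1.0),
            ("SHIFT-ADD", frozenset({3, 4, 5}), 2.5),
        ]
        for tid, nodes, gain in largest_fit_first(matches):
            print(tid, sorted(nodes), gain)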

    Longevity Framework: Leveraging Online Integrated Aging-Aware Hierarchical Mapping and VF-Selection for Lifetime Reliability Optimization in Manycore Processors

    Rapid device aging in the nano era threatens system lifetime reliability, posing a major intrinsic threat to system functionality. Traditional techniques to overcome aging-induced device slowdown, such as guardbanding, are static and incur performance, power, and area penalties. In a manycore processor, the system-level design abstraction offers dynamic opportunities: controlling task-to-core mappings and per-core operating frequency yields a more balanced core aging profile across the chip, optimizing system lifetime reliability while meeting application performance requirements. This article presents the Longevity Framework (LF), which leverages online integrated aging-aware hierarchical mapping and voltage-frequency (VF) selection for lifetime reliability optimization in manycore processors. The mapping exploration is hierarchical to achieve scalability. The VF selection builds on the trade-offs between power, performance, and aging as the VF is scaled, while leveraging per-core DVFS capabilities. The methodology takes chip-wide process variation into account. Extensive experiments comparing the proposed approach with two state-of-the-art methods, for 64-core and 256-core systems running applications from the PARSEC and SPLASH-2 benchmark suites, show an improvement of up to 3.2 years in system lifetime reliability and a 4× improvement in average core health.
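    A drastically simplified picture of aging-aware mapping: pair the most demanding tasks with the healthiest cores so that wear accrues evenly across the chip. The health scores, load metric, and greedy pairing below are assumptions for illustration; the framework's actual hierarchical exploration, reliability model, and VF selection are far richer.

        # Toy aging-aware mapping: heaviest tasks go to healthiest cores.
        # Health/load values are illustrative, not the paper's reliability model.

        def aging_aware_map(task_loads, core_health):
            """Pair heavier tasks with healthier cores; returns {task: core}."""
            tasks = sorted(range(len(task_loads)),
                           key=lambda t: task_loads[t], reverse=True)
            cores = sorted(range(len(core_health)),
                           key=lambda c: core_health[c], reverse=True)
            return dict(zip(tasks, cores))

        task_loads  = [0.9, 0.3, 0.6, 0.1]    # normalized utilization per task
        core_health = [0.7, 0.95, 0.8, 0.99]  # 1.0 = pristine, lower = more aged
        mapping = aging_aware_map(task_loads, core_health)
        for t, c in sorted(mapping.items()):
            print(f"task {t} (load {task_loads[t]}) -> core {c} (health {core_health[c]})")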

    CHiPES

    The objective of grant RGM 39/01 was to provide funding for manpower at the Centre for High Performance Embedded Systems (CHiPES). The funding was utilized to cover the manpower cost of two Research Associates, namely Mr Lam Siew Kei and Mr George Rosario Jagadeesh.

    Embedded accelerators

    The main objective of this project was to develop techniques for the acceleration of algorithms in embedded systems. To achieve this objective, methodologies were developed to perform architecture-centric algorithm development, and techniques were developed to exploit VLSI arrays for embedded acceleration. In the developed methodologies, algorithms were designed from the perspective of embedded architectures. The techniques were applied to image processing algorithms, and major improvements were observed in the performance of these algorithms in an embedded systems environment. (RGM 49/0)

    Power sensitive techniques for high productivity embedded systems

    Energy consumption is a major issue in modern embedded applications. With cache memory consuming about 50% of the total energy expended in these systems, predictor-based filter cache hierarchies have been introduced to reduce the energy consumption of the instruction cache by leveraging a smaller cache to store the many tiny loops inherent in embedded applications. In light of this, there is a need to identify the optimal filter cache and L1 cache sizes for an embedded application. In this work, we introduce a framework for systematic tuning of predictor-based instruction cache hierarchies without the need for exhaustive memory hierarchy simulation. Simulations of programs from the MiBench benchmark suite show that the proposed framework is capable of identifying optimal cache sizes owing to its sensitivity to spatial and temporal locality. Exploration using the proposed techniques is also notably faster than exhaustive design space exploration for identifying optimal cache sizes, as it relies on only a one-time simulation. Instruction set customization is fast becoming a preferred approach for meeting the performance requirements of embedded applications, and it is of interest to examine the implications for overall energy-delay product reduction when a combined optimization through cache hierarchy tuning and instruction set customization is performed. (RGM 24/0)
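    A toy version of the tuning problem illustrates what is being searched: replay an instruction-address trace against candidate filter cache sizes, convert hit/miss counts into an energy figure, and keep the cheapest configuration. The per-access energy ratios and the direct-mapped cache model are assumed placeholders; the framework in the abstract avoids even this per-configuration replay by exploiting locality measured in a single simulation.

        # Brute-force baseline for filter cache sizing: replay one trace per
        # candidate size and score energy. Energy ratios are assumed values.

        def hits_direct_mapped(trace, num_lines, line_size=16):
            """Count hits for a direct-mapped cache replaying the address trace."""
            cache, hits = {}, 0
            for addr in trace:
                line = addr // line_size
                idx, tag = line % num_lines, line // num_lines
                if cache.get(idx) == tag:
                    hits += 1
                else:
                    cache[idx] = tag
            return hits

        trace = [0x100 + 4 * (i % 8) for i in range(1000)]  # a tiny loop kernel

        E_FILTER, E_L1 = 1.0, 5.0  # assumed relative energy per access
        results = []
        for filt_lines in (2, 4, 8):
            fh = hits_direct_mapped(trace, filt_lines)
            # Every access probes the filter cache; misses also probe L1.
            energy = len(trace) * E_FILTER + (len(trace) - fh) * E_L1
            results.append((energy, filt_lines))
        print("best (energy, filter_lines):", min(results))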

    Seed funding for strategic research @ RTP (research manpower­)

    Modern voice authentication systems perform extremely well on large-population, high-quality clean speech databases. While novel algorithms can be designed to provide performance and accuracy, performance degrades rapidly in the presence of noise: noise introduces a mismatch between the verification utterance and the speaker template, which causes unpredictable scores and degrades performance. Our research addresses the problem of mismatched conditions caused by additive noise. We have proposed novel algorithms for noise compensation in the speaker model domain and demonstrated their efficiency on the TIMIT database corrupted with additive noise. Subsequently, we combined the proposed algorithms with the spectral subtraction method to further improve authentication performance. The proposed algorithms have been successfully translated to a dedicated hardware architecture and prototyped on FPGA-based platforms. (RGM 11/0)
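    Spectral subtraction, which the project combines with model-domain compensation, can be sketched in a few lines: estimate the noise magnitude spectrum from noise-only frames, subtract it frame by frame, floor the result, and resynthesize with the noisy phase. The frame length, spectral floor, and rectangular non-overlapping framing below are simplifying assumptions.

        # Minimal magnitude spectral subtraction: frame-wise FFT, subtract an
        # average noise magnitude from leading noise-only frames, floor, and
        # resynthesize with the noisy phase. Parameters are illustrative.
        import numpy as np

        def spectral_subtract(noisy, frame=256, noise_frames=10, floor=0.01):
            n_frames = len(noisy) // frame
            frames = noisy[: n_frames * frame].reshape(n_frames, frame)
            spec = np.fft.rfft(frames, axis=1)
            mag, phase = np.abs(spec), np.angle(spec)
            noise_mag = mag[:noise_frames].mean(axis=0)        # noise estimate
            clean_mag = np.maximum(mag - noise_mag, floor * mag)
            clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame, axis=1)
            return clean.reshape(-1)

        rng = np.random.default_rng(0)
        t = np.arange(8192) / 8000.0
        signal = np.sin(2 * np.pi * 440 * t)
        signal[:2560] = 0.0                                    # leading noise-only region
        noisy = signal + 0.1 * rng.standard_normal(len(t))
        print("denoised samples:", spectral_subtract(noisy)[:4])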

    Online map-matching of noisy and sparse location data with hidden Markov and route choice models

    With the growing use of crowdsourced location data from smartphones in transportation applications, the task of map-matching raw location sequences to travel paths in the road network becomes increasingly important. High-frequency sampling of smartphone locations using accurate but power-hungry positioning technologies is not practical, as it consumes an undue amount of the smartphone's bandwidth and battery power. Hence, there is a need for robust algorithms that map-match inaccurate and sparse location data in an accurate and timely manner. This paper addresses that need with a novel map-matching solution that combines the widely used approach based on a Hidden Markov Model (HMM) with the concept of drivers' route choice. Our algorithm uses an HMM tailored for noisy and sparse data to generate partial map-matched paths in an online manner. We use a route choice model, estimated from real drive data, to reassess each HMM-generated partial path along with a set of feasible alternative paths. We evaluated the proposed algorithm with real-world as well as synthetic location data under varying levels of measurement noise and temporal sparsity. The results show that the map-matching accuracy of our algorithm is significantly higher than that of the state of the art, especially at high levels of noise.
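    The HMM backbone of such a matcher can be condensed as follows: candidate road positions are hidden states, a Gaussian on GPS error scores the emissions, transitions penalize disagreement between route distance and straight-line distance, and Viterbi recovers the best path. The geometry here is flattened to 2D points and the route distance is a stand-in; these are assumptions for illustration, and the paper's route-choice reassessment sits on top of this backbone.

        # Condensed Viterbi map-matching over candidate road positions.
        # Candidates, sigma, beta, and the route-distance stub are illustrative.
        import math

        def viterbi_match(gps, candidates, route_dist, sigma=10.0, beta=5.0):
            """gps: list of (x, y) fixes; candidates: per-fix list of (x, y) road
            points; route_dist(a, b): network distance between candidate points."""
            def log_emission(z, c):
                return -0.5 * (math.dist(z, c) / sigma) ** 2   # Gaussian GPS noise
            def log_transition(a, b, z_prev, z_cur):
                # Exponential penalty on |route distance - straight-line distance|.
                return -abs(route_dist(a, b) - math.dist(z_prev, z_cur)) / beta

            # Each entry: (log score of best path ending at this candidate, path).
            states = [(log_emission(gps[0], c), [i])
                      for i, c in enumerate(candidates[0])]
            for t in range(1, len(gps)):
                new_states = []
                for j, c in enumerate(candidates[t]):
                    score, path = max(
                        ((lp + log_transition(candidates[t - 1][p[-1]], c,
                                              gps[t - 1], gps[t]), p)
                         for lp, p in states),
                        key=lambda s: s[0],
                    )
                    new_states.append((score + log_emission(gps[t], c), path + [j]))
                states = new_states
            return max(states, key=lambda s: s[0])[1]

        gps = [(0, 3), (50, 4), (100, -2)]
        candidates = [[(0, 0), (0, 20)], [(50, 0), (50, 20)], [(100, 0), (100, 20)]]
        route = lambda a, b: math.dist(a, b)  # stand-in for shortest-path distance
        print("matched candidate indices:", viterbi_match(gps, candidates, route))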

    Psychoacoustic model compensation for robust speaker verification in environmental noise

    In this paper, we investigate the problem of speaker verification in noisy conditions. Our work is motivated by the fact that environmental noise severely degrades the performance of speaker verification systems. We present a model compensation scheme based on psychoacoustic principles that adapts the model parameters in order to reduce the mismatch between training and verification. To deal with scenarios where accurate noise estimation is difficult, a modified multiconditioning scheme is proposed. The new algorithm was tested on two speech databases. The first is the TIMIT database corrupted with white and pink noise, where noise estimation is fairly easy. The second is the MIT Mobile Device Speaker Verification Corpus (MITMDSVC), which contains realistic noisy speech data that makes noise estimation difficult. The proposed scheme achieves significant performance gains over the baseline system in both cases.
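    The generic model-compensation idea, stripped of the psychoacoustic weighting this paper contributes, can be shown with a log-add approximation: shift each Gaussian mean of the speaker model toward what noisy speech would look like, given a noise estimate in the log-spectral domain. The dimensions, noise estimate, and log-add rule below are assumptions for illustration, not the paper's scheme.

        # Log-add model compensation sketch: adapt speaker-model means to the
        # noisy domain instead of cleaning the signal. Values are illustrative.
        import numpy as np

        def compensate_means(clean_means, noise_log_spectrum):
            """clean_means: (M, D) log-spectral GMM means; returns noisy-domain means."""
            # log(exp(speech) + exp(noise)) per dimension: noisy energy is roughly
            # the sum of speech and noise energies in the linear domain.
            return np.logaddexp(clean_means, noise_log_spectrum)

        rng = np.random.default_rng(1)
        clean_means = rng.normal(2.0, 0.5, size=(4, 8))  # 4 mixtures, 8 log-spectral dims
        noise_est = np.full(8, 1.0)                      # flat noise estimate (assumed)
        noisy_means = compensate_means(clean_means, noise_est)
        print("mean shift per mixture:", (noisy_means - clean_means).mean(axis=1))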