668 research outputs found

    From FPGA to ASIC: A RISC-V processor experience

    Get PDF
    This work document a correct design flow using these tools in the Lagarto RISC- V Processor and the RTL design considerations that must be taken into account, to move from a design for FPGA to design for ASIC

    Understanding multidimensional verification: Where functional meets non-functional

    Get PDF
    Abstract Advancements in electronic systems' design have a notable impact on design verification technologies. The recent paradigms of Internet-of-Things (IoT) and Cyber-Physical Systems (CPS) assume devices immersed in physical environments, significantly constrained in resources and expected to provide levels of security, privacy, reliability, performance and low-power features. In recent years, numerous extra-functional aspects of electronic systems were brought to the front and imply verification of hardware design models in multidimensional space along with the functional concerns of the target system. However, different from the software domain such a holistic approach remains underdeveloped. The contributions of this paper are a taxonomy for multidimensional hardware verification aspects, a state-of-the-art survey of related research works and trends enabling the multidimensional verification concept. Further, an initial approach to perform multidimensional verification based on machine learning techniques is evaluated. The importance and challenge of performing multidimensional verification is illustrated by an example case study

    Customized Nios II multi-cycle instructions to accelerate block-matching techniques

    Get PDF
    This study focuses on accelerating the optimization of motion estimation algorithms, which are widely used in video coding standards, by using both the paradigm based on Altera Custom Instructions as well as the efficient combination of SDRAM and On-Chip memory of Nios II processor. Firstly, a complete code profiling is carried out before the optimization in order to detect time leaking affecting the motion compensation algorithms. Then, a multi-cycle Custom Instruction which will be added to the specific embedded design is implemented. The approach deployed is based on optimizing SOC performance by using an efficient combination of On-Chip memory and SDRAM with regards to the reset vector, exception vector, stack, heap, read/write data (.rwdata), read only data (.rodata), and program text (.text) in the design. Furthermore, this approach aims to enhance the said algorithms by incorporating Custom Instructions in the Nios II ISA. Finally, the efficient combination of both methods is then developed to build the final embedded system. The present contribution thus facilitates motion coding for low-cost Soft-Core microprocessors, particularly the RISC architecture of Nios II implemented in FPGA. It enables us to construct an SOC which processes 50Ă—50 @ 180 fps

    On thermal sensor calibration and software techniques for many-core thermal management

    Get PDF
    The high power density of a many-core processor results in increased temperature which negatively impacts system reliability and performance. Dynamic thermal management applies thermal-aware techniques at run time to avoid overheating using temperature information collected from on-chip thermal sensors. Temperature sensing and thermal control schemes are two critical technologies for successfully maintaining thermal safety. In this dissertation, on-line thermal sensor calibration schemes are developed to provide accurate temperature information. Software-based dynamic thermal management techniques are proposed using calibrated thermal sensors. Due to process variation and silicon aging, on-chip thermal sensors require periodic calibration before use in DTM. However, the calibration cost for thermal sensors can be prohibitively high as the number of on-chip sensors increases. Linear models which are suitable for on-line calculation are employed to estimate temperatures at multiple sensor locations using performance counters. The estimated temperature and the actual sensor thermal profile show a very high similarity with correlation coefficient ~0.9 for SPLASH2 and SPEC2000 benchmarks. A calibration approach is proposed to combine potentially inaccurate temperature values obtained from two sources: thermal sensor readings and temperature estimations. A data fusion strategy based on Bayesian inference, which combines information from these two sources, is demonstrated. The result shows the strategy can effectively recalibrate sensor readings in response to inaccuracies caused by process variation and environmental noise. The average absolute error of the corrected sensor temperature readings is A dynamic task allocation strategy is proposed to address localized overheating in many-core systems. Our approach employs reinforcement learning, a dynamic machine learning algorithm that performs task allocation based on current temperatures and a prediction regarding which assignment will minimize the peak temperature. Our results show that the proposed technique is fast (scheduling performed in \u3c1 \u3ems) and can efficiently reduce peak temperature by up to 8 degree C in a 49-core processor (6% on average) versus a leading competing task allocation approach for a series of SPLASH-2 benchmarks. Reinforcement learning has also been applied to 3D integrated circuits to allocate tasks with thermal awareness

    DESTINY: A Comprehensive Tool with 3D and Multi-Level Cell Memory Modeling Capability

    Get PDF
    To enable the design of large capacity memory structures, novel memory technologies such as non-volatile memory (NVM) and novel fabrication approaches, e.g., 3D stacking and multi-level cell (MLC) design have been explored. The existing modeling tools, however, cover only a few memory technologies, technology nodes and fabrication approaches. We present DESTINY, a tool for modeling 2D/3D memories designed using SRAM, resistive RAM (ReRAM), spin transfer torque RAM (STT-RAM), phase change RAM (PCM) and embedded DRAM (eDRAM) and 2D memories designed using spin orbit torque RAM (SOT-RAM), domain wall memory (DWM) and Flash memory. In addition to single-level cell (SLC) designs for all of these memories, DESTINY also supports modeling MLC designs for NVMs. We have extensively validated DESTINY against commercial and research prototypes of these memories. DESTINY is very useful for performing design-space exploration across several dimensions, such as optimizing for a target (e.g., latency, area or energy-delay product) for a given memory technology, choosing the suitable memory technology or fabrication method (i.e., 2D v/s 3D) for a given optimization target, etc. We believe that DESTINY will boost studies of next-generation memory architectures used in systems ranging from mobile devices to extreme-scale supercomputers. The latest source-code of DESTINY is available from the following git repository: https://bitbucket.org/sparsh_mittal/destiny_v2

    An efficient design space exploration framework to optimize power-efficient heterogeneous many-core multi-threading embedded processor architectures

    Get PDF
    By the middle of this decade, uniprocessor architecture performance had hit a roadblock due to a combination of factors, such as excessive power dissipation due to high operating frequencies, growing memory access latencies, diminishing returns on deeper instruction pipelines, and a saturation of available instruction level parallelism in applications. An attractive and viable alternative embraced by all the processor vendors was multi-core architectures where throughput is improved by using micro-architectural features such as multiple processor cores, interconnects and low latency shared caches integrated on a single chip. The individual cores are often simpler than uniprocessor counterparts, use hardware multi-threading to exploit thread-level parallelism and latency hiding and typically achieve better performance-power figures. The overwhelming success of the multi-core microprocessors in both high performance and embedded computing platforms motivated chip architects to dramatically scale the multi-core processors to many-cores which will include hundreds of cores on-chip to further improve throughput. With such complex large scale architectures however, several key design issues need to be addressed. First, a wide range of micro- architectural parameters such as L1 caches, load/store queues, shared cache structures and interconnection topologies and non-linear interactions between them define a vast non-linear multi-variate micro-architectural design space of many-core processors; the traditional method of using extensive in-loop simulation to explore the design space is simply not practical. Second, to accurately evaluate the performance (measured in terms of cycles per instruction (CPI)) of a candidate design, the contention at the shared cache must be accounted in addition to cycle-by-cycle behavior of the large number of cores which superlinearly increases the number of simulation cycles per iteration of the design exploration. Third, single thread performance does not scale linearly with number of hardware threads per core and number of cores due to memory wall effect. This means that at every step of the design process designers must ensure that single thread performance is not unacceptably slowed down while increasing overall throughput. While all these factors affect design decisions in both high performance and embedded many-core processors, the design of embedded processors required for complex embedded applications such as networking, smart power grids, battlefield decision-making, consumer electronics and biomedical devices to name a few, is fundamentally different from its high performance counterpart because of the need to consider (i) low power and (ii) real-time operations. This implies the design objective for embedded many-core processors cannot be to simply maximize performance, but improve it in such a way that overall power dissipation is minimized and all real-time constraints are met. This necessitates additional power estimation models right at the design stage to accurately measure the cost and reliability of all the candidate designs during the exploration phase. In this dissertation, a statistical machine learning (SML) based design exploration framework is presented which employs an execution-driven cycle- accurate simulator to accurately measure power and performance of embedded many-core processors. The embedded many-core processor domain is Network Processors (NePs) used to processed network IP packets. Future generation NePs required to operate at terabits per second network speeds captures all the aspects of a complex embedded application consisting of shared data structures, large volume of compute-intensive and data-intensive real-time bound tasks and a high level of task (packet) level parallelism. Statistical machine learning (SML) is used to efficiently model performance and power of candidate designs in terms of wide ranges of micro-architectural parameters. The method inherently minimizes number of in-loop simulations in the exploration framework and also efficiently captures the non-linear interactions between the micro-architectural design parameters. To ensure scalability, the design space is partitioned into (i) core-level micro-architectural parameters to optimize single core architectures subject to the real-time constraints and (ii) shared memory level micro- architectural parameters to explore the shared interconnection network and shared cache memory architectures and achieves overall optimality. The cost function of our exploration algorithm is the total power dissipation which is minimized, subject to the constraints of real-time throughput (as determined from the terabit optical network router line-speed) required in IP packet processing embedded application

    Analysis of the reconfiguration latency and energy overheads for a Xilinx Virtex-5 FPGA

    Get PDF
    In this paper we have evaluated the overhead and the tradeoffs of a set of components usually included in a system with run-time partial reconfiguration implemented on a Xilinx Virtex-5. Our analysis shows the benefits of including a scratchpad memory inside the reconfiguration controller in order to improve the efficiency of the reconfiguration process. We have designed a simple controller for this scratchpad that includes support for prefetching and caching in order to further reduce both the energy and latency overhead
    • …
    corecore