Data-dependent cycle-accurate power modeling of RTL-level IPs using machine learning
In a chip design project, early design planning has a strong impact on the schedule and the cost of design. Power estimation is part of early design planning, and it greatly affects design decisions. Power modeling performed at a high level of abstraction is fast but inaccurate due to the lack of circuit switching activity information. By contrast, power modeling performed at a low level of abstraction is more accurate because the synthesized circuit is known, but simulation at this level is typically slow. This report explores a power modeling approach performed at the register transfer level (RTL). It exploits machine learning models in order to obtain fast yet relatively accurate cycle-by-cycle power estimates. The approach is data-dependent: cycle-specific models are trained on the switching activity of signals obtained from RTL simulation and on cycle-by-cycle power values obtained from a reference gate-level simulation of an existing RTL design. Therefore, if any changes are applied to the RTL design, the models must be re-trained. The approach aims at obtaining fast yet accurate power predictions for new invocations of a given trained model, using signal activity information collected during simulation of the unmodified RTL. Counterintuitively, the complete visibility of signals at this low level can overtrain the model, leading to inaccurate estimation. The suggested model therefore employs automatic feature selection in each cycle: based on the invocations used to train the cycle-by-cycle models, only signals that may switch during a given cycle are selected as features for their respective cycle-specific model. The method was tested on an 8-by-8 DCT design, and the power estimates were within 6.5% of those from a commercial power analysis tool.
This report also compares the approach of cycle-specific models to the approach of a single global model for all cycles, and shows that the cycle-specific approach is twice as accurate.
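The per-cycle training scheme described above can be sketched as follows. The array layout, function names, and the use of plain least squares are illustrative assumptions, not the report's actual implementation; the key idea retained is that each cycle gets its own model, trained only on the signals that ever switch in that cycle.

```python
import numpy as np

def train_cycle_models(toggles, power):
    """toggles: (n_invocations, n_cycles, n_signals) binary switching activity
    from RTL simulation; power: (n_invocations, n_cycles) per-cycle reference
    power from gate-level simulation. Returns one model per cycle."""
    models = []
    for c in range(toggles.shape[1]):
        X = toggles[:, c, :].astype(float)
        # automatic feature selection: keep only signals that ever
        # switch in cycle c across the training invocations
        active = X.any(axis=0)
        A = np.column_stack([X[:, active], np.ones(len(X))])  # intercept term
        coef, *_ = np.linalg.lstsq(A, power[:, c], rcond=None)
        models.append((active, coef))
    return models

def predict_power(models, toggles):
    """Cycle-by-cycle power prediction for new invocations."""
    out = np.empty((toggles.shape[0], len(models)))
    for c, (active, coef) in enumerate(models):
        X = toggles[:, c, :].astype(float)[:, active]
        out[:, c] = np.column_stack([X, np.ones(len(X))]) @ coef
    return out
```

Because each cycle-specific model sees only its own active signals, the feature count per model stays small even when the design exposes thousands of signals.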
Large-scale linear regression: Development of high-performance routines
In statistics, series of ordinary least squares (OLS) problems are used to study the linear correlation among sets of variables of interest; in many studies, the number of such variables is at least in the millions, and the corresponding datasets occupy terabytes of disk space. As the availability of large-scale datasets increases regularly, so does the challenge in dealing with them. Indeed, traditional solvers, which rely on the use of "black-box" routines optimized for one single OLS, are highly inefficient and fail to provide a viable solution for big-data analyses. As a case study, in this paper we consider a linear regression consisting of two-dimensional grids of related OLS problems that arise in the context of genome-wide association analyses, and give a careful walkthrough for the development of ols-grid, a high-performance routine for shared-memory architectures; analogous steps are relevant for tailoring OLS solvers to other applications. In particular, we first illustrate the design of efficient algorithms that exploit the structure of the OLS problems and eliminate redundant computations; then, we show how to effectively deal with datasets that do not fit in main memory; finally, we discuss how to cast the computation in terms of efficient kernels and how to achieve scalability. Importantly, each design decision along the way is justified by simple performance models. ols-grid enables the solution of correlated OLS problems operating on terabytes of data in a matter of hours.
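The core structural idea, reusing computation shared across the grid of related OLS problems instead of calling a black-box solver per problem, can be illustrated with a toy NumPy sketch. The function name, the grid layout (a shared base design matrix, one extra covariate per grid row, one response per grid column), and the normal-equations formulation are assumptions for illustration, not the paper's ols-grid routine.

```python
import numpy as np

def grid_ols(Xb, xs, ys):
    """Solve every OLS problem y_j ~ [Xb, xs[:, i]] over a 2D grid (i, j).
    The cross products involving the shared base covariates Xb are computed
    once and reused, eliminating the redundant work a per-problem black-box
    solver would repeat for every grid point."""
    StS = Xb.T @ Xb          # shared Gram matrix, computed once
    StY = Xb.T @ ys          # shared cross products with all responses
    n_x, n_y, p = xs.shape[1], ys.shape[1], Xb.shape[1]
    betas = np.empty((n_x, n_y, p + 1))
    for i in range(n_x):
        x = xs[:, i:i + 1]
        # assemble the (p+1)x(p+1) normal equations for this grid row;
        # only the terms involving x are new work
        A = np.block([[StS, Xb.T @ x], [x.T @ Xb, x.T @ x]])
        B = np.vstack([StY, x.T @ ys])
        betas[i] = np.linalg.solve(A, B).T
    return betas
```

Each additional grid row costs only a few thin matrix-vector products rather than a full refit, which is the kind of redundancy elimination the abstract refers to.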
Learning-based system-level power modeling of hardware IPs
Accurate power models for hardware components at high levels of abstraction are a critical component to enable system-level power analysis and optimization. Virtual platform prototypes are widely utilized to support early system-level design space exploration. There is, however, a lack of accurate and fast power models of hardware components at such high levels of abstraction.
In this dissertation, we present novel learning-based approaches for extending fast functional simulation models of white-, gray-, and black-box custom hardware intellectual property components (IPs) with accurate power estimates. Depending on the observability, we extend high-level functional models with the capability to capture data-dependent resource, block, or I/O activity without a significant loss in simulation speed. We further leverage state-of-the-art machine learning techniques to synthesize abstract power models that can predict cycle-, block-, and invocation-level power from low-level hardware implementations, where we introduce novel structural decomposition techniques to reduce model complexities and increase estimation accuracy.
Our white-box approach integrates with existing high-level synthesis (HLS) tools to automatically extract resource mapping information, which is used to trace data-dependent resource-level activity and drive a cycle-accurate online power-performance model during functional simulation. Our gray-box approach supports power estimation at coarser basic-block granularity. It uses only limited information about block inputs and outputs to extract lightweight block-level activity from a functional simulation and drive a basic block-level power model that utilizes a control flow decomposition to improve accuracy and speed. It is faster than cycle-level models while providing a finer granularity than invocation-level models, which allows designers to further navigate accuracy and speed trade-offs. We finally propose a novel approach for extending behavioral models of black-box hardware IPs with an invocation-level power estimate. Our black-box model only uses input and output history to track data-dependent pipeline behavior, where we introduce a specialized ensemble learning scheme composed of individually selected cycle-by-cycle models with reduced complexity and increased accuracy. The proposed approaches are fully automated by integrating with existing commercial HLS tools for custom hardware synthesized by HLS. Results of applying our approaches to various industrial-strength design examples show that our power models can predict cycle-, basic block-, and invocation-level power consumption to within 10%, 9%, and 3% of a commercial gate-level power estimation tool, respectively, all while running at several orders of magnitude faster speeds of 1-10 Mcycles/sec.
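The black-box idea of tracking data-dependent behavior from input/output history alone can be made concrete with a minimal sketch. The Hamming-distance activity proxy and the single linear invocation-level model below are simplifying assumptions standing in for the dissertation's decomposed ensemble of cycle-by-cycle models.

```python
import numpy as np

def io_toggle_counts(words):
    """Bit flips between consecutive input/output words: a cheap
    data-dependent activity proxy observable even for a black-box IP."""
    flips = np.bitwise_xor(words[1:], words[:-1])
    return np.array([bin(int(v)).count("1") for v in flips], dtype=float)

def fit_invocation_power(words_per_invocation, measured_power):
    """Fit power ~ a * mean_toggle_rate + b across training invocations,
    where measured_power comes from a reference gate-level tool
    (ordinary least squares; purely illustrative)."""
    act = np.array([io_toggle_counts(w).mean() for w in words_per_invocation])
    A = np.column_stack([act, np.ones_like(act)])
    coef, *_ = np.linalg.lstsq(A, measured_power, rcond=None)
    return coef  # (a, b)
```

Once trained, predicting the power of a new invocation requires only its I/O trace, so the functional model's simulation speed is largely preserved.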
HL-Pow: A Learning-Based Power Modeling Framework for High-Level Synthesis
High-level synthesis (HLS) enables designers to customize hardware designs efficiently. However, it is still challenging to foresee the correlation between power consumption and HLS-based applications at an early design stage. To overcome this problem, we introduce HL-Pow, a power modeling framework for FPGA HLS based on state-of-the-art machine learning techniques. HL-Pow incorporates an automated feature construction flow to efficiently identify and extract features that exert a major influence on power consumption, simply based upon HLS results, and a modeling flow that can build an accurate and generic power model applicable to a variety of designs with HLS. By using HL-Pow, the power evaluation process for FPGA designs can be significantly expedited because the power inference of HL-Pow is established on HLS instead of the time-consuming register-transfer level (RTL) implementation flow. Experimental results demonstrate that HL-Pow can achieve accurate power modeling that is only 4.67% (24.02 mW) away from onboard power measurement. To further facilitate power-oriented optimizations, we describe a novel design space exploration (DSE) algorithm built on top of HL-Pow to trade off between latency and power consumption. This algorithm can reach a close approximation of the real Pareto frontier while only requiring running the HLS flow for 20% of design points in the entire design space.
Comment: published as a conference paper in ASP-DAC 202
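The latency/power Pareto frontier that the DSE algorithm approximates can be made concrete with a short sketch. This is not HL-Pow's algorithm (which avoids evaluating most design points); it is the exhaustive frontier computation that such a DSE tries to approximate cheaply, with both objectives minimized.

```python
def pareto_frontier(points):
    """points: iterable of (latency, power) pairs for evaluated designs.
    Returns the points not dominated by any other point, i.e. the
    latency/power trade-off curve, sorted by latency."""
    frontier = []
    for lat, pw in sorted(points):
        # after sorting by latency, a point is on the frontier iff its
        # power is strictly below every frontier point seen so far
        if not frontier or pw < frontier[-1][1]:
            frontier.append((lat, pw))
    return frontier
```

A DSE algorithm like the one described above is judged by how closely the frontier of its sampled subset matches the frontier of the full design space.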
Using similitude theory and discrete element modeling to understand the effects of digging parameters on excavation performance for rubber tire loaders
The large sizes of mining equipment pose challenges for analysis using experiments or simulation. While scaled physical and simulation models can address this challenge, no previous work has explored how similitude theory and modeling can provide valid analysis of large equipment such as rubber tire loaders. The objective of this research was to apply similitude theory and discrete element modeling (DEM) to study the effect of different digging parameters on the penetration of, and the draft on, the buckets of rubber tire loaders. The work sought to (1) test the hypothesis that the geometry of a rubber tire loader bucket and its operating conditions significantly affect the resistive force (draft) and penetration; (2) test the hypothesis that different geometry orientations and operating conditions of a rubber tire loader bucket significantly affect draft and penetration; (3) apply DEM to scale models of rubber tire loader buckets to understand the effect of bucket geometry, orientations, and operating conditions on draft and penetration; and (4) evaluate the effectiveness of using discrete element models and similitude theory to predict draft and penetration.
The results show that geometry, muckpile particle sizes, height above the floor, rake angle, speed, and motor power output are correlated with penetration and draft. This work has demonstrated that valid DEM models can be built for prediction at a larger scale. The chamfer angle of semi-spade bucket cutting blades significantly affects the draft on the buckets, and a 30° chamfer angle performs best, with the lowest peak resistive forces and energy consumption. The work finds that the forces observed during the rotation phase of the simulation are lower than those observed during penetration.
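For readers unfamiliar with similitude theory, a minimal sketch of how model-scale results map back to the full-size machine follows. The cube-law force scaling and square-root speed scaling come from dimensional analysis under gravity-dominated (Froude) similitude with the same granular material and gravity at both scales; these are textbook assumptions for illustration, not the dissertation's specific scaling relations.

```python
import math

def scale_to_prototype(model_draft_n, model_speed_ms, length_scale):
    """Map model-scale DEM measurements to prototype scale under
    gravity-dominated similitude (same material, same g -- assumed):
    force ~ rho * g * L^3, so draft scales with length_scale**3;
    Froude number invariance gives speed scaling with sqrt(length_scale)."""
    return {
        "draft_n": model_draft_n * length_scale ** 3,
        "speed_ms": model_speed_ms * math.sqrt(length_scale),
    }
```

For example, a 1:4 scale bucket model's draft would be multiplied by 64 to estimate the full-size draft under these assumptions.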
A survey on run-time power monitors at the edge
Effectively managing energy and power consumption is crucial to the success of any computing system design; it helps mitigate the efficiency obstacles introduced by the downsizing of systems and is a valuable step toward green and sustainable computing. The quality of energy and power management is strongly affected by the prompt availability of reliable and accurate information about the power consumption of the different parts composing the monitored system. Effective energy and power management is even more critical for devices at the edge, which have proliferated over the past decade with the digital revolution brought by the Internet of Things. This manuscript provides a comprehensive conceptual framework to classify the different approaches to implementing run-time power monitors for edge devices that have appeared in the literature, leading the reader toward the solutions that best fit their application needs and the requirements and constraints of their target computing platforms. Run-time power monitors at the edge are analyzed according to both power modeling and monitoring implementation aspects, identifying specific quality metrics for both in order to create a consistent and detailed taxonomy that encompasses the vast existing literature and provides a sound reference to the interested reader.