
446 research outputs found

CPU-GPU Layer-Switched Low Latency CNN Inference

Author: Aghapour E.
Pathania A.
Pimentel A.
Sapra D.
Publication venue: IEEE Computer Society
Publication date: 01/01/2022

Convolutional Neural Network (CNN) inference on Heterogeneous Multi-Processor System-on-Chips (HMPSoCs) in edge devices represents cutting-edge embedded machine learning. The embedded CPU and GPU within an HMPSoC can both perform inference using CNNs. However, common practice is to run a CNN on the HMPSoC component (CPU or GPU) that provides the best performance (lowest latency) for that CNN. CNNs are not monolithic and are composed of several layers of different types. Some of these layers have lower latency on the CPU, while others execute faster on the GPU. In this work, we investigate the reason behind this observation. We also propose an execution of CNNs that switches between CPU and GPU at the layer granularity, wherein a CNN layer executes on the component that provides it with the lowest latency. Switching back and forth between the CPU and the GPU mid-inference introduces additional overhead (delay) in the inference. Despite this overhead, we show in this work that CPU-GPU layer-switched execution results in, on average, 4.72% lower CNN inference latency on the Khadas VIM 3 board with the Amlogic A311D HMPSoC.
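The trade-off described in this abstract (per-layer latency differences versus switching overhead) can be illustrated with a small dynamic program. This is a minimal sketch under stated assumptions, not the authors' method: the function name `layer_switched_schedule`, the fixed per-switch cost `switch_ms`, and the latency numbers in the usage note are all hypothetical.

```python
def layer_switched_schedule(cpu_ms, gpu_ms, switch_ms):
    """Return (total_latency, per-layer device names) with minimal latency.

    cpu_ms[i] / gpu_ms[i] are assumed per-layer latencies; switch_ms is an
    assumed fixed cost paid whenever consecutive layers use different devices.
    """
    if len(cpu_ms) != len(gpu_ms):
        raise ValueError("need one CPU and one GPU latency per layer")
    n = len(cpu_ms)
    if n == 0:
        return 0, []
    lat = (cpu_ms, gpu_ms)          # device 0 = CPU, device 1 = GPU
    names = ("CPU", "GPU")
    best = [lat[0][0], lat[1][0]]   # best[d]: min latency ending layer 0 on d
    back = []                       # back[i-1][d]: device used for layer i-1
    for i in range(1, n):
        new_best, choice = [], []
        for d in (0, 1):
            stay = best[d]                      # previous layer on same device
            switch = best[1 - d] + switch_ms    # previous layer on other device
            if stay <= switch:
                new_best.append(stay + lat[d][i])
                choice.append(d)
            else:
                new_best.append(switch + lat[d][i])
                choice.append(1 - d)
        best = new_best
        back.append(choice)
    d = 0 if best[0] <= best[1] else 1
    total = best[d]
    plan = [d]
    for choice in reversed(back):   # walk back to recover the device per layer
        d = choice[d]
        plan.append(d)
    plan.reverse()
    return total, [names[x] for x in plan]
```

With the illustrative latencies `cpu_ms=[4, 1, 6, 2]`, `gpu_ms=[2, 5, 1, 3]` and `switch_ms=1`, the sketch returns a total of 9, against 13 for CPU-only and 11 for GPU-only execution. When the switch cost is made very large, it falls back to the single best device, which mirrors why the overhead matters.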

International Migration, Integration and Social Cohesion online publications

UvA-DARE

PELSI: Power-Efficient Layer-Switched Inference

Author: Aghapour E.
Pathania A.
Pimentel A.D.
Sapra D.
Publication venue: IEEE Computer Society
Publication date: 01/01/2023

Convolutional Neural Networks (CNNs) are now quintessential kernels within embedded computer vision applications deployed in edge devices. Heterogeneous Multi-Processor System-on-Chips (HMPSoCs) with Dynamic Voltage and Frequency Scaling (DVFS) capable components (CPUs and GPUs) allow for low-latency, low-power CNN inference on resource-constrained edge devices when employed efficiently. CNNs comprise several heterogeneous layer types that execute with different degrees of power efficiency on different HMPSoC components at different frequencies. We propose the first framework, PELSI, that exploits this layer-wise power efficiency heterogeneity for power-efficient CPU-GPU layer-switched CNN inference on HMPSoCs. PELSI executes each layer of a CNN on an HMPSoC component (CPU or GPU) clocked at just the right frequency for that layer, such that the CNN meets its inference latency target with minimal power consumption while still accounting for the power-performance overhead of multiple switches between CPU and GPU mid-inference. PELSI incorporates a Genetic Algorithm (GA) to identify the near-optimal CPU-GPU layer-switched CNN inference configuration, from within the exponentially large design space, that meets the given latency requirement most power-efficiently. We evaluate PELSI on the Rock-Pi embedded platform. The platform contains an RK3399Pro HMPSoC with DVFS-capable CPU clusters and GPU. Empirical evaluations with five different CNNs show a 44.48% improvement in power efficiency for CNN inference under PELSI over the state-of-the-art.
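To make the search problem in this abstract concrete, the following is a toy genetic algorithm over per-layer (device, frequency) choices. It is only a sketch under loud assumptions and is not PELSI itself: the names `FREQS`, `SWITCH_MS`, `SWITCH_MJ`, `evaluate`, `cost` and `genetic_search` are hypothetical, latency is assumed to scale as 1/f and power as f³, and the sketch minimizes energy under a latency target as a stand-in for the paper's power objective.

```python
import random

FREQS = (0.5, 0.75, 1.0)         # normalized DVFS levels (assumed)
SWITCH_MS, SWITCH_MJ = 1.0, 0.5  # assumed cost of one CPU<->GPU switch


def evaluate(genome, base_ms, peak_w):
    """Return (latency_ms, energy_mJ) for one per-layer configuration.

    genome[i] = (device, freq_index); device 0 = CPU, 1 = GPU.
    Toy model: latency scales as 1/f, power as f**3 (an assumption).
    """
    latency = energy = 0.0
    for i, (dev, fi) in enumerate(genome):
        f = FREQS[fi]
        t = base_ms[dev][i] / f
        latency += t
        energy += peak_w[dev] * f ** 3 * t      # W * ms = mJ
        if i > 0 and genome[i - 1][0] != dev:   # device changed: pay overhead
            latency += SWITCH_MS
            energy += SWITCH_MJ
    return latency, energy


def cost(genome, base_ms, peak_w, target_ms):
    """Energy if the latency target is met, else a large penalty."""
    latency, energy = evaluate(genome, base_ms, peak_w)
    if latency > target_ms:
        return 1e9 + latency    # any infeasible genome ranks below feasible ones
    return energy


def genetic_search(base_ms, peak_w, target_ms, seed_genome,
                   pop_size=30, generations=60, rng_seed=0):
    """Elitist GA; seed_genome should be a known feasible configuration."""
    rng = random.Random(rng_seed)
    n = len(seed_genome)        # needs n >= 2 for one-point crossover

    def rand_gene():
        return (rng.randrange(2), rng.randrange(len(FREQS)))

    def key(g):
        return cost(g, base_ms, peak_w, target_ms)

    population = [list(seed_genome)]
    population += [[rand_gene() for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=key)
        next_gen = population[:2]                        # elitism
        while len(next_gen) < pop_size:
            a = min(rng.sample(population, 3), key=key)  # tournament selection
            b = min(rng.sample(population, 3), key=key)
            cut = rng.randrange(1, n)                    # one-point crossover
            child = a[:cut] + b[cut:]
            child = [rand_gene() if rng.random() < 0.1 else g for g in child]
            next_gen.append(child)                       # mutated child
        population = next_gen
    return min(population, key=key)
```

Seeding the population with a feasible configuration (for example, every layer on one device at the highest frequency) together with elitism means the returned configuration is never worse than that baseline under this toy cost model. The design space has (2 × 3)^n points for n layers in this sketch, which is why an exhaustive search quickly becomes impractical.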

International Migration, Integration and Social Cohesion online publications

3D-TTP: Efficient Transient Temperature-Aware Power Budgeting for 3D-Stacked Processor-Memory Systems

Author: Niknam S.
Pathania A.
Pimentel A.D.
Shen Y.
Publication date: 01/01/2023

The heat produced during computation severely limits the performance of multi-/many-core processors. High-performance 3D-stacked processor-memory systems stack cores and main memory on a single die. However, 3D-stacked systems suffer more severe thermal issues than their non-stacked planar 2D counterparts. Consequently, the aggressive thermal throttling required for their thermally-safe operation limits the potential performance gains. Power budgeting is an effective thermal management technique that prevents thermal throttling in multi-/many-core processors by assigning a thermally-safe power budget to cores within the processors. State-of-the-art power budgeting techniques for 2D processors do not account for the vertical thermal coupling between the layers of the 3D-stacked system and will fail to prevent thermal throttling in them. Furthermore, estimating thermals for a 3D-stacked processor with power budgeting requires a finer-grained RC thermal model than non-stacked processors. This requirement inhibits the porting of existing power budgeting solutions for 2D processors to 3D-stacked processor-memory systems. This work is the first to present the linear algebra-based algorithmic time-invariant transformations required to enable power budgeting in 3D-stacked systems. Based on the transformations, we propose the first transient-temperature-aware power budgeting technique, 3D-TTP, for 3D-stacked systems. Detailed interval thermal simulations with the advanced CoMeT simulator designed for 3D-stacked systems also confirm no thermal violations with our 3D-TTP technique. 3D-TTP exhibits an average 11.41% speedup over the state-of-the-art reactive-based thermal management technique
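As background for the power-budgeting idea in this abstract, the sketch below computes a uniform per-core power budget from a steady-state thermal influence matrix. It is a deliberately simplified illustration and not the 3D-TTP technique, which is transient-temperature-aware; the function names `uniform_power_budget` and `steady_temperatures`, the influence matrix, and all numbers are assumptions made for the example.

```python
def steady_temperatures(influence, powers, t_ambient):
    """Steady-state temperature of each thermal node.

    influence[i][j] is the assumed temperature rise (K) at node i per watt
    dissipated by core j. Rows may include nodes in other layers (e.g. memory
    stacked above the cores), which is how vertical coupling enters the model.
    """
    return [t_ambient + sum(b * p for b, p in zip(row, powers))
            for row in influence]


def uniform_power_budget(influence, t_ambient, t_max):
    """Largest equal per-core power (W) keeping every node at or below t_max."""
    worst_rise_per_watt = max(sum(row) for row in influence)
    if worst_rise_per_watt <= 0:
        raise ValueError("influence matrix must have a positive row sum")
    return (t_max - t_ambient) / worst_rise_per_watt
```

For a hypothetical two-core system with `influence = [[2.0, 0.5], [0.5, 2.0]]`, an ambient temperature of 45 and a limit of 80, the budget is 35 / 2.5 = 14 W per core, and both nodes then sit exactly at the limit. A budget derived while ignoring rows for stacked layers would miss the hottest node, which is the failure mode the abstract attributes to 2D techniques.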

International Migration, Integration and Social Cohesion online publications

Thermal Management for S-NUCA Many-Cores via Synchronous Thread Rotations

Author: Niknam S.
Pathania A.
Pimentel A.D.
Shen Y.
Publication date: 01/01/2023

On-chip thermal management is quintessential to a thermally safe operation of a many-core processor. The presence of a physically distributed logically shared Last-Level Cache (LLC) significantly reduces the performance penalty of migrating threads within the cores of an S-NUCA many-core. This cost reduction allows novel thermal management of these many-cores via synchronous thread migration. Synchronous thread migration provides a viable alternative to Dynamic Voltage and Frequency Scaling (DVFS) and asynchronous thread migration used traditionally to manage thermals of S-NUCA many-cores. We present a theoretical method to compute the peak temperature in many-cores with synchronous thread migrations. We use the method to create a thermal management heuristic called HotPotato that maximizes the performance of S-NUCA many-cores under a peak temperature constraint. We implement HotPotato within the state-of-the-art HotSniper simulator. Detailed interval thermal simulations with HotSniper show an average 10.72% improvement in response time of S-NUCA many-cores when scheduling with HotPotato compared to a state-of-the-art thermal-aware S-NUCA scheduler.

International Migration, Integration and Social Cohesion online publications

Thermal Management for 3D-Stacked Systems via Unified Core-Memory Power Regulation

Author: Pathania A.
Pimentel A.D.
Schreuders L.
Shen Y.
Publication date: 01/10/2023

3D-stacked processor-memory systems stack memory (DRAM banks) directly on top of logic (CPU cores) using chiplet-on-chiplet packaging technology to provide the next-level computing performance in embedded platforms. Stacking, however, severely increases the system’s power density without any accompanying increase in the heat dissipation capacity. Consequently, 3D-stacked processor-memory systems suffer more severe thermal issues than their non-stacked counterparts. Nevertheless, 3D-stacked processor-memory systems do inherit power (thermal) management knobs from their non-stacked predecessors - namely Dynamic Voltage and Frequency Scaling (DVFS) for cores and Low Power Mode (LPM) for memory banks. In the context of 3D-stacked processor-memory systems, DVFS and LPM are performance- and power-wise deeply intertwined. Their non-unified independent use on 3D-stacked processor-memory systems results in sub-optimal thermal management. The unified use of DVFS and LPM for thermal management for 3D-stacked processor-memory systems remains unexplored. The lack of implementation of LPM in thermal simulators for 3D-stacked processor-memory systems hinders real-world representative evaluation for a unified approach. We extend the state-of-the-art interval thermal simulator for 3D-stacked processor-memory systems, CoMeT, with an LPM power management knob for memory banks. We also propose a learning-based thermal management technique for 3D-stacked processor-memory systems that employs DVFS and LPM in a unified manner. Detailed interval thermal simulations with the extended CoMeT framework show a 10.15% average response time improvement with the PARSEC and SPLASH-2 benchmark suites, along with widely-used Deep Neural Network (DNN) workloads, against a state-of-the-art thermal management technique for 2.5D processor-memory systems (ported directly to 3D-stacked processor-memory systems) that also proposes unified use of DVFS and LPM.

International Migration, Integration and Social Cohesion online publications

Estimating the Energy Consumption of Applications in the Computing Continuum with iFogSim

Author: Akesson B.
Baneshi S.
Pathania A.
Pimentel A.
Varbanescu A.-L.
Publication date: 01/01/2023

Digital services - applications that often span the entire computing continuum - have become an essential part of our daily lives, but they can have a significant energy cost, raising sustainability concerns. The computing continuum features multiple distributed layers (edge, fog, and cloud) with specific computing infrastructure and scheduling decisions at each layer, which impact the overall quality of service and energy consumption of digital services. Measuring the energy consumption of such applications is challenging due to the distributed nature of the system and the application. As such, simulation techniques are promising solutions to estimate energy consumption, and several simulators are available for modeling the cloud and fog computing environment. In this paper, we investigate iFogSim’s effectiveness in analyzing the end-to-end energy consumption of applications in the computing continuum through two case studies. We design different scenarios for each case study to map application modules to devices along the continuum, including the Edge-Cloud collaboration architecture, and compare them with the two placement policies native to iFogSim: Cloud-only and Edge-ward policies. We observe iFogSim’s limitations in reporting energy consumption, and improve its ability to report energy consumption from an application’s perspective; this enables additional insight into an application’s energy consumption, thus enhancing the usability of iFogSim in evaluating the end-to-end energy consumption of digital services.

International Migration, Integration and Social Cohesion online publications

Impact of health education on knowledge and practices about menstruation among adolescent school girls of rural part of district Ambala, Haryana

Author: Arora A
Bunger R
Mehta C
Mittal A
Pathania D
Singh J
Publication venue: MRI Publication Pvt. Ltd.
Publication date: 31/12/2013

Background: This study was undertaken to assess the impact of health education on knowledge regarding menstruation and misconceptions related to it, as the prevalence of RTI is still very high in India. Aims: To study the existing level of hygiene, knowledge and practices regarding menstruation among adolescent school girls and to assess the change in their knowledge level and practices after health education. Materials and Methods: A community-based pre- and post-interventional study was conducted among 200 adolescent girls of classes IX and X in a rural part of district Ambala. A multistage random sampling technique was used to draw the representative sample. A pre-tested questionnaire was administered, and later health education regarding menstruation and healthy menstrual practices was imparted to the girls. A post-test was done after 3 months to assess the impact of health education. Pre- and post-intervention data were compared using the paired t test, the z test for proportions, and the chi-squared test for paired proportions. The difference between proportions of the pre-post data and its 95% confidence interval were calculated. SPSS for Windows software version 20 (IBM, Chicago, USA) was used for data analysis. The level of significance was set at p value < 0.05. Results: In the pre-test, menstrual perceptions amongst them were found to be poor and practices incorrect, while in the post-test there was a significant difference in the level of knowledge (P<0.05). There was no significant difference between pre- and post-test with regard to restrictions followed during menses (P>0.05), while in the post-test following health education, significant improvements were observed in their practices. Conclusion: Overall, significant improvement was found in knowledge and practices regarding menstruation among adolescent school girls.

Indian Journal of Community Health

CoMeT: An Integrated Interval Thermal Simulation Toolchain for 2D, 2.5D, and 3D Processor-Memory Systems

Author: Henkel J.
Kedia R.
Panda P.R.
Pandey S.
Pathania A.
Rapp M.
Siddhu L.
Publication date: 16/03/2022

Processing cores and the accompanying main memory working in tandem enable modern processors. Dissipating the heat produced by computation and memory access remains a significant problem for processors. Therefore, processor thermal management continues to be an active research topic. Most thermal management research takes place using simulations, given the challenges of measuring temperature in real processors. Since core and memory are fabricated on separate packages in most existing processors, with the memory having lower power densities, thermal management research in processors has primarily focused on the cores. Memory bandwidth limitations associated with 2D processors lead to high-density 2.5D and 3D packaging technology. 2.5D packaging places cores and memory on the same package. 3D packaging technology takes it further by stacking layers of memory on top of the cores themselves. Such packaging significantly increases the power density, making processors prone to heating. Therefore, mitigating thermal issues in high-density processors (packaged with stacked memory) becomes an even more pressing problem. However, given the lack of thermal modeling for memories in existing interval thermal simulation toolchains, they are unsuitable for studying thermal management for high-density processors. To address this issue, we present CoMeT, the first integrated Core and Memory interval Thermal simulation toolchain. CoMeT comprehensively supports thermal simulation of high- and low-density processors corresponding to four different core-memory configurations - off-chip DDR memory, off-chip 3D memory, 2.5D, and 3D. CoMeT supports several novel features that facilitate overlying system research. Compared to an equivalent state-of-the-art core-only toolchain, CoMeT adds only a ~5% simulation-time overhead. The source code of CoMeT has been made open for public use under the MIT license. Comment: https://github.com/marg-tools/CoMe

arXiv.org e-Print Archive

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Performance, Power and Cooling Trade-Offs with NCFET-based Many-Cores

Author: Henkel J.
Hoffmann M.
Krivokapic Z.
Landau L. D.
Pathania A.
Samal S. K.
Shin D.
Publication venue: Association for Computing Machinery
Publication date: 01/01/2019

Negative Capacitance Field-Effect Transistor (NCFET) is an emerging technology that incorporates a ferroelectric layer within the transistor gate stack to overcome the fundamental limit of sub-threshold swing in transistors. Even though physics-based NCFET models have been recently proposed, system-level NCFET models do not exist and research is still in its infancy. In this work, we are the first to investigate the impact of NCFET on performance, energy and cooling costs in many-core processors. Our proposed methodology starts from accurate physics models all the way up to the system level, where the performance and power of a many-core are widely affected. Our new methodology and system-level models allow, for the first time, the exploration of the novel trade-offs between performance gains and power losses that NCFET now offers to system-level designers. We demonstrate that an optimal ferroelectric thickness does exist. In addition, we reveal that current state-of-the-art power management techniques fail when NCFET (with a thick ferroelectric layer) comes into play

Crossref

KITopen

Which individual, social, and urban factors in early childhood predict psychopathology in later childhood, adolescence and young adulthood? A systematic review

Author: Bennett K.F.
Bockting C.L
Breedvelt J.J.F.
Brouwer M.E.
Franzoi D.
Lee A.
Lucassen P.J.
Odom A.
Pathania A.
van de Schoot R.
Wiers R.W.
Publication date: 01/03/2024

Background: A comprehensive picture is lacking of the impact of early childhood (age 0–5) risk factors on the subsequent development of mental health symptoms. Objective: In this systematic review, we investigated which individual, social and urban factors, experienced in early childhood, contribute to the development of later anxiety and depression, behavioural problems, and internalising and externalising symptoms in youth. Methods: Embase, MEDLINE, Scopus, and PsycInfo were searched on the 5th of January 2022. Three additional databases were retrieved from a mega-systematic review source that focused on the identification of both risk and protective indicators for the onset and maintenance of prospective depressive, anxiety and substance use disorders. A total of 46,450 records were identified and screened in ASReview, an AI-aided systematic review tool. We included studies with experimental, quasi-experimental, prospective and longitudinal study designs, while studies that focused on biological and genetic factors were excluded. Results: Twenty studies were included. The majority of studies explored individual-level risk factors (N = 16). Eleven studies also explored social risk factors and three studied urban risk factors. We found evidence for early predictors relating to later psychopathology measures (i.e., anxiety and depression, behavioural problems, and internalising and externalising symptoms) in childhood, adolescence and early adulthood. These were: parental psychopathology, exposure to parental physical and verbal violence, and social and neighbourhood disadvantage. Conclusions: Very young children are exposed to a complex mix of risk factors, which operate at different levels and influence children at different time points. The urban environment appears to have an effect on psychopathology but it is understudied compared to individual-level factors. Moreover, we need more research exploring the interaction between individual, social and urban factors.

International Migration, Integration and Social Cohesion online publications

UvA-DARE