Search CORE

32 research outputs found

SACR: Scheduling-Aware Cache Reconfiguration for Real-Time Embedded Systems

Author: Ann Gordon-Ross
Prabhat Mishra
Weixun Wang
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2009

Dynamic reconfiguration techniques are widely used for efficient system optimization. Dynamic cache reconfiguration is a promising approach for reducing energy consumption as well as for improving overall system performance. It is a major challenge to introduce cache reconfiguration into real-time embedded systems since dynamic analysis may adversely affect tasks with real-time constraints. This paper presents a novel approach for implementing cache reconfiguration in soft real-time systems by efficiently leveraging static analysis during execution to both minimize energy and maximize performance. To the best of our knowledge, this is the first attempt to integrate dynamic cache reconfiguration in real-time scheduling techniques. Our experimental results using a wide variety of applications have demonstrated that our approach can significantly (up to 74%) reduce the overall energy consumption of the cache hierarchy in soft real-time systems.

CiteSeerX

Crossref

A survey on hardware-based malware detection approaches

Author: Di Carlo Stefano
Pegoraro Chenet Cristiano
Savino Alessandro
Publication venue: IEEE
Publication date: 01/01/2024

This paper delves into the dynamic landscape of computer security, where malware poses a paramount threat. Our focus is a riveting exploration of the recent and promising hardware-based malware detection approaches. Leveraging hardware performance counters and machine learning prowess, hardware-based malware detection approaches bring forth compelling advantages such as real-time detection, resilience to code variations, minimal performance overhead, protection disablement fortitude, and cost-effectiveness. Navigating through a generic hardware-based detection framework, we meticulously analyze the approach, unraveling the most common methods, algorithms, tools, and datasets that shape its contours. This survey is not only a resource for seasoned experts but also an inviting starting point for those venturing into the field of malware detection. However, challenges emerge in detecting malware based on hardware events. We struggle with the imperative of accuracy improvements and strategies to address the remaining classification errors. The discussion extends to crafting mixed hardware and software approaches for collaborative efficacy, essential enhancements in hardware monitoring units, and a better understanding of the correlation between hardware events and malware applications.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Malicious loop detection using support vector machine

Author: Adda Mo
Allaf Zirak
Gegov Alexander
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 29/07/2019

Crossref

Portsmouth University Research Portal (Pure)

Cache Sharing Administration for Performance Fairness using D3C Miss Classification in Chip Multi-Processors

Author: Carballal Claudio A.
Cernuschi-Frías Bruno
Hamkalo José Luis
Publication date: 04/10/2021

This work presents a study of fairness in cache sharing between processes in a chip multiprocessor (CMP). We propose a new algorithm that uses a metric based on the D3C miss classification and LRU stack distance to measure fairness in the administration of resources, with the aim of increasing the global IPC of all executed processes. Shared cache miss rate, IPC, and bandwidth metrics were considered to analyze the simulation results obtained using three test sets. The results showed that the proposed dynamic management policy, compared to the Capitalist management policy, has a lower global miss rate in the shared cache and lower bandwidth usage for each test set studied, and fulfills its objective of managing the shared cache space for every process while improving the overall IPC.

Sociedad Argentina de Informática e Investigación Operativ

Servicio de Difusión de la Creación Intelectual
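
The abstract above relies on LRU stack distance as one ingredient of its fairness metric. As a rough illustration only, the following Python sketch computes the textbook LRU stack distance for a trace of block addresses and uses it to count misses for a given cache capacity. It does not reproduce the D3C miss classification or the authors' algorithm; the function names `lru_stack_distances` and `misses_for_ways` and the toy trace are assumptions made for this example.

```python
def lru_stack_distances(trace):
    """Return the LRU stack distance of each access.

    The distance is the number of distinct blocks touched since the
    previous access to the same block; first-time accesses get None.
    """
    stack = []       # least recently used first, most recently used last
    distances = []
    for block in trace:
        if block in stack:
            pos = stack.index(block)
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(None)  # cold miss: block never seen before
        stack.append(block)         # block becomes most recently used
    return distances


def misses_for_ways(distances, ways):
    """Count misses in a fully associative LRU cache holding `ways` blocks.

    An access hits only if its stack distance is smaller than the capacity.
    """
    return sum(1 for d in distances if d is None or d >= ways)


if __name__ == "__main__":
    trace = ["A", "B", "C", "A", "B", "B"]  # hypothetical block trace
    dists = lru_stack_distances(trace)
    print(dists)
    print(misses_for_ways(dists, 2), misses_for_ways(dists, 3))
```

Because a single pass yields the miss count for every capacity, a profile of this kind is a common starting point for deciding how much shared cache space each process should receive.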


Analysis of Super Fine-Grained Program Phases

Author: Bui Van
Kim Martha Allen
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2017

Dynamic reconfiguration systems guided by coarse-grained program phases have found success in improving overall program performance and energy efficiency. These performance and energy savings are limited by the granularity at which program phases are detected, since phases that occur at a finer granularity go undetected and reconfiguration opportunities are missed. In this study, we detect program phases using interval sizes on the order of tens, hundreds, and thousands of program cycles. This is in stark contrast with prior phase detection studies, where the interval size is on the order of several thousands to millions of cycles. The primary goal of this study is to begin to fill a gap in the literature on phase detection by characterizing super fine-grained program phases and demonstrating an application where detection of these relatively short-lived phases can be instrumental. Traditional models for phase detection, including basic block vectors and working set signatures, are used to detect super fine-grained phases, as well as a less traditional model based on microprocessor activity. Finally, we show an analytical case study where super fine-grained phases are applied to voltage and frequency scaling optimizations.

Columbia University Academic Commons
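
The abstract above names basic block vectors as one of the traditional phase detection models. A minimal Python sketch of that general idea follows: the trace is cut into fixed-size intervals, each interval is summarized as a normalized basic block vector, and a phase change is flagged when the Manhattan distance between consecutive vectors exceeds a threshold. This is not the study's implementation. The interval is measured in executed basic blocks rather than cycles, and the names `detect_phase_changes`, `interval_size`, and `threshold` are assumptions made for this example.

```python
from collections import Counter


def basic_block_vector(interval):
    """Normalized execution frequency of each basic block ID in an interval."""
    counts = Counter(interval)
    total = sum(counts.values())
    return {block: n / total for block, n in counts.items()}


def manhattan_distance(u, v):
    """Sum of absolute differences between two sparse vectors."""
    return sum(abs(u.get(k, 0.0) - v.get(k, 0.0)) for k in set(u) | set(v))


def detect_phase_changes(block_trace, interval_size, threshold):
    """Return start offsets of intervals that differ from their predecessor."""
    changes = []
    previous = None
    for start in range(0, len(block_trace), interval_size):
        current = basic_block_vector(block_trace[start:start + interval_size])
        if previous is not None and manhattan_distance(previous, current) > threshold:
            changes.append(start)
        previous = current
    return changes


if __name__ == "__main__":
    # Hypothetical trace: blocks 1 and 2 dominate first, then blocks 7 and 8.
    trace = [1, 2] * 4 + [7, 8] * 4
    print(detect_phase_changes(trace, interval_size=4, threshold=1.0))
```

Shrinking `interval_size` is what makes detection finer grained, at the cost of noisier vectors, so the threshold usually has to be tuned together with the interval.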

Efficient and scalable scheduling for performance heterogeneous multicore systems

Author: Pengcheng Nie
Zhenhua Duan
Publication date: 01/01/2012

Performance heterogeneous multicore processors (HMP for brevity), consisting of multiple cores with the same instruction set but different performance characteristics (e.g., clock speed, issue width), are of great interest since they are able to deliver higher performance per watt and area for programs with diverse architectural requirements than comparable homogeneous ones. However, such power and area efficiencies of performance heterogeneous multicore systems can only be achieved when workloads are matched with cores according to both the properties of the workload and the features of the cores. Several heterogeneity-aware schedulers were proposed in previous work. In terms of whether properties of workloads are obtained online or not, those scheduling algorithms can be categorized into two classes: online monitoring and offline profiling. The previous online monitoring approaches had to trace threads' execution on all core types, which is impractical as the number of core types grows. Besides, to trace all core types, threads have to be migrated among cores, which may cause load imbalance and degrade performance. The existing offline profiling approaches profile programs with a given input set before actually executing them and thus remove the overhead associated with the number of core types. However, offline profiling approaches do not account for phase changes of threads. Moreover, since the properties they have collected are based on the given input set, those offline profiling approaches are hard to adapt to various input sets and therefore will drastically affect program performance. To address the above problems in the existing approaches, we propose a new technique, ASTPI (Average Stall Time Per Instruction), to measure the efficiencies of threads in using fast cores. We design, implement and evaluate a new online monitoring approach called ESHMP, which is based on the metric. Our evaluation in the Linux 2.6.21 operating system shows that ESHMP delivers scalability while adapting to a wide variety of applications. Also, our experimental results show that among HMP systems in which heterogeneity-aware schedulers are adopted and there is more than one LLC (Last Level Cache), the architecture where heterogeneous cores share LLCs gains better performance than the ones where homogeneous cores share LLCs.

CiteSeerX
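
The abstract above introduces ASTPI (Average Stall Time Per Instruction) as the metric behind the ESHMP scheduler but does not spell out how it is computed or applied. The Python sketch below is therefore only one plausible reading: ASTPI is taken as total stall time divided by retired instructions, and threads with the lowest ASTPI are assumed to make the best use of fast cores. Both the formula details and the ranking rule are assumptions, as are the names `astpi` and `pick_fast_core_threads` and the sample numbers.

```python
def astpi(stall_time_ns, instructions_retired):
    """Average stall time per instruction, in nanoseconds (assumed definition)."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return stall_time_ns / instructions_retired


def pick_fast_core_threads(samples, fast_core_count):
    """Return the thread names that would be placed on fast cores.

    `samples` maps a thread name to (stall_time_ns, instructions_retired).
    Assumption: a thread that stalls less per instruction benefits more
    from a fast core, so threads are ranked by ascending ASTPI.
    """
    ranked = sorted(samples, key=lambda name: astpi(*samples[name]))
    return ranked[:fast_core_count]


if __name__ == "__main__":
    # Hypothetical per-thread counter readings from one sampling window.
    samples = {
        "compute": (100.0, 1000),
        "memory": (900.0, 1000),
        "mixed": (400.0, 1000),
    }
    print(pick_fast_core_threads(samples, fast_core_count=2))
```

A metric of this shape can be sampled on whichever core a thread currently occupies, which fits the abstract's point that threads should not have to be traced on every core type.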

Power, Performance, and Energy Management of Heterogeneous Architectures

Publication date: 01/01/2019

Many-core modern multiprocessor systems-on-chip offer tremendous power and performance optimization opportunities by tuning thousands of potential voltage, frequency, and core configurations. Applications running on these architectures are becoming increasingly complex. As the basic building blocks that make up the application change during runtime, different configurations may become optimal with respect to power, performance, or other metrics. Identifying the optimal configuration at runtime is a daunting task due to the large number of workloads and configurations. Therefore, there is a strong need to evaluate the metrics of interest as a function of the supported configurations. This thesis focuses on two different types of modern multiprocessor systems-on-chip (SoC): mobile heterogeneous systems and the tile-based Intel Xeon Phi architecture. For mobile heterogeneous systems, this thesis presents a novel methodology that can accurately instrument different types of applications with specific performance monitoring calls. These calls provide a rich set of performance statistics at the basic block level while the application runs on the target platform. The target architecture used for this work (Odroid XU3) is capable of running at 4940 different frequency and core combinations. With the help of the instrumented applications, a vast amount of characterization data is collected that provides details about performance, power, and CPU state at every instrumented basic block across 19 different types of applications. The vast amount of data collected has enabled two runtime schemes. The first work provides a methodology to find optimal configurations in a heterogeneous architecture using classifiers and demonstrates an average increase of 93%, 81% and 6% in performance per watt (PPW) compared to the interactive, ondemand and powersave governors, respectively. The second work, using the same data, presents a novel imitation learning framework for dynamically controlling the type, number, and frequencies of active cores to achieve an average of 109% PPW improvement compared to the default governors. This work also presents how to accurately profile the tile-based Intel Xeon Phi architecture while training different types of neural networks using an open image dataset on a deep learning framework. The data collected allows deep exploratory analysis. It also showcases how different hardware parameters affect the performance of the Xeon Phi.

Dissertation/Thesis, Masters Thesis, Engineering 201

ASU Digital Repository

On the feasibility of online malware detection with performance counters

Author: Adam Waksman
Adrian Tang
Jared Schmitz
John Demme
Matthew Maycock
Salvatore Stolfo
Simha Sethumadhavan
Stone-Gross B.
Xia Y.
Publication venue: Association for Computing Machinery (ACM)

Crossref