2,019 research outputs found

    A Survey of Prediction and Classification Techniques in Multicore Processor Systems

    Get PDF
    In multicore processor systems, being able to accurately predict the future provides new optimization opportunities, which otherwise could not be exploited. For example, an oracle able to predict a certain application\u27s behavior running on a smart phone could direct the power manager to switch to appropriate dynamic voltage and frequency scaling modes that would guarantee minimum levels of desired performance while saving energy consumption and thereby prolonging battery life. Using predictions enables systems to become proactive rather than continue to operate in a reactive manner. This prediction-based proactive approach has become increasingly popular in the design and optimization of integrated circuits and of multicore processor systems. Prediction transforms from simple forecasting to sophisticated machine learning based prediction and classification that learns from existing data, employs data mining, and predicts future behavior. This can be exploited by novel optimization techniques that can span across all layers of the computing stack. In this survey paper, we present a discussion of the most popular techniques on prediction and classification in the general context of computing systems with emphasis on multicore processors. The paper is far from comprehensive, but, it will help the reader interested in employing prediction in optimization of multicore processor systems

    Exploring performance and power properties of modern multicore chips via simple machine models

    Full text link
    Modern multicore chips show complex behavior with respect to performance and power. Starting with the Intel Sandy Bridge processor, it has become possible to directly measure the power dissipation of a CPU chip and correlate this data with the performance properties of the running code. Going beyond a simple bottleneck analysis, we employ the recently published Execution-Cache-Memory (ECM) model to describe the single- and multi-core performance of streaming kernels. The model refines the well-known roofline model, since it can predict the scaling and the saturation behavior of bandwidth-limited loop kernels on a multicore chip. The saturation point is especially relevant for considerations of energy consumption. From power dissipation measurements of benchmark programs with vastly different requirements to the hardware, we derive a simple, phenomenological power model for the Sandy Bridge processor. Together with the ECM model, we are able to explain many peculiarities in the performance and power behavior of multicore processors, and derive guidelines for energy-efficient execution of parallel programs. Finally, we show that the ECM and power models can be successfully used to describe the scaling and power behavior of a lattice-Boltzmann flow solver code.Comment: 23 pages, 10 figures. Typos corrected, DOI adde

    Dynamic cache reconfiguration based techniques for improving cache energy efficiency

    Get PDF
    Modern multicore processors are employing large last-level caches, for example Intel's E7-8800 processor uses 24MB L3 cache. Further, with each CMOS technology generation, leakage energy has been dramatically increasing and hence, leakage energy is expected to become a major source of energy dissipation, especially in last-level caches (LLCs). The conventional schemes of cache energy saving either aim at saving dynamic energy or are based on properties specific to first-level caches, and thus these schemes have limited utility for last-level caches. Further, several other techniques require offline profiling or per-application tuning and hence are not suitable for product systems. In this research, we propose novel cache leakage energy saving schemes for single-core and multicore systems; desktop, QoS, real-time and server systems. We propose software-controlled, hardware-assisted techniques which use dynamic cache reconfiguration to configure the cache to the most energy efficient configuration while keeping the performance loss bounded. To profile and test a large number of potential configurations, we utilize low-overhead, micro-architecture components, which can be easily integrated into modern processor chips. We adopt a system-wide approach to save energy to ensure that cache reconfiguration does not increase energy consumption of other components of the processor. We have compared our techniques with the state-of-art techniques and have found that our techniques outperform them in their energy efficiency. This research has important applications in improving energy-efficiency of higher-end embedded, desktop, server processors and multitasking systems. We have also proposed performance estimation approach for efficient design space exploration and have implemented time-sampling based simulation acceleration approach for full-system architectural simulators.Comment: PhD thesis, dynamic cache reconfiguratio

    Study, analysis and new scheduling proposals in partitioned real-time systems

    Full text link
    [ES] En nuestra vida cotidiana, cada vez más ordenadores controlan nuestro entorno: teléfonos móviles, procesos industriales, asistencia a la conducción, etc. Todos estos sistemas presentan requisitos estrictos para garantizar un comportamiento adecuado. En muchos de estos sistemas, cumplir con las restricciones de tiempo es un factor tan importante como el resultado lógico de los cálculos. Desde hace aproximadamente 40 años, los sistemas en tiempo real son muy atractivos en el campo de la computación y hoy en día se aplican en áreas de gran alcance como aplicaciones industriales, aplicaciones aeroespaciales, telecomunicaciones, electrónica de consumo, etc. Algunos retos a abordar en el campo del tiempo real son el determinismo y la predecibilidad del comportamiento temporal del sistema. En este sentido, garantizar la ejecución del programa y los tiempos de respuesta del sistema son requisitos esenciales que deben cumplirse estrictamente a través de estrategias apropiadas de planificación de tareas. Además, las arquitecturas multiprocesador se están volviendo más populares debido al hecho de que las capacidades de procesamiento y los recursos computacionales de los sistemas están aumentando. Un estudio reciente estima que existe una tendencia creciente entre las arquitecturas multiprocesador a combinar diferentes niveles de criticidad en el mismo sistema. En este sentido, proporcionar aislamiento entre las aplicaciones es extremadamente necesario. La tecnología particionada es capaz de lidiar con este propósito. Además, la gestión de la energía es un problema relevante en los sistemas en tiempo real. Muchos sistemas empotrados de tiempo real, como dispositivos portátiles o robots móviles que requieren baterías, buscan encontrar técnicas que reduzcan el consumo de energía y, como consecuencia, aumenten la vida útil de sus baterías. También se obtienen claros beneficios operativos, financieros, monetarios y ambientales al minimizar el consumo de energía. Con todo ello, este trabajo aborda el problema de planificabilidad y contribuye al estudio de las nuevas técnicas de planificación en sistemas particionados de tiempo real. Estas técnicas proporcionan el tiempo mínimo para planificar de manera factible conjuntos de tareas. Además, se proponen técnicas de asignación para sistemas multiprocesador cuyo objetivo principal es reducir el consumo de energía del sistema global. Finalmente, se presentan los resultados obtenidos así como los trabajos futuros relacionados con este trabajo[CA] En la nostra vida quotidiana, cada vegada més ordenadors controlen el nostre entorn: telèfons mòbils, processos industrials, assistència a la conducció, etc. Tots aquests sistemes presenten requisits estrictes per a garantir un comportament adequat. En molts d' aquests sistemes, complir amb les restriccions de temps és un factor tan important com el resultat lògic dels càlculs. Des de fa aproximadament 40 anys, els sistemes en temps real són molt atractius en el camp de la computació i hui dia s' apliquen en àrees de gran abast com a aplicacions industrials, aplicacions aeroespacials, telecomunicacions, electrònica de consum, etc. Alguns reptes a abordar en el camp del temps real són el determinisme i la predictibilitat del comportament temporal del sistema. En aquest sentit, garantir l'execució del programa i els temps de resposta del sistema són requisits essencials que han de complir-se estrictament a través d'estratègies apropiades de planificació de tasques. A més, les arquitectures multiprocessador s'estan tornant més populars a causa del fet que les capacitats de processament i els recursos computacionals dels sistemes estan augmentant. Un estudi recent estima que existeix una tendència creixent entre les arquitectures multiprocessador a combinar diferents nivells de criticitat en el mateix sistema. En aquest sentit, proporcionar aïllament entre les aplicacions és extremadament necessari. La tecnologia particionada és capaç de bregar amb aquest propòsit. A més, la gestió de l'energia és un problema rellevant en els sistemes en temps real. Molts sistemes embebits de temps real, com a dispositius portàtils o robots mòbils que requereixen bateries, busquen trobar tècniques que reduïsquen el consum d'energia i, com a conseqüència, augmenten la vida útil de les seues bateries. També s'obtenen clars beneficis operatius, financers, monetaris i ambientals en minimitzar el consum d'energia. Amb tot això, aquest treball aborda el problema de planificabilitat i contribueix a l'estudi de les noves tècniques de planificació en sistemes particionats de temps real. Aquestes tècniques proporcionen el temps mínim per a planificar de manera factible conjunts de tasques. A més, es proposen tècniques d'assignació per a sistemes multiprocessador l'objectiu principal del qual és reduir el consum d'energia del sistema global. Finalment, es presenten els resultats obtinguts així com els treballs futurs relacionats amb aquest treball.[EN] In our everyday lives, more and more computers are controlling our environment: mobile phones, industrial processes, driving assistance, etc. All these systems present strict requirements to ensure proper behaviour. In many of these systems, the time at which the action is delivered is as important as the logical result of the computation. About 40 years ago, real-time systems began to attract attention in computing field and nowadays are applied in wide ranging areas as industrial applications, aerospace, telecommunication applications, consumer electronics, etc. Some real-time challenges that must be addressed are determinism and predictability of the temporal behaviour of the system. In this sense, to guarantee program execution and system response times are essential requirements that must be strictly met through appropriate task scheduling strategies. Furthermore, multiprocessor architectures are becoming more popular due to the fact that processing capabilities and computational resources are increasing. A recent study estimates that there is an increasing tendency among multiprocessor architectures to combine different levels of criticality in the same system. In this sense, to provide isolation between applications is extremely required. Partitioned technology is able to deal with this purpose. In addition, energy management is a relevant problem in real-time systems. Many real-time embedded systems, as wearable devices or mobile robots that require batteries, seek to find techniques that reduce the energy consumption and, as a consequence, increase the lifetime of their batteries. Also clear operational, financial, monetary and environmental gains are reached when minimizing energy consumption. Faced with all this, this work addresses the problem of schedulability and contributes to the study of new scheduling techniques in partitioned real-time systems. These techniques provide the minimum time to feasible schedule tasks sets. Moreover, allocation techniques for multicore systems whose main objective is to reduce the energy consumption of the overall system are also proposed. Finally, some of the obtained results are discussed as conclusions and future works are introduced.Guasque Ortega, A. (2019). Study, analysis and new scheduling proposals in partitioned real-time systems [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/135279TESI

    Smart Sensor Architectures for Multimedia Sensing in IoMT

    Full text link
    [EN] Today, a wide range of developments and paradigms require the use of embedded systems characterized by restrictions on their computing capacity, consumption, cost, and network connection. The evolution of the Internet of Things (IoT) towards Industrial IoT (IIoT) or the Internet of Multimedia Things (IoMT), its impact within the 4.0 industry, the evolution of cloud computing towards edge or fog computing, also called near-sensor computing, or the increase in the use of embedded vision, are current examples of this trend. One of the most common methods of reducing energy consumption is the use of processor frequency scaling, based on a particular policy. The algorithms to define this policy are intended to obtain good responses to the workloads that occur in smarthphones. There has been no study that allows a correct definition of these algorithms for workloads such as those expected in the above scenarios. This paper presents a method to determine the operating parameters of the dynamic governor algorithm called Interactive, which offers significant improvements in power consumption, without reducing the performance of the application. These improvements depend on the load that the system has to support, so the results are evaluated against three different loads, from higher to lower, showing improvements ranging from 62% to 26%.This work has been supported by the MCyU (Spanish Ministry of Science and Universities) under the project ATLAS (PGC2018-094151-B-I00), which is partially funded by AEI, FEDER and EU.Silvestre-Blanes, J.; Sempere Paya, VM.; Albero Albero, T. (2020). Smart Sensor Architectures for Multimedia Sensing in IoMT. Sensors. 20(5):1-16. https://doi.org/10.3390/s20051400S116205Bangemann, T., Riedl, M., Thron, M., & Diedrich, C. (2016). Integration of Classical Components Into Industrial Cyber–Physical Systems. Proceedings of the IEEE, 104(5), 947-959. doi:10.1109/jproc.2015.2510981Wollschlaeger, M., Sauter, T., & Jasperneite, J. (2017). The Future of Industrial Communication: Automation Networks in the Era of the Internet of Things and Industry 4.0. IEEE Industrial Electronics Magazine, 11(1), 17-27. doi:10.1109/mie.2017.2649104Salehi, M., & Ejlali, A. (2015). A Hardware Platform for Evaluating Low-Energy Multiprocessor Embedded Systems Based on COTS Devices. IEEE Transactions on Industrial Electronics, 62(2), 1262-1269. doi:10.1109/tie.2014.2352215Alvi, S. A., Afzal, B., Shah, G. A., Atzori, L., & Mahmood, W. (2015). Internet of multimedia things: Vision and challenges. Ad Hoc Networks, 33, 87-111. doi:10.1016/j.adhoc.2015.04.006Jridi, M., Chapel, T., Dorez, V., Le Bougeant, G., & Le Botlan, A. (2018). SoC-Based Edge Computing Gateway in the Context of the Internet of Multimedia Things: Experimental Platform. Journal of Low Power Electronics and Applications, 8(1), 1. doi:10.3390/jlpea8010001Memos, V. A., Psannis, K. E., Ishibashi, Y., Kim, B.-G., & Gupta, B. B. (2018). An Efficient Algorithm for Media-based Surveillance System (EAMSuS) in IoT Smart City Framework. Future Generation Computer Systems, 83, 619-628. doi:10.1016/j.future.2017.04.039Chianese, A., Piccialli, F., & Riccio, G. (2015). Designing a Smart Multisensor Framework Based on Beaglebone Black Board. Lecture Notes in Electrical Engineering, 391-397. doi:10.1007/978-3-662-45402-2_60Wang, W., Wang, Q., & Sohraby, K. (2016). Multimedia Sensing as a Service (MSaaS): Exploring Resource Saving Potentials of at Cloud-Edge IoTs and Fogs. IEEE Internet of Things Journal, 1-1. doi:10.1109/jiot.2016.2578722Munir, A., Gordon-Ross, A., & Ranka, S. (2014). Multi-Core Embedded Wireless Sensor Networks: Architecture and Applications. IEEE Transactions on Parallel and Distributed Systems, 25(6), 1553-1562. doi:10.1109/tpds.2013.219Baali, H., Djelouat, H., Amira, A., & Bensaali, F. (2018). Empowering Technology Enabled Care Using IoT and Smart Devices: A Review. IEEE Sensors Journal, 18(5), 1790-1809. doi:10.1109/jsen.2017.2786301Kim, Y. G., Kong, J., & Chung, S. W. (2018). A Survey on Recent OS-Level Energy Management Techniques for Mobile Processing Units. IEEE Transactions on Parallel and Distributed Systems, 29(10), 2388-2401. doi:10.1109/tpds.2018.2822683Chaib Draa, I., Niar, S., Tayeb, J., Grislin, E., & Desertot, M. (2016). Sensing user context and habits for run-time energy optimization. EURASIP Journal on Embedded Systems, 2017(1). doi:10.1186/s13639-016-0036-8Chen, Y.-L., Chang, M.-F., Yu, C.-W., Chen, X.-Z., & Liang, W.-Y. (2018). Learning-Directed Dynamic Voltage and Frequency Scaling Scheme with Adjustable Performance for Single-Core and Multi-Core Embedded and Mobile Systems. Sensors, 18(9), 3068. doi:10.3390/s18093068Tamilselvan, K., & Thangaraj, P. (2020). Pods – A novel intelligent energy efficient and dynamic frequency scalings for multi-core embedded architectures in an IoT environment. Microprocessors and Microsystems, 72, 102907. doi:10.1016/j.micpro.2019.10290

    Dynamic Energy Management for Chip Multi-processors under Performance Constraints

    Get PDF
    We introduce a novel algorithm for dynamic energy management (DEM) under performance constraints in chip multi-processors (CMPs). Using the novel concept of delayed instructions count, performance loss estimations are calculated at the end of each control period for each core. In addition, a Kalman filtering based approach is employed to predict workload in the next control period for which voltage-frequency pairs must be selected. This selection is done with a novel dynamic voltage and frequency scaling (DVFS) algorithm whose objective is to reduce energy consumption but without degrading performance beyond the user set threshold. Using our customized Sniper based CMP system simulation framework, we demonstrate the effectiveness of the proposed algorithm for a variety of benchmarks for 16 core and 64 core network-on-chip based CMP architectures. Simulation results show consistent energy savings across the board. We present our work as an investigation of the tradeoff between the achievable energy reduction via DVFS when predictions are done using the effective Kalman filter for different performance penalty thresholds

    Microarchitectural techniques to reduce energy consumption in the memory hierarchy

    Get PDF
    This thesis states that dynamic profiling of the memory reference stream can improve energy and performance in the memory hierarchy. The research presented in this theses provides multiple instances of using lightweight hardware structures to profile the memory reference stream. The objective of this research is to develop microarchitectural techniques to reduce energy consumption at different levels of the memory hierarchy. Several simple and implementable techniques were developed as a part of this research. One of the techniques identifies and eliminates redundant refresh operations in DRAM and reduces DRAM refresh power. Another, reduces leakage energy in L2 and higher level caches for multiprocessor systems. The emphasis of this research has been to develop several techniques of obtaining energy savings in caches using a simple hardware structure called the counting Bloom filter (CBF). CBFs have been used to predict L2 cache misses and obtain energy savings by not accessing the L2 cache on a predicted miss. A simple extension of this technique allows CBFs to do way-estimation of set associative caches to reduce energy in cache lookups. Another technique using CBFs track addresses in a Virtual Cache and reduce false synonym lookups. Finally this thesis presents a technique to reduce dynamic power consumption in level one caches using significance compression. The significant energy and performance improvements demonstrated by the techniques presented in this thesis suggest that this work will be of great value for designing memory hierarchies of future computing platforms.Ph.D.Committee Chair: Lee, Hsien-Hsin S.; Committee Member: Cahtterjee,Abhijit; Committee Member: Mukhopadhyay, Saibal; Committee Member: Pande, Santosh; Committee Member: Yalamanchili, Sudhaka
    • …
    corecore