13 research outputs found

    Evaluation and optimization of performance and energy consumption of task-level parallel applications on asymmetric architectures

    Asymmetric architectures, made up of several processors that share the same instruction set but have different performance and power-consumption characteristics, offer many opportunities to optimize performance and/or energy consumption when running parallel applications. Scheduling tasks on these architectures so that the different resources are used efficiently is very complex; it is usually tackled with parallel programming models, which let the programmer specify task parallelism, and runtime systems that exploit that parallelism dynamically. In this work we have modified one of the most widely used task schedulers so that it tries to exploit all the resources as much as possible when performance requires it, or to achieve the best possible energy efficiency when power consumption is the priority. We have also used a library developed at the University of Texas at Austin specifically for the asymmetric architecture under study. To obtain maximum performance, the cores of the system are grouped at two levels: a symmetric cluster of identical virtual cores, each of which is composed of a set of asymmetric cores. The task scheduler assigns work to the virtual cores exactly as it would on a symmetric multicore system, and the library takes care of distributing that work among the asymmetric cores; our work consisted of integrating this library with the task scheduler. To improve energy efficiency, the task scheduler was extended with policies that exploit the architecture's low-power modes and that power off, or simply stop assigning work to, some of the cores; these policies are activated at run time when it is detected that the application does not need all the resources available in the architecture
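    As a rough illustration of the two-level organization described above (all class and parameter names below are hypothetical, not the actual scheduler or library code), the following Python sketch groups asymmetric physical cores into identical virtual cores, lets a scheduler dispatch tasks to virtual cores as if the machine were symmetric, and splits each task's iteration range among the underlying big and LITTLE cores in proportion to an assumed speed ratio:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class VirtualCore:
    """One symmetric 'virtual core' backed by asymmetric physical cores.
    The speed values are assumptions used only to illustrate the partitioning."""
    big_cores: List[int]
    little_cores: List[int]
    big_speed: float = 2.0      # assumption: a big core is ~2x a LITTLE core
    little_speed: float = 1.0

    def run(self, task: Callable[[int, int], None], n_iters: int) -> None:
        """Split the task's iteration range among the physical cores
        proportionally to their relative speed (static partitioning)."""
        cores = [(c, self.big_speed) for c in self.big_cores] + \
                [(c, self.little_speed) for c in self.little_cores]
        total = sum(speed for _, speed in cores)
        start = 0
        for core_id, speed in cores:
            share = round(n_iters * speed / total)
            end = min(n_iters, start + share)
            task(start, end)     # a real runtime would pin this chunk to core_id
            start = end
        if start < n_iters:      # hand any rounding remainder to one last call
            task(start, n_iters)

def schedule(tasks, virtual_cores):
    """The task scheduler only sees identical virtual cores (round-robin here)."""
    for i, (task, n_iters) in enumerate(tasks):
        virtual_cores[i % len(virtual_cores)].run(task, n_iters)

# usage: a 4 big + 4 LITTLE system grouped into 4 virtual cores of 1 big + 1 LITTLE
vcs = [VirtualCore(big_cores=[b], little_cores=[l]) for b, l in zip(range(4), range(4, 8))]
schedule([(lambda lo, hi: None, 1000)] * 8, vcs)
```

    In the real runtime each chunk would execute on its own physical core; only the work-partitioning logic is being illustrated here.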

    Energy-efficient resource management for task-based parallel applications in multi-application environments

    Thesis of the Universidad Complutense de Madrid, Facultad de Informática, defended on 28/01/2021. The end of Dennard scaling, as well as the arrival of the post-Moore era, has meant a big change in the way modern processors achieve performance and energy efficiency. Where a constant increase of the clock frequency was the main method to increase performance at the beginning of the 2000s, the current trend is to increase the number of cores per processor while running them at relatively conservative frequencies, improving both performance and energy efficiency. Heterogeneity has also increased, both inside the processor, which may combine different types of cores (e.g., big.LITTLE architectures) or add specific compute units (such as multimedia extensions), and at the platform level, with the addition of other specific compute units (such as GPUs), each offering different performance and energy-efficiency trade-offs. Together with the growing core count, processor evolution has been accompanied by technologies that allow processors to adapt dynamically to changes in the environment and in the running applications. Among others, techniques such as dynamic voltage and frequency scaling, power capping and cache partitioning are widely used nowadays to increase performance and/or energy efficiency...

    Resource management for power-constrained HEVC transcoding using reinforcement learning

    The advent of online video streaming applications and services, along with users' demand for high-quality content, requires High Efficiency Video Coding (HEVC), which provides higher video quality and more compression at the cost of increased complexity. On one hand, HEVC exposes a set of dynamically tunable parameters that provide trade-offs among Quality-of-Service (QoS), performance, and power consumption of the multi-core servers in the video provider's data center. On the other hand, resource management of modern multi-core servers is in charge of adapting system-level parameters, such as operating frequency and multithreading, to deal with concurrent applications and their requirements. Therefore, efficient multi-user HEVC streaming requires joint adaptation of application- and system-level parameters. Nonetheless, such a large and dynamic design space is difficult to address with conventional resource management strategies. Thus, in this work, we develop a multi-agent Reinforcement Learning framework that jointly adjusts application- and system-level parameters at runtime to satisfy the QoS of multi-user HEVC streaming on power-constrained servers. In particular, the design space, composed of all design parameters, is split into smaller independent sub-spaces. Each design sub-space is assigned to a particular agent so that it can explore it faster, yet accurately. The benefits of our approach are revealed in terms of adaptability and quality (with up to 4x improvement in QoS when compared to a static resource management scheme) and learning time (6x faster than an equivalent mono-agent implementation). Finally, we show that the power-capping techniques formulated here outperform hardware-based power capping with respect to quality
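    A minimal sketch of the design-space partitioning idea the abstract describes, assuming plain tabular Q-learning and illustrative knobs and ranges (the paper's concrete parameters, state encoding, and reward are not reproduced here): each agent owns one sub-space, picks its own knob value each epoch, and all agents learn from the same shared reward.

```python
import random
from collections import defaultdict

class SubspaceAgent:
    """One tabular Q-learning agent in charge of a single design sub-space
    (e.g. only the operating frequency, or only an HEVC encoding parameter)."""
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions
        self.q = defaultdict(float)                  # (state, action) -> estimated value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        if random.random() < self.epsilon:           # explore
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])   # exploit

    def learn(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error

# the joint design space (frequency x threads x encoder parameter) is split so
# that each agent only has to explore its own, much smaller, sub-space
agents = {
    "frequency_mhz": SubspaceAgent([1200, 1800, 2400]),
    "threads":       SubspaceAgent([2, 4, 8]),
    "hevc_qp":       SubspaceAgent([22, 27, 32]),
}

def control_epoch(state, measure):
    """One epoch: every agent picks its knob; a single shared reward (hypothetical
    `measure` callback returning (reward, next_state), e.g. QoS achieved under the
    power cap) is fed back to all agents."""
    joint_action = {name: agent.act(state) for name, agent in agents.items()}
    reward, next_state = measure(joint_action)
    for name, agent in agents.items():
        agent.learn(state, joint_action[name], reward, next_state)
    return next_state
```

    Splitting the space this way keeps each Q-table small, which is what makes the multi-agent variant faster to train than a single agent over the full Cartesian product of knobs.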

    Gamification in university education: application to programming courses

    Report on the experience of applying gamification techniques in the course “Estructura de Datos y Algoritmos” (Data Structures and Algorithms), a compulsory second-year subject in the degree programs taught at the Facultad de Informática of the UCM

    Execution and adaptation of game traces for test automation

    Testing is one of the most important phases in the development process of a videogame, since it makes it possible to build a high-quality, error-free game. However, characteristics that are specific to videogames, such as level design, the game's behaviour in response to different events, or the user experience, cannot be checked with traditional testing tools and methods. This work proposes a new form of testing that tries to address some of these shortcomings. The proposed method is based on imitating the previously recorded movements of an expert player. These recorded movements are adapted to whatever modifications may have been made to a game level, with the goal of completing the level. To achieve this, the method relies on the architecture with which videogames are designed, making it possible to capture every event that occurs without the game having to be designed specifically for this purpose. Thanks to this, besides recording and later replaying traces, the resulting tests are much more complete: they are based on detecting the actions that happen during play, and not only on whether the level was completed or not. The proposed system has been implemented on a real videogame called Time & Space (Blázquez et al., 2013-2014), created as the final project of the Master's degree in Videogame Development at the UCM
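    The following Python sketch illustrates the record-and-replay idea under assumptions of our own; the game interface (subscribe/inject/tick) and the event schema are hypothetical, not the project's actual code. The point is that the test replays the expert player's recorded inputs and checks the intermediate events observed during play, not only whether the level was completed.

```python
from dataclasses import dataclass, field

@dataclass
class TraceEvent:
    """One recorded game event (illustrative schema, not the project's format)."""
    tick: int
    kind: str                     # e.g. "move", "jump", "door_opened", "level_completed"
    data: dict = field(default_factory=dict)

class TraceRecorder:
    """Hooks into the game's event stream and stores every event it sees."""
    def __init__(self):
        self.events = []

    def on_event(self, tick, kind, **data):
        self.events.append(TraceEvent(tick, kind, data))

class TraceReplayTest:
    """Replays the expert player's recorded inputs and checks that the expected
    in-game events are observed, instead of only checking level completion."""
    def __init__(self, trace, expected_kinds, input_kinds=("move", "jump")):
        self.inputs = [e for e in trace if e.kind in input_kinds]
        self.expected = set(expected_kinds)

    def run(self, game):
        # `game` is a hypothetical interface: subscribe(cb), inject(kind, **data), tick()
        seen = set()
        game.subscribe(lambda tick, kind, **d: seen.add(kind))
        for event in self.inputs:
            game.inject(event.kind, **event.data)   # replay the recorded action
            game.tick()
        missing = self.expected - seen
        assert not missing, f"events never observed during replay: {missing}"
```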

    Dynamic power budget redistribution under a power cap on multi-application environments

    We present a two-level implementation of an infrastructure that allows performance maximization under a power cap in multi-application environments with minimal user intervention. At the application level, we integrate bar (Power Budget-Aware Runtime Scheduler) into existing task-based runtimes, e.g. OpenMP; bar implements combined software/hardware techniques (thread malleability and DVFS) to maximize application performance without violating a granted power budget. At a higher level, we introduce barman (Power Budget-Aware Resource Manager), a system-wide software layer able to manage resources globally, gathering the power needs of registered applications and redistributing the available overall power budget across them. The combination and co-operative operation of both pieces of software yields performance and energy-efficiency improvements in environments in which power capping is established globally, and also granted asymmetrically to different co-existing applications. This behaviour is demonstrated to be stable under different workloads (a selection of task-based scientific applications and PARSEC benchmarks are tested) and different levels of power capping.
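    A simplified sketch of the kind of system-wide budget redistribution a manager like barman performs, with hypothetical names and a deliberately simple policy (minimum budget first, then proportional sharing of the remaining headroom); the actual bar/barman interfaces and policies are not reproduced here.

```python
class ManagedApp:
    """Per-application view the system-wide manager keeps (illustrative fields)."""
    def __init__(self, name, min_watts, requested_watts):
        self.name = name
        self.min_watts = min_watts
        self.requested_watts = requested_watts
        self.granted_watts = 0.0

def redistribute(apps, global_cap_watts):
    """Give every registered application its minimum budget first, then share the
    remaining headroom proportionally to what each application asked for."""
    floor = sum(a.min_watts for a in apps)
    if floor > global_cap_watts:
        raise ValueError("global power cap below the sum of minimum budgets")
    for a in apps:
        a.granted_watts = a.min_watts
    spare = global_cap_watts - floor
    extra_demand = sum(max(a.requested_watts - a.min_watts, 0.0) for a in apps)
    for a in apps:
        want = max(a.requested_watts - a.min_watts, 0.0)
        if extra_demand > 0:
            a.granted_watts += spare * want / extra_demand
    return {a.name: round(a.granted_watts, 1) for a in apps}

# usage: two task-based applications registered under a 100 W global cap
apps = [ManagedApp("cholesky", 20, 70), ManagedApp("parsec.ferret", 20, 50)]
print(redistribute(apps, 100.0))   # {'cholesky': 57.5, 'parsec.ferret': 42.5}
```

    In the two-level scheme, each per-application grant would then be enforced locally by the runtime scheduler through thread malleability and DVFS.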

    An approach to automated videogame beta testing

    Videogames developed in the 1970s and 1980s were modest programs created in a couple of months by a single person, who played the roles of designer, artist and programmer. Since then, videogames have evolved to become a multi-million dollar industry. Today, AAA game development involves hundreds of people working together over several years. Management and engineering requirements have changed at the same pace. Although many of the processes have been adapted over time, this is not quite true for quality assurance tasks, which are still done mainly manually by human beta testers due to the specific peculiarities of videogames. This paper presents an approach to automate this beta testing.

    Reinforcement Learning-Based Joint Reliability and Performance Optimization for Hybrid-Cache Computing Servers

    Computing servers have played a key role in the development and processing of emerging compute-intensive applications in recent years. However, they need to operate efficiently from an energy perspective, while maximizing the performance and lifetime of the hottest server components (i.e., cores and cache). Previous methods focused either on improving energy efficiency by adopting new hybrid-cache architectures that combine resistive random-access memory (RRAM) and static random-access memory (SRAM) at the hardware level, or on exploring trade-offs between lifetime limitations and performance of multicore processors under stable workload conditions. No work has so far proposed a co-optimization method for hybrid-cache-based server architectures in real-life dynamic scenarios that takes scalability, performance, lifetime reliability, and energy efficiency into account at the same time. In this article, we first formulate a reliability model for the hybrid-cache architecture to enable precise lifetime reliability management and energy-efficiency optimization. We also include the performance and energy overheads of cache switching, and optimize the benefits of hybrid-cache usage for better energy efficiency and performance. Then, we propose a runtime Q-learning-based reliability management and performance optimization approach for multicore microprocessors with the hybrid-cache architecture, combined with a dynamic preemptive priority-queue management method that improves overall task performance by aiming to respect task end-time limits. Experimental results show that our proposed method achieves up to a 44% average performance (i.e., task execution time) improvement, while maintaining the whole-system design lifetime beyond five years, when compared to the latest state-of-the-art energy-efficiency optimization and reliability management methods for computing servers.
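    Two small illustrative fragments, written under assumptions of our own (the paper's actual reward formulation and queue implementation are not given in the abstract): a reward that trades off task completion time against an estimated aging rate, and a preemptive priority queue ordered by task end-time limits.

```python
import heapq

def reward(exec_time_s, deadline_s, aging_rate, max_aging_rate=1.0):
    """Illustrative reward shaping: reward finishing within the end-time limit, and
    heavily penalize actions whose estimated aging rate would push the component
    below the (assumed) five-year lifetime target."""
    perf_term = deadline_s / max(exec_time_s, 1e-6)        # > 1 when the task meets its limit
    reliability_penalty = 10.0 if aging_rate > max_aging_rate else 0.0
    return perf_term - reliability_penalty

class PreemptiveDeadlineQueue:
    """Dynamic preemptive priority queue: the task with the nearest end-time limit
    runs first, and a newly submitted, more urgent task preempts the running one."""
    def __init__(self):
        self._heap = []            # (deadline, task_id) pairs
        self.running = None

    def submit(self, deadline, task_id):
        heapq.heappush(self._heap, (deadline, task_id))
        if self.running is not None and deadline < self.running[0]:
            heapq.heappush(self._heap, self.running)       # requeue the preempted task
            self.running = None

    def dispatch(self):
        """Return the (deadline, task_id) that should run now, if any."""
        if self.running is None and self._heap:
            self.running = heapq.heappop(self._heap)
        return self.running

    def finish(self):
        self.running = None

# usage: an urgent task submitted later preempts the one currently dispatched
q = PreemptiveDeadlineQueue()
q.submit(10.0, "encode-A")
print(q.dispatch())        # (10.0, 'encode-A')
q.submit(3.0, "encode-B")  # more urgent: preempts encode-A
print(q.dispatch())        # (3.0, 'encode-B')
```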

    Intermediate Address Space: virtual memory optimization of heterogeneous architectures for cache-resident workloads

    The increasing demand for computing power and the emergence of heterogeneous computing architectures have driven the exploration of innovative techniques to address current limitations in both the compute and memory subsystems. One such solution is the use of Accelerated Processing Units (APUs), processors that incorporate both a central processing unit (CPU) and an integrated graphics processing unit (iGPU). However, the performance of both APU and CPU systems can be significantly hampered by address-translation overhead, especially for cache-resident workloads. To address this issue, we propose a new intermediate address space (IAS) for both APU and CPU systems. The IAS serves as a bridge between virtual address (VA) spaces and physical address (PA) spaces, optimizing the address-translation process. In the case of APU systems, our research indicates that the iGPU suffers from significant translation lookaside buffer (TLB) misses in certain workload situations. Using an IAS, we can divide the initial address translation into front-end and back-end phases, effectively shifting the address-translation bottleneck from the cache side to the memory-controller side, a technique that proves effective for cache-resident workloads. Our simulations demonstrate that implementing the IAS in a CPU system can boost performance by up to 40% compared to conventional CPU systems. We also evaluate APU systems, comparing the performance of IAS-based systems with traditional ones and showing up to a 185% improvement in APU system performance with our proposed IAS implementation. Our analysis further indicates that over 90% of TLB misses can be filtered by the cache, and that employing a larger cache could potentially yield even greater improvements. The proposed IAS offers a promising and practical solution to enhance the performance of both APU and CPU systems, contributing to state-of-the-art research in the field of computer architecture.
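    A toy Python model of the two-phase translation described above (page and cache-line sizes, table layouts, and method names are illustrative assumptions, not the paper's design): the front-end maps VA to IA before the cache lookup, and the IA-to-PA back-end translation is only exercised on a cache miss, so cache-resident data never pays for it.

```python
class IntermediateAddressSpace:
    """Toy model of two-phase translation: VA -> IA in the front-end (before the
    cache lookup) and IA -> PA in the back-end (only on a cache miss, at the
    memory controller). Page/line sizes and table layouts are illustrative."""
    PAGE = 4096
    LINE = 64

    def __init__(self, va_to_ia, ia_to_pa):
        self.va_to_ia = va_to_ia       # per-process front-end page table
        self.ia_to_pa = ia_to_pa       # system-wide back-end page table

    def front_end(self, va):
        page, offset = divmod(va, self.PAGE)
        return self.va_to_ia[page] * self.PAGE + offset

    def back_end(self, ia):
        page, offset = divmod(ia, self.PAGE)
        return self.ia_to_pa[page] * self.PAGE + offset

    def access(self, va, cache):
        ia = self.front_end(va)        # cheap translation before the IA-indexed cache
        line = ia // self.LINE
        if line in cache:              # cache-resident data never needs the back-end
            return cache[line]
        pa = self.back_end(ia)         # full translation only on a miss
        cache[line] = f"line@PA{pa & ~(self.LINE - 1):#x}"
        return cache[line]

# usage: VA page 0 -> IA page 7 -> PA page 42
ias = IntermediateAddressSpace({0: 7}, {7: 42})
cache = {}
ias.access(0x10, cache)   # miss: goes through the back-end once
ias.access(0x18, cache)   # same cache line: hit, back-end skipped
```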