724 research outputs found

    Ananke: A Q-Learning-Based Portfolio Scheduler for Complex Industrial Workflows


    Quality of Service Driven Runtime Resource Allocation in Reconfigurable HPC Architectures

    Heterogeneous System Architectures (HSA) are gaining importance in the High Performance Computing (HPC) domain due to increasing computational requirements coupled with energy consumption concerns, which conventional CPU architectures fail to address effectively. Systems based on Field Programmable Gate Arrays (FPGAs) have recently emerged as an effective alternative to Graphics Processing Units (GPUs) for demanding HPC applications, although they lack the abstractions available in conventional CPU-based systems. This work tackles the problem of runtime resource management in a system that uses FPGA-based co-processors to accelerate multi-programmed HPC workloads. We propose a novel resource manager able to dynamically vary the number of FPGAs allocated to each of the jobs running in a multi-accelerator system, with the goal of meeting a given Quality of Service metric for the running jobs, measured in terms of deadline or throughput. We implement the proposed resource manager in a commercial HPC system and evaluate its behavior with representative workloads.
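The core idea of the abstract above, dynamically shifting accelerators between jobs to meet per-job QoS targets, can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation; all names (`Job`, `reallocate`) and the greedy one-FPGA-per-round policy are assumptions.

```python
# Hedged sketch of QoS-driven FPGA reallocation among running jobs.
# Each round, one FPGA is moved from a job exceeding its QoS target
# to the job that is furthest below its own target.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    fpgas: int          # FPGAs currently allocated to this job
    throughput: float   # measured throughput with the current allocation
    target: float       # QoS target (same units as throughput)

def reallocate(jobs, total_fpgas):
    """Greedily move FPGAs from jobs exceeding their QoS target to
    jobs missing it, keeping the total allocation constant."""
    donors = [j for j in jobs if j.throughput > j.target and j.fpgas > 1]
    # Serve the neediest jobs first (lowest fraction of target achieved).
    needy = sorted((j for j in jobs if j.throughput < j.target),
                   key=lambda j: j.throughput / j.target)
    for job in needy:
        if not donors:
            break
        donor = donors.pop()
        donor.fpgas -= 1
        job.fpgas += 1
    # Invariant: reallocation never changes the total number of FPGAs.
    assert sum(j.fpgas for j in jobs) == total_fpgas
    return jobs
```

In a real manager this loop would run periodically, with `throughput` refreshed from hardware counters between rounds.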

    Portfolio peak algorithms achieving superior performance for maximizing throughput in WiMAX networks

    The Mobile WiMAX IEEE 802.16 standards ensure provision of last-mile wireless access, variable and high data rates, point-to-multipoint communication, a large frequency range, and Quality of Service (QoS) for various types of applications. The WiMAX standards are published by the Institute of Electrical and Electronics Engineers (IEEE) and specify the standards of services and transmissions. However, how to run these services and when transmission should start are not specified in the IEEE standards; it is up to computer scientists to design scheduling algorithms that best meet them. Designing efficient scheduler algorithms is therefore a very important component of wireless systems, and the scheduling period presents the most challenging issue in terms of throughput and time delay. The aim of the research presented in this thesis was to design and develop an efficient scheduling algorithm that provides QoS support for real-time and non-real-time services in a WiMAX network. This was achieved by combining a portfolio of algorithms, which controls and updates transmission with the algorithm required by each portfolio for supporting QoS, such as guaranteeing maximum throughput for real-time and non-real-time traffic. Two algorithms were designed in this process and are discussed in this thesis: the Fixed Portfolio Algorithms and the Portfolio Peak Algorithm. To evaluate the proposed algorithms and test their efficiency for IEEE 802.16 networks, the authors simulated them in the NS2 simulator, comparing their performance with that of conventional algorithms in terms of throughput, delay, and jitter. The simulation results suggest that the Fixed Portfolio Algorithms and the Portfolio Peak Algorithm achieve higher throughput than all other algorithms. Keywords: WiMAX, IEEE 802.16, QoS, Scheduling Algorithms, Fixed Portfolio Algorithms, Portfolio Peak Algorithms.
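The portfolio idea described above, keeping several candidate schedulers and committing each scheduling period to whichever one performs best, can be sketched in a few lines. This is an illustrative toy, not the thesis's algorithms: the candidate schedulers (`edf`, `largest`), the packet model, and the throughput metric are all assumptions.

```python
# Hedged sketch of a portfolio scheduler: each scheduling period,
# evaluate every candidate algorithm on the pending queue and commit
# to the one yielding the highest throughput.
def portfolio_schedule(queue, algorithms, evaluate):
    """Return (name, schedule) of the candidate that maximizes
    the evaluate() metric for this scheduling period."""
    best = None
    for name, algo in algorithms.items():
        schedule = algo(list(queue))
        score = evaluate(schedule)
        if best is None or score > best[2]:
            best = (name, schedule, score)
    return best[0], best[1]

# Toy candidates: packets are (deadline, size) tuples, one packet
# transmitted per time slot.
def edf(q):      # earliest deadline first
    return sorted(q, key=lambda p: p[0])

def largest(q):  # largest packet first
    return sorted(q, key=lambda p: -p[1])

def bytes_in_time(schedule):
    """Throughput metric: bytes sent before each packet's deadline."""
    sent, t = 0, 0
    for deadline, size in schedule:
        if t < deadline:
            sent += size
        t += 1
    return sent
```

The portfolio wins whenever no single candidate dominates across traffic mixes, which is the motivation the abstract gives for combining algorithms.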

    Purple Computational Environment With Mappings to ACE Requirements for the General Availability User Environment Capabilities


    Towards resource-aware computing for task-based runtimes and parallel architectures

    Current large-scale systems show increasing power demands, to the point that power has become a huge strain on facilities and budgets. The increasing restrictions on the power consumption of High Performance Computing (HPC) systems and data centers have forced hardware vendors to include power-capping capabilities in their commodity processors. Power capping opens up new opportunities for applications to directly manage their power behavior at user level. However, constraining power consumption causes the individual sockets of a parallel system to deliver different performance levels under the same power cap, even when they are identically designed. This heterogeneous power consumption across modern chips is caused by manufacturing issues, a problem known as manufacturing or process variability. As a result, systems that do not account for such variability suffer performance degradation and wasted power. To avoid this negative impact, users and system administrators must actively counteract manufacturing variability. In this thesis we show that parallel systems benefit, in terms of both performance and energy efficiency, from taking the consequences of manufacturing variability into account. To evaluate our work we have also implemented our own task-based version of the PARSEC benchmark suite, which allows us to test our methodology using state-of-the-art parallelization techniques and real-world workloads.
    We present two approaches to mitigating manufacturing variability: power redistribution at the runtime level, and power- and variability-aware job scheduling at the system-wide level. A parallel runtime system can effectively deal with this new kind of performance heterogeneity by compensating for the uneven effects of power capping. In the context of a NUMA node composed of several multi-core sockets, our system optimizes the energy and concurrency levels assigned to each socket to maximize performance. Applied transparently within the parallel runtime system, it requires no programmer interaction such as changing the application source code or manually reconfiguring the parallel system. We compare our novel runtime analysis with an offline approach and demonstrate that it achieves equal performance at a fraction of the cost. In the second approach presented in this thesis, we show that it is possible to predict the impact of this variability on specific applications by using variability-aware power prediction models. Based on these power models, we propose two job scheduling policies that consider the effects of manufacturing variability for each application and ensure that power consumption stays under a system-wide power budget. We evaluate our policies under different power budgets and traffic scenarios, consisting of both single- and multi-node parallel applications.
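The second approach in the abstract above, placing jobs using per-node power predictions so the system stays under a global budget, can be sketched as follows. This is a minimal sketch under stated assumptions: the power model is given as a plain table, the placement policy (cheapest free node first) is illustrative, and `place_jobs` is a hypothetical name, not the thesis's policy.

```python
# Hedged sketch of variability-aware job placement: a power model
# predicts each job's draw per node (nodes differ because of
# manufacturing variability); each job goes on the lowest-power free
# node, and a job is admitted only if the system-wide budget holds.
def place_jobs(jobs, predicted_power, budget):
    """jobs: ordered list of job names.
    predicted_power[job][node] -> predicted watts on that node.
    Returns {job: node} for the jobs admitted under the budget."""
    placement, used_nodes, total = {}, set(), 0.0
    for job in jobs:
        # Free nodes for this job, cheapest predicted draw first.
        candidates = sorted(
            (watts, node)
            for node, watts in predicted_power[job].items()
            if node not in used_nodes)
        for watts, node in candidates:
            if total + watts <= budget:
                placement[job] = node
                used_nodes.add(node)
                total += watts
                break  # job placed; otherwise it waits in the queue
    return placement
```

Because the predicted draw differs per node, the same job list can fit under a budget on one placement and overflow it on another, which is exactly the leverage a variability-aware policy exploits.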