3 research outputs found

    XSEDE: eXtreme Science and Engineering Discovery Environment Third Quarter 2012 Report

    The Extreme Science and Engineering Discovery Environment (XSEDE) is the most advanced, powerful, and robust collection of integrated digital resources and services in the world. It is an integrated cyberinfrastructure ecosystem with singular interfaces for allocations, support, and other key services that researchers can use to interactively share computing resources, data, and expertise. This is a report of project activities and highlights from the third quarter of 2012. National Science Foundation, OCI-105357

    New cross-layer techniques for multi-criteria scheduling in large-scale systems

    The global ecosystem of information technology (IT) is in transition to a new generation of applications that require increasingly intensive data acquisition, processing, and storage. As a result of this shift towards data-intensive computing, there is a growing overlap between high performance computing (HPC) and Big Data techniques, since many HPC applications produce large volumes of data and Big Data workloads need HPC capabilities. The hypothesis of this PhD thesis is that the interoperability and convergence of HPC and Big Data systems are crucial for the future, and that unifying both paradigms is essential to address a broad spectrum of research domains. The main objective of this thesis is therefore to propose and develop a monitoring system that enables HPC and Big Data convergence by providing information about the behavior of applications in a system that executes both kinds of workloads, helping to improve scalability and data locality and to support adaptability on large-scale computers. To achieve this goal, this work focuses on the design of resource monitoring and discovery mechanisms that exploit parallelism at all levels. The result is a two-level monitoring framework (at both node and application level) that is scalable, imposes a low computational load, and can communicate with different modules through an API provided for this purpose. All collected data are disseminated to facilitate global improvements throughout the system and thus avoid mismatches between layers, which, combined with the techniques applied for fault tolerance, makes the system robust and highly available.
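The low-overhead, two-level monitoring described above can be illustrated with a minimal sketch. The class, threshold, and metric names below are assumptions for illustration, not the LIMITLESS API: a node-level collector samples metrics and forwards a reading to the global level only when it deviates noticeably from the last value sent, which is one simple way to cut communication between components without losing significant information.

```python
# Hypothetical node-level monitor with change-based filtering.
# Names and the 10% threshold are illustrative assumptions only.

class NodeMonitor:
    def __init__(self, threshold=0.10):
        self.threshold = threshold  # relative change needed to resend
        self.last_sent = {}         # metric name -> last forwarded value
        self.outbox = []            # readings queued for the global level

    def sample(self, metrics):
        """Filter a dict of metric readings, forwarding only significant changes."""
        for name, value in metrics.items():
            prev = self.last_sent.get(name)
            if prev is None or prev == 0 or abs(value - prev) / abs(prev) > self.threshold:
                self.outbox.append((name, value))
                self.last_sent[name] = value

monitor = NodeMonitor(threshold=0.10)
monitor.sample({"cpu": 0.50, "mem": 0.40})  # first sample: both forwarded
monitor.sample({"cpu": 0.51, "mem": 0.70})  # cpu moved ~2%: filtered; mem moved 75%: forwarded
```

A real deployment would add an application-level collector and an API for other modules to subscribe to the filtered stream; the filtering threshold trades communication volume against fidelity.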
On the other hand, the developed framework includes a task scheduler capable of launching applications, migrating them between nodes, and dynamically increasing or decreasing their number of processes. All of this is possible thanks to cooperation with other modules integrated into LIMITLESS, whose objective is to optimize the execution of a stack of applications based on multi-criteria policies. This scheduling mode is called coarse-grain scheduling based on monitoring. For better performance, and to further reduce the overhead of monitoring, different optimizations have been applied at different levels to reduce communication between components while avoiding loss of information. To this end, data filtering techniques, Machine Learning (ML) algorithms, and Neural Networks (NN) have been used. To improve the scheduling process and to design new multi-criteria scheduling policies, the monitoring information has been combined with other ML algorithms to identify (through classification algorithms) applications and their execution phases via offline profiling. Thanks to this feature, LIMITLESS can detect which phase an application is executing and try to share computational resources with other compatible applications (those that suffer no mutual performance degradation when running at the same time). This feature is called fine-grain scheduling, and it can reduce the makespan of the use cases while making efficient use of computational resources that other applications do not use.
This PhD dissertation has been partially supported by the Spanish Ministry of Science and Innovation under an FPI fellowship associated with a National Project with reference TIN2016-79637-P (from July 1, 2018 to October 10, 2021). Doctoral Program in Computer Science and Technology, Universidad Carlos III de Madrid. President: Félix García Carballeira. Secretary: Pedro Ángel Cuenca Castillo. Member: María Cristina V. Marinesc
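The fine-grain scheduling idea, co-locating applications only when their current phases do not degrade each other, can be sketched as follows. The phase labels and compatibility table are invented for illustration; in the thesis these would come from the ML phase classifiers, not a hard-coded set.

```python
# Illustrative sketch (not the LIMITLESS implementation) of fine-grain
# co-scheduling: two applications may share a node when their current
# phases stress different resources. Phase names are assumptions.

COMPATIBLE = {
    ("cpu-bound", "io-bound"),
    ("cpu-bound", "memory-bound"),
    ("io-bound", "memory-bound"),
}

def compatible(phase_a, phase_b):
    """Phases are compatible when co-running them causes no mutual degradation."""
    return (phase_a, phase_b) in COMPATIBLE or (phase_b, phase_a) in COMPATIBLE

def pick_coschedule(running_phase, waiting):
    """Pick the first waiting app whose current phase can share the node."""
    for app, phase in waiting:
        if compatible(running_phase, phase):
            return app
    return None

queue = [("appA", "cpu-bound"), ("appB", "io-bound")]
pick_coschedule("cpu-bound", queue)  # appB can share the node; appA cannot
```

Reducing makespan then amounts to filling otherwise idle resources on busy nodes with compatible waiting work instead of leaving nodes exclusively allocated.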

    Power Bounded Computing on Current & Emerging HPC Systems

    Power has become a critical constraint on the evolution of large-scale High Performance Computing (HPC) systems and commercial data centers. For physical, technical, and economic reasons, this constraint spans almost every level of computing technology, from IC chips all the way up to data centers. To cope with this reality, it is necessary to understand how available or permissible power impacts the design and performance of emergent computer systems. For this reason, we propose power bounded computing and corresponding technologies to optimize performance on HPC systems with limited power budgets. This dissertation pursues multiple research objectives, centered on understanding the interaction between performance, power bounds, and a hierarchical power management strategy. First, we develop heuristics and application-aware power allocation methods to improve application performance on a single node. Second, we develop algorithms to coordinate power across nodes and components based on application characteristics and the power budget of a cluster. Third, we investigate performance interference induced by hardware and power contention, and propose contention-aware job scheduling to maximize system throughput under given power budgets on node-sharing systems. Fourth, we extend our approach to GPU-accelerated systems and workloads and develop an online dynamic performance and power approach to meet both performance requirements and power-efficiency goals. Power bounded computing improves performance scalability and power efficiency and decreases the operating costs of HPC systems and data centers. This dissertation opens up several new avenues for research in power bounded computing to address the power challenges in HPC systems. The proposed power and resource management techniques provide new directions and guidelines for green exascale computing and other computing systems.
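The hierarchical allocation step, dividing a cluster-wide power budget across nodes according to application characteristics, can be sketched with a simple proportional scheme. This is an assumption-laden illustration, not the dissertation's algorithm: each node is guaranteed its idle power, and the remaining budget is split in proportion to a per-node demand weight that a real system would derive from application profiles.

```python
# Minimal sketch of budget-constrained power allocation across nodes.
# Node names, idle-power figures, and demand weights are invented examples.

def allocate_power(budget, nodes):
    """nodes: list of (name, idle_watts, demand_weight). Returns name -> watts."""
    idle_total = sum(idle for _, idle, _ in nodes)
    if budget < idle_total:
        raise ValueError("budget cannot cover idle power")
    spare = budget - idle_total                      # watts left above idle
    weight_total = sum(w for _, _, w in nodes) or 1.0
    # Each node gets idle power plus a demand-proportional share of the spare.
    return {name: idle + spare * w / weight_total for name, idle, w in nodes}

cluster = [("n1", 50.0, 2.0), ("n2", 50.0, 1.0), ("n3", 50.0, 1.0)]
allocate_power(400.0, cluster)  # n1, with the highest weight, gets the largest cap
```

A hierarchical manager would apply the same idea recursively, from the cluster budget down to per-node and per-component (CPU, memory, GPU) caps, re-running the split as application characteristics change.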