
    Sintonización dinámica de aplicaciones MPI (Dynamic tuning of MPI applications)

    High performance computing is now used in many scientific fields, where the problems under study are solved with parallel/distributed applications. These applications require enormous computing capacity, either because the problems are complex or because they must be solved in real time. The resources and computational capacity of the parallel systems on which these applications run must therefore be exploited well to obtain good performance. Achieving that performance, however, is a difficult task that requires a high degree of expertise, especially for applications with dynamic behaviour or those running on heterogeneous systems. In these cases, automatic and dynamic performance improvement is proposed as a better approach to performance analysis. The research presented here falls within this field; its principal objective is to dynamically tune, using MATE (Monitoring, Analysis and Tuning Environment), an MPI application used in high performance computing that follows the Master/Worker paradigm. The tuning techniques integrated in MATE were developed from a study of a performance model that reflects the bottlenecks specific to the Master/Worker paradigm: load balancing and the number of workers. Running the chosen application under the dynamic control of MATE with the implemented tuning strategy showed the application's behaviour adapting to the current conditions of the system on which it runs, yielding an improvement in its performance.

    SHADHO: Massively Scalable Hardware-Aware Distributed Hyperparameter Optimization

    Computer vision is experiencing an AI renaissance, in which machine learning models are expediting important breakthroughs in academic research and commercial applications. Effectively training these models, however, is not trivial due in part to hyperparameters: user-configured values that control a model's ability to learn from data. Existing hyperparameter optimization methods are highly parallel but make no effort to balance the search across heterogeneous hardware or to prioritize searching high-impact spaces. In this paper, we introduce a framework for massively Scalable Hardware-Aware Distributed Hyperparameter Optimization (SHADHO). Our framework calculates the relative complexity of each search space and monitors performance on the learning task over all trials. These metrics are then used as heuristics to assign hyperparameters to distributed workers based on their hardware. We first demonstrate that our framework achieves double the throughput of a standard distributed hyperparameter optimization framework by optimizing SVM for MNIST using 150 distributed workers. We then conduct model search with SHADHO over the course of one week using 74 GPUs across two compute clusters to optimize U-Net for a cell segmentation task, discovering 515 models that achieve a lower validation loss than standard U-Net.
    Comment: 10 pages, 6 figures
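The heuristic assignment described in this abstract can be illustrated with a minimal sketch. It is not SHADHO's actual API or algorithm: the `SearchSpace` and `Worker` types, the `complexity`, `priority`, and `capability` scores, and the ranking rule (complexity times priority) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SearchSpace:
    name: str
    complexity: float  # hypothetical relative-complexity score
    priority: float    # hypothetical score derived from observed trial results


@dataclass
class Worker:
    name: str
    capability: float  # hypothetical hardware score (higher = faster)


def assign_spaces(spaces, workers):
    """Pair the highest-ranked search spaces with the most capable workers.

    If there are more workers than spaces, the spaces are reused in rank
    order. Returns a dict mapping worker name to search-space name.
    """
    if not spaces:
        return {}
    # Assumed ranking rule: complexity weighted by performance-based priority.
    ranked_spaces = sorted(
        spaces, key=lambda s: s.complexity * s.priority, reverse=True
    )
    ranked_workers = sorted(workers, key=lambda w: w.capability, reverse=True)
    return {
        w.name: ranked_spaces[i % len(ranked_spaces)].name
        for i, w in enumerate(ranked_workers)
    }
```

Under these assumptions, a GPU worker with a high capability score would receive the most demanding search space, while a CPU worker would receive a simpler one.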

    A methodology for transparent knowledge specification in a dynamic tuning environment

    The increasing use of parallel/distributed applications demands continuous support to take full advantage of parallel computing power. This includes the evolution of performance analysis and tuning tools that automatically improve application behavior. Different approaches and tools have been proposed, and they are continuously evolving to meet the requirements and expectations of users. One such tool is MATE (Monitoring, Analysis and Tuning Environment), which provides automatic and dynamic tuning for parallel/distributed applications. The knowledge used by MATE to analyze and make decisions is based on performance models, which include a set of performance parameters and a set of mathematical expressions modeling the solution of the performance problem. These elements are used by the tuning environment to conduct the monitoring and analysis steps, respectively. The tuning phase depends on the results of the performance analysis. This paper presents a methodology to specify performance models. Each performance model specification can be automatically and transparently translated into a piece of software code encapsulating the knowledge, ready to be included in MATE. With this methodology, the user does not have to be involved in the implementation details of MATE, which makes the tool more transparent to use.
    Affiliations: Caymes Scutari, Paola Guadalupe (Universidad Tecnológica Nacional, Facultad Regional de Mendoza, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Centro Científico Tecnológico Conicet - Mendoza, Argentina); Morajko, A. (Universitat Autònoma de Barcelona, Spain); Margalef, T. (Universitat Autònoma de Barcelona, Spain); Luque, E. (Universitat Autònoma de Barcelona, Spain)

    Enhanced Failure Detection Mechanism in MapReduce

    The popularity of the MapReduce programming model has increased the research community's interest in improving it. Among other directions, fault tolerance, and specifically failure detection, appears to be crucial but has not yet reached a satisfactory level. Motivated by this, I devoted my main research during this period to a prototype system architecture for the MapReduce framework with a new failure detection service, comprising both an analytical (theoretical) part and an implementation part. I am confident that this work should lead the way for further contributions to failure detection in NoSQL application frameworks and in cloud storage systems in general.

    Development and tuning framework of master/worker applications

    Parallel/distributed programming is a complex task that requires a high degree of expertise to fulfill the expectations of high performance computation. The Master/Worker paradigm is one of the most commonly used because it is easy to understand and a wide range of applications match it. However, certain features, such as data distribution and the number of workers, must be tuned properly to obtain adequate performance. In most cases such features cannot be tuned statically, since they depend on the particular conditions of each execution. In this paper, we show a dynamic tuning environment that is based on a theoretical model of Master/Worker behavior and allows such applications to adapt to the dynamic conditions of execution. The environment includes a pattern-based application development framework that allows the user to concentrate on the design phase and makes it easier to overcome performance bottlenecks.
    Facultad de Informática
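To make the idea of tuning the number of workers concrete, here is a minimal sketch under a deliberately simplified cost model. It is not the theoretical model from the paper. It assumes that the master communicates with workers one after another (cost `per_worker_comm` each) and that the computation (`total_compute`) divides evenly among workers. Both function names and all parameters are hypothetical.

```python
def estimated_iteration_time(n_workers, total_compute, per_worker_comm):
    """Assumed cost model: serial communication plus evenly divided compute."""
    return per_worker_comm * n_workers + total_compute / n_workers


def best_worker_count(total_compute, per_worker_comm, max_workers):
    """Return the worker count in 1..max_workers with the lowest estimated time."""
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    return min(
        range(1, max_workers + 1),
        key=lambda n: estimated_iteration_time(n, total_compute, per_worker_comm),
    )
```

A dynamic tuner could re-evaluate `best_worker_count` as measured compute and communication costs change during execution. Under this model, adding workers beyond the optimum increases the estimated time because communication cost grows while the compute share shrinks.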
