10 research outputs found

    An Embedded System for applying High Performance Computing in Educational Learning Activity

    HPC (High Performance Computing) has become increasingly popular in recent years. With its high computational power, HPC has an impact on industry, scientific research, and educational activities. Implementing HPC in a university curriculum can consume considerable resources, because well-known HPC systems are built from personal computers or servers, and using PCs as practical modules demands substantial resources and space. This paper presents an innovative high-performance computing cluster system that supports learning activities in an HPC course while being small, low cost, and yet sufficiently powerful. High-performance computing is usually realised as cluster computing and requires high-specification, expensive machines, which makes it inefficient to apply HPC in educational activities such as classroom learning. Our proposed system is therefore built from inexpensive embedded components to make HPC practical for teaching in class. Students are involved in constructing the embedded systems: they build clusters from basic embedded and network components, benchmark performance, and implement simple parallel cases on the cluster. We evaluated the embedded systems against an i5 PC; on the NAS benchmarks, the performance of our embedded system is comparable to that of i5 PCs. We also surveyed student learning satisfaction and found that, with the embedded system, students are able to learn about HPC from building the system through to writing an application that uses it.
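
    The paper does not spell out the parallel exercises used in class; a minimal sketch of the kind of simple parallel case students might run on such a cluster is the MPI reduction below. The program, its problem size, and the build and run commands are illustrative assumptions, not taken from the paper.

    /* Illustrative MPI exercise (not from the paper): each process sums a
     * strided share of 1..N and the partial sums are combined with
     * MPI_Reduce.  Build: mpicc sum.c -o sum   Run: mpirun -np 4 ./sum */
    #include <mpi.h>
    #include <stdio.h>

    #define N 1000000  /* hypothetical problem size */

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each process sums its own strided share of 1..N. */
        long long local = 0, total = 0;
        for (long long i = rank + 1; i <= N; i += size)
            local += i;

        /* Combine the partial sums on rank 0. */
        MPI_Reduce(&local, &total, 1, MPI_LONG_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum(1..%d) = %lld\n", N, total);

        MPI_Finalize();
        return 0;
    }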

    Massively Parallel "Schizophrenic" Quicksort


    A self-mobile skeleton in the presence of external loads

    Multicore clusters provide cost-effective platforms for running CPU-intensive and data-intensive parallel applications. To effectively utilise these platforms, their resources need to be shared amongst applications rather than dedicated to single ones. When such computational platforms are shared, user applications must compete at runtime for the same resources, so the demand is irregular and hence the load is changeable and unpredictable. This thesis explores a mechanism to exploit shared multicore clusters taking the external load into account. This mechanism seeks to reduce runtime by finding the best computing locations to serve the running computations. We propose a generic algorithmic data-parallel skeleton which is aware of its computations and of the load state of the computing environment. This skeleton is structured using the Master/Worker pattern, where the master and workers are distributed over the nodes of the cluster. The skeleton divides the problem into computations which are all initiated by the master and coordinated by the distributed workers. Moreover, the skeleton has built-in mobility to implicitly move parallel computations between two workers. This mobility is data mobility, controlled by the application, i.e. the skeleton itself. The skeleton is not problem-specific and is therefore able to execute different kinds of problems. Our experiments suggest that this skeleton is able to efficiently compensate for unpredictable load variations. We also propose a performance cost model that estimates the continuation time of the running computations locally and remotely. This model also takes the network delay, data size, and load state as inputs to estimate the transfer time of a potential movement. Our experiments demonstrate that this model makes accurate decisions based on estimates under different load patterns to reduce the total execution time. The model is problem-independent because it considers the progress of all current computations, and it is based on measurements, so it is not dependent on the programming language. Furthermore, the model takes into account the load state of the nodes on which the computations run; this state includes the characteristics of the nodes, and hence the model is architecture-independent. Because scheduling has a direct impact on system performance, we support the skeleton with a cost-informed scheduler that uses a hybrid scheduling policy to improve the dynamicity and adaptivity of the skeleton. This scheduler has agents distributed over the participating workers to keep the load information up to date, trigger the estimations, and facilitate the mobility operations. At runtime, the skeleton co-schedules its computations over computational resources without interfering with the native operating system scheduler. We demonstrate that, using a hybrid approach, the system makes mobility decisions which lead to improved performance and scalability over a large number of computational resources. Our experiments suggest that the adaptivity of our skeleton in a shared environment improves performance and reduces resource contention on heavily loaded nodes, which in turn allows other applications to acquire more resources. Finally, our experiments show that the load scheduler incurs a low overhead, not exceeding 0.6% of the total execution time.
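
    The abstract does not give the cost model's formulas; the sketch below is only an illustration, under assumed names and a simple linear transfer estimate, of the comparison such a model makes: move a computation only when the estimated remote continuation time plus the transfer time beats the estimated local continuation time.

    /* Hypothetical sketch of a mobility decision in the spirit of the cost
     * model described above; names, fields and formulas are illustrative. */
    #include <stdio.h>

    typedef struct {
        double work_left;    /* remaining computation, in abstract work units */
        double local_speed;  /* work units per second on the current node     */
        double remote_speed; /* work units per second on the candidate node   */
        double data_bytes;   /* state that would have to be shipped           */
        double bandwidth;    /* bytes per second between the two nodes        */
        double delay;        /* one-off network delay in seconds              */
    } estimate_t;

    /* Estimated time to finish if the computation stays where it is. */
    static double continue_locally(const estimate_t *e) {
        return e->work_left / e->local_speed;
    }

    /* Estimated time to move the data and then finish on the remote worker. */
    static double continue_remotely(const estimate_t *e) {
        double transfer = e->delay + e->data_bytes / e->bandwidth;
        return transfer + e->work_left / e->remote_speed;
    }

    /* Move only when the remote estimate, including the transfer, is cheaper. */
    static int should_move(const estimate_t *e) {
        return continue_remotely(e) < continue_locally(e);
    }

    int main(void) {
        /* Example figures (made up): a loaded local node, an idle remote one. */
        estimate_t e = { 1e9, 2e7, 8e7, 5e6, 1e8, 0.01 };
        printf("local %.1fs, remote %.1fs -> %s\n",
               continue_locally(&e), continue_remotely(&e),
               should_move(&e) ? "move" : "stay");
        return 0;
    }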

    Iterative Schedule Optimization for Parallelization in the Polyhedron Model

    In high-performance computing, one primary objective is to exploit the performance that the given target hardware can deliver to the fullest. Compilers that have the ability to automatically optimize programs for a specific target hardware can be highly useful in this context. Iterative (or search-based) compilation requires little or no prior knowledge and can adapt more easily to concrete programs and target hardware than static cost models and heuristics. Thereby, iterative compilation helps in situations in which static heuristics do not reflect the combination of input program and target hardware well. Moreover, iterative compilation may enable the derivation of more accurate cost models and heuristics for optimizing compilers. In this context, the polyhedron model is of help as it provides not only a mathematical representation of programs but, more importantly, a uniform representation of complex sequences of program transformations by schedule functions. The latter facilitates the systematic exploration of the set of legal transformations of a given program. Early approaches to purely iterative schedule optimization in the polyhedron model do not limit their search to schedules that preserve program semantics and thereby suffer from the need to explore large numbers of illegal schedules. More recent research ensures the legality of program transformations but presumes a sequential rather than a parallel execution of the transformed program. Other approaches do not perform a purely iterative optimization. We propose an approach to iterative schedule optimization for parallelization and tiling in the polyhedron model. Our approach targets loop programs that profit from data locality optimization and coarse-grained loop parallelization. The schedule search space can be explored either randomly or by means of a genetic algorithm. To determine a schedule's profitability, we rely primarily on measuring the transformed code's execution time. While benchmarking is accurate, it increases the time and resource consumption of program optimization tremendously and can even make it impractical. We address this limitation by proposing to learn surrogate models from schedules generated and evaluated in previous runs of the iterative optimization and to replace benchmarking by performance prediction to the extent possible. Our evaluation on the PolyBench 4.1 benchmark set reveals that, in a given setting, iterative schedule optimization yields significantly higher speedups in the execution of the program to be optimized. Surrogate performance models learned from training data that was generated during previous iterative optimizations can reduce the benchmarking effort without strongly impairing the optimization result. A prerequisite for this approach is a sufficient similarity between the training programs and the program to be optimized.
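
    The thesis searches over full polyhedral schedule functions; as a much-reduced illustration of the iterative, measurement-driven idea, the sketch below times a tiled matrix multiplication for a handful of candidate tile sizes and keeps the fastest. The problem size, candidate set, and code are assumptions made for this example only.

    /* Illustrative-only sketch of iterative optimization by measurement:
     * try a few tile sizes for a tiled matrix multiplication, benchmark
     * each candidate, and keep the fastest.  Build: cc -O2 tile_search.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define N 512

    static double A[N][N], B[N][N], C[N][N];

    /* One candidate "schedule": a rectangular tiling with tile size T. */
    static void matmul_tiled(int T) {
        memset(C, 0, sizeof C);
        for (int ii = 0; ii < N; ii += T)
            for (int kk = 0; kk < N; kk += T)
                for (int jj = 0; jj < N; jj += T)
                    for (int i = ii; i < ii + T && i < N; i++)
                        for (int k = kk; k < kk + T && k < N; k++)
                            for (int j = jj; j < jj + T && j < N; j++)
                                C[i][j] += A[i][k] * B[k][j];
    }

    static double seconds(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                A[i][j] = (double)rand() / RAND_MAX;
                B[i][j] = (double)rand() / RAND_MAX;
            }

        int candidates[] = { 8, 16, 32, 64, 128 };  /* hypothetical search space */
        int best_tile = 0;
        double best_time = 1e30;
        for (size_t c = 0; c < sizeof candidates / sizeof candidates[0]; c++) {
            double t0 = seconds();
            matmul_tiled(candidates[c]);            /* benchmark this candidate */
            double t = seconds() - t0;
            printf("tile %3d: %.3fs\n", candidates[c], t);
            if (t < best_time) { best_time = t; best_tile = candidates[c]; }
        }
        printf("best tile size: %d\n", best_tile);
        return 0;
    }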

    Accelerating mobile security processing with parallel computing

    Accelerating mobile security processing has become one of the most important problems, given the exponential growth and significant impact of attacks targeting these platforms. Sensitive information on mobile phones must be protected by deploying malware detection systems and by encrypting data in order to maintain a higher level of security. To detect malicious applications, an antivirus analyses a large data stream and compares it against a database of malware signatures. Unfortunately, as the number of threats keeps growing, the number of malware signatures grows proportionally. This makes the detection process more demanding for mobile phones, which are limited in memory, battery, and processing capacity. While the security posture of these systems deteriorates, the parallel computing capability of mobile phones keeps improving with the evolution of mobile graphics processing units (GPUs). In this thesis, we focus on how to exploit the growing parallel processing capabilities of mobile devices to accelerate malware detection and cryptographic processing on Android phones. To this end, we designed and implemented a parallel architecture for mobile devices that exploits the computing capabilities of mobile GPUs and distributed processing on clusters. A series of computation and memory-optimization techniques is proposed to increase detection efficiency and execution throughput. The results of this research lead us to conclude that mobile GPUs can be used effectively to accelerate malware detection and cryptographic processing on mobile phones. The results also show that the proposed on-device architecture can be extended to a cluster architecture to obtain a higher processing speed-up when the mobile phone's resources are busy.
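
    The thesis maps this kind of matching onto mobile GPUs and clusters; the sketch below is only a CPU-thread analogue (using OpenMP) of the underlying data-parallel pattern, scanning a buffer against a small signature list. The signatures, buffer contents, and build line are invented for the illustration.

    /* Illustrative CPU-parallel analogue of signature scanning: every buffer
     * offset is checked against a tiny signature list, with offsets split
     * across threads.  Build: gcc -O2 -fopenmp scan.c -o scan */
    #include <stdio.h>
    #include <string.h>

    static const char *signatures[] = { "EVIL_PAYLOAD", "FAKE_SIG_42" };  /* invented */
    #define NSIG (sizeof signatures / sizeof signatures[0])

    /* Check whether any signature starts at offset `pos` of the buffer. */
    static int match_at(const char *buf, size_t len, size_t pos) {
        for (size_t s = 0; s < NSIG; s++) {
            size_t slen = strlen(signatures[s]);
            if (pos + slen <= len && memcmp(buf + pos, signatures[s], slen) == 0)
                return 1;
        }
        return 0;
    }

    int main(void) {
        static char buf[1 << 20];                  /* 1 MiB pretend data stream */
        memset(buf, 'A', sizeof buf);
        memcpy(buf + 123456, "EVIL_PAYLOAD", 12);  /* plant one hit */

        long hits = 0;
        /* Each thread scans an independent slice of offsets. */
        #pragma omp parallel for reduction(+:hits) schedule(static)
        for (long pos = 0; pos < (long)sizeof buf; pos++)
            hits += match_at(buf, sizeof buf, (size_t)pos);

        printf("matches found: %ld\n", hits);
        return 0;
    }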

    Towards the formal verification of human-agent-robot teamwork

    The formal analysis of computational processes is by now a well-established field. However, in practical scenarios, the problem of how we can formally verify interactions with humans still remains. This thesis is concerned with addressing this problem through the use of the Brahms language. Our overall goal is to provide formal verification techniques for human-agent teamwork, particularly astronaut-robot teamwork on future space missions and human-robot interactions in health-care scenarios modelled in Brahms.