10 research outputs found

    An Embedded System for applying High Performance Computing in Educational Learning Activity

    HPC (High Performance Computing) has become increasingly popular in recent years. With its high computational power, HPC has an impact on industry, scientific research, and educational activities. Implementing HPC in a university curriculum can consume considerable resources, because well-known HPC systems are built from personal computers or servers, and using PCs as practical modules demands substantial resources and space. This paper presents an innovative high-performance computing cluster system that supports learning activities in an HPC course while being small, low cost, and yet sufficiently powerful. High-performance computing is usually realised as cluster computing and requires high-specification, expensive machines, which makes it inefficient to apply HPC in educational activities such as classroom learning. Our proposed system is therefore built from inexpensive embedded components to make HPC practical for teaching in class. Students are involved in constructing the embedded systems: they build clusters from basic embedded and network components, benchmark performance, and implement simple parallel cases on the cluster. We evaluated the embedded systems against an i5 PC; on the NAS benchmarks, the performance of our embedded system is comparable to that of i5 PCs. We also surveyed student learning satisfaction and found that, with the embedded system, students are able to learn about HPC from building the system through to writing an application that uses it.
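
    The paper does not spell out the parallel exercises used in class; a minimal sketch of the kind of simple parallel case students might run on such a cluster is the MPI reduction below. The program, its problem size, and the build and run commands are illustrative assumptions, not taken from the paper.

    /* Illustrative MPI exercise (not from the paper): each process sums a
     * strided share of 1..N and the partial sums are combined with
     * MPI_Reduce.  Build: mpicc sum.c -o sum   Run: mpirun -np 4 ./sum */
    #include <mpi.h>
    #include <stdio.h>

    #define N 1000000  /* hypothetical problem size */

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each process sums its own strided share of 1..N. */
        long long local = 0, total = 0;
        for (long long i = rank + 1; i <= N; i += size)
            local += i;

        /* Combine the partial sums on rank 0. */
        MPI_Reduce(&local, &total, 1, MPI_LONG_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum(1..%d) = %lld\n", N, total);

        MPI_Finalize();
        return 0;
    }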

    Massively Parallel "Schizophrenic" Quicksort


    A self-mobile skeleton in the presence of external loads

    Multicore clusters provide cost-effective platforms for running CPU-intensive and data-intensive parallel applications. To effectively utilise these platforms, their resources need to be shared amongst applications rather than dedicated to single ones. When such computational platforms are shared, user applications must compete at runtime for the same resources, so the demand is irregular and hence the load is changeable and unpredictable. This thesis explores a mechanism to exploit shared multicore clusters taking the external load into account. This mechanism seeks to reduce runtime by finding the best computing locations to serve the running computations. We propose a generic algorithmic data-parallel skeleton which is aware of its computations and of the load state of the computing environment. This skeleton is structured using the Master/Worker pattern, where the master and workers are distributed over the nodes of the cluster. The skeleton divides the problem into computations which are all initiated by the master and coordinated by the distributed workers. Moreover, the skeleton has built-in mobility to implicitly move parallel computations between two workers. This mobility is data mobility, controlled by the application, i.e. the skeleton itself. The skeleton is not problem-specific and is therefore able to execute different kinds of problems. Our experiments suggest that this skeleton is able to efficiently compensate for unpredictable load variations. We also propose a performance cost model that estimates the continuation time of the running computations locally and remotely. This model also takes the network delay, data size, and load state as inputs to estimate the transfer time of a potential movement. Our experiments demonstrate that this model makes accurate decisions based on estimates under different load patterns to reduce the total execution time. The model is problem-independent because it considers the progress of all current computations, and it is based on measurements, so it is not dependent on the programming language. Furthermore, the model takes into account the load state of the nodes on which the computations run; this state includes the characteristics of the nodes, and hence the model is architecture-independent. Because scheduling has a direct impact on system performance, we support the skeleton with a cost-informed scheduler that uses a hybrid scheduling policy to improve the dynamicity and adaptivity of the skeleton. This scheduler has agents distributed over the participating workers to keep the load information up to date, trigger the estimations, and facilitate the mobility operations. At runtime, the skeleton co-schedules its computations over computational resources without interfering with the native operating system scheduler. We demonstrate that, using a hybrid approach, the system makes mobility decisions which lead to improved performance and scalability over a large number of computational resources. Our experiments suggest that the adaptivity of our skeleton in a shared environment improves performance and reduces resource contention on heavily loaded nodes, which in turn allows other applications to acquire more resources. Finally, our experiments show that the load scheduler incurs a low overhead, not exceeding 0.6% of the total execution time.
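
    The abstract does not give the cost model's formulas; the sketch below is only an illustration, under assumed names and a simple linear transfer estimate, of the comparison such a model makes: move a computation only when the estimated remote continuation time plus the transfer time beats the estimated local continuation time.

    /* Hypothetical sketch of a mobility decision in the spirit of the cost
     * model described above; names, fields and formulas are illustrative. */
    #include <stdio.h>

    typedef struct {
        double work_left;    /* remaining computation, in abstract work units */
        double local_speed;  /* work units per second on the current node     */
        double remote_speed; /* work units per second on the candidate node   */
        double data_bytes;   /* state that would have to be shipped           */
        double bandwidth;    /* bytes per second between the two nodes        */
        double delay;        /* one-off network delay in seconds              */
    } estimate_t;

    /* Estimated time to finish if the computation stays where it is. */
    static double continue_locally(const estimate_t *e) {
        return e->work_left / e->local_speed;
    }

    /* Estimated time to move the data and then finish on the remote worker. */
    static double continue_remotely(const estimate_t *e) {
        double transfer = e->delay + e->data_bytes / e->bandwidth;
        return transfer + e->work_left / e->remote_speed;
    }

    /* Move only when the remote estimate, including the transfer, is cheaper. */
    static int should_move(const estimate_t *e) {
        return continue_remotely(e) < continue_locally(e);
    }

    int main(void) {
        /* Example figures (made up): a loaded local node, an idle remote one. */
        estimate_t e = { 1e9, 2e7, 8e7, 5e6, 1e8, 0.01 };
        printf("local %.1fs, remote %.1fs -> %s\n",
               continue_locally(&e), continue_remotely(&e),
               should_move(&e) ? "move" : "stay");
        return 0;
    }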

    Iterative Schedule Optimization for Parallelization in the Polyhedron Model

    In high-performance computing, one primary objective is to exploit the performance that the given target hardware can deliver to the fullest. Compilers that have the ability to automatically optimize programs for a specific target hardware can be highly useful in this context. Iterative (or search-based) compilation requires little or no prior knowledge and can adapt more easily to concrete programs and target hardware than static cost models and heuristics. Thereby, iterative compilation helps in situations in which static heuristics do not reflect the combination of input program and target hardware well. Moreover, iterative compilation may enable the derivation of more accurate cost models and heuristics for optimizing compilers. In this context, the polyhedron model is of help as it provides not only a mathematical representation of programs but, more importantly, a uniform representation of complex sequences of program transformations by schedule functions. The latter facilitates the systematic exploration of the set of legal transformations of a given program. Early approaches to purely iterative schedule optimization in the polyhedron model do not limit their search to schedules that preserve program semantics and thereby suffer from the need to explore large numbers of illegal schedules. More recent research ensures the legality of program transformations but presumes a sequential rather than a parallel execution of the transformed program. Other approaches do not perform a purely iterative optimization. We propose an approach to iterative schedule optimization for parallelization and tiling in the polyhedron model. Our approach targets loop programs that profit from data locality optimization and coarse-grained loop parallelization. The schedule search space can be explored either randomly or by means of a genetic algorithm. To determine a schedule's profitability, we rely primarily on measuring the transformed code's execution time. While benchmarking is accurate, it increases the time and resource consumption of program optimization tremendously and can even make it impractical. We address this limitation by proposing to learn surrogate models from schedules generated and evaluated in previous runs of the iterative optimization and to replace benchmarking by performance prediction to the extent possible. Our evaluation on the PolyBench 4.1 benchmark set reveals that, in a given setting, iterative schedule optimization yields significantly higher speedups in the execution of the program to be optimized. Surrogate performance models learned from training data that was generated during previous iterative optimizations can reduce the benchmarking effort without strongly impairing the optimization result. A prerequisite for this approach is a sufficient similarity between the training programs and the program to be optimized.
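
    The thesis searches over full polyhedral schedule functions; as a much-reduced illustration of the iterative, measurement-driven idea, the sketch below times a tiled matrix multiplication for a handful of candidate tile sizes and keeps the fastest. The problem size, candidate set, and code are assumptions made for this example only.

    /* Illustrative-only sketch of iterative optimization by measurement:
     * try a few tile sizes for a tiled matrix multiplication, benchmark
     * each candidate, and keep the fastest.  Build: cc -O2 tile_search.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define N 512

    static double A[N][N], B[N][N], C[N][N];

    /* One candidate "schedule": a rectangular tiling with tile size T. */
    static void matmul_tiled(int T) {
        memset(C, 0, sizeof C);
        for (int ii = 0; ii < N; ii += T)
            for (int kk = 0; kk < N; kk += T)
                for (int jj = 0; jj < N; jj += T)
                    for (int i = ii; i < ii + T && i < N; i++)
                        for (int k = kk; k < kk + T && k < N; k++)
                            for (int j = jj; j < jj + T && j < N; j++)
                                C[i][j] += A[i][k] * B[k][j];
    }

    static double seconds(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                A[i][j] = (double)rand() / RAND_MAX;
                B[i][j] = (double)rand() / RAND_MAX;
            }

        int candidates[] = { 8, 16, 32, 64, 128 };  /* hypothetical search space */
        int best_tile = 0;
        double best_time = 1e30;
        for (size_t c = 0; c < sizeof candidates / sizeof candidates[0]; c++) {
            double t0 = seconds();
            matmul_tiled(candidates[c]);            /* benchmark this candidate */
            double t = seconds() - t0;
            printf("tile %3d: %.3fs\n", candidates[c], t);
            if (t < best_time) { best_time = t; best_tile = candidates[c]; }
        }
        printf("best tile size: %d\n", best_tile);
        return 0;
    }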

    Accelerating mobile security processing with parallel computing

    Accelerating mobile security processing has become one of the most important problems, given the exponential growth and significant impact of attacks targeting these platforms. Sensitive information on mobile phones must be protected by deploying malware detection systems and by encrypting data in order to maintain a higher level of security. To detect malicious applications, an antivirus analyses a large data stream and compares it against a database of malware signatures. Unfortunately, as the number of threats keeps growing, the number of malware signatures grows proportionally. This makes the detection process more demanding for mobile phones, which are limited in memory, battery, and processing capacity. While the security posture of these systems deteriorates, the parallel computing capability of mobile phones keeps improving with the evolution of mobile graphics processing units (GPUs). In this thesis, we focus on how to exploit the growing parallel processing capabilities of mobile devices to accelerate malware detection and cryptographic processing on Android phones. To this end, we designed and implemented a parallel architecture for mobile devices that exploits the computing capabilities of mobile GPUs and distributed processing on clusters. A series of computation and memory-optimization techniques is proposed to increase detection efficiency and execution throughput. The results of this research lead us to conclude that mobile GPUs can be used effectively to accelerate malware detection and cryptographic processing on mobile phones. The results also show that the proposed on-device architecture can be extended to a cluster architecture to obtain a higher processing speed-up when the mobile phone's resources are busy.
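
    The thesis maps this kind of matching onto mobile GPUs and clusters; the sketch below is only a CPU-thread analogue (using OpenMP) of the underlying data-parallel pattern, scanning a buffer against a small signature list. The signatures, buffer contents, and build line are invented for the illustration.

    /* Illustrative CPU-parallel analogue of signature scanning: every buffer
     * offset is checked against a tiny signature list, with offsets split
     * across threads.  Build: gcc -O2 -fopenmp scan.c -o scan */
    #include <stdio.h>
    #include <string.h>

    static const char *signatures[] = { "EVIL_PAYLOAD", "FAKE_SIG_42" };  /* invented */
    #define NSIG (sizeof signatures / sizeof signatures[0])

    /* Check whether any signature starts at offset `pos` of the buffer. */
    static int match_at(const char *buf, size_t len, size_t pos) {
        for (size_t s = 0; s < NSIG; s++) {
            size_t slen = strlen(signatures[s]);
            if (pos + slen <= len && memcmp(buf + pos, signatures[s], slen) == 0)
                return 1;
        }
        return 0;
    }

    int main(void) {
        static char buf[1 << 20];                  /* 1 MiB pretend data stream */
        memset(buf, 'A', sizeof buf);
        memcpy(buf + 123456, "EVIL_PAYLOAD", 12);  /* plant one hit */

        long hits = 0;
        /* Each thread scans an independent slice of offsets. */
        #pragma omp parallel for reduction(+:hits) schedule(static)
        for (long pos = 0; pos < (long)sizeof buf; pos++)
            hits += match_at(buf, sizeof buf, (size_t)pos);

        printf("matches found: %ld\n", hits);
        return 0;
    }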

    Towards the formal verification of human-agent-robot teamwork

    The formal analysis of computational processes is by now a well-established field. However, in practical scenarios, the problem of how we can formally verify interactions with humans still remains. This thesis is concerned with addressing this problem through the use of the Brahms language. Our overall goal is to provide formal verification techniques for human-agent teamwork, particularly astronaut-robot teamwork on future space missions and human-robot interactions in health-care scenarios modelled in Brahms.