
Dynamic Load Balancing Based on Applications Global States Monitoring

Author: Kopanski Damian
Laskowski Eryk
Olejnik Richard
Tudruj Marek
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 27/06/2013

8 pages, to appear. International audience. The paper presents how to use a novel distributed program design framework with evolved global control mechanisms to ensure processor load balancing during the execution of application programs. The new framework supports the programmer with an API and GUI for the automated graphical design of program execution control based on monitoring global application states. The framework provides high-level distributed control primitives at the process level and a special control infrastructure for global asynchronous execution control at the thread level. Both kinds of control rely on observing the current multicore processor performance and communication throughput in the executive distributed system. Methods for designing processor load balancing control, based on a system of program and system property metrics and on computational data migration between application executive processes, are presented and assessed experimentally by executing graph representations of distributed programs.

HAL - Lille 3

INRIA a CCSD electronic archive server
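The abstract describes balancing processor load by migrating computational data between application processes according to load metrics, but it does not give the framework's API. The following is therefore only a minimal Python sketch of one generic greedy policy, assuming each process reports a single scalar load value; the function name `plan_migrations` and the `tolerance` parameter are hypothetical.

```python
from typing import Dict, List, Tuple


def plan_migrations(
    loads: Dict[str, float], tolerance: float = 0.1
) -> List[Tuple[str, str, float]]:
    """Greedily plan load transfers as (source, destination, amount).

    Hypothetical policy: repeatedly move work from the most loaded process
    to the least loaded one until both extremes lie within `tolerance`
    (a fraction of the mean load) of the mean. In exact arithmetic each
    transfer brings one process to the mean, so len(loads) rounds suffice.
    """
    if not loads:
        return []
    loads = dict(loads)  # work on a copy; the caller's metrics stay unchanged
    mean = sum(loads.values()) / len(loads)
    moves: List[Tuple[str, str, float]] = []
    for _ in range(len(loads)):
        src = max(loads, key=loads.__getitem__)
        dst = min(loads, key=loads.__getitem__)
        excess = loads[src] - mean
        deficit = mean - loads[dst]
        # Stop once both the most and least loaded processes are close to the mean.
        if excess <= tolerance * mean and deficit <= tolerance * mean:
            break
        amount = min(excess, deficit)
        loads[src] -= amount
        loads[dst] += amount
        moves.append((src, dst, amount))
    return moves


# Usage: p0 is overloaded and p1 underloaded; p2 is already at the mean.
print(plan_migrations({"p0": 10.0, "p1": 2.0, "p2": 6.0}))  # -> [('p0', 'p1', 4.0)]
```

Moving `min(excess, deficit)` never overshoots the mean, which keeps the plan short; a real system would also weigh the communication cost of each migration, which this sketch ignores.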

Macroservers: An Execution Model for DRAM Processor-In-Memory Arrays

Author: Sterling Thomas L.
Zima Hans P.
Publication venue: California Institute of Technology Library
Publication date: 01/01/2000

The emergence of semiconductor fabrication technology allowing a tight coupling between high-density DRAM and CMOS logic on the same chip has led to the important new class of Processor-In-Memory (PIM) architectures. Newer developments provide powerful parallel processing capabilities on the chip, exploiting the facility to load wide words in single memory accesses and supporting complex address manipulations in the memory. Furthermore, large arrays of PIMs can be arranged into a massively parallel architecture. In this report, we describe an object-based programming model based on the notion of a macroserver. Macroservers encapsulate a set of variables and methods; threads, spawned by the activation of methods, operate asynchronously on the variables' state space. Data distributions provide a mechanism for mapping large data structures across the memory region of a macroserver, while work distributions allow explicit control of bindings between threads and data. Both data and work distributions are first-class objects of the model, supporting the dynamic management of data and threads in memory. This offers the flexibility required for fully exploiting the processing power and memory bandwidth of a PIM array, in particular for irregular and adaptive applications. Thread synchronization is based on atomic methods, condition variables, and futures. A special type of lightweight macroserver allows the formulation of flexible scheduling strategies for access to resources, using a monitor-like mechanism.

CiteSeerX

Caltech Authors
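Data distributions map a large data structure across the memory region of a macroserver, and work distributions bind threads to data. The report's own notation is not reproduced here; as an illustrative assumption only, the Python sketch below uses a plain block distribution together with an "owner computes" work distribution, in which the thread for element `i` runs in the region that stores element `i`. All names are hypothetical.

```python
def block_size(n: int, p: int) -> int:
    """Elements per block when n elements are split over p memory regions (ceiling)."""
    if n < 0 or p <= 0:
        raise ValueError("need n >= 0 and p > 0")
    return -(-n // p)  # ceil(n / p) using integer arithmetic


def owner(index: int, n: int, p: int) -> int:
    """Data distribution: the region that stores element `index`."""
    if not 0 <= index < n:
        raise IndexError(index)
    return index // block_size(n, p)


def local_indices(region: int, n: int, p: int) -> range:
    """Work distribution ("owner computes"): indices handled by a thread in `region`."""
    b = block_size(n, p)
    start = region * b
    return range(min(start, n), min(start + b, n))


# 10 elements over 4 regions -> blocks of 3: [0..2], [3..5], [6..8], [9]
print([owner(i, 10, 4) for i in range(10)])  # -> [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]
print([list(local_indices(r, 10, 4)) for r in range(4)])  # -> [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

Because `owner` and `local_indices` are derived from the same block size, every element is assigned to exactly one thread; regions beyond the end of the data simply receive an empty range.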

Distributed multi-threading in GNU Prolog

Author: Morgadinho Nuno Eduardo Quaresma
Publication venue: Universidade de Évora
Publication date: 01/01/2007

Although parallel computing has been widely researched, the process of bringing concurrency and parallel programming to the mainstream has only just begun. Combining a distributed multi-threading environment such as PM2 with Prolog opens the way to exploiting concurrency and parallel computing through logic programming. To this end, we developed PM2-Prolog, a Prolog interface to the PM2 system. It allows multithreaded Prolog applications to run on multiple GNU Prolog engines in a distributed environment, thus taking advantage of the resources available on a computer network. This is especially useful for computationally intensive problems, where execution time is crucial. The system API offers thread management primitives as well as explicit communication between threads. Preliminary test results show an almost linear speedup compared with a sequential version.

Repositório Científico da Universidade de Évora

CEEME: compensating events based execution monitoring enforcement for Cyber-Physical Systems

Author: Gamage Thoshitha T.
Publication venue: Scholars' Mine
Publication date: 01/01/2011

Fundamentally, inherently observable events in Cyber-Physical Systems with tight coupling between cyber and physical components can result in a confidentiality violation. By observing how the physical elements react to cyber commands, adversaries can identify critical links in the system and force the cyber control algorithm to make erroneous decisions. Thus, a breach of confidentiality can lead to further attacks on availability or integrity. Because Cyber-Physical Systems are so highly integrated, it is also extremely difficult to map their semantics into a security framework under existing security models. The far-reaching objective of this research is to develop a science of self-obfuscating systems based on the composition of simple building blocks, which a model of Nondeducibility composes under Information Flow Security Properties. To this end, this work presents fundamental theories on external observability for basic regular networks and the novel concept of event compensation, which can enforce Information Flow Security Properties at runtime. (Abstract, page iii)

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

Acta Cybernetica: Volume 9, Number 3

Publication date: 01/01/1990

University of Szeged

Some techniques for automated, resource-aware distributed and mobile computing in a multi-paradigm programming system

Author: Albert Albiol Elvira
Hermenegildo Manuel V.
López García Pedro
Puebla Sánchez Alvaro Germán
Publication venue: Facultad de Informática (UPM)
Publication date: 01/08/2004

Distributed parallel execution systems speed up applications by splitting tasks into processes whose execution is assigned to different receiving nodes in a high-bandwidth network. On the distributing side, a fundamental problem is grouping and scheduling such tasks so that each one involves sufficient computational cost compared with the task creation and communication costs and other practical overheads. On the receiving side, an important issue is to have some assurance of the correctness and characteristics of the code received, and of the kind of load the particular task will pose, which can be specified by means of certificates. In this paper we present, in tutorial form, a number of general solutions to these problems and illustrate them through their implementation in the Ciao multi-paradigm language and program development environment. This system includes facilities for parallel and distributed execution, an assertion language for specifying complex program properties (including safety and resource-related properties), and compile-time and run-time tools for performing automated parallelization and resource control, as well as certification of programs with resource consumption assurances and efficient checking of such certificates.

Archivo Digital UPM
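The distributing-side problem described above, grouping tasks so that each carries enough computation to outweigh task-creation and communication overheads, is commonly handled by granularity control. Ciao's own mechanism (cost analysis driven by its assertion language) is not shown here. The Python sketch below only assumes that a per-task cost estimate and a fixed per-task overhead are already known; it groups consecutive tasks until each chunk's estimated cost reaches a hypothetical multiple (`factor`) of the overhead, then submits the chunks to a local thread pool standing in for remote nodes. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def group_tasks(costs: Sequence[float], overhead: float, factor: float = 10.0) -> List[List[int]]:
    """Group consecutive task indices so each chunk costs >= factor * overhead.

    A trailing chunk that is too small is merged into the previous chunk.
    """
    chunks: List[List[int]] = []
    current: List[int] = []
    acc = 0.0
    for i, cost in enumerate(costs):
        current.append(i)
        acc += cost
        if acc >= factor * overhead:
            chunks.append(current)
            current, acc = [], 0.0
    if current:
        if chunks:
            chunks[-1].extend(current)
        else:
            chunks.append(current)
    return chunks


def run_grouped(tasks: Sequence[Callable[[], T]], costs: Sequence[float],
                overhead: float, factor: float = 10.0) -> List[T]:
    """Run tasks chunk by chunk in a pool; results keep the original task order."""
    chunks = group_tasks(costs, overhead, factor)
    with ThreadPoolExecutor() as pool:
        # Each chunk becomes one submitted job, so overhead is paid once per chunk.
        futures = [pool.submit(lambda idx=chunk: [tasks[i]() for i in idx]) for chunk in chunks]
        return [result for f in futures for result in f.result()]


# 25 unit-cost tasks with overhead 1 -> chunks of indices 0-9 and 10-24
print([len(c) for c in group_tasks([1.0] * 25, overhead=1.0)])  # -> [10, 15]
```

Submitting only top-level chunks (rather than spawning nested tasks from inside workers) avoids the pool-exhaustion deadlocks that recursive submission can cause.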


A clinical patient vital signs parameter measurement, processing and predictive algorithm using ECG

Author: Holzhausen Rudolf
Publication venue: Brunel University Brunel Business School PhD Theses
Publication date: 01/01/2011

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. In the modern clinical and healthcare setting, the electronic collection and analysis of patient vital signs and parameters are a fundamental part of the treatment plan and of a positive patient response. Modern analytical techniques, combined with readily available computer software, now allow near-real-time analysis of digitally acquired measurements. In the clinical context, this can directly affect patient survival rates and treatment success. The processing of clinical parameters, especially the electrocardiogram (ECG) in the critical care setting, has changed little in recent years, and the analytical processes have mostly been managed by highly trained and experienced cardiac specialists. Warning, detection and measurement techniques focus on the post-processing of events and rely heavily on averaging and analogue filtering to capture waveform morphologies and deviations accurately. This Ph.D. research investigates an alternative: analysing bio-signals in the digital domain, with a focus on the ECG, to determine whether bit-by-bit or near-real-time analysis is feasible and, more importantly, whether the captured data is significant for the analysis and presentation of wave patterns in a patient monitoring environment. The research and experiments have shown the potential for developing logical models that address both the detection and the short-term prediction of possible follow-on events, with a focus on myocardial ischemia (MI) and infarction-based deviations. The research has shown that real-time waveform processing, compared with traditional graph-based analysis, is accurate and could benefit the clinician by detecting deviations and morphologies in real time.
This is a significant step forward: it could, in real terms, embed years of clinical experience into the measurement processes of clinical devices and provide expert analytical and identification input electronically at the patient bedside. The global population is straining healthcare systems and care capacity, while a shortage of clinical and healthcare providers steadily reduces the coverage of treatment that can be provided. The research is a moderate step towards addressing this by giving the caregiver accurate and relevant information that supports the clinical decision process and ultimately improves the standard of patient care.

Brunel University Research Archive

Toward Reliable and Efficient Message Passing Software for HPC Systems: Fault Tolerance and Vector Extension

Author: Zhong Dong
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/08/2021

As the scale of High-performance Computing (HPC) systems continues to grow, researchers have devoted themselves to achieving the best performance for long-running computing jobs on these systems. My research focuses on the reliability and efficiency of HPC software. First, as systems become larger, the mean time to failure (MTTF) of HPC systems tends to decrease, so handling system failures becomes a prime challenge. My research presents a general design and implementation of an efficient runtime-level failure detection and propagation strategy, targeting large-scale, dynamic systems, that can detect both node and process failures. It uses multiple overlapping topologies to optimize detection and propagation, minimizing the incurred overheads and guaranteeing the scalability of the entire framework. Results on different machines and benchmarks, compared with related work, show that my design and implementation significantly outperform non-HPC solutions and are competitive with specialized HPC solutions that can manage only MPI applications. Second, I explore instruction-level parallelism to achieve optimal performance. Newer processors support long vector extensions, which enable researchers to exploit the potential peak performance of target architectures. Intel introduced the Advanced Vector Extensions (AVX2 and AVX-512) for the x86 Instruction Set Architecture (ISA), and Arm introduced the Scalable Vector Extension (SVE) with a new set of A64 instructions; both enable greater parallelism. My research uses long vector reduction instructions to improve the performance of MPI reduction operations, and uses gather and scatter features to speed up packing and unpacking operations in MPI. The evaluation of the resulting software stack under different scenarios demonstrates that the approach is not only efficient but also generalizable to many vector architectures.

University of Tennessee, Knoxville: Trace
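The thesis accelerates MPI reductions with long-vector reduction instructions and MPI packing and unpacking with gather/scatter. The Python sketch below shows only the semantics those vectorized kernels compute: an element-wise sum (as with `MPI_SUM`) and an index-based gather/scatter for non-contiguous data. It does not use AVX-512 or SVE, and the function names are hypothetical.

```python
from typing import List, MutableSequence, Sequence


def reduce_sum(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Element-wise sum of two equal-length buffers (the semantics of MPI_SUM)."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    return [x + y for x, y in zip(a, b)]


def pack(src: Sequence[float], indices: Sequence[int]) -> List[float]:
    """Gather: copy non-contiguous elements into a contiguous send buffer."""
    return [src[i] for i in indices]


def unpack(packed: Sequence[float], indices: Sequence[int], dst: MutableSequence[float]) -> None:
    """Scatter: write contiguous received elements back to their original positions."""
    if len(packed) != len(indices):
        raise ValueError("packed buffer and index list differ in length")
    for value, i in zip(packed, indices):
        dst[i] = value


# A strided "column" (every 3rd element) is packed, reduced, and scattered back.
data = [float(v) for v in range(10)]
idx = list(range(0, len(data), 3))  # [0, 3, 6, 9]
summed = reduce_sum(pack(data, idx), pack(data, idx))
out = [0.0] * len(data)
unpack(summed, idx, out)
print(out)  # -> [0.0, 0.0, 0.0, 6.0, 0.0, 0.0, 12.0, 0.0, 0.0, 18.0]
```

In a vectorized implementation each of these loops would process many lanes per instruction; the index list plays the role that an MPI derived datatype's layout plays when packing strided data.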

Data-Driven Self-Tuning in a Coordination Programming Language

Author: Kuznetcov Maksim
Publication date: 24/05/2016

Coordination programming is a paradigm for managing the composition, communication, and synchronisation of concurrent components. AstraKahn is a new dataflow coordination language based on Gilles Kahn’s model of process networks, with some significant refinements. AstraKahn provides a mechanism of implicit data parallelism that is expected to rely on self-tuning, i.e. adaptive optimisation of execution parameters in order to improve the performance of the program. This is achieved by providing the programmer with a number of special network primitives that allow an AstraKahn runtime system to extract optimisation parameters and adjust them while monitoring the performance of execution. In this thesis, we present the architecture of an AstraKahn prototype, including a compiler and a runtime system. At the runtime system level, the built-in compound network primitives are constructed from simple ones. This approach allows us to make the implementation clear and easily extensible. As a minor contribution, we present a number of potential self-tuning heuristics for a simple network pattern. Also, for illustrative purposes, a practical application of the morphism pattern is presented: the particle-in-cell problem, whose parallelisation requires load balancing, is formulated in this way.

University of Hertfordshire Research Archive
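Self-tuning in AstraKahn means adjusting execution parameters while monitoring execution performance. The thesis's specific heuristics are not reproduced here; the Python sketch below is a generic hill-climbing heuristic, assuming a hypothetical `measure` callback that stands in for the runtime's performance monitoring and returns a throughput for a given integer parameter (for example, the number of replicas of a network component).

```python
from typing import Callable


def tune(measure: Callable[[int], float], start: int = 1, lo: int = 1,
         hi: int = 64, max_steps: int = 100) -> int:
    """Hill-climb an integer execution parameter toward higher measured throughput."""
    if not lo <= start <= hi:
        raise ValueError("start must lie within [lo, hi]")
    current = start
    best = measure(current)
    for _ in range(max_steps):
        neighbours = [c for c in (current - 1, current + 1) if lo <= c <= hi]
        if not neighbours:
            break
        score, candidate = max((measure(c), c) for c in neighbours)
        if score <= best:
            break  # local optimum: no neighbour improves throughput
        current, best = candidate, score
    return current


# Synthetic throughput curve peaking at 8 replicas (illustrative only).
print(tune(lambda k: -(k - 8) ** 2))  # -> 8
```

Real measurements are noisy, so a runtime would likely average several samples per setting before comparing them; that refinement is omitted to keep the sketch short.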

Using a Parallel Programming Model for Architectural Synthesis

Author: Owaida Muhsen
Publication date: 01/01/2012

The problem of automatically generating hardware modules from high-level application representations has been at the forefront of EDA research in recent years. In this dissertation we introduce a methodology to automatically synthesize hardware accelerators from OpenCL applications. OpenCL is a recent, industry-supported standard for writing programs that execute on multicore platforms and accelerators such as GPUs. Our methodology maps OpenCL kernels into hardware accelerators based on architectural templates that explicitly decouple computation from memory communication whenever possible. The templates can be tuned to provide a wide repertoire of accelerators that meet user performance requirements and FPGA device characteristics. Furthermore, a set of high- and low-level compiler optimizations is applied to generate optimized accelerators. Our experimental evaluation shows that the generated accelerators are tuned efficiently to match each application's memory access pattern and computational complexity and to meet user performance requirements. An important objective of our tool is to expand the FPGA development user base to software engineers, enabling them to develop FPGA systems without hardware design experience and thereby expanding the scope of FPGAs beyond the realm of hardware design.

Hellenic National Archive of Doctoral Dissertations

University of Thessaly Institutional Repository