Search CORE

4,598 research outputs found

Scalable parallel communications

Author: Foudriat E. C.
Khanna S.
Maly K.
Mukkamala R.
Overstreet C. M.
Sekhar Y. S.
Zubair M.
Publication venue
Publication date
Field of study

Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulations studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth service to a single application); and (3) coarse grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism) also with near linear speed-ups

NASA Technical Reports Server

MORA: an Energy-Aware Slack Reclamation Scheme for Scheduling Sporadic Real-Time Tasks upon Multiprocessor Platforms

Author: Goossens Joel
Nelis Vincent
Publication venue
Publication date: 01/06/2009
Field of study

In this paper, we address the global and preemptive energy-aware scheduling problem of sporadic constrained-deadline tasks on DVFS-identical multiprocessor platforms. We propose an online slack reclamation scheme which profits from the discrepancy between the worst- and actual-case execution time of the tasks by slowing down the speed of the processors in order to save energy. Our algorithm called MORA takes into account the application-specific consumption profile of the tasks. We demonstrate that MORA does not jeopardize the system schedulability and we show by performing simulations that it can save up to 32% of energy (in average) compared to execution without using any energy-aware algorithm.Comment: 11 page

arXiv.org e-Print Archive

DI-fusion

k2U: A General Framework from k-Point Effective Schedulability Analysis to Utilization-Based Tests

Author: Chen Jian-Jia
Huang Wen-Hung
Liu Cong
Publication venue
Publication date: 14/09/2015
Field of study

To deal with a large variety of workloads in different application domains in real-time embedded systems, a number of expressive task models have been developed. For each individual task model, researchers tend to develop different types of techniques for deriving schedulability tests with different computation complexity and performance. In this paper, we present a general schedulability analysis framework, namely the k2U framework, that can be potentially applied to analyze a large set of real-time task models under any fixed-priority scheduling algorithm, on both uniprocessor and multiprocessor scheduling. The key to k2U is a k-point effective schedulability test, which can be viewed as a "blackbox" interface. For any task model, if a corresponding k-point effective schedulability test can be constructed, then a sufficient utilization-based test can be automatically derived. We show the generality of k2U by applying it to different task models, which results in new and improved tests compared to the state-of-the-art. Analogously, a similar concept by testing only k points with a different formulation has been studied by us in another framework, called k2Q, which provides quadratic bounds or utilization bounds based on a different formulation of schedulability test. With the quadratic and hyperbolic forms, k2Q and k2U frameworks can be used to provide many quantitive features to be measured, like the total utilization bounds, speed-up factors, etc., not only for uniprocessor scheduling but also for multiprocessor scheduling. These frameworks can be viewed as a "blackbox" interface for schedulability tests and response-time analysis

arXiv.org e-Print Archive

Crossref

Coordinated Scheduling and Dynamic Performance Analysis in Multiprocessors Systems

Author: Corbalán González Julita
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2002
Field of study

El rendimiento de los actuales sistemas multiprocesador de memoria compartida depende tanto de la utilización eficiente de todos los componentes del sistema (procesadores, memoria, etc), como de las características del conjunto de aplicaciones a ejecutar. Esta Tesis tiene como principal objetivo mejorar la ejecución de conjuntos de aplicaciones paralelas en sistemas multiprocesador de memoria compartida mediante la utilización de información sobre el rendimiento de las aplicaciones para la planificación de los procesadores.Es una práctica común de los usuarios de un sistema multiprocesador reservar muchos procesadores para ejecutar sus aplicaciones asumiendo que cuantos más procesadores utilicen mejor rendimiento sacarán sus aplicaciones. Sin embargo, normalmente esto no es cierto. Las aplicaciones paralelas tienen diferentes características respecto a su escalabilidad. Su rendimiento depende además de parámetros que sólo son conocidos en tiempo de ejecución, como por ejemplo el conjunto de datos de entrada o la influencia que pueden ejercer determinadas aplicaciones que se ejecutan de forma concurrente.En esta tesis proponemos que el sistema no base sus decisiones solamente en las peticiones de recursos de los usuarios sino que él, dinámicamente, mida el rendimiento que están consiguiendo las aplicaciones y base, o ajuste, sus decisiones teniendo en cuenta esa información.El rendimiento de las aplicaciones paralelas puede ser medido por el sistema de forma dinámica y automática sin introducir una sobrecarga significativa en la ejecución de las aplicaciones. Utilizando esta información, la planificación de procesadores puede ser decidida, o ajustada, siendo mucho más robusta a requerimientos incorrectos por parte de los usuarios, que otras políticas que no consideran este tipo de información. Además de considerar el rendimiento, proponemos imponer una eficiencia objetivo a las aplicaciones paralelas. Esta eficiencia objetivo determinará si la aplicación está consiguiendo un rendimiento aceptable o no, y será usada para ajustar la asignación de procesadores. La eficiencia objetivo de un sistema podrá ser un parámetro ajustable dinámicamente en función del estado del sistema: número de aplicaciones ejecutándose, aplicaciones encoladas, etc.También proponemos coordinar los diferentes niveles de planificación que intervienen en la planificación de procesadores: Nivel librería de usuario, planificador de procesadores (en el S.O), y gestión del sistema de colas. La idea es establecer una interficie entre niveles para enviar y recibir información entre niveles, así como considerar esta información para tomar las decisiones propias de cada nivel.La evaluación de esta Tesis ha sido realizada utilizando un enfoque práctico. Hemos diseñado e implementado un entorno de ejecución completo para ejecutar aplicaciones paralelas que siguen el modelo de programación OpenMP. Hemos introducido nuestras propuestas modificando los tres niveles de planificación mencionados. Los resultados muestran que las ideas propuestas en esta tesis mejoran significativamente el rendimiento del sistema. En aquellos casos en que tanto las aplicaciones como los parámetros del sistema han sido previamente optimizados, las propuestas realizadas introducen una penalización del 5% en el peor de los casos, comparado con el mejor de los resultados obtenidos por otras políticas evaluadas. Sin embargo, en otros casos evaluados, las propuestas realizadas en esta tesis han mejorado el rendimiento del sistema hasta un 400% también comparado con el mejor resultado obtenido por otras políticas evaluadas.Las principales conclusiones que podemos obtener de esta Tesis son las siguientes: - El rendimiento de las aplicaciones paralelas puede ser medido en tiempo de ejecución. Los requisitos para aplicar el mecanismo de medida propuesto en esta Tesis son que las aplicaciones sean maleables y estar en un entorno de ejecución multiprocesador de memoria compartida. - El rendimiento de las aplicaciones paralelas debe ser considerado para decidir la asignación de procesadores a aplicaciones. El sistema debe utilizar la información del rendimiento para auto-ajustar sus decisiones. Además, el sistema debe imponer una eficiencia objetivo para asegurar el uso eficiente de procesadores.- Los diferentes niveles de planificación deben estar coordinados para evitar interferencias entre ellosThe performance of current shared-memory multiprocessor systems depends on both the efficient utilization of all the architectural elements in the system (processors, memory, etc), and the workload characteristics.This Thesis has the main goal of improving the execution of workloads of parallel applications in shared-memory multiprocessor systems by using real performance information in the processor scheduling.It is a typical practice of users in multiprocessor systems to request for a high number of processors assuming that the higher the processor request, the higher the number of processors allocated, and the higher the speedup achieved by their applications. However, this is not true. Parallel applications have different characteristics with respect to their scalability. Their speedup also depends on run-time parameters such as the influence of the rest of running applications.This Thesis proposes that the system should not base its decisions on the users requests only, but the system must decide, or adjust, its decisions based on real performance information calculated at run-time. The performance of parallel applications is information that the system can dynamically measure without introducing a significant penalty in the application execution time. Using this information, the processor allocation can be decided, or modified, being robust to incorrect processor requests given by users. We also propose that the system use a target efficiency to ensure the efficient use of processors. This target efficiency is a system parameter and can be dynamically decided as a function of the characteristics of running applications or the number of queued applications.We also propose to coordinate the different scheduling levels that operate in the processor scheduling: the run-time scheduler, the processor scheduler, and the queueing system. We propose to establish an interface between levels to send and receive information, and to take scheduling decisions considering the information provided by the rest of levels.The evaluation of this Thesis has been done using a practical approach. We have designed and implemented a complete execution environment to execute OpenMP parallel applications. We have introduced our proposals, modifying the three scheduling levels (run-time library, processor scheduler, and queueing system).Results show that the ideas proposed in this Thesis significantly improve the system performance. If the evaluated workload has been previously tuned, in the worst case, we have introduced a slowdown around 5% in the workload execution time compared with the best execution time achieved. However, in some extreme cases, with a workload and a system configuration not previously tuned, we have improved the system performance in a 400%, also compared with the next best time.The main results achieved in this Thesis can be summarized as follows:- The performance of parallel applications can be measured at run-time. The requirements to apply the mechanism proposed in this Thesis are to have malleable applications and shared-memory multiprocessor architectures.- The performance of parallel applications 1must be considered to decide the processor allocation. The system must use this information to self-adjust its decisions based on the achieved performance. Moreover, the system must impose a target efficiency to ensure the efficient use of processors.- The different scheduling levels must be coordinated to avoid interferences between levels

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

3E: Energy-Efficient Elastic Scheduling for Independent Tasks in Heterogeneous Computing Systems

Author: Ge Rong
He Chuan
Sun Jinguang
Zhu Xiaomin
Publication venue: e-Publications@Marquette
Publication date: 01/02/2013
Field of study

Reducing energy consumption is a major design constraint for modern heterogeneous computing systems to minimize electricity cost, improve system reliability and protect environment. Conventional energy-efficient scheduling strategies developed on these systems do not sufficiently exploit the system elasticity and adaptability for maximum energy savings, and do not simultaneously take account of user expected finish time. In this paper, we develop a novel scheduling strategy named energy-efficient elastic (3E) scheduling for aperiodic, independent and non-real-time tasks with user expected finish times on DVFS-enabled heterogeneous computing systems. The 3E strategy adjusts processors’ supply voltages and frequencies according to the system workload, and makes trade-offs between energy consumption and user expected finish times. Compared with other energy-efficient strategies, 3E significantly improves the scheduling quality and effectively enhances the system elasticity

epublications@Marquette

Survivable algorithms and redundancy management in NASA's distributed computing systems

Author: Malek Miroslaw
Publication venue
Publication date
Field of study

The design of survivable algorithms requires a solid foundation for executing them. While hardware techniques for fault-tolerant computing are relatively well understood, fault-tolerant operating systems, as well as fault-tolerant applications (survivable algorithms), are, by contrast, little understood, and much more work in this field is required. We outline some of our work that contributes to the foundation of ultrareliable operating systems and fault-tolerant algorithm design. We introduce our consensus-based framework for fault-tolerant system design. This is followed by a description of a hierarchical partitioning method for efficient consensus. A scheduler for redundancy management is introduced, and application-specific fault tolerance is described. We give an overview of our hybrid algorithm technique, which is an alternative to the formal approach given

NASA Technical Reports Server

Coordinated Scheduling and Dynamic Performance Analysis in Multiprocessors Systems

Author: Corbalán González Julita
Publication venue: Universitat Politècnica de Catalunya
Publication date: 05/07/2002
Field of study

UPCommons. Portal del coneixement obert de la UPC

libcppa - Designing an Actor Semantic for C++11

Author: Charousset Dominik
Schmidt Thomas C.
Publication venue
Publication date: 01/01/2013
Field of study

Parallel hardware makes concurrency mandatory for efficient program execution. However, writing concurrent software is both challenging and error-prone. C++11 provides standard facilities for multiprogramming, such as atomic operations with acquire/release semantics and RAII mutex locking, but these primitives remain too low-level. Using them both correctly and efficiently still requires expert knowledge and hand-crafting. The actor model replaces implicit communication by sharing with an explicit message passing mechanism. It applies to concurrency as well as distribution, and a lightweight actor model implementation that schedules all actors in a properly pre-dimensioned thread pool can outperform equivalent thread-based applications. However, the actor model did not enter the domain of native programming languages yet besides vendor-specific island solutions. With the open source library libcppa, we want to combine the ability to build reliable and distributed systems provided by the actor model with the performance and resource-efficiency of C++11.Comment: 10 page

arXiv.org e-Print Archive

REPOSIT