96 research outputs found

    Application Development using Compositional Performance Analysis

    Get PDF
    A parallel programming archetype [Cha94, CMMM95] is an abstraction that captures the common features of a class of problems with a similar computational structure and combines them with a parallelization strategy to produce a pattern of dataflow and communication. Such abstractions are useful in application development, both as a conceptual framework and as a basis for tools and techniques. The efficiency of a parallel program can depend a great deal on how its data and tasks are decomposed and distributed. This thesis describes a simple performance evaluation methodology that includes an analytic model for predicting the performance of parallel and distributed computations developed for multicomputer machines and networked personal computers. This analytic model can be supplemented by a simulation infrastructure for application writers to use when developing parallel programs using archetypes. These performance evaluation tools were developed with the following restricted goal in mind: We require accuracy of the analytic model and simulation infrastructure only to the extent that they suggest directions for the programmer to make the appropriate optimizations. This restricted goal sacrifices some accuracy, but makes the tools simpler and easier to use. A programmer can use these tools to design programs with decomposition and distribution specialized to a given machine configuration. By instantiating a few architecture-based parameters, the model can be employed in the performance analysis of data-parallel applications, guiding process generation, communication, and mapping decisions. The model is language-independent and machine-independent; it can be applied to help programmers make decisions about performance-affecting parameters as programs are ported across architectures and languages. Furthermore, the model incorporates both platform-specific and application-specific aspects, and it allows programmers to experiment with tradeoffs better than either strictly simulation-based or purely theoretical models. In addition, the model was designed to be simple. In summary, this thesis outlines a simple method for benchmarking a parallel communication library and for using the results to model the performance of applications developed with that communication library. We use compositional performance analysis - decomposing a parallel program into its modular parts and analyzing their respective performances - to gain perspective on the performance of the whole program. This model is useful for predicting parallel program execution times for different types of program archetypes (e.g., mesh and mesh-spectral), using communication libraries built with different message-passing schemes (e.g., Fortran M and Fortran with MPI) running on different architectures (e.g., IBM SP2 and a network of Pentium personal computers)

    Performance Analysis for Mesh and Mesh-Spectral Archetype Applications

    Get PDF
    This document outlines a simple method for benchmarking a parallel communication library and for using the results to model the performance of applications developed with that communication library. We use compositional performance analysis - decomposing a parallel program into its modular parts and analyzing their respective performances - to gain perspective on the performance of the whole program. This model is useful for predicting parallel program execution times for different types of program archetypes, (e.g., mesh and mesh-spectral) using communication libraries built with different message-passing schemes (e.g., Fortran M and Fortran with MPI) running on different architectures (e.g., IBM SP2 and a network of Pentium personal computers)

    Performance Evaluation of Components Using a Granularity-based Interface Between Real-Time Calculus and Timed Automata

    Get PDF
    To analyze complex and heterogeneous real-time embedded systems, recent works have proposed interface techniques between real-time calculus (RTC) and timed automata (TA), in order to take advantage of the strengths of each technique for analyzing various components. But the time to analyze a state-based component modeled by TA may be prohibitively high, due to the state space explosion problem. In this paper, we propose a framework of granularity-based interfacing to speed up the analysis of a TA modeled component. First, we abstract fine models to work with event streams at coarse granularity. We perform analysis of the component at multiple coarse granularities and then based on RTC theory, we derive lower and upper bounds on arrival patterns of the fine output streams using the causality closure algorithm. Our framework can help to achieve tradeoffs between precision and analysis time.Comment: QAPL 201

    Response-Time Analysis of ROS 2 Processing Chains Under Reservation-Based Scheduling

    Get PDF
    Bounding the end-to-end latency of processing chains in distributed real-time systems is a well-studied problem, relevant in multiple industrial fields, such as automotive systems and robotics. Nonetheless, to date, only little attention has been given to the study of the impact that specific frameworks and implementation choices have on real-time performance. This paper proposes a scheduling model and a response-time analysis for ROS 2 (specifically, version "Crystal Clemmys" released in December 2018), a popular framework for the rapid prototyping, development, and deployment of robotics applications with thousands of professional users around the world. The purpose of this paper is threefold. Firstly, it is aimed at providing to robotic engineers a practical analysis to bound the worst-case response times of their applications. Secondly, it shines a light on current ROS 2 implementation choices from a real-time perspective. Finally, it presents a realistic real-time scheduling model, which provides an opportunity for future impact on the robotics industry

    Towards Predictable and Intelligent Real-time IoT Applications

    Get PDF
    CPS Student Forum Portugal was held as part of the Cyber-Physical Systems Week (CPS Week 2018), 10 to 13 April, Porto-Portugal.Predictable timing property is fundamental to Real-Time applications. ï Real-Time IoT applications requires both predictable and intelligent responses to actionable events ï A previous work proposed fog computing for intelligent IoT applications but does not examine timing properties of the approach.info:eu-repo/semantics/publishedVersio

    Buffer-Aware Worst-Case Timing Analysis of Wormhole NoCs Using Network Calculus

    Get PDF
    Abstract—Conducting worst-case timing analyses for wormhole Networks-on-chip (NoCs) is a fundamental aspect to guarantee real-time requirements, but it is known to be a challenging issue due to complex congestion patterns that can occur. In that respect, we introduce in this paper a new buffer-aware timing analysis of wormhole NoCs based on Network Calculus. Our main idea consists in considering the flows serialization phenomena along the path of a flow of interest (f.o.i), by paying the bursts of interfering flows only at the first convergence point, and refining the interference patterns for the f.o.i accounting for the limited buffer size. Moreover, we aim to handle such an issue for wormhole NoCs, implementing a fixed priority-preemptive arbitration of Virtual Channels (VCs), that can be assigned to an arbitrary number of traffic classes with different priority levels, i.e. VC sharing, and each traffic class may contain an arbitrary number of flows, i.e. priority sharing. It is worth noting that such characteristics cover a large panel of wormhole NoCs. The derived delay bounds are analyzed and compared to available results of existing approaches, based on Scheduling Theory as well as Compositional Performance Analysis (CPA). In doing this, we highlight a noticeable enhancement of the delay bounds tightness in comparison to CPA approach, and the inherent safe bounds of our proposal in comparison to Scheduling Theory approaches. Finally, we perform experiments on a manycore platform, to confront our timing analysis predictions to experimental data and assess its tightness

    Analysis of Memory Latencies in Multi-Processor Systems

    Get PDF
    Predicting timing behavior is key to efficient embedded real-time system design and verification. Current approaches to determine end-to-end latencies in parallel heterogeneous architectures focus on performance analysis either on task or system level. Especially memory accesses, basic operations of embedded application, cannot be accurately captured on a single level alone: While task level methods simplify system behavior, system level methods simplify task behavior. Both perspectives lead to overly pessimistic estimations. To tackle these complex interactions we integrate task and system level analysis. Each analysis level is provided with the necessary data to allow precise computations, while adequate abstraction prevents high time complexity

    Process algebra for performance evaluation

    Get PDF
    This paper surveys the theoretical developments in the field of stochastic process algebras, process algebras where action occurrences may be subject to a delay that is determined by a random variable. A huge class of resource-sharing systems – like large-scale computers, client–server architectures, networks – can accurately be described using such stochastic specification formalisms. The main emphasis of this paper is the treatment of operational semantics, notions of equivalence, and (sound and complete) axiomatisations of these equivalences for different types of Markovian process algebras, where delays are governed by exponential distributions. Starting from a simple actionless algebra for describing time-homogeneous continuous-time Markov chains, we consider the integration of actions and random delays both as a single entity (like in known Markovian process algebras like TIPP, PEPA and EMPA) and as separate entities (like in the timed process algebras timed CSP and TCCS). In total we consider four related calculi and investigate their relationship to existing Markovian process algebras. We also briefly indicate how one can profit from the separation of time and actions when incorporating more general, non-Markovian distributions
    • …
    corecore