5,947 research outputs found

    Desynchronization: Synthesis of asynchronous circuits from synchronous specifications

    Get PDF
    Asynchronous implementation techniques, which measure logic delays at run time and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst-case delays at design time, and constrain the clock cycle accordingly. De-synchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity, without requiring special design skills or tools. In this paper, we first of all study different protocols for de-synchronization and formally prove their correctness, using techniques originally developed for distributed deployment of synchronous language specifications. We also provide a taxonomy of existing protocols for asynchronous latch controllers, covering in particular the four-phase handshake protocols devised in the literature for micro-pipelines. We then propose a new controller which exhibits provably maximal concurrency, and analyze the performance of desynchronized circuits with respect to the original synchronous optimized implementation. We finally prove the feasibility and effectiveness of our approach, by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architectur

    An accurate analysis for guaranteed performance of multiprocessor streaming applications

    Get PDF
    Already for more than a decade, consumer electronic devices have been available for entertainment, educational, or telecommunication tasks based on multimedia streaming applications, i.e., applications that process streams of audio and video samples in digital form. Multimedia capabilities are expected to become more and more commonplace in portable devices. This leads to challenges with respect to cost efficiency and quality. This thesis contributes models and analysis techniques for improving the cost efficiency, and therefore also the quality, of multimedia devices. Portable consumer electronic devices should feature flexible functionality on the one hand and low power consumption on the other hand. Those two requirements are conflicting. Therefore, we focus on a class of hardware that represents a good trade-off between those two requirements, namely on domain-specific multiprocessor systems-on-chip (MP-SoC). Our research work contributes to dynamic (i.e., run-time) optimization of MP-SoC system metrics. The central question in this area is how to ensure that real-time constraints are satisfied and the metric of interest such as perceived multimedia quality or power consumption is optimized. In these cases, we speak of quality-of-service (QoS) and power management, respectively. In this thesis, we pursue real-time constraint satisfaction that is guaranteed by the system by construction and proven mainly based on analytical reasoning. That approach is often taken in real-time systems to ensure reliable performance. Therefore the performance analysis has to be conservative, i.e. it has to use pessimistic assumptions on the unknown conditions that can negatively influence the system performance. We adopt this hypothesis as the foundation of this work. Therefore, the subject of this thesis is the analysis of guaranteed performance for multimedia applications running on multiprocessors. It is very important to note that our conservative approach is essentially different from considering only the worst-case state of the system. Unlike the worst-case approach, our approach is dynamic, i.e. it makes use of run-time characteristics of the input data and the environment of the application. The main purpose of our performance analysis method is to guide the run-time optimization. Typically, a resource or quality manager predicts the execution time, i.e., the time it takes the system to process a certain number of input data samples. When the execution times get smaller, due to dependency of the execution time on the input data, the manager can switch the control parameter for the metric of interest such that the metric improves but the system gets slower. For power optimization, that means switching to a low-power mode. If execution times grow, the manager can set parameters so that the system gets faster. For QoS management, for example, the application can be switched to a different quality mode with some degradation in perceived quality. The real-time constraints are then never violated and the metrics of interest are kept as good as possible. Unfortunately, maintaining system metrics such as power and quality at the optimal level contradicts with our main requirement, i.e., providing performance guarantees, because for this one has to give up some quality or power consumption. Therefore, the performance analysis approach developed in this thesis is not only conservative, but also accurate, so that the optimization of the metric of interest does not suffer too much from conservativity. This is not trivial to realize when two factors are combined: parallel execution on multiple processors and dynamic variation of the data-dependent execution delays. We achieve the goal of conservative and accurate performance estimation for an important class of multiprocessor platforms and multimedia applications. Our performance analysis technique is realizable in practice in QoS or power management setups. We consider a generic MP-SoC platform that runs a dynamic set of applications, each application possibly using multiple processors. We assume that the applications are independent, although it is possible to relax this requirement in the future. To support real-time constraints, we require that the platform can provide guaranteed computation, communication and memory budgets for applications. Following important trends in system-on-chip communication, we support both global buses and networks-on-chip. We represent every application as a homogeneous synchronous dataflow (HSDF) graph, where the application tasks are modeled as graph nodes, called actors. We allow dynamic datadependent actor execution delays, which makes HSDF graphs very useful to express modern streaming applications. Our reason to consider HSDF graphs is that they provide a good basic foundation for analytical performance estimation. In this setup, this thesis provides three major contributions: 1. Given an application mapped to an MP-SoC platform, given the performance guarantees for the individual computation units (the processors) and the communication unit (the network-on-chip), and given constant actor execution delays, we derive the throughput and the execution time of the system as a whole. 2. Given a mapped application and platform performance guarantees as in the previous item, we extend our approach for constant actor execution delays to dynamic datadependent actor delays. 3. We propose a global implementation trajectory that starts from the application specification and goes through design-time and run-time phases. It uses an extension of the HSDF model of computation to reflect the design decisions made along the trajectory. We present our model and trajectory not only to put the first two contributions into the right context, but also to present our vision on different parts of the trajectory, to make a complete and consistent story. Our first contribution uses the idea of so-called IPC (inter-processor communication) graphs known from the literature, whereby a single model of computation (i.e., HSDF graphs) are used to model not only the computation units, but also the communication unit (the global bus or the network-on-chip) and the FIFO (first-in-first-out) buffers that form a ‘glue’ between the computation and communication units. We were the first to propose HSDF graph structures for modeling bounded FIFO buffers and guaranteed throughput network connections for the network-on-chip communication in MP-SoCs. As a result, our HSDF models enable the formalization of the on-chip FIFO buffer capacity minimization problem under a throughput constraint as a graph-theoretic problem. Using HSDF graphs to formalize that problem helps to find the performance bottlenecks in a given solution to this problem and to improve this solution. To demonstrate this, we use the JPEG decoder application case study. Also, we show that, assuming constant – worst-case for the given JPEG image – actor delays, we can predict execution times of JPEG decoding on two processors with an accuracy of 21%. Our second contribution is based on an extension of the scenario approach. This approach is based on the observation that the dynamic behavior of an application is typically composed of a limited number of sub-behaviors, i.e., scenarios, that have similar resource requirements, i.e., similar actor execution delays in the context of this thesis. The previous work on scenarios treats only single-processor applications or multiprocessor applications that do not exploit all the flexibility of the HSDF model of computation. We develop new scenario-based techniques in the context of HSDF graphs, to derive the timing overlap between different scenarios, which is very important to achieve good accuracy for general HSDF graphs executing on multiprocessors. We exploit this idea in an application case study – the MPEG-4 arbitrarily-shaped video decoder, and demonstrate execution time prediction with an average accuracy of 11%. To the best of our knowledge, for the given setup, no other existing performance technique can provide a comparable accuracy and at the same time performance guarantees

    The earlier the better: a theory of timed actor interfaces

    Get PDF
    Programming embedded and cyber-physical systems requires attention not only to functional behavior and correctness, but also to non-functional aspects and specifically timing and performance. A structured, compositional, model-based approach based on stepwise refinement and abstraction techniques can support the development process, increase its quality and reduce development time through automation of synthesis, analysis or verification. Toward this, we introduce a theory of timed actors whose notion of refinement is based on the principle of worst-case design that permeates the world of performance-critical systems. This is in contrast with the classical behavioral and functional refinements based on restricting sets of behaviors. Our refinement allows time-deterministic abstractions to be made of time-non-deterministic systems, improving efficiency and reducing complexity of formal analysis. We show how our theory relates to, and can be used to reconcile existing time and performance models and their established theories

    A performance analysis of the PASLIB version 2.1X SEND and RECV routines on the finite element machine

    Get PDF
    The Finite Element Machine is an experimental array processor designed to support research in parallel algorithms and architectures. This report presents a case study of communications using the SENDa and RECV system software routines on the Finite Element Machine, followed by a discussion of the effect of I/O performance on the efficiency of parallel algorithms

    Proceedings of the first international workshop on Investigating dataflow in embedded computing architectures (IDEA 2015), January 21, 2015, Amsterdam, The Netherlands

    Get PDF
    IDEA '15 held at HiPEAC 2015, Amsterdam, The Netherlands on January 21st, 2015 is the rst workshop on Investigating Data ow in Embedded computing Architectures. This technical report comprises of the proceedings of IDEA '15. Over the years, data ow has been gaining popularity among Embedded Systems researchers around Europe and the world. However, research on data ow is limited to small pockets in dierent communities without a common forum for discussion. The goal of the workshop was to provide a platform to researchers and practitioners to present work on modelling and analysis of present and future high performance embedded computing architectures using data ow. Despite being the rst edition of the workshop, it was very pleasant to see a total of 14 submissions, out of which 6 papers were selected following a thorough reviewing process. All the papers were reviewed by at least 5 reviewers. This workshop could not have become a reality without the help of a Technical Program Committee (TPC). The TPC members not only did the hard work to give helpful reviews in time, but also participated in extensive discussion following the reviewing process, leading to an excellent workshop program and very valuable feedback to authors. Likewise, the Organisation Committee also deserves acknowledgment to make this workshop a successful event. We take this opportunity to thank everyone who contributed in making this workshop a success

    Proceedings of the first international workshop on Investigating dataflow in embedded computing architectures (IDEA 2015), January 21, 2015, Amsterdam, The Netherlands

    Get PDF
    IDEA '15 held at HiPEAC 2015, Amsterdam, The Netherlands on January 21st, 2015 is the rst workshop on Investigating Data ow in Embedded computing Architectures. This technical report comprises of the proceedings of IDEA '15. Over the years, data ow has been gaining popularity among Embedded Systems researchers around Europe and the world. However, research on data ow is limited to small pockets in dierent communities without a common forum for discussion. The goal of the workshop was to provide a platform to researchers and practitioners to present work on modelling and analysis of present and future high performance embedded computing architectures using data ow. Despite being the rst edition of the workshop, it was very pleasant to see a total of 14 submissions, out of which 6 papers were selected following a thorough reviewing process. All the papers were reviewed by at least 5 reviewers. This workshop could not have become a reality without the help of a Technical Program Committee (TPC). The TPC members not only did the hard work to give helpful reviews in time, but also participated in extensive discussion following the reviewing process, leading to an excellent workshop program and very valuable feedback to authors. Likewise, the Organisation Committee also deserves acknowledgment to make this workshop a successful event. We take this opportunity to thank everyone who contributed in making this workshop a success

    Performance Bounds for Synchronized Queueing Networks

    Get PDF
    Las redes de Petri estocásticas constituyen un modelo unificado de las diferentes extensiones de redes de colas con sincronizaciones existentes en la literatura, válido para el diseño y análisis de prestaciones de sistemas informáticos distribuidos. En este trabajo se proponen técnicas de cálculo de cotas superiores e inferiores de las prestaciones de redes de Petri estocásticas en estado estacionario. Las cotas obtenidas son calculables en tiempo polinómico en el tamaño del modelo, por medio de la resolución de ciertos problemas de programación lineal definidos a partir de la matriz de incidencia de la red (en este sentido, las técnicas desarrolladas pueden considerarse estructurales). Las cotas calculadas dependen sólamente de los valores medios de las variables aleatorias que describen la temporización del sistema, y son independientes de los momentos de mayor orden. Esta independencia de la forma de las distribuciones de probabilidad asociadas puede considerarse como una útil generalización de otros resultados existentes para distribuciones particulares, puesto que los momentos de orden superior son, habitualmente, desconocidos en la realidad y difíciles de estimar. Finalmente, las técnicas desarrolladas se aplican al análisis de diferentes ejemplos tomados de la literatura sobre sistemas informáticos distribuidos y sistemas de fabricación. ******* Product form queueing networks have long been used for the performance evaluation of computer systems. Their success has been due to their capability of naturally expressing sharing of resources and queueing, that are typical situations of traditional computer systems, as well as to their efficient solution algorithms, of polynomial complexity on the size of the model. Unfortunately, the introduction of synchronization constraints usually destroys the product form solution, so that general concurrent and distributed systems are not easily studied with this class of models. Petri nets have been proved specially adequate to model parallel and distributed systems. Moreover, they have a well-founded theory of analysis that allows to investigate a great number of qualitative properties of the system. In the original definition, Petri nets did not include the notion of time, and tried to model only the logical behaviour of systems by describing the causal relations existing among events. This approach showed its power in the specification and analysis of concurrent systems in a way independent of the concept of time. Nevertheless the introduction of a timing specification is essential if we want to use this class of models for the performance evaluation of distributed systems. One of the main problems in the actual use of timed and stochastic Petri net models for the quantitative evaluation of large systems is the explosion of the computational complexity of the analysis algorithms. In general, exact performance results are obtained from the numerical solution of a continuous time Markov chain, whose dimension is given by the size of the state space of the model. Structural computation of exact performance measures has been possible for some subclasses of nets such as those with state machine topology. These nets, under certain assumptions on the stochastic interpretation are isomorphic to Gordon and Newell's networks, in queueing theory terminology. In the general case, efficient methods for the derivation of performance measures are still needed. Two complementary approaches to the derivation of exact measures for the analysis of distributed systems are the utilization of approximation techniques and the computation of bounds. Approximate values for the performance parameters are in general more efficiently derived than the exact ones. On the other hand, "exactness" only exists in theory! In other words, numerical algorithms must be applied in practice for the computation of exact values, therefore making errors is inevitable. Performance bounds are useful in the preliminary phases of the design of a system, in which many parameters are not known accurately. Several alternatives for those parameters should be quickly evaluated, and rejected those that are clearly bad. Exact (and even approximate) solutions would be computationally very expensive. Bounds become useful in these instances since they usually require much less computation effort. The computation of upper and lower bounds for the steady-state performance of timed and stochastic Petri nets is considered in this work. In particular, we study the throughput of transitions, defined as the average number of firings per time unit. For this measure we try to compute upper and lower bounds in polynomial time on the size of the net model, by means of proper linear programming problems defined from the incidence matrix of the net (in this sense, we develop structural techniques). These bounds depend only on the mean values and not on the higher moments of the probability distribution functions of the random variables that describe the timing of the system. The independence of the probability distributions can be viewed as a useful generalization of the performance results, since higher moments of the delays are usually unknown for real cases, and difficult to estimate and assess. From a different perspective, the obtained results can be applied to the analysis of queueing networks extended with some synchronization schemes. Monoclass queueing networks can be mapped on stochastic Petri nets. On the other hand, stochastic Petri nets can be interpreted as monoclass queueing networks augmented with synchronization primitives. Concerning the presentation of this manuscript, it should be mentioned that chapter 1 has been written with the object of giving the reader an outline of the stochastic Petri net model: its definition, terminology, basic properties, and related concepts, together with its deep relation with other classic stochastic network models. Chapter 2 is devoted to the presentation of the net subclasses considered in the rest of the work. The classification presented here is quite different from the one which is usual in the framework of Petri nets. The reason lies on the fact that our classification criterion, the computability of visit ratios for transitions, is introduced for the first time in the field of stochastic Petri nets in this work. The significance of that criterion is based on the important role that the visit ratios play in the computation of upper and lower bounds for the performance of the models. Nevertheless, classical important net subclasses are identified here in terms of the computability of their visit ratios from different parameters of the model. Chapter 3 is concerned with the computation of reachable upper and lower bounds for the most restrictive subclass of those presented in chapter 2: marked graphs. The explanation of this fact is easy to understand. The more simple is the model the more accessible will be the techniques an ideas for the development of good results. Chapter 4 provides a generalization for live and bounded free choice nets of the results presented in the previous chapter. Quality of obtained bounds is similar to that for strongly connected marked graphs: throughput lower bounds are reachable for bounded nets while upper bounds are reachable for 1-bounded nets. Chapter 5 considers the extension to other net subclasses, like mono-T-semiflow nets, FRT-nets, totally open deterministic systems of sequential processes, and persistent nets. The results are of diverse colours. For mono-T-semiflow nets and, therefore, for general FRT-nets, it is not possible (so far) to obtain reachable throughput bounds. On the other hand, for bounded ordinary persistent nets, tight throughput upper bounds are derived. Moreover, in the case of totally open deterministic systems of sequential processes the exact steady-state performance measures can be computed in polynomial time on the net size. In chapter 6 bounds for other interesting performance measures are derived from throughput bounds and from classical queueing theory laws. After that, we explore the introduction of more information from the probability distribution functions of service times in order to improve the bounds. In particular, for Coxian service delay of transitions it is possible to improve the throughput upper bounds of previous chapters which held for more general forms of distribution functions. This improvement shows to be specially fruitful for live and bounded free choice nets. Chapter 7 is devoted to case studies. Several examples taken from literature in the fields of distributed computing systems and manufacturing systems are modelled by means of stochastic Petri nets and evaluated using the techniques developed in previous chapters. Finally, some concluding remarks and considerations on possible extensions of the work are presented
    corecore