
    Performance analysis of networks on chips

    Modules on a chip (such as processors and memories) are traditionally connected through a single link, called a bus. As chips become more complex and the number of modules on a chip increases, this connection method becomes inefficient because the bus can only be used by one module at a time. Networks on chips are an emerging technology for the connection of on-chip modules. In networks on chips, switches are used to transmit data from one module to another, which means that multiple links can be used simultaneously and communication is more efficient.

    Switches consist of a number of input ports at which data arrives and output ports from which data leaves. If data at multiple input ports has to be transmitted to the same output port, only one input port may actually transmit its data, which may lead to congestion. Queueing theory deals with the analysis of congestion phenomena caused by competition for service facilities with scarce resources. Such phenomena occur, for example, in traffic intersections, manufacturing systems, and communication networks like networks on chips. These congestion phenomena are typically analysed using stochastic models, which capture the uncertain and unpredictable nature of the processes leading to congestion (such as irregular car arrivals at a traffic intersection). Stochastic models are useful tools for the analysis of networks on chips as well, due to the complexity of data traffic on these networks. In this thesis, we therefore study queueing models aimed at networks on chips.

    The thesis is centred around two key models: a model of a switch in isolation, the so-called single-switch model, and a model of a network of switches where all traffic has the same destination, the so-called network of polling stations. For both models we are interested in the throughput (the amount of data transmitted per time unit) and the mean delay (the time it takes data to travel across the network).

    Single-switch models are often studied under the assumption that the number of ports tends to infinity and that traffic is uniform (i.e., on average equally many packets arrive at all buffers, and all possible destinations are equally likely). In networks on chips, however, the number of buffers is typically small. We introduce a new approximation specifically aimed at small switches with (memoryless) Bernoulli arrivals and show that, for such switches, this approximation is more accurate than currently known approximations. As traffic in networks on chips is usually non-uniform, we also extend our approximation to non-uniform switches. The key difference between uniform and non-uniform switches is that in non-uniform switches, all queues have a different maximum throughput. We obtain a very accurate approximation of this throughput, which allows us to extend the mean delay approximation. The extended approximation is derived for Bernoulli arrivals and correlated arrival processes; its accuracy is verified through a comparison with simulation results.

    The second key model is that of concentrating tree networks of polling stations (polling stations are essentially switches where all traffic has the same output port as destination). Single polling stations have been studied extensively in the literature, but only a few attempts have been made to analyse networks of polling stations. We establish a reduction theorem stating that networks of polling stations can be reduced to single polling stations while preserving some information on mean waiting times. This reduction theorem holds under the assumption that the last node of the network uses a so-called HoL-based service discipline, which means that the choice to transmit data from a certain buffer may only depend on which buffers are empty, but not on the amount of data in the buffers. The reduction theorem is a key tool for the analysis of networks of polling stations. In addition, mean waiting times in single polling stations have to be calculated, either exactly or approximately. To this end, known results can be used, but we also devise a new single-station approximation that can be used for a large subclass of HoL-based service disciplines.

    Finally, networks on chips typically implement flow control, a mechanism that limits the amount of data in the network from one source. We analyse the division of throughput over several sources in a network of polling stations with flow control. Our results indicate that the throughput in such a network is determined by an interaction between buffer sizes, flow control limits, and service disciplines. This interaction is studied in more detail by means of a numerical analysis.
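    To make the single-switch model concrete, the sketch below (not taken from the thesis; the parameters and the conflict-resolution rule are illustrative assumptions) simulates a small switch with one FIFO per input port, uniform Bernoulli arrivals, and random head-of-line arbitration, and estimates the two quantities of interest, throughput and mean delay.

```python
import random
from collections import deque

def simulate_switch(n_ports=4, p_arrival=0.5, slots=200_000, seed=1):
    """Slotted-time simulation of an n x n switch with one FIFO per input
    port, uniform Bernoulli arrivals, and random head-of-line (HoL)
    conflict resolution at each output port."""
    rng = random.Random(seed)
    fifo = [deque() for _ in range(n_ports)]   # entries: (arrival_slot, destination)
    served, total_delay = 0, 0

    for t in range(slots):
        # Arrivals: each input receives a packet with probability p_arrival;
        # destinations are uniform over all outputs (uniform traffic).
        for i in range(n_ports):
            if rng.random() < p_arrival:
                fifo[i].append((t, rng.randrange(n_ports)))

        # Head-of-line packets contend for their output; each output grants
        # one contending input, chosen at random, per slot.
        hol = {i: fifo[i][0][1] for i in range(n_ports) if fifo[i]}
        for j in range(n_ports):
            contenders = [i for i, dest in hol.items() if dest == j]
            if contenders:
                winner = rng.choice(contenders)
                arrival_slot, _ = fifo[winner].popleft()
                served += 1
                total_delay += t - arrival_slot

    throughput = served / (slots * n_ports)    # packets per port per slot
    mean_delay = total_delay / served if served else float("inf")
    return throughput, mean_delay

if __name__ == "__main__":
    thr, delay = simulate_switch()
    print(f"throughput: {thr:.3f} packets/port/slot, mean delay: {delay:.1f} slots")
```

    In the classical infinite-port analysis of this FIFO model with uniform traffic, head-of-line blocking caps the saturation throughput at 2 - sqrt(2) (about 0.586); the approximations developed in the thesis instead target the small-switch regime, where such asymptotic results are inaccurate.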

    Characterization of the Burst Stabilization Protocol for the RR/RR CICQ Switch

    Input buffered switches with Virtual Output Queueing (VOQ) can be unstable when presented with unbalanced loads. Existing scheduling algorithms, including iSLIP for Input Queued (IQ) switches and Round Robin (RR) for Combined Input and Crossbar Queued (CICQ) switches, exhibit instability for some schedulable loads. We investigate the use of a queue length threshold and bursting mechanism to achieve stability without requiring internal speed-up. An analytical model is developed to prove that the burst stabilization protocol achieves stability and to predict the minimum burst value needed as a function of offered load. The analytical model is shown to have very good agreement with simulation results. These results show the advantage of the RR/RR CICQ switch as a contender for the next generation of high-speed switches. Comment: Presented at the 28th Annual IEEE Conference on Local Computer Networks (LCN), Bonn/Königswinter, Germany, Oct 20-24, 2003.
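    As a rough sketch of the bursting idea (the exact RR/RR CICQ protocol, threshold, and burst values are specified in the paper; the class and numbers below are illustrative), a round-robin arbiter can be extended so that a queue whose length exceeds a threshold keeps the grant for a short burst of cell slots instead of a single slot:

```python
from collections import deque

class BurstRoundRobin:
    """Round-robin arbiter with a simple burst mechanism: when the queue
    selected by the round-robin pointer is longer than `threshold`, it keeps
    the grant for up to `burst` consecutive slots (or until it empties)."""

    def __init__(self, n_queues, threshold=8, burst=4):
        self.queues = [deque() for _ in range(n_queues)]
        self.threshold = threshold
        self.burst = burst
        self.pointer = 0          # next queue to consider
        self.burst_left = 0       # remaining slots of the current burst
        self.current = None       # queue currently holding a burst grant

    def enqueue(self, q, cell):
        self.queues[q].append(cell)

    def grant(self):
        """Return the index of the queue served in this slot (or None)."""
        # Continue an ongoing burst as long as the queue is non-empty.
        if self.burst_left > 0 and self.queues[self.current]:
            self.burst_left -= 1
            return self._serve(self.current)
        self.burst_left, self.current = 0, None

        # Ordinary round-robin search for the next non-empty queue.
        n = len(self.queues)
        for step in range(n):
            q = (self.pointer + step) % n
            if self.queues[q]:
                self.pointer = (q + 1) % n
                # Long queue: start a burst to drain it faster.
                if len(self.queues[q]) > self.threshold:
                    self.current = q
                    self.burst_left = self.burst - 1
                return self._serve(q)
        return None

    def _serve(self, q):
        self.queues[q].popleft()
        return q

if __name__ == "__main__":
    arb = BurstRoundRobin(n_queues=3, threshold=2, burst=3)
    for cell in range(6):
        arb.enqueue(0, cell)           # queue 0 builds up past the threshold
    arb.enqueue(1, "x")
    print([arb.grant() for _ in range(6)])   # [0, 0, 0, 1, 0, 0]
```

    The analytical model in the paper predicts the minimum burst value as a function of offered load; a sketch like this one only shows where that parameter enters the arbiter.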

    Approximation of discrete-time polling systems via structured Markov chains

    We devise an approximation of the marginal queue length distribution in discrete-time polling systems with batch arrivals and fixed packet sizes. The polling server uses the Bernoulli service discipline and Markovian routing. The 1-limited and exhaustive service disciplines are special cases of the Bernoulli service discipline, and traditional cyclic routing is a special case of Markovian routing. The key step of our approximation is the translation of the polling system to a structured Markov chain, while truncating all but one queue. Numerical experiments show that the approximation is very accurate in general. Our study is motivated by networks on chips with multiple masters (e.g., processors) sharing a single slave (e.g., memory).
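    A minimal simulation sketch of such a system (illustrative parameters; single rather than batch arrivals and unit packet sizes, so it simplifies the model analysed above) shows how the Bernoulli service discipline and Markovian routing interact:

```python
import random

def simulate_polling(p_arrival, q_bernoulli, routing, slots=100_000, seed=2):
    """Discrete-time polling sketch: one-slot packets, Bernoulli arrivals,
    Bernoulli service discipline with parameters q_bernoulli[i], and
    Markovian routing matrix routing[i][j]."""
    rng = random.Random(seed)
    n = len(p_arrival)
    queue = [0] * n              # queue lengths (packets)
    area = 0                     # time-integral of the total queue length
    current = 0                  # queue currently polled

    for _ in range(slots):
        # Arrivals: at most one packet per queue per slot (Bernoulli).
        for i in range(n):
            if rng.random() < p_arrival[i]:
                queue[i] += 1

        area += sum(queue)

        if queue[current] > 0:
            # Serve one packet from the polled queue.
            queue[current] -= 1
            # Bernoulli discipline: keep serving this queue with probability
            # q_bernoulli[current]; otherwise route to another queue.
            if queue[current] == 0 or rng.random() >= q_bernoulli[current]:
                current = rng.choices(range(n), weights=routing[current])[0]
        else:
            # Empty poll: move on (this costs one slot in this sketch).
            current = rng.choices(range(n), weights=routing[current])[0]

    return area / slots          # mean total queue length

if __name__ == "__main__":
    cyclic = [[0, 1], [1, 0]]    # deterministic routing = cyclic polling
    print(simulate_polling([0.3, 0.3], [0.0, 0.0], cyclic))  # 1-limited
    print(simulate_polling([0.3, 0.3], [1.0, 1.0], cyclic))  # exhaustive
```

    Setting q_bernoulli to 0 or 1 recovers the 1-limited and exhaustive disciplines, and a deterministic routing matrix recovers cyclic polling, as noted in the abstract.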

    Effect of Switchover Time in Cyclically Switched Systems


    Energy-Delay Tradeoff and Dynamic Sleep Switching for Bluetooth-Like Body-Area Sensor Networks

    Wireless technology enables novel approaches to healthcare, in particular the remote monitoring of vital signs and other parameters indicative of people's health. This paper considers a system scenario relevant to such applications, where a smart-phone acts as a data-collecting hub, gathering data from a number of wireless-capable body sensors and relaying them to a healthcare provider host through standard existing cellular networks. Delay of critical data and the sensors' energy efficiency are both relevant and conflicting issues. Therefore, it is important to operate the wireless body-area sensor network at some desired point close to the optimal energy-delay tradeoff curve. This tradeoff curve is a function of the employed physical-layer protocol: in particular, it depends on the multiple-access scheme and on the coding and modulation schemes available. In this work, we consider a protocol closely inspired by the widely-used Bluetooth standard. First, we consider the calculation of the minimum energy function, i.e., the minimum sum energy per symbol that guarantees the stability of all transmission queues in the network. Then, we apply the general theory developed by Neely to develop a dynamic scheduling policy that approaches the optimal energy-delay tradeoff for the network at hand. Finally, we examine the queue dynamics and propose a novel policy that adaptively switches between connected and disconnected (sleeping) modes. We demonstrate that the proposed policy can achieve significant gains in the realistic case where the control "NULL" packets necessary to keep the connection alive have a non-zero energy cost, and the data arrival statistics corresponding to the sensed physical process are bursty. Comment: Extended version (with proof details in the Appendix) of a paper accepted for publication in the IEEE Transactions on Communications.
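    The dynamic policy builds on Neely's drift-plus-penalty framework; the generic sketch below (illustrative sensor names, rates, and energy costs, not the paper's protocol) shows the core per-slot decision rule, which trades queue backlog against energy through a parameter V:

```python
def drift_plus_penalty_choice(queues, options, V=10.0):
    """Generic Lyapunov drift-plus-penalty decision (Neely-style sketch).

    queues  -- dict: sensor id -> current queue backlog (symbols)
    options -- candidate schedules for this slot; each is a dict:
               {"energy": total energy cost,
                "rates": {sensor id: symbols served this slot}}
    V       -- tradeoff knob: larger V favours energy savings at the
               cost of larger queues (hence delay).
    """
    def score(opt):
        served = sum(queues[s] * r for s, r in opt["rates"].items())
        return V * opt["energy"] - served   # minimize V*penalty - weighted service
    return min(options, key=score)

# Illustrative use: one hub polling two sensors, plus a "sleep" option.
queues = {"ecg": 120.0, "spo2": 30.0}
options = [
    {"energy": 0.0, "rates": {}},                           # stay asleep
    {"energy": 1.0, "rates": {"ecg": 50.0}},                # poll ECG sensor
    {"energy": 1.0, "rates": {"spo2": 50.0}},               # poll SpO2 sensor
    {"energy": 1.8, "rates": {"ecg": 40.0, "spo2": 40.0}},  # poll both
]
print(drift_plus_penalty_choice(queues, options, V=1.0))
```

    Larger values of V weight energy more heavily, moving the operating point along the energy-delay tradeoff curve toward lower energy and larger queues.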

    A pseudoconservation law for a time-limited service polling system with structured batch Poisson arrivals

    We consider a cyclic-service queueing system (polling system) with time-limited service, in which the length of a service period for each queue is controlled by a timer, i.e., the server serves customers until the timer expires or the queue becomes empty, whichever occurs first, and then proceeds to the next queue. A customer whose service is interrupted by the timer expiration is attended to according to the nonpreemptive service discipline. For the cyclic-service system with structured batch Poisson arrivals (M^X/G/1) and an exponential timer, we derive a pseudoconservation law and an exact mean waiting time formula for the symmetric system.
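    For orientation, the classical pseudoconservation law for cyclic polling with Poisson arrivals takes the form below (the standard Boxma-Groenendijk expression, not the paper's result; under time-limited service it is the discipline-dependent last term that changes):

```latex
% rho_i = lambda_i E[B_i], rho = sum_i rho_i, S = total switchover time per
% cycle, and E[M_i] = mean work left at queue i when the server departs
% (discipline dependent; e.g. 0 for exhaustive service).
\sum_{i=1}^{N} \rho_i\, \mathbb{E}[W_i]
  = \frac{\rho}{2(1-\rho)} \sum_{i=1}^{N} \lambda_i \mathbb{E}[B_i^2]
  + \rho\, \frac{\mathbb{E}[S^2]}{2\, \mathbb{E}[S]}
  + \frac{\mathbb{E}[S]}{2(1-\rho)} \Bigl( \rho^2 - \sum_{i=1}^{N} \rho_i^2 \Bigr)
  + \sum_{i=1}^{N} \mathbb{E}[M_i]
```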

    Event Stream Processing with Multiple Threads

    Current runtime verification tools seldom make use of multi-threading to speed up the evaluation of a property on a large event trace. In this paper, we present an extension to the BeepBeep 3 event stream engine that allows the use of multiple threads during the evaluation of a query. Various parallelization strategies are presented and described on simple examples. The implementation of these strategies is then evaluated empirically on a sample of problems. Compared to the previous, single-threaded version of the BeepBeep engine, the allocation of just a few threads to specific portions of a query provides a dramatic improvement in running time.
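    One of the simplest parallelization strategies for this kind of workload, slicing the trace and merging per-slice results, can be sketched outside of BeepBeep as follows (generic Python, not the BeepBeep 3 API; it only applies to properties whose verdict can be combined from independent slices, such as counting matching events):

```python
from concurrent.futures import ThreadPoolExecutor

def count_matches(slice_, predicate):
    """Evaluate a simple per-event property on one slice of the trace."""
    return sum(1 for event in slice_ if predicate(event))

def parallel_count(trace, predicate, n_threads=4):
    """Split the trace into slices, evaluate each slice in its own thread,
    then merge the partial results. (In CPython, CPU-bound work only scales
    with threads if per-event processing releases the GIL; the same slicing
    strategy works unchanged with a process pool.)"""
    size = (len(trace) + n_threads - 1) // n_threads
    slices = [trace[k:k + size] for k in range(0, len(trace), size)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = pool.map(count_matches, slices, [predicate] * len(slices))
    return sum(partials)

if __name__ == "__main__":
    trace = [{"temp": t % 50} for t in range(1_000_000)]
    print(parallel_count(trace, lambda e: e["temp"] > 40))
```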

    Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors

    Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high-performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications. In this paper, we design and embed several architecture-aware optimizations into a multi-threaded general matrix multiplication (gemm), a key operation of the BLAS, in order to obtain a high-performance implementation for ARM big.LITTLE AMPs. Our solution is based on the reference implementation of gemm in the BLIS library, and integrates a cache-aware configuration as well as asymmetric static and dynamic scheduling strategies that carefully tune and distribute the operation's micro-kernels among the big and LITTLE cores of the target processor. The experimental results on a Samsung Exynos 5422, a system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the big.LITTLE model, show that our cache-aware versions of gemm with asymmetric scheduling attain important performance gains with respect to their architecture-oblivious counterparts, while exploiting all the resources of the AMP to deliver considerable energy efficiency.
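    The asymmetric-static part of such a strategy can be sketched at a coarse level as follows (an illustrative partitioning of one matrix dimension; the paper's implementation works inside the BLIS micro-kernels and cache-aware loop structure, and the core counts and speed ratio below are assumptions):

```python
def asymmetric_static_partition(m, big_cores=4, little_cores=4, speed_ratio=1.8):
    """Split the m-dimension of C = A*B among big and LITTLE cores so that
    each core's share is proportional to its relative gemm throughput.
    speed_ratio is the assumed big/LITTLE performance ratio for this kernel."""
    weights = [speed_ratio] * big_cores + [1.0] * little_cores
    total = sum(weights)
    bounds, start = [], 0
    for w in weights:
        rows = round(m * w / total)
        end = min(start + rows, m)
        bounds.append((start, end))
        start = end
    # Give any leftover rows (from rounding) to the last core.
    if bounds and start < m:
        bounds[-1] = (bounds[-1][0], m)
    return bounds  # one (row_begin, row_end) block of C per core

if __name__ == "__main__":
    for core, (lo, hi) in enumerate(asymmetric_static_partition(1000)):
        kind = "big" if core < 4 else "LITTLE"
        print(f"core {core} ({kind}): rows {lo}..{hi - 1}")
```

    A dynamic variant would instead let idle cores pull the next unassigned block from a shared work queue, which tolerates inaccurate speed-ratio estimates at the cost of some synchronization.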

    Polling models with multi-phase gated service

    In this paper we introduce and analyze a new class of service policies called multi-phase gated service. This policy is a generalization of the classical single-phase and two-phase gated policies and works as follows. Each customer that arrives at queue i has to wait K_i cycles before it receives service. The aim of this policy is to provide an interleaving scheme that avoids monopolization of the system by heavily loaded queues, by choosing the proper values of the interleaving levels K_i. In this paper, we analyze the effectiveness of the interleaving scheme on the queueing behavior of the system, and consider the problem of identifying the proper combination of interleaving levels (K_1,...,K_N) that minimizes a weighted sum of the mean waiting times at each of the N queues. Obviously, the proper choice of the interleaving levels is most critical when the system is heavily loaded. For this reason, we obtain closed-form expressions for the asymptotic waiting-time distributions in heavy traffic, and use these expressions to derive simple heuristics for approximating the optimal interleaving scheme. Numerical results based on simulations demonstrate that the accuracy of these approximations is extremely high.
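    The gate bookkeeping behind multi-phase gated service can be sketched as follows (illustrative code, not from the paper; timing, switchover times, and waiting-time accounting are omitted):

```python
from collections import deque

class MultiPhaseGatedQueue:
    """Gate bookkeeping for the multi-phase gated discipline: a customer
    arriving at this queue is served K cycles (server visits) later.
    Stage 0 holds the customers whose gate opens at the next visit."""

    def __init__(self, K):
        if K < 1:
            raise ValueError("interleaving level K must be >= 1")
        self.stages = [deque() for _ in range(K)]

    def arrive(self, customer):
        # New arrivals enter the last stage: they are served after K visits.
        self.stages[-1].append(customer)

    def visit(self):
        """Called when the server polls this queue: returns the batch to be
        served now and shifts every other stage one position forward."""
        batch = self.stages[0]
        self.stages = self.stages[1:] + [deque()]
        return list(batch)

# K=1 reduces to classical gated service; K=2 to two-phase gated.
q = MultiPhaseGatedQueue(K=2)
q.arrive("a"); q.arrive("b")
print(q.visit())   # [] : "a" and "b" must still wait one more cycle
q.arrive("c")
print(q.visit())   # ['a', 'b'] : gated two visits ago, served now
print(q.visit())   # ['c']
```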