653 research outputs found
Performance analysis of networks on chips
Modules on a chip (such as processors and memories) are traditionally connected through a single link, called a bus. As chips become more complex and the number of modules on a chip increases, this connection method becomes inefficient because the bus can only be used by one module at a time. Networks on chips are an emerging technology for the connection of on-chip modules. In networks on chips, switches are used to transmit data from one module to another, which entails that multiple links can be used simultaneously so that communication is more efficient. Switches consist of a number of input ports to which data arrives and output ports from which data leaves. If data at multiple input ports has to be transmitted to the same output port, only one input port may actually transmit its data, which may lead to congestion. Queueing theory deals with the analysis of congestion phenomena caused by competition for service facilities with scarce resources. Such phenomena occur, for example, in traffic intersections, manufacturing systems, and communication networks like networks on chips. These congestion phenomena are typically analysed using stochastic models, which capture the uncertain and unpredictable nature of processes leading to congestion (such as irregular car arrivals to a traffic intersection). Stochastic models are useful tools for the analysis of networks on chips as well, due to the complexity of data traffic on these networks. In this thesis, we therefore study queueing models aimed at networks on chips. The thesis is centred around two key models: A model of a switch in isolation, the so-called single-switch model, and a model of a network of switches where all traffic has the same destination, the so-called network of polling stations. For both models we are interested in the throughput (the amount of data transmitted per time unit) and the mean delay (the time it takes data to travel across the network). 
Single-switch models are often studied under the assumption that the number of ports tends to infinity and that traffic is uniform (i.e., on average equally many packets arrive at all buffers, and all possible destinations are equally likely). In networks on chips, however, the number of buffers is typically small. We introduce a new approximation specifically aimed at small switches with (memoryless) Bernoulli arrivals. We show that, for such switches, this approximation is more accurate than currently known approximations. As traffic in networks on chips is usually non-uniform, we also extend our approximation to non-uniform switches. The key difference between uniform and non-uniform switches is that in non-uniform switches, the queues have different maximum throughputs. We obtain a very accurate approximation of this throughput, which allows us to extend the mean delay approximation. The extended approximation is derived for Bernoulli arrivals and correlated arrival processes. Its accuracy is verified through a comparison with simulation results. The second key model is that of concentrating tree networks of polling stations (polling stations are essentially switches where all traffic has the same output port as destination). Single polling stations have been studied extensively in the literature, but only a few attempts have been made to analyse networks of polling stations. We establish a reduction theorem that states that networks of polling stations can be reduced to single polling stations while preserving some information on mean waiting times. This reduction theorem holds under the assumption that the last node of the network uses a so-called HoL-based service discipline, which means that the choice to transmit data from a certain buffer may only depend on which buffers are empty, but not on the amount of data in the buffers. The reduction theorem is a key tool for the analysis of networks of polling stations. 
In addition to this, mean waiting times in single polling stations have to be calculated, either exactly or approximately. To this end, known results can be used, but we also devise a new single-station approximation that can be used for a large subclass of HoL-based service disciplines. Finally, networks on chips typically implement flow control, which is a mechanism that limits the amount of data in the network from one source. We analyse the division of throughput over several sources in a network of polling stations with flow control. Our results indicate that the throughput in such a network is determined by an interaction between buffer sizes, flow control limits, and service disciplines. This interaction is studied in more detail by means of a numerical analysis.
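The output-port contention described in this abstract can be illustrated with a minimal simulation sketch. It is not the thesis's approximation: it assumes a generic saturated switch (every input always has a packet waiting), uniform destinations, and a random choice among contending inputs. The function name `saturated_throughput` and its parameters are hypothetical.

```python
import random

def saturated_throughput(n_ports, n_slots, seed=0):
    """Estimate per-port throughput of a saturated n_ports x n_ports switch.

    Assumptions (not taken from the thesis): every input always holds a
    head-of-line (HoL) packet, destinations are uniform, and each output
    serves one randomly chosen contending input per time slot.
    """
    rng = random.Random(seed)
    # hol[i] is the output port requested by the HoL packet at input i.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        # Group the inputs by the output port they are competing for.
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        # Each requested output transmits exactly one packet; the losing
        # inputs keep their HoL packet and retry in the next slot.
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            hol[winner] = rng.randrange(n_ports)  # next packet moves up
            delivered += 1
    return delivered / (n_ports * n_slots)

if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, round(saturated_throughput(n, 50_000), 3))
```

Under these assumptions a 2x2 switch settles near 0.75 packets per port per slot, which shows how contention alone caps throughput below the link capacity even for very small switches.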
Characterization of the Burst Stabilization Protocol for the RR/RR CICQ Switch
Input buffered switches with Virtual Output Queueing (VOQ) can be unstable
when presented with unbalanced loads. Existing scheduling algorithms, including
iSLIP for Input Queued (IQ) switches and Round Robin (RR) for Combined Input
and Crossbar Queued (CICQ) switches, exhibit instability for some schedulable
loads. We investigate the use of a queue length threshold and bursting
mechanism to achieve stability without requiring internal speed-up. An
analytical model is developed to prove that the burst stabilization protocol
achieves stability and to predict the minimum burst value needed as a function
of offered load. The analytical model is shown to have very good agreement with
simulation results. These results show the advantage of the RR/RR CICQ switch
as a contender for the next generation of high-speed switches.
Comment: Presented at the 28th Annual IEEE Conference on Local Computer
Networks (LCN), Bonn/Konigswinter, Germany, Oct 20-24, 200
Approximation of discrete-time polling systems via structured Markov chains
We devise an approximation of the marginal queue length distribution in discrete-time polling systems with batch arrivals and fixed packet sizes. The polling server uses the Bernoulli service discipline and Markovian routing. The 1-limited and exhaustive service disciplines are special cases of the Bernoulli service discipline, and traditional cyclic routing is a special case of Markovian routing. The key step of our approximation is the translation of the polling system to a structured Markov chain, while truncating all but one queue. Numerical experiments show that the approximation is very accurate in general. Our study is motivated by networks on chips with multiple masters (e.g., processors) sharing a single slave (e.g., memory).
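The way the Bernoulli service discipline interpolates between 1-limited and exhaustive service can be sketched with a small simulation. This is an illustration under simplifying assumptions, not the paper's structured-Markov-chain approximation: arrivals are single Bernoulli arrivals rather than batches, routing is cyclic, switch-over takes no time, and a visit to an empty queue wastes the slot. The name `simulate_polling` and the parameter `stay_prob` are hypothetical.

```python
import random

def simulate_polling(arrival_probs, stay_prob, n_slots, seed=0):
    """Simulate a slotted polling system with Bernoulli service.

    After serving a packet, the server stays at a non-empty queue with
    probability stay_prob and otherwise moves to the next queue.
    stay_prob = 0 gives 1-limited service, stay_prob = 1 exhaustive service.
    Returns (packets arrived, packets served, final queue lengths).
    """
    rng = random.Random(seed)
    n = len(arrival_probs)
    queues = [0] * n
    current = 0
    arrived = served = 0
    for _ in range(n_slots):
        # At most one packet arrives at each queue per slot (assumption).
        for i, p in enumerate(arrival_probs):
            if rng.random() < p:
                queues[i] += 1
                arrived += 1
        # Serve one fixed-size packet at the polled queue, if any.
        if queues[current] > 0:
            queues[current] -= 1
            served += 1
            if queues[current] > 0 and rng.random() < stay_prob:
                continue  # keep serving the same queue next slot
        current = (current + 1) % n  # cyclic routing
    return arrived, served, queues

if __name__ == "__main__":
    for q in (0.0, 0.5, 1.0):
        print(q, simulate_polling([0.2, 0.2, 0.2], q, 10_000))
```

Cyclic routing is used here only for brevity; Markovian routing would replace the last line of the loop with a random choice of the next queue.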
Energy-Delay Tradeoff and Dynamic Sleep Switching for Bluetooth-Like Body-Area Sensor Networks
Wireless technology enables novel approaches to healthcare, in particular the
remote monitoring of vital signs and other parameters indicative of people's
health. This paper considers a system scenario relevant to such applications,
where a smart-phone acts as a data-collecting hub, gathering data from a number
of wireless-capable body sensors, and relaying them to a healthcare provider
host through standard existing cellular networks. Delay of critical data and
sensors' energy efficiency are both relevant and conflicting issues. Therefore,
it is important to operate the wireless body-area sensor network at some
desired point close to the optimal energy-delay tradeoff curve. This tradeoff
curve is a function of the employed physical-layer protocol: in particular, it
depends on the multiple-access scheme and on the coding and modulation schemes
available. In this work, we consider a protocol closely inspired by the
widely-used Bluetooth standard. First, we consider the calculation of the
minimum energy function, i.e., the minimum sum energy per symbol that
guarantees the stability of all transmission queues in the network. Then, we
apply the general theory developed by Neely to develop a dynamic scheduling
policy that approaches the optimal energy-delay tradeoff for the network at
hand. Finally, we examine the queue dynamics and propose a novel policy that
adaptively switches between connected and disconnected (sleeping) modes. We
demonstrate that the proposed policy can achieve significant gains in the
realistic case where the control "NULL" packets necessary to keep the
connection alive have a non-zero energy cost and the data arrival statistics
corresponding to the sensed physical process are bursty.
Comment: Extended version (with proof details in the Appendix) of a paper
accepted for publication in the IEEE Transactions on Communication
A pseudoconservation law for a time-limited service polling system with structured batch Poisson arrivals
We consider a cyclic-service queueing system (polling system) with time-limited service, in which the length of a service period for each queue is controlled by a timer, i.e., the server serves customers until the timer expires or the queue becomes empty, whichever occurs first, and then proceeds to the next queue. The customer whose service is interrupted due to the timer expiration is attended according to the nonpreemptive service discipline. For the cyclic-service system with structured batch Poisson arrivals (M^X/G/1) and an exponential timer, we derive a pseudoconservation law and an exact mean waiting time formula for the symmetric system.
Event Stream Processing with Multiple Threads
Current runtime verification tools seldom make use of multi-threading to
speed up the evaluation of a property on a large event trace. In this paper, we
present an extension to the BeepBeep 3 event stream engine that allows the use
of multiple threads during the evaluation of a query. Various parallelization
strategies are presented and described on simple examples. The implementation
of these strategies is then evaluated empirically on a sample of problems.
Compared to the previous, single-threaded version of the BeepBeep engine, the
allocation of just a few threads to specific portions of a query provides
a dramatic improvement in terms of running time.
Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors
Asymmetric multicore processors (AMPs) have recently emerged as an appealing
technology for severely energy-constrained environments, especially in mobile
appliances where heterogeneity in applications is mainstream. In addition,
given the growing interest in low-power high performance computing, this type
of architecture is also being investigated as a means to improve the
throughput-per-Watt of complex scientific applications.
In this paper, we design and embed several architecture-aware optimizations
into a multi-threaded general matrix multiplication (gemm), a key operation of
the BLAS, in order to obtain a high performance implementation for ARM
big.LITTLE AMPs. Our solution is based on the reference implementation of gemm
in the BLIS library, and integrates a cache-aware configuration as well as
asymmetric--static and dynamic scheduling strategies that carefully tune and
distribute the operation's micro-kernels among the big and LITTLE cores of the
target processor. The experimental results on a Samsung Exynos 5422, a
system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the
big.LITTLE model, show that our cache-aware versions of gemm with asymmetric
scheduling attain important gains in performance with respect to their
architecture-oblivious counterparts while exploiting all the resources of the
AMP to deliver considerable energy efficiency.
Polling models with multi-phase gated service
In this paper we introduce and analyze a new class of service policies called multi-phase
gated service. This policy is a generalization of the classical single-phase and two-phase gated
policies and works as follows. Each customer that arrives at queue i will have to wait K_i
cycles before it receives service. The aim of this policy is to provide an interleaving scheme
to avoid monopolization of the system by heavily loaded queues, by choosing the proper
values of interleaving levels K_i. In this paper, we analyze the effectiveness of the interleaving
scheme on the queueing behavior of the system, and consider the problem of identifying the
proper combination of interleaving levels (K_1,...,K_N) that minimizes a weighted sum
of the mean waiting times at each of the N queues. Obviously, the proper choice of the
interleaving levels is most critical when the system is heavily loaded. For this reason, we
to obtain closed-form expressions for the
asymptotic waiting-time distributions in heavy traffic, and use these expressions to derive
simple heuristics for approximating the optimal interleaving scheme. Numerical results
with simulations demonstrate that the accuracy of these approximations is extremely high.
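The optimization problem described in this abstract can be written compactly as follows. The weight symbols c_i and the notation E[W_i] for the mean waiting time at queue i are assumptions introduced here for illustration, as is the restriction of each K_i to the positive integers (with K_i = 1 corresponding to classical single-phase gated service).

```latex
\min_{(K_1,\dots,K_N)} \; \sum_{i=1}^{N} c_i \, E[W_i],
\qquad K_i \in \{1, 2, \dots\}, \quad i = 1, \dots, N
```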
- …