    A load-sharing architecture for high performance optimistic simulations on multi-core machines

    In Parallel Discrete Event Simulation (PDES), the simulation model is partitioned into a set of distinct Logical Processes (LPs) which are allowed to concurrently execute simulation events. In this work we present an innovative approach to load-sharing on multi-core/multiprocessor machines, targeted at the optimistic PDES paradigm, where LPs are speculatively allowed to process simulation events with no preventive verification of causal consistency, and actual consistency violations (if any) are recovered via rollback techniques. In our approach, each simulation kernel instance, in charge of hosting and executing a specific set of LPs, runs a set of worker threads, which can be dynamically activated/deactivated on the basis of a distributed algorithm. The latter relies in turn on an analytical model that provides indications on how to reassign processor/core usage across the kernels in order to handle the simulation workload as efficiently as possible. We also present a real implementation of our load-sharing architecture within the ROme OpTimistic Simulator (ROOT-Sim), namely an open-source C-based simulation platform implemented according to the PDES paradigm and the optimistic synchronization approach. Experimental results for an assessment of the validity of our proposal are presented as well.

    Load sharing for optimistic parallel simulations on multicore machines

    Parallel Discrete Event Simulation (PDES) is based on the partitioning of the simulation model into distinct Logical Processes (LPs), each one modeling a portion of the entire system, which are allowed to execute simulation events concurrently. This allows exploiting parallel computing architectures to speed up model execution, and to make very large models tractable. In this article we address the optimistic approach to PDES, where LPs are allowed to concurrently process their events in a speculative fashion, and rollback/recovery techniques are used to guarantee state consistency in case of causality violations along the speculative execution path. In particular, we present an innovative load sharing approach targeted at optimizing resource usage for fruitful simulation work when running an optimistic PDES environment on top of multi-processor/multi-core machines. Beyond providing the load sharing model, we also define a load sharing oriented architectural scheme, based on a symmetric multi-threaded organization of the simulation platform. Finally, we present a real implementation of the load sharing architecture within the open source ROme OpTimistic Simulator (ROOT-Sim) package. Experimental data for an assessment of both viability and effectiveness of our proposal are presented as well. Copyright is held by author/owner(s).
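The speculative processing and rollback/recovery idea described in this abstract can be illustrated with a minimal sketch. This is not ROOT-Sim code and does not reflect its API: the class `LogicalProcess`, its fields, and the toy order-dependent state update are all hypothetical names chosen for illustration. The sketch assumes a single LP that saves its state before every event, and it omits anti-messages and the load-sharing logic entirely.

```python
class LogicalProcess:
    """Toy optimistic LP (hypothetical, for illustration only).

    The state update is deliberately order-dependent, so processing
    events out of timestamp order without rollback would give a wrong state.
    """

    def __init__(self):
        self.state = 0
        self.lvt = 0.0        # local virtual time
        self.processed = []   # (timestamp, payload), kept in timestamp order
        self.snapshots = []   # state saved *before* each processed event
        self.rollbacks = 0

    def _execute(self, ts, payload):
        self.snapshots.append(self.state)      # checkpoint for later recovery
        self.processed.append((ts, payload))
        self.state = self.state * 10 + payload  # order-dependent toy update
        self.lvt = ts

    def receive(self, ts, payload):
        if ts >= self.lvt:
            # No causality violation: process speculatively right away.
            self._execute(ts, payload)
            return
        # Straggler event: undo every event with a larger timestamp.
        self.rollbacks += 1
        idx = next(i for i, (t, _) in enumerate(self.processed) if t > ts)
        undone = self.processed[idx:]
        self.state = self.snapshots[idx]        # restore the saved state
        del self.processed[idx:]
        del self.snapshots[idx:]
        self.lvt = self.processed[-1][0] if self.processed else 0.0
        # Process the straggler, then replay the undone events in order.
        self._execute(ts, payload)
        for old_ts, old_payload in undone:
            self._execute(old_ts, old_payload)


lp = LogicalProcess()
for ts, payload in [(1.0, 1), (3.0, 3), (2.0, 2)]:  # event at 2.0 arrives late
    lp.receive(ts, payload)
print(lp.state, lp.rollbacks)  # prints "123 1"
```

Saving a snapshot before every event keeps the sketch short; a real optimistic simulator would typically checkpoint less often and would also have to cancel messages sent by the undone events.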

    Dynamic Bandwidth Allocation in Heterogeneous OFDMA-PONs Featuring Intelligent LTE-A Traffic Queuing

    This work was supported by the ACCORDANCE project, through the 7th ICT Framework Programme. This is an Accepted Manuscript of an article accepted for publication in Journal of Lightwave Technology following peer review. © 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. A heterogeneous, optical/wireless dynamic bandwidth allocation framework is presented, exhibiting intelligent traffic queuing for practically controlling the quality-of-service (QoS) of mobile traffic, backhauled via orthogonal frequency division multiple access–PON (OFDMA-PON) networks. A converged data link layer is presented between long term evolution-advanced (LTE-A) and next-generation passive optical network (NGPON) topologies, extending beyond NGPON2. This is achieved by incorporating, in a new protocol design, a consistent mapping of LTE-A QCIs and OFDMA-PON queues. Novel inter-ONU algorithms have been developed, based on the distribution of weights to allocate subcarriers to both enhanced node B/optical network units (eNB/ONUs) and residential ONUs, sharing the same infrastructure. A weighted, intra-ONU scheduling mechanism is also introduced to further control the QoS across the network load. The inter- and intra-ONU algorithms are both dynamic and adaptive, providing customized solutions to bandwidth allocation for different priority queues at different network traffic loads, exhibiting practical fairness in bandwidth distribution. Therefore, middle and low priority packets are not unjustifiably deprived in favor of high priority packets at low network traffic loads. Still, the protocol's adaptability allows the high priority queues to automatically overperform when the traffic load has increased and the available bandwidth needs to be rationally redistributed. Computer simulations have confirmed that, following the application of adaptive weights, the fairness index of the new scheme (representing the achieved throughput for each queue) has improved across the traffic load to above 0.9. Packet delay reduction of more than 40 ms has been recorded as a result for the low priority queues, while high priorities still achieve sufficiently low packet delays in the range of 20 to 30 ms. Peer reviewed

    Power series approximations for two-class generalized processor sharing systems

    We develop power series approximations for a discrete-time queueing system with two parallel queues and one processor. If both queues are nonempty, a customer of queue 1 is served with probability beta, and a customer of queue 2 is served with probability 1-beta. If one of the queues is empty, a customer of the other queue is served with probability 1. We first describe the generating function U(z_1, z_2) of the stationary queue lengths in terms of a functional equation, and show how to solve this using the theory of boundary value problems. Then, we propose to use the same functional equation to obtain a power series for U(z_1, z_2) in beta. The first coefficient of this power series corresponds to the priority case beta=0, which allows for an explicit solution. All higher coefficients are expressed in terms of the priority case. Accurate approximations for the mean stationary queue lengths are obtained from combining truncated power series and Padé approximation.
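The service rule described in this abstract can be checked numerically with a small Monte Carlo simulation. The sketch below is not the paper's power series method. It assumes Bernoulli arrivals with hypothetical probabilities `p1` and `p2` per slot (the abstract does not specify the arrival process), and it assumes that service happens before arrivals within each slot. The function name `simulate_gps` is likewise made up for illustration.

```python
import random


def simulate_gps(beta, p1, p2, slots, seed=0):
    """Estimate mean queue lengths of a two-queue, one-processor system.

    beta   : probability that queue 1 is served when both queues are nonempty
    p1, p2 : assumed Bernoulli arrival probabilities per slot (not from the paper)
    """
    rng = random.Random(seed)
    q1 = q2 = 0
    total1 = total2 = 0
    for _ in range(slots):
        # Service phase: at most one customer leaves per slot.
        if q1 > 0 and q2 > 0:
            if rng.random() < beta:
                q1 -= 1
            else:
                q2 -= 1
        elif q1 > 0:
            q1 -= 1   # queue 2 is empty, so queue 1 is served with probability 1
        elif q2 > 0:
            q2 -= 1   # queue 1 is empty, so queue 2 is served with probability 1
        # Arrival phase (assumed: at most one arrival per queue per slot).
        if rng.random() < p1:
            q1 += 1
        if rng.random() < p2:
            q2 += 1
        total1 += q1
        total2 += q2
    return total1 / slots, total2 / slots


# beta = 0 gives queue 2 strict priority; beta = 1 gives queue 1 strict priority.
for beta in (0.0, 0.5, 1.0):
    print(beta, simulate_gps(beta, p1=0.4, p2=0.4, slots=200_000))
```

Running the loop for several values of beta gives reference points against which a truncated power series in beta could be compared, under the arrival assumptions stated above.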

    The effective bandwidth problem revisited

    The paper studies a single-server queueing system with autonomous service and $\ell$ priority classes. Arrival and departure processes are governed by marked point processes. There are $\ell$ buffers corresponding to priority classes, and upon arrival a unit of the $k$th priority class occupies a place in the $k$th buffer. Let $N^{(k)}$, $k=1,2,\ldots,\ell$ denote the quota for the total $k$th buffer content. The values $N^{(k)}$ are assumed to be large, and queueing systems both with finite and infinite buffers are studied. In the case of a system with finite buffers, the values $N^{(k)}$ characterize buffer capacities. The paper discusses a circle of problems related to optimization of performance measures associated with overflowing the quota of buffer contents in particular buffers models. Our approach to this problem is new, and the presentation of our results is simple and clear for real applications. Comment: 29 pages, 11pt, Final version, that will be published as is in Stochastic Models

    Datacenter Traffic Control: Understanding Techniques and Trade-offs

    Datacenters provide cost-effective and flexible access to scalable compute and storage resources necessary for today's cloud computing needs. A typical datacenter is made up of thousands of servers connected with a large network and usually managed by one operator. To provide quality access to the variety of applications and services hosted on datacenters and maximize performance, it is necessary to use datacenter networks effectively and efficiently. Datacenter traffic is often a mix of several classes with different priorities and requirements. This includes user-generated interactive traffic, traffic with deadlines, and long-running traffic. To this end, custom transport protocols and traffic management techniques have been developed to improve datacenter network performance. In this tutorial paper, we review the general architecture of datacenter networks, various topologies proposed for them, their traffic properties, general traffic control challenges in datacenters and general traffic control objectives. The purpose of this paper is to bring out the important characteristics of traffic control in datacenters and not to survey all existing solutions (as it is virtually impossible due to the massive body of existing research). We hope to provide readers with a wide range of options and factors while considering a variety of traffic control mechanisms. We discuss various characteristics of datacenter traffic control including management schemes, transmission control, traffic shaping, prioritization, load balancing, multipathing, and traffic scheduling. Next, we point to several open challenges as well as new and interesting networking paradigms. At the end of this paper, we briefly review inter-datacenter networks that connect geographically dispersed datacenters, which have been receiving increasing attention recently and pose interesting and novel research problems. Comment: Accepted for Publication in IEEE Communications Surveys and Tutorials

    Lock-free Concurrent Data Structures

    Concurrent data structures are the data sharing side of parallel programming. Data structures give the program the means to store data, but also provide operations to the program to access and manipulate these data. These operations are implemented through algorithms that have to be efficient. In the sequential setting, data structures are crucially important for the performance of the respective computation. In the parallel programming setting, their importance becomes more crucial because of the increased use of data and resource sharing for utilizing parallelism. The first and main goal of this chapter is to provide sufficient background and intuition to help the interested reader navigate the complex research area of lock-free data structures. The second goal is to offer the programmer familiarity with the subject that will allow her to use truly concurrent methods. Comment: To appear in "Programming Multi-core and Many-core Computing Systems", eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and Distributed Computing