23,957 research outputs found

    Load Balancing via Random Local Search in Closed and Open systems

    Full text link
    In this paper, we analyze the performance of random load resampling and migration strategies in parallel server systems. Clients initially attach to an arbitrary server, but may switch server independently at random instants of time in an attempt to improve their service rate. This approach to load balancing contrasts with traditional approaches where clients make smart server selections upon arrival (e.g., Join-the-Shortest-Queue policy and variants thereof). Load resampling is particularly relevant in scenarios where clients cannot predict the load of a server before being actually attached to it. An important example is in wireless spectrum sharing where clients try to share a set of frequency bands in a distributed manner.Comment: Accepted to Sigmetrics 201

    Task-based adaptive multiresolution for time-space multi-scale reaction-diffusion systems on multi-core architectures

    Get PDF
    A new solver featuring time-space adaptation and error control has been recently introduced to tackle the numerical solution of stiff reaction-diffusion systems. Based on operator splitting, finite volume adaptive multiresolution and high order time integrators with specific stability properties for each operator, this strategy yields high computational efficiency for large multidimensional computations on standard architectures such as powerful workstations. However, the data structure of the original implementation, based on trees of pointers, provides limited opportunities for efficiency enhancements, while posing serious challenges in terms of parallel programming and load balancing. The present contribution proposes a new implementation of the whole set of numerical methods including Radau5 and ROCK4, relying on a fully different data structure together with the use of a specific library, TBB, for shared-memory, task-based parallelism with work-stealing. The performance of our implementation is assessed in a series of test-cases of increasing difficulty in two and three dimensions on multi-core and many-core architectures, demonstrating high scalability

    Doing-it-All with Bounded Work and Communication

    Get PDF
    We consider the Do-All problem, where pp cooperating processors need to complete tt similar and independent tasks in an adversarial setting. Here we deal with a synchronous message passing system with processors that are subject to crash failures. Efficiency of algorithms in this setting is measured in terms of work complexity (also known as total available processor steps) and communication complexity (total number of point-to-point messages). When work and communication are considered to be comparable resources, then the overall efficiency is meaningfully expressed in terms of effort defined as work + communication. We develop and analyze a constructive algorithm that has work O(t+plogp(plogp+tlogt))O( t + p \log p\, (\sqrt{p\log p}+\sqrt{t\log t}\, ) ) and a nonconstructive algorithm that has work O(t+plog2p)O(t +p \log^2 p). The latter result is close to the lower bound Ω(t+plogp/loglogp)\Omega(t + p \log p/ \log \log p) on work. The effort of each of these algorithms is proportional to its work when the number of crashes is bounded above by cpc\,p, for some positive constant c<1c < 1. We also present a nonconstructive algorithm that has effort O(t+p1.77)O(t + p ^{1.77})

    Semi-Partitioned Scheduling of Dynamic Real-Time Workload: A Practical Approach Based on Analysis-Driven Load Balancing

    Get PDF
    Recent work showed that semi-partitioned scheduling can achieve near-optimal schedulability performance, is simpler to implement compared to global scheduling, and less heavier in terms of runtime overhead, thus resulting in an excellent choice for implementing real-world systems. However, semi-partitioned scheduling typically leverages an off-line design to allocate tasks across the available processors, which requires a-priori knowledge of the workload. Conversely, several simple global schedulers, as global earliest-deadline first (G-EDF), can transparently support dynamic workload without requiring a task-allocation phase. Nonetheless, such schedulers exhibit poor worst-case performance. This work proposes a semi-partitioned approach to efficiently schedule dynamic real-time workload on a multiprocessor system. A linear-time approximation for the C=D splitting scheme under partitioned EDF scheduling is first presented to reduce the complexity of online scheduling decisions. Then, a load-balancing algorithm is proposed for admitting new real-time workload in the system with limited workload re-allocation. A large-scale experimental study shows that the linear-time approximation has a very limited utilization loss compared to the exact technique and the proposed approach achieves very high schedulability performance, with a consistent improvement on G-EDF and pure partitioned EDF scheduling

    Flow-Aware Elephant Flow Detection for Software-Defined Networks

    Get PDF
    Software-defined networking (SDN) separates the network control plane from the packet forwarding plane, which provides comprehensive network-state visibility for better network management and resilience. Traffic classification, particularly for elephant flow detection, can lead to improved flow control and resource provisioning in SDN networks. Existing elephant flow detection techniques use pre-set thresholds that cannot scale with the changes in the traffic concept and distribution. This paper proposes a flow-aware elephant flow detection applied to SDN. The proposed technique employs two classifiers, each respectively on SDN switches and controller, to achieve accurate elephant flow detection efficiently. Moreover, this technique allows sharing the elephant flow classification tasks between the controller and switches. Hence, most mice flows can be filtered in the switches, thus avoiding the need to send large numbers of classification requests and signaling messages to the controller. Experimental findings reveal that the proposed technique outperforms contemporary methods in terms of the running time, accuracy, F-measure, and recall
    corecore