68,719 research outputs found

    Static and Dynamic Scheduling for Effective Use of Multicore Systems

    Get PDF
    Multicore systems have increasingly gained importance in high performance computers. Compared to the traditional microarchitectures, multicore architectures have a simpler design, higher performance-to-area ratio, and improved power efficiency. Although the multicore architecture has various advantages, traditional parallel programming techniques do not apply to the new architecture efficiently. This dissertation addresses how to determine optimized thread schedules to improve data reuse on shared-memory multicore systems and how to seek a scalable solution to designing parallel software on both shared-memory and distributed-memory multicore systems. We propose an analytical cache model to predict the number of cache misses on the time-sharing L2 cache on a multicore processor. The model provides an insight into the impact of cache sharing and cache contention between threads. Inspired by the model, we build the framework of affinity based thread scheduling to determine optimized thread schedules to improve data reuse on all the levels in a complex memory hierarchy. The affinity based thread scheduling framework includes a model to estimate the cost of a thread schedule, which consists of three submodels: an affinity graph submodel, a memory hierarchy submodel, and a cost submodel. Based on the model, we design a hierarchical graph partitioning algorithm to determine near-optimal solutions. We have also extended the algorithm to support threads with data dependences. The algorithms are implemented and incorporated into a feedback directed optimization prototype system. The prototype system builds upon a binary instrumentation tool and can improve program performance greatly on shared-memory multicore architectures. We also study the dynamic data-availability driven scheduling approach to designing new parallel software on distributed-memory multicore architectures. We have implemented a decentralized dynamic runtime system. The design of the runtime system is focused on the scalability metric. At any time only a small portion of a task graph exists in memory. We propose an algorithm to solve data dependences without process cooperation in a distributed manner. Our experimental results demonstrate the scalability and practicality of the approach for both shared-memory and distributed-memory multicore systems. Finally, we present a scalable nonblocking topology-aware multicast scheme for distributed DAG scheduling applications

    The End of Slow Networks: It's Time for a Redesign

    Full text link
    Next generation high-performance RDMA-capable networks will require a fundamental rethinking of the design and architecture of modern distributed DBMSs. These systems are commonly designed and optimized under the assumption that the network is the bottleneck: the network is slow and "thin", and thus needs to be avoided as much as possible. Yet this assumption no longer holds true. With InfiniBand FDR 4x, the bandwidth available to transfer data across network is in the same ballpark as the bandwidth of one memory channel, and it increases even further with the most recent EDR standard. Moreover, with the increasing advances of RDMA, the latency improves similarly fast. In this paper, we first argue that the "old" distributed database design is not capable of taking full advantage of the network. Second, we propose architectural redesigns for OLTP, OLAP and advanced analytical frameworks to take better advantage of the improved bandwidth, latency and RDMA capabilities. Finally, for each of the workload categories, we show that remarkable performance improvements can be achieved

    A customizable multi-agent system for distributed data mining

    Get PDF
    We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design allows a separation of the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation has been carried out on a parallel frequent subgraph mining algorithm, which has shown good scalability performances
    corecore