191 research outputs found

    A Study of Quality of Service Communication for High-Speed Packet-Switching Computer Sub-Networks

    Get PDF
    In this thesis, we analyze various factors that affect quality of service (QoS) communication in high-speed, packet-switching sub-networks. We hypothesize that sub-network-wide bandwidth reservation and guaranteed CPU processing power at endpoint systems for handling data traffic are indispensable to achieving hard end-to-end quality of service. Different bandwidth reservation strategies, traffic characterization schemes, and scheduling algorithms affect the network resources and CPU usage as well as the extent that QoS can be achieved. In order to analyze those factors, we design and implement a communication layer. Our experimental analysis supports our research hypothesis. The Resource ReSerVation Protocol (RSVP) is designed to realize resource reservation. Our analysis of RSVP shows that using RSVP solely is insufficient to provide hard end-to-end quality of service in a high-speed sub-network. Analysis of the IEEE 802.lp protocol also supports the research hypothesis

    The Nornir run-time system for parallel programs using Kahn process networks on multi-core machines – A flexible alternative to MapReduce

    Get PDF
    Even though shared-memory concurrency is a paradigm frequently used for developing parallel applications on small- and middle-sized machines, experience has shown that it is hard to use. This is largely caused by synchronization primitives which are low-level, inherently non-deterministic, and, consequently, non-intuitive to use. In this paper, we present the Nornir run-time system. Nornir is comparable to well-known frameworks such as MapReduce and Dryad that are recognized for their efficiency and simplicity. Unlike these frameworks, Nornir also supports process structures containing branches and cycles. Nornir is based on the formalism of Kahn process networks, which is a shared-nothing, message-passing model of concurrency. We deem this model a simple and deterministic alternative to shared-memory concurrency. Experiments with real and synthetic benchmarks on up to 8 CPUs show that performance in most cases scales almost linearly with the number of CPUs, when not limited by data dependencies. We also show that the modeling flexibility allows Nornir to outperform its MapReduce counterparts using well-known benchmarks. This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited

    Multicore Architecture-aware Scientific Applications

    Get PDF
    Modern high performance systems are becoming increasingly complex and powerful due to advancements in processor and memory architecture. In order to keep up with this increasing complexity, applications have to be augmented with certain capabilities to fully exploit such systems. These may be at the application level, such as static or dynamic adaptations or at the system level, like having strategies in place to override some of the default operating system polices, the main objective being to improve computational performance of the application. The current work proposes two such capabilites with respect to multi-threaded scientific applications, in particular a large scale physics application computing ab-initio nuclear structure. The first involves using a middleware tool to invoke dynamic adaptations in the application, so as to be able to adjust to the changing computational resource availability at run-time. The second involves a strategy for effective placement of data in main memory, to optimize memory access latencies and bandwidth. These capabilties when included were found to have a significant impact on the application performance, resulting in average speedups of as much as two to four times

    A flexible simulation framework for processor scheduling algorithms in multicore systems.

    Get PDF
    In traditional uniprocessor systems, processor scheduling is the responsibility of the operating system. In high performance computing (HPC) domains that largely involve parallel processors, the responsibility of scheduling is usually left to the applications. So far, parallel computing has been confined to a small group of specialized HPC users. In this context, the hardware, operating system, and the applications have been mostly designed independently with minimal interactions. As the multicore processors are becoming the norm, parallel programming is expected to emerge as the mainstream software development approach. This new trend poses several challenges including performance, power management, system utilization, and predictable response. Such a demand is hard to meet without the cooperation from hardware, operating system, and applications. Particularly, an efficient scheduling of cores to the application threads is fundamentally important in assuring the above mentioned characteristics. We believe, operating system requires to take a larger responsibility in ensuring efficient multicore scheduling of application threads. To study the performance of a new scheduling algorithm for the future multicore systems with hundreds and thousands of cores, we need a flexible scheduling simulation testbed. Designing such a multicore scheduling simulation testbed and illustrating its functionality by studying some well known scheduling algorithms Linux and Solaris are the main contributions of this thesis. In addition to studying Linux and Solaris scheduling algorithms, we demonstrate the power, flexibility, and use of the proposed scheduling testbed by simulating two popular gang scheduling algorithms - adaptive first-come-first-served (AFCFS) and largest gang first served (LGFS). As a result of this performance study, we designed a new gang scheduling algorithm and we compared its performance with AFCFS. The proposed scheduling simulation testbed is developed using Java and expected to be released for public use.The original print copy of this thesis may be available here: http://wizard.unbc.ca/record=b180562

    Multicore Architecture-aware Scientific Applications

    Full text link

    Demand-based coscheduling of parallel jobs on multiprogrammed multiprocessors

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997.Includes bibliographical references (p. 92-94).by Patrick Gregory Sobalvarro.Ph.D

    Portable Checkpointing for Parallel Applications

    Full text link
    High Performance Computing (HPC) systems represent the peak of modern computational capability. As ever-increasing demands for computational power have fuelled the demand for ever-larger computing systems, modern HPC systems have grown to incorporate hundreds, thousands or as many as 130,000 processors. At these scales, the huge number of individual components in a single system makes the probability that a single component will fail quite high, with today's large HPC systems featuring mean times between failures on the order of hours or a few days. As many modern computational tasks require days or months to complete, fault tolerance becomes critical to HPC system design. The past three decades have seen significant amounts of research on parallel system fault tolerance. However, as most of it has been either theoretical or has focused on low-level solutions that are embedded into a particular operating system or type of hardware, this work has had little impact on real HPC systems. This thesis attempts to address this lack of impact by describing a high-level approach for implementing checkpoint/restart functionality that decouples the fault tolerance solution from the details of the operating system, system libraries and the hardware and instead connects it to the APIs implemented by the above components. The resulting solution enables applications that use these APIs to become self-checkpointing and self-restarting regardless of the the software/hardware platform that may implement the APIs. The particular focus of this thesis is on the problem of checkpoint/restart of parallel applications. It presents two theoretical checkpointing protocols, one for the message passing communication model and one for the shared memory model. The former is the first protocol to be compatible with application-level checkpointing of individual processes, while the latter is the first protocol that is compatible with arbitrary shared memory models, APIs, implementations and consistency protocols. These checkpointing protocols are used to implement checkpointing systems for applications that use the MPI and OpenMP parallel APIs, respectively, and are first in providing checkpoint/restart to arbitrary implementations of these popular APIs. Both checkpointing systems are extensively evaluated on multiple software/hardware platforms and are shown to feature low overheads
    • …
    corecore