103,666 research outputs found

    Exploiting variability for energy optimization of parallel programs

    In this paper we present optimizations that use DVFS mechanisms to reduce the total energy usage in scientific applications. Our main insight is that noise is intrinsic to large scale parallel executions and it appears whenever shared resources are contended. The presence of noise allows us to identify and manipulate any program regions amenable to DVFS. When compared to previous energy optimizations that make per core decisions using predictions of the running time, our scheme uses a qualitative approach to recognize the signature of executions amenable to DVFS. By recognizing the "shape of variability" we can optimize codes with highly dynamic behavior, which pose challenges to all existing DVFS techniques. We validate our approach using offline and online analyses for one-sided and two-sided communication paradigms. We have applied our methods to NWChem, and we show best case improvements in energy use of 12% at no loss in performance when using online optimizations running on 720 Haswell cores with one-sided communication. With NWChem on MPI two-sided and offline analysis, capturing the initialization, we find energy savings of up to 20%, with less than 1% performance cost.
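
To make the "energy saved at no loss in performance" idea concrete, here is a minimal sketch of a generic energy model, not the scheme from the paper. It assumes a core alternates between a busy phase and a waiting phase, that energy is power times time, and that frequency is lowered only while waiting, so the running time is unchanged. The function names and all numbers are hypothetical.

```python
def energy_joules(busy_s, wait_s, busy_watts, wait_watts):
    """Energy of one core: power times time, summed over the busy and waiting phases."""
    return busy_s * busy_watts + wait_s * wait_watts


def saving_fraction(busy_s, wait_s, high_watts, low_watts):
    """Fraction of energy saved when only the waiting phase runs at the lower power."""
    baseline = energy_joules(busy_s, wait_s, high_watts, high_watts)
    scaled = energy_joules(busy_s, wait_s, high_watts, low_watts)
    return (baseline - scaled) / baseline


# Hypothetical values: 8 s busy, 2 s waiting, 10 W at high frequency, 5 W at low.
print(saving_fraction(8, 2, 10, 5))
```

In this toy model the saving grows with the share of time spent waiting, which is why regions dominated by contention or communication are the natural candidates for frequency scaling.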

    On the conditions for efficient interoperability with threads: An experience with PGAS languages using Cray communication domains

    Today's high performance systems are typically built from shared memory nodes connected by a high speed network. That architecture, combined with the trend towards less memory per core, encourages programmers to use a mixture of message passing and multithreaded programming. Unfortunately, the advantages of using threads for in-node programming are hindered by their inability to efficiently communicate between nodes. In this work, we identify some of the performance problems that arise in such hybrid programming environments and characterize conditions needed to achieve high communication performance for multiple threads: addressability of targets, separability of communication paths, and full direct reachability to targets. Using the GASNet communication layer on the Cray XC30 as our experimental platform, we show how to satisfy these conditions. We also discuss how satisfying these conditions is influenced by the communication abstraction, implementation constraints, and the interconnect messaging capabilities. To evaluate these ideas, we compare the communication performance of a thread-based node runtime to a process-based runtime. Without our GASNet extensions, thread communication is significantly slower than processes - up to 21x slower. Once the implementation is modified to address each of our conditions, the two runtimes have comparable communication performance. This allows programmers to more easily mix models like OpenMP, CILK, or pthreads with a GASNet-based model like UPC, with the associated performance, convenience and interoperability advantages that come from using threads within a node. © 2014 ACM

    DART-MPI: An MPI-based Implementation of a PGAS Runtime System

    A Partitioned Global Address Space (PGAS) approach treats a distributed system as if the memory were shared on a global level. Given such a global view on memory, the user may program applications very much like shared memory systems. This greatly simplifies the tasks of developing parallel applications, because no explicit communication has to be specified in the program for data exchange between different computing nodes. In this paper we present DART, a runtime environment, which implements the PGAS paradigm on large-scale high-performance computing clusters. A specific feature of our implementation is the use of one-sided communication of the Message Passing Interface (MPI) version 3 (i.e. MPI-3) as the underlying communication substrate. We evaluated the performance of the implementation with several low-level kernels in order to determine overheads and limitations in comparison to the underlying MPI-3. Comment: 11 pages, International Conference on Partitioned Global Address Space Programming Models (PGAS14)
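
The global view of memory described above can be illustrated with a toy, single-process sketch. It is not DART's API: the class `GlobalArray` and its methods are hypothetical, and a block distribution is assumed. Each inner list stands in for the memory of one unit, and a global index is translated into an owning unit and a local offset. In an MPI-3 based runtime, the `put` and `get` steps would plausibly map to one-sided operations such as `MPI_Put` and `MPI_Get` on a window.

```python
class GlobalArray:
    """Toy block-distributed array: one global index space, per-unit local storage."""

    def __init__(self, size, num_units):
        self.size = size
        self.num_units = num_units
        # Block size rounded up so that every element has an owner.
        self.block = -(-size // num_units)
        # Each inner list stands in for the memory of one unit (process).
        self.local = [
            [0] * max(0, min(self.block, size - u * self.block))
            for u in range(num_units)
        ]

    def locate(self, index):
        """Translate a global index into (owning unit, local offset)."""
        if not 0 <= index < self.size:
            raise IndexError(index)
        return divmod(index, self.block)

    def put(self, index, value):
        """Write through the global index; no explicit message is written by the user."""
        unit, offset = self.locate(index)
        self.local[unit][offset] = value

    def get(self, index):
        """Read through the global index."""
        unit, offset = self.locate(index)
        return self.local[unit][offset]


ga = GlobalArray(size=10, num_units=4)
ga.put(7, 42)
print(ga.locate(7), ga.get(7))
```

The caller only ever names a global index; the translation to a unit and an offset is where a real runtime would issue remote memory access.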

    OPR

    The ability to reproduce a parallel execution is desirable for debugging and program reliability purposes. In debugging (13), the programmer needs to manually step back in time, while for resilience (6) this is automatically performed by the application upon failure. To be useful, replay has to faithfully reproduce the original execution. For parallel programs the main challenge is inferring and maintaining the order of conflicting operations (data races). Deterministic record and replay (R&R) techniques have been developed for multithreaded shared memory programs (5), as well as distributed memory programs (14). Our main interest is techniques for large scale scientific (3; 4) programming models.
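
The core idea of record and replay, logging the order of conflicting operations and enforcing it on re-execution, can be sketched without real threads. This is a simplified simulation, not the technique from the paper: the function `run`, the thread ids, and the operations are all hypothetical, and the interleaving is chosen by a seeded random scheduler in place of true nondeterminism.

```python
import random


def run(programs, schedule=None, seed=None):
    """Interleave per-thread operation lists on one shared value.

    programs -- dict: thread id -> list of functions mapping old value to new value
    schedule -- recorded order of thread ids to follow (replay); None means record
    Returns (final value, order of thread ids actually executed).
    """
    rng = random.Random(seed)
    pending = {t: list(ops) for t, ops in programs.items()}
    value, log = 0, []
    replay = iter(schedule) if schedule is not None else None
    while any(pending.values()):
        if replay is not None:
            t = next(replay)  # replay: follow the recorded order
        else:
            # record: pick any thread that still has work to do
            t = rng.choice(sorted(t for t, ops in pending.items() if ops))
        value = pending[t].pop(0)(value)
        log.append(t)
    return value, log


# The operations do not commute, so the final value depends on the interleaving.
programs = {"A": [lambda v: v + 1, lambda v: v * 2], "B": [lambda v: v + 10]}
recorded_value, recorded_log = run(programs, seed=1)
replayed_value, _ = run(programs, schedule=recorded_log)
print(recorded_value == replayed_value)
```

Only the order of accesses to the shared value is logged, which is enough here because each thread's own operations are deterministic.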

    Designing Scalable Business Models

    Digital business models are often designed for rapid growth, and some relatively young companies have indeed achieved global scale. However, despite the visibility and importance of this phenomenon, analysis of scale and scalability remains underdeveloped in management literature. When it is addressed, analysis of this phenomenon is often over-influenced by arguments about economies of scale in production and distribution. To redress this omission, this paper draws on economic, organization and technology management literature to provide a detailed examination of the sources of scaling in digital businesses. We propose three mechanisms by which digital business models attempt to gain scale: engaging both non-paying users and paying customers; organizing customer engagement to allow self-customization; and orchestrating networked value chains, such as platforms or multi-sided business models. Scaling conditions are discussed, and propositions developed and illustrated with examples of big data entrepreneurial firms.

    How managers can build trust in strategic alliances: a meta-analysis on the central trust-building mechanisms

    Trust is an important driver of superior alliance performance. Alliance managers are influential in this regard because trust requires active involvement, commitment and the dedicated support of the key actors involved in the strategic alliance. Despite the importance of trust for explaining alliance performance, little effort has been made to systematically investigate the mechanisms that managers can use to purposefully create trust in strategic alliances. We use Parkhe’s (1998b) theoretical framework to derive nine hypotheses that distinguish between process-based, characteristic-based and institutional-based trust-building mechanisms. Our meta-analysis of 64 empirical studies shows that trust is strongly related to alliance performance. Process-based mechanisms are more important for building trust than characteristic- and institutional-based mechanisms. The effects of prior ties and asset specificity are not as strong as expected and the impact of safeguards on trust is not well understood. Overall, theoretical trust research has outpaced empirical research by far and promising opportunities for future empirical research exist.

    MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface

    Application development for distributed computing "Grids" can benefit from tools that variously hide or enable application-level management of critical aspects of the heterogeneous environment. As part of an investigation of these issues, we have developed MPICH-G2, a Grid-enabled implementation of the Message Passing Interface (MPI) that allows a user to run MPI programs across multiple computers, at the same or different sites, using the same commands that would be used on a parallel computer. This library extends the Argonne MPICH implementation of MPI to use services provided by the Globus Toolkit for authentication, authorization, resource allocation, executable staging, and I/O, as well as for process creation, monitoring, and control. Various performance-critical operations, including startup and collective operations, are configured to exploit network topology information. The library also exploits MPI constructs for performance management; for example, the MPI communicator construct is used for application-level discovery of, and adaptation to, both network topology and network quality-of-service mechanisms. We describe the MPICH-G2 design and implementation, present performance results, and review application experiences, including record-setting distributed simulations. Comment: 20 pages, 8 figures

    Distributed Channel Assignment in Cognitive Radio Networks: Stable Matching and Walrasian Equilibrium

    We consider a set of secondary transmitter-receiver pairs in a cognitive radio setting. Based on channel sensing and access performances, we consider the problem of assigning channels orthogonally to secondary users through distributed coordination and cooperation algorithms. Two economic models are applied for this purpose: matching markets and competitive markets. In the matching market model, secondary users and channels build two agent sets. We implement a stable matching algorithm in which each secondary user, based on his achievable rate, proposes to the coordinator to be matched with desirable channels. The coordinator accepts or rejects the proposals based on the channel preferences which depend on interference from the secondary user. The coordination algorithm is of low complexity and can adapt to network dynamics. In the competitive market model, channels are associated with prices and secondary users are endowed with monetary budget. Each secondary user, based on his utility function and current channel prices, demands a set of channels. A Walrasian equilibrium maximizes the sum utility and equates the channel demand to their supply. We prove the existence of Walrasian equilibrium and propose a cooperative mechanism to reach it. The performance and complexity of the proposed solutions are illustrated by numerical simulations. Comment: submitted to IEEE Transactions on Wireless Communications, 13 pages, 10 figures, 4 tables
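
The propose-and-accept loop of the matching market model can be sketched with the textbook deferred acceptance procedure. This is an illustration under simplifying assumptions, not the paper's algorithm: each channel holds at most one user, each user ends up with at most one channel, and preferences are given directly as ordered lists instead of being derived from achievable rates and interference. The function name and the example data are hypothetical.

```python
from collections import deque


def stable_matching(user_prefs, channel_prefs):
    """Deferred acceptance: users propose, each channel keeps its best proposer.

    user_prefs[u]    -- channels ordered from most to least preferred by user u
    channel_prefs[c] -- users ordered from most to least preferred by channel c
    Returns a dict mapping channel -> user (one user per channel).
    """
    # rank[c][u] is the position of user u in channel c's list (lower is better).
    rank = {c: {u: i for i, u in enumerate(p)} for c, p in channel_prefs.items()}
    next_choice = {u: 0 for u in user_prefs}  # index of the next channel to try
    holder = {}                               # channel -> user currently accepted
    free = deque(user_prefs)                  # users without a channel

    while free:
        u = free.popleft()
        if next_choice[u] >= len(user_prefs[u]):
            continue  # u has been rejected everywhere and stays unmatched
        c = user_prefs[u][next_choice[u]]
        next_choice[u] += 1
        current = holder.get(c)
        if u not in rank[c]:
            free.append(u)        # channel does not list u: rejected
        elif current is None:
            holder[c] = u         # channel was free: accept
        elif rank[c][u] < rank[c][current]:
            holder[c] = u         # channel prefers u: swap
            free.append(current)
        else:
            free.append(u)        # channel keeps its current user: rejected
    return holder


user_prefs = {"u1": ["c1", "c2"], "u2": ["c1", "c2"], "u3": ["c2", "c1"]}
channel_prefs = {"c1": ["u2", "u1", "u3"], "c2": ["u1", "u3", "u2"]}
print(stable_matching(user_prefs, channel_prefs))
```

With three users and two channels, one user necessarily stays unmatched; here that is `u3`, which both channels rank below the user they end up holding.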