
    Scalable parallel communications

    Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software and sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance depends strongly on practical issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., the effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiples of 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors and channels (for small n and up to a few hundred Mbps); and (3) because these results are based on existing hardware without specialized devices (except perhaps some simple modifications of the FDDI boards), this is a low-cost way to provide multiples of 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services, combined with space-division multiplexing of the physical channels, can provide better reliability than monolithic approaches (as well as graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP instance handles small messages for many users while other TCP instances running in parallel provide high-bandwidth service to a single application); and (3) coarse-grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism), also with near-linear speed-ups.
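    As an illustration of the coarse-grain idea described above, the sketch below stripes a single application buffer across several TCP connections, one per physical channel, in round-robin order. The peer address, port numbers, channel count, and chunk size are illustrative assumptions and are not taken from the paper.

    /* Hedged sketch: stripe one application buffer over N parallel TCP
     * connections in fixed-size chunks. Peer address, ports, N_CHANNELS,
     * and CHUNK are assumptions chosen for illustration. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define N_CHANNELS 4        /* parallel channels (assumed)        */
    #define CHUNK      8192     /* striping unit in bytes (assumed)   */

    static int open_channel(const char *ip, int port)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in peer = { .sin_family = AF_INET,
                                    .sin_port   = htons(port) };
        inet_pton(AF_INET, ip, &peer.sin_addr);
        if (connect(fd, (struct sockaddr *)&peer, sizeof peer) < 0)
            perror("connect");
        return fd;
    }

    /* Round-robin scheduler: chunk i goes to channel i mod N_CHANNELS. */
    static void stripe_send(int fds[N_CHANNELS], const char *buf, size_t len)
    {
        size_t off = 0;
        for (int i = 0; off < len; i++) {
            size_t n = (len - off < CHUNK) ? len - off : CHUNK;
            /* A real system would prefix each chunk with a sequence number
             * so the receiver can merge the chunks into one ordered stream. */
            send(fds[i % N_CHANNELS], buf + off, n, 0);
            off += n;
        }
    }

    int main(void)
    {
        int fds[N_CHANNELS];
        for (int i = 0; i < N_CHANNELS; i++)
            fds[i] = open_channel("192.0.2.1", 9000 + i);  /* example peer */

        char msg[1 << 16];
        memset(msg, 'x', sizeof msg);
        stripe_send(fds, msg, sizeof msg);

        for (int i = 0; i < N_CHANNELS; i++)
            close(fds[i]);
        return 0;
    }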

    Simulation of reaction diffusion processes over biologically relevant size and time scales using multi-GPU workstations

    Simulation of in vivo cellular processes with the reaction–diffusion master equation (RDME) is a computationally expensive task. Our previous software enabled simulation of inhomogeneous biochemical systems for small bacteria over long time scales using the MPD-RDME method on a single GPU. Simulations of larger eukaryotic systems exceed the on-board memory capacity of individual GPUs, and long time simulations of modest-sized cells such as yeast are impractical on a single GPU. We present a new multi-GPU parallel implementation of the MPD-RDME method based on a spatial decomposition approach that supports dynamic load balancing for workstations containing GPUs of varying performance and memory capacity. We take advantage of high-performance features of CUDA for peer-to-peer GPU memory transfers and evaluate the performance of our algorithms on state-of-the-art GPU devices. We present parallel efficiency and performance results for simulations using multiple GPUs as system size, particle counts, and number of reactions grow. We also demonstrate multi-GPU performance in simulations of the Min protein system in E. coli. Moreover, our multi-GPU decomposition and load balancing approach can be generalized to other lattice-based problems.
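    The peer-to-peer transfers mentioned above can be illustrated with a minimal host-side sketch using the standard CUDA runtime API. The two-GPU layout and the halo slab size are assumptions made for illustration; this is not the authors' MPD-RDME implementation.

    /* Minimal host-side sketch (not the authors' MPD-RDME code): enables
     * CUDA peer-to-peer access between two GPUs and copies one halo slab of
     * a spatially decomposed lattice directly between device memories.
     * The halo size and two-GPU layout are illustrative assumptions. */
    #include <cuda_runtime.h>
    #include <stdio.h>

    #define HALO_BYTES (512 * 512 * sizeof(float))   /* assumed halo slab */

    int main(void)
    {
        int ndev = 0;
        cudaGetDeviceCount(&ndev);
        if (ndev < 2) { fprintf(stderr, "need at least 2 GPUs\n"); return 1; }

        /* Check and enable peer access in both directions. */
        int can01 = 0, can10 = 0;
        cudaDeviceCanAccessPeer(&can01, 0, 1);
        cudaDeviceCanAccessPeer(&can10, 1, 0);
        if (!can01 || !can10) { fprintf(stderr, "no P2P path\n"); return 1; }

        cudaSetDevice(0); cudaDeviceEnablePeerAccess(1, 0);
        cudaSetDevice(1); cudaDeviceEnablePeerAccess(0, 0);

        /* One lattice subvolume per GPU; only the halo slab is exchanged. */
        float *halo0, *halo1;
        cudaSetDevice(0); cudaMalloc((void **)&halo0, HALO_BYTES);
        cudaSetDevice(1); cudaMalloc((void **)&halo1, HALO_BYTES);

        /* Direct device-to-device transfer over the P2P path (no host hop). */
        cudaMemcpyPeer(halo1, 1, halo0, 0, HALO_BYTES);
        cudaDeviceSynchronize();

        cudaSetDevice(0); cudaFree(halo0);
        cudaSetDevice(1); cudaFree(halo1);
        return 0;
    }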

    Multilevel Parallel Communications

    The research reported in this thesis investigates the use of parallelism at multiple levels to realize high-speed networks that offer advantages in throughput, cost, reliability, and flexibility over alternative approaches. This research specifically considers the use of parallelism at two levels: the upper level and the lower level. At the upper level, N protocol processors perform functions included in the transport and network layers. At the lower level, M channels provide data link and physical layer functions. The resulting system provides very high bandwidth to an application. A key concept of this research is the use of replicated channels to provide a single, high-bandwidth channel to a single application. The parallelism provided by the network is transparent to communicating applications, thus differentiating this strategy from schemes that provide a collection of disjoint channels between applications on different nodes. Another innovative aspect of this research is that parallelism is exploited at multiple layers of the network to provide high throughput not only at the physical layer, but also at upper protocol layers. Schedulers are used to distribute data from a single stream to multiple channels and to merge data from multiple channels to reconstruct a single coherent stream. High throughput is possible by providing the combined bandwidth of multiple channels to a single source and destination through use of parallelism at multiple protocol layers. This strategy is cost-effective since systems can be built using standard technologies that benefit from the economies of a broad applications base. The exotic and revolutionary components needed in non-parallel approaches to build high-speed networks are not required. The replicated channels can be used to achieve high reliability as well. Multilevel parallelism is flexible since the degree of parallelism provided at any level can be matched to protocol processing demands and application requirements.
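    To make the merge side of the scheduling concrete, the following sketch (an assumed design, not the thesis implementation) reassembles a single ordered stream from chunks that may arrive out of order over the M channels, assuming each chunk carries a sequence number assigned by the send-side scheduler. The chunk format, reorder window, and sizes are hypothetical.

    /* Illustrative merge scheduler (an assumed design, not the thesis code):
     * reassembles one ordered stream from chunks arriving out of order over
     * parallel channels; each chunk carries a sequence number assigned by
     * the send-side scheduler. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define CHUNK  8192   /* chunk payload size in bytes (assumed) */
    #define WINDOW 64     /* reorder window in chunks (assumed)    */

    typedef struct {
        uint32_t seq;     /* position of this chunk in the stream  */
        uint32_t len;
        char     data[CHUNK];
    } chunk_t;

    typedef struct {
        chunk_t  slot[WINDOW];
        int      filled[WINDOW];
        uint32_t next_seq;   /* next chunk to deliver in order     */
    } merger_t;

    /* Called whenever any channel delivers a chunk; emits in-order data. */
    static void merge_deliver(merger_t *m, const chunk_t *c,
                              void (*emit)(const char *, uint32_t))
    {
        m->slot[c->seq % WINDOW]   = *c;
        m->filled[c->seq % WINDOW] = 1;

        /* Drain every chunk now contiguous with the head of the stream. */
        while (m->filled[m->next_seq % WINDOW] &&
               m->slot[m->next_seq % WINDOW].seq == m->next_seq) {
            chunk_t *head = &m->slot[m->next_seq % WINDOW];
            emit(head->data, head->len);
            m->filled[m->next_seq % WINDOW] = 0;
            m->next_seq++;
        }
    }

    static void print_emit(const char *data, uint32_t len)
    {
        printf("delivered %u bytes: %s\n", (unsigned)len, data);
    }

    int main(void)
    {
        merger_t m = { .next_seq = 0 };
        chunk_t a = { .seq = 1, .len = 6 };
        chunk_t b = { .seq = 0, .len = 6 };
        strcpy(a.data, "world");
        strcpy(b.data, "hello");
        merge_deliver(&m, &a, print_emit);  /* arrives early, buffered       */
        merge_deliver(&m, &b, print_emit);  /* fills the gap, both delivered */
        return 0;
    }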

    Designing, Building, and Modeling Maneuverable Applications within Shared Computing Resources

    Extending the military principle of maneuver into the war-fighting domain of cyberspace, academic and military researchers have produced many theoretical and strategic works, though few have focused on researching actual applications and systems that apply this principle. We present our research in designing, building, and modeling maneuverable applications in order to gain the system advantages of resource provisioning, application optimization, and cybersecurity improvement. We have coined the phrase “Maneuverable Applications,” defined as distributed and parallel applications that take advantage of the modification, relocation, addition, or removal of computing resources, giving the perception of movement. Our work with maneuverable applications has been within shared computing resources, such as the Clemson University Palmetto cluster, where multiple users share access and time to a collection of inter-networked computers and servers. In this dissertation, we describe our implementation and analytic modeling of environments and systems to maneuver computational nodes, network capabilities, and security enhancements for overcoming challenges to a cyberspace platform. Specifically, we describe our work to create a system to provision a big data computational resource within academic environments. We also present a computing testbed built to allow researchers to study network optimizations of data centers. We discuss our Petri net model of an adaptable system, which increases its cybersecurity posture in the face of varying levels of threat from malicious actors. Lastly, we present work and investigation into integrating these technologies into a prototype resource manager for maneuverable applications and validating our model using this implementation.
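    As a rough illustration of the kind of place/transition model mentioned above, the toy Petri net below fires a transition that moves a system token from a baseline-posture place to a hardened-posture place when a threat token is present. The places, transition, and marking are invented for illustration and do not reproduce the dissertation's model.

    /* Toy place/transition Petri net (an invented example, not the
     * dissertation's model): a threat token enables a transition that moves
     * the system token from a baseline-posture place to a hardened-posture
     * place, modeling an adaptive increase in security posture. */
    #include <stdio.h>

    enum { P_BASELINE, P_HARDENED, P_THREAT, N_PLACES };

    typedef struct {
        int in[N_PLACES];    /* tokens consumed from each place */
        int out[N_PLACES];   /* tokens produced into each place */
        const char *name;
    } transition_t;

    static int enabled(const int marking[N_PLACES], const transition_t *t)
    {
        for (int p = 0; p < N_PLACES; p++)
            if (marking[p] < t->in[p]) return 0;
        return 1;
    }

    static void fire(int marking[N_PLACES], const transition_t *t)
    {
        for (int p = 0; p < N_PLACES; p++)
            marking[p] += t->out[p] - t->in[p];
        printf("fired: %s\n", t->name);
    }

    int main(void)
    {
        /* Initial marking: system at baseline posture, one threat observed. */
        int marking[N_PLACES] = { [P_BASELINE] = 1, [P_THREAT] = 1 };

        transition_t harden = {
            .in   = { [P_BASELINE] = 1, [P_THREAT] = 1 },
            .out  = { [P_HARDENED] = 1 },
            .name = "raise security posture",
        };

        if (enabled(marking, &harden))
            fire(marking, &harden);

        printf("hardened-posture tokens: %d\n", marking[P_HARDENED]);
        return 0;
    }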

    Overlapping of Communication and Computation and Early Binding: Fundamental Mechanisms for Improving Parallel Performance on Clusters of Workstations

    This study considers software techniques for improving performance on clusters of workstations and approaches for designing message-passing middleware that facilitate scalable, parallel processing. Early binding and overlapping of communication and computation are identified as fundamental approaches for improving parallel performance and scalability on clusters. Currently, cluster computers using the Message-Passing Interface for interprocess communication are the predominant choice for building high-performance computing facilities, which makes the findings of this work relevant to a wide audience from the areas of high-performance computing and parallel processing. The performance-enhancing techniques studied in this work are presently underutilized in practice because of the lack of adequate support by existing message-passing libraries and are also rarely considered by parallel algorithm designers. Furthermore, commonly accepted methods for performance analysis and evaluation of parallel systems omit these techniques and focus primarily on more obvious communication characteristics such as latency and bandwidth. This study provides a theoretical framework for describing early binding and overlapping of communication and computation in models for parallel programming. This framework defines four new performance metrics that facilitate new approaches for performance analysis of parallel systems and algorithms. This dissertation provides experimental data that validate the correctness and accuracy of the performance analysis based on the new framework. The theoretical results of this performance analysis can be used by designers of parallel system and application software for assessing the quality of their implementations and for predicting the effective performance benefits of early binding and overlapping. This work presents MPI/Pro, a new MPI implementation that is specifically optimized for clusters of workstations interconnected with high-speed networks. This MPI implementation emphasizes features such as persistent communication, asynchronous processing, low processor overhead, and independent message progress. These features are identified as critical for delivering maximum performance to applications. The experimental section of this dissertation demonstrates the capability of MPI/Pro to facilitate software techniques that result in significant application performance improvements. Specific demonstrations with the Virtual Interface Architecture and TCP/IP over Ethernet are offered.
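    The two techniques named above can be sketched with standard MPI calls rather than any MPI/Pro-specific API: a persistent request binds the message parameters once before the loop (early binding), and MPI_Startall/MPI_Waitall bracket independent computation so communication can overlap with it. The ring exchange, buffer sizes, and interior computation below are assumptions made for the sketch.

    /* Hedged sketch of early binding and communication/computation overlap
     * using standard MPI persistent requests (not an MPI/Pro-specific API). */
    #include <mpi.h>

    #define N     1024   /* halo size in doubles (assumed) */
    #define STEPS 10

    /* Work that does not touch the in-flight halo buffers. */
    static void compute_interior(double *u, int n)
    {
        for (int i = 1; i < n - 1; i++)
            u[i] = 0.5 * (u[i - 1] + u[i + 1]);
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double sendbuf[N], recvbuf[N], u[N] = { 0 };
        for (int i = 0; i < N; i++)
            sendbuf[i] = (double)rank;

        int right = (rank + 1) % size;          /* ring neighbors (example) */
        int left  = (rank - 1 + size) % size;

        /* Early binding: message parameters are fixed once, outside the loop. */
        MPI_Request reqs[2];
        MPI_Send_init(sendbuf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Recv_init(recvbuf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[1]);

        for (int step = 0; step < STEPS; step++) {
            MPI_Startall(2, reqs);                      /* start communication  */
            compute_interior(u, N);                     /* overlap with compute */
            MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);  /* then use the halo    */
        }

        MPI_Request_free(&reqs[0]);
        MPI_Request_free(&reqs[1]);
        MPI_Finalize();
        return 0;
    }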

    Identifying and Harnessing Concurrency for Parallel and Distributed Network Simulation

    Although computer networks are inherently parallel systems, the parallel execution of network simulations on interconnected processors frequently yields only limited benefits. In this thesis, methods are proposed to estimate and understand the parallelization potential of network simulations. Further, mechanisms and architectures for exploiting the massively parallel processing resources of modern graphics cards to accelerate network simulations are proposed and evaluated.
