
    Interconnect Performance Evaluation of SGI Altix 3700 BX2, Cray X1, Cray Opteron Cluster, and Dell PowerEdge

    We study the performance of inter-process communication on four high-speed multiprocessor systems using a set of communication benchmarks. The goal is to identify limiting factors and bottlenecks in the interconnects of these systems and to compare the interconnects with one another. We measured network bandwidth using different numbers of communicating processors and different communication patterns, such as point-to-point communication, collective communication, and dense communication patterns. The four platforms are: a 512-processor SGI Altix 3700 BX2 shared-memory machine with 3.2 GB/s links; a 64-processor (single-streaming) Cray X1 shared-memory machine with 32 1.6 GB/s links; a 128-processor Cray Opteron cluster using a Myrinet network; and a 1280-node Dell PowerEdge cluster with an InfiniBand network. Our results show the impact of network bandwidth and topology on the overall performance of each interconnect.
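
    As a concrete illustration of the kind of point-to-point measurement described above, the sketch below times a simple two-process ping-pong exchange. The use of mpi4py, the message sizes, and the repetition count are assumptions made only for illustration; the study relied on its own benchmark suite rather than this code.

        # Minimal MPI ping-pong bandwidth sketch (illustrative only; not the
        # benchmark suite used in the paper). Run with: mpirun -np 2 python pingpong.py
        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()
        REPS = 50

        for nbytes in (1 << 10, 1 << 16, 1 << 20, 1 << 24):   # 1 KB ... 16 MB
            buf = np.zeros(nbytes, dtype=np.uint8)
            comm.Barrier()
            t0 = MPI.Wtime()
            for _ in range(REPS):
                if rank == 0:
                    comm.Send(buf, dest=1, tag=0)
                    comm.Recv(buf, source=1, tag=1)
                elif rank == 1:
                    comm.Recv(buf, source=0, tag=0)
                    comm.Send(buf, dest=0, tag=1)
            dt = MPI.Wtime() - t0
            if rank == 0:
                # Each repetition moves the message twice (out and back).
                bw = 2.0 * REPS * nbytes / dt / 1e9
                print(f"{nbytes:>10d} bytes  {bw:6.2f} GB/s")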

    An Application-Based Performance Characterization of the Columbia Supercluster

    Columbia is a 10,240-processor supercluster consisting of 20 Altix nodes with 512 processors each, and is currently ranked as the second-fastest computer in the world. In this paper, we present the performance characteristics of Columbia obtained on up to four computing nodes interconnected via the InfiniBand and/or NUMAlink4 communication fabrics. We evaluate floating-point performance, memory bandwidth, message-passing communication speeds, and compilers using a subset of the HPC Challenge benchmarks and some of the NAS Parallel Benchmarks, including the multi-zone versions. We present detailed performance results for three scientific applications of interest to NASA: one from molecular dynamics and two from computational fluid dynamics. Our results show that both NUMAlink4 and InfiniBand hold promise for application scaling to a large number of processors.
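
    The memory-bandwidth part of such a characterization is usually done with STREAM-style kernels. The fragment below is only a rough NumPy sketch of the idea behind a STREAM "scale" measurement, not the HPC Challenge code itself, and the array size is an arbitrary choice.

        # Rough STREAM-style "scale" bandwidth estimate: a[i] = s * b[i].
        # Sketch of the idea only; the HPC Challenge benchmarks are far more
        # careful about timing, threading, and memory placement.
        import time
        import numpy as np

        N = 50_000_000                      # ~400 MB per double-precision array
        a = np.zeros(N)
        b = np.random.rand(N)

        best = float("inf")
        for _ in range(5):                  # keep the best of several repetitions
            t0 = time.perf_counter()
            np.multiply(b, 3.0, out=a)      # reads b, writes a: 2 * 8 * N bytes moved
            best = min(best, time.perf_counter() - t0)

        print(f"scale bandwidth ~ {2 * 8 * N / best / 1e9:.1f} GB/s")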

    Static and Dynamic Scheduling for Effective Use of Multicore Systems

    Multicore systems have gained increasing importance in high-performance computers. Compared to traditional microarchitectures, multicore architectures have a simpler design, a higher performance-to-area ratio, and improved power efficiency. Although the multicore architecture has various advantages, traditional parallel programming techniques do not exploit the new architecture efficiently. This dissertation addresses how to determine optimized thread schedules that improve data reuse on shared-memory multicore systems, and how to find a scalable approach to designing parallel software for both shared-memory and distributed-memory multicore systems. We propose an analytical cache model to predict the number of cache misses on the time-shared L2 cache of a multicore processor. The model provides insight into the impact of cache sharing and cache contention between threads. Building on the model, we develop a framework for affinity-based thread scheduling that determines optimized thread schedules to improve data reuse at all levels of a complex memory hierarchy. The framework includes a model to estimate the cost of a thread schedule, which consists of three submodels: an affinity graph submodel, a memory hierarchy submodel, and a cost submodel. Based on this model, we design a hierarchical graph partitioning algorithm to determine near-optimal schedules, and we extend the algorithm to support threads with data dependences. The algorithms are implemented and incorporated into a feedback-directed optimization prototype system. The prototype system builds upon a binary instrumentation tool and can greatly improve program performance on shared-memory multicore architectures. We also study a dynamic, data-availability-driven scheduling approach to designing new parallel software on distributed-memory multicore architectures, for which we have implemented a decentralized dynamic runtime system. The design of the runtime system focuses on scalability: at any time only a small portion of the task graph exists in memory, and we propose an algorithm that resolves data dependences in a distributed manner without process cooperation. Our experimental results demonstrate the scalability and practicality of the approach for both shared-memory and distributed-memory multicore systems. Finally, we present a scalable nonblocking topology-aware multicast scheme for distributed DAG scheduling applications.
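
    To make the data-availability-driven idea concrete, here is a hypothetical, stripped-down sketch of a scheduler that releases a task the moment its last input dependence is satisfied. The names and structure are illustrative only; the dissertation's runtime is decentralized, keeps only a window of the task graph in memory, and handles distributed-memory execution, none of which this toy captures.

        # Hypothetical sketch of data-availability-driven scheduling over a task DAG:
        # a task becomes ready as soon as all of its input dependences are satisfied.
        from collections import deque

        def run_dag(tasks, deps):
            """tasks: {name: callable}; deps: {name: set of prerequisite names}."""
            remaining = {t: len(deps.get(t, ())) for t in tasks}
            dependents = {t: [] for t in tasks}
            for t, prereqs in deps.items():
                for p in prereqs:
                    dependents[p].append(t)

            ready = deque(t for t, n in remaining.items() if n == 0)
            while ready:
                t = ready.popleft()
                tasks[t]()                        # execute the task
                for child in dependents[t]:       # its outputs are now available
                    remaining[child] -= 1
                    if remaining[child] == 0:
                        ready.append(child)

        # Example: C depends on A and B, D depends on C.
        run_dag({n: (lambda n=n: print("run", n)) for n in "ABCD"},
                {"C": {"A", "B"}, "D": {"C"}})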

    A High-Level Programming Language for Modelling the Earth

    Computational models based on the solution of partial differential equations (PDEs) play a key role in Earth systems simulations. The software implementing these models depends on the discretisation method, the data structures and the computer architecture. For this reason, it is difficult for scientists to implement new models without strong software engineering skills. In this paper, we present a computational modelling language, escript, based on the object-oriented scripting language Python. The language is designed to implement PDE-based models with a high degree of abstraction from the underlying discretisation techniques and their implementation. The main components of escript are the Data class, whose objects handle spatially distributed data, and the LinearPDE class, which defines the linear PDEs to be solved in each step of a time integration or non-linear iteration scheme. As an example we discuss the solution of the Lamé equation and the implementation of a quasi-static model for crustal fault systems.
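
    For flavour, the sketch below solves a simple Poisson problem in the style of the escript user guide's introductory example. The class and function names used here (LinearPDE, kronecker, whereZero, Lsup, Rectangle from esys.finley) are quoted from memory and may differ between escript versions, so treat this as an assumption-laden sketch rather than verified escript code.

        # Sketch in the style of escript's introductory example: solve
        # -div(grad u) = 1 on the unit square with u = 0 on the left and bottom edges.
        # Class and argument names follow the escript user guide from memory and
        # may not match every release exactly.
        from esys.escript import kronecker, whereZero, Lsup
        from esys.escript.linearPDEs import LinearPDE
        from esys.finley import Rectangle

        domain = Rectangle(l0=1., l1=1., n0=40, n1=40)    # FEM mesh of the unit square
        x = domain.getX()
        mask = whereZero(x[0]) + whereZero(x[1])          # Dirichlet boundary: x0 = 0 or x1 = 0

        pde = LinearPDE(domain)
        # LinearPDE's general form includes -div(A*grad(u)) = Y, with u fixed to r where q > 0.
        pde.setValue(A=kronecker(domain), Y=1., q=mask, r=0.)
        u = pde.getSolution()
        print("sup norm of u:", Lsup(u))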

    Jahresbericht 2006 zur kooperativen DV-Versorgung (Annual Report 2006 on Cooperative IT Services)

    Foreword. Overview of advertisers.
    Part I - On the work of the IT Commission: Members of the IT Commission; On the work of the Steering Committee for the ZIH.
    Part II:
    1 The Center for Information Services and High Performance Computing (ZIH): 1.1 Tasks; 1.2 Facts and figures (representative selection); 1.3 Budget; 1.4 Structure/staff; 1.5 Location; 1.6 Committee work.
    2 Communication infrastructure: 2.1 Usage overview of network services; 2.1.1 WiN IP traffic; 2.2 Network infrastructure; 2.2.1 General supply structure; 2.2.2 Network levels; 2.2.3 Backbone and local networking; 2.2.4 Print/copier network; 2.2.5 Wireless LAN (WLAN); 2.2.6 Data network between the university sites and external connectivity; 2.2.7 Data network to the dormitory sites; 2.3 Communication and information services; 2.3.1 Electronic mail; 2.3.1.1 Introduction of uniform e-mail addresses at TU Dresden; 2.3.1.2 Introduction of function-based TU mail addresses; 2.3.1.3 User mailboxes managed by the ZIH; 2.3.1.4 Web mail; 2.3.2 WWW; 2.3.3 Dial-up access; 2.3.4 Time service.
    3 Central services and servers: 3.1 User support (BB); 3.2 Trouble ticket system (TTS); 3.3 User management; 3.4 Login service; 3.5 Storage management; 3.5.1 Backup service; 3.5.2 File service; 3.6 License service; 3.7 Peripheral services; 3.8 PC pools; 3.9 Security.
    4 Services for decentralized IT systems: 4.1 General; 4.2 PC support; 4.2.1 Investment consulting; 4.2.2 Implementation; 4.2.3 Maintenance; 4.2.4 Notebook loans; 4.3 Microsoft Windows support; 4.4 Central software procurement for TU Dresden; 4.4.1 Working group activities; 4.4.2 Software deployment strategy at TU Dresden; 4.4.3 Software procurement.
    5 High performance computing: 5.1 High performance computer/storage complex (HRSK); 5.1.1 HRSK new building; 5.1.2 SGI Altix 3700 (stage 1a); 5.1.3 SGI Altix 4700; 5.1.4 Linux Networx PC farm (stage 1a); 5.1.5 Linux Networx PC farm; 5.2 Usage overview of the compute servers; 5.2.1 SGI Origin 3800; 5.2.2 NEC SX6i; 5.2.3 SGI Origin 2800; 5.2.4 User clusters; 5.3 Biological database service; 5.4 Application software; 5.5 Visualization; 5.6 Performance tools.
    6 Scientific cooperation and projects: 6.1 The "Competence Center for Video Conferencing Services" project; 6.1.1 Overview; 6.1.2 Tasks and development work; 6.1.3 New web presence; 6.1.4 Further activities; 6.1.5 The "DFNVideoConference" service - multipoint conferences in the G-WiN; 6.1.6 Trends and outlook; 6.2 D-Grid; 6.2.1 High Energy Physics Community Grid (HEP CG) - development of applications and components for data analysis in high energy physics in a national e-science environment; 6.2.2 MediGRID - resource fusion for medicine and the life sciences; 6.2.3 D-Grid integration project; 6.2.4 Chemomentum; 6.3 Biology; 6.3.1 BISON (biology-inspired techniques for self-organization in dynamic networks); 6.3.2 Understanding the molecular basis of the biogenesis and function of endocytosis; 6.3.3 Mathematical modeling and computer simulation of tumor growth and therapies; 6.3.4 Development of an SME-friendly breeding program for corals; 6.3.5 Analysis of spatio-temporal pattern formation in microorganisms; 6.3.6 Regeneration in the axolotl; 6.3.7 Development and analysis of stochastic interacting many-particle models of biological cell interaction; 6.3.8 MTBio competence network; 6.4 Performance evaluation; 6.4.1 Automatic detection of performance bottlenecks in parallel programs using their trace data; 6.4.2 SFB 609 - electromagnetic flow control in metallurgy, crystal growth and electrochemistry, subproject A1: numerical modeling of turbulent MFD flows; 6.5 Vendor cooperations; 6.5.1 Intel cooperation; 6.5.2 NEC cooperation.
    7 Vocational training and internships: 7.1 Training as an IT specialist, application development track; 7.2 Internships.
    8 Education and training events. 9 Events. 10 Publications.
    Part III - Reports of the central facilities and the central university administration: Audio-Visual Media Center (AVMZ); Teaching Center for Languages and Cultural Areas (LSK); University Archive; Central University Administration; MDC; Biotechnology Center (BIOTEC).
    Part IV - Report of the Saxon State Library - State and University Library Dresden.

    CosmoNet: fast cosmological parameter estimation in non-flat models using neural networks

    We present a further development of a method for accelerating the calculation of CMB power spectra, matter power spectra and likelihood functions for use in cosmological Bayesian inference. The algorithm, called CosmoNet, is based on training a multilayer perceptron neural network. We compute CMB power spectra (up to \ell = 2000) and matter transfer functions over a hypercube in parameter space encompassing the 4\sigma confidence region of a selection of CMB (WMAP + high-resolution experiments) and large-scale structure surveys (2dF and SDSS). We work in the framework of a generic 7-parameter non-flat cosmology. Additionally, we use CosmoNet to compute the WMAP 3-year, 2dF and SDSS likelihoods over the same region. We find that the average error in the power spectra is typically well below cosmic variance, and that the experimental likelihoods are calculated to within a fraction of a log unit. We demonstrate that marginalised posteriors generated with CosmoNet spectra agree to within a few percent with those generated by CAMB parallelised over 4 CPUs, but are obtained 2-3 times faster on just a single processor. Furthermore, posteriors generated directly via CosmoNet likelihoods can be obtained in less than 30 minutes on a single processor, corresponding to a speed-up of a factor of \sim 32. We also demonstrate the capabilities of CosmoNet by extending the CMB power spectra and matter transfer function training to a more generic 10-parameter cosmological model, including tensor modes, a varying equation of state of dark energy, and massive neutrinos. CosmoNet and interfaces to both CosmoMC and Bayesys are publicly available at www.mrao.cam.ac.uk/software/cosmonet. Comment: 8 pages, submitted to MNRAS.
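
    As an illustration of the emulation idea only (not the actual CosmoNet code, architecture, or training data), the sketch below fits a small multilayer perceptron that maps parameter vectors drawn from a hypercube to a spectrum-like output, using scikit-learn and synthetic placeholder spectra in place of Boltzmann-code output.

        # Illustrative emulator sketch (not CosmoNet): train a multilayer perceptron
        # mapping parameters theta -> a power-spectrum-like vector over a hypercube
        # in parameter space, then predict spectra almost instantly.
        # The training data here are synthetic placeholders; in the paper they come
        # from full Boltzmann-code (CAMB) evaluations.
        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)
        n_params, n_ell = 7, 200                 # 7-parameter model, coarse ell grid

        theta = rng.uniform(0.0, 1.0, size=(5000, n_params))     # scaled parameters
        ell = np.arange(2, 2 + n_ell)
        # Placeholder "spectra": smooth functions of the parameters.
        spectra = np.outer(theta @ rng.normal(size=n_params), 1.0 / ell) + np.sin(ell / 30.0)

        mlp = MLPRegressor(hidden_layer_sizes=(50, 50), max_iter=2000, random_state=0)
        mlp.fit(theta[:4000], spectra[:4000])

        pred = mlp.predict(theta[4000:])
        print("held-out RMS error:", np.sqrt(np.mean((pred - spectra[4000:]) ** 2)))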

    Fast cosmological parameter estimation using neural networks

    We present a method for accelerating the calculation of CMB power spectra, matter power spectra and likelihood functions for use in cosmological parameter estimation. The algorithm, called CosmoNet, is based on training a multilayer perceptron neural network and shares all the advantages of the recently released Pico algorithm of Fendt & Wandelt, but has several additional benefits in terms of simplicity, computational speed, memory requirements and ease of training. We demonstrate the capabilities of CosmoNet by computing CMB power spectra over a box in the parameter space of flat \Lambda CDM models containing the 3\sigma WMAP1 confidence region. We also use CosmoNet to compute the WMAP3 likelihood for flat \Lambda CDM models and show that the marginalised posteriors on the derived parameters are very similar to those obtained using CAMB and the WMAP3 code. We find that the average error in the power spectra is typically 2-3% of cosmic variance, and that CosmoNet is \sim 7 \times 10^4 times faster than CAMB (for flat models) and \sim 6 \times 10^6 times faster than the official WMAP3 likelihood code. CosmoNet and an interface to CosmoMC are publicly available at www.mrao.cam.ac.uk/software/cosmonet. Comment: 5 pages, 5 figures, minor changes to match the version accepted by MNRAS Letters.
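
    The speed-ups quoted above matter because the spectra and likelihoods are evaluated inside a sampling loop. The hypothetical fragment below shows where such a surrogate likelihood would sit in a basic Metropolis-Hastings sampler; the function name emulated_loglike, the parameter dimension and the proposal scale are placeholders, not part of CosmoNet or CosmoMC.

        # Hypothetical Metropolis-Hastings loop showing where an emulated likelihood
        # (a trained network standing in for CAMB plus the WMAP3 likelihood code)
        # would be evaluated. `emulated_loglike` is a placeholder, not CosmoNet's API.
        import numpy as np

        def emulated_loglike(theta):
            # Stand-in surrogate: in practice this would be the neural-network
            # prediction of the log-likelihood at parameters theta.
            return -0.5 * np.sum((theta - 0.3) ** 2 / 0.01)

        rng = np.random.default_rng(1)
        theta = np.full(6, 0.5)                    # starting point in a 6-parameter space
        logp = emulated_loglike(theta)
        chain = []

        for _ in range(20000):
            prop = theta + rng.normal(scale=0.02, size=theta.size)
            logp_prop = emulated_loglike(prop)
            if np.log(rng.uniform()) < logp_prop - logp:   # accept/reject step
                theta, logp = prop, logp_prop
            chain.append(theta.copy())

        print("posterior mean:", np.mean(chain, axis=0))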

    Interface modeling in incompressible media using level sets in Escript

    We use a finite element (FEM) formulation of the level set method to model geological fluid flow problems involving interface propagation. Interface problems are ubiquitous in geophysics. Here we focus on a Rayleigh-Taylor instability, namely the evolution of mantle plumes, and on the growth of lava domes. Both problems require an accurate description of the propagation of an interface, between heavy and light materials (plume) or between highly viscous lava and low-viscosity air (lava dome), respectively. The implementation of the models is based on Escript, a Python module for the solution of partial differential equations (PDEs) using spatial discretization techniques such as FEM. It is designed to describe numerical models in the language of PDEs while using computational components implemented in C and C++ to achieve high performance for time-intensive numerical calculations. A critical step in the solution of geological flow problems is the solution of the velocity-pressure problem. We describe how the Escript module can be used for a high-level implementation of an efficient variant of the well-known Uzawa scheme. We begin with a brief outline of the Escript module and then present illustrations of its usage for the numerical solution of the problems mentioned above.
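
    To indicate what the Uzawa scheme looks like at the algebraic level, here is a small dense NumPy sketch for a generic saddle-point (velocity-pressure) system. The matrices are random stand-ins and the relaxation parameter is chosen from an explicitly formed Schur complement, both assumptions made only for this toy; the Escript implementation described in the paper works at the PDE level with iterative solvers and never forms such matrices.

        # Dense NumPy sketch of the classical Uzawa iteration for a saddle-point
        # (velocity-pressure) system  [[A, B^T], [B, 0]] [u; p] = [f; g].
        # Toy illustration only; see the lead-in for the assumptions made here.
        import numpy as np

        rng = np.random.default_rng(0)
        n, m = 8, 3                                # velocity / pressure dimensions
        M = rng.normal(size=(n, n))
        A = M @ M.T + n * np.eye(n)                # symmetric positive definite "viscosity" block
        B = rng.normal(size=(m, n))                # discrete divergence operator
        f = rng.normal(size=n)
        g = np.zeros(m)                            # incompressibility: B u = 0

        # For this toy we can afford to form the Schur complement to pick a good
        # relaxation parameter; the Uzawa scheme itself never needs it explicitly.
        evals = np.linalg.eigvalsh(B @ np.linalg.solve(A, B.T))
        omega = 2.0 / (evals[0] + evals[-1])

        u, p = np.zeros(n), np.zeros(m)
        for k in range(5000):
            u = np.linalg.solve(A, f - B.T @ p)    # velocity solve with current pressure
            r = B @ u - g                          # divergence residual
            p = p + omega * r                      # pressure (Richardson) update
            if np.linalg.norm(r) < 1e-8:
                break

        print(f"iterations: {k + 1}, final |B u - g| = {np.linalg.norm(r):.2e}")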