
    FFT for the APE Parallel Computer

    We present a parallel FFT algorithm for SIMD systems following the 'Transpose Algorithm' approach. The method is based on the assignment of the data field onto a 1-dimensional ring of systolic cells. The systolic array can be universally mapped onto any parallel system. In particular, for systems with next-neighbour connectivity our method has the potential to improve the efficiency of matrix transposition by use of hyper-systolic communication. We have realized a scalable parallel FFT on the APE100/Quadrics massively parallel computer, where our implementation is part of a 2-dimensional hydrodynamics code for turbulence studies. A possible generalization to 4-dimensional FFT is presented, having in mind QCD applications. Comment: 17 pages, 13 figures, figures included.
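    A minimal single-process sketch may make the transpose structure concrete, with NumPy standing in for the whole machine: 1-D FFTs along rows, a global transpose, then row FFTs again. On a SIMD machine each processor would hold a slab of rows and the transpose would become the systolic (or hyper-systolic) communication step; nothing below is taken from the paper's own code.

        import numpy as np

        def transpose_fft2(a):
            """2-D FFT computed as: row FFTs, transpose, row FFTs, transpose back."""
            step1 = np.fft.fft(a, axis=1)        # independent 1-D FFTs along each row
            step2 = np.fft.fft(step1.T, axis=1)  # global transpose, then row FFTs again
            return step2.T                       # transpose back to the original layout

        # The result agrees with a direct 2-D FFT.
        rng = np.random.default_rng(0)
        a = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
        assert np.allclose(transpose_fft2(a), np.fft.fft2(a))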

    New massively parallel computer

    With the new Hitachi SR2201 supercomputer, the computing centre provides its users with a further interesting parallel computer with good prospects for the future.

    Studies on file systems and interconnection networks for enterprise servers

    Degree system: new ; Ministry of Education report number: Otsu No. 1956 ; Degree type: Doctor of Engineering ; Date conferred: 2005/3/3 ; Waseda University degree number: Shin 403

    The Collective Computation Model: an efficient methodology for extending the message-passing library model with nested data parallelism

    The Collective Computation Model is proposed for the efficient translation of algorithms with nested data parallelism onto real parallel architectures. The model is characterised by a triple (M, Div, Col), where M represents the parallel platform, Div is the set of division functions, and Col is the set of collective functions. A function is said to be collective when it is carried out by all the processors of the current set. Processor sets can be divided using the functions in Div. A proposal is made for an efficient implementation of the division processes, with the underlying idea that each processor in one of the subsets produced by the split maintains a relationship with one (or more) of the processors in the other subsets. This relationship determines the communication of the results produced by the task carried out by the set to which the processor belongs. This division structure gives rise to communication patterns resembling those of a hypercube: the dimension is determined by the number of divisions requested, while the arity in each dimension equals the number of subsets requested. As in a conventional k-ary hypercube, a dimension divides the set into k subsets that communicate across that dimension; unlike the conventional case, however, subsets opposite along a dimension need not have the same cardinality. The resulting structures have been named Dynamic Hypercubes. A classification of parallel problems is presented, based on the characteristics of their input and output data with respect to the view the machine's processors have of them; the nomenclature introduced is used to characterise the problems presented in the thesis. Examples are given of both kinds of algorithm, termed Collective Computation and Common Collective Computation; the latter solve a specific class of problems in the classification introduced. For both kinds of algorithm, different ways of introducing load balancing are studied, together with the results each of them produces. A tool, La Laguna C, is also presented, which is a concrete implementation of the ideas underlying the Collective Computation Model, and computational results obtained for several algorithms on different architectures are reported.
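    A minimal sketch of the division idea, assuming mpi4py (this illustrates the model only; it is not La Laguna C's actual interface): the current processor set is split into k subsets, and each processor records its "opposite" peers in the other subsets, giving the k-ary hypercube-like communication pattern described above.

        from mpi4py import MPI

        def divide(comm, k):
            """Split `comm` into k subsets; also return the ranks (in `comm`) of
            this processor's opposite peers in the other k-1 subsets."""
            rank, size = comm.Get_rank(), comm.Get_size()
            colour = rank % k                        # which of the k subsets we join
            sub = comm.Split(color=colour, key=rank)
            base = rank - colour                     # first rank in this "slice"
            peers = [base + c for c in range(k)
                     if c != colour and base + c < size]
            return sub, peers

        comm = MPI.COMM_WORLD
        sub, peers = divide(comm, k=2)  # one division step: one dimension of arity 2
        # Each subset now runs its task on `sub`; results are later exchanged
        # with `peers` across the dimension. If the set size is not divisible
        # by k, opposite subsets have different cardinals, as in the model.
        print(f"rank {comm.Get_rank()}: subset size {sub.Get_size()}, peers {peers}")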

    HSP-Wrap: The Design and Evaluation of Reusable Parallelism for a Subclass of Data-Intensive Applications

    There is an increasing gap between the rate at which data is generated by scientific and non-scientific fields and the rate at which that data can be processed by available computing resources. In this paper, we introduce the fields of Bioinformatics and Cheminformatics, two fields where big data has become a problem due to continuing advances in the technologies that drive them, such as gene sequencing and small-ligand exploration. We introduce high performance computing as a means to process this growing base of data in order to facilitate knowledge discovery. We enumerate the goals of the project, including reusability, efficiency, reliability, and scalability. We then describe the implementation of a software scheduler which aims to improve the input and output performance of a targeted collection of informatics tools, as well as the profiling and optimization needed to tune the software. We evaluate the performance of the software with a scalability study of the Bioinformatics tools BLAST, HMMER, and MUSCLE, as well as the Cheminformatics tool DOCK6.
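    A hypothetical sketch of the wrapper-scheduler idea in Python (run_tool and the other names below are illustrative stand-ins, not HSP-Wrap's actual interface): instead of launching one tool process per input, a master batches many small inputs and feeds them to a pool of persistent workers, amortising start-up and file-system costs.

        import multiprocessing as mp

        def run_tool(batch):
            """Stand-in for running an informatics tool (e.g. BLAST) on a batch
            of queries; returns one result per query."""
            return [f"result({q})" for q in batch]

        def schedule(queries, batch_size=64, workers=4):
            batches = [queries[i:i + batch_size]
                       for i in range(0, len(queries), batch_size)]
            with mp.Pool(workers) as pool:
                results = []
                for out in pool.imap_unordered(run_tool, batches):
                    results.extend(out)  # outputs are merged once, centrally
            return results

        if __name__ == "__main__":
            print(len(schedule([f"seq{i}" for i in range(1000)])))  # 1000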

    Performance analysis of wormhole routing in multicomputer interconnection networks

    Perhaps the most critical component in determining the ultimate performance potential of a multicomputer is its interconnection network, the hardware fabric supporting communication among individual processors. The message latency and throughput of such a network are affected by many factors, of which topology, switching method, routing algorithm and traffic load are the most significant. In this context, the present study focuses on a performance analysis of k-ary n-cube networks employing wormhole switching, virtual channels and adaptive routing, a scenario of special interest to current research. This project aims to build upon earlier work in two main ways: constructing new analytical models for k-ary n-cubes, and comparing the performance merits of cubes of different dimensionality. To this end, some important topological properties of k-ary n-cubes are explored initially; in particular, expressions are derived to calculate the number of nodes at/within a given distance from a chosen centre. These results are important in their own right, but their primary significance here is to assist in the construction of new and more realistic analytical models of wormhole-routed k-ary n-cubes. An accurate analytical model for wormhole-routed k-ary n-cubes with adaptive routing and uniform traffic is then developed, incorporating the use of virtual channels and the effect of locality in the traffic pattern. New models are constructed for wormhole k-ary n-cubes, with the ability to simulate behaviour under adaptive routing and non-uniform communication workloads, such as hotspot traffic, matrix-transpose and digit-reversal permutation patterns. The models are equally applicable to unidirectional and bidirectional k-ary n-cubes and are significantly more realistic than any in use up to now. With this level of accuracy, the effect of each important network parameter on the overall network performance can be investigated in a more comprehensive manner than before. Finally, k-ary n-cubes of different dimensionality are compared using the new models. The comparison takes account of various traffic patterns and implementation costs, using both pin-out and bisection bandwidth as metrics. Networks with both normal and pipelined channels are considered. While previous similar studies have only taken account of network channel costs, our model incorporates router costs as well, thus generating more realistic results. In fact, the results of this work differ markedly from those yielded by earlier studies which assumed deterministic routing and uniform traffic, illustrating the importance of using accurate models to conduct such analyses.
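    As a small illustration of the topological counts mentioned above (a brute-force check, not the thesis's closed-form expressions), the number of nodes at or within a given distance of a centre node in a bidirectional k-ary n-cube can be obtained by enumerating all k**n nodes with the per-dimension torus distance min(d, k-d):

        from itertools import product

        def nodes_at_distance(k, n, dist):
            """Nodes exactly `dist` hops from node (0, ..., 0) in a bidirectional
            k-ary n-cube, counted by brute-force enumeration."""
            return sum(1 for node in product(range(k), repeat=n)
                       if sum(min(d, k - d) for d in node) == dist)

        def nodes_within_distance(k, n, dist):
            return sum(nodes_at_distance(k, n, d) for d in range(dist + 1))

        # 8-ary 2-cube (an 8x8 torus): 4 neighbours at distance 1,
        # 5 nodes within distance 1 (the centre plus its neighbours).
        assert nodes_at_distance(8, 2, 1) == 4
        assert nodes_within_distance(8, 2, 1) == 5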