
    FFT for the APE Parallel Computer

    We present a parallel FFT algorithm for SIMD systems following the 'Transpose Algorithm' approach. The method is based on the assignment of the data field onto a 1-dimensional ring of systolic cells. The systolic array can be universally mapped onto any parallel system. In particular, for systems with next-neighbour connectivity our method has the potential to improve the efficiency of matrix transposition by use of hyper-systolic communication. We have realized a scalable parallel FFT on the APE100/Quadrics massively parallel computer, where our implementation is part of a 2-dimensional hydrodynamics code for turbulence studies. A possible generalization to 4-dimensional FFT is presented, having in mind QCD applications. Comment: 17 pages, 13 figures, figures included.
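    A minimal single-process sketch may make the transpose structure concrete, with NumPy standing in for the whole machine: 1-D FFTs along rows, a global transpose, then row FFTs again. On a SIMD machine each processor would hold a slab of rows and the transpose would become the systolic (or hyper-systolic) communication step; nothing below is taken from the paper's own code.

        import numpy as np

        def transpose_fft2(a):
            """2-D FFT computed as: row FFTs, transpose, row FFTs, transpose back."""
            step1 = np.fft.fft(a, axis=1)        # independent 1-D FFTs along each row
            step2 = np.fft.fft(step1.T, axis=1)  # global transpose, then row FFTs again
            return step2.T                       # transpose back to the original layout

        # The result agrees with a direct 2-D FFT.
        rng = np.random.default_rng(0)
        a = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
        assert np.allclose(transpose_fft2(a), np.fft.fft2(a))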

    New massively parallel computer

    With the new Hitachi SR2201 supercomputer, the computing centre provides its users with a further interesting parallel computer with good prospects for the future.

    Studies on file systems and interconnection networks for enterprise servers

    Degree system: new ; Ministry of Education report number: Otsu No. 1956 ; Degree type: Doctor of Engineering ; Date conferred: 2005/3/3 ; Waseda University degree number: Shin 403

    The Collective Computation Model: an efficient methodology for extending the message-passing library model with nested data parallelism

    The Collective Computation Model is proposed for the efficient translation of algorithms with nested data parallelism onto real parallel architectures. The model is characterised by a triple (M, Div, Col), where M represents the parallel platform, Div is the set of division functions, and Col is the set of collective functions. A function is said to be collective when it is carried out by all the processors of the current set. Processor sets can be divided using the functions in Div. A proposal is made for an efficient implementation of the division processes, with the underlying idea that each processor in one of the subsets produced by the split maintains a relationship with one (or more) of the processors in the other subsets. This relationship determines the communication of the results produced by the task carried out by the set to which the processor belongs. This division structure gives rise to communication patterns resembling those of a hypercube: the dimension is determined by the number of divisions requested, while the arity in each dimension equals the number of subsets requested. As in a conventional k-ary hypercube, a dimension divides the set into k subsets that communicate across that dimension; unlike the conventional case, however, subsets opposite along a dimension need not have the same cardinality. The resulting structures have been named Dynamic Hypercubes. A classification of parallel problems is presented, based on the characteristics of their input and output data with respect to the view the machine's processors have of them; the nomenclature introduced is used to characterise the problems presented in the thesis. Examples are given of both kinds of algorithm, termed Collective Computation and Common Collective Computation; the latter solve a specific class of problems in the classification introduced. For both kinds of algorithm, different ways of introducing load balancing are studied, together with the results each of them produces. A tool, La Laguna C, is also presented, which is a concrete implementation of the ideas underlying the Collective Computation Model, and computational results obtained for several algorithms on different architectures are reported.
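    A minimal sketch of the division idea, assuming mpi4py (this illustrates the model only; it is not La Laguna C's actual interface): the current processor set is split into k subsets, and each processor records its "opposite" peers in the other subsets, giving the k-ary hypercube-like communication pattern described above.

        from mpi4py import MPI

        def divide(comm, k):
            """Split `comm` into k subsets; also return the ranks (in `comm`) of
            this processor's opposite peers in the other k-1 subsets."""
            rank, size = comm.Get_rank(), comm.Get_size()
            colour = rank % k                        # which of the k subsets we join
            sub = comm.Split(color=colour, key=rank)
            base = rank - colour                     # first rank in this "slice"
            peers = [base + c for c in range(k)
                     if c != colour and base + c < size]
            return sub, peers

        comm = MPI.COMM_WORLD
        sub, peers = divide(comm, k=2)  # one division step: one dimension of arity 2
        # Each subset now runs its task on `sub`; results are later exchanged
        # with `peers` across the dimension. If the set size is not divisible
        # by k, opposite subsets have different cardinals, as in the model.
        print(f"rank {comm.Get_rank()}: subset size {sub.Get_size()}, peers {peers}")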

    HSP-Wrap: The Design and Evaluation of Reusable Parallelism for a Subclass of Data-Intensive Applications

    There is an increasing gap between the rate at which data is generated by scientific and non-scientific fields and the rate at which that data can be processed by available computing resources. In this paper, we introduce the fields of Bioinformatics and Cheminformatics, two fields where big data has become a problem due to continuing advances in the technologies that drive them, such as gene sequencing and small-ligand exploration. We introduce high performance computing as a means to process this growing base of data in order to facilitate knowledge discovery. We enumerate the goals of the project, including reusability, efficiency, reliability, and scalability. We then describe the implementation of a software scheduler which aims to improve the input and output performance of a targeted collection of informatics tools, as well as the profiling and optimization needed to tune the software. We evaluate the performance of the software with a scalability study of the Bioinformatics tools BLAST, HMMER, and MUSCLE, as well as the Cheminformatics tool DOCK6.
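    A hypothetical sketch of the wrapper-scheduler idea in Python (run_tool and the other names below are illustrative stand-ins, not HSP-Wrap's actual interface): instead of launching one tool process per input, a master batches many small inputs and feeds them to a pool of persistent workers, amortising start-up and file-system costs.

        import multiprocessing as mp

        def run_tool(batch):
            """Stand-in for running an informatics tool (e.g. BLAST) on a batch
            of queries; returns one result per query."""
            return [f"result({q})" for q in batch]

        def schedule(queries, batch_size=64, workers=4):
            batches = [queries[i:i + batch_size]
                       for i in range(0, len(queries), batch_size)]
            with mp.Pool(workers) as pool:
                results = []
                for out in pool.imap_unordered(run_tool, batches):
                    results.extend(out)  # outputs are merged once, centrally
            return results

        if __name__ == "__main__":
            print(len(schedule([f"seq{i}" for i in range(1000)])))  # 1000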

    Performance analysis of wormhole routing in multicomputer interconnection networks

    Perhaps the most critical component in determining the ultimate performance potential of a multicomputer is its interconnection network, the hardware fabric supporting communication among individual processors. The message latency and throughput of such a network are affected by many factors, of which topology, switching method, routing algorithm and traffic load are the most significant. In this context, the present study focuses on a performance analysis of k-ary n-cube networks employing wormhole switching, virtual channels and adaptive routing, a scenario of special interest to current research. This project aims to build upon earlier work in two main ways: constructing new analytical models for k-ary n-cubes, and comparing the performance merits of cubes of different dimensionality. To this end, some important topological properties of k-ary n-cubes are explored initially; in particular, expressions are derived to calculate the number of nodes at/within a given distance from a chosen centre. These results are important in their own right, but their primary significance here is to assist in the construction of new and more realistic analytical models of wormhole-routed k-ary n-cubes. An accurate analytical model for wormhole-routed k-ary n-cubes with adaptive routing and uniform traffic is then developed, incorporating the use of virtual channels and the effect of locality in the traffic pattern. New models are constructed for wormhole k-ary n-cubes, with the ability to simulate behaviour under adaptive routing and non-uniform communication workloads, such as hotspot traffic, matrix-transpose and digit-reversal permutation patterns. The models are equally applicable to unidirectional and bidirectional k-ary n-cubes and are significantly more realistic than any in use up to now. With this level of accuracy, the effect of each important network parameter on the overall network performance can be investigated in a more comprehensive manner than before. Finally, k-ary n-cubes of different dimensionality are compared using the new models. The comparison takes account of various traffic patterns and implementation costs, using both pin-out and bisection bandwidth as metrics. Networks with both normal and pipelined channels are considered. While previous similar studies have only taken account of network channel costs, our model incorporates router costs as well, thus generating more realistic results. In fact, the results of this work differ markedly from those yielded by earlier studies which assumed deterministic routing and uniform traffic, illustrating the importance of using accurate models to conduct such analyses.
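    As a small illustration of the topological counts mentioned above (a brute-force check, not the thesis's closed-form expressions), the number of nodes at or within a given distance of a centre node in a bidirectional k-ary n-cube can be obtained by enumerating all k**n nodes with the per-dimension torus distance min(d, k-d):

        from itertools import product

        def nodes_at_distance(k, n, dist):
            """Nodes exactly `dist` hops from node (0, ..., 0) in a bidirectional
            k-ary n-cube, counted by brute-force enumeration."""
            return sum(1 for node in product(range(k), repeat=n)
                       if sum(min(d, k - d) for d in node) == dist)

        def nodes_within_distance(k, n, dist):
            return sum(nodes_at_distance(k, n, d) for d in range(dist + 1))

        # 8-ary 2-cube (an 8x8 torus): 4 neighbours at distance 1,
        # 5 nodes within distance 1 (the centre plus its neighbours).
        assert nodes_at_distance(8, 2, 1) == 4
        assert nodes_within_distance(8, 2, 1) == 5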