Search CORE

27 research outputs found

Interconnection Networks Embeddings and Efficient Parallel Computations.

Author: Abuelrub Emadeddin Mohamed
Publication venue: LSU Digital Commons
Publication date: 01/01/1993

To obtain greater performance, many processors are allowed to cooperate to solve a single problem. These processors communicate via an interconnection network or a bus. The most essential function of the underlying interconnection network is the efficient exchange of messages between processes on different processors. Parallel machines based on the hypercube topology have gained considerable respect in parallel computation because of the topology's many attractive properties. Researchers have introduced many variants of the hypercube, mainly to improve communication. The twisted hypercube is one of the most attractive of these variants: it preserves the important features of the hypercube and reduces its diameter by a factor of two. This dissertation investigates relations and transformations between various interconnection networks and the twisted hypercube, and explores its efficiency in parallel computation. The capability of the twisted hypercube to simulate complete binary trees, complete quad trees, and rings is demonstrated and compared with that of the hypercube. Finally, the fault tolerance of the twisted hypercube is investigated. We present optimal algorithms to simulate rings in a faulty twisted hypercube environment and compare them with those for the hypercube.
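For the hypercube baseline that the twisted hypercube is compared against, one classic way to place a complete binary tree is to number its nodes in inorder and use each number as a hypercube address. This is a minimal sketch under that assumption only. It targets the ordinary hypercube, not the twisted hypercube, and is not necessarily the construction used in the dissertation. The function names `inorder_labels` and `dilation` are hypothetical.

```python
def inorder_labels(height):
    """Number the nodes of a complete binary tree of the given height in
    inorder, starting from 1. Nodes use heap indexing: root = 1, children
    of node v are 2v and 2v + 1."""
    n_nodes = 2 ** (height + 1) - 1
    labels = {}
    counter = 1

    def visit(node):
        nonlocal counter
        if node > n_nodes:
            return
        visit(2 * node)          # left subtree first
        labels[node] = counter   # then the node itself
        counter += 1
        visit(2 * node + 1)      # then the right subtree

    visit(1)
    return labels


def dilation(height):
    """Largest Hamming distance between the hypercube addresses of two tree
    nodes joined by an edge, using inorder numbers as addresses. Labels lie
    in 1 .. 2**(height + 1) - 1, so they fit in a cube of dimension
    height + 1."""
    labels = inorder_labels(height)
    worst = 0
    for node, label in labels.items():
        for child in (2 * node, 2 * node + 1):
            if child in labels:
                worst = max(worst, bin(label ^ labels[child]).count("1"))
    return worst
```

For example, `dilation(4)` evaluates to 2. A 31-node tree fits in a 5-cube, and every tree edge is stretched over at most two hypercube links. A node labelled p·1·0…0 has its right child at p·1·1·0…0 (one bit away) and its left child at p·0·1·0…0 (two bits away).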

Louisiana State University

Processor allocation strategies for modified hypercubes

Author: Haravu Nagasimha G.
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/1992

Parallel processing has been widely accepted as the future of high-speed computing. Among the various parallel architectures proposed or implemented, the hypercube has shown a lot of promise because of its powerful properties, such as regular topology, fault tolerance, low diameter, simple routing, and the ability to efficiently emulate other architectures. The major drawback of the hypercube network is that it cannot be expanded in practice, because the number of communication ports on each processor grows as the logarithm of the total number of processors in the system. Therefore, once a hypercube supercomputer of a certain dimensionality has been built, any future expansion can be accomplished only by replacing the VLSI chips. This is an undesirable feature, and much work has been in progress to remove this obstacle and thus provide a platform for easier expansion. Modified hypercubes (MHs) have been proposed as the building blocks of hypercube-based systems that support incremental growth without introducing extra resources for individual hypercubes. However, processor allocation on MHs proves to be a challenge because their topology deviates slightly from that of the standard hypercube network. This thesis addresses processor allocation on MHs and proposes several strategies based, partially or entirely, on table look-up approaches. The task allocation strategies for standard hypercubes are studied and their suitability for MHs is evaluated. The proposed strategies are shown to have perfect subcube recognition ability and superior performance. Existing processor allocation strategies for pure hypercube networks are shown to be ineffective for MHs because they cannot recognize all available subcubes. A comparative analysis of the buddy strategy and the new strategies is carried out using simulation results.
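The following minimal sketch illustrates what incomplete subcube recognition means. It uses the buddy strategy on a standard n-cube, not on a modified hypercube, and it does not reproduce the thesis's table look-up strategies. The buddy strategy only considers aligned blocks of 2^k consecutive addresses. An exhaustive search, by contrast, finds every free k-subcube. The names `buddy_allocate`, `submasks`, and `free_subcubes` are hypothetical, and node availability is assumed to be a plain list of booleans.

```python
from itertools import combinations


def buddy_allocate(free, n, k):
    """Buddy strategy on an n-cube: take the first aligned block of 2**k
    consecutive addresses that is entirely free. Such a block is a k-subcube
    whose nodes share their high n - k address bits. Marks the block busy
    in `free` (a list of booleans) and returns it, or returns None."""
    size = 1 << k
    for base in range(0, 1 << n, size):
        block = list(range(base, base + size))
        if all(free[v] for v in block):
            for v in block:
                free[v] = False
            return block
    return None


def submasks(mask):
    """Yield every subset of the bits set in mask (including mask and 0)."""
    sub = mask
    while True:
        yield sub
        if sub == 0:
            return
        sub = (sub - 1) & mask


def free_subcubes(free, n, k):
    """Exhaustively list every entirely free k-subcube: pick k dimensions to
    vary, and a base address that is 0 in those dimensions."""
    found = []
    for dims in combinations(range(n), k):
        mask = sum(1 << d for d in dims)
        for base in range(1 << n):
            if base & mask:
                continue  # base must be 0 in the varying dimensions
            nodes = sorted(base | sub for sub in submasks(mask))
            if all(free[v] for v in nodes):
                found.append(nodes)
    return found
```

Suppose only nodes 0, 3, 4 and 7 of a 3-cube are free. Then `buddy_allocate(free, 3, 1)` returns `None`, yet `free_subcubes(free, 3, 1)` finds `[0, 4]` and `[3, 7]`. In general, the buddy strategy considers only 2^(n-k) of the C(n, k)·2^(n-k) k-subcubes.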

Digital Commons @ New Jersey Institute of Technology (NJIT)

Parallel Computation on Hypercube-Like Machines.

Author: Kwon Kyung Hee
Publication venue: LSU Digital Commons
Publication date: 01/01/1991

The hypercube interconnection network has been recognized as very suitable for a parallel computing architecture due to its attractive topological properties. Recently, several modified hypercubes have been proposed to improve the performance of the hypercube. This dissertation deals with two modified hypercubes, the X-hypercube and the Z-cube. The X-hypercube is a variant of the hypercube with the same amount of hardware but a diameter of only $\lceil (n+1)/2 \rceil$ at dimension n, versus n for the hypercube. The Z-cube has only 75 percent of the edges of a hypercube with the same number of vertices, and the same diameter as the hypercube. In this dissertation, we investigate some topological properties of the X-hypercube and the Z-cube, and their effectiveness in combinatorial and computational terms. We give optimal or nearly optimal data communication algorithms, including routing, broadcasting, and the census function, for the X-hypercube and the Z-cube. We also give optimal embedding algorithms between the X-hypercube and the hypercube. It is shown that the average distance between vertices in an X-hypercube is roughly 13/16 of that in a hypercube, which implies that an X-hypercube achieves better average communication performance than a hypercube. In addition, a set of fundamental SIMD algorithms for the X-hypercube is given. Our results indicate that the X-hypercube improves on the performance of the hypercube, though not by as much as the reduction in diameter, and that the Z-cube is a good alternative to the hypercube where VLSI implementation is the major concern.
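The comparison above rests on two metrics, diameter and average distance. Both can be measured by running breadth-first search from every vertex. The abstract does not spell out the X-hypercube's links, so this minimal sketch measures the standard hypercube. The adjacency is passed in as a hypothetical `neighbors(v, n)` function, so a variant with 2^n vertices could be substituted. The sketch assumes a connected network with n ≥ 1. It does not reproduce the dissertation's 13/16 result.

```python
from collections import deque


def hypercube_neighbors(v, n):
    """Neighbors of vertex v in the n-dimensional hypercube: flip one bit."""
    return [v ^ (1 << i) for i in range(n)]


def distance_stats(n, neighbors=hypercube_neighbors):
    """Run BFS from every vertex of a 2**n-vertex network and return
    (diameter, average distance over ordered pairs of distinct vertices).
    `neighbors(v, n)` supplies the adjacency, so another topology with
    2**n vertices can be plugged in."""
    size = 1 << n
    total = 0
    diameter = 0
    for src in range(size):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in neighbors(u, n):
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())  # distance to src itself contributes 0
        diameter = max(diameter, max(dist.values()))
    return diameter, total / (size * (size - 1))
```

For the hypercube, `distance_stats(3)` gives a diameter of 3 and an average distance of 12/7. This matches the closed forms: diameter n and average n·2^(n-1)/(2^n - 1).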

Louisiana State University

Fast algorithm for real-time rings reconstruction

Author: Ammendola R.
Bauce Matteo
Biagioni A.
Capuani S.
Chiozzi Stefano
Cotta Ramusino Angelo
Di Domenico Giovanni
Fantechi R.
Fiorini Massimiliano
Giagu S.
Gianoli Alberto
Graverini E.
Lamanna Gianluca
Lonardo A.
Messina A.
Neri Ilaria
Palombo Marco
Pantaleo F.
Paolucci P.S.
Piandani R.
Pontisso L.
Rescigno M.
Simula F.
Sozzi Marco
Vicini P.
Publication venue: Verlag Deutsches Elektronen-Synchrotron
Publication date: 01/01/2015

The GAP project is dedicated to studying the application of GPUs in several contexts in which real-time response is important for making decisions. The definition of real-time depends on the application under study, ranging from response times of the order of μs up to several hours for very compute-intensive tasks. During this conference we presented our work on low-level triggers [1] [2] and high-level triggers [3] in high-energy physics experiments, and on specific applications for nuclear magnetic resonance (NMR) [4] [5] and cone-beam CT [6]. Apart from the study of dedicated solutions to decrease the latency due to data transport and preparation, the computing algorithms play an essential role in any GPU application. In this contribution, we show an original algorithm developed for trigger applications to accelerate ring reconstruction in the RICH detector when seeds for the reconstruction cannot be obtained from external trackers.
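The abstract does not describe the GAP algorithm itself. As background only, a common building block in RICH ring reconstruction is fitting a circle to the hit positions of one ring. This minimal sketch uses the Kasa algebraic least-squares circle fit, a standard technique swapped in for illustration. It is not the authors' seedless algorithm. It assumes the hits of a single ring have already been selected, and the function names are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("degenerate hit pattern (e.g. collinear points)")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for i in range(3):
            m[i][col] = b[i]
        solution.append(det3(m) / d)
    return solution


def fit_circle(points):
    """Kasa algebraic circle fit: choose D, E, F minimizing
    sum((x**2 + y**2 + D*x + E*y + F)**2), which is a linear least-squares
    problem, then convert to centre (cx, cy) and radius r."""
    sxx = sxy = syy = sx = sy = sxz = syz = sz = 0.0
    count = 0
    for x, y in points:
        z = x * x + y * y
        sxx += x * x
        sxy += x * y
        syy += y * y
        sx += x
        sy += y
        sxz += x * z
        syz += y * z
        sz += z
        count += 1
    # Normal equations for the unknowns [D, E, F].
    a = [[sxx, sxy, sx],
         [sxy, syy, sy],
         [sx, sy, count]]
    b = [-sxz, -syz, -sz]
    d_coef, e_coef, f_coef = solve3(a, b)
    cx, cy = -d_coef / 2, -e_coef / 2
    r = math.sqrt(cx * cx + cy * cy - f_coef)
    return cx, cy, r
```

The fit is closed-form and needs no starting guess, which suits latency-bound settings. However, it is known to be biased on short arcs, and it does not separate overlapping rings.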

DESY Publication Database

DESY

Archivio istituzionale della ricerca - Università di Ferrara

Archivio della ricerca- Università di Roma La Sapienza

CERN Document Server

HPCCP/CAS Workshop Proceedings 1998

Author: Mata Ellen
Schulbach Catherine

This publication is a collection of extended abstracts of presentations given at the HPCCP/CAS (High Performance Computing and Communications Program/Computational Aerosciences Project) Workshop held on August 24-26, 1998, at NASA Ames Research Center, Moffett Field, California. The objective of the Workshop was to bring together the aerospace high performance computing community, consisting of airframe and propulsion companies, independent software vendors, university researchers, and government scientists and engineers. The Workshop was sponsored by the HPCCP Office at NASA Ames Research Center. The Workshop consisted of over 40 presentations, including an overview of NASA's High Performance Computing and Communications Program and the Computational Aerosciences Project; ten sessions of papers representative of the high performance computing research conducted within the Program by the aerospace industry, academia, NASA, and other government laboratories; two panel sessions; and a special presentation by Mr. James Bailey.

NASA Technical Reports Server

The Fifth NASA Symposium on VLSI Design


The fifth annual NASA Symposium on VLSI Design had 13 sessions, including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems that can be used to increase the performance of data systems. The presentations share insights into next-generation advances that will serve as a basis for future VLSI design.

NASA Technical Reports Server

Bibliography of Lewis Research Center technical publications announced in 1985


This compilation of abstracts describes and indexes the technical reporting that resulted from the scientific and engineering work performed and managed by the Lewis Research Center in 1985. All the publications were announced in the 1985 issues of STAR (Scientific and Technical Aerospace Reports) and/or IAA (International Aerospace Abstracts). Included are research reports, journal articles, conference presentations, patents and patent applications, and theses.

NASA Technical Reports Server

Optimal Simulation of Linear Multiprocessor Architectures on Multiply-twisted Cube Using Generalized Gray Codes

Author: Latifi Shahram
Zheng S. Q.
Publication venue: Digital Scholarship@UNLV
Publication date: 01/06/1996

We consider the problem of simulating linear arrays and rings on the multiply twisted cube. We introduce a new concept, the reflected link label sequence, and use it to define a generalized Gray code (GGC). We show that GGCs can easily be used to identify Hamiltonian paths and cycles in the multiply twisted cube. We also give a method for embedding a ring with an arbitrary number of nodes into the multiply twisted cube.
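The generalized Gray codes above target the multiply twisted cube, whose links the abstract does not define. The construction they generalize is the ordinary reflected binary Gray code. It places the i-th node of a 2^n-node ring at hypercube address i XOR (i >> 1), so consecutive ring nodes, including the wrap-around pair, differ in exactly one bit. This minimal sketch covers the standard hypercube only, and the function names are hypothetical.

```python
def gray_code_ring(n):
    """Reflected binary Gray code: ring node i is mapped to hypercube
    address i ^ (i >> 1), for i = 0 .. 2**n - 1."""
    return [i ^ (i >> 1) for i in range(1 << n)]


def is_hamiltonian_cycle(order, n):
    """True if `order` visits every vertex of the n-cube exactly once and
    each consecutive pair (wrapping around) differs in exactly one bit."""
    size = 1 << n
    if sorted(order) != list(range(size)):
        return False
    return all(bin(order[i] ^ order[(i + 1) % size]).count("1") == 1
               for i in range(size))
```

For n = 3, the ring order is `[0, 1, 3, 2, 6, 7, 5, 4]`, and the last address, 4, is one bit away from 0. The multiply twisted cube also needs rings of arbitrary length, which this power-of-two construction does not cover.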

University of Nevada, Las Vegas Repository

Domänen parallele Maschinen (Domain-Parallel Machines)

Author: Montag Aaron
Publication venue: Technische Universität München

This thesis introduces a computational model that abstracts and idealizes computers with access to fragment shaders. The set of functions computable by this model is the same as for conventional models, but running times can be drastically reduced through parallelization. Some of the algorithms designed for the model can be approximated using fragment shaders. With an automatic transcompilation scheme, fragment shader programs can be generated automatically from a description in a high-level language.
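The abstract does not define the model precisely. As a hypothetical illustration of the general idea, a fragment shader computes each output cell independently from read-only input buffers, and multi-step algorithms are built from repeated passes. This minimal sketch simulates such passes sequentially in Python. The scan is the Hillis-Steele prefix sum, a standard technique chosen for illustration rather than one taken from the thesis. The names `shader_pass` and `prefix_sum` are hypothetical.

```python
def shader_pass(kernel, src):
    """Simulate one fragment-shader pass: each output cell i is computed by
    kernel(i, src) from the read-only previous buffer only, so all cells
    are independent and could be evaluated in parallel."""
    return [kernel(i, src) for i in range(len(src))]


def prefix_sum(values):
    """Inclusive prefix sum by repeated passes (Hillis-Steele scan).
    Returns the result and the number of passes, which for non-empty input
    is ceil(log2(len(values)))."""
    buf = list(values)
    offset = 1
    passes = 0
    while offset < len(buf):
        # Each cell adds the value `offset` positions to its left, if any.
        buf = shader_pass(
            lambda i, src, k=offset: src[i] + (src[i - k] if i >= k else 0),
            buf,
        )
        offset *= 2
        passes += 1
    return buf, passes
```

For example, `prefix_sum([1, 2, 3, 4, 5])` returns `([1, 3, 6, 10, 15], 3)`. If every cell of a pass runs in parallel, five values need three passes rather than four sequential additions, at the cost of more total work.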

Fast Volume Rendering and Deformation Algorithms

Author: Chen Haixin
Publication venue: Universität Mannheim
Publication date: 01/01/2001

Volume rendering is a technique for simultaneous visualization of the surfaces and inner structures of objects. However, the huge number of volume primitives (voxels) in a volume leads to high computational cost. In this dissertation I developed two algorithms for accelerating volume rendering and volume deformation. The first algorithm accelerates ray casting of volumes. Previous ray-casting acceleration techniques such as space leaping and early ray termination are efficient only when most voxels in a volume are either opaque or transparent. When many voxels are semi-transparent, the rendering time increases considerably. Our new algorithm improves the performance of ray casting of semi-transparently mapped volumes by exploiting opacity coherency in object space, leading to a speedup factor between 1.90 and 3.49 in rendering semi-transparent volumes. The acceleration is realized with the help of precomputed coherency distances. We developed an efficient algorithm to encode the coherency information, which requires less than 12 seconds for data sets with about 8 million voxels. The second algorithm is for volume deformation. Unlike traditional methods, our method combines the two stages of volume deformation, i.e. deformation and rendering, into a unified process. Instead of deforming each voxel to generate an intermediate deformed volume, the algorithm follows inversely deformed rays to generate the desired deformation, which saves the computation and memory needed to generate the intermediate volume. Deformation continuity is achieved by adaptive ray division, which matches the amplitude of local deformation. We propose approaches for shading and opacity adjustment that guarantee the visual plausibility of the deformation results. By incorporating early ray termination, space leaping, and the coherency acceleration technique into the new deformation algorithm, we achieve an additional deformation speedup factor of 2.34 to 6.58.
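Early ray termination, one of the techniques named above, can be shown with front-to-back alpha compositing along a single ray. Once accumulated opacity passes a threshold, the remaining samples are skipped. This minimal sketch assumes scalar intensities instead of RGB, pre-classified (intensity, opacity) samples, and a threshold of 0.95. All three are illustrative choices. The sketch does not implement the dissertation's coherency-distance method, and `composite_ray` is a hypothetical name.

```python
def composite_ray(samples, threshold=0.95):
    """Front-to-back alpha compositing along one ray with early ray
    termination. `samples` is a sequence of (intensity, opacity) pairs
    ordered from the eye outward. Returns (intensity, opacity, samples used)."""
    color = 0.0
    alpha = 0.0
    used = 0
    for c, a in samples:
        weight = (1.0 - alpha) * a  # contribution not yet occluded
        color += weight * c
        alpha += weight
        used += 1
        if alpha >= threshold:
            break  # later samples are (almost) fully hidden
    return color, alpha, used
```

A fully opaque first sample stops the ray after one step. With a constant per-sample opacity of 0.1, however, the loop still visits 29 samples before terminating. This is the semi-transparent case the abstract identifies as a weakness of early ray termination alone.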

MAnnheim DOCument Server