
    FFT for the APE Parallel Computer

    We present a parallel FFT algorithm for SIMD systems following the `Transpose Algorithm' approach. The method is based on the assignment of the data field onto a 1-dimensional ring of systolic cells. The systolic array can be universally mapped onto any parallel system. In particular, for systems with next-neighbour connectivity our method has the potential to improve the efficiency of matrix transposition by use of hyper-systolic communication. We have realized a scalable parallel FFT on the APE100/Quadrics massively parallel computer, where our implementation is part of a 2-dimensional hydrodynamics code for turbulence studies. A possible generalization to 4-dimensional FFT is presented, having in mind QCD applications. (Comment: 17 pages, 13 figures included.)
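
    The transpose approach can be sketched in a few lines. Below is a minimal serial illustration in plain NumPy, not the APE100/Quadrics implementation: a length-n1*n2 FFT becomes local FFTs along one axis, a twiddle multiplication, a global transpose, and local FFTs along the other axis. On the systolic ring each column would live on one cell, so the transpose is the only communication step.

        import numpy as np

        def four_step_fft(x, n1, n2):
            """Transpose-algorithm FFT of length n1*n2 (serial illustration)."""
            m = x.reshape(n1, n2)                  # cell j holds column j of the field
            m = np.fft.fft(m, axis=0)              # local FFTs of length n1
            k1 = np.arange(n1).reshape(n1, 1)
            j2 = np.arange(n2).reshape(1, n2)
            m = m * np.exp(-2j * np.pi * k1 * j2 / (n1 * n2))  # twiddle factors
            m = m.T                                # global transpose: the only communication
            m = np.fft.fft(m, axis=0)              # local FFTs of length n2
            return m.reshape(-1)                   # result in k2*n1 + k1 order

        x = np.random.rand(1024) + 0j
        assert np.allclose(four_step_fft(x, 32, 32), np.fft.fft(x))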

    Automatic visual recognition using parallel machines

    Invariant features and quick matching algorithms are two major concerns in the area of automatic visual recognition. The former reduces the size of an established model database, and the latter shortens the computation time. This dissertation discusses both line invariants under perspective projection and a parallel implementation of a dynamic programming technique for shape recognition. The feasibility of using parallel machines is demonstrated through the dramatically reduced time complexity. In this dissertation, our algorithms are implemented on the AP1000 MIMD parallel machine. For processing an object with n features, the time complexity of the proposed parallel algorithm is O(n), while that of a uniprocessor is O(n^2). Two applications, one for shape matching and the other for chain-code extraction, demonstrate the usefulness of our methods. Invariants from four general lines under perspective projection are also discussed. In contrast to approaches based on epipolar geometry, we investigate the invariants under isotropy subgroups. Theoretically speaking, two independent invariants can be found for four general lines in 3D space. In practice, we show how to obtain these two invariants from the projective images of four general lines without the need for camera calibration. A projective invariant recognition system based on a hypothesis-generation-testing scheme runs on the hypercube parallel architecture. Object recognition is achieved by matching the scene projective invariants to the model projective invariants, a process called transfer. A hypothesis-generation-testing scheme is then implemented on the hypercube parallel architecture.
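
    The O(n) versus O(n^2) claim comes from the wavefront structure of the dynamic programming table. A minimal sketch follows (ordinary edit distance between two 8-connected chain codes, not the dissertation's AP1000 code): cells on one anti-diagonal depend only on the two previous diagonals, so with one processor per cell each diagonal fills in constant time, giving O(n) parallel steps against O(n^2) sequential ones.

        def chain_code_distance(a, b):
            """Edit distance between two chain-code strings a and b."""
            n, m = len(a), len(b)
            d = [[0] * (m + 1) for _ in range(n + 1)]
            for i in range(n + 1):
                d[i][0] = i
            for j in range(m + 1):
                d[0][j] = j
            for i in range(1, n + 1):          # sequentially: row by row;
                for j in range(1, m + 1):      # in parallel: anti-diagonal by anti-diagonal
                    cost = 0 if a[i - 1] == b[j - 1] else 1
                    d[i][j] = min(d[i - 1][j] + 1,         # deletion
                                  d[i][j - 1] + 1,         # insertion
                                  d[i - 1][j - 1] + cost)  # substitution
            return d[n][m]

        print(chain_code_distance("0012345", "0112344"))   # -> 2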

    DiFi: Fast Distance Field Computation using Graphics Hardware

    We present a novel algorithm for fast computation of discretized distance fields using graphics hardware.

    Optimal expression evaluation for data parallel architectures

    A data parallel machine represents an array or other composite data structure by allocating one processor (at least conceptually) per data item. A pointwise operation can be performed between two such arrays in unit time, provided their corresponding elements are allocated in the same processors. If the arrays are not aligned in this fashion, the cost of moving one or both of them is part of the cost of the operation. The choice of where to perform the operation then affects this cost. If an expression with several operands is to be evaluated, there may be many choices of where to perform the intermediate operations. An efficient algorithm is given to find the minimum-cost way to evaluate an expression for several different data parallel architectures. This algorithm applies to any architecture in which the metric describing the cost of moving an array is robust. This encompasses most of the common data parallel communication architectures, including meshes of arbitrary dimension and hypercubes. Remarks are made on several variations of the problem, some of which are solved and some of which remain open.
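
    To make the cost model concrete, here is a hedged sketch of the single-operation subproblem, not the paper's algorithm (which optimizes over the whole expression tree): on a d-dimensional mesh with Manhattan move costs, the cheapest place to evaluate one pointwise operation is the coordinatewise median of the operand positions. The names cheapest_site and move_cost are illustrative.

        from statistics import median_low

        def cheapest_site(positions):
            """positions: list of d-tuples, one per operand array's alignment."""
            return tuple(median_low(p[axis] for p in positions)
                         for axis in range(len(positions[0])))

        def move_cost(positions, site):
            # total Manhattan distance the operands must be shifted
            return sum(sum(abs(p[a] - site[a]) for a in range(len(site)))
                       for p in positions)

        ops = [(0, 0), (4, 1), (2, 7)]       # operand alignments on a 2-D mesh
        site = cheapest_site(ops)
        print(site, move_cost(ops, site))    # -> (2, 1) 11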

    Sparse Volumetric Deformation

    Volume rendering is becoming increasingly popular as applications require realistic solid shape representations with seamless texture mapping and accurate filtering. However, rendering sparse volumetric data is difficult because of the limited memory and processing capabilities of current hardware. To address these limitations, the volumetric information can be stored at progressive resolutions in the hierarchical branches of a tree structure, and sampled according to the region of interest. This means that only a partial region of the full dataset is processed, and therefore massive volumetric scenes can be rendered efficiently. The problem with this approach is that it currently only supports static scenes, because it is difficult to accurately deform massive numbers of volume elements and reconstruct the scene hierarchy in real-time. Another problem is that deformation operations distort the shape where more than one volume element tries to occupy the same location, and conversely gaps occur where the deformation stretches elements apart by more than one discrete location. It is also challenging to efficiently support sophisticated deformations at hierarchical resolutions, such as character skinning or physically based animation. These types of deformation are expensive and require a control structure (for example a cage or skeleton) that maps to a set of features to accelerate the deformation process. The problems with this technique are that the varying volume hierarchy reflects different feature sizes, and manipulating the features at the original resolution is too expensive; therefore the control structure must also hierarchically capture features according to the varying volumetric resolution. This thesis investigates the area of deforming and rendering massive amounts of dynamic volumetric content. The proposed approach efficiently deforms hierarchical volume elements without introducing artifacts and supports both ray casting and rasterization renderers. This enables light transport to be modeled both accurately and efficiently, with applications in the fields of real-time rendering and computer animation. Sophisticated volumetric deformation, including character animation, is also supported in real-time. This is achieved by automatically generating a control skeleton which is mapped to the varying feature resolution of the volume hierarchy. The output deformations are demonstrated in massive dynamic volumetric scenes.
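
    The progressive-resolution idea can be sketched briefly; this is a hedged illustration, not the thesis code, and overlap, split, collect, and make are illustrative names. An octree stores filtered data at every level, and traversal refines only nodes intersecting the region of interest, so distant parts of a massive volume are emitted at coarse resolution.

        def overlap(a, b):     # axis-aligned boxes given as ((lo), (hi)) corner tuples
            return all(a[0][i] < b[1][i] and b[0][i] < a[1][i] for i in range(3))

        def split(box):        # yield the eight child boxes of an octant
            lo, hi = box
            mid = tuple((lo[i] + hi[i]) / 2 for i in range(3))
            for o in range(8):
                yield (tuple(mid[i] if (o >> i) & 1 else lo[i] for i in range(3)),
                       tuple(hi[i] if (o >> i) & 1 else mid[i] for i in range(3)))

        def collect(node, box, roi, out):
            """node = (payload, children-or-None); gather (box, payload) pairs to render."""
            payload, children = node
            if children is None or not overlap(box, roi):
                out.append((box, payload))   # leaf, or outside the ROI: stay coarse
            else:
                for child, cbox in zip(children, split(box)):
                    collect(child, cbox, roi, out)

        def make(level, tag):                # tiny synthetic two-level tree
            if level == 0:
                return (tag + "-fine", None)
            return (tag, [make(level - 1, tag + str(i)) for i in range(8)])

        out = []
        collect(make(2, "n"), ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
                ((0.0, 0.0, 0.0), (0.1, 0.1, 0.1)), out)
        print(len(out))   # 15: one octant refined into 8 fine bricks, 7 stay coarse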

    Hierarchical N-Body problem on graphics processor unit

    Galactic simulation is an important cosmological computation, and represents a classical N-body problem suitable for implementation on vector processors. The Barnes-Hut algorithm is a hierarchical N-body method used to simulate such galactic evolution systems. Stream processing architectures expose the data locality and concurrency available in multimedia applications. On the other hand, there are numerous compute-intensive scientific and engineering applications that can potentially benefit from such computational and communication models. These applications are traditionally implemented on vector processors. Stream-architecture-based graphics processing units (GPUs) present a novel computational alternative for efficiently implementing such high-performance applications. Rendering on a stream architecture sustains high performance, while user-programmable modules allow implementing complex algorithms efficiently. GPUs have evolved over the years from fixed-function pipelines to user-programmable processors. In this thesis, we focus on the implementation of the Barnes-Hut algorithm on typical current-generation programmable GPUs. We analyze the computation and communication requirements of the Barnes-Hut algorithm to show its suitability for user-programmable GPUs. Our implementation of the Barnes-Hut algorithm is formulated as a fragment shader targeting the selected GPU. We discuss implementation details, design issues, results, and challenges encountered in programming the fragment shader.
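
    For readers unfamiliar with the method, here is a minimal serial Barnes-Hut sketch in 2-D plain Python, not the thesis's fragment-shader formulation: bodies are inserted into a quadtree, internal cells accumulate total mass and centre of mass, and any cell whose angular size s/d falls below the opening angle theta is treated as a single pseudo-particle.

        import math

        class Cell:
            def __init__(self, cx, cy, half):
                self.cx, self.cy, self.half = cx, cy, half  # square region, half-width
                self.mass = 0.0
                self.mx = self.my = 0.0                     # mass-weighted position sums
                self.body = None
                self.kids = None

            def insert(self, x, y, m):
                if self.kids is None and self.body is None and self.mass == 0:
                    self.body = (x, y, m)                   # empty leaf: store body
                else:
                    if self.kids is None:                   # occupied leaf: subdivide
                        self.kids = [Cell(self.cx + dx * self.half / 2,
                                          self.cy + dy * self.half / 2, self.half / 2)
                                     for dx in (-1, 1) for dy in (-1, 1)]
                        bx, by, bm = self.body
                        self.body = None
                        self._child(bx, by).insert(bx, by, bm)
                    self._child(x, y).insert(x, y, m)
                self.mass += m
                self.mx += m * x
                self.my += m * y

            def _child(self, x, y):
                return self.kids[2 * (x >= self.cx) + (y >= self.cy)]

        def accel(cell, x, y, theta=0.5, eps=1e-2):
            if cell.mass == 0:
                return (0.0, 0.0)
            px, py = cell.mx / cell.mass, cell.my / cell.mass
            dx, dy = px - x, py - y
            d = math.hypot(dx, dy) + eps                    # softened distance
            if cell.kids is None or 2 * cell.half / d < theta:
                f = cell.mass / d**3                        # far or leaf: one term, G = 1
                return (f * dx, f * dy)
            ax = ay = 0.0
            for k in cell.kids:                             # otherwise recurse
                kx, ky = accel(k, x, y, theta, eps)
                ax += kx; ay += ky
            return (ax, ay)

        root = Cell(0.0, 0.0, 1.0)                          # square [-1, 1]^2
        for x, y, m in [(0.3, 0.2, 1.0), (-0.4, 0.1, 2.0), (0.6, -0.7, 1.5)]:
            root.insert(x, y, m)
        print(accel(root, 0.0, 0.0))                        # field at the origin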

    Doctor of Philosophy

    Partial differential equations (PDEs) are widely used in science and engineering to model phenomena such as sound, heat, and electrostatics. In many practical science and engineering applications, solving PDEs requires tessellating the computational domain into unstructured meshes, a computationally expensive and time-consuming process. Therefore, efficient and fast PDE solving techniques on unstructured meshes are important in these applications. Relative to CPUs, the faster growth in speed and the greater power efficiency of SIMD streaming processors, such as GPUs, have given them an increasingly important role in high-performance computing. Combining suitable parallel algorithms with these streaming processors, we can develop very efficient numerical PDE solvers. The contributions of this dissertation are twofold: two general strategies for designing efficient PDE solvers on GPUs, and specific applications of these strategies to different types of PDEs. The dissertation consists of four parts. First, we describe the general strategies: the domain decomposition strategy and the hybrid gathering strategy. Next, we introduce a parallel algorithm for efficiently solving the eikonal equation on fully unstructured meshes. Third, we present the algorithms and data structures necessary to move the entire FEM pipeline to the GPU. Fourth, we propose a parallel algorithm for solving the levelset equation on fully unstructured 2D or 3D meshes or manifolds. This algorithm combines a narrowband scheme with domain decomposition for efficient levelset equation solving.
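
    As a simplified analogue of the eikonal part (the dissertation works on fully unstructured meshes, which this sketch does not reproduce), a Jacobi-style relaxation on a regular grid shows why the problem maps well to SIMD hardware: every node update reads only its neighbours' previous values, so all nodes can be updated independently in parallel.

        import math

        def eikonal_jacobi(u, h=1.0, iters=200):
            """u: 2-D list, 0.0 at sources, math.inf elsewhere; speed f = 1."""
            n, m = len(u), len(u[0])
            for _ in range(iters):
                new = [row[:] for row in u]
                for i in range(n):                 # every cell update is independent:
                    for j in range(m):             # one GPU thread per cell
                        if u[i][j] == 0.0:
                            continue               # boundary condition
                        a = min(u[i - 1][j] if i else math.inf,
                                u[i + 1][j] if i + 1 < n else math.inf)
                        b = min(u[i][j - 1] if j else math.inf,
                                u[i][j + 1] if j + 1 < m else math.inf)
                        if abs(a - b) >= h:        # one-sided Godunov update
                            cand = min(a, b) + h
                        else:
                            cand = (a + b + math.sqrt(2 * h * h - (a - b) ** 2)) / 2
                        new[i][j] = min(u[i][j], cand)
                u = new
            return u

        grid = [[math.inf] * 16 for _ in range(16)]
        grid[0][0] = 0.0                           # single point source
        d = eikonal_jacobi(grid)
        print(round(d[15][15], 2))                 # close to 15*sqrt(2) ~ 21.2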

    A Relaxation Scheme for Mesh Locality in Computer Vision.

    Parallel processing has been considered the key to building the computer systems of the future and has become a mainstream subject in Computer Science. Computer Vision applications are computationally intensive and require parallel approaches to exploit their intrinsic parallelism. This research addresses this problem for low-level and intermediate-level vision problems. The contributions of this dissertation are a unified scheme based on probabilistic relaxation labeling that captures localities of image data, and the use of this scheme to develop efficient parallel algorithms for Computer Vision problems. We begin by investigating the problem of skeletonization. The technique of pattern matching, which exhausts all possible interaction patterns between a pixel and its neighboring pixels, captures the locality of this problem and leads to an efficient One-pass Parallel Asymmetric Thinning Algorithm (OPATA8). The use of 8-distance (chessboard distance) in this algorithm not only improves the quality of the resulting skeletons, but also improves the efficiency of the computation. This new algorithm plays an important role in a hierarchical route planning system that extracts high-level topological information from cross-country mobility maps, which greatly speeds up route searching over large areas. We generalize the neighborhood interaction description method to include more complicated applications such as edge detection and image restoration. The proposed probabilistic relaxation labeling scheme exploits parallelism by discovering local interactions in neighboring areas and by describing them effectively. The proposed scheme consists of a transformation function and a dictionary construction method. The non-linear transformation function is derived from Markov Random Field theory and efficiently combines evidence from neighborhood interactions. The dictionary construction method provides an efficient way to encode these localities. A case study applies the scheme to the problem of edge detection. The relaxation step of this edge-detection algorithm greatly reduces noise effects, improves edge localization at line ends and corners, and plays a crucial role in refining edge outputs. Experiments on both synthetic and natural images show that our algorithm converges quickly and is robust in noisy environments.
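
    For context, a generic probabilistic relaxation-labeling step in the classic Rosenfeld-Hummel-Zucker form is sketched below; the dissertation's scheme replaces this with an MRF-derived transformation function and a pattern dictionary, which the sketch does not reproduce. Each pixel update depends only on its neighbours' current label probabilities, which is the locality the parallel algorithms exploit.

        def relax_step(p, neighbours, r):
            """p[i][l]: label probabilities; r[l][m]: compatibility in [-1, 1]."""
            new = []
            for i, pi in enumerate(p):
                # support q[l]: how strongly neighbours' beliefs back label l
                q = [sum(r[l][m] * p[j][m]
                         for j in neighbours[i] for m in range(len(pi)))
                     / max(len(neighbours[i]), 1)
                     for l in range(len(pi))]
                raw = [pi[l] * (1 + q[l]) for l in range(len(pi))]
                s = sum(raw)
                new.append([v / s for v in raw])   # renormalize per pixel
            return new

        # two labels (edge / no-edge) on a 3-pixel line; edges reinforce each other
        r = [[0.8, -0.5], [-0.5, 0.3]]
        p = [[0.6, 0.4], [0.5, 0.5], [0.9, 0.1]]
        nb = [[1], [0, 2], [1]]
        for _ in range(5):
            p = relax_step(p, nb, r)
        print([round(x, 2) for x in p[1]])   # undecided middle pixel pulled toward "edge"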

    Power assignment problems in wireless communication

    A fundamental class of problems in wireless communication is concerned with the assignment of suitable transmission powers to wireless devices/stations such that the resulting communication graph satisfies certain desired properties and the overall energy consumed is minimized. Many concrete communication tasks in a wireless network, like broadcast, multicast, point-to-point routing, and creation of a communication backbone, can be regarded as such a power assignment problem. This paper considers several problems of that kind; for example, one problem studied before in (Vittorio Bilò et al.: Geometric Clustering to Minimize the Sum of Cluster Sizes, ESA 2005) and (Helmut Alt et al.: Minimum-cost coverage of point sets by disks, SCG 2006) aims to select and assign powers to $k$ of the stations such that all other stations are within reach of at least one of the selected stations. We improve the running time for obtaining a $(1+\epsilon)$-approximate solution for this problem from $n^{(\alpha/\epsilon)^{O(d)}}$, as reported by Bilò et al., to $O\left(n + \left(\frac{k^{2d+1}}{\epsilon^d}\right)^{\min\{2k,\;(\alpha/\epsilon)^{O(d)}\}}\right)$; that is, we obtain a running time that is linear in the network size. Further results include a constant-factor approximation algorithm for the TSP problem under squared (non-metric!) edge costs, which can be employed to implement a novel data aggregation protocol, as well as efficient schemes to perform $k$-hop multicasts.
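
    To make the objective concrete, here is a hedged brute-force illustration of the $k$-station coverage problem, not the paper's $(1+\epsilon)$-approximation scheme; cost and best_selection are illustrative names. Each station attaches to its nearest selected sender, a sender's power must reach its farthest client at cost radius**alpha, and we keep the cheapest choice of senders.

        from itertools import combinations
        import math

        def cost(points, centers, alpha=2):
            # power of a sender = (distance to its farthest attached client)**alpha
            radius = {c: 0.0 for c in centers}
            for p in points:
                c = min(centers, key=lambda s: math.dist(p, s))
                radius[c] = max(radius[c], math.dist(p, c))
            return sum(r ** alpha for r in radius.values())

        def best_selection(points, k, alpha=2):
            # exponential-time exact search, feasible only for tiny inputs
            return min(combinations(points, k),
                       key=lambda cs: cost(points, cs, alpha))

        pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
        print(best_selection(pts, 2))   # -> ((0, 0), (10, 10)), total cost 2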