
    FFT for the APE Parallel Computer

    We present a parallel FFT algorithm for SIMD systems following the `Transpose Algorithm' approach. The method is based on the assignment of the data field onto a 1-dimensional ring of systolic cells. The systolic array can be universally mapped onto any parallel system. In particular, for systems with next-neighbour connectivity our method has the potential to improve the efficiency of matrix transposition by use of hyper-systolic communication. We have realized a scalable parallel FFT on the APE100/Quadrics massively parallel computer, where our implementation is part of a 2-dimensional hydrodynamics code for turbulence studies. A possible generalization to 4-dimensional FFT is presented, having in mind QCD applications. (Comment: 17 pages, 13 figures included.)
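
    The transpose approach can be sketched in a few lines. Below is a minimal serial illustration in plain NumPy, not the APE100/Quadrics implementation: a length-n1*n2 FFT becomes local FFTs along one axis, a twiddle multiplication, a global transpose, and local FFTs along the other axis. On the systolic ring each column would live on one cell, so the transpose is the only communication step.

        import numpy as np

        def four_step_fft(x, n1, n2):
            """Transpose-algorithm FFT of length n1*n2 (serial illustration)."""
            m = x.reshape(n1, n2)                  # cell j holds column j of the field
            m = np.fft.fft(m, axis=0)              # local FFTs of length n1
            k1 = np.arange(n1).reshape(n1, 1)
            j2 = np.arange(n2).reshape(1, n2)
            m = m * np.exp(-2j * np.pi * k1 * j2 / (n1 * n2))  # twiddle factors
            m = m.T                                # global transpose: the only communication
            m = np.fft.fft(m, axis=0)              # local FFTs of length n2
            return m.reshape(-1)                   # result in k2*n1 + k1 order

        x = np.random.rand(1024) + 0j
        assert np.allclose(four_step_fft(x, 32, 32), np.fft.fft(x))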

    Automatic visual recognition using parallel machines

    Invariant features and quick matching algorithms are two major concerns in the area of automatic visual recognition. The former reduces the size of an established model database, and the latter shortens the computation time. This dissertation discusses both line invariants under perspective projection and a parallel implementation of a dynamic programming technique for shape recognition. The feasibility of using parallel machines is demonstrated through the dramatically reduced time complexity. In this dissertation, our algorithms are implemented on the AP1000 MIMD parallel machine. For processing an object with n features, the time complexity of the proposed parallel algorithm is O(n), while that of a uniprocessor is O(n^2). Two applications, one for shape matching and the other for chain-code extraction, demonstrate the usefulness of our methods. Invariants from four general lines under perspective projection are also discussed. In contrast to approaches based on epipolar geometry, we investigate the invariants under isotropy subgroups. Theoretically speaking, two independent invariants can be found for four general lines in 3D space. In practice, we show how to obtain these two invariants from the projective images of four general lines without the need for camera calibration. A projective invariant recognition system based on a hypothesis-generation-testing scheme runs on the hypercube parallel architecture. Object recognition is achieved by matching the scene projective invariants to the model projective invariants, a process called transfer. A hypothesis-generation-testing scheme is then implemented on the hypercube parallel architecture.
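
    The O(n) versus O(n^2) claim comes from the wavefront structure of the dynamic programming table. A minimal sketch follows (ordinary edit distance between two 8-connected chain codes, not the dissertation's AP1000 code): cells on one anti-diagonal depend only on the two previous diagonals, so with one processor per cell each diagonal fills in constant time, giving O(n) parallel steps against O(n^2) sequential ones.

        def chain_code_distance(a, b):
            """Edit distance between two chain-code strings a and b."""
            n, m = len(a), len(b)
            d = [[0] * (m + 1) for _ in range(n + 1)]
            for i in range(n + 1):
                d[i][0] = i
            for j in range(m + 1):
                d[0][j] = j
            for i in range(1, n + 1):          # sequentially: row by row;
                for j in range(1, m + 1):      # in parallel: anti-diagonal by anti-diagonal
                    cost = 0 if a[i - 1] == b[j - 1] else 1
                    d[i][j] = min(d[i - 1][j] + 1,         # deletion
                                  d[i][j - 1] + 1,         # insertion
                                  d[i - 1][j - 1] + cost)  # substitution
            return d[n][m]

        print(chain_code_distance("0012345", "0112344"))   # -> 2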

    DiFi: Fast Distance Field Computation using Graphics Hardware

    We present a novel algorithm for fast computation of discretized distance fields using graphics hardware.

    Optimal expression evaluation for data parallel architectures

    A data parallel machine represents an array or other composite data structure by allocating one processor (at least conceptually) per data item. A pointwise operation can be performed between two such arrays in unit time, provided their corresponding elements are allocated in the same processors. If the arrays are not aligned in this fashion, the cost of moving one or both of them is part of the cost of the operation. The choice of where to perform the operation then affects this cost. If an expression with several operands is to be evaluated, there may be many choices of where to perform the intermediate operations. An efficient algorithm is given to find the minimum-cost way to evaluate an expression for several different data parallel architectures. This algorithm applies to any architecture in which the metric describing the cost of moving an array is robust. This encompasses most of the common data parallel communication architectures, including meshes of arbitrary dimension and hypercubes. Remarks are made on several variations of the problem, some of which are solved and some of which remain open.
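
    To make the cost model concrete, here is a hedged sketch of the single-operation subproblem, not the paper's algorithm (which optimizes over the whole expression tree): on a d-dimensional mesh with Manhattan move costs, the cheapest place to evaluate one pointwise operation is the coordinatewise median of the operand positions. The names cheapest_site and move_cost are illustrative.

        from statistics import median_low

        def cheapest_site(positions):
            """positions: list of d-tuples, one per operand array's alignment."""
            return tuple(median_low(p[axis] for p in positions)
                         for axis in range(len(positions[0])))

        def move_cost(positions, site):
            # total Manhattan distance the operands must be shifted
            return sum(sum(abs(p[a] - site[a]) for a in range(len(site)))
                       for p in positions)

        ops = [(0, 0), (4, 1), (2, 7)]       # operand alignments on a 2-D mesh
        site = cheapest_site(ops)
        print(site, move_cost(ops, site))    # -> (2, 1) 11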

    Sparse Volumetric Deformation

    Volume rendering is becoming increasingly popular as applications require realistic solid shape representations with seamless texture mapping and accurate filtering. However, rendering sparse volumetric data is difficult because of the limited memory and processing capabilities of current hardware. To address these limitations, the volumetric information can be stored at progressive resolutions in the hierarchical branches of a tree structure, and sampled according to the region of interest. This means that only a partial region of the full dataset is processed, and therefore massive volumetric scenes can be rendered efficiently. The problem with this approach is that it currently only supports static scenes, because it is difficult to accurately deform massive numbers of volume elements and reconstruct the scene hierarchy in real-time. Another problem is that deformation operations distort the shape where more than one volume element tries to occupy the same location, and conversely gaps occur where the deformation stretches elements apart by more than one discrete location. It is also challenging to efficiently support sophisticated deformations at hierarchical resolutions, such as character skinning or physically based animation. These types of deformation are expensive and require a control structure (for example a cage or skeleton) that maps to a set of features to accelerate the deformation process. The problems with this technique are that the varying volume hierarchy reflects different feature sizes, and manipulating the features at the original resolution is too expensive; therefore the control structure must also hierarchically capture features according to the varying volumetric resolution. This thesis investigates the area of deforming and rendering massive amounts of dynamic volumetric content. The proposed approach efficiently deforms hierarchical volume elements without introducing artifacts and supports both ray casting and rasterization renderers. This enables light transport to be modeled both accurately and efficiently, with applications in the fields of real-time rendering and computer animation. Sophisticated volumetric deformation, including character animation, is also supported in real-time. This is achieved by automatically generating a control skeleton which is mapped to the varying feature resolution of the volume hierarchy. The output deformations are demonstrated in massive dynamic volumetric scenes.
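
    The progressive-resolution idea can be sketched briefly; this is a hedged illustration, not the thesis code, and overlap, split, collect, and make are illustrative names. An octree stores filtered data at every level, and traversal refines only nodes intersecting the region of interest, so distant parts of a massive volume are emitted at coarse resolution.

        def overlap(a, b):     # axis-aligned boxes given as ((lo), (hi)) corner tuples
            return all(a[0][i] < b[1][i] and b[0][i] < a[1][i] for i in range(3))

        def split(box):        # yield the eight child boxes of an octant
            lo, hi = box
            mid = tuple((lo[i] + hi[i]) / 2 for i in range(3))
            for o in range(8):
                yield (tuple(mid[i] if (o >> i) & 1 else lo[i] for i in range(3)),
                       tuple(hi[i] if (o >> i) & 1 else mid[i] for i in range(3)))

        def collect(node, box, roi, out):
            """node = (payload, children-or-None); gather (box, payload) pairs to render."""
            payload, children = node
            if children is None or not overlap(box, roi):
                out.append((box, payload))   # leaf, or outside the ROI: stay coarse
            else:
                for child, cbox in zip(children, split(box)):
                    collect(child, cbox, roi, out)

        def make(level, tag):                # tiny synthetic two-level tree
            if level == 0:
                return (tag + "-fine", None)
            return (tag, [make(level - 1, tag + str(i)) for i in range(8)])

        out = []
        collect(make(2, "n"), ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
                ((0.0, 0.0, 0.0), (0.1, 0.1, 0.1)), out)
        print(len(out))   # 15: one octant refined into 8 fine bricks, 7 stay coarse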

    Hierarchical N-Body problem on graphics processor unit

    Galactic simulation is an important cosmological computation, and represents a classical N-body problem suitable for implementation on vector processors. The Barnes-Hut algorithm is a hierarchical N-body method used to simulate such galactic evolution systems. Stream processing architectures expose the data locality and concurrency available in multimedia applications. On the other hand, there are numerous compute-intensive scientific and engineering applications that can potentially benefit from such computational and communication models. These applications are traditionally implemented on vector processors. Stream-architecture-based graphics processing units (GPUs) present a novel computational alternative for efficiently implementing such high-performance applications. Rendering on a stream architecture sustains high performance, while user-programmable modules allow implementing complex algorithms efficiently. GPUs have evolved over the years from fixed-function pipelines to user-programmable processors. In this thesis, we focus on the implementation of the Barnes-Hut algorithm on typical current-generation programmable GPUs. We analyze the computation and communication requirements of the Barnes-Hut algorithm to show its suitability for user-programmable GPUs. Our implementation of the Barnes-Hut algorithm is formulated as a fragment shader targeting the selected GPU. We discuss implementation details, design issues, results, and challenges encountered in programming the fragment shader.
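
    For readers unfamiliar with the method, here is a minimal serial Barnes-Hut sketch in 2-D plain Python, not the thesis's fragment-shader formulation: bodies are inserted into a quadtree, internal cells accumulate total mass and centre of mass, and any cell whose angular size s/d falls below the opening angle theta is treated as a single pseudo-particle.

        import math

        class Cell:
            def __init__(self, cx, cy, half):
                self.cx, self.cy, self.half = cx, cy, half  # square region, half-width
                self.mass = 0.0
                self.mx = self.my = 0.0                     # mass-weighted position sums
                self.body = None
                self.kids = None

            def insert(self, x, y, m):
                if self.kids is None and self.body is None and self.mass == 0:
                    self.body = (x, y, m)                   # empty leaf: store body
                else:
                    if self.kids is None:                   # occupied leaf: subdivide
                        self.kids = [Cell(self.cx + dx * self.half / 2,
                                          self.cy + dy * self.half / 2, self.half / 2)
                                     for dx in (-1, 1) for dy in (-1, 1)]
                        bx, by, bm = self.body
                        self.body = None
                        self._child(bx, by).insert(bx, by, bm)
                    self._child(x, y).insert(x, y, m)
                self.mass += m
                self.mx += m * x
                self.my += m * y

            def _child(self, x, y):
                return self.kids[2 * (x >= self.cx) + (y >= self.cy)]

        def accel(cell, x, y, theta=0.5, eps=1e-2):
            if cell.mass == 0:
                return (0.0, 0.0)
            px, py = cell.mx / cell.mass, cell.my / cell.mass
            dx, dy = px - x, py - y
            d = math.hypot(dx, dy) + eps                    # softened distance
            if cell.kids is None or 2 * cell.half / d < theta:
                f = cell.mass / d**3                        # far or leaf: one term, G = 1
                return (f * dx, f * dy)
            ax = ay = 0.0
            for k in cell.kids:                             # otherwise recurse
                kx, ky = accel(k, x, y, theta, eps)
                ax += kx; ay += ky
            return (ax, ay)

        root = Cell(0.0, 0.0, 1.0)                          # square [-1, 1]^2
        for x, y, m in [(0.3, 0.2, 1.0), (-0.4, 0.1, 2.0), (0.6, -0.7, 1.5)]:
            root.insert(x, y, m)
        print(accel(root, 0.0, 0.0))                        # field at the origin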

    Doctor of Philosophy

    Partial differential equations (PDEs) are widely used in science and engineering to model phenomena such as sound, heat, and electrostatics. In many practical science and engineering applications, solving PDEs requires tessellating the computational domain into unstructured meshes, a computationally expensive and time-consuming process. Therefore, efficient and fast PDE solving techniques on unstructured meshes are important in these applications. Relative to CPUs, the faster growth in speed and the greater power efficiency of SIMD streaming processors, such as GPUs, have given them an increasingly important role in high-performance computing. Combining suitable parallel algorithms with these streaming processors, we can develop very efficient numerical PDE solvers. The contributions of this dissertation are twofold: two general strategies for designing efficient PDE solvers on GPUs, and specific applications of these strategies to different types of PDEs. The dissertation consists of four parts. First, we describe the general strategies: the domain decomposition strategy and the hybrid gathering strategy. Next, we introduce a parallel algorithm for efficiently solving the eikonal equation on fully unstructured meshes. Third, we present the algorithms and data structures necessary to move the entire FEM pipeline to the GPU. Fourth, we propose a parallel algorithm for solving the levelset equation on fully unstructured 2D or 3D meshes or manifolds. This algorithm combines a narrowband scheme with domain decomposition for efficient levelset equation solving.
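
    As a simplified analogue of the eikonal part (the dissertation works on fully unstructured meshes, which this sketch does not reproduce), a Jacobi-style relaxation on a regular grid shows why the problem maps well to SIMD hardware: every node update reads only its neighbours' previous values, so all nodes can be updated independently in parallel.

        import math

        def eikonal_jacobi(u, h=1.0, iters=200):
            """u: 2-D list, 0.0 at sources, math.inf elsewhere; speed f = 1."""
            n, m = len(u), len(u[0])
            for _ in range(iters):
                new = [row[:] for row in u]
                for i in range(n):                 # every cell update is independent:
                    for j in range(m):             # one GPU thread per cell
                        if u[i][j] == 0.0:
                            continue               # boundary condition
                        a = min(u[i - 1][j] if i else math.inf,
                                u[i + 1][j] if i + 1 < n else math.inf)
                        b = min(u[i][j - 1] if j else math.inf,
                                u[i][j + 1] if j + 1 < m else math.inf)
                        if abs(a - b) >= h:        # one-sided Godunov update
                            cand = min(a, b) + h
                        else:
                            cand = (a + b + math.sqrt(2 * h * h - (a - b) ** 2)) / 2
                        new[i][j] = min(u[i][j], cand)
                u = new
            return u

        grid = [[math.inf] * 16 for _ in range(16)]
        grid[0][0] = 0.0                           # single point source
        d = eikonal_jacobi(grid)
        print(round(d[15][15], 2))                 # close to 15*sqrt(2) ~ 21.2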

    A Relaxation Scheme for Mesh Locality in Computer Vision.

    Parallel processing has been considered the key to building the computer systems of the future and has become a mainstream subject in Computer Science. Computer Vision applications are computationally intensive and require parallel approaches to exploit their intrinsic parallelism. This research addresses this problem for low-level and intermediate-level vision problems. The contributions of this dissertation are a unified scheme based on probabilistic relaxation labeling that captures localities of image data, and the use of this scheme to develop efficient parallel algorithms for Computer Vision problems. We begin by investigating the problem of skeletonization. The technique of pattern matching, which exhausts all possible interaction patterns between a pixel and its neighboring pixels, captures the locality of this problem and leads to an efficient One-pass Parallel Asymmetric Thinning Algorithm (OPATA8). The use of 8-distance (chessboard distance) in this algorithm not only improves the quality of the resulting skeletons, but also improves the efficiency of the computation. This new algorithm plays an important role in a hierarchical route planning system that extracts high-level topological information from cross-country mobility maps, which greatly speeds up route searching over large areas. We generalize the neighborhood interaction description method to include more complicated applications such as edge detection and image restoration. The proposed probabilistic relaxation labeling scheme exploits parallelism by discovering local interactions in neighboring areas and by describing them effectively. The proposed scheme consists of a transformation function and a dictionary construction method. The non-linear transformation function is derived from Markov Random Field theory and efficiently combines evidence from neighborhood interactions. The dictionary construction method provides an efficient way to encode these localities. A case study applies the scheme to the problem of edge detection. The relaxation step of this edge-detection algorithm greatly reduces noise effects, improves edge localization at line ends and corners, and plays a crucial role in refining edge outputs. Experiments on both synthetic and natural images show that our algorithm converges quickly and is robust in noisy environments.
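
    For context, a generic probabilistic relaxation-labeling step in the classic Rosenfeld-Hummel-Zucker form is sketched below; the dissertation's scheme replaces this with an MRF-derived transformation function and a pattern dictionary, which the sketch does not reproduce. Each pixel update depends only on its neighbours' current label probabilities, which is the locality the parallel algorithms exploit.

        def relax_step(p, neighbours, r):
            """p[i][l]: label probabilities; r[l][m]: compatibility in [-1, 1]."""
            new = []
            for i, pi in enumerate(p):
                # support q[l]: how strongly neighbours' beliefs back label l
                q = [sum(r[l][m] * p[j][m]
                         for j in neighbours[i] for m in range(len(pi)))
                     / max(len(neighbours[i]), 1)
                     for l in range(len(pi))]
                raw = [pi[l] * (1 + q[l]) for l in range(len(pi))]
                s = sum(raw)
                new.append([v / s for v in raw])   # renormalize per pixel
            return new

        # two labels (edge / no-edge) on a 3-pixel line; edges reinforce each other
        r = [[0.8, -0.5], [-0.5, 0.3]]
        p = [[0.6, 0.4], [0.5, 0.5], [0.9, 0.1]]
        nb = [[1], [0, 2], [1]]
        for _ in range(5):
            p = relax_step(p, nb, r)
        print([round(x, 2) for x in p[1]])   # undecided middle pixel pulled toward "edge"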

    Power assignment problems in wireless communication

    A fundamental class of problems in wireless communication is concerned with the assignment of suitable transmission powers to wireless devices/stations such that the resulting communication graph satisfies certain desired properties and the overall energy consumed is minimized. Many concrete communication tasks in a wireless network, like broadcast, multicast, point-to-point routing, and creation of a communication backbone, can be regarded as such a power assignment problem. This paper considers several problems of that kind; for example, one problem studied before in (Vittorio Bilò et al.: Geometric Clustering to Minimize the Sum of Cluster Sizes, ESA 2005) and (Helmut Alt et al.: Minimum-cost coverage of point sets by disks, SCG 2006) aims to select and assign powers to $k$ of the stations such that all other stations are within reach of at least one of the selected stations. We improve the running time for obtaining a $(1+\epsilon)$-approximate solution for this problem from $n^{(\alpha/\epsilon)^{O(d)}}$, as reported by Bilò et al., to $O\left(n + \left(\frac{k^{2d+1}}{\epsilon^d}\right)^{\min\{2k,\;(\alpha/\epsilon)^{O(d)}\}}\right)$; that is, we obtain a running time that is linear in the network size. Further results include a constant-factor approximation algorithm for the TSP problem under squared (non-metric!) edge costs, which can be employed to implement a novel data aggregation protocol, as well as efficient schemes to perform $k$-hop multicasts.
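
    To make the objective concrete, here is a hedged brute-force illustration of the $k$-station coverage problem, not the paper's $(1+\epsilon)$-approximation scheme; cost and best_selection are illustrative names. Each station attaches to its nearest selected sender, a sender's power must reach its farthest client at cost radius**alpha, and we keep the cheapest choice of senders.

        from itertools import combinations
        import math

        def cost(points, centers, alpha=2):
            # power of a sender = (distance to its farthest attached client)**alpha
            radius = {c: 0.0 for c in centers}
            for p in points:
                c = min(centers, key=lambda s: math.dist(p, s))
                radius[c] = max(radius[c], math.dist(p, c))
            return sum(r ** alpha for r in radius.values())

        def best_selection(points, k, alpha=2):
            # exponential-time exact search, feasible only for tiny inputs
            return min(combinations(points, k),
                       key=lambda cs: cost(points, cs, alpha))

        pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
        print(best_selection(pts, 2))   # -> ((0, 0), (10, 10)), total cost 2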