Accelerating SIFT on Parallel Architectures

Authors: Apon Amy; Cothren Jackson; Emeneker Wesley; Warn Seth
Publication venue: Clemson University Libraries
Publication date: 01/09/2009

SIFT is a widely used algorithm that extracts features from images; using it to extract information from hundreds of terabytes of aerial and satellite photographs is feasible only with parallelization. We explore accelerating an existing serial SIFT implementation with OpenMP parallelization and GPU execution.

Clemson University: TigerPrints

Some fast elliptic solvers on parallel architectures and their complexities

Authors: Gallopoulos E.; Saad Youcef

The discretization of separable elliptic partial differential equations leads to linear systems with special block tridiagonal matrices. Several methods are known for solving these systems, the most general of which is the Block Cyclic Reduction (BCR) algorithm, which handles equations with nonconstant coefficients. A method was recently proposed to parallelize and vectorize BCR. Here, the mapping of BCR onto distributed memory architectures is discussed, and its complexity is compared with that of other approaches, including the Alternating-Direction method. A fast parallel solver is also described, based on an explicit formula for the solution, which has lower parallel computational complexity than parallel BCR.

NASA Technical Reports Server

High Performance Issues on Parallel Architectures

Author: Looges Peter J.
Publication venue: ODU Digital Commons
Publication date: 01/07/1992

In an effort to reduce communication latency in mesh-type architectures, these architectures have been augmented by various types of global and reconfigurable bus structures. The static bus structures provide excellent performance in many areas of computation, especially structured numerical computations, but they lack the flexibility required by many large numerical and non-numerical applications. Reconfigurable bus systems have the dynamic adaptability to handle a much wider range of applications. While reconfigurable meshes can often yield constant-time results for many problems, the cost of this performance is paid in the number of processors required. In practice, the majority of these processors are employed as switching elements for the bus system and often do little actual computation. In an effort to reduce the processor cost while maintaining performance and communication flexibility, we present a new hybrid parallel array architecture that aims to combine the best features of arrays with global buses and arrays with reconfigurable bus systems. The result is an architecture of n processing elements and a bus interconnection network that requires only very basic circuitry to construct and control. This architecture allows prefix computations, such as prefix sum and prefix maximum (minimum), to be accomplished in O(log n) time. These functions then form the building blocks for complex procedures, which more fully exploit the communication flexibility of the architecture. Application of the architecture to graph theory produces optimal algorithms for graph properties such as spanning forest, bipartiteness, fundamental cycles, bridges and biconnected components. Other optimal algorithms for the more complex least common ancestor and connected component problems are also presented. By design, all algorithms maintain optimality for very large sparse graphs.
We further examine the architecture's ability to handle basic image processing tasks as well as its potential to simulate other parallel architectures and theoretic models.

Old Dominion University

Tree-Searching Algorithms on Parallel Architectures

Author: Mebrotra Mala
Publication venue: W&M ScholarWorks
Publication date: 01/01/1985

College of William & Mary: W&M Publish

Multi-Dimensional Numerical Integration on Parallel Architectures

Authors: Paterno Marc; Ranjan Desh; Sakiotis Ioannis; Terzic Balsa; Zubair Mohammad
Publication venue: ODU Digital Commons
Publication date: 01/04/2021

Multi-dimensional numerical integration is a challenging computational problem that is encountered in many scientific computing applications. Despite extensive research and the development of efficient techniques such as adaptive and Monte Carlo methods, many complex high-dimensional integrands can be too computationally intense even for state-of-the-art numerical libraries such as CUBA, QUADPACK, NAG, and MSL. However, adaptive integration has few dependencies and is very well suited for parallel architectures where processors can operate on different partitions of the integration space. While parallel methods exist, most are simple extensions of their sequential versions. This results in moderate speedup and, in many cases, failure to significantly surpass the precision capabilities of the sequential methods. We propose a new algorithm for adaptive multi-dimensional integration of challenging integrands for execution on highly parallel architectures. We avoid the common sequential scheme of adaptive methods in favor of a high-throughput approach better suited for parallel architectures. Experimental results show orders of magnitude speedup over sequential methods and improved performance in terms of maximum attainable precision.

Old Dominion University

High performance graph analysis on parallel architectures

Author: Grivas Athanasios K
Publication venue: Newcastle University
Publication date: 01/01/2016

PhD Thesis
Over the last decade, pharmacology has been developing computational methods to enhance drug development and testing. A computational method called network pharmacology uses graph analysis tools to determine protein target sets that can lead to better-targeted drugs for diseases such as cancer. One promising area of network-based pharmacology is the detection of protein groups that can produce better effects if they are targeted together by drugs. However, the efficient prediction of such protein combinations is still a bottleneck in the area of computational biology. The computational burden of the algorithms that such protein prediction strategies use to characterise the importance of these proteins constitutes an additional challenge for the field of network pharmacology. Computationally expensive graph algorithms such as the all pairs shortest path (APSP) computation can affect the overall drug discovery process, because the required network analysis results cannot be delivered on time. An ideal solution for these highly intensive computations could be the use of supercomputing. However, graph algorithms have data-driven computation dictated by the structure of the graph, and this can lead to low compute capacity utilisation with execution times dominated by memory latency. Therefore, this thesis seeks optimised solutions for the real-world graph problems of critical node detection and effectiveness characterisation that emerged from the collaboration with a pioneer company in the field of network pharmacology as part of a Knowledge Transfer Partnership (KTP) / Secondment (KTS). In particular, we examine how genetic algorithms could benefit the prediction of protein complexes whose removal could produce a more effective 'druggable' impact. Furthermore, we investigate how the all pairs shortest path (APSP) computation can benefit from emerging parallel hardware architectures such as desktop-based GPU and FPGA accelerators.
In particular, we address the problem of critical node detection with the development of a heuristic search method. It is based on a genetic algorithm that computes optimised node combinations whose removal causes greater impact than those found by common impact analysis strategies. Furthermore, we design a general pattern for parallel network analysis on multi-core architectures that takes the graph's embedded properties into account. It is a divide and conquer approach that decomposes a graph into smaller subgraphs based on its strongly connected components and computes the all pairs shortest paths concurrently on GPU. Furthermore, we use linear algebra to design an APSP approach based on the BFS algorithm. We use algebraic expressions to transform the problem of path computation into multiple independent matrix-vector multiplications that are executed concurrently on FPGA. Finally, we analyse how the optimised solutions of perturbation analysis and parallel graph processing provided in this thesis will impact the drug discovery process.
This research was part of a Knowledge Transfer Partnership (KTP) and Knowledge Transfer Secondment (KTS) between e-therapeutics PLC and Newcastle University. It was supported as a collaborative project by e-therapeutics PLC and the Technology Strategy Board.

Newcastle University eTheses

Explicit Cache Management for Volume Ray-Casting on Parallel Architectures

Authors: Doggett Michael; Ganestam Per; Jönsson Daniel; Ropinski Timo; Ynnerman Anders
Publication venue: Eurographics - European Association for Computer Graphics
Publication date: 01/01/2012

A major challenge when designing general purpose graphics hardware is to allow efficient access to texture data. Although different rendering paradigms vary with respect to their data access patterns, the graphics architecture offers no flexibility in how data is cached. In this paper we focus on volume ray-casting, and show the benefits of algorithm-aware data caching. Our Marching Caches method exploits inter-ray coherence and thus utilizes the memory layout of the highly parallel processors by allowing them to share data through a cache which marches along with the ray front. By exploiting Marching Caches we can apply higher-order reconstruction and enhancement filters to generate more accurate and enriched renderings with improved rendering performance. We have tested our Marching Caches with seven different filters, e.g., Catmull-Rom, B-spline, and ambient occlusion projection, and show that a fourfold speedup can be achieved compared to using the caching implicitly provided by the graphics hardware, and that the memory bandwidth to global memory can be reduced by orders of magnitude. In the paper, we introduce the Marching Cache concept, provide implementation details, and discuss the performance and memory bandwidth impact of using different filters.

Lund University Publications

Designing Tissue-like P Systems for Image Segmentation on Parallel Architectures

Authors: Carnero Iglesias Javier; Díaz Pernil Daniel; Gutiérrez Naranjo Miguel Ángel
Publication venue: Fénix Editora
Publication date: 01/01/2011

Problems associated with the treatment of digital images have several interesting features from a bio-inspired point of view. One of them is that they can be suitable for parallel processing, since the same sequential algorithm is usually applied in different regions of the image. In this paper we report work in progress on a hardware implementation in Field Programmable Gate Arrays (FPGAs) of a family of tissue-like P systems that solves the segmentation problem in digital images.
Funding: Ministerio de Ciencia e Innovación TIN-2009-13192; Junta de Andalucía P08-TIC-04200; Junta de Andalucía PO6-TIC-02268; Ministerio de Educación y Ciencia MTM2009-1271

idUS. Depósito de Investigación Universidad de Sevilla

Efficient list manipulation in combinator-based functional languages on parallel architectures

Author: Sarwar Syed Mansoor
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/1988

A set of new combinator families and a new list representation are proposed in this dissertation. Each of the proposed combinators expresses certain commonly occurring combinations of lists as functions of those lists or elements of lists. Most of these combinators reshape the structure of the input list so that its elements can be manipulated concurrently. The reduction semantics and proofs of correctness for these combinators are given in the form of strings of already known combinators. It is shown that the proposed combinators and list structure make the execution of functional programs faster on both sequential and parallel architectures.

Digital Repository @ Iowa State University (ISU)