855 research outputs found

    Extending Static Synchronization Beyond SIMD and VLIW

    A key advantage of SIMD (Single Instruction stream, Multiple Data stream) architectures is that synchronization is effected statically at compile time; hence the execution-time cost of synchronization between "processes" is essentially zero. VLIW (Very Long Instruction Word) machines are successful in large part because they preserve this property while providing more flexibility in terms of what kinds of operations can be parallelized. In this paper, we propose a new kind of architecture, the "static barrier MIMD" or SBM, which can be viewed as a further generalization of the parallel execution abilities of static synchronization machines. Barrier MIMDs are asynchronous Multiple Instruction stream, Multiple Data stream architectures capable of parallel execution of loops, subprogram calls, and variable execution-time instructions; however, little or no run-time synchronization is needed. When a group of processors within a barrier MIMD has just encountered a barrier, any conceptual synchronizations between the processors are statically accomplished with zero cost, as in a SIMD or VLIW and using similar compiler technology. Unlike these machines, however, as execution continues the relative timing of processors may become less precisely knowable as a static, compile-time quantity. Where this imprecision becomes too large, the compiler simply inserts a synchronization barrier to ensure that timing imprecision at that point is zero, and again employs purely static, implicit synchronization. Both the architecture and the supporting compiler technology are discussed in detail.

    A computationally efficient framework for large-scale distributed fingerprint matching

    A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of requirements for the degree of Master of Science, School of Computer Science and Applied Mathematics. May 2017. Biometric features are widely used in forensic and civil applications. Among the many different kinds of biometric characteristics, the fingerprint is globally accepted and remains the most widely used by commercial and industrial societies due to its easy acquisition, uniqueness, stability and reliability. Various effective solutions are currently available; however, fingerprint identification is still not considered a fully solved problem, mainly due to accuracy and computational time requirements. Although many minutiae-based fingerprint recognition systems provide good accuracy, systems with very large databases require fast, real-time comparison of fingerprints, and they often either fail to meet the speed requirements or compromise accuracy. For fingerprint matching that involves databases containing millions of fingerprints, real-time identification can only be obtained through optimal algorithms that utilize the given hardware as robustly and efficiently as possible. There is currently no known distributed database and computing framework that provides a real-time solution to the fingerprint recognition problem for databases containing as many as sixty million fingerprints, a size close to that of the South African population. This research proposal intends to serve two main purposes: 1) exploit and scale the best known minutiae matching algorithm for a minimum of sixty million fingerprints; and 2) design a framework for a distributed database to deal with large fingerprint databases, based on the results obtained in the former item. GR201

    Algorithms for Order-Preserving Matching

    String matching is a widely studied problem in Computer Science. There have been many recent developments in this field. One fascinating problem considered lately is the order-preserving matching (OPM) problem. The task is to find all the substrings in the text which have the same length and relative order as the pattern, where the relative order is the numerical order of the numbers in a string. The problem finds its applications in areas involving time series or series of numbers. More specifically, it is useful for those who are interested in the relative order of the pattern and not in the pattern itself. For example, it can be used by analysts in a stock market to study movements of prices. In addition to the OPM problem, we also studied its approximate variation. In approximate order-preserving matching, we search for those substrings in the text which have relative order similar to the pattern, i.e., the relative order of the pattern matches with at most k mismatches. With respect to applications of order-preserving matching, approximate search is more meaningful than exact search. We developed various advanced solutions for the problem and its variant. Special emphasis was laid on the practical efficiency of the solutions. In particular, we introduced a simple solution for the OPM problem using filtration. We showed experimentally that our method was effective and faster than the previous solutions for the problem. In addition, we combined the Single Instruction Multiple Data (SIMD) instruction set architecture with filtration to develop competent solutions which were faster than our previous solution. Moreover, we proposed another efficient solution without filtration using the SIMD architecture. We also presented an offline solution based on the FM-index scheme. Furthermore, we proposed practical solutions for the approximate order-preserving matching problem, and one of the solutions was the first sublinear solution on average for the problem.
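
The exact OPM problem defined in this abstract can be illustrated with a naive baseline. The sketch below is not one of the thesis's filtration, SIMD, or FM-index solutions; it simply compares the rank pattern of every text window with that of the pattern. The function names `rank_signature` and `order_preserving_matches` are hypothetical, and the treatment of equal values (equal values share a rank) is an assumption, since the abstract does not say how ties are handled.

```python
def rank_signature(values):
    """Map each value to its rank among the distinct values of the sequence.

    Assumption: equal values receive the same rank.
    """
    rank = {v: r for r, v in enumerate(sorted(set(values)))}
    return tuple(rank[v] for v in values)


def order_preserving_matches(text, pattern):
    """Return start indices of windows in `text` whose relative order
    equals that of `pattern` (naive baseline, not the thesis's method)."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    target = rank_signature(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if rank_signature(text[i:i + m]) == target
    ]


# Hypothetical price series: the "low, high, middle" shape occurs twice.
prices = [10, 22, 15, 30, 20, 18, 27]
print(order_preserving_matches(prices, [1, 3, 2]))  # [0, 2]
```

Each window is re-ranked from scratch, so the cost grows with both text and pattern length; the practical solutions described in the abstract exist precisely to avoid this.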

    GPU-Accelerated nearest neighbor search for 3d registration

    Nearest Neighbor Search (NNS) is employed by many computer vision algorithms. Its computational complexity is large and constitutes a challenge for real-time capability. The basic problem is rapidly processing a huge amount of data, which is often addressed by means of highly sophisticated search methods and parallelism. We show that NNS-based vision algorithms like the Iterative Closest Points algorithm (ICP) can achieve real-time capability while preserving the compact size and moderate energy consumption needed in robotics and many other domains. The approach exploits the concept of general-purpose computation on graphics processing units (GPGPU) and is compared to parallel processing on CPU. We apply this approach to the 3D scan registration problem, for which a speed-up factor of 88 compared to a sequential CPU implementation is reported.
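
The correspondence step of ICP that this abstract refers to can be sketched as a brute-force nearest neighbor search. This is only a sequential illustration under assumptions, not the paper's GPU implementation: the function name `nearest_neighbors` and the sample points are hypothetical, and squared Euclidean distance is assumed as the metric.

```python
def nearest_neighbors(queries, model):
    """For each query point, return the index of the closest model point.

    Brute-force search using squared Euclidean distance (an assumption).
    """
    result = []
    for q in queries:
        best_index, best_dist = -1, float("inf")
        for i, p in enumerate(model):
            d = sum((a - b) ** 2 for a, b in zip(q, p))
            if d < best_dist:
                best_index, best_dist = i, d
        result.append(best_index)
    return result


# Hypothetical 3D points standing in for two scans.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
queries = [(0.9, 0.1, 0.0), (0.0, 1.8, 0.1), (0.1, 0.0, 0.0)]
print(nearest_neighbors(queries, model))  # [1, 2, 0]
```

Every query is handled independently of the others, which is the property that makes this kind of search a natural candidate for the parallel execution the abstract describes.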

    Concurrent use of two programming tools for heterogeneous supercomputers

    In this thesis, a demonstration of the heterogeneous use of two programming paradigms for heterogeneous computing, called Cluster-M and HAsC, is presented. Both paradigms can efficiently support heterogeneous networks by preserving a level of abstraction which does not include any architecture mapping details. Furthermore, they are both machine independent and hence scalable. Unlike almost all existing heterogeneous orchestration tools, which are MIMD based, HAsC is based on the fundamental concepts of SIMD associative computing. HAsC models a heterogeneous network as a coarse-grained associative computer and is designed to optimize the execution of problems with large ratios of computations to instructions. Ease of programming and execution speed, not the utilization of idle resources, are the primary goals of HAsC. On the other hand, Cluster-M is a generic technique that can be applied to both coarse-grained and fine-grained networks. Cluster-M provides an environment for porting various tasks onto the machines in a heterogeneous suite such that resource utilization is maximized and the overall execution time is minimized. An illustration of how these two paradigms can be used together to provide an efficient medium for heterogeneous programming is included. Finally, their scalability is discussed.

    Mapping of portable parallel programs

    An efficient parallel program designed for a parallel architecture includes a detailed outline of accurate assignments of concurrent computations onto processors, and data transfers onto communication links, such that the overall execution time is minimized. This process may be complex depending on the application task and the target multiprocessor architecture. Furthermore, this process must be repeated for every different architecture even though the application task may be the same. Consequently, this has a major impact on the ever-increasing cost of software development for multiprocessor systems. A remedy for this problem would be to design portable parallel programs which can be mapped efficiently onto any computer system. In this dissertation, we present a portable programming tool called Cluster-M. The three components of Cluster-M are the Specification Module, the Representation Module, and the Mapping Module. In the Specification Module, for a given problem, a machine-independent program is generated and represented in the form of a clustered task graph called a Spec graph. Similarly, in the Representation Module, for a given architecture or heterogeneous suite of computers, a clustered system graph called a Rep graph is generated. The Mapping Module is responsible for efficient mapping of Spec graphs onto Rep graphs. As part of this module, we present the first algorithm which produces a near-optimal mapping of an arbitrary non-uniform machine-independent task graph with M modules onto an arbitrary non-uniform task-independent system graph having N processors, in O(MP) time, where P = max(M, N). Our experimental results indicate that Cluster-M produces better or similar mapping results compared to other leading techniques which work only for restricted task or system graphs.