299 research outputs found

Parallel Computers and Complex Systems

Author: G.C. Fox
P.D. Coddington
Publication venue: University Press

We present an overview of the state of the art and future trends in high performance parallel and distributed computing, and discuss techniques for using such computers in the simulation of complex problems in computational science. The use of high performance parallel computers can help improve our understanding of complex systems, and the converse is also true --- we can apply techniques used for the study of complex systems to improve our understanding of parallel computing. We consider parallel computing as the mapping of one complex system --- typically a model of the world --- into another complex system --- the parallel computer. We study static, dynamic, spatial and temporal properties of both the complex systems and the map between them. The result is a better understanding of which computer architectures are good for which problems, and of software structure, automatic partitioning of data, and the performance of parallel machines

CiteSeerX

Customizable vector acceleration in extreme-edge computing: A RISC-V software/hardware architecture study on VGG-16 implementation

Author: Cheikh A.
Mastrandrea A.
Menichelli F.
Olivieri M.
Sordillo S.
Publication venue: 'MDPI AG'
Publication date: 01/01/2021

Computing in the cloud-edge continuum, as opposed to cloud computing, relies on high performance processing on the extreme edge of the Internet of Things (IoT) hierarchy. Hardware acceleration is a mandatory solution to achieve the performance requirements, yet it can be tightly tied to particular computation kernels, even within the same application. Vector-oriented hardware acceleration has gained renewed interest to support artificial intelligence (AI) applications like convolutional networks or classification algorithms. We present a comprehensive investigation of the performance and power efficiency achievable by configurable vector acceleration subsystems, obtaining evidence of both the high potential of the proposed microarchitecture and the advantage of hardware customization in total transparency to the software program

Directory of Open Access Journals

Archivio della ricerca- Università di Roma La Sapienza

The development of a multi-layer architecture for image processing

Author: Fung Yu Fai
Publication venue: UCL (University College London)
Publication date: 01/01/1991

The extraction of useful information from an image involves a series of operations, which can be functionally divided into low-level, intermediate-level and high-level processing. Because different amounts of computing power may be demanded by each level, a system which can simultaneously carry out operations at different levels is desirable. A multi-layer system which embodies both functional and spatial parallelism is envisioned. This thesis describes the development of a three-layer architecture which is designed to tackle vision problems embodying operations in each processing level. A survey of various multi-layer and multi-processor systems is carried out and a set of guidelines for the design of a multi-layer image processing system is established. The linear array is proposed as a possible basis for multi-layer systems and a significant part of the thesis is concerned with a study of this structure. The CLIP7A system, which is a linear array with 256 processing elements, is examined in depth. The CLIP7A system operates under SIMD control, enhanced by local autonomy. In order to examine the possible benefits of this arrangement, image processing algorithms which exploit the autonomous functions are implemented. The structural properties of linear arrays are also studied. Information regarding typical computing requirements in each layer and the communication networks between elements in different layers is obtained by applying the CLIP7A system to solve an integrated vision problem. From the results obtained, a three-layer architecture is proposed. The system has 256, 16 and 4 processing elements in the low-, intermediate- and high-level layers respectively. The processing elements will employ a 16-bit microprocessor as the computing unit, which is selected from off-the-shelf components. Communication between elements in consecutive layers is via two different networks, which are designed so that efficient data transfer is achieved. Additionally, the networks enable the system to maintain fault tolerance and to permit expansion in the second and third layers

UCL Discovery

Computer vision algorithms on reconfigurable logic arrays

Author: A.K. Jain
N.K. Ratha
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Massively Parallel Associative String Processor (ASP) for High Energy Physics

Author: Vesztergombi G
Publication venue: CERN
Publication date: 01/01/1995

CERN Document Server

Meteorological modelling on the ICL distributed array processor and other parallel computers

Author: Carver Glenn Derek
Publication venue: The University of Edinburgh
Publication date: 01/01/1990

Edinburgh Research Archive

Mapping Signal Processing Algorithms on Parallel Architectures

Author: Sammur Nidal M.
Publication venue: 'Oklahoma State University Library'
Publication date: 01/07/1992
Field of study: Electrical Engineering

SHAREOK repository

A unified programming system for a multi-paradigm parallel architecture

Author: Vaudin John

Real-time image understanding and image generation require very large amounts of computing power. A possible way to meet these requirements is to make use of the power available from parallel computing systems. However, parallel machines exhibit performance which is highly dependent on the algorithms being executed. Both image understanding and image generation involve the use of a wide variety of algorithms. A parallel machine suited to some of these algorithms may be unsuited to others. This thesis describes a novel heterogeneous parallel architecture optimised for image-based applications. It achieves its performance by combining two different forms of parallel architecture, namely fine-grain SIMD and coarse-grain MIMD, into a single architecture. In this way it is possible to match the most appropriate computing resource to each algorithm in a given application. As important as the architecture itself is a method for programming it. This thesis describes a novel multi-paradigm programming language based on C++, which allows programs that make use of both control and data parallelism to be expressed in a single coherent framework, based on object-oriented programming. To demonstrate the utility of both the architecture and the programming system, two applications, one from the field of image understanding and the other from image generation, are examined. These applications combine some novel algorithms with other novel implementation approaches to provide the most effective mapping onto this architecture

Warwick Research Archives Portal Repository

Automatic visual recognition using parallel machines

Author: Chen Yui-Liang
Publication venue: Digital Commons @ NJIT
Publication date: 31/10/1995

Invariant features and quick matching algorithms are two major concerns in the area of automatic visual recognition. The former reduces the size of an established model database, and the latter shortens the computation time. This dissertation discusses both line invariants under perspective projection and a parallel implementation of a dynamic programming technique for shape recognition. The feasibility of using parallel machines is demonstrated through the dramatically reduced time complexity. In this dissertation, our algorithms are implemented on the AP1000 MIMD parallel machines. For processing an object with n features, the time complexity of the proposed parallel algorithm is O(n), while that of a uniprocessor is O(n^2). Two applications, one for shape matching and the other for chain-code extraction, are used to demonstrate the usefulness of our methods. Invariants from four general lines under perspective projection are also discussed here. In contrast to the approach which uses the epipolar geometry, we investigate the invariants under isotropy subgroups. Theoretically speaking, two independent invariants can be found for four general lines in 3D space. In practice, we show how to obtain these two invariants from the projective images of four general lines without the need for camera calibration. A projective invariant recognition system based on a hypothesis-generation-testing scheme is run on the hypercube parallel architecture. Object recognition is achieved by matching the scene projective invariants to the model projective invariants, called transfer. Then a hypothesis-generation-testing scheme is implemented on the hypercube parallel architecture

Digital Commons @ New Jersey Institute of Technology (NJIT)