70 research outputs found

    Report from the MPP Working Group to the NASA Associate Administrator for Space Science and Applications

    NASA's Office of Space Science and Applications (OSSA) gave a select group of scientists the opportunity to test and implement their computational algorithms on the Massively Parallel Processor (MPP) located at Goddard Space Flight Center, beginning in late 1985. One year later, the Working Group presented its report, which addressed the following: algorithms, programming languages, architecture, programming environments, the relationship to theory, and measured performance. The findings point to a number of demonstrated computational techniques for which the MPP architecture is ideally suited. For example, besides executing much faster on the MPP than on conventional computers, systolic VLSI simulation (where distances are short), lattice simulation, neural network simulation, and image problems were found to be easier to program on the MPP's architecture than on a CYBER 205 or even a VAX. The report also makes technical recommendations covering all aspects of MPP use, as well as recommendations concerning the future of the MPP and machines based on similar architectures, expansion of the Working Group, and study of the role of future parallel processors for the space station, EOS, and the Great Observatories era.

    One machine, one minute, three billion tetrahedra

    This paper presents a new scalable parallelization scheme to generate the 3D Delaunay triangulation of a given set of points. Our first contribution is an efficient serial implementation of the incremental Delaunay insertion algorithm. A simple dedicated data structure, an efficient sorting of the points, and an optimized insertion algorithm allowed us to accelerate reference implementations by a factor of three. Our second contribution is a multi-threaded version of the Delaunay kernel that is able to insert vertices concurrently. Moore curve coordinates are used to partition the point set, avoiding heavy synchronization overheads. Conflicts are managed by modifying the partitions with a simple rescaling of the space-filling curve. The performance of our implementation was measured on three different processors, an Intel Core i7, an Intel Xeon Phi, and an AMD EPYC, on which we were able to compute 3 billion tetrahedra in 53 seconds. This corresponds to a generation rate of over 55 million tetrahedra per second. We finally show how this very efficient parallel Delaunay triangulation can be integrated in a Delaunay refinement mesh generator that takes as input the triangulated surface boundary of the volume to mesh.
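
    The partitioning idea described above can be illustrated with a short sketch: points are sorted along a space-filling curve so that contiguous ranges of the sorted array form spatially compact slices, one per thread. The C++ sketch below uses a Morton (Z-order) key for brevity rather than the Moore curve coordinates used in the paper; all names are illustrative, and this is not the authors' code.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct Point { double x, y, z; };

    // Spread the low 21 bits of v so they occupy every third bit (standard Morton bit trick).
    static uint64_t spread3(uint64_t v) {
        v &= 0x1fffff;
        v = (v | v << 32) & 0x1f00000000ffffULL;
        v = (v | v << 16) & 0x1f0000ff0000ffULL;
        v = (v | v << 8)  & 0x100f00f00f00f00fULL;
        v = (v | v << 4)  & 0x10c30c30c30c30c3ULL;
        v = (v | v << 2)  & 0x1249249249249249ULL;
        return v;
    }

    // 63-bit Morton key of a point normalised to the bounding box [lo, hi].
    static uint64_t mortonKey(const Point& p, const Point& lo, const Point& hi) {
        auto q = [](double v, double a, double b) {
            return static_cast<uint64_t>((v - a) / (b - a) * 2097151.0);  // 2^21 - 1
        };
        return spread3(q(p.x, lo.x, hi.x))
             | spread3(q(p.y, lo.y, hi.y)) << 1
             | spread3(q(p.z, lo.z, hi.z)) << 2;
    }

    // Sort points along the curve and hand each thread one contiguous slice to insert.
    void partitionForThreads(std::vector<Point>& pts, const Point& lo, const Point& hi,
                             int nThreads, std::vector<size_t>& sliceBegin) {
        std::sort(pts.begin(), pts.end(), [&](const Point& a, const Point& b) {
            return mortonKey(a, lo, hi) < mortonKey(b, lo, hi);
        });
        sliceBegin.resize(nThreads + 1);
        for (int t = 0; t <= nThreads; ++t)
            sliceBegin[t] = pts.size() * static_cast<size_t>(t) / nThreads;
    }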

    Computational Methods and Graphical Processing Units for Real-time Control of Tomographic Adaptive Optics on Extremely Large Telescopes.

    Ground-based optical telescopes suffer from limited imaging resolution as a result of the effects of atmospheric turbulence on the incoming light. Adaptive optics technology has so far been very successful in correcting these effects, providing nearly diffraction-limited images. Extremely Large Telescopes will require more complex adaptive optics configurations that introduce the need for new mathematical models and optimal solvers. In addition, the amount of data to be processed in real time is greatly increased, making the use of conventional computational methods and hardware inefficient, which motivates the study of advanced computational algorithms and implementations on parallel processors. Graphical Processing Units (GPUs) are massively parallel processors that have so far demonstrated substantial speedups compared to CPUs and other devices, and they have a high potential to meet the real-time constraints of adaptive optics systems. This thesis focuses on the study and evaluation of existing proposed computational algorithms with respect to computational performance, and their implementation on GPUs. Two basic methods, one direct and one iterative, are implemented and tested; the results provide an evaluation of the basic concept upon which other algorithms are based, and demonstrate the benefits of using GPUs for adaptive optics.
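
    As an illustration of the iterative family mentioned above (not the thesis's implementation), the C++ sketch below shows a plain conjugate-gradient loop for solving a reconstruction system A x = b. Every step is a matrix-vector product, dot product, or axpy update, which is the access pattern that maps well onto GPU BLAS kernels; the matrix-free operator applyA, the iteration cap, and the tolerance are placeholders.

    #include <cmath>
    #include <functional>
    #include <vector>

    using Vec = std::vector<double>;

    static double dot(const Vec& a, const Vec& b) {
        double s = 0.0;
        for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
        return s;
    }

    // Solve A x = b for a symmetric positive-definite operator A given as a function.
    Vec conjugateGradient(const std::function<void(const Vec&, Vec&)>& applyA,
                          const Vec& b, int maxIter = 200, double tol = 1e-8) {
        Vec x(b.size(), 0.0), r = b, p = r, Ap(b.size());
        double rr = dot(r, r);
        for (int k = 0; k < maxIter && std::sqrt(rr) > tol; ++k) {
            applyA(p, Ap);                          // one large matrix-vector product per iteration
            double alpha = rr / dot(p, Ap);
            for (size_t i = 0; i < x.size(); ++i) { // axpy-style vector updates
                x[i] += alpha * p[i];
                r[i] -= alpha * Ap[i];
            }
            double rrNew = dot(r, r);               // reduction
            for (size_t i = 0; i < p.size(); ++i) p[i] = r[i] + (rrNew / rr) * p[i];
            rr = rrNew;
        }
        return x;
    }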

    Comparative Benchmarking Analysis of Next-Generation Space Processors

    Researchers, corporations, and government entities are seeking to deploy increasingly compute-intensive workloads on space platforms. This need is driving the development of two new radiation-hardened, multi-core space processors, the BAE Systems RAD5545(TM) processor and the Boeing High-Performance Spaceflight Computing (HPSC) processor. As these systems are in the development phase as of this writing, the Freescale P5020DS and P5040DS systems, based on the same PowerPC e5500 architecture as the RAD5545 processor, and the Hardkernel ODROID-C2, sharing the same ARM Cortex-A53 core as the HPSC processor, were selected as facsimiles for evaluation. Several OpenMP-parallelized applications, including a color search, Sobel filter, Mandelbrot set generator, hyperspectral-imaging target classifier, and image thumbnailer, were benchmarked on these processing platforms. Performance and energy consumption results on these facsimiles were scaled to forecasted frequencies of the radiation-hardened devices in development. In these studies, the RAD5545 achieved the highest and most consistent parallel efficiency, up to 99%. The HPSC processor achieved lower execution times, averaging about half that of the RAD5545 processor, with lower energy consumption. The evaluated applications achieved a speedup of 3.9 times across four cores. The frequency-scaling methods were validated by comparing the set of scaled measures with data points from an underclocked facsimile, which yielded an average accuracy of 97% between estimated and measured results. These performance outcomes help to quantify the capabilities of both the RAD5545 and HPSC processors for on-board parallel processing of computationally demanding applications for future space missions.
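
    One of the benchmarked kernels, the Sobel filter, parallelises naturally with OpenMP because image rows are independent. The C++ sketch below illustrates that structure only; it is not the benchmark code itself, and the 8-bit grayscale, row-major image layout is an assumption.

    #include <cmath>
    #include <cstdint>
    #include <vector>
    #include <omp.h>

    void sobel(const std::vector<uint8_t>& in, std::vector<uint8_t>& out,
               int width, int height) {
        out.assign(in.size(), 0);
        #pragma omp parallel for schedule(static)   // each thread processes a band of rows
        for (int y = 1; y < height - 1; ++y) {
            for (int x = 1; x < width - 1; ++x) {
                auto px = [&](int dx, int dy) { return (int)in[(y + dy) * width + (x + dx)]; };
                int gx = -px(-1, -1) - 2 * px(-1, 0) - px(-1, 1)
                         + px(1, -1) + 2 * px(1, 0) + px(1, 1);
                int gy = -px(-1, -1) - 2 * px(0, -1) - px(1, -1)
                         + px(-1, 1) + 2 * px(0, 1) + px(1, 1);
                int mag = (int)std::sqrt((double)(gx * gx + gy * gy));
                out[y * width + x] = (uint8_t)(mag > 255 ? 255 : mag);
            }
        }
    }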

    GPU accelerated procedural terrain generation : a thesis presented in partial fulfilment of the requirements for the degree Master of Science in Computer Science at Massey University, Albany, New Zealand

    Virtual terrain is often used as the large-scale background of computer graphics scenes. While virtual terrain is essential for representing landscapes, manual reproduction of such large-scale objects from scratch is time-consuming and costly for human artists. Many algorithmic generation methods have been proposed as an alternative to manual reproduction, but those methods remain limited when employed across a wide range of applications. Alternatively, simulation of the stream power equation can effectively model landscape evolution at large temporal and spatial scales by simulating the land-forming process. This equation was successfully employed in terrain generation by a previous study. However, the unoptimised pipeline implementation of that method suffers from long computation times as the simulation size increases. Graphics processing units (GPUs) provide significantly higher computational throughput than conventional multi-core CPUs for massively parallel problems. The previous study proposed a general parallel algorithm to compute the simulation pipeline, but it is designed for generic multi-core hardware and does not fully utilise the computing power of GPUs. This study develops an optimised pipeline of the original stream power equation method for GPUs. Results showed that the new parallel GPU algorithm consistently outperformed a recent octa-core CPU (Intel i7 9700K, 4.9 GHz), by about 300% on a GTX 780 and 900% on an RTX 2070 Super. It also consistently showed a 300% improvement in performance over the previous parallel algorithm on GPUs. The new algorithm significantly outperformed the fastest parallel algorithm available while still producing the same terrain result as the original stream power equation method. This advancement in computational performance allows the method to generate precise geological detail while keeping computation time reasonable enough for the method to be employed in a broader range of applications.
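
    For context, the stream power equation referred to above is commonly written in the following form (a standard formulation; the exact variant and exponent values used in the thesis are not given in the abstract):

    \[
      \frac{\partial h}{\partial t} \;=\; u \;-\; K\, A^{m}\, S^{n}
    \]

    where h is the surface elevation, t is time, u is the tectonic uplift rate, K is an erodibility coefficient, A is the upstream drainage area, S is the local slope, and m and n are empirical exponents (often taken near m = 0.5 and n = 1).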

    Implementing intersection calculations of the ray tracing algorithm with systolic arrays

    Ray tracing is one technique that has been used to synthesize realistic images with a computer. Unfortunately, this technique, when implemented in software, is slow and expensive. The trend in computer graphics has been toward the use of special-purpose hardware to speed up the calculations and, hence, the generation of the synthesized image. This paper describes the design and operation of a systolic-based architecture tailored to speed up the intersection calculations that must be performed as part of the ray tracing algorithm.
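
    The intersection calculation that such an array accelerates can be made concrete with a small C++ sketch. Below, each processing element of a hypothetical systolic pipeline holds one sphere and updates the nearest-hit distance of every ray streamed through it; this illustrates the idea only, it is not the architecture described in the paper, and the choice of spheres as primitives is an assumption.

    #include <cmath>
    #include <limits>

    struct Vec3 { double x, y, z; };
    static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
    static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

    struct Ray    { Vec3 origin, dir; double tNearest = std::numeric_limits<double>::infinity(); };
    struct Sphere { Vec3 center; double radius; };

    // One systolic cell: test the streamed ray against this cell's sphere and
    // keep the closer of the existing hit and the new one.
    void intersectCell(Ray& ray, const Sphere& s) {
        Vec3 oc = sub(ray.origin, s.center);
        double b = dot(oc, ray.dir);                 // assumes ray.dir is normalised
        double c = dot(oc, oc) - s.radius * s.radius;
        double disc = b * b - c;
        if (disc < 0.0) return;                      // ray misses this sphere
        double t = -b - std::sqrt(disc);             // nearer root of the quadratic
        if (t > 0.0 && t < ray.tNearest) ray.tNearest = t;
    }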

    Parallel Mesh Processing

    Current research in computer graphics seeks to meet the growing expectations of users and produces images that look ever more realistic. Accordingly, the scenes and methods used to render these images are becoming increasingly complex. Such a development is inevitably tied to an increase in the required computing power, since the models that make up a scene may consist of billions of polygons and must be rendered in real time. Realistic image synthesis rests on three pillars: models, materials, and lighting. Today there are several methods for efficient and realistic approximation of global illumination, and likewise there are algorithms for creating realistic materials. Methods for rendering models in real time also exist, but they usually work only for scenes of moderate complexity and fail on very complex scenes. The models form the foundation of a scene; optimizing them has a direct impact on the efficiency of the material and lighting methods, so only an optimized model representation makes real-time rendering possible. Many of the models used in computer graphics are represented by triangle meshes. The data volume they contain is enormous, as it must capture the richness of detail of the respective objects and cope with the growing demand for realism. Rendering complex models consisting of millions of triangles is a major challenge even for modern graphics cards. It is therefore necessary, especially for real-time simulations, to develop efficient algorithms. Such algorithms should, on the one hand, support visibility culling, level-of-detail (LOD), out-of-core memory management, and compression; on the other hand, this optimization must itself be very efficient so as not to impede the rendering further. This requires the development of parallel methods that are able to process the enormous amounts of data efficiently. The core contribution of this work is a set of novel algorithms and data structures that were designed specifically for efficient parallel data processing and are able to render and model very complex models and scenes in real time. These algorithms operate in two phases: first, in an offline phase, the data structure is built and optimized for parallel processing; the optimized data structure is then used in the second phase for real-time rendering. A further contribution of this work is an algorithm that can procedurally generate a very realistic-looking planet and render it in real time.

    Web based 3D graphics using Dart : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, New Zealand

    The proportion of the population that has grown up with unlimited access to the internet and portable digital devices is ever increasing. Accompanying this growth are advances in web-based and mobile technologies that make platform-independent applications more viable. Graphical applications, in particular, are popular with users but have so far remained relatively underdeveloped for platform independence due to their complex nature and device requirements. This research combines web-based technologies to create a framework for developing scalable graphical environments while ensuring a suitable level of performance across all device types. The web programming language Dart provides a method for achieving execution across a range of devices with a single implementation. Working alongside Dart, WebGL manages the processing needs of the graphical elements, which are provided by content-generative algorithms: the diamond-square algorithm, Perlin noise, and a shallow water simulation. The content algorithms allow for some flexibility in the scale of the application, which is expanded upon by benchmarking device performance and by an asset controller that manages which algorithm is used to generate content, and at what quality and size. This allows the application to achieve optimal performance on a range of devices from low-end mobile devices to high-end PCs. An input controller further supports platform independence by allowing for a range of input types and the addition of new input types as technology develops. The combination of these technologies and functionalities results in a framework that generates 3D scenes on any given device and can adjust itself automatically for optimal performance, or according to predefined developer metrics that emphasise particular criteria. Input management functionality and web-based computing mean that as technology advances and new devices are developed and improved, applications do not need redevelopment, and compromises in features and functionality are limited only by the processing power of each individual device. This framework serves as an example of how a range of technologies and algorithms can be knitted together to design performant solutions for platform-independent applications.
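
    Of the content-generation algorithms listed above, the diamond-square algorithm is the most compact to illustrate. The sketch below is written in C++ rather than Dart purely for illustration and is not the thesis's implementation; the flat (zero) corner initialisation, the random-number source, and the roughness parameter are assumptions.

    #include <cstdlib>
    #include <vector>

    // Generate a (2^n + 1)-square heightmap; `roughness` < 1 controls how quickly
    // the random displacement decays at finer scales.
    std::vector<double> diamondSquare(int n, double roughness, unsigned seed = 1) {
        const int size = (1 << n) + 1;
        std::srand(seed);
        auto rnd = [](double amp) { return amp * (2.0 * std::rand() / RAND_MAX - 1.0); };
        std::vector<double> h(size * size, 0.0);
        auto at = [&](int x, int y) -> double& { return h[y * size + x]; };

        double amp = 1.0;
        for (int step = size - 1; step > 1; step /= 2, amp *= roughness) {
            int half = step / 2;
            // Diamond step: the centre of each square gets the average of its four corners.
            for (int y = half; y < size; y += step)
                for (int x = half; x < size; x += step)
                    at(x, y) = (at(x - half, y - half) + at(x + half, y - half) +
                                at(x - half, y + half) + at(x + half, y + half)) / 4.0 + rnd(amp);
            // Square step: the centre of each edge gets the average of its in-bounds neighbours.
            for (int y = 0; y < size; y += half)
                for (int x = ((y / half) % 2 == 0) ? half : 0; x < size; x += step) {
                    double sum = 0.0; int cnt = 0;
                    if (x - half >= 0)   { sum += at(x - half, y); ++cnt; }
                    if (x + half < size) { sum += at(x + half, y); ++cnt; }
                    if (y - half >= 0)   { sum += at(x, y - half); ++cnt; }
                    if (y + half < size) { sum += at(x, y + half); ++cnt; }
                    at(x, y) = sum / cnt + rnd(amp);
                }
        }
        return h;
    }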
