Search CORE

4,705 research outputs found

Performance Analysis of a Novel GPU Computation-to-core Mapping Scheme for Robust Facet Image Modeling

Author: Cao Yong
Park Seung In
Quek Francis
Watson Layne T.
Publication venue
Publication date: 01/01/2012
Field of study

Though the GPGPU concept is well-known in image processing, much more work remains to be done to fully exploit GPUs as an alternative computation engine. This paper investigates the computation-to-core mapping strategies to probe the efficiency and scalability of the robust facet image modeling algorithm on GPUs. Our fine-grained computation-to-core mapping scheme shows a significant performance gain over the standard pixel-wise mapping scheme. With in-depth performance comparisons across the two different mapping schemes, we analyze the impact of the level of parallelism on the GPU computation and suggest two principles for optimizing future image processing applications on the GPU platform

Computer Science Technical Reports @Virginia Tech

Control of a free-flying robot manipulator system

Author: Alexander H.
Publication venue
Publication date
Field of study

The development of and test control strategies for self-contained, autonomous free flying space robots are discussed. Such a robot would perform operations in space similar to those currently handled by astronauts during extravehicular activity (EVA). Use of robots should reduce the expense and danger attending EVA both by providing assistance to astronauts and in many cases by eliminating altogether the need for human EVA, thus greatly enhancing the scope and flexibility of space assembly and repair activities. The focus of the work is to develop and carry out a program of research with a series of physical Satellite Robot Simulator Vehicles (SRSV's), two-dimensionally freely mobile laboratory models of autonomous free-flying space robots such as might perform extravehicular functions associated with operation of a space station or repair of orbiting satellites. It is planned, in a later phase, to extend the research to three dimensions by carrying out experiments in the Space Shuttle cargo bay

NASA Technical Reports Server

Parallel Architectures for Planetary Exploration Requirements (PAPER)

Author: Cezzar Ruknet
Sen Ranjan K.
Publication venue
Publication date
Field of study

The Parallel Architectures for Planetary Exploration Requirements (PAPER) project is essentially research oriented towards technology insertion issues for NASA's unmanned planetary probes. It was initiated to complement and augment the long-term efforts for space exploration with particular reference to NASA/LaRC's (NASA Langley Research Center) research needs for planetary exploration missions of the mid and late 1990s. The requirements for space missions as given in the somewhat dated Advanced Information Processing Systems (AIPS) requirements document are contrasted with the new requirements from JPL/Caltech involving sensor data capture and scene analysis. It is shown that more stringent requirements have arisen as a result of technological advancements. Two possible architectures, the AIPS Proof of Concept (POC) configuration and the MAX Fault-tolerant dataflow multiprocessor, were evaluated. The main observation was that the AIPS design is biased towards fault tolerance and may not be an ideal architecture for planetary and deep space probes due to high cost and complexity. The MAX concepts appears to be a promising candidate, except that more detailed information is required. The feasibility for adding neural computation capability to this architecture needs to be studied. Key impact issues for architectural design of computing systems meant for planetary missions were also identified

NASA Technical Reports Server

Parallel processing and expert systems

Author: Lau Sonie
Yan Jerry C.
Publication venue
Publication date
Field of study

Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited

NASA Technical Reports Server

The AXIOM software layers

AXIOM project aims at developing a heterogeneous computing board (SMP-FPGA).The Software Layers developed at the AXIOM project are explained.OmpSs provides an easy way to execute heterogeneous codes in multiple cores. People and objects will soon share the same digital network for information exchange in a world named as the age of the cyber-physical systems. The general expectation is that people and systems will interact in real-time. This poses pressure onto systems design to support increasing demands on computational power, while keeping a low power envelop. Additionally, modular scaling and easy programmability are also important to ensure these systems to become widespread. The whole set of expectations impose scientific and technological challenges that need to be properly addressed.The AXIOM project (Agile, eXtensible, fast I/O Module) will research new hardware/software architectures for cyber-physical systems to meet such expectations. The technical approach aims at solving fundamental problems to enable easy programmability of heterogeneous multi-core multi-board systems. AXIOM proposes the use of the task-based OmpSs programming model, leveraging low-level communication interfaces provided by the hardware. Modular scalability will be possible thanks to a fast interconnect embedded into each module. To this aim, an innovative ARM and FPGA-based board will be designed, with enhanced capabilities for interfacing with the physical world. Its effectiveness will be demonstrated with key scenarios such as Smart Video-Surveillance and Smart Living/Home (domotics).Peer ReviewedPostprint (author's final draft

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Archivio della Ricerca - Università degli Studi di Siena

An Automated Design-flow for FPGA-based Sequential Simulation

Author: Blumink R.
Holzenspies P.K.F.
Rutgers J.H.
Smit Gerardus Johannes Maria
Westmijze M.
Westmijze M.
Wolkotte P.T.
Publication venue: Technologiestichting STW
Publication date: 01/01/2008
Field of study

In this paper we describe the automated design flow that will transform and map a given homogeneous or heterogeneous hardware design into an FPGA that performs a cycle accurate simulation. The flow replaces the required manually performed transformation and can be embedded in existing standard synthesis flows. Compared to the earlier manually translated designs, this automated flow resulted in a reduced number of FPGA hardware resources and higher simulation frequencies. The implementation of the complete design flow is work in progress.\u

CiteSeerX

University of Twente Research Information

A parallel implementation of a multisensor feature-based range-estimation method

Author: Sridhar Banavar
Suorsa Raymond E.
Publication venue
Publication date
Field of study

There are many proposed vision based methods to perform obstacle detection and avoidance for autonomous or semi-autonomous vehicles. All methods, however, will require very high processing rates to achieve real time performance. A system capable of supporting autonomous helicopter navigation will need to extract obstacle information from imagery at rates varying from ten frames per second to thirty or more frames per second depending on the vehicle speed. Such a system will need to sustain billions of operations per second. To reach such high processing rates using current technology, a parallel implementation of the obstacle detection/ranging method is required. This paper describes an efficient and flexible parallel implementation of a multisensor feature-based range-estimation algorithm, targeted for helicopter flight, realized on both a distributed-memory and shared-memory parallel computer

NASA Technical Reports Server

An O(n) time discrete relaxation architecture for real-time processing of the consistent labeling problem

Author: Gu Jun
Henderson Thomas C.
Publication venue: University of Utah
Publication date: 01/01/1986
Field of study

technical reportDiscrete relaxation techniques have proven useful in solving a wide range of problems in digital signal and digital image processing, artificial intelligence, operations research, and machine vision. Much work has been devoted to finding efficient hardware architectures. This paper shows that a conventional hardware design for a Discrete Relaxation Algorithm (DRA) suffers from 0(n2m3 ) time complexity and Oinhn2) space complexity. By reformulating DRA into a parallel computational tree and using a multiple tree-root pipelining scheme, time complexity is reduced to O(nm), while the space complexity is reduced by a factor of 2. For certain relaxation processing, the space complexity can even be decreased to O(nm). Furthermore, a technique for dynamic configuring an architectural wavefront is used which leads to an O(n) time highly configurable DRA3 architecture

The University of Utah: J. Willard Marriott Digital Library