170,797 research outputs found
A Multi-GPU Programming Library for Real-Time Applications
We present MGPU, a C++ programming library targeted at single-node multi-GPU
systems. Such systems combine disproportionate floating point performance with
high data locality and are thus well suited to implement real-time algorithms.
We describe the library design, programming interface and implementation
details in light of this specific problem domain. The core concepts of this
work are a novel kind of container abstraction and MPI-like communication
methods for intra-system communication. We further demonstrate how MGPU is used
as a framework for porting existing GPU libraries to multi-device
architectures. Putting our library to the test, we accelerate an iterative
non-linear image reconstruction algorithm for real-time magnetic resonance
imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs
and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us
to conclude that multi-GPU systems are a viable solution for real-time MRI
reconstruction as well as signal-processing applications in general.Comment: 15 pages, 10 figure
Comprehensive Evaluation of OpenCL-based Convolutional Neural Network Accelerators in Xilinx and Altera FPGAs
Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity from both industry and academia. Special interest is around Convolutional Neural Networks (CNN), which take inspiration from the hierarchical structure of the visual cortex, to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged with memory bottlenecks that require many convolution and fully-connected layers demanding large amount of communication for parallel computation. Multi-core CPU based solutions have demonstrated their inadequacy for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. FPGA design solutions are also actively being explored, which allow implementing the memory hierarchy using embedded BlockRAM. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replicability and inconsistencies. This makes FPGAs potentially powerful solutions for real-time classification of CNNs. Both Altera and Xilinx have adopted OpenCL co-design framework from GPU for FPGA designs as a pseudo-automatic development solution. In this paper, a comprehensive evaluation and comparison of Altera and Xilinx OpenCL frameworks for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platforms tools, mature design community and better execution times
Ground Systems Development Environment (GSDE) interface requirements analysis
A set of procedural and functional requirements are presented for the interface between software development environments and software integration and test systems used for space station ground systems software. The requirements focus on the need for centralized configuration management of software as it is transitioned from development to formal, target based testing. This concludes the GSDE Interface Requirements study. A summary is presented of findings concerning the interface itself, possible interface and prototyping directions for further study, and results of the investigation of the Cronus distributed applications environment
The Orbital Maneuvering Vehicle Training Facility visual system concept
The purpose of the Orbital Maneuvering Vehicle (OMV) Training Facility (OTF) is to provide effective training for OMV pilots. A critical part of the training environment is the Visual System, which will simulate the video scenes produced by the OMV Closed-Circuit Television (CCTV) system. The simulation will include camera models, dynamic target models, moving appendages, and scene degradation due to the compression/decompression of video signal. Video system malfunctions will also be provided to ensure that the pilot is ready to meet all challenges the real-world might provide. One possible visual system configuration for the training facility that will meet existing requirements is described
Comprehensive Evaluation of OpenCL-Based CNN Implementations for FPGAs
Deep learning has significantly advanced the state of the
art in artificial intelligence, gaining wide popularity from both industry
and academia. Special interest is around Convolutional Neural Networks
(CNN), which take inspiration from the hierarchical structure
of the visual cortex, to form deep layers of convolutional operations,
along with fully connected classifiers. Hardware implementations of these
deep CNN architectures are challenged with memory bottlenecks that
require many convolution and fully-connected layers demanding large
amount of communication for parallel computation. Multi-core CPU
based solutions have demonstrated their inadequacy for this problem
due to the memory wall and low parallelism. Many-core GPU architectures
show superior performance but they consume high power and also
have memory constraints due to inconsistencies between cache and main
memory. OpenCL is commonly used to describe these architectures for
their execution on GPGPUs or FPGAs. FPGA design solutions are also
actively being explored, which allow implementing the memory hierarchy
using embedded parallel BlockRAMs. This boosts the parallel use
of shared memory elements between multiple processing units, avoiding
data replicability and inconsistencies. This makes FPGAs potentially
powerful solutions for real-time classification of CNNs. In this
paper both Altera and Xilinx adopted OpenCL co-design frameworks
for pseudo-automatic development solutions are evaluated. A comprehensive
evaluation and comparison for a 5-layer deep CNN is presented.
Hardware resources, temporal performance and the OpenCL architecture
for CNNs are discussed. Xilinx demonstrates faster synthesis, better
FPGA resource utilization and more compact boards. Altera provides
multi-platforms tools, mature design community and better execution
times.Ministerio de Economía y Competitividad TEC2016-77785-
Sixth Annual Users' Conference
Conference papers and presentation outlines which address the use of the Transportable Applications Executive (TAE) and its various applications programs are compiled. Emphasis is given to the design of the user interface and image processing workstation in general. Alternate ports of TAE and TAE subsystems are also covered
Archival Legacy Investigations of Circumstellar Environments: Overview and First Results
We are currently conducting a comprehensive and consistent re-processing of
archival HST-NICMOS coronagraphic surveys using advanced PSF subtraction
methods, entitled the Archival Legacy Investigations of Circumstellar
Environments program (ALICE, HST/AR 12652). This virtual campaign of about 400
targets has already produced numerous new detections of previously unidentified
point sources and circumstellar structures. We present five newly spatially
resolved debris disks revealed in scattered light by our analysis of the
archival data. These images provide new views of material around young
solar-type stars at ages corresponding to the period of terrestrial planet
formation in our solar system. We have also detected several new candidate
substellar companions, for which there are ongoing followup campaigns (HST/WFC3
and VLT/SINFONI in ADI mode). Since the methods developed as part of ALICE are
directly applicable to future missions (JWST, AFTA coronagraph) we emphasize
the importance of devising optimal PSF subtraction methods for upcoming
coronagraphic imaging missions. We describe efforts in defining direct imaging
high-level science products (HLSP) standards that can be applicable to other
coronagraphic campaigns, including ground-based (e.g., Gemini Planet Imager),
and future space instruments (e.g., JWST). ALICE will deliver a first release
of HLSPs to the community through the MAST archive at STScI in 2014.Comment: Proceedings of the SPIE, 9143-199. 17 pages, 11 figure
Combining high performance simulation, data acquisition, and graphics display computers
Issues involved in the continuing development of an advanced simulation complex are discussed. This approach provides the capability to perform the majority of tests on advanced systems, non-destructively. The controlled test environments can be replicated to examine the response of the systems under test to alternative treatments of the system control design, or test the function and qualification of specific hardware. Field tests verify that the elements simulated in the laboratories are sufficient. The digital computer is hosted by a Digital Equipment Corp. MicroVAX computer with an Aptec Computer Systems Model 24 I/O computer performing the communication function. An Applied Dynamics International AD100 performs the high speed simulation computing and an Evans and Sutherland PS350 performs on-line graphics display. A Scientific Computer Systems SCS40 acts as a high performance FORTRAN program processor to support the complex, by generating numerous large files from programs coded in FORTRAN that are required for the real time processing. Four programming languages are involved in the process, FORTRAN, ADSIM, ADRIO, and STAPLE. FORTRAN is employed on the MicroVAX host to initialize and terminate the simulation runs on the system. The generation of the data files on the SCS40 also is performed with FORTRAN programs. ADSIM and ADIRO are used to program the processing elements of the AD100 and its IOCP processor. STAPLE is used to program the Aptec DIP and DIA processors
- …