170,797 research outputs found

    A Multi-GPU Programming Library for Real-Time Applications

    Full text link
    We present MGPU, a C++ programming library targeted at single-node multi-GPU systems. Such systems combine disproportionate floating point performance with high data locality and are thus well suited to implement real-time algorithms. We describe the library design, programming interface and implementation details in light of this specific problem domain. The core concepts of this work are a novel kind of container abstraction and MPI-like communication methods for intra-system communication. We further demonstrate how MGPU is used as a framework for porting existing GPU libraries to multi-device architectures. Putting our library to the test, we accelerate an iterative non-linear image reconstruction algorithm for real-time magnetic resonance imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us to conclude that multi-GPU systems are a viable solution for real-time MRI reconstruction as well as signal-processing applications in general.Comment: 15 pages, 10 figure

    Comprehensive Evaluation of OpenCL-based Convolutional Neural Network Accelerators in Xilinx and Altera FPGAs

    Get PDF
    Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity from both industry and academia. Special interest is around Convolutional Neural Networks (CNN), which take inspiration from the hierarchical structure of the visual cortex, to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged with memory bottlenecks that require many convolution and fully-connected layers demanding large amount of communication for parallel computation. Multi-core CPU based solutions have demonstrated their inadequacy for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. FPGA design solutions are also actively being explored, which allow implementing the memory hierarchy using embedded BlockRAM. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replicability and inconsistencies. This makes FPGAs potentially powerful solutions for real-time classification of CNNs. Both Altera and Xilinx have adopted OpenCL co-design framework from GPU for FPGA designs as a pseudo-automatic development solution. In this paper, a comprehensive evaluation and comparison of Altera and Xilinx OpenCL frameworks for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platforms tools, mature design community and better execution times

    Ground Systems Development Environment (GSDE) interface requirements analysis

    Get PDF
    A set of procedural and functional requirements are presented for the interface between software development environments and software integration and test systems used for space station ground systems software. The requirements focus on the need for centralized configuration management of software as it is transitioned from development to formal, target based testing. This concludes the GSDE Interface Requirements study. A summary is presented of findings concerning the interface itself, possible interface and prototyping directions for further study, and results of the investigation of the Cronus distributed applications environment

    The Orbital Maneuvering Vehicle Training Facility visual system concept

    Get PDF
    The purpose of the Orbital Maneuvering Vehicle (OMV) Training Facility (OTF) is to provide effective training for OMV pilots. A critical part of the training environment is the Visual System, which will simulate the video scenes produced by the OMV Closed-Circuit Television (CCTV) system. The simulation will include camera models, dynamic target models, moving appendages, and scene degradation due to the compression/decompression of video signal. Video system malfunctions will also be provided to ensure that the pilot is ready to meet all challenges the real-world might provide. One possible visual system configuration for the training facility that will meet existing requirements is described

    Comprehensive Evaluation of OpenCL-Based CNN Implementations for FPGAs

    Get PDF
    Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity from both industry and academia. Special interest is around Convolutional Neural Networks (CNN), which take inspiration from the hierarchical structure of the visual cortex, to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged with memory bottlenecks that require many convolution and fully-connected layers demanding large amount of communication for parallel computation. Multi-core CPU based solutions have demonstrated their inadequacy for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. OpenCL is commonly used to describe these architectures for their execution on GPGPUs or FPGAs. FPGA design solutions are also actively being explored, which allow implementing the memory hierarchy using embedded parallel BlockRAMs. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replicability and inconsistencies. This makes FPGAs potentially powerful solutions for real-time classification of CNNs. In this paper both Altera and Xilinx adopted OpenCL co-design frameworks for pseudo-automatic development solutions are evaluated. A comprehensive evaluation and comparison for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platforms tools, mature design community and better execution times.Ministerio de Economía y Competitividad TEC2016-77785-

    Sixth Annual Users' Conference

    Get PDF
    Conference papers and presentation outlines which address the use of the Transportable Applications Executive (TAE) and its various applications programs are compiled. Emphasis is given to the design of the user interface and image processing workstation in general. Alternate ports of TAE and TAE subsystems are also covered

    Archival Legacy Investigations of Circumstellar Environments: Overview and First Results

    Get PDF
    We are currently conducting a comprehensive and consistent re-processing of archival HST-NICMOS coronagraphic surveys using advanced PSF subtraction methods, entitled the Archival Legacy Investigations of Circumstellar Environments program (ALICE, HST/AR 12652). This virtual campaign of about 400 targets has already produced numerous new detections of previously unidentified point sources and circumstellar structures. We present five newly spatially resolved debris disks revealed in scattered light by our analysis of the archival data. These images provide new views of material around young solar-type stars at ages corresponding to the period of terrestrial planet formation in our solar system. We have also detected several new candidate substellar companions, for which there are ongoing followup campaigns (HST/WFC3 and VLT/SINFONI in ADI mode). Since the methods developed as part of ALICE are directly applicable to future missions (JWST, AFTA coronagraph) we emphasize the importance of devising optimal PSF subtraction methods for upcoming coronagraphic imaging missions. We describe efforts in defining direct imaging high-level science products (HLSP) standards that can be applicable to other coronagraphic campaigns, including ground-based (e.g., Gemini Planet Imager), and future space instruments (e.g., JWST). ALICE will deliver a first release of HLSPs to the community through the MAST archive at STScI in 2014.Comment: Proceedings of the SPIE, 9143-199. 17 pages, 11 figure

    Combining high performance simulation, data acquisition, and graphics display computers

    Get PDF
    Issues involved in the continuing development of an advanced simulation complex are discussed. This approach provides the capability to perform the majority of tests on advanced systems, non-destructively. The controlled test environments can be replicated to examine the response of the systems under test to alternative treatments of the system control design, or test the function and qualification of specific hardware. Field tests verify that the elements simulated in the laboratories are sufficient. The digital computer is hosted by a Digital Equipment Corp. MicroVAX computer with an Aptec Computer Systems Model 24 I/O computer performing the communication function. An Applied Dynamics International AD100 performs the high speed simulation computing and an Evans and Sutherland PS350 performs on-line graphics display. A Scientific Computer Systems SCS40 acts as a high performance FORTRAN program processor to support the complex, by generating numerous large files from programs coded in FORTRAN that are required for the real time processing. Four programming languages are involved in the process, FORTRAN, ADSIM, ADRIO, and STAPLE. FORTRAN is employed on the MicroVAX host to initialize and terminate the simulation runs on the system. The generation of the data files on the SCS40 also is performed with FORTRAN programs. ADSIM and ADIRO are used to program the processing elements of the AD100 and its IOCP processor. STAPLE is used to program the Aptec DIP and DIA processors
    corecore