14,222 research outputs found

    Adaptive guidance and control for future remote sensing systems

    Get PDF
    A unique approach to onboard processing was developed that is capable of acquiring high-quality image data for users in near real time. The approach is divided into two steps: the development of an onboard cloud detection system, and the development of a landmark tracker. The results of these two developments are outlined, and the requirements of an operational guidance and control system capable of providing continuous estimation of the sensor boresight position are summarized.
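    The abstract does not give the algorithms, but a minimal sketch of such a two-step pipeline, assuming a simple single-band brightness threshold for cloud detection and normalized cross-correlation for landmark tracking (all function names and thresholds here are illustrative, not the paper's actual method), might look like:

```python
import numpy as np

def cloud_mask(image, threshold=0.8):
    """Flag pixels brighter than a fixed albedo threshold as cloud.
    A real onboard detector would use multispectral tests; this
    single-band threshold is only illustrative."""
    return image > threshold

def track_landmark(image, template):
    """Locate a known landmark chip in the scene by normalized
    cross-correlation, skipping cloud-contaminated regions."""
    mask = cloud_mask(image)
    best_score, best_pos = -np.inf, None
    th, tw = template.shape
    for r in range(image.shape[0] - th):
        for c in range(image.shape[1] - tw):
            if mask[r:r+th, c:c+tw].mean() > 0.1:   # too cloudy, skip
                continue
            patch = image[r:r+th, c:c+tw]
            score = np.corrcoef(patch.ravel(), template.ravel())[0, 1]
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score
```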

    Making FPGAs Accessible to Scientists and Engineers as Domain Expert Software Programmers with LabVIEW

    Full text link
    In this paper we present a graphical programming framework, LabVIEW, and its associated language and libraries, as well as programming techniques and patterns that we have found useful in making FPGAs accessible to scientists and engineers as domain expert software programmers. Comment: Presented at the First International Workshop on FPGAs for Software Programmers (FSP 2014) (arXiv:1408.4423).

    Vectorized OpenCL implementation of numerical integration for higher order finite elements

    Full text link
    In our work we analyze computational aspects of the problem of numerical integration in finite element calculations and consider an OpenCL implementation of the related algorithms for processors with wide vector registers. As a platform for testing the implementation we choose the PowerXCell processor, an example of the Cell Broadband Engine (CellBE) architecture. Although the processor is considered old by today's standards (its design dates back to 2001), we investigate its performance due to two features that it shares with the recent Xeon Phi family of coprocessors: wide vector units and a relatively slow connection between the computing cores and main global memory. The performed analysis of parallelization options can also be used for designing numerical integration algorithms for other processors with vector registers, such as contemporary x86 microprocessors. Comment: published online in Computers and Mathematics with Applications: http://www.sciencedirect.com/science/article/pii/S089812211300521
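    To make the vectorization-friendly structure of finite element integration concrete, here is a minimal NumPy sketch (the paper uses OpenCL; the shapes, names, and batching scheme below are illustrative assumptions, not the authors' kernels) in which all element stiffness matrices are accumulated at once, quadrature point by quadrature point, so the inner arithmetic maps naturally onto wide vector units:

```python
import numpy as np

def element_stiffness(dphi, w, detJ):
    # dphi : (n_elem, n_qp, n_shape, dim) shape-function gradients
    #        at the quadrature points, in real coordinates
    # w    : (n_qp,) quadrature weights
    # detJ : (n_elem, n_qp) Jacobian determinants
    # returns (n_elem, n_shape, n_shape) stiffness matrices
    scale = w[None, :] * detJ                      # (n_elem, n_qp)
    # sum over quadrature points q and spatial dimension d
    return np.einsum('eq,eqid,eqjd->eij', scale, dphi, dphi)

# Example: 1000 elements, 8 quadrature points, 8 shape functions, 3D
rng = np.random.default_rng(0)
K = element_stiffness(rng.standard_normal((1000, 8, 8, 3)),
                      np.full(8, 0.125), np.ones((1000, 8)))
print(K.shape)  # (1000, 8, 8)
```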

    The electronic interface for quantum processors

    Full text link
    Quantum computers can potentially provide an unprecedented speed-up with respect to traditional computers. However, a significant increase in the number of quantum bits (qubits) and in their performance is required to demonstrate such quantum supremacy. While scaling up the underlying quantum processor is extremely challenging, building the electronics required to interface such a large-scale processor is just as relevant and arduous. This paper discusses the challenges in designing a scalable electronic interface for quantum processors. To that end, we discuss the requirements dictated by different qubit technologies and present existing implementations of the electronic interface. The limitations in scaling up such state-of-the-art implementations are analyzed, and possible solutions to overcome those hurdles are reviewed. The benefits offered by operating the electronic interface at cryogenic temperatures in close proximity to the low-temperature qubits are discussed. Although several significant challenges must still be faced by researchers in the field of cryogenic control for quantum processors, a cryogenic electronic interface appears to be the viable solution for enabling large-scale quantum computers able to address world-changing computational problems.

    Parallel Programming Models for Heterogeneous Many-Cores : A Survey

    Full text link
    Heterogeneous many-cores are now an integral part of modern computing systems, ranging from embedded systems to supercomputers. While heterogeneous many-core design offers the potential for energy-efficient high performance, such potential can only be unlocked if the application programs are suitably parallel and can be made to match the underlying heterogeneous platform. In this article, we provide a comprehensive survey of parallel programming models for heterogeneous many-core architectures and review the compilation techniques for improving programmability and portability. We examine various software optimization techniques for minimizing the communication overhead between heterogeneous computing devices. We provide a road map for a wide variety of different research areas. We conclude with a discussion of open issues in the area and potential research directions. This article provides both an accessible introduction to the fast-moving area of heterogeneous programming and a detailed bibliography of its main achievements. Comment: Accepted to be published at CCF Transactions on High Performance Computing.

    Performance Models for Split-execution Computing Systems

    Full text link
    Split-execution computing leverages the capabilities of multiple computational models to solve problems, but splitting program execution across different computational models incurs costs associated with the translation between domains. We analyze the performance of a split-execution computing system developed from conventional and quantum processing units (QPUs) by using behavioral models that track resource usage. We focus on asymmetric processing models built using conventional CPUs and a family of special-purpose QPUs that employ quantum computing principles. Our performance models account for the translation of a classical optimization problem into the physical representation required by the quantum processor while also accounting for hardware limitations and conventional processor speed and memory. We conclude that the bottleneck in this split-execution computing system lies at the quantum-classical interface and that the primary time cost is independent of quantum processor behavior. Comment: Presented at 18th Workshop on Advances in Parallel and Distributed Computational Models [APDCM2016] on 23 May 2016; 10 pages.
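    As a flavor of what such a behavioral model can show, here is a toy Python sketch (all cost constants are hypothetical stand-ins, not the paper's measured parameters) in which the interface cost grows with problem size while the quantum-processor cost stays flat, mirroring the paper's conclusion:

```python
# Toy behavioral model of a split-execution (CPU + QPU) run.
def split_execution_time(n_vars,
                         t_translate_per_var=2e-3,   # embed problem, s/var
                         t_program=8e-3,             # load onto QPU, s
                         t_anneal=2e-5,              # one anneal, s
                         n_reads=1000,
                         t_readout_per_read=1e-4):   # per-sample readout, s
    t_interface = n_vars * t_translate_per_var + t_program
    t_quantum = n_reads * (t_anneal + t_readout_per_read)
    return t_interface, t_quantum

for n in (100, 1000, 10000):
    t_i, t_q = split_execution_time(n)
    print(f"{n:6d} vars: interface {t_i:8.3f} s, quantum {t_q:.3f} s")
# The interface term scales with problem size; the quantum term does not.
```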

    NBSymple, a double parallel, symplectic N-body code running on Graphic Processing Units

    Full text link
    We present and discuss the characteristics and performance, both in terms of computational speed and precision, of a numerical code which integrates the equations of motion of N 'particles' interacting via Newtonian gravitation and moving in an external, smooth galactic field. The force evaluation on every particle is done by means of direct summation of the contributions of all the other particles in the system, avoiding truncation error. The time integration is done with second-order and sixth-order symplectic schemes. The code, NBSymple, has been parallelized twice: by means of the Compute Unified Device Architecture (CUDA) to make the all-pair force evaluation as fast as possible on high-performance graphics processing units (NVIDIA Tesla C1060), while the O(N) computations are distributed over various CPUs by means of the OpenMP application program interface. The code works in either single-precision or double-precision floating point arithmetic. The use of single precision allows the best use of the GPU's performance but, of course, limits the precision of the simulation in some critical situations. We find a good compromise in using a software reconstruction of double precision for those variables that are most critical for the overall precision of the code. The code is available on the web site astrowww.phys.uniroma1.it/dolcetta/nbsymple.html Comment: 29 pages, including 9 figures. Submitted to New Astronomy.
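    For reference, a second-order symplectic (leapfrog, kick-drift-kick) integrator with direct O(N^2) force summation of the kind the abstract describes can be sketched in a few lines of NumPy (the softening length and external-field hook are illustrative placeholders, not NBSymple's actual code; units take G = 1):

```python
import numpy as np

def accel(pos, mass, eps=1e-3):
    # pairwise separations d[i, j] = pos[j] - pos[i]
    d = pos[None, :, :] - pos[:, None, :]
    r2 = (d ** 2).sum(-1) + eps ** 2               # softened distances
    np.fill_diagonal(r2, np.inf)                   # no self-force
    return (d * (mass[None, :, None] / r2[..., None] ** 1.5)).sum(1)

def leapfrog(pos, vel, mass, dt, n_steps, ext_accel=lambda p: 0.0):
    """Second-order symplectic kick-drift-kick scheme with an
    optional external (e.g. galactic) acceleration field."""
    a = accel(pos, mass) + ext_accel(pos)
    for _ in range(n_steps):
        vel += 0.5 * dt * a                        # kick
        pos += dt * vel                            # drift
        a = accel(pos, mass) + ext_accel(pos)
        vel += 0.5 * dt * a                        # kick
    return pos, vel
```

    The all-pair `accel` evaluation is exactly the O(N^2) kernel that the paper offloads to the GPU, while the O(N) updates are the part it distributes over CPUs with OpenMP.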

    Exploiting graphic processing units parallelism to improve intelligent data acquisition system performance in JET's correlation reflectometer

    Get PDF
    The performance of intelligent data acquisition systems relies heavily on their processing capabilities and local bus bandwidth, especially in applications with high sample rates or a high number of channels. This is the case for the self-adaptive sampling rate data acquisition system installed as a pilot experiment in the KG8B correlation reflectometer at JET. The system, which is based on the ITMS platform, continuously adapts the sample rate during the acquisition depending on the signal bandwidth. In order to do so it must transfer acquired data to a memory buffer in the host processor and run heavy computational algorithms for each data block. The processing capabilities of the host CPU and the bandwidth of the PXI bus limit the maximum sample rate that can be achieved, therefore limiting the maximum bandwidth of the phenomena that can be studied. Graphics processing units (GPUs) are becoming an alternative for speeding up compute-intensive kernels of scientific, imaging and simulation applications. However, integrating this technology into data acquisition systems is not a straightforward step, not to mention exploiting their parallelism efficiently. This paper discusses the use of GPUs with new high-speed data bus interfaces to improve the performance of the self-adaptive sampling rate data acquisition system installed on JET. Integration issues are discussed and performance evaluations are presented.
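    The per-block adaptation loop the abstract alludes to can be illustrated with a short Python sketch (the bandwidth estimator, rate ladder, and safety margin below are hypothetical, not the ITMS system's actual algorithm):

```python
import numpy as np

def occupied_bandwidth(block, fs, energy_frac=0.99):
    """Frequency below which energy_frac of the block's spectral
    energy lies; a crude estimate of the signal bandwidth."""
    spec = np.abs(np.fft.rfft(block)) ** 2
    freqs = np.fft.rfftfreq(len(block), 1.0 / fs)
    cum = np.cumsum(spec) / spec.sum()
    return freqs[np.searchsorted(cum, energy_frac)]

def next_sample_rate(block, fs, rates=(1e6, 2e6, 5e6, 10e6), margin=2.5):
    """Pick the lowest available rate that covers the estimated
    bandwidth with a Nyquist-plus-safety margin."""
    needed = margin * occupied_bandwidth(block, fs)
    for r in rates:
        if r >= needed:
            return r
    return rates[-1]
```

    This is exactly the kind of per-block FFT-heavy computation that a GPU can absorb, freeing the host CPU and allowing a higher maximum sample rate.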

    High-Performance Computing with Quantum Processing Units

    Full text link
    The prospects of quantum computing have driven efforts to realize fully functional quantum processing units (QPUs). Recent success in developing proof-of-principle QPUs has prompted the question of how to integrate these emerging processors into modern high-performance computing (HPC) systems. We examine how QPUs can be integrated into current and future HPC system architectures by accounting for functional and physical design requirements. We identify two integration pathways that are differentiated by infrastructure constraints on the QPU and the use cases expected for the HPC system. This includes a tight integration that assumes infrastructure bottlenecks can be overcome as well as a loose integration that assumes they cannot. We find that the performance of both approaches is likely to depend on the quantum interconnect that serves to entangle multiple QPUs. We also identify several challenges in assessing QPU performance for HPC, and we consider new metrics that capture the interplay between system architecture and the quantum parallelism underlying computational performance. Comment: 8 pages, 5 figures.
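    The tight-versus-loose trade-off can be made concrete with a toy latency model (all constants below are illustrative assumptions, not measurements from the paper): when quantum kernels are short, per-call dispatch latency, and hence the interconnect, dominates.

```python
# Hypothetical latency model for the two integration pathways: tight
# integration puts the QPU on the system interconnect (low per-call
# dispatch latency); loose integration reaches it over a network service.
def campaign_time(n_calls, t_kernel, t_dispatch):
    # total wall time for a hybrid job issuing n_calls quantum kernels
    return n_calls * (t_kernel + t_dispatch)

t_kernel = 1e-3                       # time on the QPU per call, s
tight = campaign_time(10_000, t_kernel, t_dispatch=5e-6)
loose = campaign_time(10_000, t_kernel, t_dispatch=50e-3)
print(f"tight: {tight:.1f} s   loose: {loose:.1f} s")
```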

    The Power of Spreadsheet Computations

    Full text link
    We investigate the expressive power of spreadsheets. We consider spreadsheets which contain only formulas, and assume that they are small templates which can be filled to a larger area of the grid to process input data of variable size. Therefore we can compare them to well-known machine models of computation. We consider a number of classes of spreadsheets defined by restrictions on their reference structure. Two of the classes correspond closely to parallel complexity classes: we prove a direct correspondence between the dimensions of the spreadsheet and the amount of hardware and time used by a parallel computer to compute the same function. As a tool, we produce spreadsheets which are universal in these classes, i.e. can emulate any other spreadsheet from them. In other cases we implement, in the spreadsheets in question, instances of a polynomial-time complete problem, which indicates that the spreadsheets are unlikely to have efficient parallel evaluation algorithms. Thus we get a picture of how the computational power of spreadsheets depends on their dimensions and structure of references. Comment: 36 pages. Electronic appendices in Excel's xlsx format available from the author's Web page.
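    To make the template-and-fill model concrete, here is a toy formula-only 'spreadsheet' evaluator in Python (the cell encoding and example formula are illustrative, not the paper's formalism): one formula template is stamped down a column, and cells are resolved in dependency order via memoized recursion.

```python
from functools import lru_cache

# Each cell is a formula: a function of an accessor for other cells.
cells = {"A1": lambda get: 1,
         "A2": lambda get: 1}

# Fill a template down column A: A[n] = A[n-1] + A[n-2] (Fibonacci).
for n in range(3, 11):
    cells[f"A{n}"] = (lambda i: lambda get: get(f"A{i-1}") + get(f"A{i-2}"))(n)

@lru_cache(maxsize=None)
def get(ref):
    # Memoization means each cell is evaluated once, in dependency order.
    return cells[ref](get)

print([get(f"A{n}") for n in range(1, 11)])  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```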