Adaptive guidance and control for future remote sensing systems
A unique approach to onboard processing was developed that is capable of acquiring high-quality image data for users in near real time. The approach comprises two steps: the development of an onboard cloud detection system and the development of a landmark tracker. The results of these two developments are outlined, and the requirements of an operational guidance and control system capable of providing continuous estimation of the sensor boresight position are summarized.
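The abstract does not specify the detection algorithm. As a rough illustration of the onboard cloud-screening step, the sketch below flags bright pixels against a reflectance threshold; the threshold value, the 20% scene cutoff, and the function names are illustrative assumptions, not details from the paper.

```python
import numpy as np

def cloud_mask(image, reflectance_threshold=0.4):
    """Flag pixels brighter than a reflectance threshold as cloud.

    `image` is a 2-D array of top-of-atmosphere reflectances in [0, 1];
    the threshold value is illustrative, not from the paper.
    """
    return image > reflectance_threshold

def cloud_fraction(image, reflectance_threshold=0.4):
    """Fraction of cloudy pixels, used to decide whether a scene is usable."""
    return cloud_mask(image, reflectance_threshold).mean()

# Example: accept a scene for landmark tracking only if it is <20% cloudy.
scene = np.random.rand(512, 512)
print("accept" if cloud_fraction(scene) < 0.2 else "reject: too cloudy")
```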
Making FPGAs Accessible to Scientists and Engineers as Domain Expert Software Programmers with LabVIEW
In this paper we present a graphical programming framework, LabVIEW, and its associated language and libraries, as well as programming techniques and patterns that we have found useful in making FPGAs accessible to scientists and engineers as domain expert software programmers.

Comment: Presented at First International Workshop on FPGAs for Software Programmers (FSP 2014) (arXiv:1408.4423)
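LabVIEW code itself is graphical, so it cannot be reproduced in text. As a loose analogy only, the sketch below renders one pattern commonly used in LabVIEW FPGA work, a producer/consumer loop pair decoupled by a FIFO, in Python; the queue size and sentinel convention are our own assumptions, not the paper's.

```python
import queue
import threading

# Hypothetical acquisition/processing split: the FPGA-side loop produces
# samples, the host-side loop consumes them, decoupled by a FIFO.
fifo = queue.Queue(maxsize=1024)

def producer(n_samples):
    for i in range(n_samples):
        fifo.put(i * 0.001)       # stand-in for an FPGA read
    fifo.put(None)                # sentinel: acquisition finished

def consumer():
    while (sample := fifo.get()) is not None:
        _ = sample * 2.0          # stand-in for host-side processing

t1 = threading.Thread(target=producer, args=(10_000,))
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
```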
Vectorized OpenCL implementation of numerical integration for higher order finite elements
In our work we analyze computational aspects of the problem of numerical integration in finite element calculations and consider an OpenCL implementation of related algorithms for processors with wide vector registers. As a platform for testing the implementation we choose the PowerXCell processor, an example of the Cell Broadband Engine (CellBE) architecture. Although the processor is considered old by today's standards (its design dates back to 2001), we investigate its performance due to two features that it shares with the recent Xeon Phi family of coprocessors: wide vector units and a relatively slow connection between the computing cores and main global memory. The analysis of parallelization options can also be used for designing numerical integration algorithms for other processors with vector registers, such as contemporary x86 microprocessors.

Comment: Published online in Computers and Mathematics with Applications: http://www.sciencedirect.com/science/article/pii/S089812211300521
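As a rough illustration of the vectorization strategy the abstract describes, the sketch below (in Python/NumPy rather than OpenCL) evaluates a Gaussian quadrature rule across many elements at once, so the per-element work maps naturally onto wide vector units; the 2-point rule and 1-D elements are simplifying assumptions, not the paper's kernels.

```python
import numpy as np

# 2-point Gauss rule on the reference interval [-1, 1].
gauss_pts = np.array([-1 / np.sqrt(3), 1 / np.sqrt(3)])
gauss_wts = np.array([1.0, 1.0])

def integrate_elements(x_left, x_right, f):
    """Integrate f over every element [x_left[e], x_right[e]] in one shot.

    Shapes: x_left, x_right -> (n_elements,); result -> (n_elements,).
    """
    jac = 0.5 * (x_right - x_left)                 # per-element Jacobian
    # Map reference points into every element: shape (n_elements, n_points).
    x = 0.5 * (x_left[:, None] + x_right[:, None]) + jac[:, None] * gauss_pts
    return jac * (f(x) @ gauss_wts)                # weighted sum per element

# Example: integrate sin(x) over 4 equal elements of [0, pi].
edges = np.linspace(0.0, np.pi, 5)
print(integrate_elements(edges[:-1], edges[1:], np.sin).sum())  # ~2.0
```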
The electronic interface for quantum processors
Quantum computers can potentially provide an unprecedented speed-up with respect to traditional computers. However, a significant increase in the number of quantum bits (qubits) and in their performance is required to demonstrate such quantum supremacy. While scaling up the underlying quantum processor is extremely challenging, building the electronics required to interface such a large-scale processor is just as relevant and arduous. This paper discusses the challenges in designing a scalable electronic interface for quantum processors. To that end, we discuss the requirements dictated by different qubit technologies and present existing implementations of the electronic interface. The limitations in scaling up such state-of-the-art implementations are analyzed, and possible solutions to overcome those hurdles are reviewed. The benefits offered by operating the electronic interface at cryogenic temperatures in close proximity to the low-temperature qubits are discussed. Although several significant challenges must still be faced by researchers in the field of cryogenic control for quantum processors, a cryogenic electronic interface appears to be the viable solution to enable large-scale quantum computers able to address world-changing computational problems.
Parallel Programming Models for Heterogeneous Many-Cores: A Survey
Heterogeneous many-cores are now an integral part of modern computing systems ranging from embedded systems to supercomputers. While heterogeneous many-core design offers the potential for energy-efficient, high-performance computing, that potential can only be unlocked if the application programs are suitably parallel and can be made to match the underlying heterogeneous platform. In this article, we provide a comprehensive survey of parallel programming models for heterogeneous many-core architectures and review the compiler techniques used to improve programmability and portability. We examine various software optimization techniques for minimizing the communication overhead between heterogeneous computing devices. We provide a road map for a wide variety of different research areas. We conclude with a discussion of open issues in the area and potential research directions. This article provides both an accessible introduction to the fast-moving area of heterogeneous programming and a detailed bibliography of its main achievements.

Comment: Accepted to be published in CCF Transactions on High Performance Computing
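As one concrete instance of the communication-overhead problem the survey examines, the toy model below shows why batching host-device transfers pays off; the latency and bandwidth figures are illustrative assumptions, not numbers from the article.

```python
# Toy cost model: per-transfer latency dominates when data is shipped to an
# accelerator in many small pieces, so batching transfers is a standard
# overhead-minimization technique.

LATENCY_S = 10e-6          # assumed fixed cost per host<->device transfer
BANDWIDTH_B_PER_S = 10e9   # assumed link bandwidth

def transfer_time(n_transfers, total_bytes):
    return n_transfers * LATENCY_S + total_bytes / BANDWIDTH_B_PER_S

total = 64 * 1024 * 1024   # 64 MiB of input data
print(transfer_time(n_transfers=1, total_bytes=total))      # one batched copy
print(transfer_time(n_transfers=4096, total_bytes=total))   # many small copies
```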
Performance Models for Split-execution Computing Systems
Split-execution computing leverages the capabilities of multiple
computational models to solve problems, but splitting program execution across
different computational models incurs costs associated with the translation
between domains. We analyze the performance of a split-execution computing
system developed from conventional and quantum processing units (QPUs) by using
behavioral models that track resource usage. We focus on asymmetric processing
models built using conventional CPUs and a family of special-purpose QPUs that
employ quantum computing principles. Our performance models account for the
translation of a classical optimization problem into the physical
representation required by the quantum processor while also accounting for
hardware limitations and conventional processor speed and memory. We conclude
that the bottleneck in this split-execution computing system lies at the
quantum-classical interface and that the primary time cost is independent of
quantum processor behavior.

Comment: Presented at the 18th Workshop on Advances in Parallel and Distributed Computational Models (APDCM 2016) on 23 May 2016; 10 pages
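A minimal sketch of a behavioral model in the spirit of the abstract, splitting total time into classical translation, interface transfer, and QPU execution; all constants are illustrative assumptions, not measurements from the paper.

```python
def split_execution_time(n_vars, n_samples,
                         t_translate_per_var=1e-3,
                         t_transfer=1e-2,
                         t_qpu_per_sample=20e-6):
    t_translate = n_vars * t_translate_per_var   # classical embedding step
    t_qpu = n_samples * t_qpu_per_sample         # quantum execution
    return t_translate + t_transfer + t_qpu

# With these assumed numbers the classical translation dominates, echoing
# the paper's conclusion that the bottleneck sits at the quantum-classical
# interface rather than in QPU behavior.
print(split_execution_time(n_vars=2000, n_samples=1000))
```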
NBSymple, a double parallel, symplectic N-body code running on Graphic Processing Units
We present and discuss the characteristics and performance, in terms of both computational speed and precision, of a numerical code which integrates the equations of motion of N 'particles' interacting via Newtonian gravitation and moving in an external, smooth galactic field. The force evaluation for every particle is done by direct summation of the contributions of all the other particles in the system, avoiding truncation error. The time integration is done with second-order and sixth-order symplectic schemes. The code, NBSymple, has been parallelized twice: the all-pair force evaluation is made as fast as possible by means of the Compute Unified Device Architecture (CUDA) on high-performance NVIDIA Tesla C1060 graphics processing units, while the O(N) computations are distributed over various CPUs by means of the OpenMP Application Program Interface. The code works in either single-precision or double-precision floating-point arithmetic. The use of single precision makes the best use of the GPU's performance but, of course, limits the precision of the simulation in some critical situations. We find a good compromise in using a software reconstruction of double precision for those variables that are most critical for the overall precision of the code. The code is available on the web site astrowww.phys.uniroma1.it/dolcetta/nbsymple.html

Comment: Paper consists of 29 pages, including 9 figures. Submitted to New Astronomy
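For readers unfamiliar with the two ingredients named above, the sketch below implements O(N^2) direct-summation accelerations and a second-order kick-drift-kick symplectic step in Python/NumPy; the softening length, units, and particle setup are illustrative assumptions (NBSymple's actual kernels are CUDA and OpenMP).

```python
import numpy as np

def accelerations(pos, mass, eps=1e-3):
    """Direct-summation gravitational accelerations (G = 1)."""
    d = pos[None, :, :] - pos[:, None, :]            # pairwise separations
    r2 = (d ** 2).sum(-1) + eps ** 2                 # softened distances
    np.fill_diagonal(r2, np.inf)                     # no self-interaction
    return (mass[None, :, None] * d / r2[..., None] ** 1.5).sum(axis=1)

def leapfrog_step(pos, vel, mass, dt):
    """One second-order symplectic (kick-drift-kick) step."""
    vel = vel + 0.5 * dt * accelerations(pos, mass)  # half kick
    pos = pos + dt * vel                             # drift
    vel = vel + 0.5 * dt * accelerations(pos, mass)  # half kick
    return pos, vel

rng = np.random.default_rng(0)
pos, vel = rng.normal(size=(64, 3)), np.zeros((64, 3))
mass = np.full(64, 1.0 / 64)
for _ in range(100):
    pos, vel = leapfrog_step(pos, vel, mass, dt=1e-3)
```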
Exploiting graphic processing units parallelism to improve intelligent data acquisition system performance in JET's correlation reflectometer
The performance of intelligent data acquisition systems relies heavily on their processing capabilities and local bus bandwidth, especially in applications with high sample rates or a high number of channels. This is the case of the self-adaptive sampling rate data acquisition system installed as a pilot experiment in the KG8B correlation reflectometer at JET. The system, which is based on the ITMS platform, continuously adapts the sample rate during the acquisition depending on the signal bandwidth. To do so, it must transfer the acquired data to a memory buffer in the host processor and run computationally heavy algorithms on each data block. The processing capabilities of the host CPU and the bandwidth of the PXI bus limit the maximum sample rate that can be achieved, thereby limiting the maximum bandwidth of the phenomena that can be studied. Graphics processing units (GPUs) are becoming an alternative for speeding up the compute-intensive kernels of scientific, imaging and simulation applications. However, integrating this technology into data acquisition systems is not a straightforward step, not to mention exploiting their parallelism efficiently. This paper discusses the use of GPUs with new high-speed data bus interfaces to improve the performance of the self-adaptive sampling rate data acquisition system installed at JET. Integration issues are discussed and performance evaluations are presented.
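The abstract does not describe the adaptation rule itself. As a rough sketch of the idea, the code below estimates the occupied bandwidth of the latest data block from its spectrum and picks the next sample rate with Nyquist headroom; the 99%-energy criterion, headroom factor, and rate limits are assumptions, not the ITMS implementation.

```python
import numpy as np

def occupied_bandwidth(block, fs, energy_fraction=0.99):
    """Smallest frequency containing `energy_fraction` of the block's energy."""
    spectrum = np.abs(np.fft.rfft(block)) ** 2
    freqs = np.fft.rfftfreq(len(block), d=1.0 / fs)
    cumulative = np.cumsum(spectrum) / spectrum.sum()
    return freqs[np.searchsorted(cumulative, energy_fraction)]

def next_sample_rate(block, fs, headroom=2.5, fs_min=1e4, fs_max=1e8):
    """Adapt the sample rate to the estimated bandwidth, within limits."""
    bw = occupied_bandwidth(block, fs)
    return float(np.clip(headroom * bw, fs_min, fs_max))

fs = 1e6
t = np.arange(4096) / fs
block = np.sin(2 * np.pi * 5e4 * t)          # 50 kHz test tone
print(next_sample_rate(block, fs))           # ~125 kHz for this block
```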
High-Performance Computing with Quantum Processing Units
The prospects of quantum computing have driven efforts to realize fully
functional quantum processing units (QPUs). Recent success in developing
proof-of-principle QPUs has prompted the question of how to integrate these
emerging processors into modern high-performance computing (HPC) systems. We
examine how QPUs can be integrated into current and future HPC system
architectures by accounting for functional and physical design requirements. We
identify two integration pathways that are differentiated by infrastructure
constraints on the QPU and the use cases expected for the HPC system. This
includes a tight integration that assumes infrastructure bottlenecks can be
overcome as well as a loose integration that assumes they cannot. We find that
the performance of both approaches is likely to depend on the quantum
interconnect that serves to entangle multiple QPUs. We also identify several
challenges in assessing QPU performance for HPC, and we consider new metrics
that capture the interplay between system architecture and the quantum
parallelism underlying computational performance.

Comment: 8 pages, 5 figures
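A toy latency model contrasting the two pathways: tight integration amortizes a small local round trip per circuit batch, while loose integration pays a network round trip each time. All figures are illustrative assumptions, not numbers from the paper.

```python
# Per-job time when each circuit batch must make a host->QPU round trip.
def time_per_job(n_circuit_batches, t_round_trip, t_qpu_batch=1e-3):
    return n_circuit_batches * (t_round_trip + t_qpu_batch)

batches = 10_000                                   # e.g. a variational loop
print(time_per_job(batches, t_round_trip=5e-6))    # tight: ~10 s
print(time_per_job(batches, t_round_trip=50e-3))   # loose: ~510 s
```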
The Power of Spreadsheet Computations
We investigate the expressive power of spreadsheets. We consider spreadsheets which contain only formulas, and assume that they are small templates which can be filled to a larger area of the grid to process input data of variable size. We can therefore compare them to well-known machine models of computation. We consider a number of classes of spreadsheets defined by restrictions on their reference structure. Two of the classes correspond closely to parallel complexity classes: we prove a direct correspondence between the dimensions of the spreadsheet and the amount of hardware and time used by a parallel computer to compute the same function. As a tool, we produce spreadsheets which are universal in these classes, i.e. they can emulate any other spreadsheet from them. In other cases we implement in the spreadsheets in question instances of a polynomial-time complete problem, which indicates that the spreadsheets are unlikely to have efficient parallel evaluation algorithms. Thus we get a picture of how the computational power of spreadsheets depends on their dimensions and structure of references.

Comment: 36 pages. Electronic appendices in Excel's xlsx format available from the author's Web page
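As a small illustration of the template idea, the sketch below fills a one-row "formula" down a grid so that each cell in row i combines cells of row i-1, yielding a parallel prefix sum whose depth matches the grid height; the prefix-sum template is our own example, not one from the paper.

```python
def fill_template(input_row, n_rows):
    """Fill a per-row formula template down a grid, spreadsheet-style."""
    grid = [list(input_row)]
    for i in range(1, n_rows):
        prev = grid[-1]
        # Template formula for cell (i, j): combine (i-1, j) with the cell
        # 2**(i-1) columns to the left, as in a parallel prefix sum.
        stride = 2 ** (i - 1)
        row = [prev[j] + (prev[j - stride] if j >= stride else 0)
               for j in range(len(prev))]
        grid.append(row)
    return grid

data = [1, 2, 3, 4, 5, 6, 7, 8]
result = fill_template(data, n_rows=4)   # log2(8) + 1 rows
print(result[-1])                        # prefix sums: [1, 3, 6, 10, ...]
```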