Adaptive guidance and control for future remote sensing systems
A unique approach to onboard processing was developed that is capable of acquiring high-quality image data for users in near real time. The approach comprises two steps: the development of an onboard cloud detection system and the development of a landmark tracker. The results of these two developments are outlined, and the requirements of an operational guidance and control system capable of providing continuous estimation of the sensor boresight position are summarized.
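The abstract does not specify the detection algorithm. As a rough illustration of the onboard cloud-screening step, the sketch below flags bright pixels against a reflectance threshold; the threshold value, the 20% scene cutoff, and the function names are illustrative assumptions, not details from the paper.

```python
import numpy as np

def cloud_mask(image, reflectance_threshold=0.4):
    """Flag pixels brighter than a reflectance threshold as cloud.

    `image` is a 2-D array of top-of-atmosphere reflectances in [0, 1];
    the threshold value is illustrative, not from the paper.
    """
    return image > reflectance_threshold

def cloud_fraction(image, reflectance_threshold=0.4):
    """Fraction of cloudy pixels, used to decide whether a scene is usable."""
    return cloud_mask(image, reflectance_threshold).mean()

# Example: accept a scene for landmark tracking only if it is <20% cloudy.
scene = np.random.rand(512, 512)
print("accept" if cloud_fraction(scene) < 0.2 else "reject: too cloudy")
```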
Making FPGAs Accessible to Scientists and Engineers as Domain Expert Software Programmers with LabVIEW
In this paper we present a graphical programming framework, LabVIEW, and its associated language and libraries, as well as programming techniques and patterns that we have found useful in making FPGAs accessible to scientists and engineers as domain expert software programmers.

Comment: Presented at First International Workshop on FPGAs for Software Programmers (FSP 2014) (arXiv:1408.4423)
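LabVIEW code itself is graphical, so it cannot be reproduced in text. As a loose analogy only, the sketch below renders one pattern commonly used in LabVIEW FPGA work, a producer/consumer loop pair decoupled by a FIFO, in Python; the queue size and sentinel convention are our own assumptions, not the paper's.

```python
import queue
import threading

# Hypothetical acquisition/processing split: the FPGA-side loop produces
# samples, the host-side loop consumes them, decoupled by a FIFO.
fifo = queue.Queue(maxsize=1024)

def producer(n_samples):
    for i in range(n_samples):
        fifo.put(i * 0.001)       # stand-in for an FPGA read
    fifo.put(None)                # sentinel: acquisition finished

def consumer():
    while (sample := fifo.get()) is not None:
        _ = sample * 2.0          # stand-in for host-side processing

t1 = threading.Thread(target=producer, args=(10_000,))
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
```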
Vectorized OpenCL implementation of numerical integration for higher order finite elements
In our work we analyze computational aspects of the problem of numerical integration in finite element calculations and consider an OpenCL implementation of related algorithms for processors with wide vector registers. As a platform for testing the implementation we choose the PowerXCell processor, an example of the Cell Broadband Engine (CellBE) architecture. Although the processor is considered old by today's standards (its design dates back to 2001), we investigate its performance due to two features that it shares with the recent Xeon Phi family of coprocessors: wide vector units and a relatively slow connection between the computing cores and main global memory. The analysis of parallelization options can also be used for designing numerical integration algorithms for other processors with vector registers, such as contemporary x86 microprocessors.

Comment: Published online in Computers and Mathematics with Applications: http://www.sciencedirect.com/science/article/pii/S089812211300521
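As a rough illustration of the vectorization strategy the abstract describes, the sketch below (in Python/NumPy rather than OpenCL) evaluates a Gaussian quadrature rule across many elements at once, so the per-element work maps naturally onto wide vector units; the 2-point rule and 1-D elements are simplifying assumptions, not the paper's kernels.

```python
import numpy as np

# 2-point Gauss rule on the reference interval [-1, 1].
gauss_pts = np.array([-1 / np.sqrt(3), 1 / np.sqrt(3)])
gauss_wts = np.array([1.0, 1.0])

def integrate_elements(x_left, x_right, f):
    """Integrate f over every element [x_left[e], x_right[e]] in one shot.

    Shapes: x_left, x_right -> (n_elements,); result -> (n_elements,).
    """
    jac = 0.5 * (x_right - x_left)                 # per-element Jacobian
    # Map reference points into every element: shape (n_elements, n_points).
    x = 0.5 * (x_left[:, None] + x_right[:, None]) + jac[:, None] * gauss_pts
    return jac * (f(x) @ gauss_wts)                # weighted sum per element

# Example: integrate sin(x) over 4 equal elements of [0, pi].
edges = np.linspace(0.0, np.pi, 5)
print(integrate_elements(edges[:-1], edges[1:], np.sin).sum())  # ~2.0
```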
The electronic interface for quantum processors
Quantum computers can potentially provide an unprecedented speed-up with respect to traditional computers. However, a significant increase in the number of quantum bits (qubits) and in their performance is required to demonstrate such quantum supremacy. While scaling up the underlying quantum processor is extremely challenging, building the electronics required to interface such a large-scale processor is just as relevant and arduous. This paper discusses the challenges in designing a scalable electronic interface for quantum processors. To that end, we discuss the requirements dictated by different qubit technologies and present existing implementations of the electronic interface. The limitations in scaling up such state-of-the-art implementations are analyzed, and possible solutions to overcome those hurdles are reviewed. The benefits offered by operating the electronic interface at cryogenic temperatures in close proximity to the low-temperature qubits are discussed. Although several significant challenges must still be faced by researchers in the field of cryogenic control for quantum processors, a cryogenic electronic interface appears to be the viable solution to enable large-scale quantum computers able to address world-changing computational problems.
Parallel Programming Models for Heterogeneous Many-Cores: A Survey
Heterogeneous many-cores are now an integral part of modern computing systems ranging from embedded systems to supercomputers. While heterogeneous many-core design offers the potential for energy-efficient, high-performance computing, that potential can only be unlocked if the application programs are suitably parallel and can be made to match the underlying heterogeneous platform. In this article, we provide a comprehensive survey of parallel programming models for heterogeneous many-core architectures and review the compiler techniques used to improve programmability and portability. We examine various software optimization techniques for minimizing the communication overhead between heterogeneous computing devices. We provide a road map for a wide variety of different research areas. We conclude with a discussion of open issues in the area and potential research directions. This article provides both an accessible introduction to the fast-moving area of heterogeneous programming and a detailed bibliography of its main achievements.

Comment: Accepted to be published in CCF Transactions on High Performance Computing
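As one concrete instance of the communication-overhead problem the survey examines, the toy model below shows why batching host-device transfers pays off; the latency and bandwidth figures are illustrative assumptions, not numbers from the article.

```python
# Toy cost model: per-transfer latency dominates when data is shipped to an
# accelerator in many small pieces, so batching transfers is a standard
# overhead-minimization technique.

LATENCY_S = 10e-6          # assumed fixed cost per host<->device transfer
BANDWIDTH_B_PER_S = 10e9   # assumed link bandwidth

def transfer_time(n_transfers, total_bytes):
    return n_transfers * LATENCY_S + total_bytes / BANDWIDTH_B_PER_S

total = 64 * 1024 * 1024   # 64 MiB of input data
print(transfer_time(n_transfers=1, total_bytes=total))      # one batched copy
print(transfer_time(n_transfers=4096, total_bytes=total))   # many small copies
```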
Performance Models for Split-execution Computing Systems
Split-execution computing leverages the capabilities of multiple
computational models to solve problems, but splitting program execution across
different computational models incurs costs associated with the translation
between domains. We analyze the performance of a split-execution computing
system developed from conventional and quantum processing units (QPUs) by using
behavioral models that track resource usage. We focus on asymmetric processing
models built using conventional CPUs and a family of special-purpose QPUs that
employ quantum computing principles. Our performance models account for the
translation of a classical optimization problem into the physical
representation required by the quantum processor while also accounting for
hardware limitations and conventional processor speed and memory. We conclude
that the bottleneck in this split-execution computing system lies at the
quantum-classical interface and that the primary time cost is independent of
quantum processor behavior.

Comment: Presented at the 18th Workshop on Advances in Parallel and Distributed Computational Models (APDCM 2016) on 23 May 2016; 10 pages
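A minimal sketch of a behavioral model in the spirit of the abstract, splitting total time into classical translation, interface transfer, and QPU execution; all constants are illustrative assumptions, not measurements from the paper.

```python
def split_execution_time(n_vars, n_samples,
                         t_translate_per_var=1e-3,
                         t_transfer=1e-2,
                         t_qpu_per_sample=20e-6):
    t_translate = n_vars * t_translate_per_var   # classical embedding step
    t_qpu = n_samples * t_qpu_per_sample         # quantum execution
    return t_translate + t_transfer + t_qpu

# With these assumed numbers the classical translation dominates, echoing
# the paper's conclusion that the bottleneck sits at the quantum-classical
# interface rather than in QPU behavior.
print(split_execution_time(n_vars=2000, n_samples=1000))
```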
NBSymple, a double parallel, symplectic N-body code running on Graphic Processing Units
We present and discuss the characteristics and performance, in terms of both computational speed and precision, of a numerical code which integrates the equations of motion of N 'particles' interacting via Newtonian gravitation and moving in an external, smooth galactic field. The force evaluation for every particle is done by direct summation of the contributions of all the other particles in the system, avoiding truncation error. The time integration is done with second-order and sixth-order symplectic schemes. The code, NBSymple, has been parallelized twice: the all-pair force evaluation is made as fast as possible by means of the Compute Unified Device Architecture (CUDA) on high-performance NVIDIA Tesla C1060 graphics processing units, while the O(N) computations are distributed over various CPUs by means of the OpenMP Application Program Interface. The code works in either single-precision or double-precision floating-point arithmetic. The use of single precision makes the best use of the GPU's performance but, of course, limits the precision of the simulation in some critical situations. We find a good compromise in using a software reconstruction of double precision for those variables that are most critical for the overall precision of the code. The code is available on the web site astrowww.phys.uniroma1.it/dolcetta/nbsymple.html

Comment: Paper consists of 29 pages, including 9 figures. Submitted to New Astronomy
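For readers unfamiliar with the two ingredients named above, the sketch below implements O(N^2) direct-summation accelerations and a second-order kick-drift-kick symplectic step in Python/NumPy; the softening length, units, and particle setup are illustrative assumptions (NBSymple's actual kernels are CUDA and OpenMP).

```python
import numpy as np

def accelerations(pos, mass, eps=1e-3):
    """Direct-summation gravitational accelerations (G = 1)."""
    d = pos[None, :, :] - pos[:, None, :]            # pairwise separations
    r2 = (d ** 2).sum(-1) + eps ** 2                 # softened distances
    np.fill_diagonal(r2, np.inf)                     # no self-interaction
    return (mass[None, :, None] * d / r2[..., None] ** 1.5).sum(axis=1)

def leapfrog_step(pos, vel, mass, dt):
    """One second-order symplectic (kick-drift-kick) step."""
    vel = vel + 0.5 * dt * accelerations(pos, mass)  # half kick
    pos = pos + dt * vel                             # drift
    vel = vel + 0.5 * dt * accelerations(pos, mass)  # half kick
    return pos, vel

rng = np.random.default_rng(0)
pos, vel = rng.normal(size=(64, 3)), np.zeros((64, 3))
mass = np.full(64, 1.0 / 64)
for _ in range(100):
    pos, vel = leapfrog_step(pos, vel, mass, dt=1e-3)
```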
Exploiting graphic processing units parallelism to improve intelligent data acquisition system performance in JET's correlation reflectometer
The performance of intelligent data acquisition systems relies heavily on their processing capabilities and local bus bandwidth, especially in applications with high sample rates or a high number of channels. This is the case of the self-adaptive sampling rate data acquisition system installed as a pilot experiment in the KG8B correlation reflectometer at JET. The system, which is based on the ITMS platform, continuously adapts the sample rate during the acquisition depending on the signal bandwidth. To do so, it must transfer the acquired data to a memory buffer in the host processor and run computationally heavy algorithms on each data block. The processing capabilities of the host CPU and the bandwidth of the PXI bus limit the maximum sample rate that can be achieved, thereby limiting the maximum bandwidth of the phenomena that can be studied. Graphics processing units (GPUs) are becoming an alternative for speeding up the compute-intensive kernels of scientific, imaging and simulation applications. However, integrating this technology into data acquisition systems is not a straightforward step, not to mention exploiting their parallelism efficiently. This paper discusses the use of GPUs with new high-speed data bus interfaces to improve the performance of the self-adaptive sampling rate data acquisition system installed at JET. Integration issues are discussed and performance evaluations are presented.
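The abstract does not describe the adaptation rule itself. As a rough sketch of the idea, the code below estimates the occupied bandwidth of the latest data block from its spectrum and picks the next sample rate with Nyquist headroom; the 99%-energy criterion, headroom factor, and rate limits are assumptions, not the ITMS implementation.

```python
import numpy as np

def occupied_bandwidth(block, fs, energy_fraction=0.99):
    """Smallest frequency containing `energy_fraction` of the block's energy."""
    spectrum = np.abs(np.fft.rfft(block)) ** 2
    freqs = np.fft.rfftfreq(len(block), d=1.0 / fs)
    cumulative = np.cumsum(spectrum) / spectrum.sum()
    return freqs[np.searchsorted(cumulative, energy_fraction)]

def next_sample_rate(block, fs, headroom=2.5, fs_min=1e4, fs_max=1e8):
    """Adapt the sample rate to the estimated bandwidth, within limits."""
    bw = occupied_bandwidth(block, fs)
    return float(np.clip(headroom * bw, fs_min, fs_max))

fs = 1e6
t = np.arange(4096) / fs
block = np.sin(2 * np.pi * 5e4 * t)          # 50 kHz test tone
print(next_sample_rate(block, fs))           # ~125 kHz for this block
```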
High-Performance Computing with Quantum Processing Units
The prospects of quantum computing have driven efforts to realize fully
functional quantum processing units (QPUs). Recent success in developing
proof-of-principle QPUs has prompted the question of how to integrate these
emerging processors into modern high-performance computing (HPC) systems. We
examine how QPUs can be integrated into current and future HPC system
architectures by accounting for functional and physical design requirements. We
identify two integration pathways that are differentiated by infrastructure
constraints on the QPU and the use cases expected for the HPC system. This
includes a tight integration that assumes infrastructure bottlenecks can be
overcome as well as a loose integration that assumes they cannot. We find that
the performance of both approaches is likely to depend on the quantum
interconnect that serves to entangle multiple QPUs. We also identify several
challenges in assessing QPU performance for HPC, and we consider new metrics
that capture the interplay between system architecture and the quantum
parallelism underlying computational performance.

Comment: 8 pages, 5 figures
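A toy latency model contrasting the two pathways: tight integration amortizes a small local round trip per circuit batch, while loose integration pays a network round trip each time. All figures are illustrative assumptions, not numbers from the paper.

```python
# Per-job time when each circuit batch must make a host->QPU round trip.
def time_per_job(n_circuit_batches, t_round_trip, t_qpu_batch=1e-3):
    return n_circuit_batches * (t_round_trip + t_qpu_batch)

batches = 10_000                                   # e.g. a variational loop
print(time_per_job(batches, t_round_trip=5e-6))    # tight: ~10 s
print(time_per_job(batches, t_round_trip=50e-3))   # loose: ~510 s
```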
The Power of Spreadsheet Computations
We investigate the expressive power of spreadsheets. We consider spreadsheets which contain only formulas, and assume that they are small templates which can be filled to a larger area of the grid to process input data of variable size. We can therefore compare them to well-known machine models of computation. We consider a number of classes of spreadsheets defined by restrictions on their reference structure. Two of the classes correspond closely to parallel complexity classes: we prove a direct correspondence between the dimensions of the spreadsheet and the amount of hardware and time used by a parallel computer to compute the same function. As a tool, we produce spreadsheets which are universal in these classes, i.e. they can emulate any other spreadsheet from them. In other cases we implement in the spreadsheets in question instances of a polynomial-time complete problem, which indicates that the spreadsheets are unlikely to have efficient parallel evaluation algorithms. Thus we get a picture of how the computational power of spreadsheets depends on their dimensions and structure of references.

Comment: 36 pages. Electronic appendices in Excel's xlsx format available from the author's Web page
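As a small illustration of the template idea, the sketch below fills a one-row "formula" down a grid so that each cell in row i combines cells of row i-1, yielding a parallel prefix sum whose depth matches the grid height; the prefix-sum template is our own example, not one from the paper.

```python
def fill_template(input_row, n_rows):
    """Fill a per-row formula template down a grid, spreadsheet-style."""
    grid = [list(input_row)]
    for i in range(1, n_rows):
        prev = grid[-1]
        # Template formula for cell (i, j): combine (i-1, j) with the cell
        # 2**(i-1) columns to the left, as in a parallel prefix sum.
        stride = 2 ** (i - 1)
        row = [prev[j] + (prev[j - stride] if j >= stride else 0)
               for j in range(len(prev))]
        grid.append(row)
    return grid

data = [1, 2, 3, 4, 5, 6, 7, 8]
result = fill_template(data, n_rows=4)   # log2(8) + 1 rows
print(result[-1])                        # prefix sums: [1, 3, 6, 10, ...]
```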