39,218 research outputs found
Reducing branch delay to zero in pipelined processors
A mechanism to reduce the cost of branches in pipelined processors is described and evaluated. It is based on the use of multiple prefetch, early computation of the target address, delayed branch, and parallel execution of branches. The implementation of this mechanism using a branch target instruction memory is described. An analytical model of the performance of this implementation makes it possible to measure the efficiency of the mechanism with a very low computational cost. The model is used to determine the size of cache lines that maximizes the processor performance, to compare the performance of the mechanism with that of other schemes, and to analyze the performance of the mechanism with two alternative cache organizations.Peer ReviewedPostprint (published version
Frequency-modulated continuous-wave LiDAR compressive depth-mapping
We present an inexpensive architecture for converting a frequency-modulated
continuous-wave LiDAR system into a compressive-sensing based depth-mapping
camera. Instead of raster scanning to obtain depth-maps, compressive sensing is
used to significantly reduce the number of measurements. Ideally, our approach
requires two difference detectors. % but can operate with only one at the cost
of doubling the number of measurments. Due to the large flux entering the
detectors, the signal amplification from heterodyne detection, and the effects
of background subtraction from compressive sensing, the system can obtain
higher signal-to-noise ratios over detector-array based schemes while scanning
a scene faster than is possible through raster-scanning. %Moreover, we show how
a single total-variation minimization and two fast least-squares minimizations,
instead of a single complex nonlinear minimization, can efficiently recover
high-resolution depth-maps with minimal computational overhead. Moreover, by
efficiently storing only data points from measurements of an
pixel scene, we can easily extract depths by solving only two linear equations
with efficient convex-optimization methods
Accelerated Modeling of Near and Far-Field Diffraction for Coronagraphic Optical Systems
Accurately predicting the performance of coronagraphs and tolerancing optical
surfaces for high-contrast imaging requires a detailed accounting of
diffraction effects. Unlike simple Fraunhofer diffraction modeling, near and
far-field diffraction effects, such as the Talbot effect, are captured by
plane-to-plane propagation using Fresnel and angular spectrum propagation. This
approach requires a sequence of computationally intensive Fourier transforms
and quadratic phase functions, which limit the design and aberration
sensitivity parameter space which can be explored at high-fidelity in the
course of coronagraph design. This study presents the results of optimizing the
multi-surface propagation module of the open source Physical Optics Propagation
in PYthon (POPPY) package. This optimization was performed by implementing and
benchmarking Fourier transforms and array operations on graphics processing
units, as well as optimizing multithreaded numerical calculations using the
NumExpr python library where appropriate, to speed the end-to-end simulation of
observatory and coronagraph optical systems. Using realistic systems, this
study demonstrates a greater than five-fold decrease in wall-clock runtime over
POPPY's previous implementation and describes opportunities for further
improvements in diffraction modeling performance.Comment: Presented at SPIE ASTI 2018, Austin Texas. 11 pages, 6 figure
Formal Representation of the SS-DB Benchmark and Experimental Evaluation in EXTASCID
Evaluating the performance of scientific data processing systems is a
difficult task considering the plethora of application-specific solutions
available in this landscape and the lack of a generally-accepted benchmark. The
dual structure of scientific data coupled with the complex nature of processing
complicate the evaluation procedure further. SS-DB is the first attempt to
define a general benchmark for complex scientific processing over raw and
derived data. It fails to draw sufficient attention though because of the
ambiguous plain language specification and the extraordinary SciDB results. In
this paper, we remedy the shortcomings of the original SS-DB specification by
providing a formal representation in terms of ArrayQL algebra operators and
ArrayQL/SciQL constructs. These are the first formal representations of the
SS-DB benchmark. Starting from the formal representation, we give a reference
implementation and present benchmark results in EXTASCID, a novel system for
scientific data processing. EXTASCID is complete in providing native support
both for array and relational data and extensible in executing any user code
inside the system by the means of a configurable metaoperator. These features
result in an order of magnitude improvement over SciDB at data loading,
extracting derived data, and operations over derived data.Comment: 32 pages, 3 figure
Landau Gauge Fixing on GPUs and String Tension
We explore the performance of CUDA in performing Landau gauge fixing in
Lattice QCD, using the steepest descent method with Fourier acceleration. The
code performance was tested in a Tesla C2070, Fermi architecture. We also
present a study of the string tension at finite temperature in the confined
phase. The string tension is extracted from the color averaged free energy and
from the color singlet using Landau gauge fixing.Comment: 7 pages, 4 figures, 1 table. Contribution to the International
Meeting "Excited QCD", Peniche, Portugal, 06 - 12 May 201
A committee machine gas identification system based on dynamically reconfigurable FPGA
This paper proposes a gas identification system based on the committee machine (CM) classifier, which combines various gas identification algorithms, to obtain a unified decision with improved accuracy. The CM combines five different classifiers: K nearest neighbors (KNNs), multilayer perceptron (MLP), radial basis function (RBF), Gaussian mixture model (GMM), and probabilistic principal component analysis (PPCA). Experiments on real sensors' data proved the effectiveness of our system with an improved accuracy over individual classifiers. Due to the computationally intensive nature of CM, its implementation requires significant hardware resources. In order to overcome this problem, we propose a novel time multiplexing hardware implementation using a dynamically reconfigurable field programmable gate array (FPGA) platform. The processing is divided into three stages: sampling and preprocessing, pattern recognition, and decision stage. Dynamically reconfigurable FPGA technique is used to implement the system in a sequential manner, thus using limited hardware resources of the FPGA chip. The system is successfully tested for combustible gas identification application using our in-house tin-oxide gas sensors
- …