31,572 research outputs found
Research in the design of high-performance reconfigurable systems
An initial design for the Bit Processor (BP) referred to in prior reports as the Processing Element or PE has been completed. Eight BP's, together with their supporting random-access memory, a 64 k x 9 ROM to perform addition, routing logic, and some additional logic, constitute the components of a single stage. An initial stage design is given. Stages may be combined to perform high-speed fixed or floating point arithmetic. Stages can be configured into a range of arithmetic modules that includes bit-serial one or two-dimensional arrays; one or two dimensional arrays fixed or floating point processors; and specialized uniprocessors, such as long-word arithmetic units. One to eight BP's represent a likely initial chip level. The Stage would then correspond to a first-level pluggable module. As both this project and VLSI CAD/CAM progress, however, it is expected that the chip level would migrate upward to the stage and, perhaps, ultimately the box level. The BP RAM, consisting of two banks, holds only operands and indices. Programs are at the box (high-level function) and system level. At the system level initial effort has been concentrated on specifying the tools needed to evaluate design alternatives
Configurable 3D-integrated focal-plane sensor-processor array architecture
A mixed-signal Cellular Visual Microprocessor architecture with digital processors is
described. An ASIC implementation is also demonstrated. The architecture is composed of a
regular sensor readout circuit array, prepared for 3D face-to-face type integration, and one or
several cascaded array of mainly identical (SIMD) processing elements. The individual array
elements derived from the same general HDL description and could be of different in size, aspect
ratio, and computing resources
Recommended from our members
Two-dimensional DCT/IDCT architecture
A fully parallel architecture for the computation of a two-dimensional (2-D) discrete cosine transform (DCT), based on row-column decomposition is presented. It uses the same one dimensional (1-D) DCT unit for the row and column computations and (N2+N) registers to perform the transposition. It possesses features of regularity and modularity, and is thus well suited for VLSI implementation. It can be used for the computation of either the forward or the inverse 2-D DCT. Each 1-D DCT unit uses N fully parallel vector inner product (VIP) units. The design of the VIP units is based on a systematic design methodology using radix-2â arithmetic, which allows partitioning of the elements of each vector into small groups. Array multipliers without the final adder are used to produce the different partial product terms. This allows a more efficient use of 4:2 compressors for the accumulation of the products in the intermediate stages and reduces the number of accumulators from N to one. Using this procedure, the 2-D DCT architecture requires less than N2 multipliers (in terms of area occupied) and only 2N adders. It can compute a N x N-point DCT at a rate of one complete transform per N cycles after an appropriate initial delay
Developing numerical libraries in Java
The rapid and widespread adoption of Java has created a demand for reliable
and reusable mathematical software components to support the growing number of
compute-intensive applications now under development, particularly in science
and engineering. In this paper we address practical issues of the Java language
and environment which have an effect on numerical library design and
development. Benchmarks which illustrate the current levels of performance of
key numerical kernels on a variety of Java platforms are presented. Finally, a
strategy for the development of a fundamental numerical toolkit for Java is
proposed and its current status is described.Comment: 11 pages. Revised version of paper presented to the 1998 ACM
Conference on Java for High Performance Network Computing. To appear in
Concurrency: Practice and Experienc
Accelerated Modeling of Near and Far-Field Diffraction for Coronagraphic Optical Systems
Accurately predicting the performance of coronagraphs and tolerancing optical
surfaces for high-contrast imaging requires a detailed accounting of
diffraction effects. Unlike simple Fraunhofer diffraction modeling, near and
far-field diffraction effects, such as the Talbot effect, are captured by
plane-to-plane propagation using Fresnel and angular spectrum propagation. This
approach requires a sequence of computationally intensive Fourier transforms
and quadratic phase functions, which limit the design and aberration
sensitivity parameter space which can be explored at high-fidelity in the
course of coronagraph design. This study presents the results of optimizing the
multi-surface propagation module of the open source Physical Optics Propagation
in PYthon (POPPY) package. This optimization was performed by implementing and
benchmarking Fourier transforms and array operations on graphics processing
units, as well as optimizing multithreaded numerical calculations using the
NumExpr python library where appropriate, to speed the end-to-end simulation of
observatory and coronagraph optical systems. Using realistic systems, this
study demonstrates a greater than five-fold decrease in wall-clock runtime over
POPPY's previous implementation and describes opportunities for further
improvements in diffraction modeling performance.Comment: Presented at SPIE ASTI 2018, Austin Texas. 11 pages, 6 figure
- âŠ