3,814 research outputs found
CMOS VLSI correlator design for radio-astronomical signal processing : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Engineering at Massey University, Auckland, New Zealand
Multi-element radio telescopes employ methods of indirect imaging to capture the image of the sky. These methods are in contrast to direct imaging methods whereby the image is constructed from sensor measurements directly and involve extensive
signal processing on antenna signals. The Square Kilometre Array, or the SKA, is a future radio telescope of this type that, once built, will become the largest telescope in the world. The unprecedented scale of the SKA requires novel solutions to be
developed for its signal processing pipeline one of the most resource-consuming parts of which is the correlator. The SKA uses the FX correlator construction that consists of two parts: the F part that translates antenna signals into frequency domain and the X part that cross-correlates these signals between each other. This research focuses on the integrated circuit design and VLSI implementation issues of the X part of a very large FX correlator in 28 nm and 130 nm CMOS. The correlator’s main processing operation is the complex multiply-accumulation (CMAC) for which custom 28 nm CMAC designs are presented and evaluated. Performance of various memories inside the correlator also affects overall efficiency, and input-buffered and output-buffered approaches are considered with the goal of improving upon it. For output-buffered designs, custom memory control circuits have been designed and prototyped in 130 nm that improve upon eDRAM by taking advantage of sequential access patterns. For the input-buffered architecture, a new scheme is proposed that decreases the usage of the input-buffer memory by a third by making use of multiple accumulators in every CMAC. Because cross-correlation is a very data-intensive process, high-performance SerDes I/O is essential to any practical ASIC implementation. On the I/O design, the 28 nm full-rate transmitter delivering 15 Gbps per lane is presented. This design consists of the scrambler, the serialiser, the digital VCO with analog fine-tuning and the SST driver including features of a 4-tap FFE, impedance tuning and amplitude tuning
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
The rising popularity of intelligent mobile devices and the daunting
computational cost of deep learning-based models call for efficient and
accurate on-device inference schemes. We propose a quantization scheme that
allows inference to be carried out using integer-only arithmetic, which can be
implemented more efficiently than floating point inference on commonly
available integer-only hardware. We also co-design a training procedure to
preserve end-to-end model accuracy post quantization. As a result, the proposed
quantization scheme improves the tradeoff between accuracy and on-device
latency. The improvements are significant even on MobileNets, a model family
known for run-time efficiency, and are demonstrated in ImageNet classification
and COCO detection on popular CPUs.Comment: 14 pages, 12 figure
- …