Stream Fusion, to Completeness

Author: ACM
Biboudis A.
Jones S. Peyton
Kiselyov O.
Pouzet M.
Prokopec A.
Taha W.
Waters R. C.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/12/2016

Stream processing is mainstream (again): Widely-used stream libraries are now available for virtually all modern OO and functional languages, from Java to C# to Scala to OCaml to Haskell. Yet expressivity and performance are still lacking. For instance, the popular, well-optimized Java 8 streams do not support the zip operator and are still an order of magnitude slower than hand-written loops. We present the first approach that represents the full generality of stream processing and eliminates overheads, via the use of staging. It is based on an unusually rich semantic model of stream interaction. We support any combination of zipping, nesting (or flat-mapping), sub-ranging, filtering, and mapping of finite or infinite streams. Our model captures idiosyncrasies that a programmer uses in optimizing stream pipelines, such as rate differences and the choice of a "for" vs. a "while" loop. Our approach delivers hand-written-like code, but automatically. It explicitly avoids reliance on black-box optimizers and sufficiently-smart compilers, offering the highest, guaranteed and portable performance. Our approach relies on high-level concepts that are then readily mapped into an implementation. Accordingly, we have two distinct implementations: an OCaml stream library, staged via MetaOCaml, and a Scala library for the JVM, staged via LMS. In both cases, we derive libraries that are richer and simultaneously many tens of times faster than past work. We greatly exceed in performance the standard stream libraries available in Java, Scala and OCaml, including the well-optimized Java 8 streams.
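To make the phrase "hand-written-like code" concrete, here is a minimal Python sketch that contrasts a combinator-style pipeline with the single loop a programmer might write by hand. It is an illustration only: it does not use staging, MetaOCaml, or LMS, and the function names `pipeline` and `fused_loop` are hypothetical, not taken from the paper's libraries.

```python
from itertools import count, islice

def pipeline(xs, n):
    """Combinator style: zip a finite stream with an infinite one,
    filter, map, take a sub-range, then sum."""
    zipped = zip(xs, count(1))                           # zip finite with infinite
    products = (a * b for a, b in zipped if a % 2 == 0)  # filter, then map
    return sum(islice(products, n))                      # sub-range: first n items

def fused_loop(xs, n):
    """The same computation as one explicit loop with no intermediate streams."""
    total = 0
    taken = 0
    i = 1              # plays the role of the infinite stream 1, 2, 3, ...
    for a in xs:
        if taken >= n:
            break      # sub-range exhausted
        if a % 2 == 0:
            total += a * i
            taken += 1
        i += 1
    return total
```

Both functions compute the same value; the abstract's claim is that a staged library can produce the second form automatically from a description in the first form.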

arXiv.org e-Print Archive

Crossref

High-level synthesis optimization for blocked floating-point matrix multiplication

Author: D'Hollander Erik
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

In the last decade floating-point matrix multiplication on FPGAs has been studied extensively, and efficient architectures as well as detailed performance models have been developed. By design these IP cores take a fixed footprint, which does not necessarily optimize the use of all available resources. Moreover, the low-level architectures are not easily amenable to a parameterized synthesis. In this paper high-level synthesis is used to fine-tune the configuration parameters in order to achieve the highest performance with maximal resource utilization. An exploration strategy is presented to optimize the use of critical resources (DSPs, memory) for any given FPGA. To account for the limited memory size on the FPGA, a block-oriented matrix multiplication is organized such that the block summation is done on the CPU while the block multiplication occurs on the logic fabric simultaneously. The communication overhead between the CPU and the FPGA is minimized by streaming the blocks in a Gray code ordering scheme which maximizes the data reuse for consecutive block matrix product calculations. Using high-level synthesis optimization, the programmable logic operates at 93% of the theoretical peak performance and the combined CPU-FPGA design achieves 76% of the available hardware processing speed for the floating-point multiplication of 2K by 2K matrices.
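The idea of Gray code ordering for block products can be sketched as follows. This is an assumption-laden illustration, not the paper's scheme: it enumerates block products A(i,k) x B(k,j) in a reflected order so that consecutive products differ in exactly one block index, and it counts input-block transfers under a deliberately simple model in which the device keeps only the most recent A block and the most recent B block. All function names are hypothetical.

```python
from itertools import product

def lexicographic_order(nb):
    """Plain nested-loop order over block indices (i, k, j), j fastest."""
    return list(product(range(nb), repeat=3))

def reflected_order(nb):
    """Reflected (Gray-like) order: consecutive triples differ in one index by 1."""
    order = []
    t = 0  # number of (i, k) pairs visited so far; its parity sets the j direction
    for i in range(nb):
        ks = range(nb) if i % 2 == 0 else range(nb - 1, -1, -1)
        for k in ks:
            js = range(nb) if t % 2 == 0 else range(nb - 1, -1, -1)
            order.extend((i, k, j) for j in js)
            t += 1
    return order

def input_block_loads(order):
    """Count A(i,k) and B(k,j) transfers when only the latest block of each is held."""
    loads, held_a, held_b = 0, None, None
    for i, k, j in order:
        if (i, k) != held_a:
            held_a, loads = (i, k), loads + 1
        if (k, j) != held_b:
            held_b, loads = (k, j), loads + 1
    return loads
```

For two blocks per dimension the reflected order coincides with the 3-bit binary-reflected Gray code and needs 11 input-block transfers under this model, against 12 for the plain nested loops. The saving is modest here because the model is minimal; the actual design described in the abstract may hold more blocks on chip and order the products differently.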

Ghent University Academic Bibliography

First Season QUIET Observations: Measurements of Cosmic Microwave Background Polarization Power Spectra at 43 GHz in the Multipole Range 25 ≤ ℓ ≤ 475

Author: Bischoff C.
Cleary K.
Pearson T. J.
Radford S. J. E.
Readhead A. C. S.
Reeves R.
Richards J. L.
Sheperd M. C.
Publication venue: 'American Astronomical Society'
Publication date: 10/11/2011

The Q/U Imaging ExperimenT (QUIET) employs coherent receivers at 43 GHz and 94 GHz, operating on the Chajnantor plateau in the Atacama Desert in Chile, to measure the anisotropy in the polarization of the cosmic microwave background (CMB). QUIET primarily targets the B modes from primordial gravitational waves. The combination of these frequencies gives sensitivity to foreground contributions from diffuse Galactic synchrotron radiation. Between 2008 October and 2010 December, over 10,000 hr of data were collected, first with the 19 element 43 GHz array (3458 hr) and then with the 90 element 94 GHz array. Each array observes the same four fields, selected for low foregrounds, together covering ≈1000 deg^2. This paper reports initial results from the 43 GHz receiver, which has an array sensitivity to CMB fluctuations of 69 μK√s. The data were extensively studied with a large suite of null tests before the power spectra, determined with two independent pipelines, were examined. Analysis choices, including data selection, were modified until the null tests passed. Cross-correlating maps with different telescope pointings is used to eliminate a bias. This paper reports the EE, BB, and EB power spectra in the multipole range ℓ = 25-475. With the exception of the lowest multipole bin for one of the fields, where a polarized foreground, consistent with Galactic synchrotron radiation, is detected with 3σ significance, the E-mode spectrum is consistent with the ΛCDM model, confirming the only previous detection of the first acoustic peak. The B-mode spectrum is consistent with zero, leading to a measurement of the tensor-to-scalar ratio of r = 0.35^(+1.06)_(–0.87). The combination of a new time-stream "double-demodulation" technique, side-fed Dragonian optics, natural sky rotation, and frequent boresight rotation leads to the lowest level of systematic contamination in the B-mode power so far reported, below the level of r = 0.1

Caltech Authors

The Design And Vlsi Implementation Of Digital Arithmatic Processors - A Case Study Of A Generalized Pipeline Cellular Array

Author: Xie Yudi
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2015

A generalized pipeline array appeared in an IEEE transaction in 1974. The array has since appeared in a few textbooks on computer arithmetic, and from time to time papers have appeared describing modifications of it. The objective of this thesis is to present the design and VLSI implementation of this array, which can add, subtract, multiply, divide, square, and take the square root of binary numbers. In this thesis, we suggest a step-by-step procedure by which the design can be sent to MOSIS and the fabricated chip received back. The array has been extended from 5 rows to 7 rows so that the extended operations can be performed. In particular, a procedure is developed by which the design and the implementation methodologies are suitable for 40-pin and 500 nm technologies. An algorithm has been developed by which one can predict in advance the maximum size and performance of the array. In addition, to increase data processing throughput, the pipelining of the original design is extended. It is hoped that the design and implementation done here will go a long way in the development of advanced processors.

Digital Commons@Wayne State University

Development of a strategy for calibrating the novel SiPM camera of the SST-1M telescope proposed for the Cherenkov Telescope Array

CTA will comprise a sub-array of up to 70 small size telescopes (SSTs) at the southern array. The SST-1M project, a 4 m-diameter Davies Cotton telescope with a 9 degree FoV and a 1296-pixel SiPM camera, is designed to meet the requirements of the next generation ground based gamma-ray observatory CTA in the energy range above 3 TeV. Silicon photomultiplier (SiPM) cameras of gamma-ray telescopes can achieve good performance even during high night sky background conditions. Defining a fully automated calibration strategy for SiPM cameras is of great importance for large scale production validation and online calibration. The SST-1M sub-consortium developed software compatible with the CTA pipeline software (CTApipe). The calibration of the SST-1M camera is based on the Camera Test Setup (CTS), a set of LED boards mounted in front of the camera. The CTS LEDs are operated in pulsed or continuous mode to emulate signal and night sky background respectively. Continuous and pulsed light data analysis allows us to extract single pixel calibration parameters to be used during CTA operation. Comment: All CTA contributions at arXiv:1709.0348

arXiv.org e-Print Archive

DESY Publication Database

Crossref

DESY