Search CORE

40 research outputs found

FourierPIM: High-Throughput In-Memory Fast Fourier Transform and Polynomial Multiplication

Author: Boneh Yahav
Gazit Gonen
Kvatinsky Shahar
Leitersdorf Orian
Ronen Ronny
Publication venue: 'Elsevier BV'
Publication date: 05/04/2023

The Discrete Fourier Transform (DFT) is essential for various applications ranging from signal processing to convolution and polynomial multiplication. The groundbreaking Fast Fourier Transform (FFT) algorithm reduces DFT time complexity from the naive O(n^2) to O(n log n), and recent works have sought further acceleration through parallel architectures such as GPUs. Unfortunately, accelerators such as GPUs cannot fully exploit their computing capabilities because memory access becomes the bottleneck. Therefore, this paper accelerates the FFT algorithm using digital Processing-in-Memory (PIM) architectures that shift computation into the memory by exploiting physical devices capable of both storage and logic (e.g., memristors). We propose an O(log n) in-memory FFT algorithm that can also be performed in parallel across multiple arrays for high-throughput batched execution, supporting both fixed-point and floating-point numbers. Through the convolution theorem, we extend this algorithm to O(log n) polynomial multiplication, a fundamental task for applications such as cryptography. We evaluate FourierPIM on a publicly available cycle-accurate simulator that verifies both correctness and performance, and demonstrate 5-15x throughput and 4-13x energy improvements over the NVIDIA cuFFT library on state-of-the-art GPUs for FFT and polynomial multiplication.
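To make the O(n log n) FFT and the convolution-theorem route to polynomial multiplication concrete, here is a minimal sequential sketch. It is the textbook recursive radix-2 Cooley-Tukey FFT, not FourierPIM's O(log n) in-memory algorithm; the function names `fft` and `poly_multiply` are illustrative, and it assumes integer coefficients listed lowest degree first.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)   # DFT of even-indexed samples
    odd = fft(a[1::2], invert)    # DFT of odd-indexed samples
    sign = 1 if invert else -1    # forward uses e^{-2*pi*i*k/n}, inverse uses e^{+...}
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)  # twiddle factor
        out[k] = even[k] + w * odd[k]                # butterfly: top output
        out[k + n // 2] = even[k] - w * odd[k]       # butterfly: bottom output
    return out

def poly_multiply(p, q):
    """Multiply integer polynomials via the convolution theorem: IFFT(FFT(p) * FFT(q))."""
    result_len = len(p) + len(q) - 1
    size = 1
    while size < result_len:      # pad to a power of two large enough to avoid wrap-around
        size *= 2
    fp = fft([complex(c) for c in p] + [0j] * (size - len(p)))
    fq = fft([complex(c) for c in q] + [0j] * (size - len(q)))
    prod = fft([x * y for x, y in zip(fp, fq)], invert=True)
    # The unnormalized inverse transform is scaled by `size`; divide and round to integers.
    return [round(c.real / size) for c in prod[:result_len]]
```

For example, `poly_multiply([1, 2], [3, 4])` computes (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2. Each recursion level does O(n) butterfly work across log n levels, which is where the O(n log n) bound comes from; the paper's contribution is performing those butterflies in parallel inside memory arrays.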

arXiv.org e-Print Archive

Directory of Open Access Journals

Best of both latency and throughput

Author: Ed Grochowski
Hong Wang
John Shen
Ronny Ronen
Publication date: 01/01/2004

Abstract

CiteSeerX

A Model of Technological Progress in the Microprocessor Industry

Author: Ana Aizcorbe
Ariel Pakes
Avinash K Dixit
Brett R Gordon
C Berglund
Charles R Hulten
Dale W
Dan Hutcheson
David Blackwell
Gordon E Moore
Heidrun C Hoppe
John Sutton
Kenneth Flamm
Nancy L Stokey
Robert J Gordon
Ronald Goettler
Ronny Ronen
Samuel Kortum
Semiconductor-International
Shekhar Borkar
Stephanie Condon
Stephen D Oliner
Ulrich Doraszelski
Unni Pillai
William D Nordhaus
Y H Farzin
Yves Balcer
Zvi Griliches
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011

Crossref

Impact of age and sex on left ventricular function determined by coronary computed tomographic angiography

__Background__ Left ventricular (LV) volumetric and functional parameters measured with cardiac computed tomography (cardiac CT) augment risk prediction and discrimination for future mortality. Gender- and age-specific standard values for LV dimensions and systolic function obtained by 64-slice cardiac CT are lacking. __Methods and results__ A total of 1155 patients from the Coronary CT Angiography EvaluatioN For Clinical Outcomes: An InteRnational Multicenter registry (54.5% males, mean age 53.1 ± 12.4 years, range: 18–92 years) without known coronary artery disease (CAD), structural heart disease, diabetes, or hypertension who underwent cardiac CT for various indications were categorized according to age and sex. A cardiac CT data acquisition protocol that allowed volumetric measurement of LV function was used. Image interpretation was performed at each site. Patients with significant CAD (>50% stenosis) on cardiac CT were excluded from the analysis. Overall, mean left ventricular ejection fraction (LVEF) was higher in women than in men (66.6 ± 7.7% vs. 64.6 ± 8.1%, P < 0.001). This gender difference in overall LVEF was caused by a significantly higher LVEF in women ≥70 years compared with men ≥70 years (69.95 ± 8.89% vs. 65.50 ± 9.42%, P = 0.004). Accordingly, LVEF increased significantly with age (P = 0.005 for males and P < 0.001 for females), more markedly in females (5.21%) than in males (2.6%). LV end-diastolic volume decreased in females from 122.48 ± 27.87 (<40 years) to 95.56 ± 23.17 (>70 years; P < 0.001) and in males from 155.22 ± 35.07 (<40 years) to 130.26 ± 27.18 (>70 years; P < 0.001). __Conclusion__ Our findings indicate that the LV undergoes lifelong remodelling and highlight the need for age- and gender-adjusted reference values.

Erasmus University Digital Repository

Filtering Techniques to Improve Trace-Cache Efficiency

Author: Avi Mendelson
Roni Rosner
Ronny Ronen

The trace cache is becoming an important building block of modern wide-issue processors. So far, trace-cache research has focused on increasing fetch bandwidth. Trace caches have been shown to effectively increase the number of “useful” instructions that can be fetched into the machine, thus enabling more instructions to be executed each cycle. However, the trace cache has another important benefit that has received less attention in recent research: for variable-length ISAs such as Intel’s IA-32 architecture (x86), reducing instruction-decoding power is particularly attractive. Keeping instruction traces in decoded form means the decoding power is paid only when a trace is built, thus reducing the overall power consumption of th
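The power argument above (decode once when a trace is built, then reuse the decoded form) can be illustrated with a toy model. This is a hypothetical sketch, not the paper's filtering techniques, which the truncated abstract does not describe: the `TraceCache` class, the string-splitting `decode` function, FIFO eviction, and using `decode_count` as a stand-in for decoder power are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

def decode(insn):
    """Hypothetical decoder: split a macro-instruction string into micro-op strings."""
    return tuple(insn.split(";"))

@dataclass
class TraceCache:
    capacity: int = 4
    traces: dict = field(default_factory=dict)  # trace start address -> decoded micro-ops
    decode_count: int = 0                       # instructions sent through the decoder (power proxy)
    hits: int = 0

    def fetch(self, start_addr, program):
        """Return the decoded trace at start_addr; decode and build it only on a miss."""
        if start_addr in self.traces:
            self.hits += 1                      # hit: no decoding work needed
            return self.traces[start_addr]
        trace = []
        for insn in program[start_addr]:        # miss: pay the decode cost once
            self.decode_count += 1
            trace.extend(decode(insn))
        if len(self.traces) >= self.capacity:
            # Simple FIFO eviction: dicts preserve insertion order.
            self.traces.pop(next(iter(self.traces)))
        self.traces[start_addr] = tuple(trace)
        return self.traces[start_addr]
```

Fetching the same hypothetical two-instruction trace ten times, e.g. `program = {0x100: ["load r1;add r1,r2", "store r1"]}`, decodes only two instructions in total, with the remaining nine fetches served from the decoded copy.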

CiteSeerX

PIMDB: Understanding Bulk-Bitwise Processing In-Memory Through Database Analytics

Author: Kimelfeld Benny
Kvatinsky Shahar
Perach Ben
Ronen Ronny
Publication date: 20/03/2022

Bulk-bitwise processing-in-memory (PIM), where large bitwise operations are performed in parallel by the memory array itself, is an emerging form of computation with the potential to mitigate the memory wall problem. This paper examines the capabilities of bulk-bitwise PIM by constructing PIMDB, a fully digital system based on memristive stateful logic that focuses on in-memory bulk-bitwise operations and is designed to accelerate a real-life workload: analytical processing of relational databases. We introduce a host processor programming model to support bulk-bitwise PIM in virtual memory, develop techniques to efficiently perform in-memory filtering and aggregation operations, and present the mapping of the application into the memory. When tested against an equivalent in-memory database on the same host system, our approach substantially lowers the number of required memory read operations, thus accelerating TPC-H filter operations by 1.6
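As a rough illustration of how filtering and aggregation can be expressed purely as bulk bitwise operations, here is a minimal sketch using a bit-sliced (vertical) column layout. It is not PIMDB's actual data mapping or logic sequence: Python's arbitrary-width integers stand in for memory rows, one bit per record, and the names `to_bit_slices`, `less_than`, and `masked_sum` are hypothetical. It assumes non-negative values and a constant with `0 <= const < 2**width`.

```python
def to_bit_slices(values, width):
    """Transpose a column into `width` bit planes; bit i of plane j is bit j of record i."""
    planes = [0] * width
    for i, v in enumerate(values):
        for j in range(width):
            if (v >> j) & 1:
                planes[j] |= 1 << i
    return planes

def less_than(planes, const, n):
    """Bitmask of the n records whose value < const, using only AND/OR/XOR on whole planes."""
    assert 0 <= const < (1 << len(planes)), "const must fit in the column width"
    all_ones = (1 << n) - 1
    lt, eq = 0, all_ones                    # lt: already smaller; eq: equal on bits seen so far
    for j in reversed(range(len(planes))):  # scan from most to least significant bit
        not_x = all_ones ^ planes[j]        # bitwise NOT restricted to n records
        if (const >> j) & 1:
            lt |= eq & not_x                # record has 0 where const has 1: now smaller
            eq &= planes[j]
        else:
            eq &= not_x                     # record has 1 where const has 0: no longer equal
    return lt

def masked_sum(planes, mask):
    """Aggregate: sum of the column over selected records via per-plane popcounts."""
    return sum(bin(p & mask).count("1") << j for j, p in enumerate(planes))
```

With a hypothetical `quantity` column `[5, 12, 3, 8]` (width 4) and `price` column `[10, 20, 30, 40]` (width 6), `less_than(quantity_planes, 8, 4)` selects records 0 and 2, and `masked_sum(price_planes, mask)` sums their prices. The appeal of the vertical layout is that one predicate costs O(width) whole-row operations regardless of how many records a row holds.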