40 research outputs found

    FourierPIM: High-Throughput In-Memory Fast Fourier Transform and Polynomial Multiplication

    The Discrete Fourier Transform (DFT) is essential for various applications ranging from signal processing to convolution and polynomial multiplication. The groundbreaking Fast Fourier Transform (FFT) algorithm reduces DFT time complexity from the naive O(n^2) to O(n log n), and recent works have sought further acceleration through parallel architectures such as GPUs. Unfortunately, accelerators such as GPUs cannot exploit their full computing capabilities because memory access becomes the bottleneck. Therefore, this paper accelerates the FFT algorithm using digital Processing-in-Memory (PIM) architectures that shift computation into the memory by exploiting physical devices capable of both storage and logic (e.g., memristors). We propose an O(log n) in-memory FFT algorithm that can also be performed in parallel across multiple arrays for high-throughput batched execution, supporting both fixed-point and floating-point numbers. Through the convolution theorem, we extend this algorithm to O(log n) polynomial multiplication, a fundamental task for applications such as cryptography. We evaluate FourierPIM on a publicly available cycle-accurate simulator that verifies both correctness and performance, and demonstrate 5-15x throughput and 4-13x energy improvement over the NVIDIA cuFFT library on state-of-the-art GPUs for FFT and polynomial multiplication.
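    The convolution-theorem step mentioned in this abstract can be illustrated with a minimal software sketch. It is a conventional sequential O(n log n) radix-2 FFT in pure Python, not the O(log n) in-memory algorithm the paper proposes. The function names `fft` and `poly_multiply` are hypothetical and chosen only for this illustration, and integer coefficients are assumed so that rounding recovers exact results.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized).

    len(a) must be a power of two. With invert=True the conjugate
    twiddle factors are used; the caller divides by n afterwards.
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)  # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def poly_multiply(p, q):
    """Multiply two integer-coefficient polynomials.

    Coefficients are listed lowest degree first. Convolution theorem:
    pointwise product in the frequency domain equals convolution of
    the coefficient sequences.
    """
    result_len = len(p) + len(q) - 1
    n = 1
    while n < result_len:  # zero-pad to a power of two
        n *= 2
    fp = fft(list(p) + [0] * (n - len(p)))
    fq = fft(list(q) + [0] * (n - len(q)))
    prod = fft([x * y for x, y in zip(fp, fq)], invert=True)
    # Normalize the inverse transform and round away floating-point noise.
    return [round((c / n).real) for c in prod[:result_len]]


# (1 + 2x + 3x^2) * (4 + 5x) = 4 + 13x + 22x^2 + 15x^3
print(poly_multiply([1, 2, 3], [4, 5]))
```

    Zero-padding to at least len(p) + len(q) - 1 points matters here: without it, the cyclic convolution computed by the FFT would wrap high-degree terms back onto low-degree ones.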

    Best of both latency and throughput

    Abstrac

    Impact of age and sex on left ventricular function determined by coronary computed tomographic angiography

    __Background__ Left ventricular (LV) volumetric and functional parameters measured with cardiac computed tomography (cardiac CT) augment risk prediction and discrimination for future mortality. Gender- and age-specific standard values for LV dimensions and systolic function obtained by 64-slice cardiac CT are lacking. __Methods and results__ 1155 patients from the Coronary CT Angiography EvaluatioN For Clinical Outcomes: An InteRnational Multicenter registry (54.5% males, mean age 53.1 ± 12.4 years, range: 18–92 years) without known coronary artery disease (CAD), structural heart disease, diabetes, or hypertension who underwent cardiac CT for various indications were categorized according to age and sex. A cardiac CT data acquisition protocol was used that allowed volumetric measuring of LV function. Image interpretation was performed at each site. Patients with significant CAD (>50% stenosis) on cardiac CT were excluded from the analysis. Overall, mean left ventricular ejection fraction (LVEF) was higher in women when compared with men (66.6 ± 7.7% vs. 64.6 ± 8.1%, P < 0.001). This gender difference in overall LVEF was caused by a significantly higher LVEF in women ≥70 years when compared with men ≥70 years (69.95 ± 8.89% vs. 65.50 ± 9.42%, P = 0.004). Accordingly, a significant increase in LVEF was observed with age (P = 0.005 for males and P < 0.001 for females), which was more pronounced in females (5.21%) than in males (2.6%). LV end-diastolic volume decreased in females from 122.48 ± 27.87 (<40 years) to 95.56 ± 23.17 (>70 years; P < 0.001) and in males from 155.22 ± 35.07 (<40 years) to 130.26 ± 27.18 (>70 years; P < 0.001). __Conclusion__ Our findings indicate that the LV undergoes a lifelong remodelling and highlight the need for age- and gender-adjusted reference values.

    Filtering Techniques to Improve Trace-Cache Efficiency

    The trace cache is becoming an important building block of modern wide-issue processors. So far, trace-cache research has focused on increasing fetch bandwidth. Trace caches have been shown to effectively increase the number of “useful” instructions that can be fetched into the machine, thus enabling more instructions to be executed each cycle. However, the trace cache has another important benefit that has received less attention in recent research: especially for variable-length ISAs, such as Intel’s IA-32 architecture (x86), reducing instruction decoding power is particularly attractive. Keeping the instruction traces in decoded format implies that the decoding power is paid only when a trace is built, thus reducing the overall power consumption of th

    PIMDB: Understanding Bulk-Bitwise Processing In-Memory Through Database Analytics

    Bulk-bitwise processing-in-memory (PIM), where large bitwise operations are performed in parallel by the memory array itself, is an emerging form of computation with the potential to mitigate the memory wall problem. This paper examines the capabilities of bulk-bitwise PIM by constructing PIMDB, a fully-digital system based on memristive stateful logic, utilizing and focusing on in-memory bulk-bitwise operations, designed to accelerate a real-life workload: analytical processing of relational databases. We introduce a host processor programming model to support bulk-bitwise PIM in virtual memory, develop techniques to efficiently perform in-memory filtering and aggregation operations, and present the mapping of the application into the memory. When tested against an equivalent in-memory database on the same host system, our approach substantially lowers the number of required memory read operations, thus accelerating TPC-H filter operations by 1.6×–18× and full queries by 56×–608×, while reducing the energy consumption by 1.7×–18.6× and 0.81×–12× for these benchmarks, respectively. Our extensive evaluation uses the gem5 full-system simulation environment.
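    The general idea of a bulk-bitwise filter can be sketched in software. The abstract does not describe PIMDB's actual filtering technique, so the following is only an assumed illustration using a bit-sliced column layout: each bit position of an unsigned integer column is stored as one wide bit-vector (modelled here by a Python int), and the predicate `value < constant` is evaluated for every row at once using only AND, OR, and NOT. All function names are hypothetical, and the constant is assumed to fit in the column width.

```python
def to_bit_planes(values, width):
    """Transpose a column of unsigned ints into `width` bit-planes.

    Bit `row` of planes[b] holds bit b of values[row].
    """
    planes = [0] * width
    for row, v in enumerate(values):
        for b in range(width):
            if (v >> b) & 1:
                planes[b] |= 1 << row
    return planes


def less_than_mask(planes, num_rows, constant):
    """Return a bit-mask with bit `row` set where value[row] < constant.

    Scans from the most significant bit down. `eq` tracks rows still
    equal to the constant on all higher bits; `lt` collects rows
    already known to be smaller. Each step is a whole-column bitwise op.
    """
    all_rows = (1 << num_rows) - 1
    lt = 0
    eq = all_rows
    for b in reversed(range(len(planes))):
        if (constant >> b) & 1:
            # Constant has 1 here: still-equal rows with a 0 are smaller.
            lt |= eq & ~planes[b] & all_rows
            eq &= planes[b]
        else:
            # Constant has 0 here: rows with a 1 are larger, drop them.
            eq &= ~planes[b] & all_rows
    return lt


def selected_rows(mask, num_rows):
    """Expand a result mask into a list of matching row indices."""
    return [r for r in range(num_rows) if (mask >> r) & 1]


values = [5, 12, 7, 3, 9]
planes = to_bit_planes(values, width=4)
mask = less_than_mask(planes, len(values), constant=8)
print(selected_rows(mask, len(values)))  # rows holding 5, 7 and 3
print(bin(mask).count("1"))              # COUNT(*) aggregate over the filter
```

    In this layout the number of bitwise steps depends on the column width rather than the number of rows, which is why the host only needs to read back a compact result mask instead of the full column.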