31 research outputs found
FPGA-accelerated group-by aggregation using synchronizing caches
Recent trends in hardware have dramatically dropped the price of RAM and shifted focus from systems operating on disk-resident data to in-memory solutions. In this environment, high memory access latency, also known as the memory wall, becomes the biggest data processing bottleneck. Traditional CPU-based architectures address this problem with large cache hierarchies. However, algorithms that exhibit poor locality limit the benefits of caching. Hardware multithreading, in turn, provides a generic solution that does not rely on algorithm-specific locality properties. In this paper we present an FPGA-accelerated implementation of in-memory group-by hash aggregation. Our design relies on hardware multithreading to efficiently mask long memory access latency by implementing a custom operation datapath on the FPGA. We propose using CAMs (Content Addressable Memories) as a mechanism for synchronization and local pre-aggregation. To the best of our knowledge, this is the first work that uses CAMs as a synchronizing cache. We evaluate aggregation throughput against state-of-the-art multithreaded software implementations and demonstrate that the FPGA-accelerated approach significantly outperforms them on large grouping key cardinalities, yielding speedups of up to 10×.
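The CAM-based pre-aggregation idea can be sketched in software as a small associative buffer in front of the main hash table; a toy model, assuming a SUM aggregate and LRU-style eviction (the class name, capacity, and aggregation function are illustrative assumptions, not the paper's actual hardware design):

```python
from collections import OrderedDict

class CamPreAggregator:
    """Toy model of a small CAM used as a synchronizing cache:
    tuples whose grouping key hits an entry already held in the CAM
    are pre-aggregated locally, so only one combined update per key
    needs to reach the DRAM-resident hash table."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()   # key -> partial SUM (models the CAM)
        self.table = {}                # models the main in-memory hash table

    def insert(self, key, value):
        if key in self.entries:
            # CAM hit: aggregate locally instead of issuing a memory update
            self.entries[key] += value
            self.entries.move_to_end(key)
        else:
            if len(self.entries) >= self.capacity:
                # CAM full: evict the oldest entry to the main table
                old_key, partial = self.entries.popitem(last=False)
                self.table[old_key] = self.table.get(old_key, 0) + partial
            self.entries[key] = value

    def flush(self):
        """Drain all remaining CAM entries into the main table."""
        for key, partial in self.entries.items():
            self.table[key] = self.table.get(key, 0) + partial
        self.entries.clear()
```

In hardware the CAM additionally serializes concurrent updates to the same key across threads; this single-threaded sketch shows only the local pre-aggregation effect, which reduces the number of round trips to memory.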
A population-specific material model for sagittal craniosynostosis to predict surgical shape outcomes
Sagittal craniosynostosis consists of premature fusion (ossification) of the sagittal suture during infancy, resulting in head deformity and brain growth restriction. Spring-assisted cranioplasty (SAC) entails skull incisions to free the fused suture and insertion of two springs (metallic distractors) to promote cranial reshaping. Although safe and effective, SAC outcomes remain uncertain. Here we aimed to obtain and validate a skull material model for SAC outcome prediction. Computed tomography data relative to 18 patients were processed to simulate surgical cuts and spring location. A rescaling model for age matching was created using retrospective data and validated. Design of experiments was used to assess the effect of different material property parameters on the model output. Subsequent material optimization, using retrospective clinical spring measurements, was performed for nine patients. A population-derived material model was obtained and applied to the whole population. Results showed that bone Young's modulus and relaxation modulus had the largest effect on the model predictions: the use of the population-derived material model had a negligible effect on improving the prediction of on-table opening, while significantly improving the prediction of spring kinematics at follow-up. The model was validated using on-table 3D scans for nine patients: the predicted head shape approximated the 3D scan model within 2 mm at 80% of the surface points in 8 out of 9 patients. The accuracy and reliability of the developed computational model of SAC were increased using population data: this tool is now ready for prospective clinical application.
Performance Improvements and Congestion Reduction for Routing-Based Synthesis for Digital Microfluidic Biochips
Routing-based synthesis for digital microfluidic biochips yields faster assay execution times compared to module-based synthesis. We show that routing-based synthesis can lead to deadlocks and livelocks in specific cases, and that dynamically detecting them and adjusting the probabilities associated with different droplet movements can alleviate the situation. We also introduce methods to improve the efficiency of wash droplet routing during routing-based synthesis, and to support nonreconfigurable modules, such as integrated heaters and detectors. We obtain increases in success rates when dealing with resource-constrained chips and reductions in average assay execution time.
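The stall-detection-plus-randomization idea can be illustrated with a toy grid router; a minimal sketch, assuming greedy Manhattan-distance routing, a per-droplet stall counter, and a fixed threshold before random detour moves kick in (the function name, thresholds, and movement rules are illustrative assumptions, not the paper's algorithm):

```python
import random

def route_droplets(starts, targets, width, height,
                   stall_limit=5, max_steps=300, seed=0):
    """Toy model of livelock mitigation in routing-based synthesis:
    droplets normally move greedily toward their targets; a droplet
    that makes no progress for `stall_limit` consecutive steps switches
    to random detour moves, which breaks head-on blocking patterns.
    Returns True if every droplet reaches its target within max_steps."""
    rng = random.Random(seed)
    pos = list(starts)
    stalled = [0] * len(pos)
    for _ in range(max_steps):
        if pos == list(targets):
            return True
        occupied = set(pos)
        for i, (x, y) in enumerate(pos):
            tx, ty = targets[i]
            if (x, y) == (tx, ty):
                continue                       # already parked at its target
            moves = [(x + dx, y + dy) for dx, dy in
                     ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= x + dx < width and 0 <= y + dy < height
                     and (x + dx, y + dy) not in occupied]
            if not moves:
                stalled[i] += 1                # boxed in: count as a stall
                continue
            if stalled[i] >= stall_limit:
                nxt = rng.choice(moves)        # random detour to escape livelock
            else:                              # greedy: shrink Manhattan distance
                nxt = min(moves, key=lambda m: abs(m[0] - tx) + abs(m[1] - ty))
            if abs(nxt[0] - tx) + abs(nxt[1] - ty) < abs(x - tx) + abs(y - ty):
                stalled[i] = 0                 # progress made, reset counter
            else:
                stalled[i] += 1
            occupied.discard((x, y))
            occupied.add(nxt)
            pos[i] = nxt
    return pos == list(targets)
```

This sketch ignores microfluidic spacing constraints and wash droplets; it only shows how detecting lack of progress and injecting randomized moves resolves mutual blocking, e.g. two droplets swapping positions along the same row.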
Accelerating in-memory database selections using latency masking hardware threads
Inexpensive DRAMs have created new opportunities for in-memory data analytics. However, the major bottleneck in such systems is high memory access latency. Traditionally, this problem is addressed with large cache hierarchies, which only benefit applications with regular access patterns; many data-intensive applications instead exhibit irregular behavior. Hardware multithreading can better cope with the high latency seen in such applications. This article implements a multithreaded prototype (MTP) on FPGAs for the relational selection operator, which exhibits control-flow irregularity. On a standard TPC-H query evaluation, MTP achieves a bandwidth utilization of 83%, while the CPU and GPU implementations achieve 61% and 64%, respectively. Besides being bandwidth efficient, MTP is also 14.2× and 4.2× more power efficient than the CPU and GPU, respectively.
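The latency-masking effect of hardware multithreading can be illustrated with a toy cycle-count model; a sketch assuming one issue slot per cycle, one outstanding memory request per thread, and round-robin thread selection (the function name and these policies are simplifying assumptions, not MTP's actual microarchitecture):

```python
def scan_cycles(n_rows, mem_latency, n_threads):
    """Estimate cycles to scan n_rows, where each row requires one
    memory access of mem_latency cycles. One thread stalls on every
    access; with >= mem_latency threads, a new request issues every
    cycle and the latency is fully overlapped (masked)."""
    ready = [0] * n_threads    # cycle at which each thread may issue again
    issued = 0
    cycle = 0
    last_complete = 0
    while issued < n_rows:
        # At most one thread issues per cycle; pick any thread whose
        # outstanding request (if any) has already returned.
        for t in range(n_threads):
            if ready[t] <= cycle:
                ready[t] = cycle + mem_latency   # busy until the reply arrives
                last_complete = cycle + mem_latency
                issued += 1
                break
        cycle += 1
    return last_complete

# With latency 10: a single thread serializes accesses (~n_rows * 10 cycles),
# while 10 threads keep one request in flight per cycle (~n_rows cycles).
single = scan_cycles(100, 10, 1)
masked = scan_cycles(100, 10, 10)
```

Under this model the multithreaded scan finishes roughly `mem_latency` times sooner once enough threads are available, which is the intuition behind MTP's high bandwidth utilization on irregular workloads.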
High-level language tools for reconfigurable computing
In the past decade or so we have witnessed a steadily increasing interest in FPGAs as hardware accelerators: they provide an excellent midpoint between the reprogrammability of software devices (CPUs, DSPs, and GPUs) and the performance and low energy consumption of ASICs. However, the programmability of FPGA-based accelerators remains one of the biggest obstacles to their wider adoption. Developing FPGA programs requires extensive familiarity with hardware design and experience with a tedious and complex tool chain. For half a century, layers of abstraction have been developed that simplify the software development process: languages, compilers, dynamically linked libraries, operating systems, APIs, etc. Very few, if any, such abstractions exist in the development of FPGA programs. In this paper, we review the history of using FPGAs as hardware accelerators and summarize the challenges facing the raising of the programming abstraction layers. We survey five high-level language tools for the development of FPGA programs: Xilinx Vivado, Altera OpenCL, Bluespec BSV, ROCCC, and LegUp, providing an overview of their tool flow, the optimizations they provide, and a qualitative analysis of their hardware implementations of high-level code.