10 research outputs found

SPICE²: A Spatial, Parallel Architecture for Accelerating the Spice Circuit Simulator

Author: Kapre Nachiket Ganesh
Publication date: 01/01/2011

Spatial processing of sparse, irregular floating-point computation using a single FPGA enables up to an order of magnitude speedup (mean 2.8X speedup) over a conventional microprocessor for the SPICE circuit simulator. We deliver this speedup using a hybrid parallel architecture that spatially implements the heterogeneous forms of parallelism available in SPICE. We decompose SPICE into its three constituent phases: Model-Evaluation, Sparse Matrix-Solve, and Iteration Control, and parallelize each phase independently. We exploit data-parallel device evaluations in the Model-Evaluation phase, sparse dataflow parallelism in the Sparse Matrix-Solve phase and compose the complete design in streaming fashion. We name our parallel architecture SPICE²: Spatial Processors Interconnected for Concurrent Execution for accelerating the SPICE circuit simulator. We program the parallel architecture with a high-level, domain-specific framework that identifies, exposes and exploits parallelism available in the SPICE circuit simulator. This design is optimized with an auto-tuner that can scale the design to use larger FPGA capacities without expert intervention and can even target other parallel architectures with the assistance of automated code-generation. This FPGA architecture is able to outperform conventional processors due to a combination of factors including high utilization of statically-scheduled resources, low-overhead dataflow scheduling of fine-grained tasks, and overlapped processing of the control algorithms. We demonstrate that we can independently accelerate Model-Evaluation by a mean factor of 6.5X (1.4–23X) across a range of non-linear device models and Matrix-Solve by 2.4X (0.6–13X) across various benchmark matrices while delivering a mean combined speedup of 2.8X (0.2–11X) for the two together when comparing a Xilinx Virtex-6 LX760 (40nm) with an Intel Core i7 965 (45nm).
With our high-level framework, we can also accelerate Single-Precision Model-Evaluation on NVIDIA GPUs, ATI GPUs, IBM Cell, and Sun Niagara 2 architectures. We expect approaches based on exploiting spatial parallelism to become important as frequency scaling slows down and modern processing architectures turn to parallelism (e.g., multi-core, GPUs) due to constraints of power consumption. This thesis shows how to express, exploit and optimize spatial parallelism for an important class of problems that are challenging to parallelize.

Caltech Theses and Dissertations

Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC'10 - May 17-19, 2010 Karlsruhe, Germany. (KIT Scientific Reports ; 7551)

Author: Becker Jürgen
Hübner Michael
Lagadec Loïc
Sander Oliver
Publication venue: KIT Scientific Publishing, Karlsruhe
Publication date: 01/01/2010

ReCoSoC is intended to be a periodic annual meeting to expose and discuss gathered expertise as well as state of the art research around SoC related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow's challenges in the multibillion transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability.

KITopen

Efficient FPGA implementation and power modelling of image and signal processing IP cores

Author: Chandrasekaran Shrutisagar
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2007

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Field Programmable Gate Arrays (FPGAs) are the technology of choice in a number of image and signal processing application areas such as consumer electronics, instrumentation, medical data processing and avionics due to their reasonable energy consumption, high performance, security, low design-turnaround time and reconfigurability. Low power FPGA devices are also emerging as competitive solutions for mobile and thermally constrained platforms. Most computationally intensive image and signal processing algorithms also consume a lot of power leading to a number of issues including reduced mobility, reliability concerns and increased design cost among others. Power dissipation has become one of the most important challenges, particularly for FPGAs. Addressing this problem requires optimisation and awareness at all levels in the design flow. The key achievements of the work presented in this thesis are summarised here. Behavioural level optimisation strategies have been used for implementing matrix product and inner product through the use of mathematical techniques such as Distributed Arithmetic (DA) and its variations including offset binary coding, sparse factorisation and novel vector level transformations. Applications to test the impact of these algorithmic and arithmetic transformations include the fast Hadamard/Walsh transforms and Gaussian mixture models. Complete design space exploration has been performed on these cores, and where appropriate, they have been shown to clearly outperform comparable existing implementations. At the architectural level, strategies such as parallelism, pipelining and systolisation have been successfully applied for the design and optimisation of a number of cores including colour space conversion, finite Radon transform, finite ridgelet transform and circular convolution.
A pioneering study into the influence of supply voltage scaling for FPGA based designs, used in conjunction with performance enhancing strategies such as parallelism and pipelining has been performed. Initial results are very promising and indicated significant potential for future research in this area. A key contribution of this work includes the development of a novel high level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called Functional Level Power Analysis and Modelling (FLPAM). FLPAM is scalable, platform independent and compares favourably with existing approaches. A hybrid, top-down design flow paradigm integrating FLPAM with commercially available design tools for systematic optimisation of IP cores has also been developed.

Brunel University Research Archive

Quantum-centric Supercomputing for Materials Science: A Perspective on Challenges and Future Directions

Author: Alexeev Yuri
Amsler Maximilian
Baity Paul
Barroca Marco Antonio
Bassini Sanzio
Battelle Torey
Camps Daan
Casanova David
Choi Young jai
Chong Frederic T.
Chung Charles
Codella Chris
Corcoles Antonio D.
Cruise James
de Jong Wibe Albert
Di Meglio Alberto
Dubois Jonathan
Duran Ivan
Eckl Thomas
Economou Sophia
Eidenbenz Stephan
Elmegreen Bruce
Fare Clyde
Faro Ismael
Fernández Cristina Sanz
Ferreira Rodrigo Neumann Barros
Fuji Keisuke
Fuller Bryce
Gagliardi Laura
Galli Giulia
Glick Jennifer R.
Gobbi Isacco
Gokhale Pranav
Gonzalez Salvador de la Puente
Greiner Johannes
Gropp Bill
Grossi Michele
Gull Emmanuel
Healy Burns
Huang Benchen
Humble Travis S.
Ito Nobuyasu
Izmaylov Artur F.
Javadi-Abhari Ali
Jennewein Douglas
Jha Shantenu
Jiang Liang
Jones Barbara
Jurcevic Petar
Kirby William
Kister Stefan
Kitagawa Masahiro
Klassen Joel
Klymko Katherine
Koh Kwangwon
Kondo Masaaki
Kurkcuoglu Doga Murat
Kurowski Krzysztof
Laino Teodoro
Landfield Ryan
Leininger Matt
Leyton-Ortega Vicente
Li Ang
Lin Meifeng
Liu Junyu
Lorente Nicolas
Luckow Andre
Martiel Simon
Martin-Fernandez Francisco
Martonosi Margaret
Marvinney Claire
Medina Arcesio Castaneda
Merten Dirk
Mezzacapo Antonio
Michielsen Kristel
Mitra Abhishek
Mittal Tushar
Moon Kyungsun
Moore Joel
Motta Mario
Na Young-Hye
Nam Yunseong
Narang Prineha
Ohnishi Yu-ya
Ottaviani Daniele
Otten Matthew
Pakin Scott
Pascuzzi Vincent R.
Penault Ed
Piontek Tomasz
Pitera Jed
Rall Patrick
Ravi Gokul Subramanian
Robertson Niall
Rossi Matteo
Rydlichowski Piotr
Ryu Hoon
Samsonidze Georgy
Sato Mitsuhisa
Saurabh Nishant
Sharma Kunal
Sharma Vidushi
Shin Soyoung
Sitdikov Iskandar
Slessman George
Steiner Mathias
Suh In-Saeng
Switzer Eric
Tang Wei
Thompson Joel
Todo Synge
Tran Minh
Trenev Dimitar
Trott Christian
Tseng Huan-Hsin
Tureci Esin
Valinas David García
Vallecorsa Sofia
Wever Christopher
Wojciechowski Konrad
Wu Xiaodi
Yoo Shinjae
Yoshioka Nobuyuki
Yu Victor Wen-zhe
Yunoki Seiji
Zhuk Sergiy
Zubarev Dmitry
Publication date: 14/12/2023

Computational models are an essential tool for the design, characterization, and discovery of novel materials. Hard computational tasks in materials science stretch the limits of existing high-performance supercomputing centers, consuming much of their simulation, analysis, and data resources. Quantum computing, on the other hand, is an emerging technology with the potential to accelerate many of the computational tasks needed for materials science. In order to do that, the quantum technology must interact with conventional high-performance computing in several ways: approximate results validation, identification of hard problems, and synergies in quantum-centric supercomputing. In this paper, we provide a perspective on how quantum-centric supercomputing can help address critical computational problems in materials science, the challenges to face in order to solve representative use cases, and new suggested directions. Comment: 60 pages, 14 figures; comments welcome

arXiv.org e-Print Archive

Efficient FPGA implementation and power modelling of image and signal processing IP cores

Author: Amira A
Chandrasekaran Shrutisagar
Publication date: 01/01/2007

Field Programmable Gate Arrays (FPGAs) are the technology of choice in a number of image and signal processing application areas such as consumer electronics, instrumentation, medical data processing and avionics due to their reasonable energy consumption, high performance, security, low design-turnaround time and reconfigurability. Low power FPGA devices are also emerging as competitive solutions for mobile and thermally constrained platforms. Most computationally intensive image and signal processing algorithms also consume a lot of power leading to a number of issues including reduced mobility, reliability concerns and increased design cost among others. Power dissipation has become one of the most important challenges, particularly for FPGAs. Addressing this problem requires optimisation and awareness at all levels in the design flow. The key achievements of the work presented in this thesis are summarised here. Behavioural level optimisation strategies have been used for implementing matrix product and inner product through the use of mathematical techniques such as Distributed Arithmetic (DA) and its variations including offset binary coding, sparse factorisation and novel vector level transformations. Applications to test the impact of these algorithmic and arithmetic transformations include the fast Hadamard/Walsh transforms and Gaussian mixture models. Complete design space exploration has been performed on these cores, and where appropriate, they have been shown to clearly outperform comparable existing implementations. At the architectural level, strategies such as parallelism, pipelining and systolisation have been successfully applied for the design and optimisation of a number of cores including colour space conversion, finite Radon transform, finite ridgelet transform and circular convolution.
A pioneering study into the influence of supply voltage scaling for FPGA based designs, used in conjunction with performance enhancing strategies such as parallelism and pipelining has been performed. Initial results are very promising and indicated significant potential for future research in this area. A key contribution of this work includes the development of a novel high level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called Functional Level Power Analysis and Modelling (FLPAM). FLPAM is scalable, platform independent and compares favourably with existing approaches. A hybrid, top-down design flow paradigm integrating FLPAM with commercially available design tools for systematic optimisation of IP cores has also been developed.
EThOS - Electronic Theses Online Service, GB, United Kingdom

OpenGrey Repository

Fanout decomposition dataflow optimizations for FPGA-based Sparse LU factorization

Author: Kapre Nachiket
Siddhartha
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2014

Performance of FPGA-based token dataflow architectures is often limited by the long tail distribution of parallelism in the compute paths of the dataflow graphs. This is known to limit speedup of dataflow processing of Sparse LU factorization to only 3-10x over CPUs. One reason behind the limitations is the serialization penalty of processing high-fanout nodes in the dataflow graph on traditional dataflow processing architectures. In this paper, we show how to perform one-time static fanout decomposition and selective node replication transformations to input dataflow graphs. These transformations are one-time static compute costs that are typically amortized over millions of iterations. For dataflow graphs extracted for sparse LU factorization, we demonstrate up to 2.3x speedup (1.2x geomean average) with this technique across a range of benchmark problems. Accepted version

Crossref

DR-NTU (Digital Repository of NTU)

The Fifth NASA Symposium on VLSI Design

The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design.

NASA Technical Reports Server

Proceedings of the 22nd Conference on Formal Methods in Computer-Aided Design – FMCAD 2022

Publication venue: TU Wien Academic Press
Publication date: 18/10/2022

The Conference on Formal Methods in Computer-Aided Design (FMCAD) is an annual conference on the theory and applications of formal methods in hardware and system verification. FMCAD provides a leading forum to researchers in academia and industry for presenting and discussing groundbreaking methods, technologies, theoretical results, and tools for reasoning formally about computing systems. FMCAD covers formal aspects of computer-aided system design including verification, specification, synthesis, and testing.

Directory of Open Access Books (DOAB)

Proceedings of the 22nd Conference on Formal Methods in Computer-Aided Design – FMCAD 2022

OAPEN Library

CMS The TriDAS Project: Technical Design Report, Volume 2: Data Acquisition and High-Level Trigger

Author: Cittolin Sergio
Rácz Attila
Sphicas Paris
Publication venue: Union of Concerned Scientists
Publication date: 01/01/2002

CERN Document Server