Efficient Hardware Architectures for Accelerating Deep Neural Networks: Survey

Authors: Boppu Srinivas, Cenkeramaddi Linga Reddy, Dhilleswararao Pudi, Manikandan M. Sabarimalai
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2022

In the modern era of technology, a paradigm shift has been witnessed in applications of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). Specifically, Deep Neural Networks (DNNs) have emerged as a popular field of interest in many AI applications, such as computer vision, image and video processing, and robotics. With mature digital technologies and the availability of authentic data and data-handling infrastructure, DNNs have become a credible choice for solving complex real-life problems. In certain tasks, the performance and accuracy of a DNN can exceed those of humans. However, DNNs are computationally demanding, requiring substantial resources and time to carry out their computations. Furthermore, general-purpose architectures such as CPUs struggle to handle such computationally intensive algorithms. The research community has therefore invested considerable interest and effort in specialized hardware architectures, such as the Graphics Processing Unit (GPU), Field Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), and Coarse Grained Reconfigurable Array (CGRA), for the efficient implementation of computationally intensive algorithms. This paper presents the research carried out on the development and deployment of DNNs using these specialized hardware architectures and embedded AI accelerators. The review describes in detail the specialized hardware-based accelerators used in the training and/or inference of DNNs. A comparative study of the accelerators discussed, based on factors such as power, area, and throughput, is also presented. Finally, future research and development directions, such as trends in DNN implementation on specialized hardware accelerators, are discussed.
This review article is intended to serve as a guide to hardware architectures for accelerating deep learning research and improving its effectiveness.
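To make the computational cost described above concrete, the sketch below counts multiply-accumulate (MAC) operations for a single standard 2-D convolution layer, a common workload in the vision applications the abstract mentions. It is a minimal illustration and is not taken from the survey; the function names and the example layer shape (a 224x224x3 input, 64 filters of size 3x3, stride 1, padding 1) are hypothetical choices made only for illustration.

```python
def conv_output_size(n_in: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (n_in + 2 * padding - kernel) // stride + 1


def conv2d_macs(h_out: int, w_out: int, c_in: int, c_out: int, k_h: int, k_w: int) -> int:
    """MACs for a dense (ungrouped) 2-D convolution, ignoring bias additions."""
    # Each of the h_out * w_out * c_out output values is a dot product
    # over a k_h x k_w window spanning all c_in input channels.
    return h_out * w_out * c_out * (k_h * k_w * c_in)


if __name__ == "__main__":
    # Hypothetical layer shape, chosen only for illustration.
    h = conv_output_size(224, kernel=3, stride=1, padding=1)
    w = conv_output_size(224, kernel=3, stride=1, padding=1)
    macs = conv2d_macs(h, w, c_in=3, c_out=64, k_h=3, k_w=3)
    print(f"{h}x{w} output, {macs:,} MACs")  # 224x224 output, 86,704,128 MACs
```

Roughly 87 million MACs for one small layer, repeated across the many layers of a network and for every input frame, gives a sense of why the abstract points toward specialized hardware rather than general-purpose CPUs.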

Agder University Research Archive

Turbo Bayesian Compressed Sensing

Author: Yang Depeng
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/08/2011

Compressed sensing (CS) theory specifies a new signal acquisition approach that potentially allows signals to be acquired at a much lower data rate than the Nyquist sampling rate. In CS, the signal is not acquired directly but is reconstructed from a small number of measurements. One of the key problems in CS is how to recover the original signal from measurements in the presence of noise. This dissertation addresses signal reconstruction problems in CS. First, a feedback structure and a signal recovery algorithm, orthogonal pruning pursuit (OPP), are proposed to exploit prior knowledge when reconstructing the signal in the noise-free case. To handle noise, a noise-aware signal reconstruction algorithm based on Bayesian Compressed Sensing (BCS) is developed. Moreover, a novel Turbo Bayesian Compressed Sensing (TBCS) algorithm is developed for joint signal reconstruction by exploiting both spatial and temporal redundancy. The TBCS algorithm is then applied to a UWB positioning system to achieve millimeter-level accuracy with low-sampling-rate ADCs. Finally, hardware implementation of BCS signal reconstruction on FPGAs and GPUs is investigated; in particular, the implementation of parallel Cholesky decomposition, a key component of BCS, on GPUs and FPGAs is explored. Simulation results in software and hardware demonstrate that OPP and TBCS outperform previous approaches, improving UWB positioning accuracy by 12.8x. The accelerated computation helps enable real-time application of this work.
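Because parallel Cholesky decomposition is singled out above as a key component of BCS, a plain serial reference version may help show what is being accelerated. This is a minimal sketch of the standard column-by-column (Cholesky-Crout) factorization A = L L^T of a symmetric positive-definite matrix, not the dissertation's parallel GPU/FPGA implementation; the function name `cholesky` and the example matrix are illustrative assumptions. Within each column, the entries below the diagonal do not depend on one another, which is the kind of independence a parallel implementation can exploit.

```python
import math


def cholesky(a: list[list[float]]) -> list[list[float]]:
    """Return lower-triangular L with A = L L^T for a symmetric positive-definite A."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract the squared norm of the already-computed part of row j.
        s = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(s)
        # Below-diagonal entries of column j depend only on earlier columns,
        # so each i could be computed by a separate worker in a parallel version.
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


if __name__ == "__main__":
    A = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
    for row in cholesky(A):
        print(row)  # [2.0, 0.0, 0.0] / [6.0, 1.0, 0.0] / [-8.0, 5.0, 3.0]
```

The outer loop over columns is inherently sequential, which is one reason hardware implementations typically focus on parallelizing the inner updates rather than the whole factorization.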

University of Tennessee, Knoxville: Trace

High Performance Free Surface LBM on GPUs

Author: Lehmann Moritz
Publication date: 01/01/2021

EPub Bayreuth

A hybrid parallel framework for computational solid mechanics

Author: Fidkowski Piotr
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2011

Thesis (S.M.), Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 95-98).

A novel hybrid parallel C++ framework for computational solid mechanics is developed and presented. The modular and extensible design of this framework allows it to support a wide variety of numerical schemes, including discontinuous Galerkin formulations and higher-order methods, multiphysics problems, hybrid meshes made of different types of elements, and a number of different linear and nonlinear solvers. In addition, native, seamless support is included for hardware acceleration by Graphics Processing Units (GPUs) via NVIDIA's CUDA architecture, for both single-GPU workstations and heterogeneous clusters of GPUs. The capabilities of the framework are demonstrated through a series of sample problems, including laser-induced cylindrical shock propagation, a dynamic problem involving a micro-truss array made of millions of elements, and a tension problem involving a shape memory alloy with a multifield formulation to model the superelastic effect.

DSpace@MIT

Real-time registration and simulation in medical imaging

Author: Shams Ramtin
Publication date: 21/11/2018

The Australian National University

CUDA Optimization of a Class of Iterative Linear Algebra Solvers

KFUPM ePrints