Efficient Hardware Architectures for Accelerating Deep Neural Networks: Survey
In the modern era of technology, a paradigm shift has been witnessed in areas involving applications of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). In particular, Deep Neural Networks (DNNs) have emerged as a popular field of interest in most AI applications, such as computer vision, image and video processing, and robotics. Given mature digital technologies and the availability of authentic data and data-handling infrastructure, DNNs have become a credible choice for solving complex real-life problems. In certain situations, the performance and accuracy of a DNN far exceed human intelligence. However, DNNs are computationally demanding in terms of both the resources and the time required to handle these computations, and general-purpose architectures such as CPUs struggle with such computationally intensive algorithms. Therefore, the research community has invested considerable interest and effort in specialized hardware architectures such as the Graphics Processing Unit (GPU), Field Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), and Coarse-Grained Reconfigurable Array (CGRA) for the effective implementation of computationally intensive algorithms. This paper brings forward the various research works carried out on the development and deployment of DNNs using these specialized hardware architectures and embedded AI accelerators. The review gives a detailed description of the specialized hardware-based accelerators used in the training and/or inference of DNNs, and compares the accelerators discussed on factors such as power, area, and throughput. Finally, future research and development directions are discussed, such as trends in DNN implementation on specialized hardware accelerators.
This review article is intended to serve as a guide to hardware architectures for accelerating and improving the effectiveness of deep learning research.
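Comparisons on power, area, and throughput, as in the survey above, typically reduce to derived figures of merit such as throughput per watt and throughput per unit area. The sketch below shows one way such a comparison can be computed; the accelerator names and numbers are purely hypothetical placeholders, not data from the survey:

```python
# Hypothetical figures of merit for comparing DNN accelerators.
# The numbers below are illustrative placeholders, not measured data.

accelerators = {
    # name: (throughput in GOPS, power in W, area in mm^2)
    "GPU-like":  (5000.0, 250.0, 600.0),
    "FPGA-like": (800.0,   25.0, 200.0),
    "ASIC-like": (2000.0,   5.0,  50.0),
}

def figures_of_merit(specs):
    """Return {name: (GOPS/W, GOPS/mm^2)} for each accelerator."""
    return {
        name: (gops / watts, gops / area)
        for name, (gops, watts, area) in specs.items()
    }

for name, (gops_per_w, gops_per_mm2) in figures_of_merit(accelerators).items():
    print(f"{name:10s} {gops_per_w:8.1f} GOPS/W {gops_per_mm2:7.2f} GOPS/mm^2")
```

Such normalized metrics are what allow architectures as different as GPUs, FPGAs, ASICs, and CGRAs to be placed on a common scale.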
Methodology for Structured Data-Path Implementation in VLSI Physical Design: A Case Study
State-of-the-art modern microprocessor and domain-specific accelerator designs are dominated by data-paths composed of regular structures, also known as bit-slices. Random logic placement and routing techniques may not result in an optimal layout for these data-path-dominated designs. As a result, implementation tools such as Cadence's Innovus include a Structured Data-Path (SDP) feature that allows data-path placement to be fully customized by constraining the placement engine. These constraints are provided to the tool through a relative placement file. However, the tool neither extracts nor automatically places the regular data-path structures; in other words, the relative placement file is not generated automatically. In this paper, we propose a semi-automated method for extracting bit-slices for the Innovus SDP flow. The proposed method is demonstrated to yield 17% lower density (utilization) for a pixel buffer design, while the other performance metrics remain unchanged compared with the traditional place-and-route flow.
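The relative placement constraints described above essentially arrange bit-slice cells on a grid. As a simplified illustration (not the paper's actual extraction method, and not Innovus's file syntax), the following sketch groups hypothetical data-path instances into slices by the bit index parsed from their names:

```python
import re

# Hypothetical data-path instance names; in a real flow these would come
# from the design database, not a hard-coded list.
instances = [
    "dp/reg_a[0]", "dp/reg_a[1]", "dp/reg_a[2]",
    "dp/add_s[0]", "dp/add_s[1]", "dp/add_s[2]",
    "ctrl/fsm_state",  # random logic: no bit index, left unconstrained
]

def group_bit_slices(names):
    """Map bit index -> list of cells in that slice.

    Assumes regular instance names ending in '[<bit>]'; cells sharing a
    bit index form one row of the structured data-path placement.
    """
    slices = {}
    for name in names:
        m = re.search(r"\[(\d+)\]$", name)
        if m:  # skip random logic without a bit index
            slices.setdefault(int(m.group(1)), []).append(name)
    return slices

for bit, row in sorted(group_bit_slices(instances).items()):
    print(f"slice {bit}: {row}")
```

The real problem is considerably harder (names are not always this regular, and column ordering matters for routing), which is why the paper's method is semi-automated rather than fully automatic.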
Codegenerierung für eng gekoppelte Prozessorfelder (Code Generation for Tightly Coupled Processor Arrays)
In this dissertation, we consider techniques for automatic code generation and code optimization of loop programs for programmable tightly coupled processor array targets. These consist of interconnected small, light-weight very long instruction word cores, which can exploit both loop-level parallelism and instruction-level parallelism. Such arrays are well suited for executing compute-intensive nested loop applications, often providing higher power and area efficiency than commercial off-the-shelf processors. They are ideal candidates for accelerating the computation of nested loop programs in future heterogeneous systems, where energy efficiency is one of the most important design goals for overall system-on-chip design. In order to harness the full compute potential of such an array, we need efficient compiler techniques that can automatically map nested loop programs onto them. Such a compiler framework is essential for increasing the productivity of designers as well as for shortening development cycles.

In this context, this dissertation proposes a novel code generation and compaction approach which generates the assembly-level code for all the processing elements in an array from a scheduled loop nest. The code generation approach itself is independent of both the array size and the problem size, and preserves the given schedule. As part of this compiler framework, we also present a scalable interconnect generation approach in which the connections among the different processing elements are automatically generated from the same scheduled loop program. Furthermore, we consider the integration of a tightly coupled processor array into a multi-processor system-on-chip: here, we propose the design of new hardware components such as a global controller, which generates control signals to orchestrate (synchronize) the programs running on the different processing elements, and address generators, which are required to generate the address and enable signals for a set of reconfigurable I/O buffers surrounding the processor array. We propose a fully programmable design of these required hardware components and add the required compiler support to generate their configuration data from the same scheduled loop program. In summary, the major contributions of this dissertation enable and ease the fully automated mapping of nested loop programs onto tightly coupled processor arrays.
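The mapping task described above starts from a nested loop whose iteration space must be distributed over the processor array. As a simplified, hypothetical illustration of that partitioning step (not the dissertation's actual scheduling algorithm), the following sketch block-partitions a 2D iteration space across a grid of processing elements:

```python
# Simplified sketch: tile a 2D loop nest's iteration space across a
# processor array. Each processing element (PE) receives one rectangular
# block of iterations; all names here are illustrative assumptions.

def partition_iteration_space(n_i, n_j, pe_rows, pe_cols):
    """Return {(pe_r, pe_c): list of (i, j) iterations assigned to that PE}.

    Uses a block (locally sequential) partitioning: iteration (i, j)
    goes to the PE whose rectangular block contains it.
    """
    tile_i = -(-n_i // pe_rows)  # ceiling division: block height
    tile_j = -(-n_j // pe_cols)  # ceiling division: block width
    mapping = {}
    for i in range(n_i):
        for j in range(n_j):
            pe = (i // tile_i, j // tile_j)
            mapping.setdefault(pe, []).append((i, j))
    return mapping

# A 4x6 loop nest on a 2x3 PE array: each PE executes a 2x2 block.
mapping = partition_iteration_space(4, 6, 2, 3)
for pe in sorted(mapping):
    print(pe, mapping[pe])
```

In the dissertation's setting the input is already a *scheduled* loop program, so in addition to this spatial assignment, each PE's block must be compacted into VLIW instruction words that respect the given schedule; the sketch above only shows the spatial half of the problem.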