801 research outputs found

A Micro Power Hardware Fabric for Embedded Computing

Author: Mehta Gayatri
Publication date: 25/09/2009

Field Programmable Gate Arrays (FPGAs) mitigate many of the problems encountered with the development of ASICs by offering flexibility, faster time-to-market, and amortized NRE costs, among other benefits. While FPGAs are increasingly being used for complex computational applications such as signal and image processing, networking, and cryptology, they are far from ideal for these tasks due to relatively high power consumption and silicon usage overheads compared to direct ASIC implementation. A reconfigurable device that exhibits ASIC-like power characteristics and FPGA-like costs and tool support is desirable to fill this void. In this research, a parameterized, reconfigurable fabric model named the domain-specific fabric (DSF) is developed that exhibits ASIC-like power characteristics for Digital Signal Processing (DSP) style applications. Using this model, the impact of varying different design parameters on power and performance has been studied. Optimization techniques such as local search and simulated annealing are used to determine the appropriate interconnect for a specific set of applications. A design space exploration tool has been developed to automate the generation of a tailored architectural instance of the fabric. The fabric has been synthesized on a 160 nm cell-based ASIC fabrication process from OKI and a 130 nm process from IBM. A detailed power-performance analysis has been completed using signal and image processing benchmarks from the MediaBench benchmark suite and elsewhere, with comparisons to other hardware and software implementations. The optimized fabric implemented using the 130 nm process yields energy within 3X of a direct ASIC implementation, 330X better than a Virtex-II Pro FPGA, and 2016X better than an Intel XScale processor.

D-Scholarship@Pitt
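
The abstract above mentions simulated annealing as one way to choose an interconnect for a set of applications. The following is a minimal sketch of that general technique, not the thesis's actual tool: the cost function, the 4-unit row, the `required` connections, and the names `interconnect_cost` and `anneal_interconnect` are all hypothetical and chosen only for illustration.

```python
import math
import random


def interconnect_cost(enabled, required, penalty=5.0):
    """Toy cost: one unit per enabled link (a stand-in for power)
    plus a penalty for every required connection that is missing."""
    unmet = len(required - enabled)
    return len(enabled) + penalty * unmet


def anneal_interconnect(candidates, required, steps=5000,
                        t0=2.0, alpha=0.995, seed=0):
    """Simulated annealing over subsets of candidate links."""
    rng = random.Random(seed)
    links = sorted(candidates)
    current = set(candidates)              # start fully connected
    current_cost = interconnect_cost(current, required)
    best, best_cost = set(current), current_cost
    temp = t0
    for _ in range(steps):
        link = rng.choice(links)
        neighbor = current ^ {link}        # toggle a single link
        delta = interconnect_cost(neighbor, required) - current_cost
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / temp), which shrinks as temp cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = neighbor, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = set(current), current_cost
        temp *= alpha                      # geometric cooling schedule
    return best, best_cost


# Hypothetical row of 4 units: every ordered pair is a candidate link.
candidates = {(s, d) for s in range(4) for d in range(4) if s != d}
# Connections needed by two made-up kernels (an assumption).
required = {(0, 1), (1, 2), (2, 3), (0, 2)}

best_links, best_cost = anneal_interconnect(candidates, required)
```

Because this toy cost is separable per link, the search settles on exactly the required links; a realistic cost model (routing congestion, measured power, timing) would not be separable, which is where the occasional uphill moves of annealing become useful.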

GRIDKIT: Pluggable overlay networks for Grid computing

Author: A. El-Sayed
A. Grimshaw
A. Rowstron
B. Li
F. Dabek
F. Kon
G. Coulson
H. Balakrishnan
K. Czajkowski
L. Mathy
M. Castro
M. Clark
N. Furmento
N. Parlavantzas
P. Grace
S. Floyd
S. Pallickara
Publication venue: SPRINGER-VERLAG BERLIN
Publication date: 01/01/2004

A "second generation" approach to the provision of Grid middleware is now emerging, built on service-oriented architecture and web services standards and technologies. However, advanced Grid applications have significant demands that are not addressed by present-day web services platforms. As one prime example, current platforms do not support the rich diversity of communication "interaction types" that are demanded by advanced applications (e.g. publish-subscribe, media streaming, peer-to-peer interaction). In this paper we describe the Gridkit middleware, which augments the basic service-oriented architecture to address this particular deficiency. We particularly focus on the communications infrastructure required to support multiple interaction types in a unified, principled and extensible manner, which we present in terms of the novel concept of pluggable overlay networks.

CiteSeerX

Crossref

Lancaster E-Prints

Greedy Algorithms for Mapping onto a Coarse-grained Reconfigurable Fabric

Author: Alex K. Jones
Brady Hunsaker
Bryan A. Norman
Colin J. Ihrig
Justin Stander
Mustafa Baz
Oleg Prokopyev
Raymond R. Hoare
Publication venue: 'IntechOpen'
Publication date: 01/11/2008

IntechOpen

Crossref

Probabilistic Principle Component Analysis based Feature Extraction of Embedded System Applications with Deep Neural Network based Implementation in FPGA

Author: Bachanna Prashant
Chatterjee Sayanti
Gadgay Baswaraj
Publication venue: Auricle Global Society of Education and Research
Publication date: 10/07/2023

The study of hardware and software systems has become very important with the advent of new communication devices and progress in system security. The fast-paced adoption of mobile and embedded devices in everyday life has opened new research areas in the data mining field. This work considers in-demand, error-free techniques based on probabilistic principal component analysis (PPCA). For embedded systems, PCA is applied first to lessen the number of differing attributes and so simplify the data. PPCA, an updated version of PCA, is examined using a similarity measure. Extensive experiments are carried out on an FPGA-based lightweight cryptographic benchmark data set to check and illustrate the viability, competence, and flexibility of reconfigurable embedded systems that perform data mining. FPGAs are reconfigurable computing architectures suited to hardware and neural network implementations. Implementing a multilayer cascaded feed-forward neural network (CFFNN) or a deep neural network (DNN) with a large number of neurons on an FPGA is still a challenging task. Because of this limitation, the FPGA capacity has to be chosen for the particular application; to that end, two neural networks, a CFFNN and a DNN, have been implemented and compared. It is shown that, for reconfigurable embedded systems, PPCA-based data mining with a machine-learning-based realization gives greater speedup, fewer iterations, and more space savings than the static conventional version.

International Journal on Recent and Innovation Trends in Computing and Communication

Self-Organizing Architectures for Digital Signal Processing

Author: Gaglio Salvatore
Peri Daniele
Publication venue: 'IntechOpen'
Publication date: 01/01/2013

IntechOpen

Crossref

Archivio istituzionale della ricerca - Università di Palermo

WindMill: A Parameterized and Pluggable CGRA Implemented by DIAG Design Flow

Author: Gu Jiangyuan
Hu Xunbo
Hu Yang
Hui Haojia
Liu Leibo
Wei Shaojun
Yin Shouyi
Publication date: 03/09/2023

With the cross-fertilization of applications and the ever-increasing scale of models, the efficiency and productivity of hardware computing architectures have become inadequate. This inadequacy further exacerbates issues in design flexibility, design complexity, development cycle, and development costs (4-d problems) in divergent scenarios. To address these challenges, this paper proposes a flexible design flow called DIAG based on plugin techniques. The proposed flow guides hardware development through four layers: definition (D), implementation (I), application (A), and generation (G). Furthermore, a versatile CGRA generator called WindMill is implemented, allowing for agile generation of customized hardware accelerators based on specific application demands. Applications and algorithm tasks from three aspects are evaluated. For a reinforcement learning algorithm, a significant performance improvement of $2.3\times$ compared to a GPU is achieved. Comment: 7 pages, 10 figures.

arXiv.org e-Print Archive

Hierarchical Agent-based Adaptation for Self-Aware Embedded Computing Systems

Author: Guang Liang
Publication venue: Annales Universitatis Turkuensis A I 452
Publication date: 10/12/2012

Transferred from Doria

UTUPub

LTE implementation on CGRA based SiLago Platform

Author: Ilyas M. (Muhammad)
Publication venue: University of Oulu
Publication date: 31/05/2017

Abstract. This thesis implements the long term evolution (LTE) transmission layer on a coarse-grained reconfigurable architecture called the dynamically reconfigurable resource array (DRRA). Specifically, we implement the physical downlink shared channel (PDSCH) baseband signal processing blocks at a high level. The overall implementation follows the silicon large grain object (SiLago) design methodology, which employs SiLago blocks instead of mainstream standard cells. The main ambition of this thesis was to prove that a standard as complex as LTE can be implemented using the in-house SiLago framework. The work aims to prove that a customized design for LTE, with efficiency close to that of an application specific integrated circuit (ASIC), can be generated with the programming ease of MATLAB. During this thesis, we generated a completely parametrizable implementation of the LTE standard at a high level.

University of Oulu Repository - Jultika

Optimizing Dynamic Logic Realizations For Partial Reconfiguration Of Field Programmable Gate Arrays

Author: Parris Matthew
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2008

Many digital logic applications can take advantage of the reconfiguration capability of Field Programmable Gate Arrays (FPGAs) to dynamically patch design flaws, recover from faults, or time-multiplex between functions. Partial reconfiguration is the process by which a user modifies one or more modules residing on the FPGA device independently of the others. Partial reconfiguration reduces the granularity of reconfiguration to a set of columns or a rectangular region of the device. Decreasing the granularity of reconfiguration results in reduced configuration file sizes and, thus, reduced configuration times. Compared to the single bitstream of a non-partial reconfiguration implementation, smaller modules with smaller bitstream file sizes allow an FPGA to implement many more hardware configurations with greater speed under similar storage requirements. To realize the benefits of partial reconfiguration in a wider range of applications, this thesis begins with a survey of FPGA fault-handling methods, which are compared using performance-based metrics. The performance of the Genetic Algorithm (GA) Offline Recovery method is investigated, and candidate solutions provided by the GA are partitioned by age to improve its efficiency. Parameters of this aging technique are optimized to increase the occurrence rate of complete repairs. Continuing the discussion of partial reconfiguration, the thesis develops a case-study application that implements one partial reconfiguration module to demonstrate the functionality and benefits of time multiplexing and to reveal the improved efficiencies of the latest large-capacity FPGA architectures.
The number of active partial reconfiguration modules implemented on a single FPGA device is increased from one to eight to implement a dynamic video-processing architecture for Discrete Cosine Transform and Motion Estimation functions, demonstrating a 55-fold reduction in bitstream storage requirements and thus improving partial reconfiguration capability.

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

DDC-PIM: Efficient Algorithm/Architecture Co-design for Doubling Data Capacity of SRAM-based Processing-In-Memory

Author: Duan Cenlin
He Xiaolin
He Ziyan
Jia Xiaotao
Pan Weitao
Qi Yingjie
Wang Xueyan
Wang Yikun
Wang Yiou
Yan Bonan
Yang Jianlei
Zhao Weisheng
Publication date: 31/10/2023

Processing-in-memory (PIM), as a novel computing paradigm, provides significant performance benefits from the aspect of effective data movement reduction. SRAM-based PIM has been demonstrated as one of the most promising candidates due to its endurance and compatibility. However, the integration density of SRAM-based PIM is much lower than other non-volatile memory-based ones, due to its inherent 6T structure for storing a single bit. Within comparable area constraints, SRAM-based PIM exhibits notably lower capacity. Thus, aiming to unleash its capacity potential, we propose DDC-PIM, an efficient algorithm/architecture co-design methodology that effectively doubles the equivalent data capacity. At the algorithmic level, we propose a filter-wise complementary correlation (FCC) algorithm to obtain a bitwise complementary pair. At the architecture level, we exploit the intrinsic cross-coupled structure of 6T SRAM to store the bitwise complementary pair in their complementary states ($Q/\overline{Q}$), thereby maximizing the data capacity of each SRAM cell. The dual-broadcast input structure and reconfigurable unit support both depthwise and pointwise convolution, adhering to the requirements of various neural networks. Evaluation results show that DDC-PIM yields about $2.84\times$ speedup on MobileNetV2 and $2.69\times$ on EfficientNet-B0 with negligible accuracy loss compared with PIM baseline implementation. Compared with state-of-the-art SRAM-based PIM macros, DDC-PIM achieves up to $8.41\times$ and $2.75\times$ improvement in weight density and area efficiency, respectively. Comment: 14 pages, to be published in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD).

arXiv.org e-Print Archive
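
The DDC-PIM abstract above describes storing a bitwise complementary pair in the two states of a 6T SRAM cell. The sketch below is a toy software model of that storage idea only; it does not reproduce the FCC algorithm or the paper's circuit, and the class `SramRow`, the helper `is_bitwise_complementary`, and the example weight words are hypothetical, constructed so that the pair is exactly complementary.

```python
def is_bitwise_complementary(a, b, width):
    """True if every bit of b is the inverse of the matching bit of a."""
    mask = (1 << width) - 1
    return ((a ^ b) & mask) == mask


class SramRow:
    """Toy model of a row of 6T cells: each cell latches Q and,
    through its cross-coupled inverters, also holds the inverse of Q."""

    def __init__(self, value, width):
        self.width = width
        self.mask = (1 << width) - 1
        self.q = value & self.mask

    def read_q(self):
        # The stored word itself.
        return self.q

    def read_q_bar(self):
        # The complementary node: the inverse of the stored word.
        return ~self.q & self.mask


# Two made-up 8-bit weight words chosen to be exact complements.
w_a = 0b10110010
w_b = 0b01001101

row = SramRow(w_a, width=8)        # only w_a is written
recovered_a = row.read_q()
recovered_b = row.read_q_bar()     # w_b comes from the same cells
```

In this model eight cells return sixteen bits of weight data, which is the sense in which the equivalent capacity doubles; it only works when the two words really are complementary, which the abstract attributes to the FCC step.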