801 research outputs found

    A Micro Power Hardware Fabric for Embedded Computing

    Get PDF
    Field Programmable Gate Arrays (FPGAs) mitigate many of the problemsencountered with the development of ASICs by offering flexibility, faster time-to-market, and amortized NRE costs, among other benefits. While FPGAs are increasingly being used for complex computational applications such as signal and image processing, networking, and cryptology, they are far from ideal for these tasks due to relatively high power consumption and silicon usage overheads compared to direct ASIC implementation. A reconfigurable device that exhibits ASIC-like power characteristics and FPGA-like costs and tool support is desirable to fill this void. In this research, a parameterized, reconfigurable fabric model named as domain specific fabric (DSF) is developed that exhibits ASIC-like power characteristics for Digital Signal Processing (DSP) style applications. Using this model, the impact of varying different design parameters on power and performance has been studied. Different optimization techniques like local search and simulated annealing are used to determine the appropriate interconnect for a specific set of applications. A design space exploration tool has been developed to automate and generate a tailored architectural instance of the fabric.The fabric has been synthesized on 160 nm cell-based ASIC fabrication process from OKI and 130 nm from IBM. A detailed power-performance analysis has been completed using signal and image processing benchmarks from the MediaBench benchmark suite and elsewhere with comparisons to other hardware and software implementations. The optimized fabric implemented using the 130 nm process yields energy within 3X of a direct ASIC implementation, 330X better than a Virtex-II Pro FPGA and 2016X better than an Intel XScale processor

    GRIDKIT: Pluggable overlay networks for Grid computing

    Get PDF
    A `second generation' approach to the provision of Grid middleware is now emerging which is built on service-oriented architecture and web services standards and technologies. However, advanced Grid applications have significant demands that are not addressed by present-day web services platforms. As one prime example, current platforms do not support the rich diversity of communication `interaction types' that are demanded by advanced applications (e.g. publish-subscribe, media streaming, peer-to-peer interaction). In the paper we describe the Gridkit middleware which augments the basic service-oriented architecture to address this particular deficiency. We particularly focus on the communications infrastructure support required to support multiple interaction types in a unified, principled and extensible manner-which we present in terms of the novel concept of pluggable overlay networks

    Probabilistic Principle Component Analysis based Feature Extraction of Embedded System Applications with Deep Neural Network based Implementation in FPGA

    Get PDF
    The study of hardware and software systems is of major are very important advent in new devices for communication and progress in system of security. In fast pace mobile and embedded devices application in every day’s life leads some new emerging area for research in data mining field. In this we have some technologies which have demand and error free using the principle of component of PPCA. For Embedded system the applications of PCA is basically applied initially for the lessen the having different qualities especially being to simple of the data. PPCA which have the updated version of PCA which is surveyed by similarity measure. In this work, experiments are extensively carried out, using a FPGA based light weight cryptographic data set having benchmark set to check and illustrate the viability, competence, litheness which are reconfigurable embedded system which are having data mining . Which have FPGA are reconfigurable for the computing architectures for hardware and in neural network. FPGA using the multilayer Cascaded for neural network which are forward in nature (CFFNN) and Deep Neural Network also called as DNN with a huge neuron is still a thought-provoking task. This shortcoming leads to elect the FPGA capacity for a particular application we have used the method of implementation which has two neural network have been implemented and compared , namely, CFFNN and DNN. It can be shown that for reconfigurable embedded system, PPCA based data mining and Machine learning based realization can give more speed up less iteration and more space savings when we have compared it with the static conventional version

    WindMill: A Parameterized and Pluggable CGRA Implemented by DIAG Design Flow

    Full text link
    With the cross-fertilization of applications and the ever-increasing scale of models, the efficiency and productivity of hardware computing architectures have become inadequate. This inadequacy further exacerbates issues in design flexibility, design complexity, development cycle, and development costs (4-d problems) in divergent scenarios. To address these challenges, this paper proposed a flexible design flow called DIAG based on plugin techniques. The proposed flow guides hardware development through four layers: definition(D), implementation(I), application(A), and generation(G). Furthermore, a versatile CGRA generator called WindMill is implemented, allowing for agile generation of customized hardware accelerators based on specific application demands. Applications and algorithm tasks from three aspects is experimented. In the case of reinforcement learning algorithm, a significant performance improvement of 2.3Ă—2.3\times compared to GPU is achieved.Comment: 7 pages, 10 figure

    Hierarchical Agent-based Adaptation for Self-Aware Embedded Computing Systems

    Get PDF
    Siirretty Doriast

    LTE implementation on CGRA based SiLago Platform

    Get PDF
    Abstract. This thesis implements long term evolution (LTE) transmission layer on a coarse grained reconfigurable called, dynamically reconfigurable resource array (DRRA). Specifically, we implement physical downlink shared channel baseband signal processing blocks (PDSCH) at high level. The overall implementation follows silicon large grain object (SiLago) design methodology. The methodology employs SiLago blocks instead of mainstream standard cells. The main ambition of this thesis was to prove that a standard as complex as LTE can be implemented using the in-house SiLago framework. The work aims to prove that customized design with efficiency close to application specific integrated circuit (ASIC) for LTE can be generated with the programming ease of MATLAB. During this thesis, we have generated a completely parametrizable LTE standard at high level

    Optimizing Dynamic Logic Realizations For Partial Reconfiguration Of Field Programmable Gate Arrays

    Get PDF
    Many digital logic applications can take advantage of the reconfiguration capability of Field Programmable Gate Arrays (FPGAs) to dynamically patch design flaws, recover from faults, or time-multiplex between functions. Partial reconfiguration is the process by which a user modifies one or more modules residing on the FPGA device independently of the others. Partial Reconfiguration reduces the granularity of reconfiguration to be a set of columns or rectangular region of the device. Decreasing the granularity of reconfiguration results in reduced configuration filesizes and, thus, reduced configuration times. When compared to one bitstream of a non-partial reconfiguration implementation, smaller modules resulting in smaller bitstream filesizes allow an FPGA to implement many more hardware configurations with greater speed under similar storage requirements. To realize the benefits of partial reconfiguration in a wider range of applications, this thesis begins with a survey of FPGA fault-handling methods, which are compared using performance-based metrics. Performance analysis of the Genetic Algorithm (GA) Offline Recovery method is investigated and candidate solutions provided by the GA are partitioned by age to improve its efficiency. Parameters of this aging technique are optimized to increase the occurrence rate of complete repairs. Continuing the discussion of partial reconfiguration, the thesis develops a case-study application that implements one partial reconfiguration module to demonstrate the functionality and benefits of time multiplexing and reveal the improved efficiencies of the latest large-capacity FPGA architectures. The number of active partial reconfiguration modules implemented on a single FPGA device is increased from one to eight to implement a dynamic video-processing architecture for Discrete Cosine Transform and Motion Estimation functions to demonstrate a 55-fold reduction in bitstream storage requirements thus improving partial reconfiguration capability

    DDC-PIM: Efficient Algorithm/Architecture Co-design for Doubling Data Capacity of SRAM-based Processing-In-Memory

    Full text link
    Processing-in-memory (PIM), as a novel computing paradigm, provides significant performance benefits from the aspect of effective data movement reduction. SRAM-based PIM has been demonstrated as one of the most promising candidates due to its endurance and compatibility. However, the integration density of SRAM-based PIM is much lower than other non-volatile memory-based ones, due to its inherent 6T structure for storing a single bit. Within comparable area constraints, SRAM-based PIM exhibits notably lower capacity. Thus, aiming to unleash its capacity potential, we propose DDC-PIM, an efficient algorithm/architecture co-design methodology that effectively doubles the equivalent data capacity. At the algorithmic level, we propose a filter-wise complementary correlation (FCC) algorithm to obtain a bitwise complementary pair. At the architecture level, we exploit the intrinsic cross-coupled structure of 6T SRAM to store the bitwise complementary pair in their complementary states (Q/Q‾Q/\overline{Q}), thereby maximizing the data capacity of each SRAM cell. The dual-broadcast input structure and reconfigurable unit support both depthwise and pointwise convolution, adhering to the requirements of various neural networks. Evaluation results show that DDC-PIM yields about 2.84×2.84\times speedup on MobileNetV2 and 2.69×2.69\times on EfficientNet-B0 with negligible accuracy loss compared with PIM baseline implementation. Compared with state-of-the-art SRAM-based PIM macros, DDC-PIM achieves up to 8.41×8.41\times and 2.75×2.75\times improvement in weight density and area efficiency, respectively.Comment: 14 pages, to be published in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD
    • …
    corecore