13 research outputs found

    Performance Aspects of Synthesizable Computing Systems

    A polyhedral-based SystemC modeling and generation framework for effective low-power design space exploration

    With the prevalence of systems-on-chips, there is a growing need for automation and acceleration of the design process. A classical approach is to take a C/C++ specification of the application, convert it to a SystemC (or equivalent) description of hardware implementing this application, and perform successive refinement of the description to improve various design metrics. In this thesis, we present an automated SystemC generation and design space exploration flow that alleviates several productivity and design time issues encountered in the current design process. We first automatically convert a subset of C/C++, namely affine program regions, into a full SystemC description through polyhedral model-based techniques while performing powerful data locality and parallelism transformations. We then leverage key properties of affine computations to design a fast and accurate latency and power characterization flow. Using this flow, we build analytical models of power and performance that can quickly prune away a large number of inferior design points and generate Pareto-optimal solution points. Experimental results show that (1) our SystemC models produce performance and power estimates that are only 0.57% and 5.04% away from gate-level evaluation results, respectively; and (2) our latency and power analytical models are 3.24% and 5.31% away from the actual Pareto points generated by SystemC simulation, with 2091x faster design-space exploration time on average. The generated Pareto-optimal points provide effective low-power design solutions given different latency constraints.

    Efficient Hardware Architectures for Accelerating Deep Neural Networks: Survey

    In the modern era of technology, a paradigm shift has been witnessed in the application areas of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). Specifically, Deep Neural Networks (DNNs) have emerged as a popular field of interest in most AI applications such as computer vision, image and video processing, and robotics. Given mature digital technologies and the availability of authentic data and data handling infrastructure, DNNs have been a credible choice for solving more complex real-life problems. In certain situations, the performance and accuracy of a DNN exceed those of human intelligence. However, DNNs are computationally demanding in terms of both resources and time. Furthermore, general-purpose architectures like CPUs have issues in handling such computationally intensive algorithms. Therefore, the research community has invested considerable interest and effort in specialized hardware architectures such as the Graphics Processing Unit (GPU), Field Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), and Coarse Grained Reconfigurable Array (CGRA) for the effective implementation of computationally intensive algorithms. This paper surveys the research carried out on the development and deployment of DNNs using the aforementioned specialized hardware architectures and embedded AI accelerators. The review describes in detail the specialized hardware-based accelerators used in the training and/or inference of DNNs. A comparative study of the accelerators discussed, based on factors like power, area, and throughput, is also made. Finally, future research and development directions are discussed, such as future trends in DNN implementation on specialized hardware accelerators. This review article is intended to serve as a guide to hardware architectures for accelerating and improving the effectiveness of deep learning research.

    Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC'10 - May 17-19, 2010 Karlsruhe, Germany. (KIT Scientific Reports ; 7551)

    ReCoSoC is intended to be a periodic annual meeting to expose and discuss gathered expertise as well as state-of-the-art research around SoC-related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow's challenges in the multibillion-transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability.

    Design of Apoferritin-Based Nanoparticle MRI Contrast Agents Through Controlled Metal Deposition

    Sensitivity is a fundamental challenge for in vivo molecular magnetic resonance imaging (MRI). Here, I improve the sensitivity of metal nanoparticle contrast agents by strategically incorporating pure and doped metal oxides in the nanoparticle core, forming a soluble, monodisperse contrast agent with adjustable T2 or T1 relaxivity (r2 or r1). I first developed a simplified technique to incorporate iron oxides in apoferritin to form "magnetoferritin" for nM-level detection with T2- and T2*-weighting. I then explored whether the crystal could be chemically modified to form a particle with high r1. I first adsorbed Mn²⁺ ions to metal binding sites in the apoferritin pores. The strategic placement of metal ions near sites of water exchange and within the crystal oxide enhances r1, suggesting a mechanism for increasing relaxivity in porous nanoparticle agents. However, the Mn²⁺ addition was only possible when the particle was simultaneously filled with an iron oxide, resulting in a particle with a high r1 but also a high r2, which made it undetectable with conventional T1-weighting techniques. To solve this problem and decrease the particle r2 for more sensitive detection, I chemically doped the nanoparticles with tungsten to form a disordered W-Fe oxide composite in the apoferritin core. This configuration formed a particle with an r1 of 4,870 mM⁻¹s⁻¹ and an r2 of 9,076 mM⁻¹s⁻¹. These relaxivities allowed the detection of concentrations ranging from 20 nM to 400 nM in vivo, both passively injected and targeted to the kidney glomerulus. I further developed an MRI acquisition technique to distinguish particles based on r2/r1, and show that three nanoparticles of similar size can be distinguished in vitro and in vivo with MRI. This work forms the basis for a new, highly flexible inorganic approach to design nanoparticle contrast agents for molecular MRI. Dissertation/Thesis, Ph.D. Bioengineering 201

    Advanced Applications of Rapid Prototyping Technology in Modern Engineering

    Rapid prototyping (RP) technology is widely known and appreciated for its flexible and customized manufacturing capabilities. The widely studied RP techniques include stereolithography apparatus (SLA), selective laser sintering (SLS), three-dimensional printing (3DP), fused deposition modeling (FDM), 3D plotting, solid ground curing (SGC), multiphase jet solidification (MJS), and laminated object manufacturing (LOM). Different techniques are associated with different materials and/or processing principles and are thus devoted to specific applications. RP technology is no longer used only for prototype building; it has been extended to real industrial manufacturing solutions. Today, RP technology has contributed to almost all engineering areas, including mechanical, materials, industrial, aerospace, electrical, and most recently biomedical engineering. This book aims to present the advanced development of RP technologies in various engineering areas as solutions to real-world engineering problems.

    Applications and Techniques for Fast Machine Learning in Science

    In this community review report, we discuss applications and techniques for fast machine learning (ML) in science - the concept of integrating powerful ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.

    Metal and Metal Oxide Nanoparticles: Design, Characterization, and Biomedical Applications

    Developing new materials is usually a time-demanding and meticulous process, but at the same time, it is one of the more promising routes to a cleaner, safer, and smarter future. Among nanomaterials, an increasingly successful tool of nanotechnology, nanoparticles are categorized as materials in which at least one dimension is less than 100 nm. Among the various categories of nanoparticles, metal and metal oxide nanoparticles stand out as an emerging nanotechnological solution for a wide range of open biological and medical physio/pathological questions. This Special Issue covers the fundamental science, design, characterization, and biomedical applications of metal and metal oxide nanomaterials. The articles presented here cover all the aspects determining the performance of these systems, ranging from their synthesis, design, and chemical, physical, and biological functionalization to their characterization and successful applications.