192 research outputs found

Automated gateware discovery using open firmware

Author: Rajan Shanly
Publication venue: Department of Electrical Engineering
Publication date: 01/01/2013

Includes abstract. Includes bibliographical references. This dissertation describes the design and implementation of a mechanism that automates gateware device detection for reconfigurable hardware. The research facilitates the process of identifying and operating on gateware images by extending the existing infrastructure of probing devices in traditional software by using the chosen technology

Cape Town University OpenUCT

Resource-aware life cycle models for service-oriented applications managed by a component framework

Author: Mak R.H.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2013

In this report we present a series of formal models that describe dynamically reconfigurable applications at various stages of their life cycle. It is our intention that these models capture the essential concepts of such applications and the platforms on which they are deployed, and that they indicate the essential activities required to accomplish an application’s transition from one stage of its life cycle to the next. These models aim to support a life cycle in which applications are designed as a combination of services and realized by predefined components that are deployed in a framework specially tailored to the resource management needs of these applications

Repository TU/e

Pure OAI Repository

A framework for distributed Web-based microsystem design

Author: Saha Debashis, Massachusetts Institute of Technology.
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1998

Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 109-111). The increasing complexity of microsystem design mandates a distributed and collaborative design environment. The high integration levels call for tools and generators that allow exploration of the design space irrespective of the geographical or physical availability of the design tools. The World Wide Web serves as a desirable platform for distributed access to libraries, models and design tools. The rapid growth and acceptance of the World Wide Web has happened over the same time period in which distributed object systems have stabilized and matured. The Web can become an important platform for VLSI CAD when the distributed object technologies (e.g., CORBA) are combined with the Web technologies (e.g., HTTP, CGI) and Web-aware object-oriented languages (e.g., Java). In this thesis, a framework using the Object-Web technologies is presented, which enables distributed Web-based CAD. The Object-Web architecture provides an open, interoperable and scalable distributed computing environment for microsystem design, in which Web-based design tools can efficiently utilize the capabilities of existing design tools on the Web to build hierarchical Web tools. The framework includes the infrastructure to store and manipulate design objects, protocols for tool communication and WebTop, a Java hierarchical schematic/block editor with interfaces to distributed Web tools and cell libraries. by Debashis Saha. M.S

DSpace@MIT

Towards Closing the Programmability-Efficiency Gap using Software-Defined Hardware

Author: Pal Subhankar
Publication date: 01/01/2021

The past decade has seen the breakdown of two important trends in the computing industry: Moore’s law, an observation that the number of transistors in a chip roughly doubles every eighteen months, and Dennard scaling, that enabled the use of these transistors within a constant power budget. This has caused a surge in domain-specific accelerators, i.e. specialized hardware that delivers significantly better energy efficiency than general-purpose processors, such as CPUs. While the performance and efficiency of such accelerators are highly desirable, the fast pace of algorithmic innovation and non-recurring engineering costs have deterred their widespread use, since they are only programmable across a narrow set of applications. This has engendered a programmability-efficiency gap across contemporary platforms. A practical solution that can close this gap is thus lucrative and is likely to engender broad impact in both academic research and the industry. This dissertation proposes such a solution with a reconfigurable Software-Defined Hardware (SDH) system that morphs parts of the hardware on-the-fly to tailor to the requirements of each application phase. This system is designed to deliver near-accelerator-level efficiency across a broad set of applications, while retaining CPU-like programmability. The dissertation first presents a fixed-function solution to accelerate sparse matrix multiplication, which forms the basis of many applications in graph analytics and scientific computing. The solution consists of a tiled hardware architecture, co-designed with the outer product algorithm for Sparse Matrix-Matrix multiplication (SpMM), that uses on-chip memory reconfiguration to accelerate each phase of the algorithm. 
A proof-of-concept is then presented in the form of a prototyped 40 nm Complementary Metal-Oxide Semiconductor (CMOS) chip that demonstrates energy efficiency and performance per die area improvements of 12.6x and 17.1x over a high-end CPU, and serves as a stepping stone towards a full SDH system. The next piece of the dissertation enhances the proposed hardware with reconfigurability of the dataflow and resource sharing modes, in order to extend acceleration support to a set of common parallelizable workloads. This reconfigurability lends the system the ability to cater to discrete data access and compute patterns, such as workloads with extensive data sharing and reuse, workloads with limited reuse and streaming access patterns, among others. Moreover, this system incorporates commercial cores and a prototyped software stack for CPU-level programmability. The proposed system is evaluated on a diverse set of compute-bound and memory-bound kernels that compose applications in the domains of graph analytics, machine learning, image and language processing. The evaluation shows average performance and energy-efficiency gains of 5.0x and 18.4x over the CPU. The final part of the dissertation proposes a runtime control framework that uses low-cost monitoring of hardware performance counters to predict the next best configuration and reconfigure the hardware, upon detecting a change in phase or nature of data within the application. In comparison to prior work, this contribution targets multicore CGRAs, uses low-overhead decision-tree-based predictive models, and incorporates reconfiguration cost-awareness into its policies. Compared to the best-average static (non-reconfiguring) configuration, the dynamically reconfigurable system achieves a 1.6x improvement in performance-per-Watt in the Energy-Efficient mode of operation, or the same performance with 23% lower energy in the Power-Performance mode, for SpMM across a suite of real-world inputs. 
The proposed reconfiguration mechanism itself outperforms the state-of-the-art approach for dynamic runtime control by up to 2.9x in terms of energy-efficiency. PHD. Computer Science & Engineering. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/169859/1/subh_1.pd
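
For readers unfamiliar with the outer product formulation of SpMM named in this abstract, the following is a minimal software sketch of the general technique. It is not the dissertation's hardware design. The function name `outer_product_spmm` and the dictionary-based storage layout are assumptions chosen for illustration. The product C = A x B is formed as a sum over k of the outer product of column k of A with row k of B, which splits naturally into a multiply phase and a merge phase.

```python
from collections import defaultdict


def outer_product_spmm(a_cols, b_rows):
    """Multiply two sparse matrices with the outer product formulation.

    a_cols[k] maps row index i to the non-zero A[i][k] (column-major A).
    b_rows[k] maps column index j to the non-zero B[k][j] (row-major B).
    Both layouts are illustrative assumptions, not taken from the source.
    Returns a dict mapping (i, j) to the non-zero C[i][j].
    """
    # Multiply phase: every column k of A is paired with row k of B,
    # producing one partial-product matrix per shared index k.
    partials = []
    for k in a_cols.keys() & b_rows.keys():
        col, row = a_cols[k], b_rows[k]
        partials.append(
            {(i, j): a * b for i, a in col.items() for j, b in row.items()}
        )

    # Merge phase: partial products that land on the same (i, j)
    # coordinate are accumulated into the final result.
    merged = defaultdict(float)
    for partial in partials:
        for coord, value in partial.items():
            merged[coord] += value

    # Drop entries that cancelled to zero so the result stays sparse.
    return {coord: v for coord, v in merged.items() if v != 0}
```

The two phases have different memory behaviour (independent streaming products first, then scattered accumulation), which gives some intuition for why the abstract describes reconfiguring on-chip memory per phase.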

Deep Blue Documents at the University of Michigan

Adaptive Intelligent Systems for Extreme Environments

Author: Lu Yufan
Publication date: 06/04/2023

As embedded processors become more powerful, a growing number of embedded systems equipped with artificial intelligence (AI) algorithms have been used in radiation environments to perform routine tasks to reduce radiation risk for human workers. On the one hand, because of the low price, commercial-off-the-shelf devices and components are becoming increasingly popular to make such tasks more affordable. Meanwhile, it also presents new challenges to improve radiation tolerance, the capability to conduct multiple AI tasks and deliver the power efficiency of the embedded systems in harsh environments. There are three aspects of research work that have been completed in this thesis: 1) a fast simulation method for analysis of single event effect (SEE) in integrated circuits, 2) a self-refresh scheme to detect and correct bit-flips in random access memory (RAM), and 3) a hardware AI system with dynamic hardware accelerators and AI models for increasing flexibility and efficiency. The variances of the physical parameters in practical implementation, such as the nature of the particle, linear energy transfer and circuit characteristics, may have a large impact on the final simulation accuracy, which will significantly increase the complexity and cost in the workflow of the transistor level simulation for large-scale circuits. It makes it difficult to conduct SEE simulations for large-scale circuits. Therefore, in the first research work, a new SEE simulation scheme is proposed, to offer a fast and cost-efficient method to evaluate and compare the performance of large-scale circuits which are subject to the effects of radiation particles. The advantages of transistor and hardware description language (HDL) simulations are combined here to produce accurate SEE digital error models for rapid error analysis in large-scale circuits. Under the proposed scheme, time-consuming back-end steps are skipped. The SEE analysis for large-scale circuits can be completed in just a few hours. 
In high-radiation environments, bit-flips in RAMs can not only occur but may also accumulate. However, the typical error mitigation methods cannot handle high error rates with low hardware costs. In the second work, an adaptive scheme combining error-correcting codes and refreshing techniques is proposed, to correct errors and mitigate error accumulation in extreme radiation environments. This scheme is proposed to continuously refresh the data in RAMs so that errors cannot accumulate. Furthermore, because the proposed design can share the same ports with the user module without changing the timing sequence, it can be easily applied to systems where the hardware modules are designed with fixed reading and writing latency. It is a challenge to implement intelligent systems with constrained hardware resources. In the third work, an adaptive hardware resource management system for multiple AI tasks in harsh environments was designed. Inspired by the “refreshing” concept in the second work, we utilise a key feature of FPGAs, partial reconfiguration, to improve the reliability and efficiency of the AI system. More importantly, this feature provides the capability to manage the hardware resources for deep learning acceleration. In the proposed design, the on-chip hardware resources are dynamically managed to improve the flexibility, performance and power efficiency of deep learning inference systems. The deep learning units provided by Xilinx are used to perform multiple AI tasks simultaneously, and the experiments show significant improvements in power efficiency for a wide range of scenarios with different workloads. To further improve the performance of the system, the concept of reconfiguration was further extended. As a result, an adaptive DL software framework was designed. This framework can provide a significant level of adaptability support for various deep learning algorithms on an FPGA-based edge computing platform. 
To meet the specific accuracy and latency requirements derived from the running applications and operating environments, the platform may dynamically update hardware and software (e.g., processing pipelines) to achieve better cost, power, and processing efficiency compared to the static system
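
The idea of refreshing RAM contents so that bit-flips cannot accumulate can be illustrated with a generic memory-scrubbing sketch. This is not the thesis's scheme: the choice of a Hamming(7,4) single-error-correcting code and the names `encode`, `syndrome`, `decode` and `scrub` are assumptions made for illustration, and the memory is modelled as a plain Python list of 7-bit codewords.

```python
def encode(nibble):
    """Encode a 4-bit value as a Hamming(7,4) codeword.

    Bit positions are numbered 1..7; parity bits sit at 1, 2 and 4,
    data bits at 3, 5, 6 and 7.
    """
    bits = [0] * 8  # index 0 is unused so indices match positions
    bits[3], bits[5], bits[6], bits[7] = [(nibble >> i) & 1 for i in range(4)]
    bits[1] = bits[3] ^ bits[5] ^ bits[7]
    bits[2] = bits[3] ^ bits[6] ^ bits[7]
    bits[4] = bits[5] ^ bits[6] ^ bits[7]
    return sum(bits[p] << (p - 1) for p in range(1, 8))


def syndrome(word):
    """Return 0 for a valid codeword, else the position of a single flipped bit."""
    s = 0
    for p in range(1, 8):
        if (word >> (p - 1)) & 1:
            s ^= p  # XOR of the positions of all set bits
    return s


def decode(word):
    """Extract the 4 data bits from a (corrected) codeword."""
    return (
        ((word >> 2) & 1)
        | (((word >> 4) & 1) << 1)
        | (((word >> 5) & 1) << 2)
        | (((word >> 6) & 1) << 3)
    )


def scrub(memory):
    """One refresh pass: read every word, correct single-bit errors, write back.

    Returns the number of words that were corrected.
    """
    corrected = 0
    for addr, word in enumerate(memory):
        s = syndrome(word)
        if s:
            memory[addr] = word ^ (1 << (s - 1))
            corrected += 1
    return corrected
```

A single-error-correcting code of this kind repairs one flipped bit per word but mis-corrects two, so the scrub pass has to run often enough that a second flip rarely lands in the same word before the first is repaired. That is the accumulation problem the abstract refers to.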

University of Essex Research Repository

Palmo : a novel pulsed based signal processing technique for programmable mixed-signal VLSI

Author: Papathanasiou Konstandinos
Publication venue: The University of Edinburgh
Publication date: 01/01/1998

In this thesis a new signal processing technique is presented. This technique exploits the use of pulses as the signalling mechanism. This Palmo signalling method applied to signal processing is novel, combining the advantages of both digital and analogue techniques. Pulsed signals are robust, inherently low-power, easily regenerated, and easily distributed across and between chips. The Palmo cells used to perform analogue operations on the pulsed signals are compact, fast, simple and programmable

CiteSeerX

Edinburgh Research Archive

Heterogeneity-aware scheduling and data partitioning for system performance acceleration

Author: Yu Teng
Publication venue: The University of St Andrews
Publication date: 14/04/2020

Over the past decade, heterogeneous processors and accelerators have become increasingly prevalent in modern computing systems. Compared with previous homogeneous parallel machines, the hardware heterogeneity in modern systems provides new opportunities and challenges for performance acceleration. Classic operating systems optimisation problems such as task scheduling, and application-specific optimisation techniques such as the adaptive data partitioning of parallel algorithms, are both required to work together to address hardware heterogeneity. Significant effort has been invested in this problem, but either focuses on a specific type of heterogeneous systems or algorithm, or a high-level framework without insight into the difference in heterogeneity between different types of system. A general software framework is required, which can not only be adapted to multiple types of systems and workloads, but is also equipped with the techniques to address a variety of hardware heterogeneity. This thesis presents approaches to design general heterogeneity-aware software frameworks for system performance acceleration. It covers a wide variety of systems, including an OS scheduler targeting on-chip asymmetric multi-core processors (AMPs) on mobile devices, a hierarchical many-core supercomputer and multi-FPGA systems for high performance computing (HPC) centers. Considering heterogeneity from on-chip AMPs, such as thread criticality, core sensitivity, and relative fairness, it suggests a collaborative based approach to co-design the task selector and core allocator on OS scheduler. Considering the typical sources of heterogeneity in HPC systems, such as the memory hierarchy, bandwidth limitations and asymmetric physical connection, it proposes an application-specific automatic data partitioning method for a modern supercomputer, and a topological-ranking heuristic based schedule for a multi-FPGA based reconfigurable cluster. 
Experiments on both a full system simulator (GEM5) and real systems (Sunway Taihulight Supercomputer and Xilinx Multi-FPGA based clusters) demonstrate the significant advantages of the suggested approaches compared against the state-of-the-art on a variety of workloads. "This work is supported by St Leonards 7th Century Scholarship and Computer Science PhD funding from University of St Andrews; by UK EPSRC grant Discovery: Pattern Discovery and Program Shaping for Manycore Systems (EP/P020631/1)." -- Acknowledgement

University of St. Andrews - Pure

St Andrews Research Repository

Header Parsing Logic in Network Switches Using Fine and Coarse-Grained Dynamic Reconfiguration Strategies

Author: Sonek Alexander
Publication venue: University of Waterloo
Publication date: 29/04/2014

Current ASIC-only designs which interface with a general purpose processor are fairly restricted in their ability to be upgraded after fabrication. The primary intent of the research documented in this thesis is to determine whether the inclusion of FPGAs in existing ASIC designs can be considered as an option for alleviating this constraint, by analyzing the performance of such a framework as a replacement for the parsing logic in a typical network switch. This thesis also covers an ancillary goal of the research, which is to compare the various methods used to reconfigure modern FPGAs, including the use of self-initiated dynamic partial reconfiguration, with regard to the degree to which they interrupt the operation of the device in which an FPGA is embedded. This portion of the research is also conducted in the context of a network switch and focuses on the ability of the network switch to reconfigure itself dynamically when presented with a new type of network traffic

University of Waterloo's Institutional Repository