242 research outputs found

    Null Convention Logic applications of asynchronous design in nanotechnology and cryptographic security

    This dissertation presents two Null Convention Logic (NCL) applications of asynchronous logic circuit design in nanotechnology and cryptographic security. The first application is the Asynchronous Nanowire Reconfigurable Crossbar Architecture (ANRCA); the second is an asynchronous S-Box design for cryptographic systems resistant to Side-Channel Attacks (SCA). The contributions of the first application are: 1) Proposed a diode- and resistor-based ANRCA (DR-ANRCA). Three configurable logic block (CLB) structures were designed to efficiently reconfigure a given DR-PGMB as any of the 27 NCL threshold gates. A hierarchical architecture was also proposed to implement higher-level logic that requires a large number of DR-PGMBs, such as multiple-bit NCL registers. 2) Proposed a memristor look-up-table based ANRCA (MLUT-ANRCA). An equivalent circuit simulation model has been presented in VHDL and simulated in Quartus II, and the two ANRCAs have been compared numerically. 3) Presented defect-tolerance and repair strategies for both DR-ANRCA and MLUT-ANRCA. The contributions of the second application are: 1) Designed an NCL-based S-Box for the Advanced Encryption Standard (AES). Functional verification has been done using Modelsim and a Field-Programmable Gate Array (FPGA). 2) Implemented two different power analysis attacks on both the NCL S-Box and a conventional synchronous S-Box. 3) Developed a novel approach based on stochastic logic to enhance resistance against DPA and CPA attacks. The functionality of the proposed design has been verified using an 8-bit AES S-Box design. The effects of decision weight, bitstream length, and input repetition times on error rates have also been studied. Experimental results show that the proposed approach enhances resistance against the CPA attack by successfully protecting the hidden key. --Abstract, page iii
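    The abstract leaves the stochastic-logic countermeasure at a high level, but the underlying idea of stochastic computing is standard: a value in [0, 1] is encoded as the probability of a 1 in a random bitstream, so simple gates compute arithmetic and any single cycle's activity is decorrelated from the data. Below is a minimal Python sketch of that encoding, assuming illustrative stream lengths rather than the dissertation's actual parameters:

```python
import random

def encode(p, n):
    """Encode a probability p in [0, 1] as a random bitstream of length n."""
    return [1 if random.random() < p else 0 for _ in range(n)]

def decode(bits):
    """Estimate the encoded value as the fraction of 1s in the stream."""
    return sum(bits) / len(bits)

def stochastic_and(xs, ys):
    """Bitwise AND of two independent streams multiplies their values."""
    return [x & y for x, y in zip(xs, ys)]

random.seed(0)
n = 4096                             # bitstream length; longer streams reduce error
a, b = encode(0.5, n), encode(0.8, n)
print(decode(stochastic_and(a, b)))  # ~0.4, with error shrinking as n grows
```

    The abstract's study of bitstream length versus error rate corresponds to the fact that the decoding error of such streams shrinks roughly as 1/sqrt(n).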

    Enabling Hyperscale Web Services

    Modern web services such as social media, online messaging, web search, video streaming, and online banking often support billions of users, requiring data centers that scale to hundreds of thousands of servers, i.e., hyperscale. In fact, the world continues to expect hyperscale computing to drive more futuristic applications such as virtual reality, self-driving cars, conversational AI, and the Internet of Things. This dissertation presents technologies that will enable tomorrow's web services to meet the world's expectations. The key challenge in enabling hyperscale web services arises from two important trends. First, over the past few years, there has been a radical shift in hyperscale computing due to an unprecedented growth in data, users, and web service software functionality. Second, modern hardware can no longer support this growth in hyperscale trends due to a decline in hardware performance scaling. To enable this new hyperscale era, hardware architects must become more aware of hyperscale software needs, and software researchers can no longer expect unlimited hardware performance scaling. In short, systems researchers can no longer follow the traditional approach of building each layer of the systems stack separately. Instead, they must rethink the synergy between the software and hardware worlds from the ground up. This dissertation establishes such a synergy to enable futuristic hyperscale web services. It bridges the software and hardware worlds, demonstrating the importance of that bridge in realizing efficient hyperscale web services via solutions that span the systems stack. The specific goal is to design software that is aware of new hardware constraints and to architect hardware that efficiently supports new hyperscale software requirements. This dissertation spans two broad thrusts: (1) a software thrust and (2) a hardware thrust to analyze the complex hyperscale design space and use insights from these analyses to design efficient cross-stack solutions for hyperscale computation. In the software thrust, this dissertation contributes uSuite, the first open-source benchmark suite of web services built with a new hyperscale software paradigm, which is used in academia and industry to study hyperscale behaviors. Next, this dissertation uses uSuite to study software threading implications in light of today's hardware reality, identifying new insights in the age-old research area of software threading. Driven by these insights, this dissertation demonstrates how threading models must be redesigned at hyperscale by presenting an automated approach and tool, uTune, that makes intelligent run-time threading decisions. In the hardware thrust, this dissertation architects both commodity and custom hardware to efficiently support hyperscale software requirements. First, this dissertation characterizes commodity hardware's shortcomings, revealing insights that influenced commercial CPU designs. Based on these insights, this dissertation presents an approach and tool, SoftSKU, that enables cheap commodity hardware to efficiently support new hyperscale software paradigms, improving the efficiency of real-world web services that serve billions of users, saving millions of dollars, and meaningfully reducing the global carbon footprint. This dissertation also presents a hardware-software co-design, uNotify, that redesigns commodity hardware with minimal modifications by using existing hardware mechanisms more intelligently to overcome new hyperscale overheads.
Next, this dissertation characterizes how custom hardware must be designed at hyperscale, resulting in industry-academia benchmarking efforts, commercial hardware changes, and improved software development. Based on this characterization's insights, this dissertation presents Accelerometer, an analytical model that estimates gains from hardware customization. Multiple hyperscale enterprises and hardware vendors use Accelerometer to make well-informed hardware decisions. (PHD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies; http://deepblue.lib.umich.edu/bitstream/2027.42/169802/1/akshitha_1.pd)
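The abstract does not spell out Accelerometer's equations, but first-order analytical models of acceleration gains commonly take an Amdahl's-law form extended with offload overheads. The following is a hedged sketch of such a model; the function name, parameters, and overhead term are illustrative assumptions, not Accelerometer's actual formulation:

```python
def estimated_speedup(f, a, o):
    """First-order speedup estimate for offloading work to an accelerator.

    f: fraction of total runtime that is accelerable (0..1)
    a: acceleration factor on the offloaded fraction (a > 1)
    o: offload overhead (dispatch/communication) as a fraction of runtime
    """
    return 1.0 / ((1.0 - f) + f / a + o)

# Example: 60% of runtime accelerable 10x, with 5% offload overhead
print(estimated_speedup(0.6, 10.0, 0.05))  # ~1.96x overall, far below 10x
```

Such models make the point the abstract raises: overall gains are bounded by the non-accelerable fraction and offload costs, which is why well-informed customization decisions require measuring both.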

    A Scalable and Secure System Architecture for Smart Buildings

    Recent years have seen profound changes in building technologies both in Europe and worldwide. With the emergence of the Smart Grid and Smart City concepts, the Smart Building has attracted considerable attention and rapid development. The introduction of novel information and communication technologies (ICT) enables optimized resource utilization while improving building performance and occupants' satisfaction over a broad spectrum of operations. However, literature and industry have drawn attention to certain barriers and challenges that inhibit its universal adoption. The Smart Building is a cyber-physical system, which as a whole is more than the sum of its parts. The heterogeneous combination of systems, processes, and practices requires multidisciplinary research. This work proposes and validates a systems engineering approach to the investigation of the identified challenges and the development of a viable architecture for the future Smart Building. Firstly, a data model for the building management system (BMS) enables a semantic abstraction of both the ICT and the building construction. A high-level application programming interface (API) facilitates the creation of generic management algorithms and external applications, independent of each Smart Building instance, promoting the portability of intelligence and lowering the cost. Moreover, the proposed architecture ensures scalability regardless of the occupant activities and the complexity of the optimization algorithms. Secondly, a real-time message-oriented middleware, as a distributed embedded architecture within the building, empowers the interoperability of the ICT devices and networks and their integration into the BMS. The middleware scales to any building construction regardless of the devices' performance and connectivity limitations, while a secure architecture ensures the integrity of data and operations. An extensive performance and energy efficiency study validates the proposed design. A "building-in-the-loop" emulation system, based on discrete-event simulation, virtualizes the Smart Building elements (e.g., loads, storage, generation, sensors, actuators, users, etc.). The tight integration with the message-oriented middleware keeps the BMS agnostic to the virtual nature of the emulated instances. Its cooperative multitasking and immense parallelism allow the concurrent emulation of hundreds of elements in real time. The virtualization facilitates the development of energy management strategies and financial viability studies on the exact building and occupant activities without a prior investment in the necessary infrastructure. This work concludes with a holistic system evaluation using a case study of a university building as a practical retrofitting estimation. It illustrates the system deployment and highlights how an energy management system currently under development utilizes the BMS and its data analytics for demand-side management applications.
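    The abstract does not detail the middleware's API; as a loose illustration of why topic-based publish/subscribe keeps a BMS agnostic to whether a device is physical or emulated, here is a self-contained sketch (the class shape and topic names are invented for illustration):

```python
from collections import defaultdict

class Broker:
    """Minimal topic-based publish/subscribe broker."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)

broker = Broker()
# The BMS sees only topics, so an emulated sensor is indistinguishable
# from a physical one publishing to the same topic.
broker.subscribe("building/floor2/temperature",
                 lambda m: print("BMS received:", m))
broker.publish("building/floor2/temperature", {"celsius": 21.4})
```

    Decoupling producers from consumers this way is what allows the "building-in-the-loop" emulation to swap virtual elements in behind the same interface.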

    Jitter reduction techniques for digital audio.

    by Tsang Yick Man, Steven. Thesis (M.Phil.)--Chinese University of Hong Kong, 1997. Includes bibliographical references (leaves 94-99).
    ABSTRACT
    ACKNOWLEDGMENT
    LIST OF GLOSSARY
    Chapter 1 --- INTRODUCTION
    Chapter 1.1 --- What is jitter?
    Chapter 2 --- WHY DOES JITTER OCCUR IN DIGITAL AUDIO?
    Chapter 2.1 --- Poorly-designed phase-locked loop (PLL)
    Chapter 2.1.1 --- Digital data problem
    Chapter 2.2 --- Sampling jitter or clock jitter (Δti)
    Chapter 2.3 --- Waveform distortion
    Chapter 2.4 --- Logic-induced jitter
    Chapter 2.4.1 --- Digital noise mechanisms
    Chapter 2.4.2 --- Different types of D-type flip-flop chips listed for ease of comparison
    Chapter 2.4.3 --- Ground bounce
    Chapter 2.5 --- Power supply high-frequency noise
    Chapter 2.6 --- Interface jitter
    Chapter 2.7 --- Cross-talk
    Chapter 2.8 --- Inter-Symbol Interference (ISI)
    Chapter 2.9 --- Baseline wander
    Chapter 2.10 --- Noise jitter
    Chapter 2.11 --- FIFO jitter reduction chips
    Chapter 3 --- JITTER REDUCTION TECHNIQUES
    Chapter 3.1 --- Why use a two-stage phase-locked loop (PLL)?
    Chapter 3.1.1 --- The PLL circuit components
    Chapter 3.1.2 --- The PLL timing specifications
    Chapter 3.2 --- Analog phase-locked loop (APLL) circuit used in the second stage
    Chapter 3.3 --- All-digital phase-locked loop (ADPLL) circuit used in the second stage
    Chapter 3.4 --- ADPLL design
    Chapter 3.4.1 --- Different K counter values of the ADPLL listed for comparison, with M=512, N=256, Kd=2
    Chapter 3.4.2 --- Computer-simulated and experimental results of the ADPLL
    Chapter 3.4.3 --- PLL design notes
    Chapter 3.5 --- The all-digital phase-locked loop (ADPLL) and the analogue phase-locked loop (APLL) listed for comparison
    Chapter 3.6 --- Discrete transistor oscillator
    Chapter 3.7 --- Discrete transistor oscillator circuit operation
    Chapter 3.8 --- Advantages and disadvantages of using an external discrete oscillator
    Chapter 3.9 --- Background of using high-precision oscillators
    Chapter 3.9.1 --- The temperature-compensated crystal circuit operation
    Chapter 3.9.2 --- The temperature-compensated circuit design notes
    Chapter 3.10 --- The discrete voltage reference circuit operation
    Chapter 3.10.1 --- Comparing the different types of op-amps that can be used as a voltage comparator
    Chapter 3.10.2 --- Precautions for separating CMOS chips' Vdd and Vcc
    Chapter 3.11 --- Board-level jitter reduction method
    Chapter 3.12 --- Digital audio interface chips
    Chapter 3.12.1 --- Different brands of digital interface receiver (DIR) chips and clock modules listed for comparison
    Chapter 4 --- APPLICATION CIRCUIT BLOCK DIAGRAMS OF JITTER REDUCTION AND CLOCK RECOVERY
    Chapter 5 --- CONCLUSIONS
    Chapter 5.1 --- Summary of the research
    Chapter 5.2 --- Suggestions for further development
    Chapter 5.3 --- Instruments used in this thesis
    Chapter 6 --- REFERENCES
    Chapter 7 --- APPENDICES
    Chapter 7.1.1 --- Phase instability in frequency dividers
    Chapter 7.1.2 --- The effect of the clock tree on Tskew on an ASIC chip
    Chapter 7.1.3 --- Digital audio transmission: why jitter is important
    Chapter 7.1.4 --- Overview of digital audio interface data structures
    Chapter 7.1.5 --- Typical frequency vs. temperature variation curve of quartz crystals
    Chapter 7.2 --- IC specifications used in this research project
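    The contents above only name the jitter mechanisms, but the standard first-order result behind the thesis topic is worth stating: for a full-scale sinusoid at frequency f sampled with RMS clock jitter t_j, the achievable SNR is bounded by roughly -20·log10(2π·f·t_j). A quick check of that textbook formula (the example values are illustrative, not taken from the thesis):

```python
import math

def jitter_limited_snr_db(f_hz, t_j_s):
    """Upper bound on SNR (dB) for a full-scale sine sampled with RMS jitter t_j."""
    return -20.0 * math.log10(2.0 * math.pi * f_hz * t_j_s)

# A 20 kHz tone with 1 ns RMS clock jitter
print(jitter_limited_snr_db(20e3, 1e-9))  # ~78 dB, short of 16-bit audio's ~98 dB
```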

    A novel deep submicron bulk planar sizing strategy for low energy subthreshold standard cell libraries

    The author thanks the Engineering and Physical Sciences Research Council (EPSRC) and Arm Ltd for providing funding in the form of grants and studentships. This work investigates bulk planar deep submicron semiconductor physics in an attempt to improve standard cell libraries aimed at operation in the subthreshold regime and in Ultra Wide Dynamic Voltage Scaling schemes. The current state of research in the field is examined, with particular emphasis on how subthreshold physical effects degrade robustness, variability, and performance. How prevalent these physical effects are in a commercial 65nm library is then investigated by extensive modeling of a BSIM4.5 compact model. Three distinct sizing strategies emerge; cells of each strategy are laid out, and post-layout parasitically extracted models are simulated to determine the advantages and disadvantages of each. Full custom ring oscillators are designed and manufactured. Measured results reveal a close correlation with the simulated results, with frequency improvements of up to 2.75X/2.43X observed for RVT/LVT devices respectively. The experiment provides the first silicon evidence of the improvement capability of the Inverse Narrow Width Effect over a wide supply voltage range, as well as a mechanism of additional temperature stability in the subthreshold regime. A novel sizing strategy is proposed and pursued to determine whether it is able to produce a superior complex circuit design using a commercial digital synthesis flow. Two 128-bit AES cores are synthesized from the novel sizing strategy and compared against a third AES core synthesized from a state-of-the-art subthreshold standard cell library used by ARM. Results show improvements in energy-per-cycle of up to 27.3% and frequency improvements of up to 10.25X. The novel subthreshold sizing strategy proves superior over a temperature range of 0 °C to 85 °C, with a nominal (20 °C) improvement in energy-per-cycle of 24% and a frequency improvement of 8.65X. A comparison to prior art is then performed. Valid cases are presented where the proposed sizing strategy would be a candidate to produce superior subthreshold circuits.
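    The sizing arguments presume the textbook subthreshold model, in which drain current depends exponentially on gate drive, I ≈ I0·exp((VGS − Vth)/(n·VT)); this exponential sensitivity is why sizing and variability matter so much in this regime. A hedged illustration with invented parameter values, not the BSIM4.5 model or the 65nm library data used in the work:

```python
import math

def subthreshold_current(v_gs, v_th=0.4, i0=1e-7, n=1.5, v_t=0.026):
    """Textbook exponential subthreshold drain current (parameters illustrative)."""
    return i0 * math.exp((v_gs - v_th) / (n * v_t))

# An 80 mV drop in effective gate drive cuts current by ~8x,
# roughly one decade per n*VT*ln(10) ~ 90 mV for n = 1.5.
print(subthreshold_current(0.30) / subthreshold_current(0.22))
```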

    Energy efficient core designs for upcoming process technologies

    Energy efficiency has been a first-order constraint in the design of microprocessors for the last decade. As Moore's law sunsets, new technologies are being actively explored to extend the march of increasing computational power and efficiency. It is essential for computer architects to understand the opportunities and challenges in utilizing the upcoming process technology trends in order to design the most efficient processors. In this work, we consider three process technology trends and propose core designs that are best suited for each of the technologies. The process technologies are expected to become viable over a span of timelines. We first consider the most popular method currently available to improve energy efficiency, i.e., lowering the operating voltage. We make key observations regarding the limiting factors in scaling down the operating voltage for general-purpose high-performance processors. We then propose our novel core design, ScalCore, one that can work in a high-performance mode at nominal Vdd and in a very energy-efficient mode at low Vdd. The resulting core design can operate at much lower voltages, providing higher parallel performance while consuming lower energy. While lowering Vdd improves energy efficiency, CMOS devices are fundamentally limited in their low-voltage operation. Therefore, we next consider an upcoming device technology, Tunneling Field-Effect Transistors (TFETs), that is expected to supplement CMOS device technology in the near future. TFETs can attain much higher energy efficiency than CMOS at low voltages. However, their performance saturates at high voltages and, therefore, they cannot entirely replace CMOS when high performance is needed. Ideally, we desire a core that is as energy-efficient as TFET and provides as much performance as CMOS. To reach this goal, we characterize the TFET device behavior for core design and judiciously integrate TFET units and CMOS units in a single core. The resulting core, called HetCore, can provide very high energy efficiency while limiting the slowdown compared to a CMOS core. Finally, we analyze Monolithic 3D (M3D) integration technology, which is widely considered to be the only way to integrate more transistors on a chip. We present the first analysis of the architectural implications of using M3D for core design and show how to partition the core across different layers. We also address one of the key challenges in realizing the technology, namely, the top-layer performance degradation. We propose a critical-path-based partitioning for logic stages and asymmetric bit/port partitioning for storage stages. The result is a core that performs nearly as well as a core without any top-layer slowdown. When compared to a 2D baseline design, an M3D core not only provides much higher performance but also reduces energy consumption at the same time. In summary, this thesis addresses one of the fundamental challenges in computer architecture: overcoming the fact that CMOS is not scaling anymore. As we increase the computing power on a single chip, our ability to power the entire chip keeps decreasing. This thesis proposes three solutions aimed at solving this problem over different timelines. Across all our solutions, we improve energy efficiency without compromising the performance of the core. As a result, we are able to operate twice as many cores within the same power budget as regular cores, significantly alleviating the problem of dark silicon.
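    The premise that lowering Vdd trades performance for energy follows from first-order CMOS relations: dynamic energy per operation scales as C·Vdd², while alpha-power-law gate delay grows as Vdd/(Vdd − Vth)^α and blows up near threshold. A minimal sketch of that trade-off with illustrative constants (not data from ScalCore or HetCore):

```python
def dynamic_energy(vdd, c=1.0):
    """Dynamic energy per operation, proportional to C * Vdd^2."""
    return c * vdd ** 2

def gate_delay(vdd, v_th=0.35, alpha=1.3, k=1.0):
    """Alpha-power-law gate delay; grows sharply as Vdd approaches Vth."""
    return k * vdd / (vdd - v_th) ** alpha

for vdd in (1.0, 0.8, 0.6, 0.45):
    print(f"Vdd={vdd:.2f}V  energy={dynamic_energy(vdd):.2f}  "
          f"delay={gate_delay(vdd):.2f}")
# Energy falls quadratically, but delay rises steeply near threshold,
# which is the tension a dual-mode core like ScalCore is built around.
```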

    Systemunterstützung für moderne Speichertechnologien (System Support for Modern Memory Technologies)

    Trust and scalability are the two significant factors which impede the dissemination of clouds. The possibility of privileged access to customer data by a cloud provider limits the usage of clouds for processing security-sensitive data. Low-latency cloud services rely on in-memory computations and are thus limited by several characteristics of Dynamic RAM (DRAM), such as capacity, density, and energy consumption. Two technological areas address these factors. Mainstream server platforms, such as Intel Software Guard eXtensions (SGX) and AMD Secure Encrypted Virtualisation (SEV), offer extensions for trusted execution in untrusted environments. Various technologies of Non-Volatile RAM (NV-RAM) have better capacity and density compared to DRAM and can thus be considered as DRAM alternatives in the future. However, these technologies and extensions require new programming approaches and system support, since they add features to the system architecture: new system components (Intel SGX) and data persistence (NV-RAM). This thesis is devoted to the programming and architectural aspects of persistent and trusted systems. For trusted systems, an in-depth analysis of the new architectural extensions was performed, and a novel framework named EActors and a database engine named STANlite were developed to effectively use the capabilities of trusted execution. For persistent systems, an in-depth analysis of prospective memory technologies, their features, and their possible impact on system architecture was performed. A new persistence model, called the hypervisor-based model of persistence, was developed and evaluated with the NV-Hypervisor. It offers transparent persistence for legacy and proprietary software and supports virtualisation of persistent memory.
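The EActors internals are not described in this abstract, but the actor model it is named after isolates state behind per-actor mailboxes and communicates only by messages, a structure that maps naturally onto enclave boundaries. A plain-Python sketch of that general pattern, with invented names and no SGX specifics:

```python
from collections import deque

class Actor:
    """Actor owning private state; interacts only via its mailbox."""
    def __init__(self, name):
        self.name = name
        self.mailbox = deque()
        self.state = 0

    def send(self, message):
        self.mailbox.append(message)

    def step(self):
        # Process one message; no other actor touches self.state directly.
        if self.mailbox:
            self.state += self.mailbox.popleft()
            print(f"{self.name}: state={self.state}")

a = Actor("enclave-worker")
a.send(5)
a.send(7)
a.step()
a.step()
```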

    A Review of Low-end, Middle-end and High-end IoT Devices

    Internet of Things (IoT) devices play a crucial role in the overall development of IoT, providing countless applications in various areas. Given the increasing interest in and rapid technological growth of sensor technology, which has revolutionized the way we live today, a detailed analysis of the available embedded platforms and boards is warranted. This paper presents a comprehensive survey of the most recent and widely used commercial and research embedded systems and boards in different classes, emphasizing their key attributes, including processing and memory capabilities, security features, connectivity and communication interfaces, size, cost and appearance, operating system (OS) support, power specifications, and battery life, and listing some interesting projects for each device. Through this exploration and discussion, readers can gain an overall understanding of this area and foster subsequent studies.
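    To make the survey's classification dimensions concrete, a record per device could capture the attributes the paper compares; the field names and the sample entry below are invented for illustration, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class IoTDevice:
    """Attributes along which the survey compares embedded boards."""
    name: str
    tier: str            # "low-end", "middle-end", or "high-end"
    cpu: str
    ram_kb: int
    security: str        # e.g., secure boot, crypto accelerator, TrustZone
    connectivity: str
    os_support: str
    cost_usd: float
    battery_life: str

sample = IoTDevice("hypothetical-board", "low-end", "Cortex-M0 @ 48 MHz",
                   32, "none", "BLE", "bare-metal/RTOS", 5.0, "months")
print(sample.tier, sample.ram_kb, "KB RAM")
```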

    Information Leakage Attacks and Countermeasures

    The scientific community has been consistently working on the pervasive problem of information leakage, uncovering numerous attack vectors and proposing various countermeasures. Despite these efforts, leakage incidents remain prevalent, as the complexity of systems and protocols increases and sophisticated modeling methods become more accessible to adversaries. This work studies how information leakages manifest in and impact interconnected systems and their users. We first focus on online communications and investigate leakages in the Transport Layer Security (TLS) protocol. Using modern machine learning models, we show that an eavesdropping adversary can efficiently exploit meta-information (e.g., packet size) not protected by TLS encryption to launch fingerprinting attacks at an unprecedented scale, even under non-optimal conditions. We then turn our attention to ultrasonic communications and discuss their security shortcomings and how adversaries could exploit them to compromise anonymity network users (even though such networks aim to offer a greater level of privacy than TLS). Following up on these, we delve into physical-layer leakages that concern a wide array of (networked) systems such as servers, embedded nodes, Tor relays, and hardware cryptocurrency wallets. We revisit location-based side-channel attacks and develop an exploitation neural network. Our model demonstrates the capabilities of a modern adversary but also provides an inexpensive tool that auditors can use to detect such leakages early in the development cycle. Subsequently, we investigate techniques that further minimize the impact of leakages found in production components. Our proposed system design distributes both the custody of secrets and the execution of cryptographic operations across several components, thus making the exploitation of leaks difficult.
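    The dissertation's models are not reproduced here, but the shape of a metadata-based fingerprinting attack is easy to sketch: represent each encrypted connection as a feature vector of observable statistics and train a standard classifier to label the destination. A hedged sketch on synthetic traces (the features, data, and model choice are illustrative, not the work's actual pipeline):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def synthetic_trace(site_id, n_packets=50):
    """Fake trace: packet sizes whose distribution depends on the site."""
    sizes = rng.normal(600 + 150 * site_id, 80, n_packets)
    # Features an eavesdropper can compute without breaking encryption
    return [sizes.mean(), sizes.std(), sizes.max(), sizes.min()]

X = np.array([synthetic_trace(s) for s in range(5) for _ in range(200)])
y = np.array([s for s in range(5) for _ in range(200)])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("identification accuracy:", clf.score(X_te, y_te))
```

    The point of the sketch is that nothing in it touches plaintext: size statistics alone can separate destinations, which is exactly the class of leakage the work exploits and defends against.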

    FPGA structures for high speed and low overhead dynamic circuit specialization

    A Field Programmable Gate Array (FPGA) is a programmable digital electronic chip. The FPGA does not come with a predefined function from the manufacturer; instead, the developer has to define its function by implementing a digital circuit on the FPGA resources. The functionality of the FPGA can be reprogrammed as desired, hence the name "field programmable". FPGAs are useful in small-volume digital electronic products, as the design of a custom digital chip is expensive. Changing the FPGA (also called configuring it) is done by changing the configuration data (in the form of bitstreams) that defines the FPGA functionality. These bitstreams are stored in a memory of the FPGA called the configuration memory. The SRAM cells of LookUp Tables (LUTs), Block Random Access Memories (BRAMs), and DSP blocks together form the configuration memory of an FPGA. The configuration data can be modified according to the user's needs to implement the user-defined hardware. The simplest way to program the configuration memory is to download the bitstreams using a JTAG interface. However, modern techniques such as Partial Reconfiguration (PR) enable us to configure a part of the configuration memory with partial bitstreams during run-time. The reconfiguration is achieved by swapping partial bitstreams into the configuration memory via a configuration interface called the Internal Configuration Access Port (ICAP). The ICAP is a hardware primitive (macro) present in the FPGA, used to access the configuration memory internally by an embedded processor. The reconfiguration technique adds the flexibility to use specialized circuits that are more compact and more efficient than their bulky generic counterparts. An example of such an implementation is the use of specialized multipliers instead of big generic multipliers in an FIR implementation with constant coefficients. To specialize these circuits and reconfigure them during run-time, researchers at the HES group proposed a novel technique called parameterized reconfiguration that can be used to efficiently and automatically implement Dynamic Circuit Specialization (DCS), which is built on top of the Partial Reconfiguration method. It uses a run-time reconfiguration technique that is tailored to implement a parameterized design. An application is said to be parameterized if some of its input values change much less frequently than the rest. These inputs are called parameters. Instead of implementing these parameters as regular inputs, in DCS these inputs are implemented as constants, and the application is optimized for the constants. For every change in parameter values, the design is re-optimized (specialized) during run-time and implemented by reconfiguring the optimized design for the new set of parameters. In DCS, the bitstreams of the parameterized design are expressed as Boolean functions of the parameters. For every infrequent change in parameters, a specialized FPGA configuration is generated by evaluating the corresponding Boolean functions, and the FPGA is reconfigured with the specialized configuration. A detailed study of the overheads of DCS, and the provision of suitable solutions with appropriate custom FPGA structures, is the primary goal of this dissertation. I also suggest different improvements to the FPGA configuration memory architecture. After presenting the custom FPGA structures, I investigate the role of DCS on FPGA overlays and the use of custom FPGA structures that help to reduce the overheads of DCS on FPGA overlays.
By doing so, I hope to convince developers to use DCS (which now comes with minimal costs) in real-world applications. I start the investigation of the overheads of DCS by implementing an adaptive FIR filter (using the DCS technique) on three different Xilinx FPGA platforms: Virtex-II Pro, Virtex-5, and Zynq-SoC. The study of how DCS behaves, and of its overhead across the evolution of the three FPGA platforms, is the non-trivial basis for discovering the costs of DCS. After that, I propose custom FPGA structures (reconfiguration controllers and reconfiguration drivers) to reduce the main overhead of DCS, the reconfiguration time. These structures not only reduce the reconfiguration time but also help curb the power-hungry part of the DCS system. After these chapters, I study the role of DCS on FPGA overlays. I investigate the effect of the proposed FPGA structures on Virtual Coarse-Grained Reconfigurable Arrays (VCGRAs). I classify the VCGRA implementations into three types, depending upon the level of parameterization: the conventional VCGRA, the partially parameterized VCGRA, and the fully parameterized VCGRA. I have designed two variants of VCGRA grids for HPC image processing applications, namely the MAC grid and Pixie. Finally, I try to tackle the reconfiguration time overhead at the hardware level of the FPGA by customizing the FPGA configuration memory architecture. In this part of my research, I propose to use a parallel memory structure to drastically improve the reconfiguration time of DCS. However, this improvement comes with a significant hardware resource overhead, which will need to be addressed in future research on commercial FPGA configuration memory architectures.
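The heart of parameterized reconfiguration, as described above, is that some configuration bits are Boolean functions of the parameters, re-evaluated at each infrequent parameter change. A minimal Python sketch of that idea, with toy bit functions rather than a real Xilinx bitstream format:

```python
# Each tunable configuration bit is a Boolean function of parameter bits;
# static bits are plain constants. Specialization evaluates the functions.
parameterized_bits = [
    lambda p: p & 1,                        # depends on the parameter's LSB
    lambda p: (p >> 1) & 1,                 # depends on the next parameter bit
    lambda p: 1,                            # static bit, constant in every configuration
    lambda p: (p & 1) ^ ((p >> 1) & 1),     # XOR of two parameter bits
]

def specialize(parameter):
    """Evaluate every bit function to produce a specialized configuration."""
    return [f(parameter) for f in parameterized_bits]

# Each parameter change yields a new partial configuration to load via ICAP.
for coeff in (0, 1, 2, 3):
    print(coeff, specialize(coeff))
```

Evaluating these functions at run-time is far cheaper than re-running synthesis, which is what makes specialization on every parameter change feasible.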