26 research outputs found

    Rapid SoC Design: On Architectures, Methodologies and Frameworks

    Full text link
    Modern applications like machine learning, autonomous vehicles, and 5G networking require an order of magnitude boost in processing capability. For several decades, chip designers have relied on Moore’s Law - the doubling of transistor count every two years to deliver improved performance, higher energy efficiency, and an increase in transistor density. With the end of Dennard’s scaling and a slowdown in Moore’s Law, system architects have developed several techniques to deliver on the traditional performance and power improvements we have come to expect. More recently, chip designers have turned towards heterogeneous systems comprised of more specialized processing units to buttress the traditional processing units. These specialized units improve the overall performance, power, and area (PPA) metrics across a wide variety of workloads and applications. While the GPU serves as a classical example, accelerators for machine learning, approximate computing, graph processing, and database applications have become commonplace. This has led to an exponential growth in the variety (and count) of these compute units found in modern embedded and high-performance computing platforms. The various techniques adopted to combat the slowing of Moore’s Law directly translates to an increase in complexity for modern system-on-chips (SoCs). This increase in complexity in turn leads to an increase in design effort and validation time for hardware and the accompanying software stacks. This is further aggravated by fabrication challenges (photo-lithography, tooling, and yield) faced at advanced technology nodes (below 28nm). The inherent complexity in modern SoCs translates into increased costs and time-to-market delays. This holds true across the spectrum, from mobile/handheld processors to high-performance data-center appliances. This dissertation presents several techniques to address the challenges of rapidly birthing complex SoCs. The first part of this dissertation focuses on foundations and architectures that aid in rapid SoC design. It presents a variety of architectural techniques that were developed and leveraged to rapidly construct complex SoCs at advanced process nodes. The next part of the dissertation focuses on the gap between a completed design model (in RTL form) and its physical manifestation (a GDS file that will be sent to the foundry for fabrication). It presents methodologies and a workflow for rapidly walking a design through to completion at arbitrary technology nodes. It also presents progress on creating tools and a flow that is entirely dependent on open-source tools. The last part presents a framework that not only speeds up the integration of a hardware accelerator into an SoC ecosystem, but emphasizes software adoption and usability.PHDElectrical and Computer EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/168119/1/ajayi_1.pd

    Co-design of Security Aware Power System Distribution Architecture as Cyber Physical System

    Get PDF
    The modern smart grid would involve deep integration between measurement nodes, communication systems, artificial intelligence, power electronics and distributed resources. On one hand, this type of integration can dramatically improve the grid performance and efficiency, but on the other, it can also introduce new types of vulnerabilities to the grid. To obtain the best performance, while minimizing the risk of vulnerabilities, the physical power system must be designed as a security aware system. In this dissertation, an interoperability and communication framework for microgrid control and Cyber Physical system enhancements is designed and implemented taking into account cyber and physical security aspects. The proposed data-centric interoperability layer provides a common data bus and a resilient control network for seamless integration of distributed energy resources. In addition, a synchronized measurement network and advanced metering infrastructure were developed to provide real-time monitoring for active distribution networks. A hybrid hardware/software testbed environment was developed to represent the smart grid as a cyber-physical system through hardware and software in the loop simulation methods. In addition it provides a flexible interface for remote integration and experimentation of attack scenarios. The work in this dissertation utilizes communication technologies to enhance the performance of the DC microgrids and distribution networks by extending the application of the GPS synchronization to the DC Networks. GPS synchronization allows the operation of distributed DC-DC converters as an interleaved converters system. Along with the GPS synchronization, carrier extraction synchronization technique was developed to improve the system’s security and reliability in the case of GPS signal spoofing or jamming. To improve the integration of the microgrid with the utility system, new synchronization and islanding detection algorithms were developed. The developed algorithms overcome the problem of SCADA and PMU based islanding detection methods such as communication failure and frequency stability. In addition, a real-time energy management system with online optimization was developed to manage the energy resources within the microgrid. The security and privacy were also addressed in both the cyber and physical levels. For the physical design, two techniques were developed to address the physical privacy issues by changing the current and electromagnetic signature. For the cyber level, a security mechanism for IEC 61850 GOOSE messages was developed to address the security shortcomings in the standard

    Design Automation of Low Power Circuits in Nano-Scale CMOS and Beyond-CMOS Technologies.

    Full text link
    Today’s integrated system on chips (SoCs) usually consist of billions of transistors accounting for both digital and analog blocks. Integrating such massive blocks on a single chip involves several challenges, especially when transferring analog blocks from an older technology to newer ones. Furthermore, the exponential growth for IoT devices necessitates small and low power circuits. Hence, new devices and architectures must be investigated to meet the power and area constraints for wireless sensor networks (WSNs). In such cases, design automation becomes an essential tool to reduce the time to market of the circuits. This dissertation focuses on automating the design process of analog designs in advanced CMOS technology nodes, as well as reciprocal quantum logic (RQL) superconducting circuits. For CMOS analog circuits, our design automation technique employs digital automatic placement and routing tools to synthesize and lay out analog blocks along with digital blocks in a cell-based design approach. This technique was demonstrated in the design of a digital-to-analog converter. In the domain of RQL circuits, the automated design of several functional units of a commercial Processor is presented. These automation techniques enable the design of VLSI-scale circuits in this technology. In addition to the investigation of new technologies, several new baseband signal processor architectures are presented in this dissertation. These architectures are suitable for low-power mm3-scale WSNs and enable high frequency transceivers to operate within the power constraints of standalone IoT nodes.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/133177/1/elnaz_1.pd

    System level performance and yield optimisation for analogue integrated circuits

    No full text
    Advances in silicon technology over the last decade have led to increased integration of analogue and digital functional blocks onto the same single chip. In such a mixed signal environment, the analogue circuits must use the same process technology as their digital neighbours. With reducing transistor sizes, the impact of process variations on analogue design has become prominent and can lead to circuit performance falling below specification and hence reducing the yield.This thesis explores the methodology and algorithms for an analogue integrated circuit automation tool that optimizes performance and yield. The trade-offs between performance and yield are analysed using a combination of an evolutionary algorithm and Monte Carlo simulation. Through the integration of yield parameter into the optimisation process, the trade off between the performance functions can be better treated that able to produce a higher yield. The results obtained from the performance and variation exploration are modelled behaviourally using a Verilog-A language. The model has been verified with transistor level simulation and a silicon prototype.For a large analogue system, the circuit is commonly broken down into its constituent sub-blocks, a process known as hierarchical design. The use of hierarchical-based design and optimisation simplifies the design task and accelerates the design flow by encouraging design reuse.A new approach for system level yield optimisation using a hierarchical-based design is proposed and developed. The approach combines Multi-Objective Bottom Up (MUBU) modelling technique to model the circuit performance and variation and Top Down Constraint Design (TDCD) technique for the complete system level design. The proposed method has been used to design a 7th order low pass filter and a charge pump phase locked loop system. The results have been verified with transistor level simulations and suggest that an accurate system level performance and yield prediction can be achieved with the proposed methodology

    Low power digital baseband core for wireless Micro-Neural-Interface using CMOS sub/near-threshold circuit

    Get PDF
    This thesis presents the work on designing and implementing a low power digital baseband core with custom-tailored protocol for wirelessly powered Micro-Neural-Interface (MNI) System-on-Chip (SoC) to be implanted within the skull to record cortical neural activities. The core, on the tag end of distributed sensors, is designed to control the operation of individual MNI and communicate and control MNI devices implanted across the brain using received downlink commands from external base station and store/dump targeted neural data uplink in an energy efficient manner. The application specific protocol defines three modes (Time Stamp Mode, Streaming Mode and Snippet Mode) to extract neural signals with on-chip signal conditioning and discrimination. In Time Stamp Mode, Streaming Mode and Snippet Mode, the core executes basic on-chip spike discrimination and compression, real-time monitoring and segment capturing of neural signals so single spike timing as well as inter-spike timing can be retrieved with high temporal and spatial resolution. To implement the core control logic using sub/near-threshold logic, a novel digital design methodology is proposed which considers INWE (Inverse-Narrow-Width-Effect), RSCE (Reverse-Short-Channel-Effect) and variation comprehensively to size the transistor width and length accordingly to achieve close-to-optimum digital circuits. Ultra-low-power cell library containing 67 cells including physical cells and decoupling capacitor cells using the optimum fingers is designed, laid-out, characterized, and abstracted. A robust on-chip sense-amp-less SRAM memory (8X32 size) for storing neural data is implemented using 8T topology and LVT fingers. The design is validated with silicon tapeout and measurement shows the digital baseband core works at 400mV and 1.28 MHz system clock with an average power consumption of 2.2 ÎĽW, resulting in highest reported communication power efficiency of 290Kbps/ÎĽW to date

    Performance and area evaluations of processor-based benchmarks on FPGA devices

    Get PDF
    The computing system on SoCs is being long-term research since the FPGA technology has emerged due to its personality of re-programmable fabric, reconfigurable computing, and fast development time to market. During the last decade, uni-processor in a SoC is no longer to deal with the high growing market for complex applications such as Mobile Phones audio and video encoding, image and network processing. Due to the number of transistors on a silicon wafer is increasing, the recent FPGAs or embedded systems are advancing toward multi-processor-based design to meet tremendous performance and benefit this kind of systems are possible. Therefore, is an upcoming age of the MPSoC. In addition, most of the embedded processors are soft-cores, because they are flexible and reconfigurable for specific software functions and easy to build homogenous multi-processor systems for parallel programming. Moreover, behavioural synthesis tools are becoming a lot more powerful and enable to create datapath of logic units from high-level algorithms such as C to HDL and available for partitioning a HW/SW concurrent methodology. A range of embedded processors is able to implement on a FPGA-based prototyping to integrate the CPUs on a programmable device. This research is, firstly represent different types of computer architectures in modern embedded processors that are followed in different type of software applications (eg. Multi-threading Operations or Complex Functions) on FPGA-based SoCs; and secondly investigate their capability by executing a wide-range of multimedia software codes (Integer-algometric only) in different models of the processor-systems (uni-processor or multi-processor or Co-design), and finally compare those results in terms of the benchmarks and resource utilizations within FPGAs. All the examined programs were written in standard C and executed in a variety numbers of soft-core processors or hardware units to obtain the execution times. However, the number of processors and their customizable configuration or hardware datapath being generated are limited by a target FPGA resource, and designers need to understand the FPGA-based tradeoffs that have been considered - Speed versus Area. For this experimental purpose, I defined benchmarks into DLP / HLS catalogues, which are "data" and "function" intensive respectively. The programs of DLP will be executed in LEON3 MP and LE1 CMP multi-processor systems and the programs of HLS in the LegUp Co-design system on target FPGAs. In preliminary, the performance of the soft-core processors will be examined by executing all the benchmarks. The whole story of this thesis work centres on the issue of the execute times or the speed-up and area breakdown on FPGA devices in terms of different programs

    Third International Symposium on Space Mission Operations and Ground Data Systems, part 2

    Get PDF
    Under the theme of 'Opportunities in Ground Data Systems for High Efficiency Operations of Space Missions,' the SpaceOps '94 symposium included presentations of more than 150 technical papers spanning five topic areas: Mission Management, Operations, Data Management, System Development, and Systems Engineering. The symposium papers focus on improvements in the efficiency, effectiveness, and quality of data acquisition, ground systems, and mission operations. New technology, methods, and human systems are discussed. Accomplishments are also reported in the application of information systems to improve data retrieval, reporting, and archiving; the management of human factors; the use of telescience and teleoperations; and the design and implementation of logistics support for mission operations. This volume covers expert systems, systems development tools and approaches, and systems engineering issues

    Utilizing Magnetic Tunnel Junction Devices in Digital Systems

    Get PDF
    The research described in this dissertation is motivated by the desire to effectively utilize magnetic tunnel junctions (MTJs) in digital systems. We explore two aspects of this: (1) a read circuit useful for global clocking and magnetologic, and (2) hardware virtualization that utilizes the deeply-pipelined nature of magnetologic. In the first aspect, a read circuit is used to sense the state of an MTJ (low or high resistance) and produce a logic output that represents this state. With global clocking, an external magnetic field combined with on-chip MTJs is used as an alternative mechanism for distributing the clock signal across the chip. With magnetologic, logic is evaluated with MTJs that must be sensed by a read circuit and used to drive downstream logic. For these two uses, we develop a resistance-to-voltage (R2V) read circuit to sense MTJ resistance and produce a logic voltage output. We design and fabricate a prototype test chip in the 3 metal 2 poly 0.5 um process for testing the R2V read circuit and experimentally validating its correctness. Using a clocked low/high resistor pair, we show that the read circuit can correctly detect the input resistance and produce the desired square wave output. The read circuit speed is measured to operate correctly up to 48 MHz. The input node is relatively insensitive to node capacitance and can handle up to 10s of pF of capacitance without changing the bandwidth of the circuit. In the second aspect, hardware virtualization is a technique by which deeply-pipelined circuits that have feedback can be utilized. MTJs have the potential to act as state in a magnetologic circuit which may result in a deep pipeline. Streams of computation are then context switched into the hardware logic, allowing them to share hardware resources and more fully utilize the pipeline stages of the logic. While applicable to magnetologic using MTJs, virtualization is also applicable to traditional logic technologies like CMOS. Our investigation targets MTJs, FPGAs, and ASICs. We develop M/D/1 and M/G/1 queueing models of the performance of virtualized hardware with secondary memory using a fixed, hierarchical, round-robin schedule that predict average throughput, latency, and queue occupancy in the system. We develop three C-slow applications and calibrate them to a clock and resource model for FPGA and ASIC technologies. Last, using the M/G/1 model, we predict throughput, latency, and resource usage for MTJ, FPGA, and ASIC technologies. We show three design scenarios illustrating ways in which to use the model
    corecore