    AI/ML Algorithms and Applications in VLSI Design and Technology

    An evident challenge ahead for the integrated circuit (IC) industry in the nanometer regime is the investigation and development of methods that can reduce the design complexity ensuing from growing process variations and curtail the turnaround time of chip manufacturing. Conventional methodologies employed for such tasks are largely manual and thus time-consuming and resource-intensive. In contrast, the unique learning strategies of artificial intelligence (AI) provide numerous exciting automated approaches for handling complex and data-intensive tasks in very-large-scale integration (VLSI) design and testing. Employing AI and machine learning (ML) algorithms in VLSI design and manufacturing reduces the time and effort needed to understand and process the data within and across different abstraction levels via automated learning algorithms. This, in turn, improves IC yield and reduces the manufacturing turnaround time. This paper thoroughly reviews the AI/ML automated approaches introduced to date for VLSI design and manufacturing. Moreover, we discuss the scope of AI/ML applications at various abstraction levels in the future to revolutionize the field of VLSI design, aiming for high-speed, highly intelligent, and efficient implementations.
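
    As a hedged illustration only, the sketch below shows the general shape of one kind of ML aid the survey covers: a cheap surrogate regression fit on a handful of early design features to predict a post-layout quantity, so that slow signoff runs can be skipped while exploring candidates. The feature names, data values, and the plain least-squares fit are all assumptions made for this example, not anything proposed in the paper.

        # A minimal, hypothetical sketch of an ML surrogate for VLSI design:
        # predict critical-path delay from early design features, then use the
        # cheap prediction to rank candidates before full signoff. All feature
        # names and numbers below are invented for illustration.
        import numpy as np

        # Each row: [fanout, estimated wirelength (um), logic levels]
        X = np.array([
            [4, 120.0, 6],
            [8, 340.0, 10],
            [2, 60.0, 4],
            [6, 210.0, 8],
            [10, 500.0, 12],
        ], dtype=float)
        y = np.array([0.42, 0.95, 0.25, 0.63, 1.30])  # measured delays in ns (made up)

        # Fit a linear surrogate: delay ~ w0 + w1*fanout + w2*wirelength + w3*levels
        A = np.hstack([np.ones((X.shape[0], 1)), X])
        w, *_ = np.linalg.lstsq(A, y, rcond=None)

        def predict_delay(fanout, wirelength_um, levels):
            """Cheap estimate used to rank candidate netlists before signoff."""
            return float(w @ np.array([1.0, fanout, wirelength_um, levels]))

        print(predict_delay(5, 180.0, 7))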

    An Ultra-Low Cost and Multicast-Enabled Asynchronous NoC for Neuromorphic Edge Computing

    Biological brains are increasingly taken as a guide toward more efficient forms of computing. The latest frontier considers the use of spiking neural-network-based neuromorphic processors for near-sensor data processing, in order to fit the tight power and resource budgets of edge computing devices. However, the prevailing focus on brain-inspired computing and storage primitives in the design of neuromorphic systems is currently bringing a fundamental bottleneck to the forefront: chip-scale communications. While communication architectures (typically, a network-on-chip) are generally inspired by, or even borrowed from, general-purpose computing, neuromorphic communications exhibit unique characteristics: they consist of the event-driven routing of small amounts of information to a large number of destinations within tight area and power budgets. This article aims to mark an inflection point in network-on-chip design for brain-inspired communications, revolving around the combination of cost-effective and robust asynchronous design, architecture specialization for short messaging, and lightweight hardware support for tree-based multicast. When validated with functional spiking neural network traffic, the proposed NoC delivers energy savings ranging from 42% to 71% over a state-of-the-art NoC used in a real multi-core neuromorphic processor for edge computing applications.
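
    As a rough illustration of the tree-based multicast idea the abstract highlights, the sketch below models a single router that copies an incoming spike event to several output ports according to a local multicast table, so fan-out happens inside the network rather than at the source. The port names, tags, and table contents are hypothetical and are not taken from the paper's design.

        # A minimal, hypothetical model of in-network multicast for spike events:
        # the event carries only a small source tag, and each router forks it to
        # the output ports listed in its local table. Table contents are invented.

        MULTICAST_TABLE = {
            # source-neuron tag -> output ports at this router
            0x01: {"north", "local"},
            0x02: {"east"},
            0x03: {"north", "east", "local"},
        }

        def route_spike(tag, default_port="local"):
            """Return the set of output ports this spike is copied to."""
            return MULTICAST_TABLE.get(tag, {default_port})

        # A single injected event is duplicated inside the router, which is what
        # keeps injection bandwidth low for neurons with large fan-out.
        for tag in (0x01, 0x03, 0x7F):
            print(f"spike {tag:#04x} -> {sorted(route_spike(tag))}")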

    Beyond the arithmetic constraint: depth-optimal mapping of logic chains in reconfigurable fabrics

    Look-up table based FPGAs have migrated from a niche technology for design prototyping to a valuable end-product component and, in some cases, a replacement for general purpose processors and ASICs alike. One way architects have bridged the performance gap between FPGAs and ASICs is through the inclusion of specialized components such as multipliers, RAM modules, and microcontrollers. Another dedicated structure that has become standard in reconfigurable fabrics is the arithmetic carry chain. Currently, it is only used to map arithmetic operations as identified by HDL macros. For non-arithmetic operations, it is an idle but potentially powerful resource. Obstacles to using the carry chain for generic logic operations include a lack of architectural and computer-aided design support. Current carry-select architectures facilitate carry chain reuse, although they do so only for (K-1)-input operations. Additionally, hardware description language (HDL) macros are the only recourse for a designer wishing to map generic logic chains in a carry-select architecture. A novel architecture that allows the full K-input operational capacity of the carry chain to be harnessed is presented as a solution to current architectural limitations. It is shown to have negligible impact on logic element area and delay. Using only two additional 2:1 pass-transistor multiplexers, it enables the transmission of a K-input operation to the carry chain and general routing simultaneously. To successfully identify logic chains in an arbitrary Boolean network, ChainMap is presented as a novel technology mapping algorithm. ChainMap creates delay-optimal generic logic chains in polynomial time without HDL macros. It maps both arithmetic and non-arithmetic logic chains whenever depth-increasing nodes, which increase logic depth but not routing depth, are encountered. Use of the chain is not reserved for arithmetic, but rather for any set of gates exhibiting similar characteristics. By using the carry chain as a generic, near-zero-delay adjacent-cell interconnection structure, a potential average optimal speedup of 1.4x is revealed. Post place-and-route experiments indicate that ChainMap solutions perform similarly to HDL chains when cluster resources are abundant and significantly better in cluster-constrained arrays.
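
    To make the notion of a depth-increasing node more concrete, the sketch below walks through a toy Boolean network in which edges placed on the carry chain cost approximately no routing delay, so nodes chained through them deepen the logic chain without deepening the routing depth. The tiny network, the chain-edge choice, and the depth metric are assumptions for illustration; ChainMap itself is considerably more involved.

        # A minimal, hypothetical illustration of depth-increasing nodes: edges
        # mapped onto the carry chain add logic depth but essentially no routing
        # depth. The network and the chosen chain edges are invented.
        from functools import lru_cache

        # Directed acyclic network: node -> fanin nodes (primary inputs have none).
        FANINS = {
            "a": [], "b": [], "c": [], "d": [],
            "n1": ["a", "b"],
            "n2": ["n1", "c"],   # chained after n1
            "n3": ["n2", "d"],   # chained after n2
        }
        # Edges mapped onto the carry chain (adjacent-cell, near-zero-delay links).
        CHAIN_EDGES = {("n1", "n2"), ("n2", "n3")}

        @lru_cache(maxsize=None)
        def routing_depth(node):
            """Depth counted in general-routing hops only; chain edges are free."""
            hops = [routing_depth(src) + (0 if (src, node) in CHAIN_EDGES else 1)
                    for src in FANINS[node]]
            return max(hops, default=0)

        for n in ("n1", "n2", "n3"):
            print(n, "routing depth =", routing_depth(n))
        # n2 and n3 deepen the logic chain but leave the routing depth at 1:
        # exactly the kind of node the abstract calls depth increasing.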

    Methodologies and Toolflows for the Predictable Design of Reliable and Low-Power NoCs

    There is today an unmistakable need to evolve design methodologies and tool flows for Network-on-Chip based embedded systems. In particular, the quest for low power is nowadays a more-than-ever urgent dilemma. Modern circuits feature billions of transistors, and neither power management techniques nor battery capacity are able to keep up with the increasingly higher integration capability of digital devices. Besides, power concerns come together with modern nanoscale silicon technology design issues. On one hand, system failure rates are expected to increase exponentially at every technology node if integrated circuit wear-out failure mechanisms are not compensated for. However, error detection and/or correction mechanisms have a non-negligible impact on network power. On the other hand, to meet stringent time-to-market deadlines, the design cycle of such a distributed and heterogeneous architecture must not be prolonged by unnecessary design iterations. Overall, there is a clear need to better discriminate among reliability strategies and interconnect topology solutions upfront, by ranking designs based on power metrics. In this thesis, we tackle this challenge by proposing power-aware design technologies. Finally, we take into account the most aggressive and disruptive methodology for embedded systems with ultra-low power constraints, by migrating NoC basic building blocks to an asynchronous (or clockless) design style. We address this challenge by delivering a standard-cell design methodology and mainstream CAD tool flows, in this way partially relaxing the requirement of using asynchronous blocks only as hard macros.
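
    Since the thesis moves NoC building blocks to a clockless design style, the sketch below gives a very small behavioural model of the four-phase (return-to-zero) request/acknowledge handshake that asynchronous links commonly rely on. It only illustrates that general protocol; the function names and channel fields are assumptions, not the circuits developed in the work.

        # A minimal, hypothetical model of a four-phase (return-to-zero) handshake
        # between a sender and a receiver sharing req/ack/data wires.

        def four_phase_transfer(data, channel):
            """Sender: drive data, raise req, wait for ack, then return to zero."""
            channel["data"] = data
            channel["req"] = True          # phase 1: request asserted with valid data
            while not channel["ack"]:      # phase 2: receiver acknowledges
                receiver_step(channel)
            channel["req"] = False         # phase 3: request released
            while channel["ack"]:          # phase 4: acknowledge released, link idle
                receiver_step(channel)

        def receiver_step(channel):
            """Receiver: latch data on a new request, then mirror req on ack."""
            if channel["req"] and not channel["ack"]:
                channel["received"].append(channel["data"])
            channel["ack"] = channel["req"]

        link = {"req": False, "ack": False, "data": None, "received": []}
        for flit in ("hdr", "payload0", "payload1", "tail"):
            four_phase_transfer(flit, link)
        print(link["received"])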

    A Low-Power DSP Architecture for a Fully Implantable Cochlear Implant System-on-a-Chip.

    The National Science Foundation Wireless Integrated Microsystems (WIMS) Engineering Research Center at the University of Michigan developed Systems-on-a-Chip to achieve biomedical implant and environmental monitoring functionality within low-milliwatt power consumption and 1-2 cm³ volume. The focus of this work is implantable electronics for cochlear implants (CIs), surgically implanted devices that utilize existing nerve connections between the brain and inner ear in cases where degradation of the sensory hair cells in the cochlea has occurred. In the absence of functioning hair cells, a CI processes sound information and stimulates the underlying nerve cells with currents from implanted electrodes, enabling the patient to understand speech. As the brain of the WIMS CI, the WIMS microcontroller unit (MCU) delivers the communication, signal processing, and storage capabilities required to satisfy the aggressive goals set forth. The 16-bit MCU implements a custom instruction set architecture focusing on power-efficient execution by providing separate data and address register windows, multi-word arithmetic, eight addressing modes, and interrupt and subroutine support. Along with 32 KB of on-chip SRAM, a low-power 512-byte scratchpad memory is utilized by the WIMS custom compiler to obtain an average of 18% energy savings across benchmarks. A synthesizable dynamic frequency scaling circuit allows the chip to select a precision on-chip LC or ring oscillator and perform clock scaling to minimize power dissipation; it provides glitch-free, software-controlled frequency shifting in 100 ns and dissipates only 480 μW. A highly flexible and expandable 16-channel Continuous Interleaved Sampling Digital Signal Processor (DSP) is included as an MCU peripheral component. Modes are included to process data, stimulate through electrodes, and allow experimental stimulation or processing. The entire WIMS MCU occupies 9.18 mm² and consumes only 1.79 mW from 1.2 V in DSP mode. This is the lowest reported consumption for a cochlear DSP. Design methodologies were analyzed and a new top-down design flow is presented that encourages hardware and software co-design as well as cross-domain verification early in the design process. An O(n) technique for energy-per-instruction estimation, both pre- and post-silicon, is presented that achieves less than 4% error across benchmarks. This dissertation advances low-power system design while providing an improvement in hearing-recovery devices. (Ph.D. dissertation, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies; http://deepblue.lib.umich.edu/bitstream/2027.42/91488/1/emarsman_1.pd)
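
    For readers unfamiliar with the continuous interleaved sampling (CIS) strategy that the 16-channel DSP implements in hardware, the sketch below shows the usual signal path in highly simplified form: split the audio into bands, extract and compress each band's envelope, and stimulate the electrodes one at a time so pulses never overlap. The filter construction, compression law, and four-channel setup are assumptions for illustration, not the WIMS implementation.

        # A simplified, hypothetical sketch of CIS processing: band-split the audio,
        # extract each band's envelope, compress it, and pulse electrodes in sequence.
        import numpy as np

        FS = 16_000  # sample rate in Hz (assumed)
        BANDS = [(200, 800), (800, 1800), (1800, 3500), (3500, 7000)]  # Hz, invented

        def band_envelope(x, lo, hi):
            """Crude band envelope: FFT bandpass, then full-wave rectify + smooth."""
            spec = np.fft.rfft(x)
            freqs = np.fft.rfftfreq(len(x), 1 / FS)
            spec[(freqs < lo) | (freqs > hi)] = 0
            band = np.fft.irfft(spec, len(x))
            env = np.abs(band)
            kernel = np.ones(64) / 64            # ~4 ms moving average
            return np.convolve(env, kernel, mode="same")

        def cis_frame(x):
            """One CIS frame: per-channel compressed levels, stimulated in sequence."""
            levels = []
            for ch, (lo, hi) in enumerate(BANDS):
                env = band_envelope(x, lo, hi).mean()
                levels.append((ch, np.log1p(env)))   # logarithmic loudness compression
            return levels                            # electrodes pulsed one after another

        tone = np.sin(2 * np.pi * 1000 * np.arange(1024) / FS)   # 1 kHz test tone
        for ch, level in cis_frame(tone):
            print(f"electrode {ch}: level {level:.3f}")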

    High level synthesis of memory architectures


    Partial VLSI implementation of the architecture for reusable components (ARC)

    This work describes a novel VLSI implementation of the Architecture for Reusable Components (ARC) processor, using a Hardware Description Language (HDL). The main goal here is to achieve efficient execution of reusable software through proper hardware support. This involves the hard-wired implementation of each instruction designed for the ARC processor. Instructions are broken down into their logical functions, then modeled and simulated through the hierarchical design methods that HDL offers. The structural model of the processor has been developed and simulated. The purpose here has been to begin work on the design and implementation of the ARC processor. The instructions were built using HDL modules and then simulated using a logic simulator. The effect of internal propagation delays on the execution of the logic modules has been investigated. Changes in delay parameters have been applied to obtain correct logic transfer operations. The redundancy in the logic transfer operations has also been investigated to expose parallelism at the instruction-execution level.
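
    As a toy analogue of the delay study described above, the sketch below models each logic function with a fixed propagation delay and computes when an output settles as the latest input arrival plus the module delay. The gates, delay values, and two-gate datapath are invented for illustration and do not correspond to the ARC instruction logic.

        # A minimal, hypothetical delay-annotated evaluation of a small logic module:
        # each gate's output settles at the latest input arrival plus its delay.

        GATE_DELAY_NS = {"and": 1.2, "or": 1.0, "xor": 1.8}

        def evaluate(gate, inputs):
            """Return (value, settle_time_ns) for a gate fed by (value, time) pairs."""
            values = [v for v, _ in inputs]
            arrival = max(t for _, t in inputs)
            if gate == "and":
                out = all(values)
            elif gate == "or":
                out = any(values)
            else:  # xor
                out = (sum(values) % 2) == 1
            return out, arrival + GATE_DELAY_NS[gate]

        # Primary inputs arrive at t = 0 ns.
        a, b, c = (1, 0.0), (0, 0.0), (1, 0.0)
        n1 = evaluate("xor", [a, b])       # settles at 1.8 ns
        n2 = evaluate("and", [n1, c])      # settles at 1.8 + 1.2 = 3.0 ns
        print("result:", n2[0], "valid after", n2[1], "ns")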

    Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip

    The sustained demand for faster, more powerful chips has been met by the availability of chip manufacturing processes allowing for the integration of increasing numbers of computation units onto a single die. The resulting outcome, especially in the embedded domain, has often been called SYSTEM-ON-CHIP (SoC) or MULTI-PROCESSOR SYSTEM-ON-CHIP (MPSoC). MPSoC design brings to the foreground a large number of challenges, one of the most prominent of which is the design of the chip interconnection. With the number of on-chip blocks presently ranging in the tens, and quickly approaching the hundreds, the novel issue of how best to provide on-chip communication resources is clearly felt. NETWORKS-ON-CHIP (NoCs) are the most comprehensive and scalable answer to this design concern. By bringing large-scale networking concepts to the on-chip domain, they guarantee a structured answer to present and future communication requirements. The point-to-point connection and packet switching paradigms they involve are also of great help in minimizing wiring overhead and physical routing issues. However, as with any technology of recent inception, NoC design is still an evolving discipline. Several main areas of interest require deep investigation for NoCs to become viable solutions:
    • The design of the NoC architecture needs to strike the best tradeoff among performance, features, and the tight area and power constraints of the on-chip domain.
    • Simulation and verification infrastructure must be put in place to explore, validate, and optimize the NoC performance.
    • NoCs offer a huge design space, thanks to their extreme customizability in terms of topology and architectural parameters. Design tools are needed to prune this space and pick the best solutions.
    • Even more so given their global, distributed nature, it is essential to evaluate the physical implementation of NoCs to assess their suitability for next-generation designs and their area and power costs.
    This dissertation performs a design space exploration of network-on-chip architectures, in order to point out the trade-offs associated with the design of each individual network building block and with the design of the network topology overall. The design space exploration is preceded by a comparative analysis of state-of-the-art interconnect fabrics with themselves and with early network-on-chip prototypes. The ultimate objective is to point out the key advantages that NoC realizations provide with respect to state-of-the-art communication infrastructures and the challenges that lie ahead in order to make this new interconnect technology come true. Among the latter, technology-related challenges are emerging that call for dedicated design techniques at all levels of the design hierarchy; these include, in particular, leakage power dissipation and the containment of process variations and of their effects. The achievement of the above objectives was enabled by a NoC simulation environment for cycle-accurate modelling and simulation and by a back-end facility for the study of NoC physical implementation effects. Overall, all the results provided by this work have been validated on actual silicon layout.
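
    As a hedged, purely illustrative companion to the design-space exploration described above, the sketch below sweeps a few topology parameters, scores each point with a crude first-order power/latency model, and keeps the best-scoring candidates. Every coefficient in the model is an assumption, not a result from the dissertation, which relies on cycle-accurate simulation and physical back-end data instead.

        # A minimal, hypothetical design-space sweep: enumerate a few NoC parameter
        # combinations, score them with a toy analytical model, report the top three.
        from itertools import product

        def estimate(num_routers, flit_width_bits, pipeline_stages):
            """Toy first-order model: power grows with width and router count,
            zero-load latency grows with hop count and pipeline depth."""
            power_mw = 0.15 * num_routers * flit_width_bits / 32 * (1 + 0.2 * pipeline_stages)
            avg_hops = num_routers ** 0.5          # rough mesh-diameter proxy
            latency_cycles = avg_hops * (pipeline_stages + 1)
            return power_mw, latency_cycles

        candidates = []
        for routers, width, stages in product((4, 9, 16), (32, 64, 128), (1, 2, 3)):
            p, l = estimate(routers, width, stages)
            candidates.append((p * l, routers, width, stages, p, l))

        # Rank by a simple power*latency figure of merit and show the top three.
        for score, routers, width, stages, p, l in sorted(candidates)[:3]:
            print(f"{routers} routers, {width}-bit flits, {stages} stages: "
                  f"{p:.2f} mW, {l:.1f} cycles")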