473 research outputs found

    Accelerated Verilog Simulator Using Application Specific Microprocessor

    Get PDF
    Logic simulation is an important step in Very Large Scale Integration (VLSI) IC development. Advancement in Hardware Description Language (HDL) has made Verilog a widely adopted language used to model digital circuit and verification test bench. Electronic Design Automation (EDA) vendor provides software and hardwareassisted approaches to carry out simulations. However, software-based simulator is slow whereas hardware-assisted simulator does not offer the same simulation fidelity stipulated in Verilog. In this research project, a hardware-assisted Verilog simulator, VerCPU System, was proposed to address shortcomings in existing platforms. The simulator core is a custom designed application specific microprocessor specifically adapted to handle Verilog simulation. The microprocessor computes Verilog data in its native form while supporting event-driven parallelism directly to achieve speed supremacy. Being a compiled-code simulator, simulation fidelity compliancy is retained to offer the same result and visibility like the software-based solution. A functional system, VerCPU, was developed and prototyped on a Field Programmable Gate Array (FPGA) development board. This system was successfully verified and benchmarked against a software-based compiled-code simulator, i.e. Synopsys VCS®. VerCPU System can already achieve up to 6 times speed superiority with basic speed improvement techniques applied. The simulator had proven to be a viable alternate Verilog simulator to meet future simulation needs

    Formal Verification of an Iterative Low-Power x86 Floating-Point Multiplier with Redundant Feedback

    Full text link
    We present the formal verification of a low-power x86 floating-point multiplier. The multiplier operates iteratively and feeds back intermediate results in redundant representation. It supports x87 and SSE instructions in various precisions and can block the issuing of new instructions. The design has been optimized for low-power operation and has not been constrained by the formal verification effort. Additional improvements for the implementation were identified through formal verification. The formal verification of the design also incorporates the implementation of clock-gating and control logic. The core of the verification effort was based on ACL2 theorem proving. Additionally, model checking has been used to verify some properties of the floating-point scheduler that are relevant for the correct operation of the unit.Comment: In Proceedings ACL2 2011, arXiv:1110.447

    Heracles: A Tool for Fast RTL-Based Design Space Exploration of Multicore Processors

    Get PDF
    This paper presents Heracles, an open-source, functional, parameterized, synthesizable multicore system toolkit. Such a multi/many-core design platform is a powerful and versatile research and teaching tool for architectural exploration and hardware-software co-design. The Heracles toolkit comprises the soft hardware (HDL) modules, application compiler, and graphical user interface. It is designed with a high degree of modularity to support fast exploration of future multicore processors of di erent topologies, routing schemes, processing elements (cores), and memory system organizations. It is a component-based framework with parameterized interfaces and strong emphasis on module reusability. The compiler toolchain is used to map C or C++ based applications onto the processing units. The GUI allows the user to quickly con gure and launch a system instance for easy factorial development and evaluation. Hardware modules are implemented in synthesizable Verilog and are FPGA platform independent. The Heracles tool is freely available under the open-source MIT license at: http://projects.csail.mit.edu/heracle

    A Structured Design Methodology for High Performance VLSI Arrays

    Get PDF
    abstract: The geometric growth in the integrated circuit technology due to transistor scaling also with system-on-chip design strategy, the complexity of the integrated circuit has increased manifold. Short time to market with high reliability and performance is one of the most competitive challenges. Both custom and ASIC design methodologies have evolved over the time to cope with this but the high manual labor in custom and statistic design in ASIC are still causes of concern. This work proposes a new circuit design strategy that focuses mostly on arrayed structures like TLB, RF, Cache, IPCAM etc. that reduces the manual effort to a great extent and also makes the design regular, repetitive still achieving high performance. The method proposes making the complete design custom schematic but using the standard cells. This requires adding some custom cells to the already exhaustive library to optimize the design for performance. Once schematic is finalized, the designer places these standard cells in a spreadsheet, placing closely the cells in the critical paths. A Perl script then generates Cadence Encounter compatible placement file. The design is then routed in Encounter. Since designer is the best judge of the circuit architecture, placement by the designer will allow achieve most optimal design. Several designs like IPCAM, issue logic, TLB, RF and Cache designs were carried out and the performance were compared against the fully custom and ASIC flow. The TLB, RF and Cache were the part of the HEMES microprocessor.Dissertation/ThesisPh.D. Electrical Engineering 201

    SPICE²: A Spatial, Parallel Architecture for Accelerating the Spice Circuit Simulator

    Get PDF
    Spatial processing of sparse, irregular floating-point computation using a single FPGA enables up to an order of magnitude speedup (mean 2.8X speedup) over a conventional microprocessor for the SPICE circuit simulator. We deliver this speedup using a hybrid parallel architecture that spatially implements the heterogeneous forms of parallelism available in SPICE. We decompose SPICE into its three constituent phases: Model-Evaluation, Sparse Matrix-Solve, and Iteration Control and parallelize each phase independently. We exploit data-parallel device evaluations in the Model-Evaluation phase, sparse dataflow parallelism in the Sparse Matrix-Solve phase and compose the complete design in streaming fashion. We name our parallel architecture SPICE²: Spatial Processors Interconnected for Concurrent Execution for accelerating the SPICE circuit simulator. We program the parallel architecture with a high-level, domain-specific framework that identifies, exposes and exploits parallelism available in the SPICE circuit simulator. This design is optimized with an auto-tuner that can scale the design to use larger FPGA capacities without expert intervention and can even target other parallel architectures with the assistance of automated code-generation. This FPGA architecture is able to outperform conventional processors due to a combination of factors including high utilization of statically-scheduled resources, low-overhead dataflow scheduling of fine-grained tasks, and overlapped processing of the control algorithms. We demonstrate that we can independently accelerate Model-Evaluation by a mean factor of 6.5X(1.4--23X) across a range of non-linear device models and Matrix-Solve by 2.4X(0.6--13X) across various benchmark matrices while delivering a mean combined speedup of 2.8X(0.2--11X) for the two together when comparing a Xilinx Virtex-6 LX760 (40nm) with an Intel Core i7 965 (45nm). With our high-level framework, we can also accelerate Single-Precision Model-Evaluation on NVIDIA GPUs, ATI GPUs, IBM Cell, and Sun Niagara 2 architectures. We expect approaches based on exploiting spatial parallelism to become important as frequency scaling slows down and modern processing architectures turn to parallelism (\eg multi-core, GPUs) due to constraints of power consumption. This thesis shows how to express, exploit and optimize spatial parallelism for an important class of problems that are challenging to parallelize.</p

    Filling the Assurance Gap on Complex Electronics

    Get PDF
    Many of the methods used to develop software bare a close resemblance to Complex Electronics (CE) development. CE are now programmed to perform tasks that were previously handled by software, such as communication protocols. For example, the James Webb Space Telescope will use Field Programmable Gate Arrays (FPGAs), which can have over a million logic gates, to send telemetry. System-on-chip (SoC) devices, another type of complex electronics, can combine a microprocessor, input and output channels, and sometimes an FPGA for programmability. With this increased intricacy, the possibility of software-like bugs such as incorrect design, logic, and unexpected interactions within the logic is great. Since CE devices are obscuring the hardware/software boundary, mature software methodologies have been proposed, with slight modifications, to develop these devices. By using standardized S/W Engineering methods such as checklists, missing requirements and bugs can be detected earlier in the development cycle, thus creating a development process for CE that can be easily maintained and configurable based on the device used

    A Survey on IEEE 1588 Implementation for RISC-V Low-Power Embedded Devices

    Get PDF
    IEEE 1588, also known as the Precision Time Protocol (PTP), is a standard protocol for clock synchronization in distributed systems. While it is not architecture-specific, implementing IEEE 1588 on Reduced Instruction Set Computer-V (RISC-V) low-power embedded devices demands considering the system requirements and available resources. This paper explores various approaches and techniques to achieve accurate time synchronization in such instruments. The analysis covers software and hardware implementations, discussing each method’s challenges, benefits, and trade-offs. By examining the state-of-the-art in this field, this paper provides valuable insights and guidance for researchers and engineers working on time-critical applications in RISC-V-based embedded systems, aiding in selecting the most-suitable stack for their designs.This work was partially supported by the ECSEL Joint Undertaking in the H2020 project IMOCO4.E, grant agreement No.10100731, and by the Basque Government within the fund for research groups of the Basque University System IT1440-22 and KK-2023/00015

    Digital-to-Analog Converter Interface for Computer Assisted Biologically Inspired Systems

    Get PDF
    In today\u27s integrated circuit technology, system interfaces play an important role of enabling fast, reliable data communications. A key feature of this work is the exploration and development of ultra-low power data converters. Data converters are present in some form in almost all mixed-signal systems; in particular, digital-to-analog converters present the opportunity for digitally controlled analog signal sources. Such signal sources are used in a variety of applications such as neuromorphic systems and analog signal processing. Multi-dimensional systems, such as biologically inspired neuromorphic systems, require vectors of analog signals. To use a microprocessor to control these analog systems, we must ultimately convert the digital control signal to an analog control signal and deliver it to the system. Integrating such capabilities of a converter on chip can yield significant power and chip area constraints. Special attention is paid to the power efficiency of the data converter, the data converter design discussed in this thesis yields the lowest power consumption to date. The need for a converter with these properties leads us to the concept of a scalable array of power-efficient digital-to-analog converters; the channels of which are time-domain multiplexed so that chip-area is minimized while preserving performance. To take further advantage of microprocessor capabilities, an analog-to- digital design is proposed to return the analog system\u27s outputs to the microprocessor in a digital form. A current-steering digital-to-analog converter was chosen as a candidate for the conversion process because of its natural speed and voltage-to-current translation properties. This choice is nevertheless unusual, because current-steering digital- to-analog converters have a reputation for high performance with high power consumption. A time domain multiplexing scheme is presented such that a digital data set of any size is synchronously multiplexed through a finite array of converters, minimizing the total area and power consumption. I demonstrate the suitability of current-steering digital-to-analog converters for ultra low-power operation with a proof-of-concept design in a widely available 130 nm CMOS technology. In statistical simulation, the proposed digital-to-analog converter was capable of 8-bit, 100 kSps operation while consuming 231 nW of power from a 1 V supply

    Doctor of Philosophy

    Get PDF
    dissertationAdvancements in process technology and circuit techniques have enabled the creation of small chemical microsystems for use in a wide variety of biomedical and sensing applications. For applications requiring a small microsystem, many components can be integrated onto a single chip. This dissertation presents many low-power circuits, digital and analog, integrated onto a single chip called the Utah Microcontroller. To guide the design decisions for each of these components, two specific microsystems have been selected as target applications: a Smart Intravaginal Ring (S-IVR) and an NO releasing catheter. Both of these applications share the challenging requirements of integrating a large variety of low-power mixed-signal circuitry onto a single chip. These applications represent the requirements of a broad variety of small low-power sensing systems. In the course of the development of the Utah Microcontroller, several unique and significant contributions were made. A central component of the Utah Microcontroller is the WIMS Microprocessor, which incorporates a low-power feature called a scratchpad memory. For the first time, an analysis of scaling trends projected that scratchpad memories will continue to save power for the foreseeable future. This conclusion was bolstered by measured data from a fabricated microcontroller. In a 32 nm version of the WIMS Microprocessor, the scratchpad memory is projected to save ~10-30% of memory access energy depending upon the characteristics of the embedded program. Close examination of application requirements informed the design of an analog-to-digital converter, and a unique single-opamp buffered charge scaling DAC was developed to minimize power consumption. The opamp was designed to simultaneously meet the varied demands of many chip components to maximize circuit reuse. Each of these components are functional, have been integrated, fabricated, and tested. This dissertation successfully demonstrates that the needs of emerging small low-power microsystems can be met in advanced process nodes with the incorporation of low-power circuit techniques and design choices driven by application requirements
    corecore