113 research outputs found
Benchmark methodologies for the optimized physical synthesis of RISC-V microprocessors
As technology continues to advance and chip sizes shrink, the complexity and design time required for integrated circuits have significantly increased. To address these challenges, Electronic Design Automation (EDA) tools have been introduced to streamline the design flow. These tools offer various methodologies and options to optimize power, performance, and chip area. However, selecting the most suitable methods from these options can be challenging, as they may lead to trade-offs among power, performance, and area. While architectural and Register Transfer Level (RTL) optimizations have been extensively studied in existing literature, the impact of optimization methods available in EDA tools on performance has not been thoroughly researched. This thesis aims to optimize a semiconductor processor through EDA tools within the physical synthesis domain to achieve increased performance while maintaining a balance between power efficiency and area utilization. By leveraging floorplanning tools and carefully selecting technology libraries and optimization options, the CV32E40P open-source processor is subjected to various floorplans to analyze their impact on chip performance. The employed techniques, including multibit components prefer option, multiplexer tree prefer option, identification and exclusion of problematic cells, and placement blockages, lead to significant improvements in cell density, congestion mitigation, and timing. The optimized synthesis results demonstrate a 71\% enhancement in chip design performance without a substantial increase in area, showcasing the effectiveness of these techniques in improving large-scale integrated circuits' performance, efficiency, and manufacturability. By exploring and implementing the available options in EDA tools, this study demonstrates how the processor's performance can be significantly improved while maintaining a balanced and efficient chip design. The findings contribute valuable insights to the field of electronic design automation, offering guidance to designers in selecting suitable methodologies for optimizing processors and other integrated circuits
Methodology and Ecosystem for the Design of a Complex Network ASIC
Performance of HPC systems has risen steadily. While the 10 Petaflop/s barrier has been breached in the year 2011 the next large step into the exascale era is expected sometime between the years 2018 and 2020. The EXTOLL project will be an integral part in this venture. Originally designed as a research project on FPGA basis it will make the transition to an ASIC to improve its already excelling performance even further. This transition poses many challenges that will be presented in this thesis. Nowadays, it is not enough to look only at single components in a system. EXTOLL is part of complex ecosystem which must be optimized overall since everything is tightly interwoven and disregarding some aspects can cause the whole system either to work with limited performance or even to fail.
This thesis examines four different aspects in the design hierarchy and proposes efficient solutions or improvements for each of them. At first it takes a look at the design implementation and the differences between FPGA and ASIC design. It introduces a methodology to equip all on-chip memory with ECC logic automatically without the user’s input and in a transparent way so that the underlying code that uses the memory does not have to be changed. In the next step the floorplanning process is analyzed and an iterative solution is worked out based on physical and logical constraints of the EXTOLL design. Besides, a work flow for collaborative design is presented that allows multiple users to work on the design concurrently. The third part concentrates on the high-speed signal path from the chip to the connector and how it is affected by technological limitations. All constraints are analyzed and a package layout for the EXTOLL chip is proposed that is seen as the optimal solution. The last part develops a cost model for wafer and package level test and raises technological concerns that will affect the testing methodology. In order to run testing internally it proposes the development of a stand-alone test platform that is able to test packaged EXTOLL chips in every aspect
Optimising and evaluating designs for reconfigurable hardware
Growing demand for computational performance, and the rising cost for chip design and
manufacturing make reconfigurable hardware increasingly attractive for digital system implementation.
Reconfigurable hardware, such as field-programmable gate arrays (FPGAs),
can deliver performance through parallelism while also providing flexibility to enable
application builders to reconfigure them. However, reconfigurable systems, particularly
those involving run-time reconfiguration, are often developed in an ad-hoc manner. Such
an approach usually results in low designer productivity and can lead to inefficient designs.
This thesis covers three main achievements that address this situation. The first
achievement is a model that captures design parameters of reconfigurable hardware and
performance parameters of a given application domain. This model supports optimisations
for several design metrics such as performance, area, and power consumption. The second
achievement is a technique that enhances the relocatability of bitstreams for reconfigurable
devices, taking into account heterogeneous resources. This method increases the flexibility
of modules represented by these bitstreams while reducing configuration storage size and
design compilation time. The third achievement is a technique to characterise the power
consumption of FPGAs in different activity modes. This technique includes the evaluation
of standby power and dedicated low-power modes, which are crucial in meeting the
requirements for battery-based mobile devices
Creation and detection of hardware trojans using non-invasive off-the-shelf technologies
As a result of the globalisation of the semiconductor design and fabrication processes, integrated circuits are becoming increasingly vulnerable to malicious attacks. The most concerning threats are hardware trojans. A hardware trojan is a malicious inclusion or alteration to the existing design of an integrated circuit, with the possible effects ranging from leakage of sensitive information to the complete destruction of the integrated circuit itself. While the majority of existing detection schemes focus on test-time, they all require expensive methodologies to detect hardware trojans. Off-the-shelf approaches have often been overlooked due to limited hardware resources and detection accuracy. With the advances in technologies and the democratisation of open-source hardware, however, these tools enable the detection of hardware trojans at reduced costs during or after production. In this manuscript, a hardware trojan is created and emulated on a consumer FPGA board. The experiments to detect the trojan in a dormant and active state are made using off-the-shelf technologies taking advantage of different techniques such as Power Analysis Reports, Side Channel Analysis and Thermal Measurements. Furthermore, multiple attempts to detect the trojan are demonstrated and benchmarked. Our simulations result in a state-of-the-art methodology to accurately detect the trojan in both dormant and active states using off-the-shelf hardware
Recommended from our members
EDA design for Microscale Modular Assembled ASIC (M2A2) circuits
As the semiconductor industry has driven down the minimum feature size to well below 50nm, the mask cost to make devices has skyrocketed. The cost for a full set of masks is estimated to be about 2M for 65nm lithography nodes. According to some estimates, mask writing time goes up as a power of five as feature sizes are decreased below 50nm. In addition, higher complexity of large designs increases the number of design re-spins. The above two factors lead to considerable increase in the nonrecurring engineering cost (NRE) for standard cell ASICs, which has become prohibitively expensive for low to mid volume applications. Field programmable gate array (FPGAs) offer an acceptable solution for fast prototyping and ultra-low volume applications, but are generally not seen as a replacement for ASICs because of their highly inefficient space utilization, lower performance/speed and high power consumption. This is particularly the case as mobility has driven expectations for small form factor and low power consumption. In this work, a new type of ASICs named as Microscale Modular Assembled ASIC (M2A2) is proposed. This technology is a novel application of the high-speed, precision assembly technique for fabrication of ASICs using a limited number of mass-produced feedstock logic circuits. The idea is to share the mask cost for sub-100nm feature sizes across a large number of ASIC designs, decreasing the NRE for individual designs. The concept of constructing ASICs using repeating logic elements is based on previous works where it has been shown that ASICs made of via/metal configured structured elements can achieve space utilization and performance comparable to cell based ASICs. However, in the proposed technique, we provide significantly more choice in the transistor layer, in terms of feedstock types and their configuration. This thesis document deals with the electronic design automation (EDA) design for microscale modular assembled ASIC based circuits. The document discusses the design of feedstock cells, generation of feedstock preplaced design, generation of design collaterals to support M2A2 EDA flow, and front end M2A2 synthesis flow to meet the required functionality of design and achieve optimal quality of results (QoR) metrics in terms of circuit performance/speed, power and areaElectrical and Computer Engineerin
Graphics Processing Unit-Based Computer-Aided Design Algorithms for Electronic Design Automation
The electronic design automation (EDA) tools are a specific set of software that play important roles in modern integrated circuit (IC) design. These software automate the design processes of IC with various stages. Among these stages, two important EDA design tools are the focus of this research: floorplanning and global routing. Specifically, the goal of this study is to parallelize these two tools such that their execution time can be significantly shortened on modern multi-core and graphics processing unit (GPU) architectures. The GPU hardware is a massively parallel architecture, enabling thousands of independent threads to execute concurrently. Although a small set of EDA tools can benefit from using GPU to accelerate their speed, most algorithms in this field are designed with the single-core paradigm in mind. The floorplanning and global routing algorithms are among the latter, and difficult to render any speedup on the GPU due to their inherent sequential nature.
This work parallelizes the floorplanning and global routing algorithm through a novel approach and results in significant speedups for both tools implemented on the GPU hardware. Specifically, with a complete overhaul of solution space and design space exploration, a GPU-based floorplanning algorithm is able to render 4-166X speedup, while achieving similar or improved solutions compared with the sequential algorithm. The GPU-based global routing algorithm is shown to achieve significant speedup against existing state-of-the-art routers, while delivering competitive solution quality. Importantly, this parallel model for global routing renders a stable solution that is independent from the level of parallelism. In summary, this research has shown that through a design paradigm overhaul, sequential algorithms can also benefit from the massively parallel architecture. The findings of this study have a positive impact on the efficiency and design quality of modern EDA design flow
Design, Extraction, and Optimization Tool Flows and Methodologies for Homogeneous and Heterogeneous Multi-Chip 2.5D Systems
Chip and packaging industries are making significant progress in 2.5D design as a result of increasing popularity of their application. In advanced high-density 2.5D packages, package redistribution layers become similar to chip Back-End-of-Line routing layers, and the gap between them scales down with pin density improvement. Chiplet-package interactions become significant and severely affect system performance and reliability. Moreover, 2.5D integration offers opportunities to apply novel design techniques. The traditional die-by-die design approach neither carefully considers these interactions nor fully exploits the cross-boundary design opportunities.
This thesis presents chiplet-package cross-boundary design, extraction, analysis, and optimization tool flows and methodologies for high-density 2.5D packaging technologies. A holistic flow is presented that can capture all parasitics from chiplets and the package and improve system performance through iterative optimizations. Several design techniques are demonstrated for agile development and quick turn-around time. To validate the flow in silicon, a chip was taped out and studied in TSMC 65nm technology. As the holistic flow cannot handle heterogeneous technologies, in-context flows are presented. Three different flavors of the in-context flow are presented, which offer trade-offs between scalability and accuracy in heterogeneous 2.5D system designs. Inductance is an inseparable part of a package design. A holistic flow is presented that takes package inductance into account in timing analysis and optimization steps. Custom CAD tools are developed to make these flows compatible with the industry standard tools and the foundry model. To prove the effectiveness of the flows several design cases of an ARM Cortex-M0 are implemented for comparitive study
Dynamically and partially reconfigurable hardware architectures for high performance microarray bioinformatics data analysis
The field of Bioinformatics and Computational Biology (BCB) is a multidisciplinary field
that has emerged due to the computational demands of current state-of-the-art biotechnology.
BCB deals with the storage, organization, retrieval, and analysis of biological datasets,
which have grown in size and complexity in recent years especially after the completion of
the human genome project. The advent of Microarray technology in the 1990s has resulted in
the new concept of high throughput experiment, which is a biotechnology that measures the
gene expression profiles of thousands of genes simultaneously. As such, Microarray requires
high computational power to extract the biological relevance from its high dimensional data.
Current general purpose processors (GPPs) has been unable to keep-up with the increasing
computational demands of Microarrays and reached a limit in terms of clock speed.
Consequently, Field Programmable Gate Arrays (FPGAs) have been proposed as a low
power viable solution to overcome the computational limitations of GPPs and other methods.
The research presented in this thesis harnesses current state-of-the-art FPGAs and tools to
accelerate some of the most widely used data mining methods used for the analysis of
Microarray data in an effort to investigate the viability of the technology as an efficient, low
power, and economic solution for the analysis of Microarray data. Three widely used
methods have been selected for the FPGA implementations: one is the un-supervised Kmeans
clustering algorithm, while the other two are supervised classification methods,
namely, the K-Nearest Neighbour (K-NN) and Support Vector Machines (SVM). These
methods are thought to benefit from parallel implementation. This thesis presents detailed
designs and implementations of these three BCB applications on FPGA captured in Verilog
HDL, whose performance are compared with equivalent implementations running on GPPs.
In addition to acceleration, the benefits of current dynamic partial reconfiguration (DPR)
capability of modern Xilinx’ FPGAs are investigated with reference to the aforementioned
data mining methods.
Implementing K-means clustering on FPGA using non-DPR design flow has
outperformed equivalent implementations in GPP and GPU in terms of speed-up by two
orders and one order of magnitude, respectively; while being eight times more power
efficient than GPP and four times more than a GPU implementation. As for the energy
efficiency, the FPGA implementation was 615 times more energy efficient than GPPs, and 31 times more than GPUs. Over and above, the FPGA implementation outperformed the
GPP and GPU implementations in terms of speed-up as the dimensionality of the Microarray
data increases. Additionally, the DPR implementations of the K-means clustering have
shown speed-up in partial reconfiguration time of ~5x and 17x over full chip reconfiguration
for single-core and eight-core implementations, respectively.
Two architectures of the K-NN classifier have been implemented on FPGA, namely, A1
and A2. The K-NN implementation based on A1 architecture achieved a speed-up of ~76x
over an equivalent GPP implementation whereas the A2 architecture achieved ~68x speedup.
Furthermore, the FPGA implementation outperformed the equivalent GPP
implementation when the dimensionality of data was increased. In addition, The DPR
implementations of the K-NN classifier have achieved speed-ups in reconfiguration time
between ~4x to 10x over full chip reconfiguration when reconfiguring portion of the
classifier or the complete classifier.
Similar to K-NN, two architectures of the SVM classifier were implemented on FPGA
whereby the former outperformed an equivalent GPP implementation by ~61x and the latter
by ~49x. As for the DPR implementation of the SVM classifier, it has shown a speed-up of
~8x in reconfiguration time when reconfiguring the complete core or when exchanging it
with a K-NN core forming a multi-classifier.
The aforementioned implementations clearly show FPGAs to be an efficacious, efficient
and economic solution for bioinformatics Microarrays data analysis
- …