1,605 research outputs found

    Optimization of Cell-Aware Test

    Benchmark methodologies for the optimized physical synthesis of RISC-V microprocessors

    As technology continues to advance and chip sizes shrink, the complexity and design time required for integrated circuits have significantly increased. To address these challenges, Electronic Design Automation (EDA) tools have been introduced to streamline the design flow. These tools offer various methodologies and options to optimize power, performance, and chip area. However, selecting the most suitable methods from these options can be challenging, as they may lead to trade-offs among power, performance, and area. While architectural and Register Transfer Level (RTL) optimizations have been extensively studied in existing literature, the impact of optimization methods available in EDA tools on performance has not been thoroughly researched. This thesis aims to optimize a semiconductor processor through EDA tools within the physical synthesis domain to achieve increased performance while maintaining a balance between power efficiency and area utilization. By leveraging floorplanning tools and carefully selecting technology libraries and optimization options, the CV32E40P open-source processor is subjected to various floorplans to analyze their impact on chip performance. The employed techniques, including the multibit components prefer option, the multiplexer tree prefer option, identification and exclusion of problematic cells, and placement blockages, lead to significant improvements in cell density, congestion mitigation, and timing. The optimized synthesis results demonstrate a 71% enhancement in chip design performance without a substantial increase in area, showcasing the effectiveness of these techniques in improving large-scale integrated circuits' performance, efficiency, and manufacturability. By exploring and implementing the available options in EDA tools, this study demonstrates how the processor's performance can be significantly improved while maintaining a balanced and efficient chip design.
The findings contribute valuable insights to the field of electronic design automation, offering guidance to designers in selecting suitable methodologies for optimizing processors and other integrated circuits.

    Database System Acceleration on FPGAs

    Relational database systems provide various services and applications with an efficient means for storing, processing, and retrieving their data. The performance of these systems has a direct impact on the quality of service of the applications that rely on them. Therefore, it is crucial that database systems are able to adapt and grow in tandem with the demands of these applications, ensuring that their performance scales accordingly. In the past, Moore's law and algorithmic advancements have been sufficient to meet these demands. However, with the slowdown of Moore's law, researchers have begun exploring alternative methods, such as application-specific technologies, to satisfy the more challenging performance requirements. One such technology is field-programmable gate arrays (FPGAs), which provide ideal platforms for developing and running custom architectures for accelerating database systems. The goal of this thesis is to develop a domain-specific architecture that can enhance the performance of in-memory database systems when executing analytical queries. Our research is guided by a combination of academic and industrial requirements that seek to strike a balance between generality and performance. The former ensures that our platform can be used to process a diverse range of workloads, while the latter makes it an attractive solution for high-performance use cases. Throughout this thesis, we present the development of a system-on-chip for database system acceleration that meets our requirements. The resulting architecture, called CbMSMK, is capable of processing the projection, sort, aggregation, and equi-join database operators and can also run some complex TPC-H queries. CbMSMK employs a shared sort-merge pipeline for executing all these operators, which results in an efficient use of FPGA resources. This approach enables the instantiation of multiple acceleration cores on the FPGA, allowing it to serve multiple clients simultaneously. 
CbMSMK can process both arbitrarily deep and wide tables efficiently. The former is achieved through the use of the sort-merge algorithm which utilizes the FPGA RAM for buffering intermediate sort results. The latter is achieved through the use of KeRRaS, a novel variant of the forward radix sort algorithm introduced in this thesis. KeRRaS allows CbMSMK to process a table a few columns at a time, incrementally generating the final result through multiple iterations. Given that acceleration is a key objective of our work, CbMSMK benefits from many performance optimizations. For instance, multi-way merging is employed to reduce the number of merge passes required for the execution of the sort-merge algorithm, thus improving the performance of all our pipeline-breaking operators. Another example is our in-depth analysis of early aggregation, which led to the development of a novel cache-based algorithm that significantly enhances aggregation performance. Our experiments demonstrate that CbMSMK performs on average 5 times faster than the state-of-the-art CPU-based database management system MonetDB.

Table of contents:
I Database Systems & FPGAs
1 INTRODUCTION 1.1 Databases & the Importance of Performance 1.2 Accelerators & FPGAs 1.3 Requirements 1.4 Outline & Summary of Contributions
2 BACKGROUND ON DATABASE SYSTEMS 2.1 Databases 2.1.1 Storage Model 2.1.2 Storage Medium 2.2 Database Operators 2.2.1 Projection 2.2.2 Filter 2.2.3 Sort 2.2.4 Aggregation 2.2.5 Join 2.2.6 Operator Classification 2.3 Database Queries 2.4 Impact of Acceleration
3 BACKGROUND ON FPGAS 3.1 FPGA 3.1.1 Logic Element 3.1.2 Block RAM (BRAM) 3.1.3 Digital Signal Processor (DSP) 3.1.4 IO Element 3.1.5 Programmable Interconnect 3.2 FPGA Design Flow 3.2.1 Specifications 3.2.2 RTL Description 3.2.3 Verification 3.2.4 Synthesis, Mapping, Placement, and Routing 3.2.5 Timing Analysis 3.2.6 Bitstream Generation and FPGA Programming 3.3 Implementation Quality Metrics 3.4 FPGA Cards 3.5 Benefits of Using FPGAs 3.6 Challenges of Using FPGAs
4 RELATED WORK 4.1 Summary of Related Work 4.2 Platform Type 4.2.1 Accelerator Card 4.2.2 Coprocessor 4.2.3 Smart Storage 4.2.4 Network Processor 4.3 Implementation 4.3.1 Loop-based Implementation 4.3.2 Sort-based Implementation 4.3.3 Hash-based Implementation 4.3.4 Mixed Implementation 4.4 A Note on Quantitative Performance Comparisons
II Cache-Based Morphing Sort-Merge with KeRRaS (CbMSMK)
5 OBJECTIVES AND ARCHITECTURE OVERVIEW 5.1 From Requirements to Objectives 5.2 Architecture Overview 5.3 Outline of Part II
6 COMPARATIVE ANALYSIS OF OPENCL AND RTL FOR SORT-MERGE PRIMITIVES ON FPGAS 6.1 Programming FPGAs 6.2 Related Work 6.3 Architecture 6.3.1 Global Architecture 6.3.2 Sorter Architecture 6.3.3 Merger Architecture 6.3.4 Scalability and Resource Adaptability 6.4 Experiments 6.4.1 OpenCL Sort-Merge Implementation 6.4.2 RTL Sorters 6.4.3 RTL Mergers 6.4.4 Hybrid OpenCL-RTL Sort-Merge Implementation 6.5 Summary & Discussion
7 RESOURCE-EFFICIENT ACCELERATION OF PIPELINE-BREAKING DATABASE OPERATORS ON FPGAS 7.1 The Case for Resource Efficiency 7.2 Related Work 7.3 Architecture 7.3.1 Sorters 7.3.2 Sort-Network 7.3.3 X:Y Mergers 7.3.4 Merge-Network 7.3.5 Join Materialiser (JoinMat) 7.4 Experiments 7.4.1 Experimental Setup 7.4.2 Implementation Description & Tuning 7.4.3 Sort Benchmarks 7.4.4 Aggregation Benchmarks 7.4.5 Join Benchmarks 7. Summary
8 KERRAS: COLUMN-ORIENTED WIDE TABLE PROCESSING ON FPGAS 8.1 The Scope of Database System Accelerators 8.2 Related Work 8.3 Key-Reduce Radix Sort (KeRRaS) 8.3.1 Time Complexity 8.3.2 Space Complexity (Memory Utilization) 8.3.3 Discussion and Optimizations 8.4 Architecture 8.4.1 MSM 8.4.2 MSMK: Extending MSM with KeRRaS 8.4.3 Payload, Aggregation and Join Processing 8.4.4 Limitations 8.5 Experiments 8.5.1 Experimental Setup 8.5.2 Datasets 8.5.3 MSMK vs. MSM 8.5.4 Payload-Less Benchmarks 8.5.5 Payload-Based Benchmarks 8.5.6 Flexibility 8.6 Summary
9 A STUDY OF EARLY AGGREGATION IN DATABASE QUERY PROCESSING ON FPGAS 9.1 Early Aggregation 9.2 Background & Related Work 9.2.1 Sort-Based Early Aggregation 9.2.2 Cache-Based Early Aggregation 9.3 Simulations 9.3.1 Datasets 9.3.2 Metrics 9.3.3 Sort-Based Versus Cache-Based Early Aggregation 9.3.4 Comparison of Set-Associative Caches 9.3.5 Comparison of Cache Structures 9.3.6 Comparison of Replacement Policies 9.3.7 Cache Selection Methodology 9.4 Cache System Architecture 9.4.1 Window Aggregator 9.4.2 Compressor & Hasher 9.4.3 Collision Detector 9.4.4 Collision Resolver 9.4.5 Cache 9.5 Experiments 9.5.1 Experimental Setup 9.5.2 Resource Utilization and Parameter Tuning 9.5.3 Datasets 9.5.4 Benchmarks on Synthetic Data 9.5.5 Benchmarks on Real Data 9.6 Summary
10 THE FULL PICTURE 10.1 System Architecture 10.2 Benchmarks 10.3 Meeting the Objectives
III Conclusion
11 SUMMARY AND OUTLOOK ON FUTURE RESEARCH 11.1 Summary 11.2 Future Work
BIBLIOGRAPHY
LIST OF FIGURES
LIST OF TABLES
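The abstract's remark that multi-way merging reduces the number of merge passes can be illustrated with a small software sketch. This is not the thesis's FPGA design; the function names, the run length, and the fan-in values below are hypothetical and chosen only for illustration. It assumes the input is first split into short sorted runs, and that each pass merges groups of `fan_in` runs.

```python
import heapq
import math


def merge_passes(num_runs: int, fan_in: int) -> int:
    """Count the passes needed to reduce num_runs sorted runs to one,
    when every pass merges up to fan_in runs at a time."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    passes = 0
    while num_runs > 1:
        num_runs = math.ceil(num_runs / fan_in)
        passes += 1
    return passes


def sort_merge(values, run_length=4, fan_in=4):
    """Software sort-merge sketch: build sorted runs, then merge them
    fan_in at a time. Returns (sorted_values, passes_used)."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    # Initial runs: short chunks sorted independently (hypothetical run length).
    runs = [sorted(values[i:i + run_length])
            for i in range(0, len(values), run_length)]
    passes = 0
    while len(runs) > 1:
        # One pass: each group of fan_in runs becomes a single longer run.
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
        passes += 1
    return (runs[0] if runs else []), passes


if __name__ == "__main__":
    data = list(range(63, -1, -1))  # 64 values in descending order
    for k in (2, 4, 16):
        result, used = sort_merge(data, run_length=4, fan_in=k)
        print(k, used, result == sorted(data))
```

With 16 initial runs, a 2-way merge needs four passes while a 4-way merge needs two, which is the effect the abstract attributes to multi-way merging.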

    Robust, Energy-Efficient, and Scalable Indoor Localization with Ultra-Wideband Technology

    Ultra-wideband (UWB) technology has been rediscovered in recent years for its potential to provide centimeter-level accuracy in GNSS-denied environments. The large-scale adoption of UWB chipsets in smartphones places demanding requirements on the energy efficiency, robustness, scalability, and cross-device compatibility of UWB localization systems. This thesis investigates, characterizes, and proposes several solutions for these pressing concerns. First, we investigate the impact of different UWB device architectures on the energy efficiency, accuracy, and cross-platform compatibility of UWB localization systems. The thesis provides the first comprehensive comparison between the two types of physical interfaces (PHYs) defined in the IEEE 802.15.4 standard: with low and high pulse repetition frequency (LRP and HRP, respectively). In the comparison, we focus not only on the ranging/localization accuracy but also on the energy efficiency of the PHYs. We found that the LRP PHY consumes between 6.4 and 100 times less energy than the HRP PHY in the evaluated devices. On the other hand, distance measurements acquired with the HRP devices had a 1.23 to 2 times lower standard deviation than those acquired with the LRP devices. Therefore, the HRP PHY might be more suitable for applications with high-accuracy constraints than the LRP PHY. The impact of different UWB PHYs also extends to the application layer. We found that ranging or localization error-mitigation techniques are frequently trained and tested on only one device and would likely not generalize to different platforms. To this end, we identified four challenges in developing platform-independent error-mitigation techniques in UWB localization, which can guide future research in this direction. Besides the cross-platform compatibility, localization error-mitigation techniques raise another concern: most of them rely on extensive data sets for training and testing.
Such data sets are difficult and expensive to collect and are often representative only of the precise environment they were collected in. We propose a method to detect and mitigate non-line-of-sight (NLOS) measurements that does not require any manually collected data sets. Instead, the proposed method automatically labels incoming distance measurements based on their distance residuals during the localization process. The proposed detection and mitigation method reduces, on average, the mean and standard deviation of localization errors by 2.2 and 5.8 times, respectively. UWB and Bluetooth Low Energy (BLE) are frequently integrated in localization solutions since they can provide complementary functionalities: BLE is more energy-efficient than UWB but it can provide location estimates with only meter-level accuracy. On the other hand, UWB can localize targets with centimeter-level accuracy albeit with higher energy consumption than BLE. In this thesis, we provide a comprehensive study of the sources of instabilities in received signal strength (RSS) measurements acquired with BLE devices. The study can be used as a starting point for future research into BLE-based ranging techniques, as well as a benchmark for hybrid UWB–BLE localization systems. Finally, we propose a flexible scheduling scheme for time-difference of arrival (TDOA) localization with UWB devices. Unlike in previous approaches, the reference anchor and the order of the responding anchors change every time slot. The flexible anchor allocation makes the system more robust to NLOS propagation than traditional approaches. In the proposed setup, the user device is a passive listener which localizes itself using messages received from the anchors. Therefore, the system can scale with an unlimited number of devices and can preserve the location privacy of the user. The proposed method is implemented on custom hardware using a commercial UWB chipset.
We evaluated the proposed method against the standard TDOA algorithm and range-based localization. In line of sight (LOS), the proposed TDOA method has a localization accuracy similar to the standard TDOA algorithm, down to a 95% localization error of 15.9 cm. In NLOS, the proposed TDOA method outperforms the classic TDOA method in all scenarios, with a reduction of up to 16.4 cm in the localization error. Cotutelle joint doctoral thesis.
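The idea of labeling distance measurements by their residuals can be sketched as follows. This is only an illustration under stated assumptions, not the method from the thesis: the abstract does not give the labeling rule, so the fixed threshold, the 2D geometry, and the function name `label_by_residual` are all hypothetical. The sketch relies on the general observation that NLOS propagation tends to lengthen measured ranges, so a large positive residual against the current position estimate is treated as suspicious.

```python
import math


def label_by_residual(position, anchors, measured, threshold_m=0.3):
    """Label each range measurement as 'LOS' or 'NLOS'.

    position    -- current (x, y) position estimate in meters
    anchors     -- list of (x, y) anchor coordinates in meters
    measured    -- list of measured distances to each anchor in meters
    threshold_m -- hypothetical residual threshold, not taken from the thesis
    """
    labels = []
    for (ax, ay), distance in zip(anchors, measured):
        # Distance the measurement should have if the estimate were exact.
        expected = math.hypot(position[0] - ax, position[1] - ay)
        # Residual: how much longer the measured range is than expected.
        residual = distance - expected
        labels.append("NLOS" if residual > threshold_m else "LOS")
    return labels


if __name__ == "__main__":
    anchors = [(3.0, 4.0), (0.0, 5.0), (6.0, 8.0)]
    measured = [5.05, 5.9, 10.1]
    print(label_by_residual((0.0, 0.0), anchors, measured))
```

In a real system the position estimate itself depends on the measurements being labeled, so the labels would presumably be refreshed as the localization process iterates; the sketch leaves that loop out.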

    Marine Thruster I/O Board Redesign, Prototyping, and Certification

    For about 20 years, the company Marine Technologies has used a circuit board called the IOB, which controls input and output signals. The Input Output Board (IOB) uses a logic device to manage the different signals. For the last 20 years this has been a field-programmable gate array (FPGA). The manufacture, design, and supply of the IOB belonged to another company, but the time came for Marine Technologies to claim ownership of the IOB and make a design of their own. This was a good opportunity to make design changes, and the possibility of using microcontrollers instead of FPGAs became an interesting pursuit. Microcontrollers are naturally cheaper and easier to acquire and have become considerably advanced, making them a possible replacement candidate. This thesis explores the process of implementing a microcontroller with the new IOB design and having the product certified. The new IOB must fulfill Marine Technologies’ set of demands, which require it to be functionally identical to the original; it also needs to fulfill the international standards that, amongst other things, set the demands for environmental robustness and Electromagnetic Compatibility (EMC) performance. To meet this set of demands, I completed an analysis of the current I/O usage of Marine Technologies’ systems and reduced the amount of I/O available to match this actual usage. This showed that a microcontroller has enough resources to handle the actual required I/O load of Marine Technologies’ systems. In terms of EMC, the best one can do is to design a circuit board that follows design guidelines for EMC as closely as possible and test it when the prototype arrives. The number one rule for EMC-minded design is to allow return currents to flow directly under the outgoing signal trace, which is best achieved by having dedicated, proper, and unbroken power and ground planes, placed in the layers between the top and bottom layer of the PCB.
The design of the new IOB, called MT-IOB-Mk3-Transit, was done by closely examining the design of the previous two FPGA-based iterations of the IOB, called the MT-IOB-Mk1 and MT-IOB-Mk2. The IOB-Mk3-Transit uses elements from both boards, drawing on 20 years of field testing and usage to see what works best and what does not, while at the same time considering how the new microcontroller fits within these elements. In most aspects the IOB-Mk3-Transit is a mosaic containing elements from both the IOB-Mk1 and the Mk2, which have functioned reliably for 20 years. During functional testing of the IOB-Mk3-Transit, the crucial functions were working well. The board was tested in a certification lab in Italy, and because the board had been designed with suboptimal EMC practice, we needed two attempts in Italy before finally passing the EMC tests, with some research at home before travelling for the second attempt. The product was then certified, installed on a vessel, and is now in use. Taking the lessons learned from the IOB-Mk3-Transit, the new iteration, called simply the MT-IOB-Mk3, has been designed, following the stated EMC guidelines closely to improve performance and correcting a few minor issues of the IOB-Mk3-Transit. This board has yet to be tested. In the end, the question of using a microcontroller instead of an FPGA to perform the duties of the IOB is only partially answered. Yes, the microcontroller can perform all the required functions that the FPGA did, and it will be implemented as a part of the Marine Technologies environment for now, but long-term reliability is a question that can only be answered by long-term use and testing. Master's thesis in physics (PHYS399, MAMN-PHY).

    A Low-Energy Security Solution for IoT-Based Smart Farms

    This work proposes a novel configuration of the Transport Layer Security (TLS) protocol, suitable for low-energy Internet of Things (IoT) applications. The motivation behind the redesign of TLS is energy consumption minimisation and sustainable farming, as exemplified by an application domain of aquaponic smart farms. The work therefore considers decentralisation of a formerly centralised security model, with a focus on reducing energy consumption for battery-powered devices. The research presents a four-part investigation into the security solution, composed of a risk assessment, energy analysis of authentication and data exchange functions, and finally the design and verification of a novel consensus authorisation mechanism. The first investigation considered a traditional risk-driven threat assessment, extended to include energy reduction, working towards device longevity within a content-oriented framework. Since the aquaponics environments include limited but specific data exchanges, a content-oriented approach produced valuable insights into security and privacy requirements that would later be tested by implementing a variety of mechanisms available on the ESP32. The second and third investigations featured the energy analysis of authentication and data exchange functions respectively, where the results of the risk assessment were implemented to compare the re-configurations of TLS mechanisms and domain content. The results indicated that selective confidentiality and persistent secure sessions between paired devices enabled considerable improvements in energy consumption, and were a good reflection of the possibilities suggested by the risk assessment. The fourth and final investigation proposed a granular authorisation design to increase the safety of access control that would otherwise be binary in TLS. The motivation was damage mitigation from inside attacks or network faults.
The approach involved an automated, hierarchy-based, decentralised network topology to reduce data duplication whilst still providing robustness beyond the vulnerability of central governance. Formal verification using model-checking indicated a safe design model, using four automated back-ends. The research concludes that lower energy IoT solutions for the smart farm application domain are possible.

    IoT: Communication protocols and security threats

    In this study, we review the fundamentals of IoT architecture and thoroughly present the communication protocols that have been invented especially for IoT technology. Moreover, we analyze security threats and general implementation problems, presenting several sectors that can benefit the most from IoT development. Discussion of the findings of this review reveals open issues and challenges and specifies the next steps required to expand and support IoT systems in a secure framework.

    Flexible Hardware-based Security-aware Mechanisms and Architectures

    For decades, software security has been the primary focus in securing our computing platforms. Hardware was always assumed trusted, and inherently served as the foundation, and thus the root of trust, of our systems. This has been further leveraged in developing hardware-based dedicated security extensions and architectures to protect software from attacks exploiting software vulnerabilities such as memory corruption. However, the recent outbreak of microarchitectural attacks has shaken these long-established trust assumptions in hardware entirely, thereby threatening the security of all of our computing platforms and bringing hardware and microarchitectural security under scrutiny. These attacks have undeniably revealed the grave consequences of hardware/microarchitecture security flaws to the entire platform security, and how they can even subvert the security guarantees promised by dedicated security architectures. Furthermore, they shed light on the sophisticated challenges particular to hardware/microarchitectural security; it is more critical (and more challenging) to extensively analyze the hardware for security flaws prior to production, since hardware, unlike software, cannot be patched/updated once fabricated. Hardware cannot reliably serve as the root of trust anymore, unless we develop and adopt new design paradigms where security is proactively addressed and scrutinized across the full stack of our computing platforms, at all hardware design and implementation layers. Furthermore, novel flexible security-aware design mechanisms must be incorporated in processor microarchitecture and hardware-assisted security architectures that can practically address the inherent conflict between performance and security by allowing the trade-off to be configured to adapt to the desired requirements.
In this thesis, we investigate the prospects and implications at the intersection of hardware and security that emerge across the full stack of our computing platforms and Systems-on-Chip (SoCs). On one front, we investigate how we can leverage hardware and its advantages, in contrast to software, to build more efficient and effective security extensions that serve security architectures, e.g., by providing execution attestation and enforcement, to protect the software from attacks exploiting software vulnerabilities. We further propose that they be microarchitecturally configured at runtime to provide different types of security services, thus adapting flexibly to different deployment requirements. On another front, we investigate how we can protect these hardware-assisted security architectures and extensions themselves from microarchitectural and software attacks that exploit design flaws that originate in the hardware, e.g., insecure resource sharing in SoCs. More particularly, we focus in this thesis on cache-based side-channel attacks, where we propose sophisticated cache designs that fundamentally mitigate these attacks, while still preserving performance by enabling the performance-security trade-off to be configured by design. We also investigate how these can be incorporated into flexible and customizable security architectures, thus complementing them to further support a wide spectrum of emerging applications with different performance/security requirements. Lastly, we inspect our computing platforms further beneath the design layer, by scrutinizing how the actual implementation of these mechanisms is yet another potential attack surface. We explore how the security of hardware designs and implementations is currently analyzed prior to fabrication, while shedding light on how state-of-the-art hardware security analysis techniques are fundamentally limited, and the potential for improved and scalable approaches.