1,691 research outputs found

    Specification and Verification of Synchronous Hardware using LOTOS

    Get PDF
    This paper investigates specification and verification of synchronous circuits using DILL (Digital Logic in LOTOS). After an overview of the DILL approach, the paper focuses on the characteristics of synchronous circuits. A more constrained model is presented for specifying digital components and verifying them. Two standard benchmark circuits are specified using this new model, and analysed by the CADP toolset (Cæsar/Aldébaran Development Package)

    Hardware-Assisted Secure Computation

    Get PDF
    The theory community has worked on Secure Multiparty Computation (SMC) for more than two decades, and has produced many protocols for many settings. One common thread in these works is that the protocols cannot use a Trusted Third Party (TTP), even though this is conceptually the simplest and most general solution. Thus, current protocols involve only the direct players---we call such protocols self-reliant. They often use blinded boolean circuits, which has several sources of overhead, some due to the circuit representation and some due to the blinding. However, secure coprocessors like the IBM 4758 have actual security properties similar to ideal TTPs. They also have little RAM and a slow CPU.We call such devices Tiny TTPs. The availability of real tiny TTPs opens the door for a different approach to SMC problems. One major challenge with this approach is how to execute large programs on large inputs using the small protected memory of a tiny TTP, while preserving the trust properties that an ideal TTP provides. In this thesis we have investigated the use of real TTPs to help with the solution of SMC problems. We start with the use of such TTPs to solve the Private Information Retrieval (PIR) problem, which is one important instance of SMC. Our implementation utilizes a 4758. The rest of the thesis is targeted at general SMC. Our SMC system, Faerieplay, moves some functionality into a tiny TTP, and thus avoids the blinded circuit overhead. Faerieplay consists of a compiler from high-level code to an arithmetic circuit with special gates for efficient indirect array access, and a virtual machine to execute this circuit on a tiny TTP while maintaining the typical SMC trust properties. We report on Faerieplay\u27s security properties, the specification of its components, and our implementation and experiments. These include comparisons with the Fairplay circuit-based two-party system, and an implementation of the Dijkstra graph shortest path algorithm. We also provide an implementation of an oblivious RAM which supports similar tiny TTP-based SMC functionality but using a standard RAM program. Performance comparisons show Faerieplay\u27s circuit approach to be considerably faster, at the expense of a more constrained programming environment when targeting a circuit

    A survey of 5G technologies: regulatory, standardization and industrial perspectives

    Get PDF
    In recent years, there have been significant developments in the research on 5th Generation (5G) networks. Several enabling technologies are being explored for the 5G mobile system era. The aim is to evolve a cellular network that is intrinsically flexible and remarkably pushes forward the limits of legacy mobile systems across all dimensions of performance metrics. All the stakeholders, such as regulatory bodies, standardization authorities, industrial fora, mobile operators and vendors, must work in unison to bring 5G to fruition. In this paper, we aggregate the 5G-related information coming from the various stakeholders, in order to i) have a comprehensive overview of 5G and ii) to provide a survey of the envisioned 5G technologies; their development thus far from the perspective of those stakeholders will open up new frontiers of services and applications for next-generation wireless networks. Keywords: 5G, ITU, Next-generation wireless network

    Algorithm to layout (ATL) systems for VLSI design

    Get PDF
    PhD ThesisThe complexities involved in custom VLSI design together with the failure of CAD techniques to keep pace with advances in the fabrication technology have resulted in a design bottleneck. Powerful tools are required to exploit the processing potential offered by the densities now available. Describing a system in a high level algorithmic notation makes writing, understanding, modification, and verification of a design description easier. It also removes some of the emphasis on the physical issues of VLSI design, and focus attention on formulating a correct and well structured design. This thesis examines how current trends in CAD techniques might influence the evolution of advanced Algorithm To Layout (ATL) systems. The envisaged features of an example system are specified. Particular attention is given to the implementation of one its features COPTS (Compilation Of Occam Programs To Schematics). COPTS is capable of generating schematic diagrams from which an actual layout can be derived. It takes a description written in a subset of Occam and generates a high level schematic diagram depicting its realisation as a VLSI system. This diagram provides the designer with feedback on the relative placement and interconnection of the operators used in the source code. It also gives a visual representation of the parallelism defined in the Occam description. Such diagrams are a valuable aid in documenting the implementation of a design. Occam has also been selected as the input to the design system that COPTS is a feature of. The choice of Occam was made on the assumption that the most appropriate algorithmic notation for such a design system will be a suitable high level programming language. This is in contrast to current automated VLSI design systems, which typically use a hardware des~ription language for input. These special purpose languages currently concentrate on handling structural/behavioural information and have limited ability to express algorithms. Using a language such as Occam allows a designer to write a behavioural description which can be compiled and executed as a simulator, or prototype, of the system. The programmability introduced into the design process enables designers to concentrate on a design's underlying algorithm. The choice of this algorithm is the most crucial decision since it determines the performance and area of the silicon implementation. The thesis is divided into four sections, each of several chapters. The first section considers VLSI design complexity, compares the expert systems and silicon compilation approaches to tackling it, and examines its parallels with software complexity. The second section reviews the advantages of using a conventional programming language for VLSI system descriptions. A number of alternative high level programming languages are considered for application in VLSI design. The third section defines the overall ATL system COPTS is envisaged to be part of, and considers the schematic representation of Occam programs. The final section presents a summary of the overall project and suggestions for future work on realising the full ATL system

    Power Side Channels in Security ICs: Hardware Countermeasures

    Full text link
    Power side-channel attacks are a very effective cryptanalysis technique that can infer secret keys of security ICs by monitoring the power consumption. Since the emergence of practical attacks in the late 90s, they have been a major threat to many cryptographic-equipped devices including smart cards, encrypted FPGA designs, and mobile phones. Designers and manufacturers of cryptographic devices have in response developed various countermeasures for protection. Attacking methods have also evolved to counteract resistant implementations. This paper reviews foundational power analysis attack techniques and examines a variety of hardware design mitigations. The aim is to highlight exposed vulnerabilities in hardware-based countermeasures for future more secure implementations

    VLSI design methodology

    Get PDF

    Synthesis of hardware systems from very high level behavioural specifications

    Get PDF

    Database System Acceleration on FPGAs

    Get PDF
    Relational database systems provide various services and applications with an efficient means for storing, processing, and retrieving their data. The performance of these systems has a direct impact on the quality of service of the applications that rely on them. Therefore, it is crucial that database systems are able to adapt and grow in tandem with the demands of these applications, ensuring that their performance scales accordingly. In the past, Moore's law and algorithmic advancements have been sufficient to meet these demands. However, with the slowdown of Moore's law, researchers have begun exploring alternative methods, such as application-specific technologies, to satisfy the more challenging performance requirements. One such technology is field-programmable gate arrays (FPGAs), which provide ideal platforms for developing and running custom architectures for accelerating database systems. The goal of this thesis is to develop a domain-specific architecture that can enhance the performance of in-memory database systems when executing analytical queries. Our research is guided by a combination of academic and industrial requirements that seek to strike a balance between generality and performance. The former ensures that our platform can be used to process a diverse range of workloads, while the latter makes it an attractive solution for high-performance use cases. Throughout this thesis, we present the development of a system-on-chip for database system acceleration that meets our requirements. The resulting architecture, called CbMSMK, is capable of processing the projection, sort, aggregation, and equi-join database operators and can also run some complex TPC-H queries. CbMSMK employs a shared sort-merge pipeline for executing all these operators, which results in an efficient use of FPGA resources. This approach enables the instantiation of multiple acceleration cores on the FPGA, allowing it to serve multiple clients simultaneously. CbMSMK can process both arbitrarily deep and wide tables efficiently. The former is achieved through the use of the sort-merge algorithm which utilizes the FPGA RAM for buffering intermediate sort results. The latter is achieved through the use of KeRRaS, a novel variant of the forward radix sort algorithm introduced in this thesis. KeRRaS allows CbMSMK to process a table a few columns at a time, incrementally generating the final result through multiple iterations. Given that acceleration is a key objective of our work, CbMSMK benefits from many performance optimizations. For instance, multi-way merging is employed to reduce the number of merge passes required for the execution of the sort-merge algorithm, thus improving the performance of all our pipeline-breaking operators. Another example is our in-depth analysis of early aggregation, which led to the development of a novel cache-based algorithm that significantly enhances aggregation performance. Our experiments demonstrate that CbMSMK performs on average 5 times faster than the state-of-the-art CPU-based database management system MonetDB.:I Database Systems & FPGAs 1 INTRODUCTION 1.1 Databases & the Importance of Performance 1.2 Accelerators & FPGAs 1.3 Requirements 1.4 Outline & Summary of Contributions 2 BACKGROUND ON DATABASE SYSTEMS 2.1 Databases 2.1.1 Storage Model 2.1.2 Storage Medium 2.2 Database Operators 2.2.1 Projection 2.2.2 Filter 2.2.3 Sort 2.2.4 Aggregation 2.2.5 Join 2.2.6 Operator Classification 2.3 Database Queries 2.4 Impact of Acceleration 3 BACKGROUND ON FPGAS 3.1 FPGA 3.1.1 Logic Element 3.1.2 Block RAM (BRAM) 3.1.3 Digital Signal Processor (DSP) 3.1.4 IO Element 3.1.5 Programmable Interconnect 3.2 FPGADesignFlow 3.2.1 Specifications 3.2.2 RTL Description 3.2.3 Verification 3.2.4 Synthesis, Mapping, Placement, and Routing 3.2.5 TimingAnalysis 3.2.6 Bitstream Generation and FPGA Programming 3.3 Implementation Quality Metrics 3.4 FPGA Cards 3.5 Benefits of Using FPGAs 3.6 Challenges of Using FPGAs 4 RELATED WORK 4.1 Summary of Related Work 4.2 Platform Type 4.2.1 Accelerator Card 4.2.2 Coprocessor 4.2.3 Smart Storage 4.2.4 Network Processor 4.3 Implementation 4.3.1 Loop-based implementation 4.3.2 Sort-based Implementation 4.3.3 Hash-based Implementation 4.3.4 Mixed Implementation 4.4 A Note on Quantitative Performance Comparisons II Cache-Based Morphing Sort-Merge with KeRRaS (CbMSMK) 5 OBJECTIVES AND ARCHITECTURE OVERVIEW 5.1 From Requirements to Objectives 5.2 Architecture Overview 5.3 Outlineof Part II 6 COMPARATIVE ANALYSIS OF OPENCL AND RTL FOR SORT-MERGE PRIMITIVES ON FPGAS 6.1 Programming FPGAs 6.2 RelatedWork 6.3 Architecture 6.3.1 Global Architecture 6.3.2 Sorter Architecture 6.3.3 Merger Architecture 6.3.4 Scalability and Resource Adaptability 6.4 Experiments 6.4.1 OpenCL Sort-Merge Implementation 6.4.2 RTLSorters 6.4.3 RTLMergers 6.4.4 Hybrid OpenCL-RTL Sort-Merge Implementation 6.5 Summary & Discussion 7 RESOURCE-EFFICIENT ACCELERATION OF PIPELINE-BREAKING DATABASE OPERATORS ON FPGAS 7.1 The Case for Resource Efficiency 7.2 Related Work 7.3 Architecture 7.3.1 Sorters 7.3.2 Sort-Network 7.3.3 X:Y Mergers 7.3.4 Merge-Network 7.3.5 Join Materialiser (JoinMat) 7.4 Experiments 7.4.1 Experimental Setup 7.4.2 Implementation Description & Tuning 7.4.3 Sort Benchmarks 7.4.4 Aggregation Benchmarks 7.4.5 Join Benchmarks 7. Summary 8 KERRAS: COLUMN-ORIENTED WIDE TABLE PROCESSING ON FPGAS 8.1 The Scope of Database System Accelerators 8.2 Related Work 8.3 Key-Reduce Radix Sort(KeRRaS) 8.3.1 Time Complexity 8.3.2 Space Complexity (Memory Utilization) 8.3.3 Discussion and Optimizations 8.4 Architecture 8.4.1 MSM 8.4.2 MSMK: Extending MSM with KeRRaS 8.4.3 Payload, Aggregation and Join Processing 8.4.4 Limitations 8.5 Experiments 8.5.1 Experimental Setup 8.5.2 Datasets 8.5.3 MSMK vs. MSM 8.5.4 Payload-Less Benchmarks 8.5.5 Payload-Based Benchmarks 8.5.6 Flexibility 8.6 Summary 9 A STUDY OF EARLY AGGREGATION IN DATABASE QUERY PROCESSING ON FPGAS 9.1 Early Aggregation 9.2 Background & Related Work 9.2.1 Sort-Based Early Aggregation 9.2.2 Cache-Based Early Aggregation 9.3 Simulations 9.3.1 Datasets 9.3.2 Metrics 9.3.3 Sort-Based Versus Cache-Based Early Aggregation 9.3.4 Comparison of Set-Associative Caches 9.3.5 Comparison of Cache Structures 9.3.6 Comparison of Replacement Policies 9.3.7 Cache Selection Methodology 9.4 Cache System Architecture 9.4.1 Window Aggregator 9.4.2 Compressor & Hasher 9.4.3 Collision Detector 9.4.4 Collision Resolver 9.4.5 Cache 9.5 Experiments 9.5.1 Experimental Setup 9.5.2 Resource Utilization and Parameter Tuning 9.5.3 Datasets 9.5.4 Benchmarks on Synthetic Data 9.5.5 Benchmarks on Real Data 9.6 Summary 10 THE FULL PICTURE 10.1 System Architecture 10.2 Benchmarks 10.3 Meeting the Objectives III Conclusion 11 SUMMARY AND OUTLOOK ON FUTURE RESEARCH 11.1 Summary 11.2 Future Work BIBLIOGRAPHY LIST OF FIGURES LIST OF TABLE
    corecore