1,672 research outputs found

    A performance driven approach for hardware synthesis of guarded atomic actions

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.Includes bibliographical references (p. 137-140).Hardware designers are facing new challenges in the design of complex ASIC's and processors as their sizes approach up to 100 million logic gates. We believe no adequate solution exists that allows designers to specify hardware which takes full advantage of the available resources in these devices. The hardware design specification languages are either too low level to support efficient large scale design (for example, Verilog), or the language and synthesis methodology is so high-level that the designer's micro-architectural ingenuity is lost in the design process. This results in circuits that oftentimes do not match the designer's expectations (for example, C-based behavioral synthesis). 'This thesis presents a design methodology and related synthesis algorithms that address several of the key issues of hardware design specification and high-level synthesis while avoiding the pitfalls of past approaches. The areas we focus on are modular compilation and performance specification. The modular flow allows for the separate compilation of modules and ensures the correct usage of module interfaces by attaching annotations with well defined semantics to them. We also introduce performance specifications as a core part of a design description.(cont.) This allows a designer to more easily achieve the expected design performance and it allows for rapid micro-architectural exploration. We chose guarded atomic actions as the foundation of this research because of their clean execution semantics. These semantics allow for easy design transformation (either manual or compiler driven) while ensuring that the correctness of the design is maintained. We demonstrate the practicality and power of this methodology using several examples, such as a processor which from a single design description can automatically be transformed into an unpipelined processor or a superscalar processor simply by changing a single-line performance specification.by Daniel L. Rosenband.Ph.D

    Guarded atomic actions and refinement in a system-on-chip development flow: bridging the specification gap with Event-B

    No full text
    Modern System-on-chip (SoC) hardware design puts considerable pressure on existing design and verification flows, languages and tools. The Register Transfer Level (RTL)description, which forms the input for synchronous, logic synthesis-driven design is at too low a level of abstraction for efficient architectural exploration and re-use. The existing methods for taking a high-level paper specification and refining this specification to an implementation that meets its performance criteria is largely manual and error-prone and as RTL descriptions get larger, a systematic design method is necessary to address explicitly the timing issues that arise when applying logic synthesis to such large blocks.Guarded Atomic Actions have been shown to offer a convenient notation for describing microarchitectures that is amenable to formal reasoning and high-level synthesis. Event-B is a language and method that supports the development of specifications with automatic proof and refinement, based on guarded atomic actions. Latency-insensitive design ensures that a design composed of functionally correct components will be independent of communication latency. A method has been developed which uses Event-B for latency-insensitive SoC component and sub-system design which can be combined with high-level, component synthesis to enable architectural exploration and re-use at the specification level and to close the specification gap in the SoC hardware flow

    Automatic generation of hardware/software interfaces

    Get PDF
    Enabling new applications for mobile devices often requires the use of specialized hardware to reduce power consumption. Because of time-to-market pressure, current design methodologies for embedded applications require an early partitioning of the design, allowing the hardware and software to be developed simultaneously, each adhering to a rigid interface contract. This approach is problematic for two reasons: (1) a detailed hardware-software interface is difficult to specify until one is deep into the design process, and (2) it prevents the later migration of functionality across the interface motivated by efficiency concerns or the addition of features. We address this problem using the Bluespec Codesign Language~(BCL) which permits the designer to specify the hardware-software partition in the source code, allowing the compiler to synthesize efficient software and hardware along with transactors for communication between the partitions. The movement of functionality across the hardware-software boundary is accomplished by simply specifying a new partitioning, and since the compiler automatically generates the desired interface specifications, it eliminates yet another error-prone design task. In this paper we present BCL, an extension of a commercially available hardware design language (Bluespec SystemVerilog), a new software compiling scheme, and preliminary results generated using our compiler for various hardware-software decompositions of an Ogg Vorbis audio decoder, and a ray-tracing application.National Science Foundation (U.S.) (NSF (#CCF-0541164))National Research Foundation of Korea (grant from the Korean Government (MEST) (#R33-10095)

    Submicron Systems Architecture: Semiannual Technical Report

    Get PDF
    No abstract available

    An empirical evaluation of High-Level Synthesis languages and tools for database acceleration

    Get PDF
    High Level Synthesis (HLS) languages and tools are emerging as the most promising technique to make FPGAs more accessible to software developers. Nevertheless, picking the most suitable HLS for a certain class of algorithms depends on requirements such as area and throughput, as well as on programmer experience. In this paper, we explore the different trade-offs present when using a representative set of HLS tools in the context of Database Management Systems (DBMS) acceleration. More specifically, we conduct an empirical analysis of four representative frameworks (Bluespec SystemVerilog, Altera OpenCL, LegUp and Chisel) that we utilize to accelerate commonly-used database algorithms such as sorting, the median operator, and hash joins. Through our implementation experience and empirical results for database acceleration, we conclude that the selection of the most suitable HLS depends on a set of orthogonal characteristics, which we highlight for each HLS framework.Peer ReviewedPostprint (authorā€™s final draft

    EOS: A project to investigate the design and construction of real-time distributed embedded operating systems

    Get PDF
    The EOS project is investigating the design and construction of a family of real-time distributed embedded operating systems for reliable, distributed aerospace applications. Using the real-time programming techniques developed in co-operation with NASA in earlier research, the project staff is building a kernel for a multiple processor networked system. The first six months of the grant included a study of scheduling in an object-oriented system, the design philosophy of the kernel, and the architectural overview of the operating system. In this report, the operating system and kernel concepts are described. An environment for the experiments has been built and several of the key concepts of the system have been prototyped. The kernel and operating system is intended to support future experimental studies in multiprocessing, load-balancing, routing, software fault-tolerance, distributed data base design, and real-time processing

    Hierarchical Transactions for Hardware/Software Cosynthesis

    Get PDF
    Modern heterogeneous devices provide of a variety of computationally diverse components holding tremendous performance and power capability. Hardware-software cosynthesis offers system-level synthesis and optimization opportunities to realize the potential of these evolving architectures. Efficiently coordinating high-throughput data to make use of available computational resources requires a myriad of distributed local memories, caching structures, and data motion resources. In fact, storage, caching, and data transfer components comprise the majority of silicon real estate. Conventional automated approaches, unfortunately, do not effectively represent applications in a way that captures data motion and state management which dictate dominant system costs. Consequently, existing cosynthesis methods suffer from poor utility of computational resources. Automated cosynthesis tailored towards memory-centric optimizations can address the challenge, adapting partitioning, scheduling, mapping, and binding techniques to maximize overall system utility.This research presents a novel hierarchical transaction model that formalizes state and control management through an abstract data/control encapsulation semantic. It is designed from the ground-up to enable efficient synthesis across heterogeneous system components, with an emphasis on memory capacity constraints. It intrinsically encourages a high degree of concurrency and latency tolerance, and provides verification tools to ensure correctness. A unique data/execution hierarchical encapsulation framework guarantees scalable analysis, supporting a novel concept of state and control mobility. A front-end language allows concise expression of designer intent, and is structured with synthesis in mind. Designers express families of valid executions in a minimal format through high-level dependencies, type systems, and computational relationships, allowing synthesis tools to manage lower-level details. This dissertation introduces and exercises the model, discussing language construction, demonstrating control and data-dominated applications, and presenting a synthesis path that exhibits near-linear scalability with problem size

    Synthesis of multi-cycle circuits from guarded atomic actions

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.Cataloged from PDF version of thesis.Includes bibliographical references (p. 143-147).One solution to the timing closure problem is to perform infrequent operations in more than one clock cycle. Despite the apparent simplicity of the solution statement, it is not easily considered because it requires changes in RTL, which in turn exacerbates the verification problem. Another approach to the problem is to avoid it altogether, by using a high-level design methodology and allow the synthesis tool to generate the design that matches design requirements. This approach hinges on the ability of the tool to be able to generate satisfactory RTL from the high-level description, an ability which often cannot be tested until late in the project. Failure to meet the requirements can result in costly delays as an alternative way of expressing the design intent is sought and experimented with. We offer a timing closure solution that does not suffer from these problems. We have selected atomic actions as the high-level design methodology. We exploit the fact that semantics of atomic actions are untimed, that is, the time to execute an action does not change its outcome. The current hardware synthesis technique from atomic actions assumes that each action takes one clock cycle to complete its computation. Consequently, the action with the longest combinational path determines the clock cycle of the entire design, often leading to needlessly slow circuits. By augmenting the description of the actions with desired timing information, we allow the designer to split long paths over multiple clock cycles without giving up the semantics of atomicity. We also introduce loops with dynamic bounds into the atomic action description. These loops are not unrolled for synthesis, but the guards are evaluated for each iteration. Our synthesis results show that the clock speed and performance of circuits can be improved substantially with our technique, without having to substantially change the design.by Michal Karczmarek.Ph.D

    A methodology for hardware-software codesign

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013.Cataloged from PDF version of thesis.Includes bibliographical references (pages 150-156).Special purpose hardware is vital to embedded systems as it can simultaneously improve performance while reducing power consumption. The integration of special purpose hardware into applications running in software is difficult for a number of reasons. Some of the difficulty is due to the difference between the models used to program hardware and software, but great effort is also required to coordinate the simultaneous execution of the application running on the microprocessor with the accelerated kernel(s) running in hardware. To further compound the problem, current design methodologies for embedded applications require an early determination of the design partitioning which allows hardware and software to be developed simultaneously, each adhering to a rigid interface contract. This approach is problematic because often a good hardware-software decomposition is not known until deep into the design process. Fixed interfaces and the burden of reimplementation prevent the migration of functionality motivated by repartitioning. This thesis presents a two-part solution to the integration of special purpose hardware into applications running in software. The first part addresses the problem of generating infrastructure for hardware-accelerated applications. We present a methodology in which the application is represented as a dataflow graph and the computation at each node is specified for execution either in software or as specialized hardware using the programmer's language of choice. An interface compiler as been implemented which takes as input the FIFO edges of the graph and generates code to connect all the different parts of the program, including those which communicate across the hardware/software boundary. This methodology, which we demonstrate on an FPGA platform, enables programmers to effectively exploit hardware acceleration without ever leaving the application space. The second part of this thesis presents an implementation of the Bluespec Codesign Language (BCL) to address the difficulty of experimenting with hardware/software partitioning alternatives. Based on guarded atomic actions, BCL can be used to specify both hardware and low-level software. Based on Bluespec SystemVerilog (BSV) for which a hardware compiler by Bluespec Inc. is commercially available, BCL has been augmented with extensions to support more efficient software generation. In BCL, the programmer specifies the entire design, including the partitioning, allowing the compiler to synthesize efficient software and hardware, along with transactors for communication between the partitions. The benefit of using a single language to express the entire design is that a programmer can easily experiment with many different hardware/software decompositions without needing to re-write the application code. Used together, the BCL and interface compilers represent a comprehensive solution to the task of integrating specialized hardware into an application.by Myron King.Ph.D
    • ā€¦
    corecore