200 research outputs found

    Submicron Systems Architecture Project : Semiannual Technical Report

    Get PDF
    The Mosaic C is an experimental fine-grain multicomputer based on single-chip nodes. The Mosaic C chip includes 64KB of fast dynamic RAM, processor, packet interface, ROM for bootstrap and self-test, and a two-dimensional selftimed router. The chip architecture provides low-overhead and low-latency handling of message packets, and high memory and network bandwidth. Sixty-four Mosaic chips are packaged by tape-automated bonding (TAB) in an 8 x 8 array on circuit boards that can, in turn, be arrayed in two dimensions to build arbitrarily large machines. These 8 x 8 boards are now in prototype production under a subcontract with Hewlett-Packard. We are planning to construct a 16K-node Mosaic C system from 256 of these boards. The suite of Mosaic C hardware also includes host-interface boards and high-speed communication cables. The hardware developments and activities of the past eight months are described in section 2.1. The programming system that we are developing for the Mosaic C is based on the same message-passing, reactive-process, computational model that we have used with earlier multicomputers, but the model is implemented for the Mosaic in a way that supports finegrain concurrency. A process executes only in response to receiving a message, and may in execution send messages, create new processes, and modify its persistent variables before it either exits or becomes dormant in preparation for receiving another message. These computations are expressed in an object-oriented programming notation, a derivative of C++ called C+-. The computational model and the C+- programming notation are described in section 2.2. The Mosaic C runtime system, which is written in C+-, provides automatic process placement and highly distributed management of system resources. The Mosaic C runtime system is described in section 2.3

    Submicron Systems Architecture Project: Semiannual Technial Report

    Get PDF
    No abstract available

    Analysis and Optimization for Pipelined Asynchronous Systems

    Get PDF
    Most microelectronic chips used today--in systems ranging from cell phones to desktop computers to supercomputers--operate in basically the same way: they synchronize the operation of their millions of internal components using a clock that is distributed globally. This global clocking is becoming a critical design challenge in the quest for building chips that offer increasingly greater functionality, higher speed, and better energy efficiency. As an alternative, asynchronous or clockless design obviates the need for global synchronization; instead, components operate concurrently and synchronize locally only when necessary. This dissertation focuses on one class of asynchronous circuits: application specific stream processing systems (i.e. those that take in a stream of data items and produce a stream of processed results.) High-speed stream processors are a natural match for many high-end applications, including 3D graphics rendering, image and video processing, digital filters and DSPs, cryptography, and networking processors. This dissertation aims to make the design, analysis, optimization, and testing of circuits in the chosen domain both fast and efficient. Although much of the groundwork has already been laid by years of past work, my work identifies and addresses four critical missing pieces: i) fast performance analysis for estimating the throughput of a fine-grained pipelined system; ii) automated and versatile design space exploration; iii) a full suite of circuit level modules that connect together to implement a wide variety of system behaviors; and iv) testing and design for testability techniques that identify and target the types of errors found only in high-speed pipelined asynchronous systems. I demonstrate these techniques on a number of examples, ranging from simple applications that allow for easy comparison to hand-designed alternatives to more complex systems, such as a JPEG encoder. I also demonstrate these techniques through the design and test of a fully asynchronous GCD demonstration chip

    Instruction scheduling in micronet-based asynchronous ILP processors

    Get PDF

    Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC\u2710 - May 17-19, 2010 Karlsruhe, Germany. (KIT Scientific Reports ; 7551)

    Get PDF
    ReCoSoC is intended to be a periodic annual meeting to expose and discuss gathered expertise as well as state of the art research around SoC related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow\u27s challenges in the multibillion transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability

    Doctor of Philosophy

    Get PDF
    dissertationThe design of integrated circuit (IC) requires an exhaustive verification and a thorough test mechanism to ensure the functionality and robustness of the circuit. This dissertation employs the theory of relative timing that has the advantage of enabling designers to create designs that have significant power and performance over traditional clocked designs. Research has been carried out to enable the relative timing approach to be supported by commercial electronic design automation (EDA) tools. This allows asynchronous and sequential designs to be designed using commercial cad tools. However, two very significant holes in the flow exist: the lack of support for timing verification and manufacturing test. Relative timing (RT) utilizes circuit delay to enforce and measure event sequencing on circuit design. Asynchronous circuits can optimize power-performance product by adjusting the circuit timing. A thorough analysis on the timing characteristic of each and every timing path is required to ensure the robustness and correctness of RT designs. All timing paths have to conform to the circuit timing constraints. This dissertation addresses back-end design robustness by validating full cyclical path timing verification with static timing analysis and implementing design for testability (DFT). Circuit reliability and correctness are necessary aspects for the technology to become commercially ready. In this study, scan-chain, a commercial DFT implementation, is applied to burst-mode RT designs. In addition, a novel testing approach is developed along with scan-chain to over achieve 90% fault coverage on two fault models: stuck-at fault model and delay fault model. This work evaluates the cost of DFT and its coverage trade-off then determines the best implementation. Designs such as a 64-point fast Fourier transform (FFT) design, an I2C design, and a mixed-signal design are built to demonstrate power, area, performance advantages of the relative timing methodology and are used as a platform for developing the backend robustness. Results are verified by performing post-silicon timing validation and test. This work strengthens overall relative timed circuit flow, reliability, and testability

    The Fifth NASA Symposium on VLSI Design

    Get PDF
    The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design

    NASA Space Engineering Research Center Symposium on VLSI Design

    Get PDF
    The NASA Space Engineering Research Center (SERC) is proud to offer, at its second symposium on VLSI design, presentations by an outstanding set of individuals from national laboratories and the electronics industry. These featured speakers share insights into next generation advances that will serve as a basis for future VLSI design. Questions of reliability in the space environment along with new directions in CAD and design are addressed by the featured speakers

    A Network-based Asynchronous Architecture for Cryptographic Devices

    Get PDF
    Institute for Computing Systems ArchitectureThe traditional model of cryptography examines the security of the cipher as a mathematical function. However, ciphers that are secure when specified as mathematical functions are not necessarily secure in real-world implementations. The physical implementations of ciphers can be extremely difficult to control and often leak socalled side-channel information. Side-channel cryptanalysis attacks have shown to be especially effective as a practical means for attacking implementations of cryptographic algorithms on simple hardware platforms, such as smart-cards. Adversaries can obtain sensitive information from side-channels, such as the timing of operations, power consumption and electromagnetic emissions. Some of the attack techniques require surprisingly little side-channel information to break some of the best known ciphers. In constrained devices, such as smart-cards, straightforward implementations of cryptographic algorithms can be broken with minimal work. Preventing these attacks has become an active and a challenging area of research. Power analysis is a successful cryptanalytic technique that extracts secret information from cryptographic devices by analysing the power consumed during their operation. A particularly dangerous class of power analysis, differential power analysis (DPA), relies on the correlation of power consumption measurements. It has been proposed that adding non-determinism to the execution of the cryptographic device would reduce the danger of these attacks. It has also been demonstrated that asynchronous logic has advantages for security-sensitive applications. This thesis investigates the security and performance advantages of using a network-based asynchronous architecture, in which the functional units of the datapath form a network. Non-deterministic execution is achieved by exploiting concurrent execution of instructions both with and without data-dependencies; and by forwarding register values between instructions with data-dependencies using randomised routing over the network. The executions of cryptographic algorithms on different architectural configurations are simulated, and the obtained power traces are subjected to DPA attacks. The results show that the proposed architecture introduces a level of non-determinism in the execution that significantly raises the threshold for DPA attacks to succeed. In addition, the performance analysis shows that the improved security does not degrade performance
    corecore