4,598 research outputs found

    Desynchronization: Synthesis of asynchronous circuits from synchronous specifications

    Get PDF
    Asynchronous implementation techniques, which measure logic delays at run time and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst-case delays at design time, and constrain the clock cycle accordingly. De-synchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity, without requiring special design skills or tools. In this paper, we first of all study different protocols for de-synchronization and formally prove their correctness, using techniques originally developed for distributed deployment of synchronous language specifications. We also provide a taxonomy of existing protocols for asynchronous latch controllers, covering in particular the four-phase handshake protocols devised in the literature for micro-pipelines. We then propose a new controller which exhibits provably maximal concurrency, and analyze the performance of desynchronized circuits with respect to the original synchronous optimized implementation. We finally prove the feasibility and effectiveness of our approach, by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architectur

    Patient Observers and Non-perturbative Infrared Dynamics in Inflation

    Get PDF
    We have previously derived the effect of soft graviton modes on the quantum state of de Sitter using spontaneously broken asymptotic symmetries. In the present paper we reinterpret this effect in terms of particle production and relate the quantum states with and without soft modes by means of Bogoliubov transformations. This also enables us to address the much discussed issues regarding the observability of infrared effects in de Sitter from a new perspective. While it is commonly agreed that infrared effects are not visible to a single sub-horizon observer at late times, we argue that the question is less trivial for a {\it patient observer} who has lived long enough to have a record of the state before the soft mode was created. Though classically there is no obstruction to measuring this effect locally, we give several indications that quantum mechanical uncertainties may censor the effect. We then apply our methods to find a non-perturbative description of the quantum state pertaining to the Page time of de Sitter, and derive with these new methods the probability distribution for the local quantum states of de Sitter and slow-roll inflation in the presence of long modes. Finally, we use this to formulate a precise criterion for the existence of eternal inflation in general classes of slow-roll inflation.Comment: 37 page

    Domain-specific and reconfigurable instruction cells based architectures for low-power SoC

    Get PDF

    Synthesis and Optimization of Reversible Circuits - A Survey

    Full text link
    Reversible logic circuits have been historically motivated by theoretical research in low-power electronics as well as practical improvement of bit-manipulation transforms in cryptography and computer graphics. Recently, reversible circuits have attracted interest as components of quantum algorithms, as well as in photonic and nano-computing technologies where some switching devices offer no signal gain. Research in generating reversible logic distinguishes between circuit synthesis, post-synthesis optimization, and technology mapping. In this survey, we review algorithmic paradigms --- search-based, cycle-based, transformation-based, and BDD-based --- as well as specific algorithms for reversible synthesis, both exact and heuristic. We conclude the survey by outlining key open challenges in synthesis of reversible and quantum logic, as well as most common misconceptions.Comment: 34 pages, 15 figures, 2 table

    AUTOMATING DATA-LAYOUT DECISIONS IN DOMAIN-SPECIFIC LANGUAGES

    Get PDF
    A long-standing challenge in High-Performance Computing (HPC) is the simultaneous achievement of programmer productivity and hardware computational efficiency. The challenge has been exacerbated by the onset of multi- and many-core CPUs and accelerators. Only a few expert programmers have been able to hand-code domain-specific data transformations and vectorization schemes needed to extract the best possible performance on such architectures. In this research, we examined the possibility of automating these methods by developing a Domain-Specific Language (DSL) framework. Our DSL approach extends C++14 by embedding into it a high-level data-parallel array language, and by using a domain-specific compiler to compile to hybrid-parallel code. We also implemented an array index-space transformation algebra within this high-level array language to manipulate array data-layouts and data-distributions. The compiler introduces a novel method for SIMD auto-vectorization based on array data-layouts. Our new auto-vectorization technique is shown to outperform the default auto-vectorization strategy by up to 40% for stencil computations. The compiler also automates distributed data movement with overlapping of local compute with remote data movement using polyhedral integer set analysis. Along with these main innovations, we developed a new technique using C++ template metaprogramming for developing embedded DSLs using C++. We also proposed a domain-specific compiler intermediate representation that simplifies data flow analysis of abstract DSL constructs. We evaluated our framework by constructing a DSL for the HPC grand-challenge domain of lattice quantum chromodynamics. Our DSL yielded performance gains of up to twice the flop rate over existing production C code for selected kernels. This gain in performance was obtained while using less than one-tenth the lines of code. The performance of this DSL was also competitive with the best hand-optimized and hand-vectorized code, and is an order of magnitude better than existing production DSLs.Doctor of Philosoph

    VLSI architectures for high speed Fourier transform processing

    Get PDF

    Architectural Solutions for NanoMagnet Logic

    Get PDF
    The successful era of CMOS technology is coming to an end. The limit on minimum fabrication dimensions of transistors and the increasing leakage power hinder the technological scaling that has characterized the last decades. In several different ways, this problem has been addressed changing the architectures implemented in CMOS, adopting parallel processors and thus increasing the throughput at the same operating frequency. However, architectural alternatives cannot be the definitive answer to a continuous increase in performance dictated by Moore’s law. This problem must be addressed from a technological point of view. Several alternative technologies that could substitute CMOS in next years are currently under study. Among them, magnetic technologies such as NanoMagnet Logic (NML) are interesting because they do not dissipate any leakage power. More- over, magnets have memory capability, so it is possible to merge logic and memory in the same device. However, magnetic circuits, and NML in this specific research, have also some important drawbacks that need to be addressed: first, the circuit clock frequency is limited to 100 MHz, to avoid errors in data propagation; second, there is a connection between circuit layout and timing, and in particular, longer wires will have longer latency. These drawbacks are intrinsic to the technology and for this reason they cannot be avoided. The only chance is to limit their impact from an architectural point of view. The first step followed in the research path of this thesis is indeed the choice and optimization of architectures able to deal with the problems of NML. Systolic Ar- rays are identified as an ideal solution for this technology, because they are regular structures with local interconnections that limit the long latency of wires; more- over they are composed of several Processing Elements that work in parallel, thus exploit parallelization to increase throughput (limiting the impact of the low clock frequency). Through the analysis of Systolic Arrays for NML, several possible im- provements have been identified and addressed: 1) it has been defined a rigorous way to increase throughput with interleaving, providing equations that allow to esti- mate the number of operations to be interleaved and the rules to provide inputs; 2) a latency insensitive circuit has been designed, that exploits a data communication protocol between processing elements to avoid data synchronization problems. This feature has been exploited to design a latency insensitive Systolic Array that is able to execute the Floyd-Steinberg dithering algorithm. All the improvements presented in this framework apply to Systolic Arrays implemented in any technology. So, they can also be exploited to increase performance of today’s CMOS parallel circuits. This research path is presented in Chapter 3. While Systolic Arrays are an interesting solution for NML, their usage could be quite limited because they are normally application-specific. The second re- search path addresses this problem. A Reconfigurable Systolic Array is presented, that can be programmed to execute several algorithms. This architecture has been tested implementing many algorithms, including FIR and IIR filters, Discrete Cosine Transform and Matrix Multiplication. This research path is presented in Chapter 4. In common Von Neumann architectures, the logic part of the circuit and the memory one are separated. Today bus communication between logic and memory represents the bottleneck of the system. This problem is addressed presenting Logic- In-Memory (LIM), an architecture where memory elements are merged in logic ones. This research path aims at defining a real LIM architectures. This has been done in two steps. The first step is represented by an architecture composed of three layers: memory, routing and logic. In the second step instead the routing plane is no more present, and its features are inherited by the memory plane. In this solution, a pyramidal memory model is used, where memories near logic elements contain the most probably used data, and other memory layers contain the remaining data and instruction set. This circuit has been tested with odd-even sort algorithms and it has been benchmarked against GPUs and ASIC. This research path is presented in Chapter 5. MagnetoElastic NML (ME-NML) is a technological improvement of the NML principle, proposed by researchers of Politecnico di Torino, where the clock system is based on the induced stretch of a piezoelectric substrate when a voltage is ap- plied to its boundaries. The main advantage of this solution is that it consumes much less power than the classic clock implementation. This technology has not yet been investigated from an architectural point of view and considering complex circuits. In this research field, a standard methodology for the design of ME-NML circuits has been proposed. It is based on a Standard Cell Library and an enhanced VHDL model. The effectiveness of this methodology has been proved designing a Galois Field Multiplier. Moreover the serial-parallel trade-off in ME-NML has been investigated, designing three different solutions for the Multiply and Accumulate structure. This research path is presented in Chapter 6. While ME-NML is an extremely interesting technology, it needs to be combined with other faster technologies to have a real competitive system. Signal interfaces between NML and other technologies (mainly CMOS) have been rarely presented in literature. A mixed-technology multiplexer is designed and presented as the basis for a CMOS to NML interface. The reverse interface (from ME-NML to CMOS) is instead based on a sensing circuit for the Faraday effect: a change in the polarization of a magnet induces an electric field that can be used to generate an input signal for a CMOS circuit. This research path is presented in Chapter 7. The research work presented in this thesis represents a fundamental milestone in the path towards nanotechnologies. The most important achievement is the de- sign and simulation of complex circuits with NML, benchmarking this technology with real application examples. The characterization of a technology considering complex functions is a major step to be performed and that has not yet been ad- dressed in literature for NML. Indeed, only in this way it is possible to intercept in advance any weakness of NanoMagnet Logic that cannot be discovered consid- ering only small circuits. Moreover, the architectural improvements introduced in this thesis, although technology-driven, can be actually applied to any technology. We have demonstrated the advantages that can derive applying them to CMOS cir- cuits. This thesis represents therefore a major step in two directions: the first is the enhancement of NML technology; the second is a general improvement of parallel architectures and the development of the new Logic-In-Memory paradigm
    • …
    corecore