
    The Case for Asymmetric Systolic Array Floorplanning

    The widespread proliferation of deep learning applications has triggered the need to accelerate them directly in hardware. General Matrix Multiplication (GEMM) kernels are elemental deep-learning constructs and they inherently map onto Systolic Arrays (SAs). SAs are regular structures that are well-suited for accelerating matrix multiplications. Typical SAs use a pipelined array of Processing Elements (PEs), which communicate with local connections and pre-orchestrated data movements. In this work, we show that the physical layout of SAs should be asymmetric to minimize wirelength and improve energy efficiency. The floorplan of the SA adjusts better to the asymmetric widths of the horizontal and vertical data buses and their switching activity profiles. It is demonstrated that such physically asymmetric SAs reduce interconnect power by 9.1% when executing state-of-the-art Convolutional Neural Network (CNN) layers, as compared to SAs of the same size but with a square (i.e., symmetric) layout. The savings in interconnect power translate, in turn, to 2.1% overall power savings. Comment: CNNA 202

    Low-Power Data Streaming in Systolic Arrays with Bus-Invert Coding and Zero-Value Clock Gating

    Systolic Array (SA) architectures are well suited for accelerating matrix multiplications through the use of a pipelined array of Processing Elements (PEs) communicating with local connections and pre-orchestrated data movements. Even though most of the dynamic power consumption in SAs is due to multiplications and additions, pipelined data movement within the SA constitutes an additional important contributor. The goal of this work is to reduce the dynamic power consumption associated with the feeding of data to the SA, by synergistically applying bus-invert coding and zero-value clock gating. By exploiting salient attributes of state-of-the-art CNNs, such as the value distribution of the weights, the proposed SA applies appropriate encoding only to the data that exhibits high switching activity. Similarly, when one of the inputs is zero, unnecessary operations are entirely skipped. This selectively targeted, application-aware encoding approach is demonstrated to reduce the dynamic power consumption of data streaming in CNN applications using Bfloat16 arithmetic by 1%-19%. This translates to an overall dynamic power reduction of 6.2%-9.4%. Comment: International Conference on Modern Circuits and Systems Technologies (MOCAST
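
The bus-invert coding mentioned in this abstract can be illustrated with a minimal sketch. This is a simplified textbook variant and not the paper's hardware design: the function names, the 16-bit width, the all-zero initial bus state, and the choice to compare against the data lines only are all assumptions made for illustration. Zero-value clock gating is not modelled here.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width=16):
    """Encode words so that at most width // 2 data lines toggle per transfer.

    Returns a list of (sent_value, invert_flag) pairs.
    Assumption: the bus starts at all zeros.
    """
    mask = (1 << width) - 1
    prev = 0  # value currently held on the data lines
    encoded = []
    for w in words:
        w &= mask
        if hamming(prev, w) > width // 2:
            # More than half the lines would toggle: send the complement.
            sent, inv = (~w) & mask, 1
        else:
            sent, inv = w, 0
        encoded.append((sent, inv))
        prev = sent
    return encoded


def bus_invert_decode(encoded, width=16):
    """Recover the original words from (sent_value, invert_flag) pairs."""
    mask = (1 << width) - 1
    return [((~s) & mask) if inv else s for s, inv in encoded]


def count_toggles(values):
    """Total bit transitions when values are driven in order from an all-zero bus."""
    total, prev = 0, 0
    for v in values:
        total += hamming(prev, v)
        prev = v
    return total


if __name__ == "__main__":
    width = 16
    words = [0x0000, 0xFFFF, 0x0001, 0xFFFE]
    enc = bus_invert_encode(words, width)
    # Fold the invert flag in as an extra wire so its toggles are counted too.
    on_bus = [(inv << width) | s for s, inv in enc]
    print("raw toggles:    ", count_toggles(words))
    print("encoded toggles:", count_toggles(on_bus))
```

The example input is deliberately adversarial (alternating near-complements), so the saving is much larger than what real weight or activation streams would show.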

    ArrayFlex: A Systolic Array Architecture with Configurable Transparent Pipelining

    Convolutional Neural Networks (CNNs) are the state-of-the-art solution for many deep learning applications. For maximum scalability, their computation should combine high performance and energy efficiency. In practice, the convolutions of each CNN layer are mapped to a matrix multiplication that includes all input features and kernels of each layer and is computed using a systolic array. In this work, we focus on the design of a systolic array with a configurable pipeline, with the goal of selecting an optimal pipeline configuration for each CNN layer. The proposed systolic array, called ArrayFlex, can operate in normal or in shallow pipeline mode, thus balancing the execution time in cycles and the operating clock frequency. By selecting the appropriate pipeline configuration per CNN layer, ArrayFlex reduces the inference latency of state-of-the-art CNNs by 11%, on average, as compared to a traditional fixed-pipeline systolic array. Most importantly, this result is achieved while using 13%-23% less power, for the same applications, thus offering a combined energy-delay-product efficiency between 1.4x and 1.8x. Comment: DATE 202
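
As a rough consistency check on the figures quoted in this abstract, the sketch below combines the latency and power reductions into an energy-delay-product (EDP) gain. It assumes EDP = power × latency² and applies the average 11% latency reduction uniformly to both ends of the power range; both are simplifying assumptions, since the abstract reports per-application ranges. The function name `edp_gain` is hypothetical.

```python
def edp_gain(latency_reduction: float, power_reduction: float) -> float:
    """EDP improvement factor, assuming EDP = power * latency**2.

    Both arguments are fractional reductions relative to the baseline,
    e.g. 0.11 for an 11% reduction.
    """
    t = 1.0 - latency_reduction  # relative latency
    p = 1.0 - power_reduction    # relative power
    return 1.0 / (p * t * t)


if __name__ == "__main__":
    low = edp_gain(0.11, 0.13)   # 13% less power
    high = edp_gain(0.11, 0.23)  # 23% less power
    print(f"{low:.2f}x to {high:.2f}x")  # prints "1.45x to 1.64x"
```

Under these assumptions the estimate falls inside the reported 1.4x to 1.8x range; the wider reported range is plausibly due to per-layer and per-network variation in latency that an average cannot capture.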

    Ambipolar charge injection and transport in a single pentacene monolayer island

    Electrons and holes are locally injected in a single pentacene monolayer island. The two-dimensional distribution and concentration of the injected carriers are measured by electrical force microscopy. In crystalline monolayer islands, both carriers are delocalized over the whole island. On disordered monolayers, carriers stay localized at their injection point. These results provide insight into the electronic properties, at the nanometer scale, of organic monolayers governing the performance of organic transistors and molecular devices. Comment: To be published in Nano Letter

    Exciton bimolecular annihilation dynamics in supramolecular nanostructures of conjugated oligomers

    We present femtosecond transient absorption measurements on π-conjugated supramolecular assemblies in a high pump fluence regime. Oligo(p-phenylenevinylene) monofunctionalized with ureido-s-triazine (MOPV) self-assembles into chiral stacks in dodecane solution below 75 °C at a concentration of $4\times 10^{-4}$ M. We observe exciton bimolecular annihilation in MOPV stacks at high excitation fluence, indicated by the fluence-dependent decay of $1^1B_u$-exciton spectral signatures, and by the sub-linear fluence dependence of time- and wavelength-integrated photoluminescence (PL) intensity. These two characteristics are much less pronounced in MOPV solution where the phase equilibrium is shifted significantly away from supramolecular assembly, slightly below the transition temperature. A mesoscopic rate-equation model is applied to extract the bimolecular annihilation rate constant from the excitation fluence dependence of transient absorption and PL signals. The results demonstrate that the bimolecular annihilation rate is very high with a square-root dependence in time. The exciton annihilation results from a combination of fast exciton diffusion and resonance energy transfer. The supramolecular nanostructures studied here have electronic properties that are intermediate between molecular aggregates and polymeric semiconductors.

    On the Munn-Silbey approach to polaron transport with off-diagonal coupling

    Improved results using a method similar to the Munn-Silbey approach have been obtained on the temperature dependence of transport properties of an extended Holstein model incorporating simultaneous diagonal and off-diagonal exciton-phonon coupling. The Hamiltonian is partially diagonalized by a canonical transformation, and optimal transformation coefficients are determined in a self-consistent manner. Calculated transport properties exhibit substantial corrections to those obtained previously by Munn and Silbey for a wide range of temperatures thanks to a numerically exact evaluation and an added momentum-dependence of the transformation matrix. Results on the diffusion coefficient in the moderate and weak coupling regime show distinct band-like and hopping-like transport features as a function of temperature. Comment: 12 pages, 6 figures, accepted in Journal of Physical Chemistry B: Shaul Mukamel Festschrift (2011

    Nonexponential relaxation of photoinduced conductance in organic field effect transistor

    We report detailed studies of the slow relaxation of the photoinduced excess charge carriers in organic metal-insulator-semiconductor field effect transistors consisting of poly(3-hexylthiophene) as the active layer. The relaxation process cannot be physically explained by processes that lead to a simple or a stretched-exponential decay behavior. Models based on serial relaxation dynamics due to a hierarchy of systems with increasing spatial separation of the photo-generated negative and positive charges are used to explain the results. In order to explain the observed trend, the model is further modified by introducing a gate-voltage-dependent Coulombic distribution manifested by the trapped negative charge carriers. Comment: 17 pages, 3 Figure

    Performance of Monolayer Graphene Nanomechanical Resonators with Electrical Readout

    The enormous stiffness and low density of graphene make it an ideal material for nanoelectromechanical (NEMS) applications. We demonstrate fabrication and electrical readout of monolayer graphene resonators, and test their response to changes in mass and temperature. The devices show resonances in the MHz range. The strong dependence of the resonant frequency on applied gate voltage can be fit to a membrane model, which yields the mass density and built-in strain. Upon removal and addition of mass, we observe changes in both the density and the strain, indicating that adsorbates impart tension to the graphene. Upon cooling, the frequency increases; the shift rate can be used to measure the unusual negative thermal expansion coefficient of graphene. The quality factor increases with decreasing temperature, reaching ~10,000 at 5 K. By establishing many of the basic attributes of monolayer graphene resonators, these studies lay the groundwork for applications, including high-sensitivity mass detectors