120 research outputs found

    Application-Specific Number Representation

    Reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs), enable application-specific number representations. Well-known number formats include fixed-point, floating-point, logarithmic number system (LNS), and residue number system (RNS). Such different number representations lead to different arithmetic designs and error behaviours, thus producing implementations with different performance, accuracy, and cost. To investigate the design options in number representations, the first part of this thesis presents a platform that enables automated exploration of the number representation design space. The second part of the thesis shows case studies that optimise the designs for area, latency or throughput from the perspective of number representations. Automated design space exploration in the first part addresses the following two major issues: (1) Automation requires arithmetic unit generation. This thesis provides optimised arithmetic library generators for logarithmic and residue arithmetic units, which support a wide range of bit widths and achieve significant improvement over previous designs. (2) Generation of arithmetic units requires specifying the bit widths for each variable. This thesis describes an automatic bit-width optimisation tool called R-Tool, which combines dynamic and static analysis methods, and supports different number systems (fixed-point, floating-point, and LNS numbers). Putting it all together, the second part explores the effects of application-specific number representation on practical benchmarks, such as radiative Monte Carlo simulation and seismic imaging computations. Experimental results show that customising the number representations brings benefits to hardware implementations: by selecting a more appropriate number format, we can reduce the area cost by up to 73.5% and improve the throughput by 14.2% to 34.1%; by performing the bit-width optimisation, we can further reduce the area cost by 9.7% to 17.3%. On the performance side, hardware implementations with customised number formats achieve 5 to potentially over 40 times speedup over software implementations.
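To make the number formats named in this abstract concrete, here is a minimal software sketch of fixed-point, LNS, and RNS arithmetic. It is an illustration only and not the thesis's hardware generators or R-Tool: the function names, the choice of 8 fractional bits in the usage note, and the moduli (7, 8, 9) are all assumptions made for this example.

```python
import math

# Hypothetical moduli for the RNS example; they must be pairwise coprime.
MODULI = (7, 8, 9)  # dynamic range is 7 * 8 * 9 = 504


def to_fixed(x, frac_bits):
    """Quantise x to an integer carrying frac_bits fractional bits."""
    return round(x * (1 << frac_bits))


def from_fixed(q, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def lns_encode(x):
    """LNS stores the base-2 logarithm of a positive value (sign omitted)."""
    return math.log2(x)


def lns_mul(a, b):
    """Multiplication in LNS is an addition of the stored logarithms."""
    return a + b


def lns_decode(e):
    return 2.0 ** e


def rns_encode(n):
    """Represent n by its residues modulo each modulus."""
    return tuple(n % m for m in MODULI)


def rns_mul(a, b):
    """Multiply digit-wise; no carries propagate between residues."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def rns_decode(residues):
    """Recover the integer with the Chinese Remainder Theorem."""
    big_m = math.prod(MODULI)
    total = 0
    for r, m in zip(residues, MODULI):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return total % big_m
```

For instance, `rns_decode(rns_mul(rns_encode(12), rns_encode(13)))` recovers 156 because the product stays below the dynamic range of 504, while `from_fixed(to_fixed(0.1, 8), 8)` shows the rounding error introduced by keeping only 8 fractional bits. Each format trades error behaviour against arithmetic cost in a different way, which is the trade-off the abstract describes.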

    Large-Scale Automatic K-Means Clustering for Heterogeneous Many-Core Supercomputer

    Funding: UK EPSRC grants "Discovery" EP/P020631/1, "ABC: Adaptive Brokerage for the Cloud" EP/R010528/1. This article presents an automatic k-means clustering solution targeting the Sunway TaihuLight supercomputer. We first introduce a multilevel parallel partition approach that partitions not only by dataflow and centroid, but also by dimension, which unlocks the potential of the hierarchical parallelism in the heterogeneous many-core processor and the system architecture of the supercomputer. The parallel design is able to process large-scale clustering problems with up to 196,608 dimensions and over 160,000 targeting centroids, while maintaining high performance and high scalability. Furthermore, we propose an automatic hyper-parameter determination process for k-means clustering, which automatically generates and executes the clustering tasks with a set of candidate hyper-parameters, and then determines the optimal hyper-parameter using a proposed evaluation method. The proposed auto-clustering solution can not only achieve high performance and scalability for problems with massive high-dimensional data, but also support clustering without sufficient prior knowledge of the number of targeted clusters, which can potentially extend the scope of the k-means algorithm to new application areas.
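The automatic hyper-parameter determination described here can be illustrated on a single machine: run k-means for each candidate k, score each result, and keep the best. The sketch below is a small serial stand-in, not the paper's Sunway implementation. The abstract does not specify its evaluation method, so the mean silhouette score is used as an assumed substitute. The farthest-first seeding and all function names are also assumptions for this example. Points are expected to be distinct tuples.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding (an assumption): start at points[0], then
    repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, max_iters=100):
    """Plain Lloyd iterations; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    for _ in range(max_iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def silhouette(points, labels, k):
    """Mean silhouette score; singleton clusters contribute zero."""
    clusters = [[p for p, l in zip(points, labels) if l == c] for c in range(k)]
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) < 2:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for c, other in enumerate(clusters) if c != l and other)
        total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, candidates):
    """Run k-means for every candidate k and return the best-scoring one."""
    scores = {k: silhouette(points, kmeans(points, k)[1], k) for k in candidates}
    return max(scores, key=scores.get), scores


def demo_points():
    """Three well-separated 3x3 grids of points, used only for illustration."""
    offsets = (-0.5, 0.0, 0.5)
    return [(cx + dx, cy + dy)
            for cx, cy in ((0, 0), (10, 0), (0, 10))
            for dx in offsets for dy in offsets]
```

Calling `choose_k(demo_points(), range(2, 6))` selects k = 3 for the three synthetic groups. In the paper's setting each candidate run would be a large parallel job, so the cost of this outer loop is what the automatic generation and execution of clustering tasks has to manage.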

    Giant thermal transport tuning at a metal/ferroelectric interface

    Interfacial thermal transport plays a prominent role in the thermal management of nanoscale objects and is of fundamental importance for basic research and nanodevices. At metal/insulator interfaces, a configuration commonly found in electronic devices, heat transport strongly depends upon the effective energy transfer from thermalized electrons in the metal to the phonons in the insulator. However, the mechanism of interfacial electron–phonon coupling and thermal transport at metal/insulator interfaces is not well understood. Here, the observation of a substantial enhancement of the interfacial thermal resistance and the important role of surface charges at the metal/ferroelectric interface in an Al/BiFeO3 membrane are reported. By applying uniaxial strain, the interfacial thermal resistance can be varied substantially (up to an order of magnitude), which is attributed to the renormalized interfacial electron–phonon coupling caused by the charge redistribution at the interface due to the polarization rotation. These results imply that surface charges at a metal/insulator interface can substantially enhance the interfacial electron–phonon-mediated thermal coupling, providing a new route to optimize the thermal transport performance in next-generation nanodevices, power electronics, and thermal logic devices.

    Spatially heterogeneous shifts in vegetation phenology induced by climate change threaten the integrity of the avian migration network

    Phenological responses to climate change frequently vary among trophic levels, which can result in increasing asynchrony between the peak energy requirements of consumers and the availability of resources. Migratory birds use multiple habitats with seasonal food resources along migration flyways. Spatially heterogeneous climate change could cause the phenology of food availability along the migration flyway to become desynchronized. Such heterogeneous shifts in food phenology could pose a challenge to migratory birds by reducing the food available to them along the migration path and consequently influencing their survival and reproduction. We develop a novel graph-based approach to quantify this problem and deploy it to evaluate the heterogeneous shifts in vegetation phenology for 16 migratory herbivorous waterfowl species in Asia. We show that climate change-induced heterogeneous shifts in vegetation phenology could cause a 12% loss of migration network integrity on average across all study species. Species that winter at relatively lower latitudes are subject to a higher loss of integrity in their migration network. These findings highlight the susceptibility of migratory species to climate change. Our proposed methodological framework could be applied to migratory species in general to yield an accurate assessment of their exposure under climate change and help to identify actions for biodiversity conservation in the face of climate-related risks.

    Validating quantum-supremacy experiments with exact and fast tensor network contraction

    The quantum circuits that declare quantum supremacy, such as Google Sycamore [Nature 574, 505 (2019)], raise a paradox in building reliable result references. While simulation on traditional computers seems the sole way to provide reliable verification, the required run time is doomed by exponentially increasing compute complexity. To find a way to validate current "quantum-supremacy" circuits with more than 50 qubits, we propose a simulation method that exploits the "classical advantage" (the inherent "store-and-compute" operation mode of von Neumann machines) of current supercomputers, and computes uncorrelated amplitudes of a random quantum circuit with an optimal reuse of the intermediate results and a minimal memory overhead throughout the process. Such a reuse strategy reduces the original linear scaling of the total compute cost against the number of amplitudes to a sublinear pattern, with greater reduction for more amplitudes. Based on a well-optimized implementation of this method on a new-generation Sunway supercomputer, we directly verify Sycamore by computing three million exact amplitudes for the experimentally generated bitstrings, obtaining an XEB fidelity of 0.191%, which closely matches the estimated value of 0.224%. Our computation scales up to 41,932,800 cores with a sustained single-precision performance of 84.8 Pflops, and is accomplished within 8.5 days. Our method has a far-reaching impact in solving quantum many-body problems, statistical problems as well as combinatorial optimization problems where one often needs to contract many tensor networks which share a significant portion of tensors in common. Comment: 7 pages, 4 figures, comments are welcome.
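For readers unfamiliar with the XEB figure quoted here, the sketch below shows how a fidelity estimate can be derived from exact amplitudes. It assumes the abstract refers to the standard linear cross-entropy benchmark, F = 2^n * mean(p(x_i)) - 1, where p(x_i) is the ideal probability of each measured bitstring. The function name and argument layout are assumptions for this example, and the tensor-network contraction that produces the amplitudes is outside its scope.

```python
def linear_xeb(amplitudes, n_qubits):
    """Linear XEB estimate from the ideal amplitudes of the sampled bitstrings.

    amplitudes: one complex (or real) ideal amplitude per measured bitstring.
    n_qubits:   number of qubits n, so the Hilbert-space size is 2**n.
    """
    if not amplitudes:
        raise ValueError("need at least one amplitude")
    # The ideal probability of a bitstring is the squared amplitude magnitude.
    mean_prob = sum(abs(a) ** 2 for a in amplitudes) / len(amplitudes)
    return (2 ** n_qubits) * mean_prob - 1.0
```

If the samples were drawn uniformly at random, each probability would average 2^-n and the estimate would be close to zero; samples concentrated on high-probability bitstrings push it toward one. A value such as 0.191% therefore indicates a small but measurable correlation between the experimental samples and the ideal circuit.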