Application-Specific Number Representation
Reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs), enable
application-specific number representations. Well-known number formats include fixed-point,
floating-point, logarithmic number system (LNS), and residue number system (RNS). Such different
number representations lead to different arithmetic designs and error behaviours, thus
producing implementations with different performance, accuracy, and cost.
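As a rough illustration of how these formats lead to different arithmetic, the following Python sketch models three of them in software. The function names and the choice of moduli are assumptions made for this example only; they are not taken from the thesis or its generators.

```python
import math


def to_fixed(x, frac_bits):
    """Quantise x to a fixed-point integer with frac_bits fractional bits."""
    return round(x * (1 << frac_bits))


def from_fixed(q, frac_bits):
    """Convert a fixed-point integer back to a real value."""
    return q / (1 << frac_bits)


def lns_encode(x):
    """LNS stores a positive value as its base-2 logarithm."""
    return math.log2(x)


def lns_mul(a_log, b_log):
    """Multiplication in LNS reduces to adding the stored logarithms."""
    return a_log + b_log


def rns_encode(x, moduli):
    """RNS stores a non-negative integer as its residues modulo each modulus."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli):
    """Addition in RNS is carried out independently on each residue channel."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, moduli))
```

The sketch shows why the formats trade off differently: fixed-point pays in quantisation error, LNS makes multiplication cheap, and RNS splits addition into small independent channels.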
To investigate the design options in number representations, the first part of this thesis presents
a platform that enables automated exploration of the number representation design space. The
second part of the thesis shows case studies that optimise the designs for area, latency or
throughput from the perspective of number representations.
Automated design space exploration in the first part addresses the following two major issues:
- Automation requires arithmetic unit generation. This thesis provides optimised
arithmetic library generators for logarithmic and residue arithmetic units, which support
a wide range of bit widths and achieve significant improvement over previous designs.
- Generation of arithmetic units requires specifying the bit widths for each
variable. This thesis describes an automatic bit-width optimisation tool called R-Tool,
which combines dynamic and static analysis methods, and supports different number
systems (fixed-point, floating-point, and LNS numbers).
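To make the bit-width question concrete, here is a minimal sketch of a simulation-based (dynamic) estimate for a fixed-point variable. It is not R-Tool's algorithm, which the abstract does not describe; the function names, the two's-complement assumption, and the round-to-nearest error model are all assumptions for illustration.

```python
import math


def integer_bits(samples):
    """Sign bit plus magnitude bits needed to cover the observed value range.

    Assumes a two's-complement integer part.
    """
    max_abs = max(abs(s) for s in samples)
    if max_abs < 1:
        return 1  # only the sign bit is needed
    return math.floor(math.log2(max_abs)) + 2


def fraction_bits(max_error):
    """Smallest f such that the round-to-nearest error 2**-(f+1) <= max_error."""
    return max(0, math.ceil(-math.log2(max_error) - 1))
```

A purely dynamic estimate like this only covers the inputs that were simulated, which is one reason a tool might combine it with static analysis.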
Putting it all together, the second part explores the effects of application-specific number
representation on practical benchmarks, such as radiative Monte Carlo simulation, and seismic
imaging computations. Experimental results show that customising the number representations
brings benefits to hardware implementations: by selecting a more appropriate number format,
we can reduce the area cost by up to 73.5% and improve the throughput by 14.2% to 34.1%; by
performing the bit-width optimisation, we can further reduce the area cost by 9.7% to 17.3%.
On the performance side, hardware implementations with customised number formats achieve
5 to potentially over 40 times speedup over software implementations.
Large-Scale Automatic K-Means Clustering for Heterogeneous Many-Core Supercomputer
Funding: UK EPSRC grants "Discovery" EP/P020631/1, "ABC: Adaptive Brokerage for the Cloud" EP/R010528/1. This article presents an automatic k-means clustering solution targeting the Sunway TaihuLight supercomputer. We first introduce a multilevel parallel partition approach that not only partitions by dataflow and centroid, but also by dimension, which unlocks the potential of the hierarchical parallelism in the heterogeneous many-core processor and the system architecture of the supercomputer. The parallel design is able to process large-scale clustering problems with up to 196,608 dimensions and over 160,000 target centroids, while maintaining high performance and high scalability. Furthermore, we propose an automatic hyper-parameter determination process for k-means clustering, by automatically generating and executing the clustering tasks with a set of candidate hyper-parameters, and then determining the optimal hyper-parameter using a proposed evaluation method. The proposed auto-clustering solution can not only achieve high performance and scalability for problems with massive high-dimensional data, but also support clustering without sufficient prior knowledge of the number of target clusters, which can potentially increase the scope of the k-means algorithm to new application areas. Postprint. Peer reviewed.
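The three partition levels named in the abstract (dataflow, centroid, dimension) can be pictured with a small sequential sketch of the k-means assignment step. This is only an illustration under stated assumptions: the helper names `chunks` and `assign` and the block counts are hypothetical, each loop level stands in for work that would be distributed across hardware, and nothing here reflects the actual Sunway TaihuLight implementation.

```python
def chunks(n, parts):
    """Split range(n) into at most `parts` contiguous index ranges."""
    step = -(-n // parts)  # ceiling division
    return [range(i, min(i + step, n)) for i in range(0, n, step)]


def assign(points, centroids, sample_parts=2, centroid_parts=2, dim_parts=2):
    """Return the index of the nearest centroid for every point."""
    dims = len(centroids[0])
    labels = [None] * len(points)
    # Level 1: partition the data samples (dataflow).
    for sample_block in chunks(len(points), sample_parts):
        for p in sample_block:
            best, best_dist = None, float("inf")
            # Level 2: partition the centroids.
            for centroid_block in chunks(len(centroids), centroid_parts):
                for c in centroid_block:
                    dist = 0.0
                    # Level 3: partition the dimensions; each block yields
                    # a partial squared distance that is then accumulated.
                    for dim_block in chunks(dims, dim_parts):
                        dist += sum(
                            (points[p][d] - centroids[c][d]) ** 2
                            for d in dim_block
                        )
                    if dist < best_dist:
                        best, best_dist = c, dist
            labels[p] = best
    return labels
```

Splitting by dimension matters when a single point or centroid is too large for one core's local memory, since each block only needs a slice of the vector and contributes a partial sum.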
Giant thermal transport tuning at a metal/ferroelectric interface
Interfacial thermal transport plays a prominent role in the thermal management of nanoscale objects and is of fundamental importance for basic research and nanodevices. At metal/insulator interfaces, a configuration commonly found in electronic devices, heat transport strongly depends upon the effective energy transfer from thermalized electrons in the metal to the phonons in the insulator. However, the mechanism of interfacial electron–phonon coupling and thermal transport at metal/insulator interfaces is not well understood. Here, the observation of a substantial enhancement of the interfacial thermal resistance and the important role of surface charges at the metal/ferroelectric interface in an Al/BiFeO3 membrane are reported. By applying uniaxial strain, the interfacial thermal resistance can be varied substantially (up to an order of magnitude), which is attributed to the renormalized interfacial electron–phonon coupling caused by the charge redistribution at the interface due to the polarization rotation. These results imply that surface charges at a metal/insulator interface can substantially enhance the interfacial electron–phonon-mediated thermal coupling, providing a new route to optimize the thermal transport performance in next-generation nanodevices, power electronics, and thermal logic devices. Peer reviewed. Postprint (author's final draft).
Spatially heterogeneous shifts in vegetation phenology induced by climate change threaten the integrity of the avian migration network
Phenological responses to climate change frequently vary among trophic levels, which can result in increasing asynchrony between the peak energy requirements of consumers and the availability of resources. Migratory birds use multiple habitats with seasonal food resources along migration flyways. Spatially heterogeneous climate change could cause the phenology of food availability along the migration flyway to become desynchronized. Such heterogeneous shifts in food phenology could pose a challenge to migratory birds by reducing their access to food along the migration path and consequently influencing their survival and reproduction. We develop a novel graph-based approach to quantify this problem and deploy it to evaluate heterogeneous shifts in vegetation phenology for 16 migratory herbivorous waterfowl species in Asia. We show that climate change-induced heterogeneous shifts in vegetation phenology could cause a 12% loss of migration network integrity on average across all study species. Species that winter at relatively lower latitudes are subjected to a higher loss of integrity in their migration network. These findings highlight the susceptibility of migratory species to climate change. Our proposed methodological framework could be applied to migratory species in general to yield an accurate assessment of the exposure under climate change and help to identify actions for biodiversity conservation in the face of climate-related risks.
Validating quantum-supremacy experiments with exact and fast tensor network contraction
The quantum circuits that declare quantum supremacy, such as Google Sycamore
[Nature 574, 505 (2019)], raise a paradox in building reliable result
references. While simulation on traditional computers seems the sole way to
provide reliable verification, the required run time is doomed with an
exponentially-increasing compute complexity. To find a way to validate current
"quantum-supremacy" circuits with more than qubits, we propose a
simulation method that exploits the "classical advantage" (the inherent
"store-and-compute" operation mode of von Neumann machines) of current
supercomputers, and computes uncorrelated amplitudes of a random quantum
circuit with an optimal reuse of the intermediate results and a minimal memory
overhead throughout the process. Such a reuse strategy reduces the original
linear scaling of the total compute cost against the number of amplitudes to a
sublinear pattern, with greater reduction for more amplitudes. Based on a
well-optimized implementation of this method on a new-generation Sunway
supercomputer, we directly verify Sycamore by computing three million exact
amplitudes for the experimentally generated bitstrings, obtaining an XEB
fidelity of which closely matches the estimated value of .
Our computation scales up to cores with a sustained
single-precision performance of Pflops, which is accomplished within
days. Our method has a far-reaching impact in solving quantum many-body
problems, statistical problems as well as combinatorial optimization problems
where one often needs to contract many tensor networks which share a
significant portion of tensors in common.
Comment: 7 pages, 4 figures, comments are welcome
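For readers unfamiliar with the XEB fidelity mentioned in the abstract above, the sketch below computes the linear cross-entropy benchmarking estimate from exactly simulated bitstring probabilities. The abstract does not state which estimator the authors used, so the linear form (the one defined in the cited Sycamore paper) is an assumption here, and the function name is hypothetical.

```python
def linear_xeb(probabilities, n_qubits):
    """Linear XEB estimate: 2**n * mean(p(x_i)) - 1.

    `probabilities` holds the ideal (simulated) probability of each bitstring
    that the quantum device actually produced.
    """
    mean_p = sum(probabilities) / len(probabilities)
    return (2 ** n_qubits) * mean_p - 1.0
```

With this estimator, bitstrings drawn uniformly at random give a value near 0, which is why each exact amplitude computed classically contributes directly to the verification.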