
120 research outputs found

Application-Specific Number Representation

Author: Fu Haohuan
Publication venue: Computing, Imperial College London
Publication date: 01/02/2009

Reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs), enable application-specific number representations. Well-known number formats include fixed-point, floating-point, logarithmic number system (LNS), and residue number system (RNS). Such different number representations lead to different arithmetic designs and error behaviours, thus producing implementations with different performance, accuracy, and cost. To investigate the design options in number representations, the first part of this thesis presents a platform that enables automated exploration of the number representation design space. The second part of the thesis shows case studies that optimise the designs for area, latency or throughput from the perspective of number representations.

Automated design space exploration in the first part addresses the following two major issues:

• Automation requires arithmetic unit generation. This thesis provides optimised arithmetic library generators for logarithmic and residue arithmetic units, which support a wide range of bit widths and achieve significant improvement over previous designs.
• Generation of arithmetic units requires specifying the bit widths for each variable. This thesis describes an automatic bit-width optimisation tool called R-Tool, which combines dynamic and static analysis methods, and supports different number systems (fixed-point, floating-point, and LNS numbers).

Putting it all together, the second part explores the effects of application-specific number representation on practical benchmarks, such as radiative Monte Carlo simulation and seismic imaging computations. Experimental results show that customising the number representations brings benefits to hardware implementations: by selecting a more appropriate number format, we can reduce the area cost by up to 73.5% and improve the throughput by 14.2% to 34.1%; by performing the bit-width optimisation, we can further reduce the area cost by 9.7% to 17.3%. On the performance side, hardware implementations with customised number formats achieve 5 to potentially over 40 times speedup over software implementations.
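The bit-width trade-off mentioned in this abstract can be illustrated with a minimal sketch. It is not R-Tool and does not reproduce the thesis's analysis; it only assumes a simple round-to-nearest fixed-point format and searches for the smallest number of fractional bits that keeps the quantisation error of some sample values within a tolerance. All function names are hypothetical.

```python
def quantize_fixed_point(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (assumed format)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def max_abs_error(values, frac_bits):
    """Largest quantisation error over the sample values."""
    return max(abs(v - quantize_fixed_point(v, frac_bits)) for v in values)


def smallest_frac_bits(values, tolerance, max_bits=32):
    """Smallest fractional bit width whose error stays within tolerance.

    Returns None if no width up to max_bits is sufficient.
    """
    for bits in range(max_bits + 1):
        if max_abs_error(values, bits) <= tolerance:
            return bits
    return None


if __name__ == "__main__":
    samples = [0.1, 0.25, 0.7]
    print(smallest_frac_bits(samples, tolerance=0.01))
```

A purely empirical search like this depends on the sample values chosen; the abstract describes combining dynamic and static analysis, which this sketch does not attempt.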

Spiral - Imperial College Digital Repository

Large-Scale Automatic K-Means Clustering for Heterogeneous Many-Core Supercomputer

Author: Fu Haohuan
Janjic Vladimir
Liu Pan
Thomson John
Wang Shicai
Yan Xiaohan
Yang Guangwen
Yu Teng
Zhao Wenlai
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 09/12/2019

Funding: UK EPSRC grants “Discovery” EP/P020631/1 and “ABC: Adaptive Brokerage for the Cloud” EP/R010528/1.

This article presents an automatic k-means clustering solution targeting the Sunway TaihuLight supercomputer. We first introduce a multilevel parallel partition approach that not only partitions by dataflow and centroid, but also by dimension, which unlocks the potential of the hierarchical parallelism in the heterogeneous many-core processor and the system architecture of the supercomputer. The parallel design is able to process large-scale clustering problems with up to 196,608 dimensions and over 160,000 targeting centroids, while maintaining high performance and high scalability. Furthermore, we propose an automatic hyper-parameter determination process for k-means clustering, by automatically generating and executing the clustering tasks with a set of candidate hyper-parameters, and then determining the optimal hyper-parameter using a proposed evaluation method. The proposed auto-clustering solution can not only achieve high performance and scalability for problems with massive high-dimensional data, but also support clustering without sufficient prior knowledge of the number of targeted clusters, which can potentially increase the scope of the k-means algorithm to new application areas.

Postprint. Peer reviewed.
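The idea of running k-means over a set of candidate hyper-parameters and then scoring the results can be sketched as follows. This is a small serial illustration, not the article's parallel design, and the abstract does not specify its evaluation method: the mean silhouette score used here is a swapped-in, commonly used stand-in. The deterministic initialisation and all function names are assumptions made for the example.

```python
import math


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with a deterministic, evenly spaced initialisation."""
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid; an empty cluster keeps its old centroid.
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
              for p in points]
    return centroids, labels


def mean_silhouette(points, labels, k):
    """Mean silhouette score; assumes at least two non-empty clusters."""
    members = [[p for p, l in zip(points, labels) if l == c] for c in range(k)]
    total = 0.0
    for p, l in zip(points, labels):
        own = members[l]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in members[c]) / len(members[c])
                for c in range(k) if c != l and members[c])
        total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, candidates):
    """Run k-means for each candidate k and return the best-scoring k."""
    scores = {}
    for k in candidates:
        _, labels = kmeans(points, k)
        scores[k] = mean_silhouette(points, labels, k)
    return max(scores, key=scores.get), scores
```

Calling `choose_k(points, [2, 3, 4, 5])` on a list of coordinate tuples returns the selected k together with the per-candidate scores, so the candidates can be inspected rather than trusted blindly.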

University of Dundee Online Publications

University of St. Andrews - Pure

St Andrews Research Repository

Giant thermal transport tuning at a metal/ferroelectric interface

Author: Cazorla Silva Claudio
Di Chen
Fu Haohuan
Geng Zhiming
Iñiguez Amigot José Ignacio
Nie Yuefeng
Rurali Riccardo
Yan Xuejun
Zang Yipeng
Publication venue: Wiley
Publication date: 22/10/2021

Interfacial thermal transport plays a prominent role in the thermal management of nanoscale objects and is of fundamental importance for basic research and nanodevices. At metal/insulator interfaces, a configuration commonly found in electronic devices, heat transport strongly depends upon the effective energy transfer from thermalized electrons in the metal to the phonons in the insulator. However, the mechanism of interfacial electron–phonon coupling and thermal transport at metal/insulator interfaces is not well understood. Here, the observation of a substantial enhancement of the interfacial thermal resistance and the important role of surface charges at the metal/ferroelectric interface in an Al/BiFeO3 membrane are reported. By applying uniaxial strain, the interfacial thermal resistance can be varied substantially (up to an order of magnitude), which is attributed to the renormalized interfacial electron–phonon coupling caused by the charge redistribution at the interface due to the polarization rotation. These results imply that surface charges at a metal/insulator interface can substantially enhance the interfacial electron–phonon-mediated thermal coupling, providing a new route to optimize the thermal transport performance in next-generation nanodevices, power electronics, and thermal logic devices.

Peer reviewed. Postprint (author's final draft).

UPCommons. Portal del coneixement obert de la UPC

Spatially heterogeneous shifts in vegetation phenology induced by climate change threaten the integrity of the avian migration network

Author: Cole Eleanor F
de Boer Willem F
Fu Haohuan
Gong Peng
Sheldon Benjamin C
Si Yali
Wei Jie
Wielstra Ben
Xu Fei
Publication venue: Wiley
Publication date: 18/01/2024

Phenological responses to climate change frequently vary among trophic levels, which can result in increasing asynchrony between the peak energy requirements of consumers and the availability of resources. Migratory birds use multiple habitats with seasonal food resources along migration flyways. Spatially heterogeneous climate change could cause the phenology of food availability along the migration flyway to become desynchronized. Such heterogeneous shifts in food phenology could pose a challenge to migratory birds by reducing the food available to them along the migration path and consequently influencing their survival and reproduction. We develop a novel graph-based approach to quantify this problem and deploy it to evaluate the condition of the heterogeneous shifts in vegetation phenology for 16 migratory herbivorous waterfowl species in Asia. We show that climate change-induced heterogeneous shifts in vegetation phenology could cause a 12% loss of migration network integrity on average across all study species. Species that winter at relatively lower latitudes are subjected to a higher loss of integrity in their migration network. These findings highlight the susceptibility of migratory species to climate change. Our proposed methodological framework could be applied to migratory species in general to yield an accurate assessment of the exposure under climate change and help to identify actions for biodiversity conservation in the face of climate-related risks.

Oxford University Research Archive

Leiden University Scholarly Publications

Validating quantum-supremacy experiments with exact and fast tensor network contraction

Author: Chen Dexun
Chen Yaojian
Fu Haohuan
Gan Lin
Gao Jiangang
Guo Chu
Liu Xin
Liu Yong
Shi Xinmin
Song Jiawei
Wu Wei
Wu Wenzhao
Yang Guangwen
Publication date: 09/12/2022

The quantum circuits that declare quantum supremacy, such as Google Sycamore [Nature 574, 505 (2019)], raise a paradox in building reliable result references. While simulation on traditional computers seems the sole way to provide reliable verification, the required run time is doomed with an exponentially-increasing compute complexity. To find a way to validate current “quantum-supremacy” circuits with more than 50 qubits, we propose a simulation method that exploits the “classical advantage” (the inherent “store-and-compute” operation mode of von Neumann machines) of current supercomputers, and computes uncorrelated amplitudes of a random quantum circuit with an optimal reuse of the intermediate results and a minimal memory overhead throughout the process. Such a reuse strategy reduces the original linear scaling of the total compute cost against the number of amplitudes to a sublinear pattern, with greater reduction for more amplitudes. Based on a well-optimized implementation of this method on a new-generation Sunway supercomputer, we directly verify Sycamore by computing three million exact amplitudes for the experimentally generated bitstrings, obtaining an XEB fidelity of 0.191%, which closely matches the estimated value of 0.224%. Our computation scales up to 41,932,800 cores with a sustained single-precision performance of 84.8 Pflops, which is accomplished within 8.5 days. Our method has a far-reaching impact in solving quantum many-body problems, statistical problems as well as combinatorial optimization problems where one often needs to contract many tensor networks which share a significant portion of tensors in common.

Comment: 7 pages, 4 figures, comments are welcome
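How a fidelity figure can be derived from exact amplitudes is sketched below, assuming the commonly used linear cross-entropy benchmarking (XEB) estimator, F = 2^n · mean(p(x_i)) − 1, where p(x_i) is the ideal probability of each sampled bitstring. The abstract does not state which estimator was applied, so this choice and the function name are assumptions for illustration only.

```python
def linear_xeb_fidelity(amplitudes, num_qubits):
    """Linear XEB estimate from ideal amplitudes of the sampled bitstrings.

    amplitudes: complex amplitudes <x_i|psi> computed classically, one per
                experimentally observed bitstring x_i.
    num_qubits: number of qubits n in the circuit.
    """
    probabilities = [abs(a) ** 2 for a in amplitudes]
    mean_probability = sum(probabilities) / len(probabilities)
    return (2 ** num_qubits) * mean_probability - 1.0
```

Under this estimator, samples whose ideal probabilities all equal the uniform value 1/2^n give a fidelity of 0, which is the expected result for a sampler that carries no information about the circuit.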

arXiv.org e-Print Archive