42 research outputs found

Warp-Aware Adaptive Energy Efficiency Calibration for Multi-GPU Systems

Author: Cheng Lianglun
Song Xiaoyu
Wan Hai
Wang Tao
Wang Zhuowei
Zhao Wuqing
Publication venue: PDXScholar
Publication date: 01/08/2022

Massively parallel GPU accelerators are widely used in high-performance computing systems. The breakdown of Dennard scaling has led to power and thermal constraints that limit the performance of such systems, so both higher performance and better energy efficiency are in demand. This paper presents a multi-layer low-power optimisation method for warp-level and task-level parallelism. We present a dynamic frequency regulation scheme for performance parameters in terms of load balance and load imbalance. The method monitors the energy parameters at runtime and adaptively adjusts the voltage level to maintain performance efficiency while reducing energy. The experimental results show that the multi-layer low-power optimisation with dynamic frequency regulation can achieve a 40% reduction in energy consumption with only 1.6% performance degradation, reducing maximum energy consumption by 59%. It can further save about 30% energy consumption in comparison with single-layer energy optimisation.
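The abstract above relies on scaling voltage and frequency to trade a little speed for a larger energy saving. A minimal sketch of why that trade can pay off, assuming the standard CMOS dynamic-power approximation P = C * V^2 * f and a purely compute-bound task; the function names and all numbers are illustrative assumptions, not values or methods from the paper:

```python
def dynamic_power(capacitance, voltage, frequency):
    """Approximate CMOS dynamic power: P = C * V^2 * f (leakage ignored)."""
    return capacitance * voltage ** 2 * frequency


def task_energy(cycles, capacitance, voltage, frequency):
    """Energy of a compute-bound task: power multiplied by run time."""
    runtime = cycles / frequency  # assumption: run time scales as 1 / f
    return dynamic_power(capacitance, voltage, frequency) * runtime


# Hypothetical operating points (not taken from the paper).
baseline = task_energy(cycles=1e9, capacitance=1e-9, voltage=1.0, frequency=1.5e9)
scaled = task_energy(cycles=1e9, capacitance=1e-9, voltage=0.8, frequency=1.2e9)

energy_ratio = scaled / baseline  # energy at the scaled point relative to baseline
slowdown = 1.5e9 / 1.2e9          # run-time increase caused by the lower clock
```

Under this simplified model the frequency cancels out of the energy term, so the saving comes from the squared voltage while the cost shows up as longer run time. Real GPUs add static power and memory-bound phases, which is where a runtime-adaptive scheme like the one described would matter.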

PDXScholar (Portland State University)

Deactivation Effects of Tb3+ on Ho3+ Emission in Fluoroindate Glasses for 3.9 μm Laser Applications

Author: Cheng Zhuowei
Farrell Gerald
Jia Shijie
Wang Pengfei
Wang Ruicong
Wang Shunbin
Zhang Zhi
Publication venue: Technological University Dublin
Publication date: 01/01/2023

A series of Ho3+/Tb3+ co-doped fluoroindate glasses with good thermal stability have been synthesized to study the deactivation effects of Tb3+ on the Ho3+: 3.9 μm emission. Efficient 3.9 μm emission enhancement is obtained under excitation by an 888 nm laser diode (LD). The Judd-Ofelt (J-O) intensity parameters and radiative properties are calculated to evaluate the spectroscopic properties. Possible energy transfer processes resulting in emission reinforcement are discussed. A higher spontaneous transition probability and larger peak emission cross section are achieved with the inclusion of Tb3+. This analysis supports the conclusion that Ho3+/Tb3+ co-doped fluoroindate glass is a potentially useful laser material for highly efficient 3.9 μm fiber lasers

Arrow@TUDublin

Preliminary study: quantification of chronic pain from physiological data.

Author: Cheng Zhuowei
Publication date: 11/05/2023

Ezid

Automated Detection of Extracellular Action Potentials Propagation and Short Latency Coupling

Author: Cheng Zhuowei
Publication venue: eScholarship, University of California
Publication date: 01/01/2022

Multi-electrode arrays (MEAs) non-invasively record extracellular action potentials (eAPs, also known as spikes) from hundreds of neurons simultaneously. We developed two algorithms that work with recordings from such devices. The first algorithm allows for automated detection of action potential propagation. Since extracellular electrodes sample from the local electrical field, each electrode can detect eAPs from multiple nearby neurons. One method to assign eAPs to their source neurons is to use spike sorting, a computational process that groups eAPs from single 'units' based on assumptions of how spike waveforms correlate with different neuronal sources, to interpret spike trains at individual electrodes of high-density arrays. However, when experimental conditions result in changes to eAP waveforms, spike sorting routines may have difficulty correlating eAPs from multiple neurons at single electrodes before and after such waveform changes. We present here a novel, empirical method for unambiguously isolating eAPs from individual, uniquely identifiable neurons, based on automated multi-point detection of action potential propagation. This method is insensitive to changes in eAP waveform morphology because it makes no assumptions about the relationship between spike waveform and neuronal source. Our algorithm for automated detection of action potential propagation produces a 'fingerprint' that uniquely identifies those spikes from each source neuron. By unambiguously isolating eAPs from multiple neurons in each recording, on a range of platforms and experimental preparations, our method now enables high-content screening with contemporary MEAs. We outline the limitations and strengths of propagation-based isolation of eAPs from single neurons and propose how our automated method complements spike sorting and could be adapted to in vivo use.
Our second algorithm uses the information extracted from the first algorithm to non-invasively detect synaptic relationships among neurons from in vitro networks. Our methods identify short latency spiking relationships between neurons with properties expected of synaptically coupled neurons, namely they were recapitulated by direct stimulation and were sensitive to changing the number of active synaptic sites. Our methods enabled us to assemble a functional subset of neuronal connectivity in our cultures.

eScholarship - University of California

DHD-Net: A Novel Deep-Learning-based Dehazing Network

Author: Cheng Lianglun
Wang Hao
Wang Zhuowei
Xie Liangru
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2020

Eliminating haze interference in images is still a challenging problem. In this paper, we consider the physical hazing mechanisms more systematically and, combining them with deep learning, propose a new end-to-end dehazing network called DHD-Net. For the physical hazing mechanisms, we fuse the global atmosphere light, transmission maps, and the atmospheric scattering model for dehazing. For the estimation of global atmosphere light, we propose a deep learning-based haze density estimation algorithm (DL-HDE). We establish a new dataset in which each data item consists of the hazy image, the transmission map, the haze-free image, and the dense-haze area mask. Our experimental results demonstrate that the proposed DHD-Net has better dehazing performance than state-of-the-art algorithms.
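The atmospheric scattering model named in this abstract is commonly written as I = J * t + A * (1 - t), where I is the hazy pixel, J the haze-free pixel, t the transmission, and A the global atmosphere light. A minimal per-pixel sketch of that model and its inversion follows; the function names and the `t_min` floor are illustrative assumptions, and DHD-Net itself estimates t and A with learned networks rather than taking them as given:

```python
def synthesize_haze(clear, transmission, airlight):
    """Forward model: I = J * t + A * (1 - t) for one pixel intensity."""
    return clear * transmission + airlight * (1.0 - transmission)


def dehaze_pixel(hazy, transmission, airlight, t_min=0.1):
    """Invert the model: J = (I - A) / t + A.

    t_min is an assumed floor that avoids dividing by a near-zero
    transmission in very dense haze.
    """
    t = max(transmission, t_min)
    return (hazy - airlight) / t + airlight


# Round trip on one hypothetical pixel with intensities in [0, 1].
hazy_value = synthesize_haze(clear=0.2, transmission=0.5, airlight=0.9)
recovered = dehaze_pixel(hazy_value, transmission=0.5, airlight=0.9)
```

When t and A are known exactly, the inversion recovers the clear pixel; in practice the quality of the result depends on how well those two quantities are estimated.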

NORA - Norwegian Open Research Archives

Activity-Driven Task Allocation in Energy Constrained Heterogeneous GPUs Systems

Author: Cheng Lianglun
Song Xiaoyu
Wang Hao
Wang Zhuowei
Publication venue: PDXScholar
Publication date: 29/11/2020

As computing systems continue to increase in complexity, energy optimization plays a key role in the design and implementation of heterogeneous systems. Although the energy consumed by off-chip memory accounts for a large proportion of the total power consumed by the system as a whole, current research on energy optimization mainly focuses on optimizing the energy consumed by the processors. This paper explores the coordinated optimization of the holistic performance of the processors and memory system for heterogeneous systems with energy constraints. A communication-computing pipeline model for parallel executions is characterized to optimize program performance by simultaneously scaling the voltage and frequency of the processors and memory using task allocation strategies. A synergistic load-balancing optimization approach is presented to resolve the load imbalance among graphics processing units. Our experimental results substantiate the effectiveness of the approach in terms of execution times and throughputs with the energy constraints

PDXScholar (Portland State University)

NORA - Norwegian Open Research Archives

Three-level performance optimization for heterogeneous systems based on software prefetching under power constraints

Author: Cheng Lianglun
Wang Hao
Wang Zhuowei
Zhao Wuqing
Publication venue: Elsevier BV
Publication date: 01/01/2018

High power consumption has become one of the critical problems restricting the development of high-performance computers. In recent years, numerous studies have addressed optimizing execution performance while satisfying a power constraint. However, these methods mainly focus on homogeneous systems without considering the power or speed differences of heterogeneous processors, so they are difficult to apply to heterogeneous systems with an accelerator. In this paper, by abstracting the current execution model of a heterogeneous system, we propose a new framework for managing system power consumption with a three-level power control mechanism. The three levels from top to bottom are: system-level power controller (SPC), group-level power controller (GPC) and unit-level power controller (UPC). The study establishes a power management method for software prefetching in the UPC that scales the frequency and voltage of programs, selects the optimal prefetch distance, and guides the optimization process to stay within the power constraint. In the GPC, a strategy for dividing power based on key threads preferentially allocates power to threads on key paths. In the SPC, a method for evaluating the performance of heterogeneous processing engines is designed for dividing power in order to improve the overall execution performance of the system while sustaining fairness between concurrent applications. Finally, the proposed framework is verified on a central processing unit (CPU)-graphics processing unit (GPU) heterogeneous system.

NORA - Norwegian Open Research Archives

Few-Shot Text Classification with Global–Local Feature Information

Author: Depei Wang
Lianglun Cheng
Weiwen Zhang
Zhuowei Wang
Publication venue: MDPI AG
Publication date: 01/06/2022

Meta-learning frameworks have been proposed to generalize machine learning models for domain adaptation without sufficient label data in computer vision. However, text classification with meta-learning is less investigated. In this paper, we propose SumFS to find global top-ranked sentences by extractive summary and improve the local vocabulary category features. The SumFS consists of three modules: (1) an unsupervised text summarizer that removes redundant information; (2) a weighting generator that associates feature words with attention scores to weight the lexical representations of words; (3) a regular meta-learning framework that trains with limited labeled data using a ridge regression classifier. In addition, a marine news dataset was established with limited label data. The performance of the algorithm was tested on THUCnews, Fudan, and marine news datasets. Experiments show that the SumFS can maintain or even improve accuracy while reducing input features. Moreover, the training time of each epoch is reduced by more than 50%
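The third SumFS module trains a ridge regression classifier on a few labeled examples. A minimal sketch of such a classifier, assuming the standard closed-form solution W = (X^T X + lam * I)^-1 X^T Y with one-hot labels; the abstract does not give the paper's exact formulation, so the function names, the `lam` value, and the toy features below are hypothetical:

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_ridge(support_x, support_y, num_classes, lam=1.0):
    """Closed-form ridge weights for one-hot encoded class labels."""
    d = len(support_x[0])
    # X^T X + lam * I
    gram = [[sum(x[i] * x[j] for x in support_x) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    # X^T Y, where Y is the one-hot label matrix
    xty = [[sum(x[i] for x, y in zip(support_x, support_y) if y == c)
            for c in range(num_classes)] for i in range(d)]
    return solve(gram, xty)


def predict(weights, x):
    """Return the class whose regression score is highest."""
    scores = [sum(x[i] * weights[i][c] for i in range(len(x)))
              for c in range(len(weights[0]))]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy support set: two classes, two hypothetical features per document.
support_x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
support_y = [0, 0, 1, 1]
weights = fit_ridge(support_x, support_y, num_classes=2)
```

Because the fit is a single linear solve rather than an iterative optimisation, it is cheap to repeat for every few-shot episode, which is one reason closed-form classifiers are popular in meta-learning.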

Directory of Open Access Journals

PubMed Central