Warp-Aware Adaptive Energy Efficiency Calibration for Multi-GPU Systems
Massively parallel GPU accelerators are widely used in high-performance computing systems. The breakdown of Dennard scaling has led to power and thermal constraints that limit the performance of such systems, while demand for both higher performance and better energy efficiency continues to grow. This paper presents a multi-layer low-power optimisation method for warp-level and task-level parallelism. We present a dynamic frequency regulation scheme that accounts for performance parameters under both load balance and load imbalance. The method monitors energy parameters at runtime and adaptively adjusts the voltage level to maintain performance efficiency while reducing energy. The experimental results show that the multi-layer low-power optimisation with dynamic frequency regulation can achieve a 40% reduction in energy consumption with only 1.6% performance degradation, thus reducing maximum energy consumption by 59%. It can further save about 30% of energy consumption in comparison with single-layer energy optimisation.
Deactivation Effects of Tb3+ on Ho3+ Emission in Fluoroindate Glasses for 3.9 μm Laser Applications
A series of Ho3+/Tb3+ co-doped fluoroindate glasses with good thermal stability have been synthesized to study the deactivation effects of Tb3+ on the Ho3+: 3.9 μm emission. Efficient 3.9 μm emission enhancement is obtained under excitation by an 888 nm laser diode (LD). The Judd-Ofelt (J-O) intensity parameters and radiative properties are calculated to evaluate the spectroscopic properties. Possible energy transfer processes resulting in emission reinforcement are discussed. A higher spontaneous transition probability and a larger peak emission cross section are achieved with the inclusion of Tb3+. This analysis supports the conclusion that Ho3+/Tb3+ co-doped fluoroindate glass is a potentially useful laser material for highly efficient 3.9 μm fiber lasers.
Automated Detection of Extracellular Action Potentials Propagation and Short Latency Coupling
Multi-electrode arrays (MEAs) non-invasively record extracellular action potentials (eAPs, also known as spikes) from hundreds of neurons simultaneously. We developed two algorithms that work with recordings from such devices. The first algorithm allows for automated detection of action potential propagation. Since extracellular electrodes sample from the local electrical field, each electrode can detect eAPs from multiple nearby neurons. One method to assign eAPs to their source neurons is to use spike sorting, a computational process that groups eAPs from single 'units' based on assumptions of how spike waveforms correlate with different neuronal sources, to interpret spike trains at individual electrodes of high-density arrays. However, when experimental conditions result in changes to eAP waveforms, spike sorting routines may have difficulty correlating eAPs from multiple neurons at single electrodes before and after such waveform changes. We present here a novel, empirical method for unambiguously isolating eAPs from individual, uniquely identifiable neurons, based on automated multi-point detection of action potential propagation. This method is insensitive to changes in eAP waveform morphology because it makes no assumptions about the relationship between spike waveform and neuronal source. Our algorithm for automated detection of action potential propagation produces a 'fingerprint' that uniquely identifies those spikes from each source neuron. By unambiguously isolating eAPs from multiple neurons in each recording, on a range of platforms and experimental preparations, our method now enables high-content screening with contemporary MEAs. We outline the limitations and strengths of propagation-based isolation of eAPs from single neurons and propose how our automated method complements spike sorting and could be adapted to in vivo use.
Our second algorithm uses the information extracted by the first algorithm to non-invasively detect synaptic relationships among neurons in in vitro networks. Our methods identify short latency spiking relationships between neurons with properties expected of synaptically coupled neurons: they were recapitulated by direct stimulation and were sensitive to changes in the number of active synaptic sites. Our methods enabled us to assemble a functional subset of neuronal connectivity in our cultures.
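To make the idea of a short latency spiking relationship concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes spike times in milliseconds and measures the fraction of spikes in one neuron that are followed by a spike in another within a fixed lag window. The function name `short_latency_fraction` and the 0.5 ms to 5 ms window are hypothetical choices made only for illustration.

```python
from bisect import bisect_left, bisect_right


def short_latency_fraction(pre_ms, post_ms, min_lag_ms=0.5, max_lag_ms=5.0):
    """Fraction of 'pre' spikes followed by a 'post' spike within the lag window.

    The window bounds are illustrative assumptions, not values from the abstract.
    """
    post = sorted(post_ms)
    followed = 0
    for t in pre_ms:
        # Index range of post spikes falling inside [t + min_lag, t + max_lag].
        lo = bisect_left(post, t + min_lag_ms)
        hi = bisect_right(post, t + max_lag_ms)
        if hi > lo:
            followed += 1
    return followed / len(pre_ms) if pre_ms else 0.0


neuron_a = [10.0, 50.0, 90.0, 130.0]
neuron_b = [12.0, 52.5, 200.0]
a_to_b = short_latency_fraction(neuron_a, neuron_b)
b_to_a = short_latency_fraction(neuron_b, neuron_a)
```

A high fraction in one direction and a low fraction in the other would only suggest a candidate coupling. The abstract describes further checks, such as direct stimulation, that this sketch does not attempt.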
DHD-Net: A Novel Deep-Learning-based Dehazing Network
Eliminating haze interference in images is still a challenging problem. In this paper, we consider the physical hazing mechanisms more systematically and, combining them with deep learning, propose a new end-to-end dehazing network called DHD-Net. To model the physical hazing mechanisms, we fuse the global atmospheric light, transmission maps, and the atmospheric scattering model for dehazing. To estimate the global atmospheric light, we propose a deep learning-based haze density estimation algorithm (DL-HDE). We establish a new dataset in which each data item consists of the hazy image, the transmission map, the haze-free image, and the dense-haze area mask. Our experimental results demonstrate that the proposed DHD-Net achieves better dehazing performance than state-of-the-art algorithms.
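The atmospheric scattering model mentioned in this abstract is commonly written as I = J·t + A·(1 − t), where I is the hazy intensity, J the haze-free intensity, t the transmission, and A the global atmospheric light. The Python sketch below inverts that standard formula for a single channel. It is not DHD-Net itself: in the paper t and A are estimated by the network, whereas here they are simply given, and the helper names and the lower bound `t_min` are assumptions added for illustration.

```python
def add_haze(clear, transmission, airlight):
    """Forward model: I = J * t + A * (1 - t)."""
    return clear * transmission + airlight * (1.0 - transmission)


def remove_haze(hazy, transmission, airlight, t_min=0.1):
    """Inverse model: J = (I - A) / max(t, t_min) + A.

    t_min is an assumed floor that avoids dividing by near-zero transmission.
    """
    t = max(transmission, t_min)
    return (hazy - airlight) / t + airlight


def remove_haze_image(hazy_img, t_map, airlight):
    """Apply the inverse model pixel by pixel to a 2-D list of intensities."""
    return [
        [remove_haze(i, t, airlight) for i, t in zip(img_row, t_row)]
        for img_row, t_row in zip(hazy_img, t_map)
    ]


clear_img = [[0.2, 0.4], [0.6, 0.8]]
t_map = [[0.5, 0.7], [0.9, 0.3]]
A = 0.9
hazy_img = [[add_haze(j, t, A) for j, t in zip(jr, tr)]
            for jr, tr in zip(clear_img, t_map)]
restored = remove_haze_image(hazy_img, t_map, A)
```

With exact t and A the round trip recovers the clear image up to floating-point error. In practice the quality of the result depends on how well those two quantities are estimated.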
Activity-Driven Task Allocation in Energy Constrained Heterogeneous GPUs Systems
As computing systems continue to increase in complexity, energy optimization plays a key role in the design and implementation of heterogeneous systems. Although the energy consumed by off-chip memory accounts for a large proportion of the total energy consumed by the system as a whole, current research on energy optimization mainly focuses on the energy consumed by the processors. This paper explores the coordinated optimization of the holistic performance of the processors and memory system for heterogeneous systems with energy constraints. A communication-computing pipeline model for parallel executions is characterized to optimize program performance by simultaneously scaling the voltage and frequency of the processors and memory using task allocation strategies. A synergistic load-balancing optimization approach is presented to resolve the load imbalance among graphics processing units. Our experimental results substantiate the effectiveness of the approach in terms of execution time and throughput under the energy constraints.
Three-level performance optimization for heterogeneous systems based on software prefetching under power constraints
High power consumption has become one of the critical problems restricting the development of high-performance computers. In recent years, numerous studies have addressed optimizing execution performance while satisfying a power constraint. However, these methods mainly focus on homogeneous systems and do not consider the power or speed differences between heterogeneous processors, so they are difficult to apply to heterogeneous systems with an accelerator. In this paper, by abstracting the current execution model of a heterogeneous system, we propose a new framework for managing system power consumption with a three-level power control mechanism. The three levels, from top to bottom, are the system-level power controller (SPC), the group-level power controller (GPC), and the unit-level power controller (UPC). In the UPC, the study establishes a power management method for software prefetching that scales the frequency and voltage of programs, selects the optimal prefetch distance, and guides the optimization process so that it stays within the power constraint. In the GPC, a strategy for dividing power based on key threads is put forward to preferentially allocate power to threads on critical paths. In the SPC, a method for evaluating the performance of heterogeneous processing engines is designed for dividing power so as to improve the overall execution performance of the system while maintaining fairness between concurrent applications. Finally, the proposed framework is verified on a central processing unit (CPU)-graphics processing unit (GPU) heterogeneous system.
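The top two levels of such a hierarchy can be illustrated with a small Python sketch. This is not the paper's method: it assumes the system level divides a power budget in proportion to a per-engine performance score, and the group level grants key-path threads their full demand before sharing what is left. All names, scores, and wattages below are hypothetical, and the unit-level prefetch tuning is left out.

```python
def split_proportional(budget_w, weights):
    """Divide a power budget (watts) in proportion to positive weights."""
    total = sum(weights.values())
    return {name: budget_w * w / total for name, w in weights.items()}


def split_key_first(budget_w, demands_w, key_threads):
    """Serve key-path threads first, then share the rest by demand.

    Assumes every demand is positive; a non-key thread may receive more or
    less than it asked for, since this is only a sketch.
    """
    grants = {}
    remaining = budget_w
    for name in key_threads:
        grants[name] = min(demands_w[name], remaining)
        remaining -= grants[name]
    others = {n: d for n, d in demands_w.items() if n not in grants}
    if others:
        grants.update(split_proportional(remaining, others))
    return grants


# System level: hypothetical performance scores for two engines.
system_split = split_proportional(200.0, {"cpu": 1.0, "gpu": 3.0})

# Group level: thread t1 is assumed to lie on the critical path.
group_split = split_key_first(
    system_split["gpu"],
    {"t0": 60.0, "t1": 60.0, "t2": 60.0},
    key_threads=["t1"],
)
```

Here the GPU group receives 150 W, thread t1 gets its full 60 W, and the remaining 90 W is shared equally between t0 and t2 because their demands are equal.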
Few-Shot Text Classification with Global–Local Feature Information
Meta-learning frameworks have been proposed to generalize machine learning models for domain adaptation without sufficient labeled data in computer vision. However, text classification with meta-learning is less investigated. In this paper, we propose SumFS to find global top-ranked sentences by extractive summarization and improve the local vocabulary category features. SumFS consists of three modules: (1) an unsupervised text summarizer that removes redundant information; (2) a weighting generator that associates feature words with attention scores to weight the lexical representations of words; (3) a regular meta-learning framework that trains with limited labeled data using a ridge regression classifier. In addition, a marine news dataset was established with limited labeled data. The performance of the algorithm was tested on the THUCnews, Fudan, and marine news datasets. Experiments show that SumFS can maintain or even improve accuracy while reducing input features. Moreover, the training time of each epoch is reduced by more than 50%.
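A ridge regression classifier of the kind named in module (3) has a closed-form solution, W = (XᵀX + λI)⁻¹XᵀY, which is one reason it suits few-shot episodes. The Python sketch below fits that solution on a tiny support set with one-hot labels. It is a generic illustration, not the SumFS implementation: the function names, the toy features, and λ = 1 are assumptions, and the summarizer and attention weighting are omitted.

```python
def ridge_fit(features, labels, num_classes, lam=1.0):
    """Solve (X^T X + lam*I) W = X^T Y for W, with Y the one-hot labels."""
    n, d = len(features), len(features[0])
    # Left-hand side: regularized Gram matrix, d x d.
    a = [[sum(features[k][i] * features[k][j] for k in range(n))
          + (lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    # Right-hand side: X^T Y, d x num_classes.
    b = [[sum(features[k][i] for k in range(n) if labels[k] == c)
          for c in range(num_classes)] for i in range(d)]
    # Gauss-Jordan elimination with partial pivoting on [a | b].
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        b[col] = [v / p for v in b[col]]
        for r in range(d):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
                b[r] = [x - f * y for x, y in zip(b[r], b[col])]
    return b  # weight matrix W, d x num_classes


def ridge_predict(weights, x):
    """Return the class index with the highest linear score."""
    scores = [sum(x[i] * weights[i][c] for i in range(len(x)))
              for c in range(len(weights[0]))]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy 2-way, 2-shot episode with hypothetical 2-D text features.
support_x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
support_y = [0, 0, 1, 1]
W = ridge_fit(support_x, support_y, num_classes=2)
pred_a = ridge_predict(W, [0.8, 0.2])
pred_b = ridge_predict(W, [0.2, 0.8])
```

Because λ is positive, the regularized Gram matrix is positive definite, so the elimination never meets a zero pivot even when the support set is smaller than the feature dimension.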