42 research outputs found

    Warp-Aware Adaptive Energy Efficiency Calibration for Multi-GPU Systems

    Massive GPU acceleration processors have been used in high-performance computing systems. The end of Dennard scaling has led to power and thermal constraints that limit the performance of such systems, while both higher performance and better energy efficiency remain in high demand. This paper presents a multi-layer low-power optimisation method for warp-level and task-level parallelism. We present a dynamic frequency regulation scheme for performance parameters in terms of load balance and load imbalance. The method monitors the energy parameters at runtime and adaptively adjusts the voltage level to maintain performance efficiency while reducing energy. The experimental results show that the multi-layer low-power optimisation with dynamic frequency regulation can achieve a 40% reduction in energy consumption with only 1.6% performance degradation, thus reducing maximum energy consumption by 59%. It can further save about 30% energy consumption in comparison with the single-layer energy optimisation.

    Deactivation Effects of Tb3+ on Ho3+ Emission in Fluoroindate Glasses for 3.9 μm Laser Applications

    A series of Ho3+/Tb3+ co-doped fluoroindate glasses with good thermal stability have been synthesized to study the deactivation effects of Tb3+ on the Ho3+: 3.9 μm emission. Efficient 3.9 μm emission enhancement is obtained under excitation by an 888 nm laser diode (LD). The Judd-Ofelt (J-O) intensity parameters and radiative properties are calculated to evaluate the spectroscopic properties. Possible energy transfer processes resulting in emission reinforcement are discussed. A higher spontaneous transition probability and larger peak emission cross section are achieved with the inclusion of Tb3+. This analysis supports the conclusion that Ho3+/Tb3+ co-doped fluoroindate glass is a potentially useful laser material for highly efficient 3.9 μm fiber lasers.

    DHD-Net: A Novel Deep-Learning-based Dehazing Network

    Eliminating haze interference in images is still a challenging problem. In this paper, we consider the physical hazing mechanisms more systematically and, combining them with deep learning, propose a new end-to-end dehazing network called DHD-Net. For the physical hazing mechanisms, we fuse the global atmosphere light, transmission maps, and the atmospheric scattering model for dehazing. For the estimation of global atmosphere light, we propose a deep learning-based haze density estimation algorithm (DL-HDE). We establish a new dataset in which each data item consists of the hazy image, the transmission map, the haze-free image, and the dense-haze area mask. Our experimental results demonstrate that the proposed DHD-Net has better dehazing performance than state-of-the-art algorithms.

    Activity-Driven Task Allocation in Energy Constrained Heterogeneous GPUs Systems

    As computing systems continue to increase in complexity, energy optimization plays a key role in the design and implementation of heterogeneous systems. Although the energy consumed by off-chip memory accounts for a large proportion of the total energy consumed by the system as a whole, current research on energy optimization mainly focuses on optimizing the energy consumed by the processors. This paper explores the coordinated optimization of the holistic performance of the processors and memory system for heterogeneous systems with energy constraints. A communication-computing pipeline model for parallel executions is characterized to optimize program performance by simultaneously scaling the voltage and frequency of the processors and memory using task allocation strategies. A synergistic load-balancing optimization approach is presented to resolve the load imbalance among graphics processing units. Our experimental results substantiate the effectiveness of the approach in terms of execution time and throughput under the energy constraints.

    Three-level performance optimization for heterogeneous systems based on software prefetching under power constraints

    High power consumption has become one of the critical problems restricting the development of high-performance computers. Numerous recent studies have addressed optimizing execution performance while satisfying a power constraint. However, these methods mainly focus on homogeneous systems without considering the power or speed differences of heterogeneous processors, so they are difficult to apply to heterogeneous systems with an accelerator. In this paper, by abstracting the current execution model of a heterogeneous system, we propose a new framework for managing the system power consumption with a three-level power control mechanism. The three levels from top to bottom are: system-level power controller (SPC), group-level power controller (GPC) and unit-level power controller (UPC). The study establishes a power management method for software prefetch in UPC to scale the frequency and voltage of programs, select the optimal prefetch distance and guide the optimization process to satisfy the constraint boundary according to the power constraints. A strategy for dividing power based on key threads is put forward in GPC to preferentially allocate power to threads on key paths. In SPC, a method for evaluating the performance of heterogeneous processing engines is designed for dividing power in order to improve the overall execution performance of the system while sustaining fairness between concurrent applications. Finally, the proposed framework is verified on a central processing unit (CPU)-graphics processing unit (GPU) heterogeneous system.

    Few-Shot Text Classification with Global–Local Feature Information

    Meta-learning frameworks have been proposed to generalize machine learning models for domain adaptation without sufficient labeled data in computer vision. However, text classification with meta-learning is less investigated. In this paper, we propose SumFS to find global top-ranked sentences by extractive summarization and improve the local vocabulary category features. SumFS consists of three modules: (1) an unsupervised text summarizer that removes redundant information; (2) a weighting generator that associates feature words with attention scores to weight the lexical representations of words; (3) a regular meta-learning framework that trains with limited labeled data using a ridge regression classifier. In addition, a marine news dataset was established with limited labeled data. The performance of the algorithm was tested on the THUCnews, Fudan, and marine news datasets. Experiments show that SumFS can maintain or even improve accuracy while reducing input features. Moreover, the training time of each epoch is reduced by more than 50%.