60 research outputs found

    OCCL: a Deadlock-free Library for GPU Collective Communication

    Various distributed deep neural network (DNN) training technologies lead to increasingly complicated use of collective communications on GPU. The deadlock-prone collectives on GPU force researchers to guarantee that collectives are enqueued in a consistent order on each GPU to prevent deadlocks. In complex distributed DNN training scenarios, manual hardcoding is the only practical way for deadlock prevention, which poses significant challenges to the development of artificial intelligence. This paper presents OCCL, which is, to the best of our knowledge, the first deadlock-free collective communication library for GPU supporting dynamic decentralized preemption and gang-scheduling for collectives. Leveraging the preemption opportunity of collectives on GPU, OCCL dynamically preempts collectives in a decentralized way via the deadlock-free collective execution framework and allows dynamic decentralized gang-scheduling via the stickiness adjustment scheme. With the help of OCCL, researchers no longer have to struggle to get all GPUs to launch collectives in a consistent order to prevent deadlocks. We implement OCCL with several optimizations and integrate OCCL with a distributed deep learning framework OneFlow. Experimental results demonstrate that OCCL achieves comparable or better latency and bandwidth for collectives compared to NCCL, the state-of-the-art. When used in distributed DNN training, OCCL can improve the peak training throughput by up to 78% compared to statically sequenced NCCL, while introducing overheads of less than 6.5% across various distributed DNN training approaches

    Improved physiological and morphological traits of root synergistically enhanced salinity tolerance in rice under appropriate nitrogen application rate

    Numerous papers have studied the relationship between nitrogen rate and rice yield in saline soils, whereas the rice root morphological and physiological characteristics that mediate the effect of nitrogen rate on yield formation under varied salinity levels have received less attention. Through a field experiment with five nitrogen rates (0, 210, 255, 300, 345, and 390 kg ha–1) in saline land, we found that rice yield peaked at 7.7 t ha–1 under 300 kg ha–1 nitrogen, and that excessive N was not conducive to increasing yield. To further elucidate the internal physiological mechanism, a pot experiment was designed with three N rates (210 [N1], 300 [N2], 390 [N3] kg ha–1) and three salt concentrations (0 [S0], 1.5 [S1], 3.0 [S2] g kg–1 NaCl). Results showed that the average grain yield decreased by 19.1 and 51.1% under S1 and S2, respectively, and increased notably by 18.5 and 14.5% under N2 and N3, respectively. Salinity stress significantly inhibited root biomass, root length and surface area, root oxidation capacity (ROC), root K+ and the K+/Na+ ratio, and nitrogen metabolism-related enzyme activities, whereas root Na+ and antioxidant enzyme activities were notably increased. The mechanism by which insufficient N supply (N1) affected rice yield formation was consistent across salinity levels: it had adverse impacts on root morphological and physiological traits, thereby significantly inhibiting leaf photosynthesis and grain yield. However, the mechanism through which excessive N (N3) affected yield formation differed considerably under varied salinity levels. Under lower salinity (S0 and S1), no significant differences in root morphological traits and grain yield were observed between the N3 and N2 treatments, except for a significant decline in the activities of NR and GS. Under the higher salinity level (S2), the decreases in ROC, in the K+/Na+ ratio (due to increased Na+), in antioxidant enzyme activities, and in NR and GS activities were the main reasons for the undesirable root morphological traits and leaf photosynthesis, which in turn decreased grain yield under the N3 treatment compared with the N2 treatment. Overall, our results suggest that improved physiological and morphological traits of root synergistically enhanced salinity tolerance in rice under appropriate nitrogen application rate.

    Deep reinforcement learning for intractable routing & inverse problems

    Solving intractable problems with huge or infinite solution spaces is challenging and has motivated much research. Classical methods mainly focus on fast searching via either approximation or (meta)heuristics with the help of some regularizers. However, neither the solution quality nor the inference time is satisfactory. Recently, a popular trend has been to leverage deep learning to learn to solve intractable problems, and impressive progress has been achieved, with good solution quality and fast inference. Among the learning-based methods, those based on deep reinforcement learning (DRL) show superiority, since they learn a more flexible policy with less supervision. Many exciting achievements can be found in board games, video games, and robotics. However, most of the current methods are proposed for specific tasks and neglect practical settings. To push DRL one step closer to real-life applications, we propose a paradigm that can learn to solve a wider range of intractable problems, and we attempt to provide instruction and insight on how to systematically learn to solve more practical intractable problems via DRL. Following the proposed paradigm, we propose four frameworks for four practical intractable problems, namely the travelling salesman problem with time window and rejection (TSPTWR), multiple TSPTWR (mTSPTWR), robust image denoising, and customized low-light image enhancement. In particular, unlike its counterparts, where the deep neural network (DNN) is the main concern, our paradigm also studies the modelling of the Markov decision process (MDP) and the design of the action and reward. By doing so, we are able to flexibly circumvent the complex design of the DNN and apply existing DRL-based methods to more practical problems. Extensive experiments show that our proposed frameworks can outperform both classical and learning-based baselines for these applications. The success of these four applications demonstrates that our proposed paradigm is a general and promising way to solve intractable problems efficiently. In the end, we conclude this thesis and point out some interesting directions that could be followed as future work. Doctor of Philosophy

    Controller design and simulation for heavy duty vehicle platooning

    Driving Heavy Duty Vehicles (HDVs) as a platoon has the potential to significantly reduce fuel consumption and human labor while increasing safety. A suitable controller that keeps the platoon moving in a given topology plays a pivotal role in HDV platooning. In this dissertation, a conventional PID controller for a longitudinal HDV platoon is first designed and tuned to keep the heavy duty vehicles moving in a constant distance topology or a constant distance & headway time topology. Further, an attempt is made to design a controller using Model Predictive Control (MPC) with the same control objectives as the PID controller for two scenarios, namely the unconstrained and the constrained optimization problem. Comparative simulation studies were carried out to test the performance of both controllers with the help of MATLAB and VISSIM 8. The U.S. freeway I5 is built in VISSIM 8 and a 14-vehicle platoon is generated on this road network. VISSIM 8 passes the information on each vehicle to MATLAB; using this information, MATLAB calculates the desired acceleration with each of the two control methods and feeds it back to VISSIM 8. Based on the simulation results, a comparison between the PID controller and the MPC controller is given. The PID controller needs less computational time but cannot handle constraints. The MPC controller needs more time to solve the optimization problem, but its systematic handling of constraints yields significant improvements in performance over the PID controller. Moreover, the proposed MPC performs well without much tuning, which is an advantage over the PID controller. Hence, according to the above-mentioned advantages and disadvantages, a shiftable hybrid controller is generalized in the end. Master of Science (Computer Control and Automation)
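The spacing policies named in this abstract can be illustrated with a minimal sketch of one follower tracking a leader under PID control. This is not the dissertation's controller: the point-mass vehicle model, the gains, the initial conditions, and all names (`PID`, `spacing_error`, `simulate_follower`) are assumptions chosen for illustration. Setting `headway_time=0` gives a constant distance policy; a positive value gives the constant distance plus headway time policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Textbook discrete PID; gains here are illustrative, not tuned values from the source."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        self.integral += error * dt
        # No derivative action on the very first sample (no previous error yet).
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def spacing_error(gap: float, speed: float, standstill_gap: float, headway_time: float) -> float:
    """Actual gap minus desired gap, where desired = standstill_gap + headway_time * speed."""
    return gap - (standstill_gap + headway_time * speed)


def simulate_follower(steps: int = 4000, dt: float = 0.05,
                      standstill_gap: float = 10.0, headway_time: float = 1.0):
    """Simulate one follower behind a leader driving at constant speed (assumed scenario)."""
    lead_pos, lead_speed = 50.0, 20.0   # leader state: position [m], speed [m/s]
    pos, speed = 0.0, 20.0              # follower state
    pid = PID(kp=0.5, ki=0.05, kd=0.2)
    for _ in range(steps):
        error = spacing_error(lead_pos - pos, speed, standstill_gap, headway_time)
        accel = pid.step(error, dt)     # unconstrained: plain PID cannot enforce limits
        speed += accel * dt             # follower modelled as a point mass
        pos += speed * dt
        lead_pos += lead_speed * dt
    final_error = spacing_error(lead_pos - pos, speed, standstill_gap, headway_time)
    return final_error, speed


if __name__ == "__main__":
    err, v = simulate_follower()
    print(f"final spacing error: {err:.6f} m, follower speed: {v:.3f} m/s")
```

The commanded acceleration is never clipped in this sketch, which mirrors the abstract's remark that the PID controller cannot handle constraints; an MPC formulation would instead include such limits in its optimization problem.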

    Study of moment-based MPC formulations and their connection to classical control

    Model predictive control (MPC) is a well-established modern control technology used in diverse applications to provide (sub)optimal operating conditions while incorporating safety and performance constraints. In MPC, the control action at the current time instant is obtained by solving a finite horizon optimal control problem according to forecasts of the future process behavior. Hence the quality and validity of the predictions generated from process models determine the performance of these controllers ([1]). Although models based on first principles are expected to provide better predictions for a wider range of operating conditions ([2]), the model predictions should also incorporate the effects of uncertain deviations in order to overcome their adverse effects. To this end, robust model predictive control techniques are developed to reduce the effect of uncertainty ([3,4]) in dynamical processes.
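For orientation, the finite horizon optimal control problem mentioned in this abstract is commonly written in the generic textbook form below. This is not the moment-based formulation studied in the work itself; the symbols (horizon $N$, stage cost $\ell$, terminal cost $V_f$, model $f$, constraint sets $\mathcal{X}$ and $\mathcal{U}$) are assumptions introduced here for illustration.

```latex
\begin{aligned}
\min_{u_0,\dots,u_{N-1}} \quad & \sum_{k=0}^{N-1} \ell(x_k, u_k) + V_f(x_N) \\
\text{s.t.} \quad & x_{k+1} = f(x_k, u_k), \qquad k = 0,\dots,N-1, \\
& x_0 = x(t), \\
& x_k \in \mathcal{X}, \quad u_k \in \mathcal{U}, \qquad k = 0,\dots,N-1 .
\end{aligned}
```

Only the first element $u_0^{\star}$ of the minimizing sequence is applied at time $t$; the problem is then solved again from the next measured state, which is why prediction quality directly affects closed-loop performance.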

    Does an AI Streamer Have Feelings? The Influence of the Positive Emotions of AI Streamer on Consumers’ Purchase Intention

    Streamers have become significant participants in pushing sales as the livestreaming commerce sector is growing rapidly. However, livestreaming commerce businesses are currently constrained by the high expense of human streamers and their limited online time. As a result, the AI streamers began to enter the studio, resulting in increased trading chances. But can AI streamers satisfy consumers and produce good sales performance? In this paper, projecting from both emotional and cognitive perspectives, we examine the impacts of AI streamer’s positive emotions on consumers’ purchase intention. This study has the potential to provide guidance for the development of livestreaming commerce platform and AI streamer design technology

    Feature Optimization of EEG Signals Based on Ant Colony Algorithm

    The EEG signal is a kind of bioelectrical signal that can reflect emotional information when the body is in different emotional states. However, the data collected are often high-dimensional, including many irrelevant or redundant features. The high-dimensional features make the space cost increase exponentially, which brings many difficulties to the research. The ant colony optimization algorithm, a swarm intelligence algorithm, can be used for feature selection, and it is applied here to the feature selection of EEG signals. The candidate feature subsets are trained cooperatively and learned actively. The classification accuracy is evaluated through a convolutional neural network, and the optimal subset is selected from the local optimal solutions of the iterations. The results show that the ant colony optimization algorithm can effectively reduce the time complexity and calculation cost and improve the accuracy of classification.
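The selection loop described in this abstract can be sketched roughly as follows. This is a generic ant colony feature-selection skeleton, not the paper's implementation: the convolutional-network accuracy is replaced by a hypothetical caller-supplied `score` function, and every name and parameter (`aco_feature_selection`, `n_ants`, `evaporation`, `toy_score`, the set of "informative" features) is an assumption made for illustration.

```python
import random


def aco_feature_selection(n_features, score, subset_size=3, n_ants=10,
                          n_iters=30, evaporation=0.2, seed=0):
    """Return (best_subset, best_score) found by a simple ant colony search.

    `score` maps a list of feature indices to a non-negative quality value;
    in the abstract's setting this role is played by classifier accuracy.
    """
    rng = random.Random(seed)
    pheromone = [1.0] * n_features          # one pheromone value per feature
    best_subset, best_score = None, float("-inf")

    for _ in range(n_iters):
        iter_best, iter_best_score = None, float("-inf")
        for _ in range(n_ants):
            # Each ant builds a subset, picking features without replacement
            # with probability proportional to their pheromone.
            candidates = list(range(n_features))
            subset = []
            for _ in range(subset_size):
                weights = [pheromone[f] for f in candidates]
                pick = rng.choices(candidates, weights=weights, k=1)[0]
                candidates.remove(pick)
                subset.append(pick)
            s = score(subset)
            if s > iter_best_score:
                iter_best, iter_best_score = subset, s

        # Evaporate all trails, then reinforce the iteration's best subset.
        pheromone = [(1.0 - evaporation) * p for p in pheromone]
        for f in iter_best:
            pheromone[f] += iter_best_score

        # Keep the best of the per-iteration (local) optima as the global result.
        if iter_best_score > best_score:
            best_subset, best_score = sorted(iter_best), iter_best_score

    return best_subset, best_score


# Hypothetical stand-in for classifier accuracy: fraction of chosen features
# that belong to an assumed set of informative ones.
INFORMATIVE = {1, 4, 7}


def toy_score(subset):
    return sum(1 for f in subset if f in INFORMATIVE) / len(subset)


if __name__ == "__main__":
    print(aco_feature_selection(n_features=10, score=toy_score))
```

In practice each call to `score` would train or evaluate a classifier on the chosen feature columns, so the number of ants and iterations sets the overall computational budget.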

    Particle Directional Conveyance under Longitudinal Vibration by considering the Trough Surface Texture: Numerical Simulation Based on the Discrete Element Method

    Particles can move directionally in a trough with finlike asperities under longitudinal vibrations. Here, we present an analysis of the particle conveyance mechanism and the influence of the asperity shape on the particle conveyance capacity by employing a numerical simulation based on the discrete element method (DEM). A dynamic-static matching method is proposed to characterize the three microcontact parameters in the simulation: the restitution coefficient, static friction coefficient, and rolling friction coefficient. The simulation shows that the asymmetric force induced by the finlike asperities and its cumulative effect over time lead to the particle directional conveyance. The conveyance velocity increases with increasing vibration time and is related to the median coordination number. The asperity height and slope inclination angles determine the trough shape and distance between two asperities directly. An undersized or oversized distance reduces the steady conveyance velocity. We find the optimal distance to be between one and two particle diameters