112 research outputs found

    Pretraining in Deep Reinforcement Learning: A Survey

    The past few years have seen rapid progress in combining reinforcement learning (RL) with deep learning. Breakthroughs ranging from games to robotics have spurred interest in designing sophisticated RL algorithms and systems. However, the prevailing workflow in RL is to learn tabula rasa, which can be computationally inefficient. This precludes continuous deployment of RL algorithms and potentially excludes researchers without large-scale computing resources. In many other areas of machine learning, the pretraining paradigm has been shown to be effective in acquiring transferable knowledge that can be utilized for a variety of downstream tasks. Recently, there has been a surge of interest in pretraining for deep RL, with promising results. However, much of this research has been conducted under different experimental settings. Due to the nature of RL, pretraining in this field faces unique challenges and hence requires new design principles. In this survey, we seek to systematically review existing work on pretraining for deep reinforcement learning, provide a taxonomy of these methods, discuss each sub-field, and bring attention to open problems and future directions.

    Revisiting Discrete Soft Actor-Critic

    We study the adaptation of soft actor-critic (SAC) from continuous action spaces to discrete action spaces. We revisit vanilla SAC and provide an in-depth analysis of its Q-value underestimation and performance instability when applied to discrete settings. To address these issues, we propose an entropy penalty and double average Q-learning with Q-clip. Extensive experiments on typical benchmarks with discrete action spaces, including Atari games and a large-scale MOBA game, show the efficacy of our proposed method. Our code is available at: https://github.com/coldsummerday/Revisiting-Discrete-SAC
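    For context on the underestimation issue discussed in the abstract above, the sketch below shows how vanilla discrete SAC computes its soft state value exactly, by summing over the finite action set instead of sampling, and combines two critics by taking their element-wise minimum (clipped double-Q). The "mean" option is a hypothetical illustration of averaging the two critics; it is not the paper's exact double average Q-learning with Q-clip, and the entropy penalty is not shown. All function names are illustrative.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax turning policy logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def soft_state_value(probs: Sequence[float], q1: Sequence[float], q2: Sequence[float],
                     alpha: float, reduce: str = "min") -> float:
    """Exact soft value over a discrete action set:
    V(s) = sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s)).

    reduce="min" is the usual clipped double-Q combination of two critics;
    reduce="mean" averages them instead (hypothetical illustration only).
    """
    if reduce == "min":
        q = [min(a, b) for a, b in zip(q1, q2)]
    elif reduce == "mean":
        q = [(a + b) / 2.0 for a, b in zip(q1, q2)]
    else:
        raise ValueError(f"unknown reduce mode: {reduce}")
    value = 0.0
    for p, qa in zip(probs, q):
        if p > 0.0:  # zero-probability actions contribute nothing to the expectation
            value += p * (qa - alpha * math.log(p))
    return value


def td_target(reward: float, done: bool, gamma: float, next_value: float) -> float:
    """One-step soft Bellman target: r + gamma * (1 - done) * V(s')."""
    return reward + gamma * (1.0 - float(done)) * next_value
```

    Because the mean of two critics is never below their minimum, the "mean" mode always yields a target at least as large as the "min" mode, which is one intuition for why averaging can counteract systematic underestimation.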

    Gain scheduled torque compensation of PMSG-based wind turbine for frequency regulation in an isolated grid

    Frequency stability in an isolated grid can easily be impacted by sudden load or wind speed changes. Many frequency regulation techniques have been used to address this problem. However, only a few studies design torque compensation controllers based on power performance in different speed parts. Achieving satisfactory compensation performance in every speed part is a major challenge for a wind turbine generator (WTG). To tackle this challenge, this paper proposes a gain scheduled torque compensation strategy for permanent magnet synchronous generator (PMSG) based wind turbines. Our main idea is to improve the anti-disturbance ability of frequency regulation by compensating torque according to the WTG speed part. To achieve a higher power reserve in each speed part, an enhanced deloading method for the WTG is proposed. We develop a new small-signal dynamic model by analyzing the steady-state performance of the deloaded WTG over the whole range of wind speeds. Subsequently, H∞ theory is leveraged to design the gain scheduled torque compensation controller, which effectively suppresses frequency fluctuation. Moreover, since torque compensation causes untimely power adjustment at over-rated wind speeds, the conventional speed reference of the pitch control system is improved. Our simulation and experimental results demonstrate that the proposed strategy can significantly improve frequency stability and smooth the power fluctuation resulting from wind speed variations. With the proposed strategy, the minimum frequency deviation is improved by up to 0.16 Hz at over-rated wind speed. Our technique can also improve anti-disturbance ability in the frequency domain and achieve power balance.
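    The core idea of gain scheduling in the abstract above is that the compensation law changes with the turbine's operating region. The sketch below illustrates only that scheduling step: a gain is linearly interpolated over rotor-speed breakpoints and applied as a simple proportional law, delta_T = -K(omega) * delta_f, with a saturation limit. The paper designs its controller with H∞ theory; the proportional law, the per-unit breakpoints and gains, the sign convention (a frequency drop requests extra generator torque), and all names here are hypothetical stand-ins.

```python
from bisect import bisect_right
from typing import Sequence


def scheduled_gain(speed: float, breakpoints: Sequence[float], gains: Sequence[float]) -> float:
    """Linearly interpolate a controller gain over rotor-speed breakpoints.

    breakpoints must be strictly increasing; outside the table the end gains are held.
    """
    if len(breakpoints) != len(gains) or len(breakpoints) < 2:
        raise ValueError("need matching breakpoint/gain tables with at least two entries")
    if speed <= breakpoints[0]:
        return gains[0]
    if speed >= breakpoints[-1]:
        return gains[-1]
    i = bisect_right(breakpoints, speed) - 1  # index of the segment containing speed
    x0, x1 = breakpoints[i], breakpoints[i + 1]
    g0, g1 = gains[i], gains[i + 1]
    t = (speed - x0) / (x1 - x0)
    return g0 + t * (g1 - g0)


def torque_compensation(speed: float, freq_dev: float, breakpoints: Sequence[float],
                        gains: Sequence[float], torque_limit: float) -> float:
    """Hypothetical proportional compensation -K(speed) * freq_dev, clipped to +/- torque_limit."""
    k = scheduled_gain(speed, breakpoints, gains)
    delta_t = -k * freq_dev
    return max(-torque_limit, min(torque_limit, delta_t))
```

    For example, with breakpoints [0.7, 1.0, 1.2] p.u. and gains [2.0, 4.0, 3.0], a rotor speed of 0.85 p.u. falls halfway through the first segment and receives a gain of about 3.0; interpolating between regions avoids abrupt gain jumps when the turbine moves from one speed part to the next.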

    Multiple adaptive model predictive controllers for frequency regulation in wind farms

    Frequent and inadequate power regulation can significantly affect the mechanical load on the main shaft and the fatigue of wind turbines, which imposes stringent requirements on how frequency regulation is performed. However, existing work on frequency regulation mainly uses torque compensation to improve the frequency response, and few studies consider the mechanical fatigue of the main shaft caused by the torque compensation of the frequency controller. In this paper, we propose frequency regulation controllers that mitigate the mechanical fatigue of the main shaft in all speed sections. Specifically, we propose a multiple adaptive model predictive controller (MAMPC), which integrates multiple model predictive control (MMPC) with a real-time AutoRegressive with eXogenous inputs (ARX) model. It manages the rate of change of the compensation torque to mitigate the mechanical load on the shaft in all speed sections. The effectiveness of our method is verified through extensive simulations. With the proposed method, the minimum frequency deviation can be reduced, and the number of fatigue cycles of the main shaft can be extended.
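    The abstract above does not say how the real-time ARX model is updated. Recursive least squares (RLS) with a forgetting factor is a common choice for online ARX identification, so the sketch below assumes it, using a hypothetical first-order single-input model y[k] = a*y[k-1] + b*u[k-1]. An adaptive MPC would re-read (a, b) at each step to refresh its prediction model; that part is not shown.

```python
from typing import List, Sequence, Tuple


def simulate_arx(a: float, b: float, u: Sequence[float]) -> List[float]:
    """Generate noise-free output of y[k] = a*y[k-1] + b*u[k-1], starting from y[0] = 0."""
    y = [0.0]
    for k in range(1, len(u)):
        y.append(a * y[k - 1] + b * u[k - 1])
    return y


def rls_arx(y: Sequence[float], u: Sequence[float],
            lam: float = 1.0, p0: float = 1e4) -> Tuple[float, float]:
    """Estimate (a, b) of a first-order ARX model by recursive least squares.

    lam is the forgetting factor (1.0 = no forgetting); p0 scales the initial covariance.
    """
    theta = [0.0, 0.0]
    P = [[p0, 0.0], [0.0, p0]]
    for k in range(1, len(y)):
        phi = [y[k - 1], u[k - 1]]  # regressor vector
        # P @ phi (P stays symmetric, so phi^T @ P equals (P @ phi)^T)
        p_phi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                 P[1][0] * phi[0] + P[1][1] * phi[1]]
        denom = lam + phi[0] * p_phi[0] + phi[1] * p_phi[1]
        gain = [p_phi[0] / denom, p_phi[1] / denom]
        err = y[k] - (theta[0] * phi[0] + theta[1] * phi[1])  # one-step prediction error
        theta = [theta[0] + gain[0] * err, theta[1] + gain[1] * err]
        # Covariance update: P <- (P - gain * phi^T P) / lam
        P = [[(P[i][j] - gain[i] * p_phi[j]) / lam for j in range(2)] for i in range(2)]
    return theta[0], theta[1]
```

    A forgetting factor slightly below 1 lets the estimate track slowly drifting turbine dynamics, at the cost of higher sensitivity to measurement noise.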

    Development and validation of an artificial intelligence-powered acne grading system incorporating lesion identification

    Background: The management of acne requires consideration of its severity; however, a universally adopted evaluation system for clinical practice is lacking. Artificial intelligence (AI) evaluation systems offer the potential to enhance the efficiency and reproducibility of assessments in this domain. While the identification of skin lesions is a crucial component of acne evaluation, existing AI systems often overlook lesion identification or fail to integrate it with severity assessment. This study aimed to develop an AI-powered acne grading system and compare its performance with physician image-based scoring.
    Methods: A total of 1,501 acne patients were included in the study, and standardized photographs were obtained using the VISIA system. In the initial evaluation, seven dermatologists assessed 40 frontal photos selected by stratified sampling. Subsequently, the three dermatologists with the highest inter-rater agreement annotated the remaining 1,461 images, which served as the dataset for developing the AI system. The dataset was randomly divided into two groups: 276 images were used to train the acne lesion identification platform, and 1,185 images were used to assess acne severity.
    Results: The average precision of our model for skin lesion identification was 0.507, and the average recall was 0.775. The AI severity grading system achieved good agreement with the true labels (linear weighted kappa = 0.652). After the lesion identification results were integrated into the severity assessment with fixed weights and with learnable weights, the kappa rose to 0.737 and 0.696, respectively, and the entire evaluation took less than 0.1 s per picture on a Linux workstation with a Tesla K40m GPU.
    Conclusion: This study developed a system that detects various types of acne lesions and correlates them well with acne severity grading; its good accuracy and efficiency make this approach a potentially effective clinical decision support tool.
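    The agreement figures above are linear weighted kappa values. As a minimal sketch, assuming severity grades are integer-coded 0 to k-1, the statistic compares observed disagreement with the disagreement expected by chance, weighting each cell by how many grades apart the two ratings are. The function name is illustrative.

```python
from typing import Sequence


def linear_weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int], num_classes: int) -> float:
    """Cohen's kappa with linear weights w_ij = |i - j| / (k - 1).

    kappa = 1 - sum(w * observed) / sum(w * expected), where expected counts
    come from the product of the two raters' marginal grade frequencies.
    """
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    k = num_classes
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_obs = 0.0
    weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)  # disagreement grows linearly with grade distance
            expected = row_totals[i] * col_totals[j] / n
            weighted_obs += w * observed[i][j]
            weighted_exp += w * expected
    if weighted_exp == 0.0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - weighted_obs / weighted_exp
```

    For example, grades [0, 1, 2, 2] against [0, 1, 1, 2] on a three-grade scale give kappa = 5/7 (about 0.714): the single one-grade disagreement is penalized with half the weight of a two-grade disagreement.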

    Optimal dispatch based on prediction of distributed electric heating storages in combined electricity and heat networks

    The volatility of wind power generation can significantly challenge the economic and secure operation of combined electricity and heat networks. To tackle this challenge, this paper proposes an optimal dispatch framework with distributed electric heating storage based on a correlation-based long short-term memory (LSTM) prediction model. The prediction model of distributed electric heating storage captures its behavior characteristics, which are obtained through autocorrelation analysis and correlation analysis with external factors, including weather and time-of-use price. An optimal dispatch model of combined electricity and heat networks is then formulated and solved by a constraint reduction technique with clustering and classification. Our method is verified through numerous simulations. The results show that, compared with the state-of-the-art support vector machine and recurrent neural network techniques, the proposed correlation-based long short-term memory reduces the mean absolute percentage error by 1.009 and 0.481, respectively. Compared with the conventional method, dispatching distributed electric heating storage reduces the peak wind power curtailment by nearly 30% and 50% in the two cases, respectively.
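    The abstract above relies on two standard quantities: the sample autocorrelation used in the correlation analysis and the mean absolute percentage error (MAPE) used to compare predictors. The sketch below computes both and adds a hypothetical lag-selection helper that keeps lags whose autocorrelation magnitude exceeds a threshold; the paper's actual feature-selection rule and threshold are not given in the abstract, so that helper is an assumption.

```python
from typing import List, Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            raise ValueError("MAPE is undefined when an actual value is zero")
        total += abs((a - p) / a)
    return 100.0 * total / len(actual)


def autocorrelation(x: Sequence[float], lag: int) -> float:
    """Sample autocorrelation at the given lag (biased estimator, normalized by lag-0 variance)."""
    n = len(x)
    if not 0 <= lag < n:
        raise ValueError("lag must be in [0, len(x))")
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var


def select_lags(x: Sequence[float], max_lag: int, threshold: float) -> List[int]:
    """Hypothetical rule: keep lags whose |autocorrelation| meets the threshold."""
    return [lag for lag in range(1, max_lag + 1) if abs(autocorrelation(x, lag)) >= threshold]
```

    In a forecasting pipeline, the selected lags of the heating-storage load would become input features alongside the external factors (weather, time-of-use price), and MAPE would then score the resulting predictions against measured values.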

    Switchable Electrostatically Templated Polymerization

    We report a switchable, templated polymerization system in which the strength of the templating effect can be modulated by solution pH and/or ionic strength. Responsiveness to these cues is incorporated through a dendritic polyamidoamine-based template whose charge density depends on pH. The dendrimers act as a template for the polymerization of an oppositely charged monomer, namely sodium styrene sulfonate. We show that the rate of polymerization and the maximum achievable monomer conversion are directly related to the charge density of the template, and hence to the environmental pH. The polymerization could effectively be switched “ON” and “OFF” on demand by cycling between acidic and alkaline reaction environments. These findings break ground for a novel concept: harnessing the co-assembly of a template and growing polymer chains with tunable association strength to create and control coupled polymerization and self-assembly pathways of (charged) macromolecular building blocks.
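    Why the template's charge density tracks pH can be illustrated with the Henderson–Hasselbalch relation for a basic amine site. As an idealized sketch, assume the dendrimer carries N_i amine groups of class i (for a polyamidoamine dendrimer, primary surface amines and tertiary interior amines), each with conjugate-acid constant pK_{a,i}, and that sites protonate independently; real dendrimers deviate from this because neighboring charges shift the apparent pKa values.

```latex
\alpha_i(\mathrm{pH}) = \frac{1}{1 + 10^{\,\mathrm{pH} - \mathrm{p}K_{a,i}}},
\qquad
Z(\mathrm{pH}) \approx \sum_i N_i \, \alpha_i(\mathrm{pH})
```

    Here alpha_i is the protonated (positively charged) fraction of class-i amines and Z is the net positive charge of the template. For pH well below pK_{a,i}, alpha_i approaches 1 and the template is highly charged, favoring association with the anionic styrene sulfonate; for pH well above it, alpha_i approaches 0, consistent with switching the templating effect "ON" in acidic and "OFF" in alkaline conditions.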