10 research outputs found

    Guided Policy Search for Sequential Multitask Learning

    Policy search in reinforcement learning (RL) is a practical approach that interacts directly with environments in parameter space, but it often faces the problems of local optima and real-time sample collection. A promising algorithm, known as guided policy search (GPS), is capable of handling the challenge of training samples using trajectory-centric methods. It can also provide asymptotic local convergence guarantees. However, in its current form, the GPS algorithm cannot operate in sequential multitask learning scenarios. This is due to its batch-style training requirement, where all training samples are collectively provided at the start of the learning process. The algorithm's adaptation is thus hindered for real-time applications, where training samples or tasks can arrive randomly. In this paper, the GPS approach is reformulated by adapting a recently proposed lifelong-learning method, elastic weight consolidation. Specifically, Fisher information is incorporated to impart knowledge from previously learned tasks. The proposed algorithm, termed sequential multitask learning-GPS, is able to operate in sequential multitask learning settings and ensures continuous policy learning without catastrophic forgetting. Pendulum and robotic manipulation experiments demonstrate the new algorithm's efficacy in learning control policies for handling sequentially arriving training samples, delivering comparable performance to the traditional batch-based GPS algorithm. In conclusion, the proposed algorithm is posited as a new benchmark for the real-time RL and robotics research community.

    An 18.9-minute Blue Large-Amplitude Pulsator Crossing the 'Hertzsprung Gap' of Hot Subdwarfs

    Blue large-amplitude pulsators (BLAPs) represent a new and rare class of hot pulsating stars with unusually large amplitudes and short periods. Up to now, only 24 confirmed BLAPs have been identified from more than one billion monitored stars, including a group with pulsation periods longer than ~20 min (classical BLAPs, hereafter) and another group with pulsation periods below ~8 min. The evolutionary path that could give rise to such kinds of stellar configurations is unclear. Here we report on a comprehensive study of the peculiar BLAP discovered by the Tsinghua University - Ma Huateng Telescopes for Survey (TMTS), TMTS J035143.63+584504.2 (TMTS-BLAP-1). This new BLAP has an 18.9 min pulsation period and is similar to the BLAPs with a low surface gravity and an extended helium-enriched envelope, suggesting that it is a low-gravity BLAP at the shortest-period end. In particular, the long-term monitoring data reveal that this pulsating star has an unusually large rate of period change, P_dot/P = 2.2e-6/yr. Such a significant and positive value challenges its origins from both helium-core pre-white-dwarfs and core helium-burning subdwarfs, but is consistent with that derived from shell helium-burning subdwarfs. The particular pulsation period and unusual rate of period change indicate that TMTS-BLAP-1 is at a short-lived (~10^6 yr) phase of shell-helium ignition before the stable shell-helium burning; in other words, TMTS-BLAP-1 is going through a "Hertzsprung gap" of hot subdwarfs. Comment: 26 pages, 12 figures, 4 tables, published in Nature Astronomy, URL: https://www.nature.com/articles/s41550-022-01783-

    Encoding primitives generation policy learning for robotic arm to overcome catastrophic forgetting in sequential multi-tasks learning

    Continual learning, a widespread ability in people and animals, aims to learn and acquire new knowledge and skills continuously. Catastrophic forgetting usually occurs in continual learning when an agent attempts to learn different tasks sequentially without storing or accessing previous task information. Unfortunately, current learning systems, e.g., neural networks, tend to drift away from the weights learned in previous tasks after training on new tasks, leading to catastrophic forgetting, especially in a sequential multi-task scenario. To address this problem, in this paper, we propose to overcome catastrophic forgetting with a focus on learning a series of robotic tasks sequentially. In particular, a novel hierarchical neural network framework called Encoding Primitives Generation Policy Learning (E-PGPL) is developed to enable continual learning with two components. By employing a variational autoencoder to project the original state space into a meaningful low-dimensional feature space, representative state primitives can be sampled to help learn corresponding policies for different tasks. In learning a new task, the feature space is required to be close to the previous ones so that previously learned tasks can be protected. Extensive experiments on several simulated robotic tasks demonstrate our method's efficacy in learning control policies for handling sequentially arriving tasks, delivering substantial improvement over some other continual learning methods, especially for tasks with more diversity.

    Somatostatin Neurons in the Basal Forebrain Promote High-Calorie Food Intake

    Obesity has become a global issue, and the overconsumption of food is thought to be a major contributor. However, the neural circuits that regulate palatable food consumption remain unclear. Here, we report that somatostatin (SOM) neurons and GABAergic (VGAT) neurons in the basal forebrain (BF) play specific roles in regulating feeding. Optogenetic stimulation of BF SOM neurons increased fat and sucrose intake within minutes and promoted anxiety-like behaviors. Furthermore, optogenetic stimulation of projections from BF SOM neurons to the lateral hypothalamic area (LHA) selectively induced fat intake. In addition, activation of BF VGAT neurons rapidly induced general food intake and gnawing behaviors. Whole-brain mapping of inputs and outputs showed that BF SOM neurons form bidirectional connections with several brain areas important in feeding and regulation of emotion. Collectively, these results suggest that BF SOM neurons play a selective role in hedonic feeding.

    Systemic inflammatory response syndrome in patients with severe fever with thrombocytopenia syndrome: prevalence, characteristics, and impact on prognosis

    Background: Severe fever with thrombocytopenia syndrome (SFTS) is an emerging zoonosis with a high fatality rate in China. Previous studies have reported that dysregulated inflammatory response is associated with disease pathogenesis and mortality in patients with SFTS. This investigation aimed to evaluate the prevalence and characteristics of systemic inflammatory response syndrome (SIRS), and its impact on prognosis. Methods: Data on demographic characteristics, comorbid conditions, clinical manifestations, laboratory parameters, and survival time of patients with SFTS were collected. Patients were divided into non-SIRS and SIRS groups according to the presence of SIRS, and their clinical data were then compared. Results: A total of 290 patients diagnosed with SFTS were retrospectively enrolled, including 126 (43.4%) patients with SIRS. Patients in the non-survivor group had a higher prevalence of SIRS than patients in the survivor group (P < 0.001), and SIRS (adjusted OR 2.885, 95% CI 1.226–6.786; P = 0.005) was shown to be an independent risk factor for prognosis of patients with SFTS. Compared with patients without SIRS, patients with SIRS had lower WBC counts, neutrophil counts, and fibrinogen levels, but higher AST, LDH, amylase, lipase, CK, CK-MB, troponin I, APTT, thrombin time, D-dimer, CRP, IL-6, and SAA levels, and a higher viral load. The cumulative survival rate of patients with SIRS was significantly lower than that of patients without SIRS. Patients with SIRS also showed a higher incidence of bacterial or fungal infections than patients without SIRS. Conclusions: SIRS is highly frequent in patients with SFTS, and it is associated with high mortality.

    A comparative study of muscle nutrition and intermuscular bone number in improved diploid carp

    The common carp (Cyprinus carpio L., COC, 2n = 100) is one of the most widely consumed and distributed freshwater fish in the world. It ranks fourth in total freshwater aquaculture volume in various regions of China, behind Ctenopharyngodon idella, Hypophthalmichthys molitrix, and Aristichthys nobilis. However, in recent years, environmental degradation, inbreeding, disordered breeding, and other adverse effects have caused problems such as low seed quality and poor disease and stress resistance of the carp. In our laboratory, we developed two types of improved diploid carp the hybrid F1 of common carp (♀) × blunt snout bream (Megalobrama amblycephala, BSB, 2n = 48) (♂). To investigate how IDC and IDMC differ from COC, in this study we compared these two types of improved carp with COC in terms of muscle nutrient composition and intermuscular bone type and number. The results showed that among the muscle nutrients, IDC had a higher protein content (18.50%) and lower carbohydrate content (0.70%). In addition, the unsaturated fatty acid contents of IDC (3.45%) and IDMC (1.25%) were significantly higher than that of COC (0.50%) (P < 0.05). Monounsaturated fatty acids such as oleic acid (OA, C18:1 n-9) and palmitoleic acid (PA, C16:1) were also abundant in IDC and IDMC. In terms of intermuscular bone morphology, the morphology and type of intermuscular bone in IDC and IDMC medullary arch ossicles were consistent with those of COC. However, the number of intermuscular bones was different, and the average number of intermuscular bones of IDMC decreased by 14.77% (P < 0.05) compared with COC. The advantages of IDC and IDMC are different, with IDC having a higher value of muscle nutrients and IDMC having a lower intermuscular bone content. The distant hybridization of common carp (♀) × blunt snout bream (♂) produced two improved diploid carp varieties with excellent traits, which added carp germplasm resources with high muscle nutrition and fewer intermuscular bones, and provided theoretical support for their production application.

    A shock flash breaking out of a dusty red supergiant

    Shock-breakout emission is light that arises when a shockwave, generated by the core-collapse explosion of a massive star, passes through its outer envelope. Hitherto, the earliest detection of such a signal was at several hours after the explosion [1], although a few others had been reported [2-7]. The temporal evolution of early light curves should provide insights into the shock propagation, including explosion asymmetry and environment in the vicinity, but this has been hampered by the lack of multiwavelength observations. Here we report the instant multiband observations of a type II supernova (SN 2023ixf) in the galaxy M101 (at a distance of 6.85 ± 0.15 Mpc; ref. 8), beginning at about 1.4 h after the explosion. The exploding star was a red supergiant with a radius of about 440 solar radii. The light curves evolved rapidly, on timescales of 1-2 h, and appeared unusually fainter and redder than predicted by the models [9-11] within the first few hours, which we attribute to an optically thick dust shell before it was disrupted by the shockwave. We infer that the breakout and perhaps the distribution of the surrounding dust were not spherically symmetric. 6 month embargo; first published 13 December 2023.