158 research outputs found

    A Policy-Guided Imitation Approach for Offline Reinforcement Learning

    Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-based and imitation-based. RL-based methods could in principle enjoy out-of-distribution generalization but suffer from erroneous off-policy evaluation. Imitation-based methods avoid off-policy evaluation but are too conservative to surpass the dataset. In this study, we propose an alternative approach, inheriting the training stability of imitation-style methods while still allowing logical out-of-distribution generalization. We decompose the conventional reward-maximizing policy in offline RL into a guide-policy and an execute-policy. During training, the guide-policy and execute-policy are learned using only data from the dataset, in a supervised and decoupled manner. During evaluation, the guide-policy guides the execute-policy by telling it where to go so that the reward can be maximized, serving as the \textit{Prophet}. By doing so, our algorithm allows \textit{state-compositionality} from the dataset, rather than the \textit{action-compositionality} of prior imitation-style methods. We dub this new approach Policy-guided Offline RL (\texttt{POR}). \texttt{POR} demonstrates state-of-the-art performance on D4RL, a standard benchmark for offline RL. We also highlight the benefits of \texttt{POR} in terms of improving with supplementary suboptimal data and easily adapting to new tasks by only changing the guide-policy. Comment: Oral @ NeurIPS 2022, code at https://github.com/ryanxhr/PO
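
    The evaluation-time decomposition described in this abstract can be illustrated with a minimal sketch. It assumes only what the abstract states: a guide-policy proposes where to go, and an execute-policy chooses an action given the current state and that target. The names `act`, `toy_guide`, and `toy_execute`, and the 1-D toy world, are hypothetical and are not taken from the authors' code.

```python
from typing import Callable

# Hypothetical 1-D toy world: a state is a position, an action is a step.
State = float
Action = float


def act(
    state: State,
    guide_policy: Callable[[State], State],
    execute_policy: Callable[[State, State], Action],
) -> Action:
    """Compose the two policies at evaluation time.

    The guide-policy proposes a target state; the execute-policy picks
    the action intended to move from the current state toward it.
    """
    target_state = guide_policy(state)
    return execute_policy(state, target_state)


def toy_guide(state: State) -> State:
    # Assumption for illustration: the "good" next state is one unit ahead.
    return state + 1.0


def toy_execute(state: State, target_state: State) -> Action:
    # Assumption for illustration: the action is the displacement to the target.
    return target_state - state


if __name__ == "__main__":
    print(act(2.0, toy_guide, toy_execute))
```

    Because the execute-policy only depends on the state and the proposed target, swapping in a different guide function changes the behavior without touching `toy_execute`, which mirrors the abstract's claim about adapting to new tasks by changing only the guide-policy.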

    Aerobic exercise training at maximal fat oxidation intensity improves body composition, glycemic control, and physical capacity in older people with type 2 diabetes

    Background: Aerobic training has been used as one of the common treatments for type 2 diabetes; however, further research on individualized exercise programs with the optimal intensity is still necessary. The purpose of this study was to investigate the effects of supervised exercise training at the maximal fat oxidation (FATmax) intensity on body composition, glycemic control, lipid profile, and physical capacity in older people with type 2 diabetes. Methods: Twenty-four women and 25 men with type 2 diabetes, aged 60–69 years, took part. The exercise groups trained at the individualized FATmax intensity for 1 h/day, 3 days/week, over 16 weeks. No dietary intervention was introduced during the experimental period. Whole body fat, abdominal fat, oral glucose tolerance test, lipid profile, and physical capacity were measured before and after the interventions. Results: FATmax intensity was at 41.3 ± 3.2% VO2max for women and 46.1 ± 10.3% VO2max for men. Compared to the non-exercising controls, the exercise groups obtained significant improvements in body composition, with a particular decrease in abdominal obesity; decreased resting blood glucose concentration and HbA1c; and increased VO2max, walking ability, and lower body strength. Daily energy intake and medication remained unchanged for all participants during the experimental period. Conclusion: Besides the improvements in the laboratory variables, individualized FATmax training can also benefit the daily physical capacity of older people with type 2 diabetes.

    Comparison of lignocellulose composition in four major species of Miscanthus

    Miscanthus is a perennial grass rich in lignocellulose that has attracted interest as a non-food crop for renewable bioenergy with major environmental and economic benefits for China. The lignocellulose composition of whole stems of four major species of Miscanthus was assessed. In Miscanthus floridulus, the average values of total moisture content (TMC) (61.90%) and hemicellulose (34.86%) were the highest, while cellulose (32.71%) and acid detergent lignin (ADL) (8.90%) were the lowest. In contrast, in Miscanthus lutarioriparius the contents of cellulose (42.11%) and ADL (13.64%) were the highest and total ash (TA) (2.89%) was the lowest. The Shannon–Weaver diversity indices of the components for the four species showed that hemicellulose content (H’ = 2.00±0.11) was the most variable trait, followed by cellulose (H’ = 1.84±0.07) and then ADL (H’ = 1.84±0.07). The range of variation of each component was relatively high in Miscanthus sacchariflorus. In M. lutarioriparius, the diversity indices of each component were moderate. In Miscanthus sinensis, the diversity of cellulose was the highest, while that of hemicellulose, ADL, TA and TMC was low. In correlation analysis, neutral detergent fiber (NDF) correlated significantly and positively with acid detergent fiber (ADF), cellulose and ADL at P<0.01, as did cellulose with ADL, in the four species. Hemicellulose showed a significant (P<0.01) negative correlation with cellulose and ADL in M. floridulus, M. lutarioriparius and M. sacchariflorus. In principal component analysis (PCA), ADF and cellulose dominated PC1 and were considered the foremost components for evaluating and selecting resources among the four species. In conclusion, the lignocellulose composition of Miscanthus culms differed among species. M. floridulus was better suited to ethanol fermentation. Although the component contents in M. sinensis and M. sacchariflorus were moderate, their range of choice was large, which provides a possible means of screening appropriate materials for different uses. M. lutarioriparius had relatively more advantages. The four species of Miscanthus are therefore appropriate for extension as excellent herbaceous energy plants, although the species should be chosen according to the conversion approach and the growth characteristics, productivity levels and biomass quality characteristics of these tall grasses. Keywords: Miscanthus, bioenergy, lignocellulose compositions, detergent fiber, diversity analysis, PC
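
    For readers unfamiliar with the Shannon–Weaver index reported above, the standard definition is H' = -Σ p_i ln p_i, where p_i is the proportion of samples falling into class i. The sketch below is a minimal illustration of that formula only. The abstract does not say how the trait values were binned into classes or which logarithm base was used, so the class counts and the natural logarithm here are assumptions, and the function name `shannon_weaver` is hypothetical.

```python
import math
from typing import Sequence


def shannon_weaver(class_counts: Sequence[int]) -> float:
    """Return H' = -sum(p_i * ln(p_i)) over the non-empty classes.

    class_counts holds the number of samples observed in each class
    (for example, bins of cellulose content). Empty classes contribute
    nothing, following the usual convention 0 * ln(0) = 0.
    """
    total = sum(class_counts)
    if total <= 0:
        raise ValueError("at least one sample is required")
    proportions = [count / total for count in class_counts if count > 0]
    return -sum(p * math.log(p) for p in proportions)


if __name__ == "__main__":
    # Made-up counts for illustration; not data from the study.
    print(round(shannon_weaver([10, 10, 10, 10]), 3))
    print(round(shannon_weaver([37, 1, 1, 1]), 3))
```

    Evenly spread samples give the maximum value for a given number of classes (ln k for k classes), while samples concentrated in one class push the index toward zero, which is why a higher H' is read as a more variable trait.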

    Time dependence of the orthotropic compression Young's moduli and Poisson's ratios of Chinese fir wood

    The time dependency of the orthotropic compliance for Chinese fir wood [Cunninghamia lanceolata (Lamb.) Hook] has been investigated by performing compressive creep experiments in all orthotropic directions. Time evolution of the creep strain in the axial and lateral directions was recorded by means of the digital image correlation (DIC) technique, to determine the diagonal and nondiagonal elements of the viscoelastic compliance matrix. The results reveal the significant influence of time on the mechanical behavior. The orthotropic nature of the viscoelastic compliance is highlighted by the different time dependency of the Young's moduli and the Poisson's ratios obtained for the individual directions. Differences among the time-dependent stress-strain relationships determined at the 25, 50, and 75% stress levels indicate that the viscoelastic behavior of wood is also load-dependent. The Poisson's ratio values, which increase with time for νLR, νLT, νRT, and νTR and decrease for νRL and νTL, demonstrate that the creep strain is influenced by the loading direction. The substantially different time dependency of the nondiagonal elements of the compliance matrix further reveals the orthotropic compliance asymmetry and emphasizes the complexity of the viscoelastic character of wood.
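
    As a rough illustration of how time-dependent constants can be derived from measured creep strains, the sketch below uses the usual uniaxial definitions: the apparent modulus is E_i(t) = σ_i / ε_i(t), and the Poisson's ratio is ν_ij(t) = -ε_j(t) / ε_i(t) for a load applied along direction i with lateral strain measured along j. The abstract does not give the authors' data-reduction procedure, so this is an assumption; the function name `creep_constants` and all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def creep_constants(
    stress: float,
    axial_strains: Sequence[float],
    lateral_strains: Sequence[float],
) -> List[Tuple[float, float]]:
    """Return (apparent modulus, Poisson's ratio) for each time step.

    stress is the constant applied stress (negative in compression).
    axial_strains and lateral_strains are strain readings at the same
    time points, for example from digital image correlation.
    """
    if len(axial_strains) != len(lateral_strains):
        raise ValueError("strain series must have equal length")
    results = []
    for axial, lateral in zip(axial_strains, lateral_strains):
        modulus = stress / axial      # inverse of the diagonal compliance
        poisson = -lateral / axial    # lateral expansion per unit axial shortening
        results.append((modulus, poisson))
    return results


if __name__ == "__main__":
    # Made-up values: -10 MPa held constant, strains read at two times.
    for modulus, poisson in creep_constants(
        -10.0, [-0.0010, -0.0012], [0.0004, 0.0005]
    ):
        print(f"E = {modulus:.1f} MPa, nu = {poisson:.3f}")
```

    In this toy series the axial strain grows under constant stress, so the apparent modulus falls with time while the Poisson's ratio rises, which is the kind of time dependence the abstract describes.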

    Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization

    Most offline reinforcement learning (RL) methods suffer from the trade-off between improving the policy to surpass the behavior policy and constraining the policy to limit the deviation from the behavior policy, as computing Q-values using out-of-distribution (OOD) actions will suffer from errors due to distributional shift. The recently proposed \textit{In-sample Learning} paradigm (i.e., IQL), which improves the policy by quantile regression using only data samples, shows great promise because it learns an optimal policy without querying the value function of any unseen actions. However, it remains unclear how this type of method handles the distributional shift in learning the value function. In this work, we make a key finding that the in-sample learning paradigm arises under the \textit{Implicit Value Regularization} (IVR) framework. This gives a deeper understanding of why the in-sample learning paradigm works, i.e., it applies implicit value regularization to the policy. Based on the IVR framework, we further propose two practical algorithms, Sparse Q-learning (SQL) and Exponential Q-learning (EQL), which adopt the same value regularization used in existing works, but in a complete in-sample manner. Compared with IQL, we find that our algorithms introduce sparsity in learning the value function, making them more robust in noisy data regimes. We also verify the effectiveness of SQL and EQL on D4RL benchmark datasets and show the benefits of in-sample learning by comparing them with CQL in small data regimes. Comment: ICLR 2023 notable top 5

    Reassessment of oxidative stress in idiopathic sudden hearing loss and preliminary exploration of the effect of physiological concentration of melatonin on prognosis

    Background and purpose: The pathogenesis of idiopathic sudden sensorineural hearing loss (ISSNHL) is still unclear, and there is no targeted treatment. This research aimed to verify the role of oxidative stress in ISSNHL and explore whether melatonin has a protective effect on hearing. Materials and methods: A total of 43 patients with ISSNHL and 15 healthy controls were recruited. Blood levels of melatonin, reactive oxygen species (ROS), and total antioxidant capacity (TAC) were measured and compared before and after treatment. Multivariate logistic regression models were used to assess the factors relevant to the occurrence and improvement of ISSNHL. Results: The patients with ISSNHL showed significantly higher ROS levels than controls (4.42 ± 4.40 vs. 2.30 ± 0.59; p = 0.031). The levels of basal melatonin were higher (1400.83 ± 784.89 vs. 1095.97 ± 689.08; p = 0.046) and ROS levels were lower (3.05 ± 1.81 vs. 5.62 ± 5.56; p = 0.042) in the effective group as compared with the ineffective group. Logistic regression analysis showed that melatonin (OR = 0.999, 95% CI 0.997–1.000, p = 0.049), ROS (OR = 1.154, 95% CI 1.025–2.236, p = 0.037), and vertigo (OR = 3.011, 95% CI 1.339–26.983, p = 0.019) were independent factors associated with hearing improvement. In addition, the levels of melatonin (OR = 0.999, 95% CI 0.998–1.000, p = 0.023) and ROS (OR = 3.248, 95% CI 1.109–9.516, p = 0.032) were associated with the occurrence of ISSNHL. Conclusion: Our findings suggest that oxidative stress may be involved in ISSNHL etiopathogenesis. The levels of melatonin and ROS, and the presence of vertigo, appear to be predictive of hearing improvement following ISSNHL treatment.

    Study on the construction deformation of a slotted shield in loess tunnels with different buried depths and large sections

    Since there is no precedent for the use of slotted shield tunneling in large-section high-speed railway tunnels in China, the relevant technological experience and systematic research results are few. This paper therefore provides theoretical support for loess tunnel construction decision-making through the study of slotted shields and is expected to promote the mechanized and even intelligent construction of high-speed railway loess tunnels. Taking the Luochuan tunnel of the Xiyan high-speed railway as the engineering background, this paper uses the numerical simulation software packages ANSYS and FLAC3D to study the tunnel deformation (surface settlement, vault settlement, tunnel bottom uplift, and horizontal convergence) caused by slotted shield construction in surrounding rock at three different buried depths of 30, 40, and 50 m. The deformation law and mechanical characteristics of cutter shield construction of large cross-section loess tunnels under the influence of different buried depths are put forward. Results showed that 1) the mutual interference between the working procedures can be significantly reduced by inserting the cutting tool into the soil instead of the advanced tubule before excavation; 2) the settlement in the upper part of the longitudinal axis of the tunnel is the largest, and the greater the depth of the tunnel, the smaller the surface settlement; and 3) the horizontal deformation of the arch waist and foot of the tunnel under different buried depths is symmetrically distributed into the tunnel during the whole process of slotted shield tunneling.