193 research outputs found

    A Policy-Guided Imitation Approach for Offline Reinforcement Learning

    Full text link
    Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution generalization but suffer from erroneous off-policy evaluation. Imitation-based methods avoid off-policy evaluation but are too conservative to surpass the dataset. In this study, we propose an alternative approach, inheriting the training stability of imitation-style methods while still allowing logical out-of-distribution generalization. We decompose the conventional reward-maximizing policy in offline RL into a guide-policy and an execute-policy. During training, the guide-poicy and execute-policy are learned using only data from the dataset, in a supervised and decoupled manner. During evaluation, the guide-policy guides the execute-policy by telling where it should go so that the reward can be maximized, serving as the \textit{Prophet}. By doing so, our algorithm allows \textit{state-compositionality} from the dataset, rather than \textit{action-compositionality} conducted in prior imitation-style methods. We dumb this new approach Policy-guided Offline RL (\texttt{POR}). \texttt{POR} demonstrates the state-of-the-art performance on D4RL, a standard benchmark for offline RL. We also highlight the benefits of \texttt{POR} in terms of improving with supplementary suboptimal data and easily adapting to new tasks by only changing the guide-poicy.Comment: Oral @ NeurIPS 2022, code at https://github.com/ryanxhr/PO

    Effective Action Recognition with Embedded Key Point Shifts

    Full text link
    Temporal feature extraction is an essential technique in video-based action recognition. Key points have been utilized in skeleton-based action recognition methods but they require costly key point annotation. In this paper, we propose a novel temporal feature extraction module, named Key Point Shifts Embedding Module (KPSEMKPSEM), to adaptively extract channel-wise key point shifts across video frames without key point annotation for temporal feature extraction. Key points are adaptively extracted as feature points with maximum feature values at split regions, while key point shifts are the spatial displacements of corresponding key points. The key point shifts are encoded as the overall temporal features via linear embedding layers in a multi-set manner. Our method achieves competitive performance through embedding key point shifts with trivial computational cost, achieving the state-of-the-art performance of 82.05% on Mini-Kinetics and competitive performance on UCF101, Something-Something-v1, and HMDB51 datasets.Comment: 35 pages, 10 figure

    Climatic Signals in Wood Property Variables of Picea Crassifolia

    Get PDF
    Little attention has been given to climatic signals in wood properties. In this study, ring width (RW), annual average microfibril angle (MFA), annual average tracheid radial diameter (TRD), and annual average density (DEN), as the annual and intra-annual wood property variables, were measured at high resolution by SilviScan-3 on dated Picea crassifolia trees. Dendroclimatological methods were used to analyze climatic signals registered in wood property variables. RW, MFA, and TRD negatively correlated with temperature and positively correlated with precipitation in the growing season, whereas the reverse was true for DEN. Climatic signals recorded in the earlywood were similar to those measured for the full width of the annual rings. Climatic signals recorded in latewood were very weak except for latewood MFA. This study showed that wood property variables could be extensive resources for learning more about the influences of climate on tree growth and how trees adapt to ongoing climate change.
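
    A hypothetical illustration of the kind of dendroclimatological correlation analysis described above: correlating an annual wood-property series with growing-season climate records. The arrays and variable names are placeholders, not the study's data.

```python
# Placeholder example: Pearson correlation of a ring-width series with
# growing-season climate variables (synthetic data for illustration only).
import numpy as np
from scipy import stats

years = np.arange(1960, 2011)
ring_width = np.random.default_rng(0).normal(1.0, 0.2, years.size)      # RW index
gs_temperature = np.random.default_rng(1).normal(12.0, 1.0, years.size)  # deg C
gs_precip = np.random.default_rng(2).normal(300.0, 50.0, years.size)     # mm

for name, climate in [("temperature", gs_temperature), ("precipitation", gs_precip)]:
    r, p = stats.pearsonr(ring_width, climate)
    print(f"RW vs growing-season {name}: r = {r:.2f}, p = {p:.3f}")
```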

    Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization

    Full text link
    Most offline reinforcement learning (RL) methods suffer from the trade-off between improving the policy to surpass the behavior policy and constraining the policy to limit the deviation from the behavior policy as computing QQ-values using out-of-distribution (OOD) actions will suffer from errors due to distributional shift. The recently proposed \textit{In-sample Learning} paradigm (i.e., IQL), which improves the policy by quantile regression using only data samples, shows great promise because it learns an optimal policy without querying the value function of any unseen actions. However, it remains unclear how this type of method handles the distributional shift in learning the value function. In this work, we make a key finding that the in-sample learning paradigm arises under the \textit{Implicit Value Regularization} (IVR) framework. This gives a deeper understanding of why the in-sample learning paradigm works, i.e., it applies implicit value regularization to the policy. Based on the IVR framework, we further propose two practical algorithms, Sparse QQ-learning (SQL) and Exponential QQ-learning (EQL), which adopt the same value regularization used in existing works, but in a complete in-sample manner. Compared with IQL, we find that our algorithms introduce sparsity in learning the value function, making them more robust in noisy data regimes. We also verify the effectiveness of SQL and EQL on D4RL benchmark datasets and show the benefits of in-sample learning by comparing them with CQL in small data regimes.Comment: ICLR 2023 notable top 5

    AIDX: Adaptive Inference Scheme to Mitigate State-Drift in Memristive VMM Accelerators

    Full text link
    An adaptive inference method for crossbars (AIDX) is presented, based on an optimization scheme for adjusting the duration and amplitude of input voltage pulses. AIDX minimizes the long-term effects of memristance drift on artificial neural network accuracy. The sub-threshold behavior of the memristor has been modeled and verified by comparison with fabricated device data. The proposed method has been evaluated by testing on different network structures and applications, e.g., image reconstruction and classification tasks. The results showed an average of 60% improvement in convolutional neural network (CNN) performance on the CIFAR10 dataset after 10000 inference operations, as well as a 78.6% error reduction in image reconstruction. Comment: This paper is submitted to IEEE Transactions on Circuits and Systems II: Express Briefs
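
    A toy sketch of the problem AIDX targets: conductances in a memristive crossbar drift over repeated inference operations, degrading the vector-matrix multiply (VMM). The drift model and the naive amplitude rescaling below are assumptions for illustration, standing in for the paper's per-pulse duration/amplitude optimization.

```python
# Toy crossbar VMM with state drift and a crude amplitude compensation.
import numpy as np

rng = np.random.default_rng(42)
G_ideal = rng.uniform(1e-6, 1e-4, size=(64, 32))   # target conductances (S)

def vmm(voltages, G):
    """Crossbar vector-matrix multiply: output currents I = V @ G."""
    return voltages @ G

def drift(G, n_ops, rate=1e-5):
    """Crude state-drift model: each device decays at a slightly different rate."""
    per_device = rng.uniform(0.5, 1.5, size=G.shape)
    return G * np.exp(-rate * n_ops * per_device)

v_in = rng.uniform(0.0, 0.2, size=64)               # input pulse amplitudes (V)
G_drifted = drift(G_ideal, n_ops=10_000)

# Naive compensation: scale input amplitudes by the average drift factor.
scale = G_ideal.mean() / G_drifted.mean()
err_raw = np.linalg.norm(vmm(v_in, G_drifted) - vmm(v_in, G_ideal))
err_comp = np.linalg.norm(vmm(v_in * scale, G_drifted) - vmm(v_in, G_ideal))
print(f"uncompensated error: {err_raw:.3e}, compensated: {err_comp:.3e}")
```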

    Weaning Induced Hepatic Oxidative Stress, Apoptosis, and Aminotransferases through MAPK Signaling Pathways in Piglets

    Get PDF
    This study investigated the effects of weaning on the hepatic redox status, apoptosis, function, and the mitogen-activated protein kinase (MAPK) signaling pathways during the first week after weaning in piglets. A total of 12 litters of piglets were weaned at d 21 and divided into the weaning group (WG) and the control group (CG). Six piglets from each group were slaughtered at d 0 (d 20, referred to as weaning), d 1, d 4, and d 7 after weaning. Results showed that weaning significantly increased the concentrations of the hepatic free radicals H2O2 and NO, malondialdehyde (MDA), and 8-hydroxy-2′-deoxyguanosine (8-OHdG), while significantly decreasing the inhibitory hydroxyl ability (IHA) and glutathione peroxidase (GSH-Px) and altering the level of superoxide dismutase (SOD). The apoptosis results showed that weaning increased the concentrations of caspase-3, caspase-8, and caspase-9 and the ratio of Bax/Bcl-2. In addition, aspartate aminotransferase (AST) and alanine aminotransferase (ALT) in liver homogenates increased after weaning. Phosphorylated JNK and ERK1/2 increased, while activated p38 initially decreased and then increased. Our results suggest that weaning increased hepatic oxidative stress and aminotransferases and initiated apoptosis, which may be related to the activated MAPK pathways in postweaning piglets.