198 research outputs found

    A Policy-Guided Imitation Approach for Offline Reinforcement Learning

    Full text link
    Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution generalization but suffer from erroneous off-policy evaluation. Imitation-based methods avoid off-policy evaluation but are too conservative to surpass the dataset. In this study, we propose an alternative approach, inheriting the training stability of imitation-style methods while still allowing logical out-of-distribution generalization. We decompose the conventional reward-maximizing policy in offline RL into a guide-policy and an execute-policy. During training, the guide-policy and execute-policy are learned using only data from the dataset, in a supervised and decoupled manner. During evaluation, the guide-policy guides the execute-policy by telling it where to go so that the reward can be maximized, serving as the \textit{Prophet}. By doing so, our algorithm allows \textit{state-compositionality} from the dataset, rather than the \textit{action-compositionality} conducted in prior imitation-style methods. We dub this new approach Policy-guided Offline RL (\texttt{POR}). \texttt{POR} demonstrates state-of-the-art performance on D4RL, a standard benchmark for offline RL. We also highlight the benefits of \texttt{POR} in terms of improving with supplementary suboptimal data and easily adapting to new tasks by only changing the guide-policy. Comment: Oral @ NeurIPS 2022, code at https://github.com/ryanxhr/PO
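    The guide/execute decomposition described above can be sketched in a few lines. This is a toy numpy illustration, not the paper's implementation: both policies are stand-in linear least-squares fits on a synthetic transition dataset (the actual guide-policy is trained to predict high-value next states), and all shapes and names here are assumptions for illustration.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy offline dataset of (state, action, next_state) transitions.
    S = rng.normal(size=(256, 2))
    A = rng.normal(size=(256, 1))
    S_next = S + 0.1 * A  # simple known dynamics, for illustration only

    # Guide-policy: supervised regression predicting a target next state from s.
    W_guide, *_ = np.linalg.lstsq(S, S_next, rcond=None)

    # Execute-policy: inverse-dynamics-style model a = f(s, s'), also supervised.
    X = np.hstack([S, S_next])
    W_exec, *_ = np.linalg.lstsq(X, A, rcond=None)

    def act(s):
        """Evaluation: the guide proposes where to go (the 'Prophet' step),
        and the execute-policy produces the action to get there."""
        s_target = s @ W_guide
        return np.hstack([s, s_target]) @ W_exec

    a = act(S[:4])  # actions for a batch of four states
    ```

    Both fits use only dataset samples, which is what gives the method the training stability of imitation-style approaches.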

    A bibliometric analysis of cerebral palsy from 2003 to 2022

    Get PDF
    Purpose: This bibliometric study explores cerebral palsy (CP) research from 2003 to 2022 to reveal the topic hotspots and collaborations. Methods: We retrieved studies on CP from the Web of Science Core Collection from 2003 to 2022 and then used CiteSpace and Bibliometrix to perform a bibliometric analysis and attain knowledge mapping, including publication outputs, funding, journals, authors, institutions, countries/territories, keywords, collaborative relationships, and topic hotspots. Results: In total, 8,223 articles were published from 2003 to 2022. During this period, the number of publications increased continuously. Developmental Medicine and Child Neurology was the most productive and frequently co-cited journal. Boyd was the most productive and influential author, with 143 publications and 4,011 citations. The United States and Vrije Universiteit Amsterdam were the most productive country and institution, respectively. Researchers and institutions from the USA, Australia, and Canada constituted the core research forces, with extensive collaborations worldwide. The most common keywords were gait (553), rehabilitation (440), spasticity (325), botulinum toxin (174), therapy (148), upper extremity (141), quality of life (140), disability (115), pain (98), electromyography (97), kinematics (90), balance (88), participation (85), and walking (79). Conclusion: This study provides a systematic and comprehensive analysis of the CP-related literature. It reveals that Developmental Medicine and Child Neurology is the most active journal in this field. The USA, Vrije Universiteit Amsterdam, and Boyd are the top country, institution, and author, respectively. Emerging treatment methods, complication management, and functional recovery comprise the future research directions and potential topic hotspots for CP.

    Effective Action Recognition with Embedded Key Point Shifts

    Full text link
    Temporal feature extraction is an essential technique in video-based action recognition. Key points have been utilized in skeleton-based action recognition methods, but they require costly key point annotation. In this paper, we propose a novel temporal feature extraction module, named the Key Point Shifts Embedding Module (KPSEM), to adaptively extract channel-wise key point shifts across video frames without key point annotation. Key points are adaptively extracted as feature points with maximum feature values at split regions, while key point shifts are the spatial displacements of corresponding key points. The key point shifts are encoded as the overall temporal features via linear embedding layers in a multi-set manner. Our method achieves competitive performance through embedding key point shifts with trivial computational cost, achieving state-of-the-art performance of 82.05% on Mini-Kinetics and competitive performance on the UCF101, Something-Something-v1, and HMDB51 datasets. Comment: 35 pages, 10 figures
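    The core extraction step the abstract describes, key points as locations of maximum feature values and shifts as their frame-to-frame displacements, can be sketched as follows. This is a minimal numpy sketch under assumed tensor shapes: it takes the per-channel argmax over the whole spatial map rather than over the paper's split regions, and it omits the linear embedding layers.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    # Feature maps for T frames, C channels, on an H x W grid (assumed shapes).
    T, C, H, W = 4, 8, 16, 16
    feats = rng.normal(size=(T, C, H, W))

    def key_points(f):
        """Per-channel key point: spatial location of the maximum feature value."""
        flat_idx = f.reshape(f.shape[0], -1).argmax(axis=1)
        return np.stack(np.unravel_index(flat_idx, (H, W)), axis=1)  # (C, 2)

    pts = np.stack([key_points(feats[t]) for t in range(T)])  # (T, C, 2)
    # Key point shifts: displacement of each channel's key point between
    # consecutive frames; these serve as annotation-free temporal features.
    shifts = pts[1:] - pts[:-1]                               # (T-1, C, 2)
    ```

    Because the shifts are computed from the feature maps themselves, no manual key point annotation is needed, which is the module's main selling point.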

    Climatic Signals in Wood Property Variables of Picea Crassifolia

    Get PDF
    Little attention has been given to climatic signals in wood properties. In this study, ring width (RW), annual average microfibril angle (MFA), annual average tracheid radial diameter (TRD), and annual average density (DEN), as the annual and intra-annual wood property variables, were measured at high resolution by SilviScan-3 on dated Picea crassifolia trees. Dendroclimatological methods were used to analyze climatic signals registered in wood property variables. RW, MFA, and TRD negatively correlated with temperature and positively correlated with precipitation in the growing season, whereas the reverse was true for DEN. Climatic signals recorded in the earlywood were similar to those measured for the full width of the annual rings. Climatic signals recorded in latewood were very weak except for latewood MFA. This study showed that wood property variables could be extensive resources for learning more about the influences of climate on tree growth and how trees adapt to ongoing climate change.

    Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization

    Full text link
    Most offline reinforcement learning (RL) methods suffer from the trade-off between improving the policy to surpass the behavior policy and constraining the policy to limit the deviation from the behavior policy, as computing Q-values using out-of-distribution (OOD) actions will suffer from errors due to distributional shift. The recently proposed \textit{In-sample Learning} paradigm (i.e., IQL), which improves the policy by quantile regression using only data samples, shows great promise because it learns an optimal policy without querying the value function of any unseen actions. However, it remains unclear how this type of method handles the distributional shift in learning the value function. In this work, we make a key finding that the in-sample learning paradigm arises under the \textit{Implicit Value Regularization} (IVR) framework. This gives a deeper understanding of why the in-sample learning paradigm works, i.e., it applies implicit value regularization to the policy. Based on the IVR framework, we further propose two practical algorithms, Sparse Q-learning (SQL) and Exponential Q-learning (EQL), which adopt the same value regularization used in existing works, but in a complete in-sample manner. Compared with IQL, we find that our algorithms introduce sparsity in learning the value function, making them more robust in noisy data regimes. We also verify the effectiveness of SQL and EQL on D4RL benchmark datasets and show the benefits of in-sample learning by comparing them with CQL in small data regimes. Comment: ICLR 2023 notable top 5
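    The in-sample idea mentioned above can be illustrated with asymmetric (expectile-style) regression, a close relative of the quantile regression that IQL uses: the value estimate is fit only to values observed in the dataset, so no OOD action is ever queried. This is a toy numpy sketch, not SQL/EQL themselves; the data and hyperparameters are assumptions for illustration.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    # Toy stand-in for Q-values of dataset actions at a fixed state.
    q = rng.normal(loc=1.0, size=500)

    def expectile(x, tau=0.9, iters=300, lr=0.5):
        """Asymmetric least-squares estimate computed purely in-sample:
        only observed values x are used, never an unseen action's value."""
        v = 0.0
        for _ in range(iters):
            diff = x - v
            w = np.where(diff > 0, tau, 1.0 - tau)  # over-weight positive errors
            v += lr * np.mean(w * diff)             # fixed point: tau-expectile
        return v

    v = expectile(q)  # lies above the sample mean for tau > 0.5
    ```

    Raising tau pushes the estimate toward the best in-dataset values, which is how in-sample methods improve on the behavior policy without evaluating OOD actions.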

    AIDX: Adaptive Inference Scheme to Mitigate State-Drift in Memristive VMM Accelerators

    Full text link
    An adaptive inference method for crossbars (AIDX) is presented, based on an optimization scheme for adjusting the duration and amplitude of input voltage pulses. AIDX minimizes the long-term effects of memristance drift on artificial neural network accuracy. The sub-threshold behavior of the memristor has been modeled and verified by comparison with fabricated device data. The proposed method has been evaluated by testing on different network structures and applications, e.g., image reconstruction and classification tasks. The results showed an average of 60% improvement in convolutional neural network (CNN) performance on the CIFAR10 dataset after 10,000 inference operations, as well as a 78.6% error reduction in image reconstruction. Comment: This paper is submitted to IEEE Transactions on Circuits and Systems II: Express Briefs
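    The compensation idea can be sketched in a toy form: a crossbar computes a vector-matrix multiply through its conductances, drift degrades those conductances, and adjusting the input pulses cancels the degradation. This sketch assumes a uniform multiplicative drift factor for simplicity; the actual method optimizes per-pulse duration and amplitude against a sub-threshold device model, and all names here are illustrative.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    G = rng.uniform(0.5, 1.0, size=(4, 3))  # nominal conductances (weights)
    drift = 0.9                             # assumed uniform state-drift factor
    G_drifted = drift * G                   # conductances after many inferences

    def vmm(v, g):
        """Crossbar vector-matrix multiply: currents sum along each column."""
        return v @ g

    v_in = rng.normal(size=4)
    ideal = vmm(v_in, G)
    # Adaptive inference: rescale the input pulse amplitudes to cancel the
    # estimated drift, recovering the ideal output without reprogramming G.
    corrected = vmm(v_in / drift, G_drifted)
    ```

    Under this idealized uniform-drift assumption the correction is exact; with realistic, state-dependent drift it becomes the optimization problem the paper solves.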