    CrossNorm: Normalization for Off-Policy TD Reinforcement Learning

    Off-policy temporal difference (TD) methods are a powerful class of reinforcement learning (RL) algorithms. Intriguingly, deep off-policy TD algorithms are not commonly combined with feature normalization techniques, despite the positive effects of normalization in other domains. We show that naive application of existing normalization techniques is indeed not effective, but that well-designed normalization improves optimization stability and removes the need for target networks. In particular, we introduce a normalization based on a mixture of on- and off-policy transitions, which we call cross-normalization. It can be regarded as an extension of batch normalization that re-centers data for two different distributions, as present in off-policy learning. Applied to DDPG and TD3, cross-normalization improves over the state of the art across a range of MuJoCo benchmark tasks.
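
The re-centering idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function name `cross_normalize`, the mixing weight `alpha`, the use of scalar features, and the choice to compute the variance around the mixed mean over both batches.

```python
from math import sqrt


def cross_normalize(on_policy, off_policy, alpha=0.5, eps=1e-5):
    """Normalize two batches of scalar features with shared statistics.

    Hypothetical sketch: `alpha` weights the on-policy batch mean and
    (1 - alpha) weights the off-policy batch mean. Both batches are then
    re-centered and scaled with the same statistics.
    """
    mean_on = sum(on_policy) / len(on_policy)
    mean_off = sum(off_policy) / len(off_policy)

    # Mixture of the two batch means (assumed form of the re-centering).
    mixed_mean = alpha * mean_on + (1.0 - alpha) * mean_off

    # Variance around the mixed mean, taken over both batches together.
    combined = list(on_policy) + list(off_policy)
    variance = sum((x - mixed_mean) ** 2 for x in combined) / len(combined)
    scale = sqrt(variance + eps)

    def normalize(batch):
        return [(x - mixed_mean) / scale for x in batch]

    return normalize(on_policy), normalize(off_policy)


if __name__ == "__main__":
    on_batch, off_batch = cross_normalize([1.0, 3.0], [5.0, 7.0])
    print(on_batch, off_batch)
```

With equally sized batches and `alpha=0.5`, the mixed mean equals the mean of the combined data, so this reduces to ordinary batch normalization over the concatenated batch. Other values of `alpha` shift the center toward one of the two distributions.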

    Local-Global Temporal Difference Learning for Satellite Video Super-Resolution

    Optical-flow-based and kernel-based approaches have been widely explored for temporal compensation in satellite video super-resolution (VSR). However, these techniques incur high computational cost and are prone to failure under complex motions. In this paper, we propose to exploit the well-defined temporal difference for efficient and robust temporal compensation. To fully utilize the temporal information within frames, we separately model the short-term and long-term temporal discrepancies, since they provide distinctive complementary properties. Specifically, a short-term temporal difference module is designed to extract local motion representations from residual maps between adjacent frames, which provides more clues for accurate texture representation. Meanwhile, the global dependency across the entire frame sequence is explored via long-term difference learning. The differences between forward and backward segments are incorporated and activated to modulate the temporal feature, resulting in holistic global compensation. In addition, we propose a difference compensation unit to enrich the interaction between the spatial distribution of the target frame and the compensated results, which helps maintain spatial consistency while refining the features to avoid misalignment. Extensive objective and subjective evaluation on five mainstream satellite videos demonstrates that the proposed method performs favorably for satellite VSR. Code will be available at https://github.com/XY-boy/TDMVSR. Comment: Submitted to IEEE TCSV
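
The residual maps between adjacent frames mentioned in this abstract can be sketched as plain per-pixel differences. This is only an illustration of the input to a short-term difference module, not the authors' network. The function name `short_term_differences` and the representation of frames as nested lists of grayscale values are assumptions.

```python
def short_term_differences(frames):
    """Return per-pixel residual maps between each pair of adjacent frames.

    Hypothetical sketch: `frames` is a list of 2D grids (lists of rows) with
    identical shape. The result has one residual map fewer than there are
    frames, where map i holds frames[i + 1] - frames[i].
    """
    return [
        [
            [later - earlier for earlier, later in zip(row_prev, row_next)]
            for row_prev, row_next in zip(prev_frame, next_frame)
        ]
        for prev_frame, next_frame in zip(frames, frames[1:])
    ]


if __name__ == "__main__":
    video = [
        [[1, 2], [3, 4]],
        [[2, 2], [3, 6]],
        [[2, 5], [0, 6]],
    ]
    for residual in short_term_differences(video):
        print(residual)
```

Pixels that do not change between two frames yield zeros in the residual map, so the nonzero entries mark where local motion or appearance change occurred.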

    Neural correlates of implicit knowledge about statistical regularities

    In this study, we examined the neural correlates of implicit knowledge about statistical regularities of temporal order and item chunks using functional magnetic resonance imaging (fMRI). In a familiarization scan, participants viewed a stream of scenes consisting of structured triplets (i.e., three scenes always presented in the same order) and random triplets. In the subsequent test scan, participants were required to detect a target scene. Test sequences included both the forward order of scenes presented during the familiarization scan and the backward order (i.e., the reverse of the forward order). Behavioral results showed a learning effect of temporal order in the forward condition and of scene chunks in the backward condition. fMRI data from the familiarization scan showed a difference in activation between the structured and random blocks in the left posterior cingulate cortex, including the retrosplenial cortex. More importantly, in the test scan, we observed activity in the left parietal lobe when participants detected target scenes based on temporal order information. In contrast, the left precuneus was activated when participants detected target scenes based on scene chunks. Our findings help clarify the brain mechanisms of implicit knowledge about acquired regularities.