167 research outputs found

    Offline Experience Replay for Continual Offline Reinforcement Learning

    The ability to continuously learn new skills from a sequence of pre-collected offline datasets is desirable for an agent. However, consecutively learning a sequence of offline tasks is likely to lead to catastrophic forgetting under resource-limited scenarios. In this paper, we formulate a new setting, continual offline reinforcement learning (CORL), where an agent learns a sequence of offline reinforcement learning tasks and pursues good performance on all learned tasks with a small replay buffer, without exploring the environment of any of the sequential tasks. To learn consistently across all sequential tasks, an agent must acquire new knowledge while preserving old knowledge in an offline manner. To this end, we introduce continual learning algorithms and experimentally find experience replay (ER) to be the most suitable algorithm for the CORL problem. However, we observe that introducing ER into CORL encounters a new distribution shift problem: the mismatch between the experiences in the replay buffer and trajectories from the learned policy. To address this issue, we propose a new model-based experience selection (MBES) scheme to build the replay buffer, in which a transition model is learned to approximate the state distribution. The model is then used to bridge this distribution gap by selecting, from the offline data, the experiences that most closely resemble the learned policy's trajectories for storage. Moreover, to enhance the ability to learn new tasks, we retrofit the experience replay method with a new dual behavior cloning (DBC) architecture to avoid the disturbance of the behavior-cloning loss on the Q-learning process. We call the overall algorithm offline experience replay (OER). Extensive experiments demonstrate that our OER method outperforms SOTA baselines in widely used MuJoCo environments. Comment: 9 pages, 4 figures
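
    The selection step can be pictured roughly as follows. This is a minimal Python sketch, assuming hypothetical policy and dynamics_model callables that stand in for models learned from the offline data; it illustrates the selection idea only and is not the paper's implementation.

        # Minimal sketch of model-based experience selection (MBES) for building
        # a replay buffer. Hypothetical interfaces: policy(state) returns an
        # action, dynamics_model(state, action) predicts the next state.
        import numpy as np

        def select_replay_experiences(offline_transitions, policy, dynamics_model, buffer_size):
            """Keep the offline transitions whose next states best match what the
            learned policy would produce, reducing the buffer/policy mismatch."""
            scores = []
            for (s, a, r, s_next) in offline_transitions:
                # Roll the learned policy and dynamics model forward from s.
                a_pi = policy(s)
                s_pred = dynamics_model(s, a_pi)
                # Logged next states close to the model's prediction are assumed
                # to resemble the learned policy's own trajectories.
                scores.append(-np.linalg.norm(np.asarray(s_next) - np.asarray(s_pred)))
            top = np.argsort(scores)[-buffer_size:]
            return [offline_transitions[i] for i in top]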

    Nitrogen and Phosphorus Accumulation in Pasture Soil from Repeated Poultry Litter Application

    Poultry litter (PL) is a traditionally inexpensive and effective fertilizer for improving soil quality and agricultural productivity. However, its over-application to soil has raised concern because excess nutrients in runoff could accelerate the eutrophication of fresh water. In this work, we determined the contents of total phosphorus (P), Mehlich-3 extractable P, total nitrogen (N), ammonium (NH4)-N, and nitrate (NO3)-N in pasture soils receiving annual poultry litter applications of 0, 2.27, 2.27, 3.63, and 1.36 Mg/ha/yr, respectively, for 0, 5, 10, 15, and 20 years. Samples were collected from three soil depths (0–20, 20–40, and 40–60 cm) of the Hartsells series (fine-loamy, siliceous, subactive, thermic, Typic Hapludults) on a 3–8% slope in the Sand Mountain region of north Alabama. PL application significantly increased levels of total P, Mehlich-3 extractable P, and total N. However, the changes in NH4-N and NO3-N contents from PL application were not statistically significant. Correlation analysis indicated that the contents of total P, Mehlich-3 extractable P, and total N were more strongly related to the cumulative amount of poultry litter applied than to the years of application or annual application rates alone. This observation suggests that N and P from poultry litter accumulate in soil. Predicting the build-up from the cumulative amount of PL applied, rather than from isolated factors (i.e., application years or rates), would improve the accuracy of evaluating the long-term impacts of poultry litter application on soil nutrient levels.
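
    As a rough illustration of why the cumulative amount is the natural predictor, the sketch below multiplies the annual rates by the years of application given in the abstract and compares correlations; the soil nutrient values are placeholders, not measurements from the study.

        # Cumulative poultry litter = annual rate x years of application.
        # Rates and durations come from the abstract; total_p_placeholder is
        # hypothetical, used only to show the correlation comparison.
        import numpy as np

        rates_mg_ha_yr = np.array([0.0, 2.27, 2.27, 3.63, 1.36])  # Mg/ha/yr
        years_applied = np.array([0, 5, 10, 15, 20])
        cumulative_pl = rates_mg_ha_yr * years_applied             # Mg/ha

        total_p_placeholder = np.array([0.4, 0.9, 1.3, 2.6, 1.5])  # hypothetical soil total P

        # Correlate soil nutrient content with cumulative load vs. isolated factors.
        print(np.corrcoef(cumulative_pl, total_p_placeholder)[0, 1])
        print(np.corrcoef(years_applied, total_p_placeholder)[0, 1])
        print(np.corrcoef(rates_mg_ha_yr, total_p_placeholder)[0, 1])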

    CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning

    Offline reinforcement learning (RL) aims to learn an optimal policy from pre-collected and labeled datasets, which eliminates the time-consuming data collection of online RL. However, offline RL still bears the large burden of specifying/handcrafting extrinsic rewards for each transition in the offline data. As a remedy for this labor-intensive labeling, we propose to endow offline RL tasks with a small amount of expert data and use this limited expert data to drive intrinsic rewards, thus eliminating the need for extrinsic rewards. To achieve this, we introduce Calibrated Latent gUidancE (CLUE), which utilizes a conditional variational auto-encoder to learn a latent space such that intrinsic rewards can be directly quantified over the latent space. CLUE's key idea is to align the intrinsic rewards with the expert intention by enforcing the embeddings of expert data into a calibrated contextual representation. We instantiate the expert-driven intrinsic rewards in sparse-reward offline RL tasks, offline imitation learning (IL) tasks, and unsupervised offline RL tasks. Empirically, we find that CLUE can effectively improve sparse-reward offline RL performance, outperform state-of-the-art offline IL baselines, and discover diverse skills from static reward-free offline data.
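
    A latent-space intrinsic reward of this kind might look like the minimal sketch below, assuming a hypothetical encode(s, a) function standing in for the CVAE encoder mean; it is an illustration of the calibration idea, not the paper's implementation.

        # Sketch of an expert-calibrated intrinsic reward over a learned latent
        # space. encode(s, a) is a hypothetical stand-in for the CVAE encoder.
        import numpy as np

        def calibrated_expert_embedding(expert_pairs, encode):
            """Calibrate: collapse expert (s, a) embeddings into one context vector."""
            return np.mean([encode(s, a) for s, a in expert_pairs], axis=0)

        def intrinsic_reward(s, a, expert_context, encode, scale=1.0):
            """Reward transitions whose latent embedding lies close to the expert
            context, so intrinsic rewards follow the expert intention."""
            dist = np.linalg.norm(encode(s, a) - expert_context)
            return float(np.exp(-scale * dist))  # in (0, 1]; higher = more expert-like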

    Beyond Reward: Offline Preference-guided Policy Optimization

    This study focuses on offline preference-based reinforcement learning (PbRL), a variant of conventional reinforcement learning that dispenses with the need for online interaction or the specification of reward functions. Instead, the agent is provided with fixed offline trajectories and human preferences between pairs of trajectories, from which it extracts dynamics and task information, respectively. Since the dynamics and task information are orthogonal, a naive approach would involve preference-based reward learning followed by an off-the-shelf offline RL algorithm. However, this requires the separate learning of a scalar reward function, which is assumed to be an information bottleneck of the learning process. To address this issue, we propose the offline preference-guided policy optimization (OPPO) paradigm, which models offline trajectories and preferences in a one-step process, eliminating the need to separately learn a reward function. OPPO achieves this by introducing an offline hindsight information matching objective for optimizing a contextual policy and a preference modeling objective for finding the optimal context. By iteratively optimizing the two objectives, OPPO yields a well-performing decision policy. Our empirical results demonstrate that OPPO effectively models offline preferences and outperforms prior competing baselines, including offline RL algorithms applied to either true or pseudo reward function specifications. Our code is available on the project website: https://sites.google.com/view/oppo-icml-2023
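
    A preference-modeling objective of this kind can be sketched with a Bradley-Terry style loss over per-trajectory scores, as below; score values and their source are assumptions for illustration, and OPPO's actual objectives may differ in detail.

        # Sketch of a preference-modeling loss in the spirit of OPPO. The scores
        # are assumed to come from a hypothetical score(traj_embedding, context)
        # that rates how well a trajectory matches the current context.
        import torch

        def preference_loss(score_preferred, score_other):
            """Bradley-Terry style loss: the preferred trajectory should score
            higher under the optimal context than the less preferred one."""
            return -torch.nn.functional.logsigmoid(score_preferred - score_other).mean()

        # Usage: given per-pair score tensors s_plus and s_minus, alternate
        # gradient steps on this loss (to refine the context) with steps on the
        # hindsight information matching loss (to refine the contextual policy).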

    CEIL: Generalized Contextual Imitation Learning

    In this paper, we present ContExtual Imitation Learning (CEIL), a general and broadly applicable algorithm for imitation learning (IL). Inspired by the formulation of hindsight information matching, we derive CEIL by explicitly learning a hindsight embedding function together with a contextual policy that uses the hindsight embeddings. To achieve the expert matching objective of IL, we advocate optimizing a contextual variable such that it biases the contextual policy towards mimicking expert behaviors. Beyond the typical learning-from-demonstrations (LfD) setting, CEIL is a generalist that can be effectively applied to multiple settings, including: 1) learning from observations (LfO), 2) offline IL, 3) cross-domain IL (mismatched experts), and 4) one-shot IL. Empirically, we evaluate CEIL on the popular MuJoCo tasks (online) and the D4RL dataset (offline). Compared to prior state-of-the-art baselines, we show that CEIL is more sample-efficient in most online IL tasks and achieves better or competitive performance in offline tasks. Comment: NeurIPS 2023
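
    The context-optimization step can be pictured as below, assuming hypothetical embed_policy_rollout(z) and embed_expert() helpers that return hindsight embeddings (the former differentiable in z); this is an illustrative sketch, not CEIL's implementation.

        # Sketch of optimizing a contextual variable z so that a contextual
        # policy pi(a | s, z) is biased toward expert behavior, measured in a
        # learned hindsight-embedding space.
        import torch

        def optimize_context(z_init, embed_policy_rollout, embed_expert, steps=100, lr=1e-2):
            z = z_init.detach().clone().requires_grad_(True)
            opt = torch.optim.Adam([z], lr=lr)
            target = embed_expert().detach()  # hindsight embedding of expert data
            for _ in range(steps):
                # Pull the policy's behavior embedding toward the expert's.
                loss = torch.norm(embed_policy_rollout(z) - target) ** 2
                opt.zero_grad()
                loss.backward()
                opt.step()
            return z.detach()  # context that biases the policy toward the expert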

    Comparison of Different Transfer Learning Methods for Classification of Mangrove Communities Using MCCUNet and UAV Multispectral Images

    Mangrove-forest classification using deep learning algorithms has attracted increasing attention but remains challenging. In particular, the transfer classification of mangrove communities between different regions and different sensors is still poorly studied. To fill this research gap, this study developed a new deep-learning algorithm (encoder–decoder with mixed depth-wise convolution and cascade upsampling, MCCUNet) by modifying the encoder and decoder sections of the DeepLabV3+ algorithm, and presented three transfer-learning strategies, namely frozen transfer learning (F-TL), fine-tuned transfer learning (Ft-TL), and sensor-and-phase transfer learning (SaP-TL), to classify mangrove communities using the MCCUNet algorithm and high-resolution UAV multispectral images. This study combined the deep-learning algorithms with recursive feature elimination and principal component analysis (RFE–PCA), using a high-dimensional dataset to map and classify mangrove communities, and evaluated their classification performance. The results showed the following: (1) The MCCUNet algorithm outperformed the original DeepLabV3+ algorithm for classifying mangrove communities, achieving the highest overall classification accuracy (OA), i.e., 97.24%, across all scenarios. (2) The RFE–PCA dimension reduction improved the classification performance of the deep-learning algorithms. The OA of mangrove species using the MCCUNet algorithm improved by 7.27% after adding dimension-reduced texture features and vegetation indices. (3) The Ft-TL strategy enabled the algorithm to achieve better classification accuracy and stability than the F-TL strategy. The F1-score of Spartina alterniflora improved by up to 19.56% using the MCCUNet algorithm with the Ft-TL strategy. (4) The SaP-TL strategy produced better transfer-learning classifications of mangrove communities between images from different phases and sensors. The F1-score of Aegiceras corniculatum improved by up to 19.85% using the MCCUNet algorithm with the SaP-TL strategy. (5) All three transfer-learning strategies achieved high accuracy in classifying mangrove communities, with mean F1-scores of 84.37–95.25%.
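
    For readers unfamiliar with the F-TL and Ft-TL strategies, the sketch below shows the standard freeze-versus-fine-tune distinction in PyTorch for a hypothetical segmentation model with encoder and decoder submodules; it does not reproduce MCCUNet or the SaP-TL strategy.

        # Sketch of configuring frozen (F-TL) vs. fine-tuned (Ft-TL) transfer
        # learning for a generic encoder-decoder segmentation model.
        import torch

        def configure_transfer(model, strategy="Ft-TL", lr=1e-4):
            if strategy == "F-TL":
                # Frozen transfer learning: keep pretrained encoder features,
                # train only the decoder on the new region/sensor data.
                for p in model.encoder.parameters():
                    p.requires_grad = False
                params = model.decoder.parameters()
            elif strategy == "Ft-TL":
                # Fine-tuned transfer learning: update all weights on target data.
                params = model.parameters()
            else:
                raise ValueError(f"unknown strategy: {strategy}")
            return torch.optim.Adam((p for p in params if p.requires_grad), lr=lr)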

    Collaborative multiple change detection methods for monitoring the spatio-temporal dynamics of mangroves in Beibu Gulf, China

    Mangrove ecosystems are among the most diverse and productive marine ecosystems in the world, yet global mangrove area has been declining over the past decades. Tracking spatio-temporal changes and assessing the current state are therefore essential for mangrove conservation. To address the inaccuracy of single change-detection algorithms and their limitation to historical change detection, this study proposes a detect–monitor–predict (DMP) framework for mangroves that detects time-series historical changes, monitors abrupt near-real-time events, and predicts future trends in the Beibu Gulf, China, through the synergetic use of multiple change-detection algorithms. This study further developed a method for extracting mangroves from multi-source inter-annual time-series spectral-index images and evaluated the performance of twenty-one spectral indices for capturing mangrove expansion events. Finally, this study reveals the spatio-temporal dynamics of mangroves in the Beibu Gulf from 1986 to 2021. We found that our method could extract mangrove growth regions from 1986 to 2021 with an overall accuracy of 0.887, demonstrating that it can rapidly extract large-scale mangroves without field-based samples. We confirmed that the normalized difference vegetation index and tasseled cap angle outperform other spectral indices in capturing mangrove expansion changes, while the enhanced vegetation index and soil-adjusted vegetation index capture the change events with a time delay. This study revealed that mangroves changed historically in a hierarchical gradient from land to sea, with an average annual expansion of 239.822 ha in the Beibu Gulf during 1986–2021; it also detected slight improvements and deteriorations of some contemporary mangroves, and predicted that 72.778% of mangroves will be in good growth condition in the future.
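
    As a reminder of the spectral-index machinery involved, the sketch below computes NDVI from red and near-infrared bands and flags pixels whose NDVI rose between two years as candidate vegetation gain; the threshold is illustrative and not taken from the study.

        # Sketch of NDVI computation and a simple inter-annual NDVI difference
        # used to flag candidate expansion pixels. Inputs are band arrays of
        # equal shape; the 0.2 threshold is illustrative only.
        import numpy as np

        def ndvi(nir, red, eps=1e-6):
            return (nir - red) / (nir + red + eps)

        def expansion_mask(nir_t0, red_t0, nir_t1, red_t1, threshold=0.2):
            """Pixels whose NDVI increased by more than `threshold` between two
            years are flagged as candidate vegetation (e.g. mangrove) gain."""
            return (ndvi(nir_t1, red_t1) - ndvi(nir_t0, red_t0)) > threshold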