Offline Experience Replay for Continual Offline Reinforcement Learning
The capability to continually learn new skills from a sequence of pre-collected
offline datasets is desirable for an agent. However, learning a sequence of
offline tasks consecutively is likely to cause catastrophic forgetting under
resource-limited scenarios. In this paper, we formulate
a new setting, continual offline reinforcement learning (CORL), where an agent
learns a sequence of offline reinforcement learning tasks and pursues good
performance on all learned tasks with a small replay buffer without exploring
the environment of any of the sequential tasks. To learn consistently across
all sequential tasks, an agent must acquire new knowledge while preserving old
knowledge, entirely offline. To this end, we evaluate continual learning
algorithms and experimentally find experience replay (ER) to be the most
suitable for the CORL problem. However, we
observe that introducing ER into CORL encounters a new distribution shift
problem: the mismatch between the experiences in the replay buffer and
trajectories from the learned policy. To address such an issue, we propose a
new model-based experience selection (MBES) scheme to build the replay buffer,
where a transition model is learned to approximate the state distribution. This
model bridges the distribution gap between the replay buffer and the learned
policy by selecting, for storage, the offline data that most closely resembles
trajectories of the learned policy. Moreover, to enhance the ability to learn
new tasks, we retrofit the experience replay method with a new dual behavior
cloning (DBC) architecture that prevents the behavior-cloning loss from
disturbing the Q-learning process. Overall, we call our
algorithm offline experience replay (OER). Extensive experiments demonstrate
that our OER method outperforms state-of-the-art baselines in the widely used
MuJoCo environments. Comment: 9 pages, 4 figures
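The experience-selection step can be sketched in a few lines of Python. The scoring rule below (distance between the model's predicted next state under the current policy and the stored next state) is an illustrative stand-in for MBES; the names `policy`, `model`, and the `(state, action, next_state)` transition layout are assumptions, not the paper's interface.

```python
import numpy as np

def select_replay_experiences(transitions, policy, model, buffer_size):
    """Toy sketch of model-based experience selection (MBES): score each
    offline transition by how closely the learned model's prediction under
    the current policy matches the stored next state, and keep the best ones."""
    scores = []
    for state, action, next_state in transitions:
        predicted = model(state, policy(state))        # rollout under the learned policy
        scores.append(-np.linalg.norm(predicted - next_state))  # higher = closer match
    best = np.argsort(scores)[::-1][:buffer_size]      # indices of the top scores
    return [transitions[i] for i in best]
```

Transitions that the learned policy would plausibly generate itself score highest, which is the property needed to reduce the replay-buffer distribution shift the abstract describes.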
Nitrogen and Phosphorus Accumulation in Pasture Soil from Repeated Poultry Litter Application
Poultry litter (PL) is traditionally an inexpensive and effective fertilizer for improving soil quality and agricultural productivity. However, over-application to soil has raised concern because excess nutrients in runoff could accelerate the eutrophication of fresh water. In this work, we determined the contents of total phosphorus (P), Mehlich-3 extractable P, total nitrogen (N), ammonium (NH4)-N, and nitrate (NO3)-N in pasture soils receiving annual poultry litter applications of 0, 2.27, 2.27, 3.63, and 1.36 Mg/ha/yr, respectively, for 0, 5, 10, 15, and 20 years. Samples were collected from three soil depths (0–20, 20–40, and 40–60 cm) of the Hartsells series (fine-loamy, siliceous, subactive, thermic Typic Hapludults) on a 3–8% slope in the Sand Mountain region of north Alabama. PL application significantly increased levels of total P, Mehlich-3 extractable P, and total N. However, the changes in NH4-N and NO3-N contents with PL application were not statistically significant. Correlation analysis indicated that the contents of total P, Mehlich-3 extractable P, and total N were more strongly related to the cumulative amount of poultry litter applied than to the years of application or the annual application rates alone. This observation suggests that N and P from poultry litter accumulated in the soil. Predicting the build-up from the cumulative amount of PL applied, rather than from isolated factors (i.e., application years or rates), would improve the accuracy of evaluating the long-term impacts of poultry litter application on soil nutrient levels.
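The abstract's key statistical point, that nutrient build-up tracks the cumulative amount applied (rate times years) better than either factor alone, can be illustrated with synthetic numbers. The application rates below come from the abstract; the soil-P values are invented purely for the demonstration and are not the study's data.

```python
import numpy as np

# Synthetic illustration only: soil P is generated to follow cumulative
# litter input, so its correlation with the cumulative amount should
# exceed its correlation with years of application alone.
years = np.array([0.0, 5.0, 10.0, 15.0, 20.0])
rate = np.array([0.0, 2.27, 2.27, 3.63, 1.36])   # Mg/ha/yr (from the abstract)
cumulative = years * rate                         # total litter applied per plot
noise = np.array([0.5, -0.3, 0.2, -0.4, 0.1])     # invented measurement scatter
soil_p = 10.0 + 3.0 * cumulative + noise          # invented soil-P response

r_cumulative = np.corrcoef(cumulative, soil_p)[0, 1]
r_years = np.corrcoef(years, soil_p)[0, 1]
```

Because the 20-year plot received a lower annual rate than the 15-year plot, years alone correlate poorly with the response while the cumulative amount correlates almost perfectly, mirroring the pattern the correlation analysis reports.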
CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning
Offline reinforcement learning (RL) aims to learn an optimal policy from
pre-collected and labeled datasets, which eliminates the time-consuming data
collection in online RL. However, offline RL still bears a large burden of
specifying/handcrafting extrinsic rewards for each transition in the offline
data. As a remedy for the labor-intensive labeling, we propose to endow offline
RL tasks with a small amount of expert data and utilize this limited expert
data to drive intrinsic rewards, thus eliminating the need for extrinsic rewards. To achieve
that, we introduce \textbf{C}alibrated \textbf{L}atent
g\textbf{U}idanc\textbf{E} (CLUE), which utilizes a conditional variational
auto-encoder to learn a latent space such that intrinsic rewards can be
directly quantified over the latent space. CLUE's key idea is to align the
intrinsic rewards with the expert intention by enforcing the embeddings of
expert data toward a calibrated contextual representation. We
instantiate the expert-driven intrinsic rewards in sparse-reward offline RL
tasks, offline imitation learning (IL) tasks, and unsupervised offline RL
tasks. Empirically, we find that CLUE can effectively improve the sparse-reward
offline RL performance, outperform the state-of-the-art offline IL baselines,
and discover diverse skills from static reward-free offline data.
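The expert-calibrated intrinsic reward can be pictured as a distance in latent space. The sketch below assumes the CVAE encoder has already produced embeddings and uses the mean of the expert embeddings as the calibrated representation; both choices are simplifications, not CLUE's actual calibration.

```python
import numpy as np

def intrinsic_reward(z, expert_center, scale=1.0):
    """Distance-based latent reward: embeddings near the calibrated expert
    representation score close to 1, distant ones decay toward 0."""
    dist_sq = np.linalg.norm(np.asarray(z, dtype=float) - expert_center) ** 2
    return float(np.exp(-scale * dist_sq))

# A calibrated contextual representation could be as simple as the mean
# of the expert embeddings (an assumed simplification).
expert_embeddings = np.array([[1.0, 0.0], [0.9, 0.1]])
center = expert_embeddings.mean(axis=0)
```

Any transition embedded near the expert cluster is rewarded as if the expert had labeled it, which is what lets the limited expert data stand in for handcrafted extrinsic rewards.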
Beyond Reward: Offline Preference-guided Policy Optimization
This study focuses on the topic of offline preference-based reinforcement
learning (PbRL), a variant of conventional reinforcement learning that
dispenses with the need for online interaction or specification of reward
functions. Instead, the agent is provided with fixed offline trajectories and
human preferences between pairs of trajectories to extract the dynamics and
task information, respectively. Since the dynamics and task information are
orthogonal, a naive approach would involve using preference-based reward
learning followed by an off-the-shelf offline RL algorithm. However, this
requires the separate learning of a scalar reward function, which is assumed to
be an information bottleneck of the learning process. To address this issue, we
propose the offline preference-guided policy optimization (OPPO) paradigm,
which models offline trajectories and preferences in a one-step process,
eliminating the need for separately learning a reward function. OPPO achieves
this by introducing an offline hindsight information matching objective for
optimizing a contextual policy and a preference modeling objective for finding
the optimal context. OPPO further derives a well-performing decision policy by
iteratively optimizing these two objectives. Our empirical results demonstrate
that OPPO effectively models offline preferences and outperforms prior
competing baselines, including offline RL algorithms performed over either true
or pseudo reward function specifications. Our code is available on the project
website: https://sites.google.com/view/oppo-icml-2023
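A common way to model pairwise trajectory preferences, and a reasonable stand-in for the preference-modeling objective described above, is a Bradley-Terry negative log-likelihood over scalar scores; OPPO's exact objective may differ.

```python
import numpy as np

def preference_loss(score_a, score_b, prefer_a=True):
    """Negative log-likelihood of an observed preference under a
    Bradley-Terry model: P(A preferred) = sigmoid(score_a - score_b)."""
    p_a = 1.0 / (1.0 + np.exp(score_b - score_a))
    return -np.log(p_a if prefer_a else 1.0 - p_a)
```

Driving this loss down pushes the score of the preferred trajectory above that of the rejected one, which is how human preferences can shape a context or policy without a separately learned scalar reward function.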
Evaluation of Ecosystem Services in Strategic Environmental Assessment for River Basin Planning
Water Resources Planning and Management
CEIL: Generalized Contextual Imitation Learning
In this paper, we present \textbf{C}ont\textbf{E}xtual \textbf{I}mitation
\textbf{L}earning~(CEIL), a general and broadly applicable algorithm for
imitation learning (IL). Inspired by the formulation of hindsight information
matching, we derive CEIL by explicitly learning a hindsight embedding function
together with a contextual policy using the hindsight embeddings. To achieve
the expert matching objective for IL, we advocate for optimizing a contextual
variable such that it biases the contextual policy towards mimicking expert
behaviors. Beyond the typical learning from demonstrations (LfD) setting, CEIL
is a generalist that can be effectively applied to multiple settings including:
1)~learning from observations (LfO), 2)~offline IL, 3)~cross-domain IL
(mismatched experts), and 4) one-shot IL settings. Empirically, we evaluate
CEIL on the popular MuJoCo tasks (online) and the D4RL dataset (offline).
Compared to prior state-of-the-art baselines, we show that CEIL is more
sample-efficient in most online IL tasks and achieves better or competitive
performance in offline tasks. Comment: NeurIPS 202
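The expert-matching idea of optimizing a contextual variable can be reduced to a toy form: gradient descent driving a context vector toward an expert embedding. CEIL optimizes the context through a learned hindsight embedding function; the identity embedding used below is an assumption that keeps the sketch self-contained.

```python
import numpy as np

def optimize_context(expert_embed, z0, lr=0.25, steps=100):
    """Gradient descent on ||z - e||^2: bias a context vector z toward the
    expert's hindsight embedding e (identity embedding assumed, so the
    gradient is simply 2 * (z - e))."""
    z = np.array(z0, dtype=float)
    e = np.asarray(expert_embed, dtype=float)
    for _ in range(steps):
        z -= lr * 2.0 * (z - e)   # step down the squared-distance gradient
    return z
```

Conditioning the contextual policy on the optimized z is what biases it toward mimicking expert behavior, which is the expert-matching objective the abstract describes.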
Comparison of Different Transfer Learning Methods for Classification of Mangrove Communities Using MCCUNet and UAV Multispectral Images
Mangrove-forest classification using deep-learning algorithms has attracted increasing attention but remains challenging. In particular, the transfer classification of mangrove communities between different regions and different sensors is still poorly studied. To fill this research gap, this study developed a new deep-learning algorithm (encoder–decoder with mixed depth-wise convolution and cascade upsampling, MCCUNet) by modifying the encoder and decoder sections of the DeepLabV3+ algorithm, and presented three transfer-learning strategies, namely frozen transfer learning (F-TL), fine-tuned transfer learning (Ft-TL), and sensor-and-phase transfer learning (SaP-TL), to classify mangrove communities using the MCCUNet algorithm and high-resolution UAV multispectral images. This study combined the deep-learning algorithms with recursive feature elimination and principal component analysis (RFE–PCA), using a high-dimensional dataset to map and classify mangrove communities, and evaluated their classification performance. The results showed the following: (1) The MCCUNet algorithm outperformed the original DeepLabV3+ algorithm for classifying mangrove communities, achieving the highest overall classification accuracy (OA), i.e., 97.24%, in all scenarios. (2) RFE–PCA dimension reduction improved the classification performance of the deep-learning algorithms. The OA of mangrove species using the MCCUNet algorithm improved by 7.27% after adding dimension-reduced texture features and vegetation indices. (3) The Ft-TL strategy enabled the algorithm to achieve better classification accuracy and stability than the F-TL strategy. The highest improvement in the F1-score of Spartina alterniflora was 19.56%, using the MCCUNet algorithm with the Ft-TL strategy. (4) The SaP-TL strategy produced better transfer-learning classifications of mangrove communities between images of different phases and sensors.
The highest improvement in the F1-score of Aegiceras corniculatum was 19.85%, using the MCCUNet algorithm with the SaP-TL strategy. (5) All three transfer-learning strategies achieved high accuracy in classifying mangrove communities, with mean F1-scores of 84.37–95.25%.
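The practical difference between the F-TL and Ft-TL strategies amounts to which parameter groups stay trainable after loading pretrained weights. A framework-agnostic sketch, with hypothetical layer names rather than MCCUNet's actual ones:

```python
def set_transfer_strategy(param_names, strategy):
    """Return a {parameter: trainable} map. F-TL freezes the pretrained
    encoder and trains only the decoder/head; Ft-TL fine-tunes everything."""
    if strategy not in ("F-TL", "Ft-TL"):
        raise ValueError("unknown strategy: " + strategy)
    return {
        name: not (strategy == "F-TL" and name.startswith("encoder"))
        for name in param_names
    }
```

In a real framework the same decision is expressed by toggling each parameter's gradient flag; the dictionary here just makes the split between frozen and fine-tuned groups explicit.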
Collaborative multiple change detection methods for monitoring the spatio-temporal dynamics of mangroves in Beibu Gulf, China
Mangrove ecosystems are among the most diverse and productive marine ecosystems in the world, although losses of global mangrove area have been occurring over the past decades. Tracking spatio-temporal changes and assessing the current state are therefore essential for mangrove conservation. To address the inaccuracy of single change-detection algorithms and their limitation to historical change detection, this study proposes the detect–monitor–predict (DMP) framework of mangroves for detecting time-series historical changes, monitoring abrupt near-real-time events, and predicting future trends in Beibu Gulf, China, through the synergetic use of multiple change-detection algorithms. This study further developed a method for extracting mangroves using multi-source inter-annual time-series spectral-index images, and evaluated the performance of twenty-one spectral indices for capturing mangrove expansion events. Finally, this study reveals the spatio-temporal dynamics of mangroves in Beibu Gulf from 1986 to 2021. We found that our method could extract mangrove growth regions from 1986 to 2021 with an overall accuracy of 0.887, which proved that the method can rapidly extract large-scale mangroves without field-based samples. We confirmed that the normalized difference vegetation index (NDVI) and tasseled cap angle outperform other spectral indices in capturing mangrove expansion changes, while the enhanced vegetation index and soil-adjusted vegetation index capture the change events with a time delay. This study revealed that mangrove changes displayed a hierarchical gradient from land to sea, with an average annual expansion of 239.822 ha in the Beibu Gulf during 1986–2021; it also detected slight improvements and deteriorations of some contemporary mangroves, and predicted that 72.778% of mangroves will have good growth conditions in the future.
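NDVI, one of the two indices the study found most sensitive to mangrove expansion, is a simple normalized band ratio computed per pixel from near-infrared and red reflectance; a minimal implementation:

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red).
    eps guards against division by zero over dark pixels; works on scalars
    or whole image bands via NumPy broadcasting."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)
```

Healthy vegetation reflects strongly in the near-infrared and absorbs red light, so dense mangrove canopy yields NDVI well above zero while open water and bare mudflats trend toward zero or negative values.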