2,323 research outputs found
DIDI: Diffusion-Guided Diversity for Offline Behavioral Generation
In this paper, we propose a novel approach called DIffusion-guided DIversity
(DIDI) for offline behavioral generation. The goal of DIDI is to learn a
diverse set of skills from a mixture of label-free offline data. We achieve
this by leveraging diffusion probabilistic models as priors to guide the
learning process and regularize the policy. By optimizing a joint objective
that incorporates diversity and diffusion-guided regularization, we encourage
the emergence of diverse behaviors while maintaining the similarity to the
offline data. Experimental results in four decision-making domains (Push,
Kitchen, Humanoid, and D4RL tasks) show that DIDI is effective in discovering
diverse and discriminative skills. We also introduce skill stitching and skill
interpolation, which highlight the generalist nature of the learned skill
space. Further, by incorporating an extrinsic reward function, DIDI enables
reward-guided behavior generation, facilitating the learning of diverse and
optimal behaviors from sub-optimal data. Comment: ICML202
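The abstract above describes a joint objective that trades off skill diversity against staying close to the offline data via a diffusion prior. As a rough illustration only, the sketch below combines a skill-discrimination (diversity) term with a diffusion-guided regularizer; every name (didi_joint_loss, the denoised-action input, the weights) is an assumption for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def didi_joint_loss(policy_actions, skill_logits, skill_labels,
                    diffusion_denoised_actions,
                    diversity_weight=1.0, guidance_weight=0.5):
    """Hypothetical joint objective in the spirit of the abstract above.

    Diversity term: a skill discriminator should identify which latent skill
    produced the data (cross-entropy on skill_logits vs. skill_labels).
    Guidance term: keep policy actions close to actions denoised by a
    diffusion prior fitted on the offline mixture.
    """
    diversity_term = F.cross_entropy(skill_logits, skill_labels)
    guidance_term = F.mse_loss(policy_actions, diffusion_denoised_actions)
    return diversity_weight * diversity_term + guidance_weight * guidance_term


if __name__ == "__main__":
    # Toy tensors standing in for real rollouts and diffusion outputs.
    batch, action_dim, n_skills = 32, 6, 8
    loss = didi_joint_loss(
        policy_actions=torch.randn(batch, action_dim, requires_grad=True),
        skill_logits=torch.randn(batch, n_skills, requires_grad=True),
        skill_labels=torch.randint(0, n_skills, (batch,)),
        diffusion_denoised_actions=torch.randn(batch, action_dim),
    )
    loss.backward()  # gradients would flow into the policy in a real training loop
    print(float(loss))
```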
An Effective Software Risk Prediction Management Analysis of Data Using Machine Learning and Data Mining Method
Risk management is essential for guaranteeing higher-quality software
development processes. Risks are factors that could negatively impact an
organization's operations or a project's progress. The appropriate
prioritisation of software project risks is a crucial factor in ascertaining
the software project's performance features and eventual success. Machine
learning and data mining methods can be used harmoniously with the same
training samples and complement each other well. To achieve precise
software risk assessment, the enhanced crow search algorithm (ECSA) is used to
modify the ANFIS settings. The ECSA extracts solutions that slightly perturb
the local optimum while remaining within it, tuning the ANFIS variables when
the ANFIS technique is utilised. An experimental validation with the NASA 93 dataset and 93
software project values was performed. This method's output presents a clear
image of the software risk elements that are essential to achieving project
performance. The results of our experiments show that, when compared to other
current methods, our integrative fuzzy techniques may perform more accurately
and effectively in the evaluation of software project risks.
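To make the optimisation step concrete, below is a minimal sketch of a plain crow search loop of the kind ECSA extends, here imagined as tuning a vector of ANFIS-style parameters against a risk-prediction error; the function names, bounds and toy objective are assumptions, and the "enhanced" modifications of ECSA are not reproduced.

```python
import numpy as np

def crow_search(objective, dim, n_crows=20, n_iters=200,
                awareness_prob=0.1, flight_length=2.0,
                bounds=(-1.0, 1.0), seed=0):
    """Plain crow search optimiser (not the paper's 'enhanced' variant).

    Each crow keeps a memory of the best parameter vector it has found;
    crows either follow another crow's memory or relocate randomly.
    """
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, size=(n_crows, dim))        # current positions
    mem = pos.copy()                                      # best-known positions
    mem_fit = np.array([objective(p) for p in mem])       # their fitness

    for _ in range(n_iters):
        followed = rng.integers(0, n_crows, size=n_crows) # crow each one follows
        aware = rng.random(n_crows) < awareness_prob      # followed crow notices
        step = flight_length * rng.random((n_crows, 1))
        new_pos = pos + step * (mem[followed] - pos)      # move toward its memory
        new_pos[aware] = rng.uniform(lo, hi, size=(aware.sum(), dim))
        new_pos = np.clip(new_pos, lo, hi)

        new_fit = np.array([objective(p) for p in new_pos])
        better = new_fit < mem_fit
        mem[better], mem_fit[better] = new_pos[better], new_fit[better]
        pos = new_pos

    best = np.argmin(mem_fit)
    return mem[best], mem_fit[best]


if __name__ == "__main__":
    # Toy usage: pretend the parameter vector holds ANFIS membership-function
    # centres/widths and the objective is a risk-prediction error on held-out data.
    params, err = crow_search(lambda p: float(np.sum((p - 0.3) ** 2)), dim=5)
    print(params.round(3), err)
```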
Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning
It is significant for an agent to learn a widely applicable, general-purpose
policy that can achieve diverse goals, including goals specified as images and
text descriptions. Considering such perceptually-specific goals, the frontier of
deep reinforcement learning research is to learn a goal-conditioned policy
without hand-crafted rewards. To learn this kind of policy, recent works
usually take as the reward the non-parametric distance to a given goal in an
explicit embedding space. From a different viewpoint, we propose a novel
unsupervised learning approach named goal-conditioned policy with intrinsic
motivation (GPIM), which jointly learns both an abstract-level policy and a
goal-conditioned policy. The abstract-level policy is conditioned on a latent
variable to optimize a discriminator and discovers diverse states that are
further rendered into perceptually-specific goals for the goal-conditioned
policy. The learned discriminator serves as an intrinsic reward function for
the goal-conditioned policy to imitate the trajectory induced by the
abstract-level policy. Experiments on various robotic tasks demonstrate the
effectiveness and efficiency of our proposed GPIM method which substantially
outperforms prior techniques. Comment: Accepted by AAAI-2
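The intrinsic-reward mechanism described above resembles DIAYN-style skill discrimination, so the following hedged sketch shows a discriminator trained to recover the latent variable from visited states and then reused as a log q(z | s') reward for the goal-conditioned policy; class and function names are illustrative assumptions rather than the GPIM codebase.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkillDiscriminator(nn.Module):
    """Predicts which latent variable z produced a visited state s."""
    def __init__(self, state_dim, n_latents, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_latents),
        )

    def forward(self, state):
        return self.net(state)  # logits over latent codes


def discriminator_loss(disc, states, z_labels):
    # Train the discriminator to recover z from states visited by the
    # abstract-level policy (the diversity objective).
    return F.cross_entropy(disc(states), z_labels)


def intrinsic_reward(disc, next_states, z_labels):
    # Reward the goal-conditioned policy for reaching states the
    # discriminator attributes to the same latent z, i.e. log q(z | s').
    with torch.no_grad():
        logp = F.log_softmax(disc(next_states), dim=-1)
        return logp.gather(1, z_labels.unsqueeze(1)).squeeze(1)


if __name__ == "__main__":
    # Toy usage with random stand-in data.
    disc = SkillDiscriminator(state_dim=10, n_latents=8)
    s = torch.randn(32, 10)
    z = torch.randint(0, 8, (32,))
    print(discriminator_loss(disc, s, z).item(), intrinsic_reward(disc, s, z).shape)
```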
CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning
Offline reinforcement learning (RL) aims to learn an optimal policy from
pre-collected and labeled datasets, which eliminates the time-consuming data
collection in online RL. However, offline RL still bears a large burden of
specifying/handcrafting extrinsic rewards for each transition in the offline
data. As a remedy for the labor-intensive labeling, we propose to endow offline
RL tasks with a few expert data and utilize the limited expert data to drive
intrinsic rewards, thus eliminating the need for extrinsic rewards. To achieve
that, we introduce \textbf{C}alibrated \textbf{L}atent
g\textbf{U}idanc\textbf{E} (CLUE), which utilizes a conditional variational
auto-encoder to learn a latent space such that intrinsic rewards can be
directly quantified over the latent space. CLUE's key idea is to align the
intrinsic rewards with the expert intention by constraining the embeddings of
expert data toward a calibrated contextual representation. We
instantiate the expert-driven intrinsic rewards in sparse-reward offline RL
tasks, offline imitation learning (IL) tasks, and unsupervised offline RL
tasks. Empirically, we find that CLUE can effectively improve the sparse-reward
offline RL performance, outperform the state-of-the-art offline IL baselines,
and discover diverse skills from static reward-free offline data.
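As a rough sketch of the idea of scoring transitions in a learned latent space, the snippet below encodes (state, action) pairs, forms a single calibrated expert embedding as the mean over the limited expert data, and uses negative latent distance as the intrinsic reward; the mean-embedding simplification and all names are assumptions, not CLUE's actual calibration procedure.

```python
import torch
import torch.nn as nn

class ConditionalEncoder(nn.Module):
    """Encodes a (state, action) pair into a latent embedding (CVAE encoder mean)."""
    def __init__(self, state_dim, action_dim, latent_dim=16, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def expert_calibrated_embedding(encoder, expert_states, expert_actions):
    # A single "calibrated" context vector: here, simply the mean embedding
    # of the limited expert data (an illustrative simplification).
    with torch.no_grad():
        return encoder(expert_states, expert_actions).mean(dim=0)


def intrinsic_reward(encoder, states, actions, expert_embedding):
    # Reward transitions whose embeddings lie close to the expert embedding.
    with torch.no_grad():
        z = encoder(states, actions)
        return -torch.norm(z - expert_embedding, dim=-1)


if __name__ == "__main__":
    enc = ConditionalEncoder(state_dim=11, action_dim=3)
    expert_s, expert_a = torch.randn(50, 11), torch.randn(50, 3)
    c = expert_calibrated_embedding(enc, expert_s, expert_a)
    r = intrinsic_reward(enc, torch.randn(256, 11), torch.randn(256, 3), c)
    print(r.shape)  # torch.Size([256]) -- per-transition intrinsic rewards
```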
Modeling and Analyzing for the Friction Torque of a Sliding Bearing Based on Grey System Theory
Based on the grey system theory, the grey relational analysis method is proposed and used to analyze the influence of various factors on the friction torque of a sliding bearing. On the basis of the grey relational analysis, the multidimensional grey model GM(1,N,D) for the friction torque of a sliding bearing is built. Taking an Al-based alloy sliding bearing as an example, the calculation results show that, compared with other influence factors, the friction coefficient, load, temperature and rotational speed have a more significant influence on the bearing friction torque. Comparing experimental results with the values calculated by the GM(1,N,D) model based on these important influence factors, the maximum relative residual is 9.09%, the average relative residual is 7.9% and the accuracy is 92.1%. This verifies that the GM(1,N,D) model has good accuracy and is applicable for predicting the friction torque of a sliding bearing.
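For readers unfamiliar with grey relational analysis, the sketch below computes Deng-style grey relational grades that rank candidate influence factors against a reference (torque) sequence; it is a generic GRA routine under standard assumptions and does not reproduce the paper's GM(1,N,D) model.

```python
import numpy as np

def grey_relational_grades(reference, factors, rho=0.5):
    """Generic Deng-style grey relational analysis.

    reference: shape (T,) observed sequence (e.g. friction torque over tests)
    factors:   shape (K, T) candidate influence factors (load, speed, ...)
    Returns one relational grade per factor; larger = stronger influence.
    """
    # Normalise every sequence to [0, 1] so units do not dominate.
    def norm(x):
        span = x.max(axis=-1, keepdims=True) - x.min(axis=-1, keepdims=True)
        return (x - x.min(axis=-1, keepdims=True)) / np.where(span == 0, 1, span)

    ref = norm(reference[None, :])[0]
    fac = norm(factors)

    delta = np.abs(fac - ref)                 # absolute differences
    d_min, d_max = delta.min(), delta.max()
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)
    return coeff.mean(axis=1)                 # grey relational grade per factor


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    torque = rng.random(12)                               # measured torque
    factors = np.vstack([torque + 0.05 * rng.random(12),  # strongly related factor
                         rng.random(12)])                 # weakly related factor
    print(grey_relational_grades(torque, factors).round(3))
```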
Design from Policies: Conservative Test-Time Adaptation for Offline Policy Optimization
In this work, we decouple the iterative bi-level offline RL (value estimation
and policy extraction) from the offline training phase, forming a non-iterative
bi-level paradigm and avoiding the iterative error propagation over two levels.
Specifically, this non-iterative paradigm allows us to conduct inner-level
optimization (value estimation) in training, while performing outer-level
optimization (policy extraction) in testing. Naturally, such a paradigm raises
three core questions that are not fully answered by prior non-iterative offline
RL counterparts like reward-conditioned policy: (q1) What information should we
transfer from the inner-level to the outer-level? (q2) What should we pay
attention to when exploiting the transferred information for safe/confident
outer-level optimization? (q3) What are the benefits of concurrently conducting
outer-level optimization during testing? Motivated by model-based optimization
(MBO), we propose DROP (design from policies), which fully answers the above
questions. Specifically, in the inner-level, DROP decomposes offline data into
multiple subsets, and learns an MBO score model (a1). To keep safe exploitation
to the score model in the outer-level, we explicitly learn a behavior embedding
and introduce a conservative regularization (a2). During testing, we show that
DROP permits deployment adaptation, enabling an adaptive inference across
states (a3). Empirically, we evaluate DROP on various tasks, showing that DROP
achieves performance comparable to or better than prior methods. Comment: NeurIPS 202
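The non-iterative split can be pictured as the test-time search below: given a score model and behavior embeddings learned offline, the outer level optimises an embedding per state under a conservative distance penalty. The objective, penalty form and all names are assumptions meant only to illustrate the paradigm, not DROP itself.

```python
import torch

def test_time_embedding(score_model, behavior_embeddings, state,
                        steps=50, lr=0.1, penalty_weight=1.0):
    """Hypothetical outer-level (test-time) optimisation: search a behavior
    embedding z that maximises a learned score model while a conservative
    penalty keeps z close to embeddings seen in the offline data."""
    # Start from the mean offline behavior embedding and optimise per state.
    z = behavior_embeddings.mean(dim=0, keepdim=True).clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        score = score_model(state.unsqueeze(0), z)            # predicted return
        dist = torch.cdist(z, behavior_embeddings).min()      # distance to offline data
        loss = -score.mean() + penalty_weight * dist          # conservative objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach().squeeze(0)


if __name__ == "__main__":
    # Stand-in score model: a toy that just sums state and embedding features.
    score_model = lambda s, z: s.sum(dim=-1, keepdim=True) + z.sum(dim=-1, keepdim=True)
    behavior_embeddings = torch.randn(100, 8)     # embeddings learned offline
    state = torch.randn(17)
    z_star = test_time_embedding(score_model, behavior_embeddings, state)
    print(z_star.shape)  # torch.Size([8]) -- fed to the embedding-conditioned policy
```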
Improving Offline-to-Online Reinforcement Learning with Q Conditioned State Entropy Exploration
Studying how to fine-tune policies pre-trained with offline reinforcement
learning (RL) is significant for enhancing the sample efficiency of RL
algorithms. However, directly fine-tuning pre-trained policies often results in
sub-optimal performance. This is primarily due to the distribution shift
between offline pre-training and online fine-tuning stages. Specifically, the
distribution shift limits the acquisition of effective online samples,
ultimately impacting the online fine-tuning performance. In order to narrow
down the distribution shift between offline and online stages, we propose
Q-conditioned state entropy (QCSE) as an intrinsic reward. Specifically, QCSE
maximizes the state entropy of all samples individually, considering their
respective Q values. This approach encourages exploration of low-frequency
samples while penalizing high-frequency ones, implicitly achieving State
Marginal Matching (SMM); this ensures optimal performance and avoids the
asymptotic sub-optimality of constraint-based approaches. Additionally, QCSE
can seamlessly integrate into various RL algorithms, enhancing online
fine-tuning performance. To validate our claim, we conduct extensive
experiments, and observe significant improvements with QCSE (about 13% for CQL
and 8% for Cal-QL). Furthermore, we extend our experiments to other
algorithms, affirming the generality of QCSE.
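One way to picture a Q-conditioned state-entropy bonus is the particle-based estimate sketched below, which bins samples by Q value and rewards states with large k-nearest-neighbour distances inside their bin; the binning scheme, constants and names are assumptions for illustration, not the paper's estimator.

```python
import torch

def qcse_intrinsic_reward(states, q_values, n_bins=10, k=5):
    """Illustrative Q-conditioned state-entropy bonus.

    States are grouped into bins by their Q value; within each bin, the
    k-nearest-neighbour distance serves as a particle-based entropy estimate,
    so rarely visited states (large k-NN distance) receive a larger bonus.
    """
    rewards = torch.zeros(states.shape[0])
    # Assign each sample to a Q-value bin via empirical quantiles.
    edges = torch.quantile(q_values, torch.linspace(0, 1, n_bins + 1))
    bins = torch.bucketize(q_values, edges[1:-1])
    for b in bins.unique():
        idx = (bins == b).nonzero(as_tuple=True)[0]
        if idx.numel() <= k:
            continue  # not enough neighbours for an estimate
        d = torch.cdist(states[idx], states[idx])               # pairwise distances
        knn_dist = d.topk(k + 1, largest=False).values[:, -1]   # k-th neighbour (skip self)
        rewards[idx] = torch.log(knn_dist + 1.0)                # entropy-style bonus
    return rewards


if __name__ == "__main__":
    s = torch.randn(512, 17)          # stand-in states from the replay buffer
    q = torch.randn(512)              # their Q values under the current critic
    r_int = qcse_intrinsic_reward(s, q)
    print(r_int.shape, float(r_int.mean()))  # added to the extrinsic reward when fine-tuning
```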
Mass Transfer Performance of a Water-Sparged Aerocyclone Reactor and Its Application in Wastewater Treatment
- …
