112 research outputs found
Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning
We propose A-Crab (Actor-Critic Regularized by Average Bellman error), a new
practical algorithm for offline reinforcement learning (RL) in complex
environments with insufficient data coverage. Our algorithm combines the
marginalized importance sampling framework with the actor-critic paradigm,
where the critic returns evaluations of the actor (policy) that are pessimistic
relative to the offline data and have a small average (importance-weighted)
Bellman error. Compared to existing methods, our algorithm simultaneously
offers a number of advantages: (1) It achieves the optimal statistical rate of
$1/\sqrt{N}$ -- where $N$ is the size of the offline dataset -- in converging to
the best policy covered in the offline dataset, even when combined with general
function approximators. (2) It relies on a weaker average notion of policy
coverage (compared to the $\ell_\infty$ single-policy concentrability) that
exploits the structure of policy visitations. (3) It outperforms the
data-collection behavior policy over a wide range of specific hyperparameters.
We provide both theoretical analysis and experimental results to validate the
effectiveness of our proposed algorithm. Comment: 24 pages, 3 figures
Comparative analysis of partitioned stator flux reversal PM machine and magnetically geared machine operating in Stator-PM and Rotor-PM modes
In this paper, the partitioned stator flux reversal permanent magnet (PM) (PS-FRPM) machine and the conventional magnetically geared (MG) machine operating in both stator-PM (SPM) and rotor-PM (RPM) modes are comparatively analyzed in terms of electromagnetic performance to provide design guidelines for an MG machine regarding: (a) an SPM or RPM type machine and (b) a higher or lower gear ratio machine. It is found that an SPM type machine is recommended, since both PS-FRPM and MG machines operating in SPM mode have a higher phase back-EMF, and hence torque, than their RPM counterparts, as a result of a similar phase flux-linkage but a higher electric frequency, since the iron piece number is larger than the PM pole-pair number. Moreover, a smaller gear ratio machine is preferred from the perspective of a higher power factor and hence a lower inverter power rating, as the conventional MG machines with higher gear ratios suffer from larger flux-leakage, higher synchronous reactance and hence lower power factors, as well as higher iron losses, than the PS-FRPM machines. However, higher gear ratio machines feature lower cogging torques and torque ripples due to the smaller difference between the PM pole-pair number and iron piece number. Prototypes of both the PS-FRPM machine operating in SPM mode and the MG machine operating in RPM mode are built and tested to verify the FE predicted results.
Vector-Matrix-Vector Queries for Solving Linear Algebra, Statistics, and Graph Problems
We consider the general problem of learning about a matrix through vector-matrix-vector queries. These queries provide the value of u^{T}Mv over a fixed field F for a specified pair of vectors u, v ∈ F^{n}. To motivate these queries, we observe that they generalize many previously studied models, such as independent set queries, cut queries, and standard graph queries. They also specialize the recently studied matrix-vector query model. Our work is exploratory and broad, and we provide new upper and lower bounds for a wide variety of problems, spanning linear algebra, statistics, and graphs. Many of our results are nearly tight, and we use diverse techniques from linear algebra, randomized algorithms, and communication complexity.
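As a rough illustration of the query model described in this abstract, the sketch below wraps a hidden matrix in an oracle that answers only u^{T}Mv. It is not taken from the paper: the helper names (`make_vmv_oracle`, `indicator`, `read_entry`, `edges_between`) are hypothetical, and plain integer arithmetic stands in for the fixed field. It shows how standard-basis vectors recover a single entry and how indicator vectors of vertex sets count the edges between them, which is one way such queries can express graph queries.

```python
from typing import Callable, Iterable, List, Sequence

Matrix = List[List[int]]
Query = Callable[[Sequence[int], Sequence[int]], int]


def make_vmv_oracle(M: Matrix) -> Query:
    """Hypothetical oracle: hides M and answers only u^T M v."""
    def query(u: Sequence[int], v: Sequence[int]) -> int:
        return sum(
            u[i] * M[i][j] * v[j]
            for i in range(len(M))
            for j in range(len(M[0]))
        )
    return query


def indicator(n: int, members: Iterable[int]) -> List[int]:
    """0/1 vector of length n with ones at the given indices."""
    chosen = set(members)
    return [1 if i in chosen else 0 for i in range(n)]


def read_entry(query: Query, n: int, i: int, j: int) -> int:
    """One query with basis vectors e_i, e_j returns M[i][j]."""
    return query(indicator(n, [i]), indicator(n, [j]))


def edges_between(query: Query, n: int, S: Iterable[int], T: Iterable[int]) -> int:
    """For an adjacency matrix and disjoint S, T: number of S-T edges."""
    return query(indicator(n, S), indicator(n, T))


# Adjacency matrix of the path graph 0 - 1 - 2 - 3 (assumed example input).
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
q = make_vmv_oracle(path)
entry_01 = read_entry(q, 4, 0, 1)
crossing = edges_between(q, 4, [0, 1], [2, 3])
```

Here `entry_01` is the adjacency entry for the pair (0, 1), and `crossing` counts the edges across the cut between {0, 1} and {2, 3}.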
SeisCLIP: A seismology foundation model pre-trained by multi-modal data for multi-purpose seismic feature extraction
Training specific deep learning models for particular tasks is common across
various domains within seismology. However, this approach encounters two
limitations: inadequate labeled data for certain tasks and limited
generalization across regions. To address these challenges, we develop
SeisCLIP, a seismology foundation model trained through contrastive learning
from multi-modal data. It consists of a transformer encoder for extracting
crucial features from time-frequency seismic spectrum and an MLP encoder for
integrating the phase and source information of the same event. These encoders
are jointly pre-trained on a vast dataset and the spectrum encoder is
subsequently fine-tuned on smaller datasets for various downstream tasks.
Notably, SeisCLIP's performance surpasses that of baseline methods in event
classification, localization, and focal mechanism analysis tasks, employing
distinct datasets from different regions. In conclusion, SeisCLIP holds
significant potential as a foundational model in the field of seismology,
paving the way for innovative directions in foundation-model-based seismology
research. Comment: 27 pages, 9 figures, 4 tables
Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian
Offline reinforcement learning (RL), which refers to decision-making from a
previously-collected dataset of interactions, has received significant
attention over the past years. Much effort has focused on improving offline RL
practicality by addressing the prevalent issue of partial data coverage through
various forms of conservative policy learning. While the majority of algorithms
do not have finite-sample guarantees, several provable conservative offline RL
algorithms are designed and analyzed within the single-policy concentrability
framework that handles partial coverage. Yet, in the nonlinear function
approximation setting where confidence intervals are difficult to obtain,
existing provable algorithms suffer from computational intractability,
prohibitively strong assumptions, and suboptimal statistical rates. In this
paper, we leverage the marginalized importance sampling (MIS) formulation of RL
and present the first set of offline RL algorithms that are statistically
optimal and practical under general function approximation and single-policy
concentrability, bypassing the need for uncertainty quantification. We identify
that the key to successfully solving the sample-based approximation of the MIS
problem is ensuring that certain occupancy validity constraints are nearly
satisfied. We enforce these constraints by a novel application of the augmented
Lagrangian method and prove the following result: with the MIS formulation,
augmented Lagrangian is enough for statistically optimal offline RL. In stark
contrast to prior algorithms that induce additional conservatism through
methods such as behavior regularization, our approach provably eliminates this
need and reinterprets regularizers as "enforcers of occupancy validity" rather
than "promoters of conservatism." Comment: 49 pages, 1 figure
Using Vesicular Dispersion for Stabilizing Suspensions of Dense Colloidal Particles against Sedimentation
Colloidal dispersions, like inks and paints, are often required to remain stable for long times, i.e., the dispersed colloidal particles should remain suspended. In most cases, a stable dispersion requires preventing the agglomeration of the suspended colloidal particles. If the particles agglomerate, their sizes will increase and rapid sedimentation will occur. However, many colloidal particles of commercial interest have high densities, so they settle quickly even without agglomeration. One novel approach to preventing the settling of high density particles is the use of close-packed vesicular dispersions (CPVDs) made of the surfactant DDAB (didodecyldimethylammonium bromide). Previous work demonstrated the ability of these CPVDs to prevent the settling of high density titania particles. However, only a limited range of particle sizes was investigated and found to remain stable with CPVDs. Also, the effects of the method of preparation of the CPVDs were not fully explored, even though an effective CPVD should be generated from the smallest possible amount of added DDAB. Thus, the impact of various preparation methods on the resulting properties of the DDAB vesicular dispersions is examined. DDAB vesicular dispersions are generated via stirring only to form primarily liposomes, sonication to break down large multi-layer vesicles, and extrusion through membranes to obtain specifically sized vesicles. Various light scattering and absorbance techniques are also used to probe the structure of the vesicular dispersions, which is important information for improving the ability of CPVDs to stabilize a broader range of colloidal particle sizes against sedimentation.
End-to-end Story Plot Generator
Story plots, while short, carry most of the essential information of a full
story that may contain tens of thousands of words. We study the problem of
automatic generation of story plots, which includes story premise, character
descriptions, plot outlines, etc. To generate a single engaging plot, existing
plot generators (e.g., DOC (Yang et al., 2022a)) require hundreds to thousands
of calls to LLMs (e.g., OpenAI API) in the planning stage of the story plot,
which is costly and takes at least several minutes. Moreover, the hard-wired
nature of the method makes the pipeline non-differentiable, blocking fast
specialization and personalization of the plot generator. In this paper, we
propose three models, OpenPlot, E2EPlot and RLPlot, to address these
challenges. OpenPlot replaces
expensive OpenAI API calls with LLaMA2 (Touvron et al., 2023) calls via careful
prompt designs, which leads to inexpensive generation of high-quality training
datasets of story plots. We then train an end-to-end story plot generator,
E2EPlot, by supervised fine-tuning (SFT) using approximately 13000
story plots generated by OpenPlot. E2EPlot generates
story plots of comparable quality to OpenPlot, and is > 10×
faster (1k tokens in only 30 seconds on average). Finally, we obtain
RLPlot that is further fine-tuned with RLHF on several different
reward models for different aspects of story quality, which yields 60.0%
winning rate against E2EPlot along the aspect of suspense and
surprise. Comment: 17 pages