112 research outputs found

    Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning

    We propose A-Crab (Actor-Critic Regularized by Average Bellman error), a new practical algorithm for offline reinforcement learning (RL) in complex environments with insufficient data coverage. Our algorithm combines the marginalized importance sampling framework with the actor-critic paradigm, where the critic returns evaluations of the actor (policy) that are pessimistic relative to the offline data and have a small average (importance-weighted) Bellman error. Compared to existing methods, our algorithm simultaneously offers a number of advantages: (1) It achieves the optimal statistical rate of $1/\sqrt{N}$, where $N$ is the size of the offline dataset, in converging to the best policy covered in the offline dataset, even when combined with general function approximators. (2) It relies on a weaker average notion of policy coverage (compared to the $\ell_\infty$ single-policy concentrability) that exploits the structure of policy visitations. (3) It outperforms the data-collection behavior policy over a wide range of specific hyperparameters. We provide both theoretical analysis and experimental results to validate the effectiveness of our proposed algorithm. Comment: 24 pages, 3 figures
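To illustrate what an "average (importance-weighted) Bellman error" can look like, here is a minimal sketch, not the authors' implementation. The function name, the array layout, and the toy numbers are all assumptions made for illustration; `f_next` is assumed to hold the critic's value at the next state under the actor's action. The point is only that signed residuals are averaged before taking the absolute value, so they can cancel, unlike in a squared Bellman error.

```python
import numpy as np

def avg_weighted_bellman_error(w, f_sa, r, f_next, gamma):
    """Absolute value of the importance-weighted mean Bellman residual.

    w      : importance weights for each sampled (s, a)   (hypothetical input)
    f_sa   : critic values f(s, a)
    r      : observed rewards
    f_next : critic values at the next state under the actor's action
    gamma  : discount factor
    """
    residual = f_sa - r - gamma * f_next
    return abs(np.mean(w * residual))

# Toy data (made up): two transitions whose residuals are +0.5 and -0.5.
f_sa = np.array([1.0, 1.0])
r = np.array([0.5, 1.5])
f_next = np.array([0.0, 0.0])

uniform = avg_weighted_bellman_error(np.array([1.0, 1.0]), f_sa, r, f_next, 0.9)
skewed = avg_weighted_bellman_error(np.array([2.0, 0.0]), f_sa, r, f_next, 0.9)
squared = np.mean((f_sa - r - 0.9 * f_next) ** 2)
```

With uniform weights the residuals cancel and the average error is zero, while the mean squared residual is 0.25; reweighting toward the first transition exposes a nonzero average error.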

    Comparative analysis of partitioned stator flux reversal PM machine and magnetically geared machine operating in Stator-PM and Rotor-PM modes

    In this paper, the partitioned stator flux reversal permanent magnet (PM) (PS-FRPM) machine and the conventional magnetically geared (MG) machine operating in both stator-PM (SPM) and rotor-PM (RPM) modes are comparatively analyzed in terms of electromagnetic performance to provide design guides for an MG machine regarding: (a) an SPM or RPM type machine and (b) a higher or lower gear ratio machine. It is found that an SPM type machine is recommended, since both PS-FRPM and MG machines operating in SPM mode have a higher phase back-EMF, and hence torque, than their RPM counterparts, as a result of a similar phase flux-linkage but a higher electric frequency, since the iron piece number is larger than the PM pole-pair number. Moreover, a smaller gear ratio machine is preferred from the perspective of a higher power factor and hence a lower inverter power rating, as the conventional MG machines with higher gear ratios suffer from larger flux-leakage, higher synchronous reactance and hence lower power factors, as well as higher iron losses, than the PS-FRPM machines. However, higher gear ratio machines feature lower cogging torques and torque ripples due to the smaller difference between the PM pole-pair number and iron piece number. Prototypes of both the PS-FRPM machine operating in SPM mode and the MG machine operating in RPM mode are built and tested to verify the finite-element (FE) predicted results.

    Vector-Matrix-Vector Queries for Solving Linear Algebra, Statistics, and Graph Problems

    We consider the general problem of learning about a matrix through vector-matrix-vector queries. These queries provide the value of $u^{T}Mv$ over a fixed field $\mathbb{F}$ for a specified pair of vectors $u, v \in \mathbb{F}^n$. To motivate these queries, we observe that they generalize many previously studied models, such as independent set queries, cut queries, and standard graph queries. They also specialize the recently studied matrix-vector query model. Our work is exploratory and broad, and we provide new upper and lower bounds for a wide variety of problems, spanning linear algebra, statistics, and graphs. Many of our results are nearly tight, and we use diverse techniques from linear algebra, randomized algorithms, and communication complexity.
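As a small illustration of how one query of the form u^T M v can play the role of the graph queries mentioned above, here is a sketch over the integers using NumPy. The helper names `vmv_query` and `unit`, and the example path graph, are assumptions for illustration and do not come from the paper.

```python
import numpy as np

def vmv_query(M, u, v):
    """Return u^T M v, the only access to M that the query model allows."""
    return u @ M @ v

def unit(n, i):
    """Standard basis vector e_i of length n."""
    e = np.zeros(n, dtype=int)
    e[i] = 1
    return e

# Adjacency matrix of the path graph 0 - 1 - 2 - 3 (example data).
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
n = A.shape[0]

# Standard graph (edge) query: e_1^T A e_2 reads the single entry A[1, 2].
edge_12 = vmv_query(A, unit(n, 1), unit(n, 2))

# Cut query: with u the indicator of S = {0, 1} and v the indicator of
# its complement, u^T A v counts the edges crossing the cut.
s = np.array([1, 1, 0, 0])
cut_size = vmv_query(A, s, 1 - s)

# Independent set query: for T = {0, 2}, the value t^T A t is twice the
# number of edges inside T, so it is zero exactly when T is independent.
t = np.array([1, 0, 1, 0])
inside_t = vmv_query(A, t, t)
```

Here `edge_12` and `cut_size` both evaluate to 1 (the edge 1-2 exists and is the only edge crossing the cut), and `inside_t` is 0 because vertices 0 and 2 are not adjacent.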

    SeisCLIP: A seismology foundation model pre-trained by multi-modal data for multi-purpose seismic feature extraction

    Training specific deep learning models for particular tasks is common across various domains within seismology. However, this approach encounters two limitations: inadequate labeled data for certain tasks and limited generalization across regions. To address these challenges, we develop SeisCLIP, a seismology foundation model trained through contrastive learning from multi-modal data. It consists of a transformer encoder for extracting crucial features from time-frequency seismic spectrum and an MLP encoder for integrating the phase and source information of the same event. These encoders are jointly pre-trained on a vast dataset and the spectrum encoder is subsequently fine-tuned on smaller datasets for various downstream tasks. Notably, SeisCLIP's performance surpasses that of baseline methods in event classification, localization, and focal mechanism analysis tasks, employing distinct datasets from different regions. In conclusion, SeisCLIP holds significant potential as a foundational model in the field of seismology, paving the way for innovative directions in foundation-model-based seismology research. Comment: 27 pages, 9 figures, 4 tables

    Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian

    Offline reinforcement learning (RL), which refers to decision-making from a previously-collected dataset of interactions, has received significant attention over the past years. Much effort has focused on improving offline RL practicality by addressing the prevalent issue of partial data coverage through various forms of conservative policy learning. While the majority of algorithms do not have finite-sample guarantees, several provable conservative offline RL algorithms are designed and analyzed within the single-policy concentrability framework that handles partial coverage. Yet, in the nonlinear function approximation setting where confidence intervals are difficult to obtain, existing provable algorithms suffer from computational intractability, prohibitively strong assumptions, and suboptimal statistical rates. In this paper, we leverage the marginalized importance sampling (MIS) formulation of RL and present the first set of offline RL algorithms that are statistically optimal and practical under general function approximation and single-policy concentrability, bypassing the need for uncertainty quantification. We identify that the key to successfully solving the sample-based approximation of the MIS problem is ensuring that certain occupancy validity constraints are nearly satisfied. We enforce these constraints by a novel application of the augmented Lagrangian method and prove the following result: with the MIS formulation, augmented Lagrangian is enough for statistically optimal offline RL. In stark contrast to prior algorithms that induce additional conservatism through methods such as behavior regularization, our approach provably eliminates this need and reinterprets regularizers as "enforcers of occupancy validity" rather than "promoters of conservatism." Comment: 49 pages, 1 figure

    Using Vesicular Dispersion for Stabilizing Suspensions of Dense Colloidal Particles against Sedimentation

    Colloidal dispersions, like inks and paints, are often required to remain stable for long times, i.e., the dispersed colloidal particles should remain suspended. In most cases, a stable dispersion requires preventing the agglomeration of the suspended colloidal particles. If the particles agglomerate, their sizes will increase and rapid sedimentation will occur. Nevertheless, many colloidal particles of commercial interest have high densities. Thus, they quickly settle even without agglomeration. One novel approach to preventing the settling of high density particles is the use of close-packed vesicular dispersions (CPVDs) made of the surfactant DDAB (didodecyldimethylammonium bromide). Previous work demonstrated the ability of these CPVDs to prevent the settling of high density titania particles. However, only a limited range of particle sizes was investigated and found to remain stable with CPVDs. Also, the effects of the method of preparation of the CPVDs were not fully explored, as an effective CPVD should be generated from the smallest possible amount of added DDAB. Thus, the impact of various preparation methods on the resulting properties of the DDAB vesicular dispersions is examined. DDAB vesicular dispersions are generated via stirring only to form primarily liposomes, sonication to break down large multi-layer vesicles, and extrusion through membranes to obtain specifically sized vesicles. Various light scattering and absorbance techniques are also used to probe the structure of the vesicular dispersions, providing information needed to improve the ability of CPVDs to stabilize a broader range of colloidal particle sizes against sedimentation.

    End-to-end Story Plot Generator

    Story plots, while short, carry most of the essential information of a full story that may contain tens of thousands of words. We study the problem of automatic generation of story plots, which includes story premise, character descriptions, plot outlines, etc. To generate a single engaging plot, existing plot generators (e.g., DOC (Yang et al., 2022a)) require hundreds to thousands of calls to LLMs (e.g., OpenAI API) in the planning stage of the story plot, which is costly and takes at least several minutes. Moreover, the hard-wired nature of the method makes the pipeline non-differentiable, blocking fast specialization and personalization of the plot generator. In this paper, we propose three models, OpenPlot, E2EPlot and RLPlot, to address these challenges. OpenPlot replaces expensive OpenAI API calls with LLaMA2 (Touvron et al., 2023) calls via careful prompt designs, which leads to inexpensive generation of high-quality training datasets of story plots. We then train an end-to-end story plot generator, E2EPlot, by supervised fine-tuning (SFT) using approximately 13000 story plots generated by OpenPlot. E2EPlot generates story plots of comparable quality to OpenPlot, and is more than 10× faster (1k tokens in only 30 seconds on average). Finally, we obtain RLPlot, which is further fine-tuned with RLHF on several different reward models for different aspects of story quality, and which yields a 60.0% winning rate against E2EPlot along the aspect of suspense and surprise. Comment: 17 pages