
    Preference-grounded Token-level Guidance for Language Model Fine-tuning

    Aligning language models (LMs) with preferences is an important problem in natural language generation. A key challenge is that preferences are typically provided at the sequence level, while LM training and generation both occur at the token level. There is therefore a granularity mismatch between the preference and the LM training losses, which may complicate the learning problem. In this paper, we address this issue by developing an alternating training process, in which we iterate between grounding the sequence-level preference into token-level training guidance and improving the LM with the learned guidance. For guidance learning, we design a framework that extends pairwise-preference learning in imitation learning to both variable-length LM generation and the use of preferences among multiple generations. For LM training, depending on the amount of supervised data available, we present two minimalist learning objectives that utilize the learned guidance. In experiments, our method performs competitively on two distinct, representative LM tasks: discrete-prompt generation and text summarization.
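    The abstract describes grounding sequence-level preferences in token-level signals and extending pairwise preference learning to several ranked generations. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes each generation already has per-token rewards (plain floats standing in for a learned guidance model), aggregates them by averaging so that sequences of different lengths stay comparable, and scores a ranking with a Plackett-Luce likelihood, which reduces to the pairwise Bradley-Terry loss for two generations. The names `sequence_score` and `plackett_luce_nll` are hypothetical.

```python
import math
from typing import List


def sequence_score(token_rewards: List[float]) -> float:
    """Aggregate per-token rewards into a single sequence-level score.

    Averaging (instead of summing) keeps scores comparable across
    generations of different lengths; this aggregation is an assumption.
    """
    if not token_rewards:
        raise ValueError("a generation needs at least one token reward")
    return sum(token_rewards) / len(token_rewards)


def plackett_luce_nll(ranked_generations: List[List[float]]) -> float:
    """Negative log-likelihood of a ranking (best generation first).

    Each generation is given as its list of per-token rewards. With two
    generations this reduces to the pairwise Bradley-Terry loss
    -log(sigmoid(s_better - s_worse)).
    """
    scores = [sequence_score(g) for g in ranked_generations]
    nll = 0.0
    # The last position contributes log(1) = 0, so it is skipped.
    for i in range(len(scores) - 1):
        remaining = scores[i:]
        # Log-sum-exp over the generations not yet ranked, shifted for stability.
        m = max(remaining)
        log_norm = m + math.log(sum(math.exp(s - m) for s in remaining))
        nll -= scores[i] - log_norm
    return nll


if __name__ == "__main__":
    better = [0.9, 0.7, 0.8]  # hypothetical token rewards for a 3-token generation
    worse = [0.1, 0.3]        # hypothetical token rewards for a 2-token generation
    print(plackett_luce_nll([better, worse]))  # smaller: ranking agrees with scores
    print(plackett_luce_nll([worse, better]))  # larger: ranking disagrees
```

    Minimizing this loss with respect to the parameters that produce the token rewards pushes preferred generations toward higher average token reward. The choice of mean aggregation is one simple option for variable-length sequences, and the paper may use a different one.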

    A Truncated IL‐17RC Peptide Ameliorates Synovitis and Bone Destruction of Arthritic Mice

    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/134880/1/adhm201600668_am.pdf
    http://deepblue.lib.umich.edu/bitstream/2027.42/134880/2/adhm201600668-sup-0001-S1.pdf
    http://deepblue.lib.umich.edu/bitstream/2027.42/134880/3/adhm201600668.pd

    A Model for Demand Planning in Supply Chains with Congestion Effects

    This paper is concerned with demand planning for internal supply chains consisting of workstations, production facilities, warehouses, and transportation links. We address the issue of how to help a supplier firmly accept orders and subsequently plan to fulfill demand. We first formulate a linear aggregate planning model for demand management that incorporates elements of order promising, recipe run constraints, and capacity limitations. Using several scenarios, we discuss how the model can be used in demand planning and capacity planning to help a supplier respond firmly to requests for quotations. We then extend the model to incorporate congestion effects at assembly and blending nodes using clearing functions; the resulting model is nonlinear. We develop and test two algorithms to solve the nonlinear model: one based on inner approximation and the other on outer approximation.
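    A clearing function links the expected throughput of a node to its work-in-process (WIP), and its concavity is what makes the extended model nonlinear. The sketch below illustrates one common reading of "inner" and "outer" approximation for a concave clearing function; it is not the paper's formulation. It assumes the saturating form throughput = capacity * WIP / (WIP + k), with hypothetical parameters `capacity` and `k`. Tangent lines at chosen breakpoints give a piecewise-linear outer approximation that over-estimates throughput, and chords between breakpoints give an inner approximation that under-estimates it; either can then replace the nonlinear constraint in a linear program.

```python
from typing import Sequence


def clearing_function(wip: float, capacity: float, k: float) -> float:
    """Assumed saturating clearing function: expected throughput for a WIP level.

    Throughput rises with work-in-process (WIP) but never exceeds capacity.
    """
    return capacity * wip / (wip + k)


def clearing_slope(wip: float, capacity: float, k: float) -> float:
    """Derivative of the clearing function with respect to WIP."""
    return capacity * k / (wip + k) ** 2


def outer_approx(wip: float, capacity: float, k: float,
                 breakpoints: Sequence[float]) -> float:
    """Minimum of the tangent lines taken at the breakpoints.

    Because the clearing function is concave, every tangent lies on or above
    it, so this piecewise-linear function over-estimates throughput.
    """
    return min(
        clearing_function(b, capacity, k) + clearing_slope(b, capacity, k) * (wip - b)
        for b in breakpoints
    )


def inner_approx(wip: float, capacity: float, k: float,
                 breakpoints: Sequence[float]) -> float:
    """Linear interpolation between breakpoints (chords of the curve).

    For a concave function the chords lie on or below the curve, so this
    under-estimates throughput within the breakpoint range.
    """
    pts = sorted(set(breakpoints))
    if len(pts) < 2 or not pts[0] <= wip <= pts[-1]:
        raise ValueError("need two or more breakpoints that bracket wip")
    for lo, hi in zip(pts, pts[1:]):
        if lo <= wip <= hi:
            f_lo = clearing_function(lo, capacity, k)
            f_hi = clearing_function(hi, capacity, k)
            t = (wip - lo) / (hi - lo)
            return f_lo + t * (f_hi - f_lo)
    raise AssertionError("unreachable: wip is bracketed by the breakpoints")
```

    For example, with `capacity = 100`, `k = 10`, and breakpoints at 0, 10, 20, and 40, the exact throughput at a WIP of 15 is 60, the inner approximation gives about 58.33, and the outer approximation gives about 61.11. Adding breakpoints tightens both bounds.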

    Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning

    Offline reinforcement learning (RL) extends the paradigm of classical RL algorithms to learning purely from static datasets, without interacting with the underlying environment during the learning process. A key challenge of offline RL is the instability of policy training, caused by the mismatch between the distribution of the offline data and the undiscounted stationary state-action distribution of the learned policy. To avoid the detrimental impact of this distribution mismatch, we regularize the undiscounted stationary distribution of the current policy towards the offline data during policy optimization. Further, we train a dynamics model both to implement this regularization and to better estimate the stationary distribution of the current policy, reducing the error induced by distribution mismatch. On a wide range of continuous-control offline RL datasets, our method shows competitive performance, which supports the validity of our algorithm. The code is publicly available. Comment: International Conference on Machine Learning (ICML) 202
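    To make "undiscounted stationary state-action distribution" concrete, the tabular sketch below computes it for a policy in a known (or learned) transition model and measures its mismatch with an offline data distribution. This is an illustration under assumptions, not the paper's continuous-control method: it assumes finite states and actions, a transition tensor `P` standing in for the learned dynamics model, a chain with a unique stationary distribution, and KL divergence as a stand-in regularizer, since the abstract does not name the divergence. The function names are hypothetical.

```python
import numpy as np


def stationary_state_action(P: np.ndarray, pi: np.ndarray) -> np.ndarray:
    """Undiscounted stationary state-action distribution of policy pi.

    P has shape (S, A, S): P[s, a, s'] = Pr(s' | s, a), e.g. from a learned
    dynamics model. pi has shape (S, A): pi[s, a] = Pr(a | s).
    Assumes the induced state chain has a unique stationary distribution.
    """
    num_states = P.shape[0]
    # State-to-state transition matrix under the policy.
    P_pi = np.einsum("sa,sat->st", pi, P)
    # Solve d P_pi = d together with sum(d) = 1 as a least-squares system.
    A = np.vstack([P_pi.T - np.eye(num_states), np.ones((1, num_states))])
    b = np.zeros(num_states + 1)
    b[-1] = 1.0
    d_s, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Remove tiny negative values from round-off and renormalize.
    d_s = np.clip(d_s, 0.0, None)
    d_s /= d_s.sum()
    # d(s, a) = d(s) * pi(a | s).
    return d_s[:, None] * pi


def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q) over flattened distributions; infinite if q misses p's support."""
    p, q = p.ravel(), q.ravel()
    mask = p > 0
    with np.errstate(divide="ignore"):
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```

    In a training loop, a weighted `kl_divergence(stationary_state_action(P, pi), data_dist)` term could be added to the policy loss so that the policy is penalized for visiting state-action pairs that the offline data rarely covers. With function approximation the stationary distribution cannot be solved for exactly, which is where a learned dynamics model and sample-based estimates come in.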

    A Regularized Implicit Policy for Offline Reinforcement Learning

    Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment. The lack of environmental interactions makes policy training vulnerable to state-action pairs far from the training dataset and prone to missing rewarding actions. To train more effective agents, we propose a framework that supports learning a flexible yet well-regularized fully-implicit policy. We further propose a simple modification to the classical policy-matching methods for regularizing with respect to the dual form of the Jensen--Shannon divergence and the integral probability metrics. We theoretically show the correctness of the policy-matching approach, as well as the correctness and a good finite-sample property of our modification. An effective instantiation of our framework through the GAN structure is provided, together with techniques to explicitly smooth the state-action mapping for robust generalization beyond the static dataset. Extensive experiments and ablation studies on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
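    The "dual form of the Jensen--Shannon divergence" is the variational identity behind the GAN objective: for a discriminator D, max over D of E_p[log D] + E_q[log(1 - D)] equals 2 JS(p, q) - log 4, attained at D*(x) = p(x) / (p(x) + q(x)). The sketch below checks this identity numerically on small discrete distributions; it only illustrates the dual form and does not reproduce the paper's modification of policy matching, which the abstract does not spell out. All function names are hypothetical.

```python
import math
from typing import Sequence


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions, skipping zero-probability terms of p."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    m = [(pk + qk) / 2 for pk, qk in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def gan_value(p: Sequence[float], q: Sequence[float], d: Sequence[float]) -> float:
    """GAN objective E_p[log D] + E_q[log(1 - D)] for a discriminator d in [0, 1]."""
    real = sum(pk * math.log(dk) for pk, dk in zip(p, d) if pk > 0)
    fake = sum(qk * math.log(1 - dk) for qk, dk in zip(q, d) if qk > 0)
    return real + fake


def optimal_discriminator(p: Sequence[float], q: Sequence[float]) -> list:
    """Pointwise maximizer D*(x) = p(x) / (p(x) + q(x)) of the GAN objective."""
    return [pk / (pk + qk) if pk + qk > 0 else 0.5 for pk, qk in zip(p, q)]


if __name__ == "__main__":
    p = [0.5, 0.5]  # e.g. the data's action distribution at one state
    q = [0.9, 0.1]  # e.g. the policy's action distribution at the same state
    d_star = optimal_discriminator(p, q)
    print(gan_value(p, q, d_star), 2 * js_divergence(p, q) - math.log(4))
```

    In a GAN-based instantiation, the discriminator is trained to approach D*, so its objective value tracks the JS divergence between the data and policy distributions, and the policy (generator) is trained to lower it.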

    Behaviors of Silicon, Aluminum and Iron and Kinetics of Silicon from the Roasted Clinker of Silver Tailings in Water–Acid Leaching Process

    To achieve efficient resource utilization of metal tailings, this study took the roasted clinker of silver tailings (RCST) as its object and investigated the dissolution behaviors of Si, Al and Fe in a water–acid two-stage leaching process, as well as the water leaching kinetics of Si. Single-factor experiments were performed to investigate the effects of the leaching parameters; XRF, XRD and SEM-EDS were used to characterize the leaching residues at different leaching times, and kinetic models for Si leaching were established. The results demonstrate that, in the water leaching stage, the sodium silicate and a small part of the structurally unstable sodium aluminosilicate in RCST are dissolved, while the nepheline, most of the sodium aluminosilicate and the iron-containing mixed materials enter the water leaching residue. The first 5 min of water leaching is controlled by both interfacial transfer and diffusion across the product layer, with an apparent activation energy of 22.36 kJ/mol, and the dissolution reaction during 5–15 min is controlled by unsteady diffusion through the liquid film, with an apparent activation energy of 14.22 kJ/mol. The structure of the materials in the clinker is completely destroyed, and numerous fissures and pores are produced by the continued dissolving action of the water. Thus, in the acid leaching stage, the amorphous Si-, Al- and Fe-containing substances in the water leaching residue are rapidly dissolved in sulfuric acid solution at a lower temperature.
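    Apparent activation energies such as 22.36 kJ/mol and 14.22 kJ/mol are conventionally obtained from an Arrhenius plot, ln k = ln A - Ea / (RT), where the slope of ln k against 1/T equals -Ea / R. The abstract does not state the temperatures or rate constants used, so the sketch below assumes this standard procedure and checks it only on synthetic rate constants generated from a chosen Ea; it also shows how much a temperature rise speeds up each stage. The function names are hypothetical.

```python
import math
from typing import Sequence

R = 8.314462618  # molar gas constant, J/(mol*K)


def fit_activation_energy(temps_k: Sequence[float],
                          rate_constants: Sequence[float]) -> float:
    """Apparent activation energy (J/mol) from an Arrhenius plot.

    Fits ln k = ln A - Ea / (R T) by least squares on (1/T, ln k);
    the fitted slope equals -Ea / R.
    """
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(k) for k in rate_constants]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return -(sxy / sxx) * R


def rate_ratio(ea_j_per_mol: float, t1_k: float, t2_k: float) -> float:
    """k(T2) / k(T1) implied by the Arrhenius equation for a given Ea."""
    return math.exp(-ea_j_per_mol / R * (1.0 / t2_k - 1.0 / t1_k))
```

    Under the Arrhenius assumption, raising the temperature from 298.15 K to 308.15 K multiplies the rate constant by roughly 1.34 for Ea = 22.36 kJ/mol but only by roughly 1.20 for Ea = 14.22 kJ/mol, consistent with the later, diffusion-controlled stage being less sensitive to temperature.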

    Designation of a Novel DKK1 Multiepitope DNA Vaccine and Inhibition of Bone Loss in Collagen-Induced Arthritic Mice

    Dickkopf-1 (DKK1), a secreted inhibitor of canonical Wnt signaling, plays a critical role in certain bone loss diseases. Studies have shown that serum levels of DKK1 are significantly higher in rheumatoid arthritis (RA) patients and correlate with disease severity, which suggests that bone erosion in RA may be inhibited by neutralizing the biological activity of DKK1. In this study, we selected a panel of twelve peptides using the software DNASTAR 7.1 and screened them for high-affinity, highly immunogenic epitopes in in vitro and in vivo assays. We then optimized four B cell epitopes to design a novel DKK1 multiepitope DNA vaccine and evaluated its bone-protective effects in collagen-induced arthritis (CIA), a mouse model of RA. High-level expression of the designed vaccine was measured in the supernatant of COS7 cells. In addition, after intramuscular immunization of BALB/c mice, the vaccine was highly expressed and sufficient to induce the production of long-term IgG, which neutralized natural DKK1 in vivo. Importantly, this vaccine significantly attenuated bone erosion in CIA mice compared with positive control mice. These results provide evidence for the development of a DNA vaccine targeted against DKK1 to attenuate bone erosion.

    Characterization of a novel esterase Rv0045c from Mycobacterium tuberculosis.

    It has been proposed that at least 250 enzymes in M. tuberculosis are involved in lipid metabolism. Rv0045c was predicted to be a hydrolase by amino acid sequence similarity, although its precise biochemical characterization and function remained to be defined. We expressed the Rv0045c protein to high levels in E. coli and purified the protein to high purity. We confirmed by mass spectrometry analysis that the prepared protein was the Rv0045c protein. Circular dichroism spectroscopy analysis showed that the protein possessed abundant β-sheet secondary structure and confirmed that its conformation was stable in the range pH 6.0-10.0 and at temperatures ≤ 40 °C. Enzyme activity analysis indicated that the Rv0045c protein could efficiently hydrolyze short-chain p-nitrophenyl esters (C₂-C₈), and its preferred substrate was p-nitrophenyl caproate (C₆), with optimal catalytic conditions of 39 °C and pH 8.0. Our results demonstrate that the Rv0045c protein is a novel esterase. These experiments will be helpful in understanding ester/lipid metabolism related to M. tuberculosis.