204 research outputs found

    Policy Optimization in RLHF: The Impact of Out-of-preference Data

    Aligning intelligent agents with human preferences and values is important. This paper examines two popular alignment methods: Direct Preference Optimization (DPO) and Reward-Model-Based Policy Optimization (RMB-PO). A variant of RMB-PO, referred to as RMB-PO+, is also considered. These methods learn a reward model from preference data, either explicitly or implicitly, and differ in the data they use for policy optimization to unlock the generalization ability of the reward model. In particular, compared with DPO, RMB-PO additionally uses policy-generated data, and RMB-PO+ further leverages new, preference-free data. We examine the impact of such out-of-preference data. Our study, conducted through controlled and synthetic experiments, demonstrates that DPO performs poorly, whereas RMB-PO+ performs the best. In particular, even when the policy model is given a good feature representation, we find that policy optimization with adequate out-of-preference data significantly improves performance by harnessing the reward model's generalization capabilities.

    Comparative mitogenomic analyses of three scallops (Bivalvia: Pectinidae) reveal high level variation of genomic organization and a diversity of transfer RNA gene sets

    Background: It can be seen from the available mollusk mitogenomes that the family Pectinidae exhibits the most variation in genome organization. In this study, comparative mitogenomic analyses were performed for three scallops from the subfamily Chlamydinae (Pectinidae), with the goal of characterizing the degree of variability of mitogenome organization and other characteristics among species from the same subfamily and exploring their possible evolutionary route. Findings: The complete or nearly complete mtDNA sequences of the scallops Mimachlamys nobilis (17 935 bp), Mizuhopecten yessoensis (20 964 bp) and Chlamys farreri (17 035 bp) were determined using long PCR amplification and a primer-walking sequencing strategy. The large size differences among the three genomes resulted primarily from variation in the length and number of non-coding regions, and the major difference in gene content among the three scallop species is due to varying tRNA gene sets. Only 21, 16, and 17 tRNA genes were detected in the mitogenomes of M. nobilis, M. yessoensis and C. farreri, respectively. Remarkably, no trnS gene could be identified in any of the three scallops. A newly detected trnA-like sequence within the mitogenome of M. yessoensis seems to exemplify the functional loss of a tRNA gene, and the duplication of trnD in M. yessoensis raises a fundamental question of whether the retention of the tRNA gene copy of 2-tRNAs is easier than that of 4-tRNAs. Analysis of putative evolutionary pathways of gene rearrangement indicates that transposition of neighboring gene blocks may play an important role in the evolution of mitogenomes in scallops. Parsimonious analysis of the genomic variations implies that the mitogenomes of M. yessoensis and C. farreri are likely to derive independently from a common ancestor that was closely related to M. nobilis. Conclusion: Comparative mitogenomic analyses among three species from the subfamily Chlamydinae show that the three genomes exhibit a high level of genomic variation and a diversity of tRNA gene sets, characterized by extensive translocation of genes. These features provide useful clues and information for evolutionary analysis of scallop mitogenomes.

    Genetic analysis of selected strains of eastern oyster (Crassostrea virginica Gmelin) using AFLP and microsatellite

    Amplified fragment length polymorphisms (AFLPs) and microsatellite markers were used to examine genetic variation and divergence in 4 selected strains (DBH, NEH, FMF, and CTS) and 1 wild population (DBW) of the eastern oyster Crassostrea virginica Gmelin. Eighty-six AFLP markers (from 3 primer pairs) and 5 microsatellite loci were used for the analysis of 30 oysters from each of the 5 populations. Microsatellite loci were considerably more variable than AFLPs. The observed heterozygosity ranged from 0.560 to 0.640 across populations for microsatellites, and from 0.186 to 0.207 for AFLPs. Both F_ST and Φ_PT of microsatellite data and Φ_PT statistics of AFLP data revealed significant divergence between all pairs of populations. There was no significant reduction in heterozygosity in any of the 4 selected strains; however, the number of alleles per locus was considerably lower in the selected strains than in the wild population. Two strains subjected to long-term selection for disease resistance shared frequency shifts at a few loci, which deserve further analysis to determine whether they are linked to disease-resistance genes.

    Provably Efficient Adversarial Imitation Learning with Unknown Transitions

    Imitation learning (IL) has proven to be an effective method for learning good policies from expert demonstrations. Adversarial imitation learning (AIL), a subset of IL methods, is particularly promising, but its theoretical foundation in the presence of unknown transitions has yet to be fully developed. This paper explores the theoretical underpinnings of AIL in this context, where the stochastic and uncertain nature of environment transitions presents a challenge. We examine the expert sample complexity and interaction complexity required to recover good policies. To this end, we establish a framework connecting reward-free exploration and AIL, and propose an algorithm, MB-TAIL, that achieves the minimax optimal expert sample complexity of $\widetilde{O}(H^{3/2} |S|/\varepsilon)$ and interaction complexity of $\widetilde{O}(H^{3} |S|^2 |A|/\varepsilon^2)$. Here, $H$ represents the planning horizon, $|S|$ is the state space size, $|A|$ is the action space size, and $\varepsilon$ is the desired imitation gap. MB-TAIL is the first algorithm to achieve this level of expert sample complexity in the unknown transition setting and improves upon the interaction complexity of the best-known algorithm, OAL, by $O(H)$. Additionally, we demonstrate the generalization ability of MB-TAIL by extending it to the function approximation setting and proving that it can achieve expert sample and interaction complexity independent of $|S|$. Comment: arXiv admin note: text overlap with arXiv:2106.1042
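
    To get a feel for how the two rates quoted in this abstract scale, the sketch below evaluates only their leading-order terms. It is an illustration, not anything taken from the paper: the function names are hypothetical, and the constants and logarithmic factors hidden by the $\widetilde{O}$ notation are set to 1 by assumption.

```python
def expert_sample_term(horizon, num_states, eps):
    """Leading term H^{3/2} * |S| / eps (constants and log factors assumed to be 1)."""
    return horizon ** 1.5 * num_states / eps


def interaction_term(horizon, num_states, num_actions, eps):
    """Leading term H^3 * |S|^2 * |A| / eps^2 (constants and log factors assumed to be 1)."""
    return horizon ** 3 * num_states ** 2 * num_actions / eps ** 2


if __name__ == "__main__":
    # Hypothetical problem sizes chosen only to show the scaling.
    for eps in (0.5, 0.1):
        print(eps, expert_sample_term(4, 10, eps), interaction_term(4, 10, 2, eps))
```

    Halving $\varepsilon$ doubles the expert-sample term but quadruples the interaction term, which is the practical difference between the $1/\varepsilon$ and $1/\varepsilon^2$ dependencies.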

    Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-coupled Analysis

    Imitation learning learns a policy from expert trajectories. While the expert data is believed to be crucial for imitation quality, it was found that a kind of imitation learning approach, adversarial imitation learning (AIL), can have exceptional performance. With as little as only one expert trajectory, AIL can match the expert performance even in a long horizon, on tasks such as locomotion control. There are two mysterious points in this phenomenon. First, why can AIL perform well with only a few expert trajectories? Second, why does AIL maintain good performance despite the length of the planning horizon? In this paper, we theoretically explore these two questions. For a total-variation-distance-based AIL (called TV-AIL), our analysis shows a horizon-free imitation gap $\mathcal{O}(\min\{1, \sqrt{|\mathcal{S}|/N}\})$ on a class of instances abstracted from locomotion control tasks. Here $|\mathcal{S}|$ is the state space size for a tabular Markov decision process, and $N$ is the number of expert trajectories. We emphasize two important features of our bound. First, this bound is meaningful in both small and large sample regimes. Second, this bound suggests that the imitation gap of TV-AIL is at most 1 regardless of the planning horizon. Therefore, this bound can explain the empirical observation. Technically, we leverage the structure of multi-stage policy optimization in TV-AIL and present a new stage-coupled analysis via dynamic programming.
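
    The two features of the bound highlighted in this abstract can be checked numerically with a minimal sketch. The function name is hypothetical and the constant hidden by the $\mathcal{O}$ notation is assumed to be 1, so the values show only the shape of the bound, not the paper's exact guarantee.

```python
import math


def tv_ail_gap_term(num_states, num_trajectories):
    """Evaluate min{1, sqrt(|S| / N)}, with the hidden constant assumed to be 1."""
    return min(1.0, math.sqrt(num_states / num_trajectories))


if __name__ == "__main__":
    # Small-sample regime: the term is capped at 1 and never depends on the horizon.
    print(tv_ail_gap_term(100, 1))
    # Large-sample regime: the term decays as 1 / sqrt(N).
    print(tv_ail_gap_term(100, 400))
```

    The planning horizon does not appear as an argument at all, which is what "horizon-free" means here.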

    l-2,3-Diaminopropionate: One of the building blocks for the biosynthesis of Zwittermicin A in Bacillus thuringiensis subsp. kurstaki strain YBT-1520

    Zwittermicin A (ZwA) is a hybrid polyketide–non-ribosomal peptide that is thought to be biosynthesized from five proposed building blocks, including 2,3-diaminopropionate. Candidate genes for de novo biosynthesis of 2,3-diaminopropionate, zwa5A and zwa5B, were identified in a previous study. In this research, zwa5A was interrupted and chemically synthesized 2,3-diaminopropionate was fed to the zwa5A− mutant. Feeding with 2,3-diaminopropionate restored the ability of the zwa5A− mutant to produce ZwA. Another non-ribosomal peptide synthase gene, designated orf3, was identified. An amino acid-dependent PPi release assay showed that the adenylation domain ZWAA2 of ORF3 acyl-adenylated l-2,3-diaminopropionate effectively. Taken together, it can be concluded that l-2,3-diaminopropionate is indeed one of the building blocks for the biosynthesis of Zwittermicin A.

    Surface display of heterologous proteins in Bacillus thuringiensis using a peptidoglycan hydrolase anchor

    Background: Previous studies have revealed that the lysin motif (LysM) domains of bacterial cell wall-degrading enzymes are able to bind to peptidoglycan moieties of the cell wall. This suggests an approach for a cell surface display system in Gram-positive bacteria using a LysM-containing protein as the anchoring motif. In this study, we developed a new surface display system in B. thuringiensis using a LysM-containing peptidoglycan hydrolase, endo-β-N-acetylglucosaminidase (Mbg), as the anchor protein. Results: Homology searching in the B. thuringiensis YBT-1520 genome revealed a putative peptidoglycan hydrolase gene. The encoded protein, Mbg, exhibited substantial cell-wall binding capacity. The deduced amino acid sequence of Mbg was structurally distinguished as an N-terminal domain with two tandemly aligned LysMs and a C-terminal catalytic domain. A GFP-fusion protein was expressed and used to verify the surface localization by Western blot, flow cytometry, protease accessibility, SDS sensitivity, immunofluorescence, and electron microscopy assays. Low-level constitutive expression of Mbg was elevated by introducing a sporulation-independent promoter of cry3Aa. Truncated Mbg domains with separate N-terminus (Mbgn), C-terminus (Mbgc), LysM1, or LysM2 were further compared for their cell-wall displaying efficiencies. The Mbgn moiety contributed to cell-wall anchoring, while LysM1 was the active domain. Two tandemly repeated Mbgns exhibited the highest display activity, while the activity of three repeated Mbgns was decreased. A heterologous bacterial multicopper oxidase (WlacD) was successfully displayed onto the surface of B. thuringiensis target cells using the optimum (Mbgn)2 anchor, without radically altering its catalytic activity. Conclusion: Mbg can be a functional anchor protein to target different heterologous proteins onto the surface of B. thuringiensis cells. Since the LysM domain appears to be universal in Gram-positive bacteria, the strategy presented here could be applicable in other bacteria for developing this type of system.