7 research outputs found

    Reparameterized Policy Learning for Multimodal Trajectory Optimization

    Full text link
    We investigate the challenge of parametrizing policies for reinforcement learning (RL) in high-dimensional continuous action spaces. Our objective is to develop a multimodal policy that overcomes limitations inherent in the commonly-used Gaussian parameterization. To achieve this, we propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories. By conditioning the policy on a latent variable, we derive a novel variational bound as the optimization objective, which promotes exploration of the environment. We then present a practical model-based RL method, called Reparameterized Policy Gradient (RPG), which leverages the multimodal policy parameterization and learned world model to achieve strong exploration capabilities and high data efficiency. Empirical results demonstrate that our method can help agents evade local optima in tasks with dense rewards and solve challenging sparse-reward environments by incorporating an object-centric intrinsic reward. Our method consistently outperforms previous approaches across a range of tasks. Code and supplementary materials are available on the project page https://haosulab.github.io/RPG

    Distilling Large Vision-Language Model with Out-of-Distribution Generalizability

    Full text link
    Large vision-language models have achieved outstanding performance, but their size and computational requirements make their deployment on resource-constrained devices and time-sensitive tasks impractical. Model distillation, the process of creating smaller, faster models that maintain the performance of larger models, is a promising direction towards the solution. This paper investigates the distillation of visual representations in large teacher vision-language models into lightweight student models using a small- or mid-scale dataset. Notably, this study focuses on open-vocabulary out-of-distribution (OOD) generalization, a challenging problem that has been overlooked in previous model distillation literature. We propose two principles from vision and language modality perspectives to enhance student's OOD generalization: (1) by better imitating teacher's visual representation space, and carefully promoting better coherence in vision-language alignment with the teacher; (2) by enriching the teacher's language representations with informative and finegrained semantic attributes to effectively distinguish between different labels. We propose several metrics and conduct extensive experiments to investigate their techniques. The results demonstrate significant improvements in zero-shot and few-shot student performance on open-vocabulary out-of-distribution classification, highlighting the effectiveness of our proposed approaches. Our code will be released at https://github.com/xuanlinli17/large_vlm_distillation_oo

    Deductive Verification of Chain-of-Thought Reasoning

    Full text link
    Large Language Models (LLMs) significantly benefit from Chain-of-Thought (CoT) prompting in performing various reasoning tasks. While CoT allows models to produce more comprehensive reasoning processes, its emphasis on intermediate reasoning steps can inadvertently introduce hallucinations and accumulated errors, thereby limiting models' ability to solve complex reasoning tasks. Inspired by how humans engage in careful and meticulous deductive logical reasoning processes to solve tasks, we seek to enable language models to perform explicit and rigorous deductive reasoning, and also ensure the trustworthiness of their reasoning process through self-verification. However, directly verifying the validity of an entire deductive reasoning process is challenging, even with advanced models like ChatGPT. In light of this, we propose to decompose a reasoning verification process into a series of step-by-step subprocesses, each only receiving their necessary context and premises. To facilitate this procedure, we propose Natural Program, a natural language-based deductive reasoning format. Our approach enables models to generate precise reasoning steps where subsequent steps are more rigorously grounded on prior steps. It also empowers language models to carry out reasoning self-verification in a step-by-step manner. By integrating this verification process into each deductive reasoning stage, we significantly enhance the rigor and trustfulness of generated reasoning steps. Along this process, we also improve the answer correctness on complex reasoning tasks. Code will be released at https://github.com/lz1oceani/verify_cot

    Genomic Analyses Reveal Mutational Signatures and Frequently Altered Genes in Esophageal Squamous Cell Carcinoma

    Get PDF
    Esophageal squamous cell carcinoma (ESCC) is one of the most common cancers worldwide and the fourth most lethal cancer in China. However, although genomic studies have identified some mutations associated with ESCC, we know little of the mutational processes responsible. To identify genome-wide mutational signatures, we performed either whole-genome sequencing (WGS) or whole-exome sequencing (WES) on 104 ESCC individuals and combined our data with those of 88 previously reported samples. An APOBEC-mediated mutational signature in 47% of 192 tumors suggests that APOBEC-catalyzed deamination provides a source of DNA damage in ESCC. Moreover, PIK3CA hotspot mutations (c.1624G>A [p.Glu542Lys] and c.1633G>A [p.Glu545Lys]) were enriched in APOBEC-signature tumors, and no smoking-associated signature was observed in ESCC. In the samples analyzed by WGS, we identified focal (<100 kb) amplifications of CBX4 and CBX8. In our combined cohort, we identified frequent inactivating mutations in AJUBA, ZNF750, and PTCH1 and the chromatin-remodeling genes CREBBP and BAP1, in addition to known mutations. Functional analyses suggest roles for several genes (CBX4, CBX8, AJUBA, and ZNF750) in ESCC. Notably, high activity of hedgehog signaling and the PI3K pathway in approximately 60% of 104 ESCC tumors indicates that therapies targeting these pathways might be particularly promising strategies for ESCC. Collectively, our data provide comprehensive insights into the mutational signatures of ESCC and identify markers for early diagnosis and potential therapeutic targets

    Integrated Transcriptome and Metabolome Analysis Reveals Phenylpropanoid Biosynthesis and Phytohormone Signaling Contribute to “<i>Candidatus</i> Liberibacter asiaticus” Accumulation in Citrus Fruit Piths (Fluffy Albedo)

    No full text
    “Candidatus Liberibacter asiaticus” (CLas) is a phloem-restricted α-proteobacterium that is associated with citrus huanglongbing (HLB), which is the most destructive disease that affects all varieties of citrus. Although midrib is usually used as a material for CLas detection, we recently found that the bacterium was enriched in fruits, especially in the fruit pith. However, no study has revealed the molecular basis of these two parts in responding to CLas infection. Therefore, we performed transcriptome and UHPLC–MS-based targeted and untargeted metabolomics analyses in order to organize the essential genes and metabolites that are involved. Transcriptome and metabolome characterized 4834 differentially expressed genes (DEGs) and 383 differentially accumulated metabolites (DAMs) between the two materials, wherein 179 DEGs and 44 DAMs were affected by HLB in both of the tissues, involving the pathways of phenylpropanoid biosynthesis, phytohormone signaling transduction, starch and sucrose metabolism, and photosynthesis. Notably, we discovered that the gene expression that is related to beta-glucosidase and endoglucanase was up-regulated in fruits. In addition, defense-related gene expression and metabolite accumulation were significantly down-regulated in infected fruits. Taken together, the decreased amount of jasmonic acid, coupled with the reduced accumulation of phenylpropanoid and the increased proliferation of indole-3-acetic acid, salicylic acid, and abscisic acid, compared to leaf midribs, may contribute largely to the enrichment of CLas in fruit piths, resulting in disorders of photosynthesis and starch and sucrose metabolism
    corecore