4 research outputs found

    Generative Exploration and Exploitation

    Sparse reward is one of the biggest challenges in reinforcement learning (RL). In this paper, we propose a novel method called Generative Exploration and Exploitation (GENE) to overcome sparse reward. GENE automatically generates start states to encourage the agent to explore the environment and to exploit received reward signals. GENE can adaptively trade off exploration and exploitation according to the varying distributions of states experienced by the agent as learning progresses. GENE relies on no prior knowledge about the environment and can be combined with any RL algorithm, whether on-policy or off-policy, single-agent or multi-agent. Empirically, we demonstrate that GENE significantly outperforms existing methods in three tasks with only binary rewards: Maze, Maze Ant, and Cooperative Navigation. Ablation studies verify the emergence of progressive exploration and automatic reversing.
    Comment: AAAI'2

    Quantum Machine Learning Technique for Automatic Retrosynthetic Reaction Pathway Search Method

    Retrosynthetic analysis often involves evaluating many candidate reaction pathways and molecules at multiple stages of the reaction, resulting in complex retrosynthesis trees that need to be searched and parsed efficiently. Computational approaches could significantly aid the chemist in solving different aspects of the retrosynthesis problem, such as graph-theoretic search methodologies for efficient tree traversal to identify feasible reaction pathways, dictionary-based methods to evaluate a large search space of precursors, and chemistry-driven heuristics to eliminate practically infeasible routes. In this research, a new single-step retrosynthesis prediction method that uses the Retro TRAE SMILES-based translation technique is proposed. Accordingly, quantum computing with a tree-tensor network topology is used to construct an automatic, data-driven, end-to-end retrosynthetic route planning system (AutoSynRoute) based on a heuristic scoring function. AutoSynRoute successfully reproduced published synthesis routes for the four case products. The model is trained in an end-to-end and fully data-driven fashion. Unlike previous models that translate the SMILES strings of reactants and products, a new way of representing a chemical reaction based on molecular fragments is introduced. It is demonstrated that the new approach yields better prediction results than current state-of-the-art computational methods. The new approach resolves major drawbacks of existing retrosynthetic methods, such as generating invalid SMILES strings. The proposed method is implemented in Python. The proposed approach predicts highly similar reactant molecules with an accuracy of 68%. In addition, the proposed method yields more robust predictions than existing methods. The experiments also demonstrate that the proposed scheme improves the success rate of solving the retrosynthetic problem by 97% while maintaining the performance of the quantum tree tensor for predicting valid reactions.
