65 research outputs found

    Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning

    Reinforcement Learning (RL) methods are typically applied directly in environments to learn policies. In some complex environments with continuous state-action spaces, sparse rewards, and/or long temporal horizons, learning a good policy in the original environments can be difficult. Focusing on the offline RL setting, we aim to build a simple and discrete world model that abstracts the original environment. RL methods are applied to our world model instead of the environment data for simplified policy learning. Our world model, dubbed Value Memory Graph (VMG), is designed as a directed-graph-based Markov decision process (MDP) whose vertices and directed edges represent graph states and graph actions, respectively. As the state-action spaces of VMG are finite and relatively small compared to the original environment, we can directly apply the value iteration algorithm on VMG to estimate graph state values and determine the best graph actions. VMG is trained from and built on the offline RL dataset. Together with an action translator that converts the abstract graph actions in VMG to real actions in the original environment, VMG controls agents to maximize episode returns. Our experiments on the D4RL benchmark show that VMG can outperform state-of-the-art offline RL methods in several tasks, especially when environments have sparse rewards and long temporal horizons. Code will be made publicly available.
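The value iteration step described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the edge rewards, the discount factor, and the names `value_iteration` and `greedy_actions` are all assumptions made for illustration, and transitions along edges are assumed to be deterministic.

```python
# Minimal sketch of value iteration on a small directed-graph MDP.
# Assumptions (not from the paper): deterministic edges, hand-picked
# rewards, gamma = 0.9, and the function names used here.

def value_iteration(edges, rewards, gamma=0.9, tol=1e-8, max_iters=1000):
    """Estimate a value for every vertex (graph state).

    edges:   dict mapping each vertex to a list of successor vertices;
             each outgoing edge is one available graph action.
    rewards: dict mapping (u, v) edge tuples to rewards (default 0.0).
    """
    values = {v: 0.0 for v in edges}
    for _ in range(max_iters):
        delta = 0.0
        for u, successors in edges.items():
            if not successors:
                continue  # terminal vertex: value stays at 0.0
            # Bellman optimality backup over outgoing edges.
            best = max(
                rewards.get((u, v), 0.0) + gamma * values[v]
                for v in successors
            )
            delta = max(delta, abs(best - values[u]))
            values[u] = best
        if delta < tol:
            break
    return values


def greedy_actions(edges, rewards, values, gamma=0.9):
    """Pick, for each non-terminal vertex, the edge with the best backup."""
    return {
        u: max(
            successors,
            key=lambda v: rewards.get((u, v), 0.0) + gamma * values[v],
        )
        for u, successors in edges.items()
        if successors
    }


# Hypothetical toy graph: A can reach the terminal vertex D via B or C.
edges = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
rewards = {("B", "D"): 1.0, ("C", "D"): 0.5}

values = value_iteration(edges, rewards)
policy = greedy_actions(edges, rewards, values)
print(values)
print(policy)
```

Because the graph is finite and small, each sweep touches every edge once, which is why exact value iteration is practical here while it would not be in the original continuous state-action space.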

    Video ChatCaptioner: Towards the Enriched Spatiotemporal Descriptions

    Video captioning aims to convey dynamic scenes from videos using natural language, facilitating the understanding of spatiotemporal information within our environment. Although there have been recent advances, generating detailed and enriched video descriptions continues to be a substantial challenge. In this work, we introduce Video ChatCaptioner, an innovative approach for creating more comprehensive spatiotemporal video descriptions. Our method employs a ChatGPT model as a controller, specifically designed to select frames for posing video content-driven questions. Subsequently, a robust algorithm is utilized to answer these visual queries. This question-answer framework effectively uncovers intricate video details and shows promise as a method for enhancing video content. Following multiple conversational rounds, ChatGPT can summarize enriched video content based on previous conversations. We qualitatively demonstrate that our Video ChatCaptioner can generate captions containing more visual details about the videos. The code is publicly available at https://github.com/Vision-CAIR/ChatCaptione

    MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

    The recent GPT-4 has demonstrated extraordinary multi-modal abilities, such as directly generating websites from handwritten text and identifying humorous elements within images. These features are rarely observed in previous vision-language models. We believe the primary reason for GPT-4's advanced multi-modal generation capabilities lies in the utilization of a more advanced large language model (LLM). To examine this phenomenon, we present MiniGPT-4, which aligns a frozen visual encoder with a frozen LLM, Vicuna, using just one projection layer. Our findings reveal that MiniGPT-4 possesses many capabilities similar to those exhibited by GPT-4, such as detailed image description generation and website creation from handwritten drafts. Furthermore, we also observe other emerging capabilities in MiniGPT-4, including writing stories and poems inspired by given images, providing solutions to problems shown in images, teaching users how to cook based on food photos, etc. In our experiment, we found that only performing the pretraining on raw image-text pairs could produce unnatural language outputs that lack coherence, including repetition and fragmented sentences. To address this problem, we curate a high-quality, well-aligned dataset in the second stage to finetune our model using a conversational template. This step proved crucial for augmenting the model's generation reliability and overall usability. Notably, our model is highly computationally efficient, as we only train a projection layer utilizing approximately 5 million aligned image-text pairs. Our code, pre-trained model, and collected dataset are available at https://minigpt-4.github.io/. Comment: Project Website: https://minigpt-4.github.io/; Code, Pretrained Model, and Dataset: https://github.com/Vision-CAIR/MiniGPT-4; Deyao Zhu and Jun Chen contributed equally to this work.

    Profiling of mismatch discrimination in RNAi enabled rational design of allele-specific siRNAs

    Silencing specificity is a critical issue in the therapeutic applications of siRNA, particularly in the treatment of single nucleotide polymorphism (SNP) diseases, where discrimination against single nucleotide variation is demanded. However, no generally applicable guidelines are available for the design of such allele-specific siRNAs. In this paper, the issue was approached by using a reporter-based assay. With a panel of 20 siRNAs and 240 variously mismatched target reporters, we first demonstrated, using the 4th, 12th and 17th positions as examples, that the mismatches were discriminated in a position-dependent order that was nevertheless independent of their sequence contexts. A general model was further built for mismatch discrimination at all positions using 230 additional reporter constructs specifically designed to contain mismatches distributed evenly along the target regions of different siRNAs. This model was successfully employed to design allele-specific siRNAs targeting disease-causing mutations of the PIK3CA gene at two SNP sites. Furthermore, conformational distortion of the siRNA-target duplex was observed to correlate with the compromise of gene silencing. In summary, these findings could dramatically simplify the design of allele-specific siRNAs and might also provide guidance for increasing the specificity of therapeutic siRNAs.

    A flow-through electroporation chip integrated with viable cell sorting based on dielectrophoresis

    No full text
    Flow-through electroporation on a chip is a promising technique to introduce molecules into cells in applications of biological research, drug delivery and gene therapy. During the electroporation process, the electrical field also produces a large number of non-viable cells, which significantly affect subsequent biological procedures. Off-chip separation of the viable cells increases experimental complexity dramatically and results in further cell death. In this paper, we have developed a flow-through electroporation chip in which dielectrophoresis (DEP) is employed to sort the viable and non-viable cells on chip. For the standard expression cell line HEK-293a (human embryonic kidney cells), the proportion of viable cells in the sorted sample was increased to 90% from 20% in the as-electroporated sample.

    Short-Term Traffic-Flow Forecasting Based on an Integrated Model Combining Bagging and Stacking Considering Weight Coefficient

    No full text
    This work proposes an integrated model combining bagging and stacking, with weight coefficients, for short-term traffic-flow prediction. The model incorporates vacation and peak time features, as well as occupancy and speed information, in order to improve prediction accuracy and accomplish deeper traffic flow data feature mining. To address the limitations of a single prediction model in traffic forecasting, a stacking model with ridge regression as the meta-learner is first established, then the stacking model is optimized from the perspective of the learner using the bagging model, and lastly the optimized learner is embedded into the stacking model as the new base learner to obtain the Ba-Stacking model. Finally, to address the Ba-Stacking model's shortcomings in terms of low base learner utilization, the information structure of the base learners is modified by weighting the error coefficients while taking into account the model's external features, resulting in a DW-Ba-Stacking model that can change the weights of the base learners to adjust the feature distribution and thus improve utilization. Using 76,896 data points from the I5NB highway as the empirical study object, the DW-Ba-Stacking model is compared with and assessed against the traditional model in this paper. The empirical results show that the DW-Ba-Stacking model has the highest prediction accuracy, demonstrating that the model is successful in predicting short-term traffic flows and can effectively solve traffic-congestion problems.

    Intrinsic Delocalization during the Decay of Excitons in Polymeric Solar Cells

    No full text
    In bulk heterojunction polymer solar cells, external photoexcitation results in localized excitons in the polymer chain. After hot exciton formation and subsequent relaxation, the dipole moment drives the electron to partially transfer to extended orbitals from the original localized ones, leading to self-delocalization. Based on the dynamic fluorescence spectra, the delocalization of excitons is revealed to be an intrinsic property dominated by exciton decay, acting as a bridge for the exciton to diffuse in the polymeric solar cell. The modification of the dipole moment enhances the efficiency of polymer solar cells.