65 research outputs found
Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning
Reinforcement Learning (RL) methods are typically applied directly in
environments to learn policies. In some complex environments with continuous
state-action spaces, sparse rewards, and/or long temporal horizons, learning a
good policy in the original environments can be difficult. Focusing on the
offline RL setting, we aim to build a simple and discrete world model that
abstracts the original environment. RL methods are applied to our world model
instead of the environment data for simplified policy learning. Our world
model, dubbed Value Memory Graph (VMG), is designed as a directed-graph-based
Markov decision process (MDP) whose vertices and directed edges represent
graph states and graph actions, respectively. As the state-action spaces of VMG
are finite and relatively small compared to the original environment, we can
directly apply the value iteration algorithm on VMG to estimate graph state
values and figure out the best graph actions. VMG is trained from and built on
the offline RL dataset. Together with an action translator that converts the
abstract graph actions in VMG to real actions in the original environment, VMG
controls agents to maximize episode returns. Our experiments on the D4RL
benchmark show that VMG can outperform state-of-the-art offline RL methods in
several tasks, especially when environments have sparse rewards and long
temporal horizons. Code will be made publicly available.
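The value iteration step mentioned in this abstract can be illustrated on a toy directed graph. This is a minimal sketch, not the authors' implementation: the function name `value_iteration`, the dictionary-based graph encoding, the discount factor, and the example rewards are all assumptions made for illustration.

```python
def value_iteration(edges, rewards, gamma=0.9, tol=1e-8, max_iters=1000):
    """Value iteration on a small directed-graph MDP (illustrative only).

    edges:   dict mapping each vertex to a list of successor vertices;
             each outgoing edge plays the role of one graph action.
    rewards: dict mapping (u, v) edge tuples to a reward; missing edges
             are treated as reward 0.0 (an assumption of this sketch).
    """
    values = {v: 0.0 for v in edges}

    def edge_value(u, v):
        # One-step return of following edge u -> v.
        return rewards.get((u, v), 0.0) + gamma * values[v]

    for _ in range(max_iters):
        delta = 0.0
        for u, successors in edges.items():
            if not successors:
                continue  # vertices without outgoing edges keep value 0
            best = max(edge_value(u, v) for v in successors)
            delta = max(delta, abs(best - values[u]))
            values[u] = best
        if delta < tol:
            break

    # Greedy graph action: the successor with the highest one-step return.
    policy = {
        u: max(successors, key=lambda v, u=u: edge_value(u, v))
        for u, successors in edges.items()
        if successors
    }
    return values, policy


# Hypothetical four-vertex graph: "s" can reach the goal "g" via "a" or "b".
toy_edges = {"s": ["a", "b"], "a": ["g"], "b": ["g"], "g": []}
toy_rewards = {("a", "g"): 1.0, ("b", "g"): 0.5}
toy_values, toy_policy = value_iteration(toy_edges, toy_rewards)
```

Because the graph is finite and small, the loop converges in a few sweeps; in this toy case the greedy action at `s` is the edge toward `a`, the branch with the larger downstream reward.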
Video ChatCaptioner: Towards the Enriched Spatiotemporal Descriptions
Video captioning aims to convey dynamic scenes from videos using natural
language, facilitating the understanding of spatiotemporal information within
our environment. Although there have been recent advances, generating detailed
and enriched video descriptions continues to be a substantial challenge. In
this work, we introduce Video ChatCaptioner, an innovative approach for
creating more comprehensive spatiotemporal video descriptions. Our method
employs a ChatGPT model as a controller, specifically designed to select frames
for posing video content-driven questions. Subsequently, a robust algorithm is
utilized to answer these visual queries. This question-answer framework
effectively uncovers intricate video details and shows promise as a method for
enhancing video content. Following multiple conversational rounds, ChatGPT can
summarize enriched video content based on previous conversations. We
qualitatively demonstrate that our Video ChatCaptioner can generate captions
containing more visual details about the videos. The code is publicly available
at https://github.com/Vision-CAIR/ChatCaptione
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
The recent GPT-4 has demonstrated extraordinary multi-modal abilities, such
as directly generating websites from handwritten text and identifying humorous
elements within images. These features are rarely observed in previous
vision-language models. We believe the primary reason for GPT-4's advanced
multi-modal generation capabilities lies in the utilization of a more advanced
large language model (LLM). To examine this phenomenon, we present MiniGPT-4,
which aligns a frozen visual encoder with a frozen LLM, Vicuna, using just one
projection layer. Our findings reveal that MiniGPT-4 possesses many
capabilities similar to those exhibited by GPT-4, such as detailed image
description generation and website creation from hand-written drafts.
Furthermore, we also observe other emerging capabilities in MiniGPT-4,
including writing stories and poems inspired by given images, providing
solutions to problems shown in images, teaching users how to cook based on food
photos, etc. In our experiment, we found that only performing the pretraining
on raw image-text pairs could produce unnatural language outputs that lack
coherency including repetition and fragmented sentences. To address this
problem, we curate a high-quality, well-aligned dataset in the second stage to
finetune our model using a conversational template. This step proved crucial
for augmenting the model's generation reliability and overall usability.
Notably, our model is highly computationally efficient, as we only train a
projection layer utilizing approximately 5 million aligned image-text pairs.
Our code, pre-trained model, and collected dataset are available at
https://minigpt-4.github.io/.
Comment: Project Website: https://minigpt-4.github.io/; Code, Pretrained
Model, and Dataset: https://github.com/Vision-CAIR/MiniGPT-4; Deyao Zhu and
Jun Chen contributed equally to this work
Profiling of mismatch discrimination in RNAi enabled rational design of allele-specific siRNAs
Silencing specificity is a critical issue in the therapeutic applications of siRNA, particularly in the treatment of single nucleotide polymorphism (SNP) diseases, where discrimination against single-nucleotide variation is required. However, no generally applicable guidelines are available for the design of such allele-specific siRNAs. In this paper, the issue was approached using a reporter-based assay. With a panel of 20 siRNAs and 240 variously mismatched target reporters, we first demonstrated that mismatches were discriminated in a position-dependent order that was nevertheless independent of their sequence contexts, using positions 4, 12 and 17 as examples. A general model was then built for mismatch discrimination at all positions using 230 additional reporter constructs specifically designed to contain mismatches distributed evenly along the target regions of different siRNAs. This model was successfully employed to design allele-specific siRNAs targeting disease-causing mutations of the PIK3CA gene at two SNP sites. Furthermore, conformational distortion of the siRNA-target duplex was observed to correlate with compromised gene silencing. In summary, these findings could dramatically simplify the design of allele-specific siRNAs and might also provide guidance for increasing the specificity of therapeutic siRNAs.
A flow-through electroporation chip integrated with viable cell sorting based on dielectrophoresis
Flow-through electroporation on a chip is a promising technique for introducing molecules into cells in biological research, drug delivery and gene therapy. During electroporation, however, the electric field also produces a large number of non-viable cells, which significantly affect subsequent biological procedures. Off-chip separation of the viable cells increases experimental complexity dramatically and results in further cell death. In this paper, we have developed a flow-through electroporation chip in which dielectrophoresis (DEP) is employed to sort viable and non-viable cells on chip. For the standard expression cell line HEK-293a (human embryonic kidney cells), the ratio of viable cells in the sorted sample was increased to 90% from 20% in the as-electroporated sample.
Short-Term Traffic-Flow Forecasting Based on an Integrated Model Combining Bagging and Stacking Considering Weight Coefficient
This work proposes an integrated model that combines bagging and stacking with weight coefficients for short-term traffic-flow prediction. The model incorporates vacation and peak-time features, as well as occupancy and speed information, to improve prediction accuracy and mine traffic-flow data features more deeply. To address the limitations of a single prediction model in traffic forecasting, a stacking model with ridge regression as the meta-learner is first established; the stacking model is then optimized from the perspective of the learner using the bagging model; and lastly the optimized learner is embedded into the stacking model as the new base learner to obtain the Ba-Stacking model. Finally, to address the Ba-Stacking model's low base-learner utilization, the information structure of the base learners is modified by weighting the error coefficients while taking into account the model's external features, resulting in a DW-Ba-Stacking model that can change the weights of the base learners to adjust the feature distribution and thus improve utilization. Using 76,896 data points from the I5NB highway as the empirical study object, the DW-Ba-Stacking model is compared with traditional models. The empirical results show that the DW-Ba-Stacking model has the highest prediction accuracy, demonstrating that the model is successful in predicting short-term traffic flows and can effectively solve traffic-congestion problems.
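The idea of a ridge-regression meta-learner in stacking can be sketched for the simplest case of two base learners. This is only an illustration under stated assumptions, not the Ba-Stacking or DW-Ba-Stacking procedure from the paper: the function names, the absence of an intercept, the restriction to two base learners, and the regularization strength `alpha` are all hypothetical choices, and the base predictions are assumed to come from held-out data.

```python
def ridge_meta_weights(preds_a, preds_b, targets, alpha=1.0):
    """Fit ridge weights for combining two base learners (no intercept).

    Solves (P^T P + alpha * I) w = P^T y in closed form, where the two
    columns of P are the held-out predictions of base learners A and B.
    """
    # Entries of the symmetric 2x2 matrix P^T P + alpha * I.
    a11 = sum(p * p for p in preds_a) + alpha
    a22 = sum(q * q for q in preds_b) + alpha
    a12 = sum(p * q for p, q in zip(preds_a, preds_b))
    # Entries of the right-hand side P^T y.
    b1 = sum(p * y for p, y in zip(preds_a, targets))
    b2 = sum(q * y for q, y in zip(preds_b, targets))
    # The determinant is strictly positive whenever alpha > 0.
    det = a11 * a22 - a12 * a12
    w_a = (a22 * b1 - a12 * b2) / det
    w_b = (a11 * b2 - a12 * b1) / det
    return w_a, w_b


def stacked_prediction(pred_a, pred_b, weights):
    """Combine two base predictions with the fitted meta-learner weights."""
    return weights[0] * pred_a + weights[1] * pred_b
```

In this sketch the ridge penalty shrinks both weights toward zero, which keeps the solve well-posed even when the two base learners produce highly correlated predictions.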
Intrinsic Delocalization during the Decay of Excitons in Polymeric Solar Cells
In bulk heterojunction polymer solar cells, external photoexcitation results in localized excitons in the polymer chain. After hot exciton formation and subsequent relaxation, the dipole moment drives the electron to partially transfer to extended orbitals from the original localized ones, leading to self-delocalization. Based on the dynamic fluorescence spectra, the delocalization of excitons is revealed to be an intrinsic property dominated by exciton decay, acting as a bridge for the exciton to diffuse in the polymeric solar cell. The modification of the dipole moment enhances the efficiency of polymer solar cells