73 research outputs found
VIM: Probing Multimodal Large Language Models for Visual Embedded Instruction Following
We introduce VISUAL EMBEDDED INSTRUCTION (VIM), a new framework designed to
evaluate the visual instruction following capability of Multimodal Large
Language Models (MLLMs). As illustrated in Figure 2, VIM challenges the MLLMs
by embedding the instructions into the visual scenes, demanding strong visual
interpretative skills for instruction following. We adapt VIM to various
benchmarks, including VQAv2, MME, MM-Vet, and RefCOCO series, compose a VIM
bench, and probe diverse MLLMs across three distinct in-context learning
settings: Zero Shot, One Shot, and Pair Shot. We observe a significant
performance disparity between open-source MLLMs and GPT-4V, implying that the
open-source models' proficiency in visual instruction comprehension is not yet
on par. Our results highlight a promising direction for enhancing MLLMs'
instruction-following capabilities. We hope VIM will serve as a useful
benchmark for advancing the state of the art and driving further progress in
the field.
Comment: 20 pages, 8 figures, 20 tables
CancerGPT: Few-shot Drug Pair Synergy Prediction using Large Pre-trained Language Models
Large pre-trained language models (LLMs) have been shown to have significant
potential in few-shot learning across various fields, even with minimal
training data. However, their ability to generalize to unseen tasks in more
complex fields, such as biology, has yet to be fully evaluated. LLMs can offer
a promising alternative approach for biological inference, particularly in
cases where structured data and sample size are limited, by extracting prior
knowledge from text corpora. Our proposed few-shot learning approach uses LLMs
to predict the synergy of drug pairs in rare tissues that lack structured data
and features. Our experiments, which involved seven rare tissues from different
cancer types, demonstrated that the LLM-based prediction model achieved
significant accuracy with very few or zero samples. Our proposed model,
CancerGPT (with 124M parameters), was comparable even to the much larger
fine-tuned GPT-3 model (with 175B parameters). Our research is the first
to tackle drug pair synergy prediction in rare tissues with limited data. We
are also the first to utilize an LLM-based prediction model for biological
reaction prediction tasks.
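The few-shot setup described above can be illustrated with a hypothetical prompt builder: a drug-pair record is phrased as natural-language text, optionally preceded by a handful of labeled demonstrations. The wording, drug names, and labels below are illustrative assumptions, not the paper's actual prompts.

```python
def synergy_prompt(drug_a, drug_b, tissue, examples=()):
    """Phrase a drug-pair synergy query as text for an LLM.

    `examples` holds (drug_a, drug_b, tissue, label) tuples used as
    few-shot demonstrations; an empty tuple gives the zero-shot case.
    """
    lines = [
        f"Are {a} and {b} synergistic in {t} tissue? Answer: {label}"
        for a, b, t, label in examples
    ]
    lines.append(f"Are {drug_a} and {drug_b} synergistic in {tissue} tissue? Answer:")
    return "\n".join(lines)

# One-shot example with made-up drug names:
prompt = synergy_prompt(
    "drugA", "drugB", "pancreas",
    examples=[("drugC", "drugD", "pancreas", "yes")],
)
```

Casting the prediction as text is what lets the model fall back on prior knowledge from its pretraining corpus when structured features are unavailable.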
Anticancer drug synergy prediction in understudied tissues using transfer learning
Objective: Drug combination screening has advantages in identifying cancer treatment options with higher efficacy and no degradation in safety. A key challenge is that the accumulated number of observed in-vitro drug responses varies greatly among cancer types, with some tissues far more understudied than others. We therefore aim to develop a drug synergy prediction model for understudied tissues as a way of overcoming data scarcity.

Materials and Methods: We collected a comprehensive set of genetic, molecular, and phenotypic features for cancer cell lines. We developed a drug synergy prediction model based on multitask deep neural networks to integrate multimodal inputs and multiple outputs. We also utilized transfer learning from data-rich tissues to data-poor tissues.

Results: We showed improved accuracy in predicting synergy in both data-rich and understudied tissues. In data-rich tissue, the prediction model achieved 0.9577 AUROC on the binarized classification task and 174.3 mean squared error on the regression task. We observed that an adequate transfer learning strategy significantly increases accuracy in the understudied tissues.

Conclusions: Our synergy prediction model can be used to rank synergistic drug combinations in understudied tissues and thus help prioritize future in-vitro experiments. Code is available at https://github.com/yejinjkim/synergy-transfer.
Peer reviewed
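The transfer-learning strategy described above can be sketched in miniature: fit a regressor on a data-rich "source" tissue, then fine-tune the same weights on a handful of samples from a data-poor "target" tissue instead of training from scratch. All data here is synthetic and the two-feature linear model is a stand-in for the paper's multitask deep network.

```python
import random

def gd_fit(data, w=(0.0, 0.0), lr=0.05, epochs=300):
    """Full-batch gradient descent for y = w0*x0 + w1*x1, starting from w."""
    w0, w1 = w
    for _ in range(epochs):
        g0 = g1 = 0.0
        for (x0, x1), y in data:
            err = w0 * x0 + w1 * x1 - y
            g0 += err * x0
            g1 += err * x1
        w0 -= lr * g0 / len(data)
        w1 -= lr * g1 / len(data)
    return w0, w1

def mse(data, w):
    return sum((w[0] * x0 + w[1] * x1 - y) ** 2 for (x0, x1), y in data) / len(data)

rng = random.Random(0)
true = (1.5, -0.8)  # synthetic ground-truth weights
make = lambda n: [((a := rng.gauss(0, 1), b := rng.gauss(0, 1)),
                   true[0] * a + true[1] * b) for _ in range(n)]
source, target = make(400), make(8)  # data-rich vs. data-poor tissue

w_scratch = gd_fit(target)                     # target-only baseline
w_transfer = gd_fit(target, w=gd_fit(source))  # warm-started from source
```

With the same training budget on the tiny target set, the warm-started model inherits most of its fit from the source tissue, which is the essence of the transfer step.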
DHRL-FNMR: An Intelligent Multicast Routing Approach Based on Deep Hierarchical Reinforcement Learning in SDN
The optimal multicast tree problem in the Software-Defined Networking (SDN)
multicast routing is an NP-hard combinatorial optimization problem. Although
existing SDN intelligent solution methods, which are based on deep
reinforcement learning, can dynamically adapt to complex network link state
changes, these methods are plagued by problems such as redundant branches,
large action space, and slow agent convergence. In this paper, an SDN
intelligent multicast routing algorithm based on deep hierarchical
reinforcement learning is proposed to circumvent the aforementioned problems.
First, the multicast tree construction problem is decomposed into two
sub-problems: the fork node selection problem and the construction of the
optimal path from the fork node to the destination node. Second, based on the
information characteristics of SDN global network perception, the multicast
tree state matrix, link bandwidth matrix, link delay matrix, link packet loss
rate matrix, and sub-goal matrix are designed as the state space of intrinsic
and meta controllers. Then, in order to mitigate the excessive action space,
our approach constructs different action spaces at the upper and lower levels.
The meta-controller generates an action space using network nodes to select the
fork node, and the intrinsic controller uses the adjacent edges of the current
node as its action space, thus implementing four different action selection
strategies in the construction of the multicast tree. To help the agent
construct the optimal multicast tree more quickly, we developed alternative
reward strategies that distinguish between single-step node actions and
multi-step actions toward multiple destination nodes.
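The two-level action-space design described above can be sketched on a toy topology: the meta-controller chooses among all nodes, while the intrinsic controller is restricted to edges adjacent to its current node. The node names and helper functions are illustrative, not the paper's implementation.

```python
# Toy SDN topology as an adjacency map (node names are made up).
topology = {
    "s1": ["s2", "s3"],
    "s2": ["s1", "s4"],
    "s3": ["s1", "s4"],
    "s4": ["s2", "s3"],
}

def meta_action_space(topology):
    """Meta-controller: every network node is a candidate fork node."""
    return sorted(topology)

def intrinsic_action_space(topology, current_node):
    """Intrinsic controller: only the edges adjacent to the current node."""
    return [(current_node, nbr) for nbr in topology[current_node]]
```

Restricting the lower level to adjacent edges is what keeps the per-step action space small (bounded by node degree) even as the topology grows, which is the stated motivation for the hierarchical split.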
In Search of the Long-Tail: Systematic Generation of Long-Tail Knowledge via Logical Rule Guided Search
Since large language models have approached human-level performance on many
tasks, it has become increasingly harder for researchers to find tasks that are
still challenging to the models. Failure cases usually come from the long-tail
distribution - data that an oracle language model could assign a probability on
the lower end of its distribution. Current methodologies such as prompt
engineering or crowdsourcing are insufficient for creating long-tail examples
because humans are constrained by cognitive biases. We propose a
Logic-Induced-Knowledge-Search (LINK) framework for systematically generating
long-tail knowledge statements. Grounded by a symbolic rule, we search for
long-tail values for each variable of the rule by first prompting an LLM, then
verifying the correctness of the values with a critic, and lastly pushing for
the long-tail distribution with a reranker. With this framework we construct a
dataset, Logic-Induced-Long-Tail (LINT), consisting of 200 symbolic rules and
50K knowledge statements spanning across four domains. Human annotations find
that 84% of the statements in LINT are factually correct. In contrast, ChatGPT
and GPT-4 struggle to directly generate long-tail statements under the
guidance of logic rules, producing only 56% and 78% correct statements,
respectively. Moreover, their "long-tail" generations in fact fall into the
higher likelihood range, and thus are not truly long-tail. Our findings
suggest that
LINK is effective for generating data in the long-tail distribution while
enforcing quality. LINT can be useful for systematically evaluating LLMs'
capabilities in the long-tail distribution. We challenge the models with a
simple entailment classification task using samples from LINT. We find that
ChatGPT's and GPT-4's capability to identify incorrect knowledge drops by ~3%
in the long-tail distribution compared to the head distribution.
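The propose-verify-rerank loop described above can be sketched as follows. The LLM, critic, and reranker are stubbed with toy stand-ins, and all function names are hypothetical, not LINK's actual interfaces.

```python
def link_search(propose, verify, likelihood, n_candidates=50, keep=5):
    """One LINK-style search step for a single rule variable:
    propose candidate values (the LLM's role), filter out incorrect ones
    (the critic's role), then rerank ascending by likelihood so the
    lowest-probability, i.e. most long-tail, survivors come first."""
    candidates = [propose() for _ in range(n_candidates)]
    verified = [c for c in candidates if verify(c)]
    return sorted(verified, key=likelihood)[:keep]

# Toy stand-ins: integers as "values"; even numbers count as factually
# correct, and smaller numbers count as lower-likelihood (more long-tail).
import itertools
counter = itertools.count()
values = link_search(lambda: next(counter), lambda v: v % 2 == 0, lambda v: v)
# values == [0, 2, 4, 6, 8]
```

Putting the critic before the reranker is what lets the search push toward the tail without sacrificing factual correctness, mirroring the 84% human-verified accuracy reported for LINT.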