The Entity-Deduction Arena: A playground for probing the conversational reasoning and planning capabilities of LLMs
Large language models (LLMs) are effective at answering questions that are
clearly asked. However, when faced with ambiguous queries they can act
unpredictably and produce incorrect outputs. This underscores the need for the
development of intelligent agents capable of asking clarification questions to
resolve ambiguities effectively. This capability requires complex
understanding, state tracking, reasoning and planning over multiple
conversational turns. However, directly measuring this can be challenging. In
this paper, we offer a surrogate problem that assesses an LLM's capability to
deduce an entity unknown to itself, but revealed to a judge, by asking the
judge a series of queries. This entity-deducing game can serve as an evaluation
framework to probe the conversational reasoning and planning capabilities of
language models. We systematically evaluate various LLMs and discover
significant differences in their performance on this task. We find that strong
LLMs like GPT-4 outperform human players by a large margin. We further employ
Behavior Cloning (BC) to examine whether a weaker model can imitate a stronger
one and generalize to new data or domains using only the stronger model's
demonstrations. Finally, we propose using Reinforcement Learning to enhance the
reasoning and planning capacity of Vicuna models through episodes of game
playing, which leads to significant performance improvements. We hope that this
problem offers insights into how autonomous agents could be trained to behave
more intelligently in ambiguous circumstances.
Comment: 22 pages
Statistical Inference For High Dimensional Models In Genomics And Microbiome
The human microbiome consists of all living microorganisms in and on the human body. Large-scale microbiome studies, such as the NIH Human Microbiome Project (HMP), have shown that this complex ecosystem has a large impact on human health in multiple ways. The analysis of these datasets leads to new statistical challenges that require the development of novel methodologies. Motivated by several microbiome studies, we develop several methods of statistical inference for high-dimensional models to address the association between microbiome compositions and certain outcomes. The high dimensionality and compositional nature of microbiome data make the naive application of classical regression models invalid. To study the association between microbiome
compositions and a disease’s risk, we develop a generalized linear model with linear constraints on the regression coefficients and a related debiasing procedure to obtain asymptotically unbiased and normally distributed estimates. Application of this method to an inflammatory bowel disease (IBD) study identifies several gut bacterial species that are associated with the risk of IBD. We also consider post-selection inference for models with linear equality constraints, where we develop methods for constructing confidence intervals for the selected non-zero coefficients chosen by a Lasso-type estimator with linear constraints. These confidence intervals are shown to have the desired coverage probabilities when conditioned on the selected model. Finally, the last chapter of this dissertation presents a method for inference in high-dimensional instrumental variable (IV) regression. Gene expression and phenotype associations can be affected by potential unmeasured confounders, leading to biased estimates of the associations. Using genetic variants as instruments, we consider the problem of hypothesis testing for sparse IV regression models and present methods for testing both single and multiple regression coefficients. A multiple testing procedure is developed for selecting variables and is shown to control the false discovery rate. These methods are illustrated by an analysis of a yeast dataset to identify genes that are associated with growth in the presence of hydrogen peroxide.
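To make the compositional-regression setting concrete, here is a minimal sketch (my own toy example, not the dissertation's estimator) of a linear log-contrast model: least squares on log-composition covariates under the sum-to-zero constraint on the coefficients, solved through its KKT system.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
# Compositional covariates: each row sums to one; regression uses their logs.
comps = rng.dirichlet(np.ones(p), size=n)
X = np.log(comps)
beta = np.array([1.0, -1.0, 0.5, -0.5, 0.0])   # satisfies sum(beta) == 0
y = X @ beta + 0.01 * rng.standard_normal(n)

# Constrained least squares: minimize ||y - Xb||^2 subject to 1'b = 0,
# via the KKT system [[X'X, 1], [1', 0]] [b; lam] = [X'y; 0].
ones = np.ones((p, 1))
K = np.block([[X.T @ X, ones], [ones.T, np.zeros((1, 1))]])
rhs = np.append(X.T @ y, 0.0)
b_hat = np.linalg.solve(K, rhs)[:p]
```

The debiasing and post-selection machinery in the dissertation sits on top of estimators of this constrained form.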
Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling
The dynamic nature of proteins is crucial for determining their biological
functions and properties, for which Monte Carlo (MC) and molecular dynamics
(MD) simulations stand as predominant tools to study such phenomena. By
utilizing empirically derived force fields, MC or MD simulations explore the
conformational space through numerically evolving the system via Markov chain
or Newtonian mechanics. However, high energy barriers in the force fields make
barrier crossing a rare event, which hampers the exploration of both methods
and leaves the ensemble inadequately sampled without exhaustive running. Existing
learning-based approaches perform direct sampling yet heavily rely on
target-specific simulation data for training, which suffers from high data
acquisition cost and poor generalizability. Inspired by simulated annealing, we
propose Str2Str, a novel structure-to-structure translation framework capable
of zero-shot conformation sampling with roto-translation equivariant property.
Our method leverages an amortized denoising score matching objective trained on
general crystal structures and has no reliance on simulation data during both
training and inference. Experimental results across several benchmarking
protein systems demonstrate that Str2Str outperforms previous state-of-the-art
generative structure prediction models and can be orders of magnitude faster
compared to long MD simulations. Our open-source implementation is available at
https://github.com/lujiarui/Str2Str
Comment: Published as a conference paper at ICLR 2024, see
https://openreview.net/forum?id=C4BikKsgm
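The perturb-then-denoise idea behind a structure-to-structure translation can be caricatured in a few lines. This is a sketch of the mechanics only, not the actual model: the "learned" score is replaced by the analytic score of a Gaussian centred on the input, and the stochastic Langevin term of the reverse process is dropped.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(x, sigma):
    """Forward process: inject Gaussian noise into the coordinates."""
    return x + sigma * rng.standard_normal(x.shape)

def denoise(y, score_fn, steps=200, eps=0.1):
    """Reverse process, sketched as deterministic score ascent
    (the stochastic Langevin term is omitted for clarity)."""
    for _ in range(steps):
        y = y + eps * score_fn(y)
    return y

x0 = rng.standard_normal((5, 3))              # five "atoms" in 3-D
noisy = perturb(x0, sigma=1.0)                # perturbed conformation
recovered = denoise(noisy, lambda y: x0 - y)  # analytic Gaussian score
```

In Str2Str the score network is trained with denoising score matching on general crystal structures and is roto-translation equivariant; the round trip above only illustrates why noising an input structure and denoising it yields a new, nearby conformation.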
Fusing Neural and Physical: Augment Protein Conformation Sampling with Tractable Simulations
Protein dynamics are common and important for proteins' biological functions
and properties, and studying them usually involves time-consuming molecular
dynamics (MD) simulations in silico. Recently, generative models have been
leveraged as surrogate samplers to obtain conformation ensembles orders of
magnitude faster and without requiring any simulation data (a "zero-shot"
inference). However, being agnostic of the underlying energy landscape, such
generative models may still be limited in accuracy. In this work, we explore a
few-shot setting for such a pre-trained generative sampler that incorporates MD
simulations in a tractable manner. Specifically, given a target
protein of interest, we first acquire some seeding conformations from the
pre-trained sampler followed by a number of physical simulations in parallel
starting from these seeding samples. We then fine-tune the generative model on
the resulting simulation trajectories to obtain a target-specific sampler.
Experimental results demonstrate the superior performance of this few-shot
conformation sampler at a tractable computational cost.
Comment: Published at the GEM workshop, ICLR 202
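The seed-then-simulate-then-fine-tune pipeline can be sketched with toy stand-ins (all hypothetical, not the paper's implementation): a Gaussian "sampler", a relaxation toward a fixed basin playing the role of MD, and refitting the sampler mean playing the role of fine-tuning.

```python
import numpy as np

rng = np.random.default_rng(1)
TARGET = np.array([2.0, -1.0])   # stand-in for the true ensemble mean

def pretrained_sampler(mu, n):
    """Zero-shot sampler: a Gaussian around its current mean (toy)."""
    return mu + rng.standard_normal((n, 2))

def simulate(x, steps=50, dt=0.1):
    """Toy 'MD': relax a seed conformation toward the target basin."""
    frames = [x]
    for _ in range(steps):
        x = x + dt * (TARGET - x)
        frames.append(x)
    return np.stack(frames)

def few_shot_finetune(mu, n_seeds=8):
    """Seed from the sampler, simulate in parallel, refit the sampler."""
    seeds = pretrained_sampler(mu, n_seeds)
    data = np.concatenate([simulate(s) for s in seeds])
    return data.mean(axis=0)     # "fine-tuning" = refitting the mean

mu0 = np.zeros(2)                # pre-trained (zero-shot) sampler mean
mu1 = few_shot_finetune(mu0)     # few-shot, target-specific mean
```

Even in this caricature, pooling a handful of short trajectories pulls the sampler toward the physically relaxed ensemble, which is the effect the few-shot setting relies on.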
NOC: High-Quality Neural Object Cloning with 3D Lifting of Segment Anything
With the development of the neural field, reconstructing the 3D model of a
target object from multi-view inputs has recently attracted increasing
attention from the community. Existing methods normally learn a neural field
for the whole scene, while reconstructing a specific object indicated by the
user on the fly remains under-explored. Since the Segment Anything Model (SAM)
has shown its effectiveness in segmenting arbitrary 2D images, in this paper,
we propose Neural Object Cloning (NOC), a novel high-quality 3D object
reconstruction method, which leverages the benefits of both neural field and
SAM from two aspects. Firstly, to separate the target object from the scene, we
propose a novel strategy to lift the multi-view 2D segmentation masks of SAM
into a unified 3D variation field. The 3D variation field is then projected
into 2D space to generate new prompts for SAM. This process iterates until
convergence, separating the target object from the scene. Then, apart
from 2D masks, we further lift the 2D features of the SAM encoder into a 3D SAM
field in order to improve the reconstruction quality of the target object. NOC
lifts the 2D masks and features of SAM into the 3D neural field for
high-quality target object reconstruction. We conduct detailed experiments on
several benchmark datasets to demonstrate the advantages of our method. The
code will be released.
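The lift-and-project loop can be illustrated with a toy visual-hull version, where three axis-aligned binary masks stand in for SAM's multi-view segmentations and a boolean intersection stands in for the 3D variation field (a deliberate simplification, not the actual NOC field).

```python
import numpy as np

def lift(m_xy, m_xz, m_yz):
    """Back-project three axis-aligned 2D masks into a 3D field by
    intersection (a visual-hull style toy)."""
    return m_xy[:, :, None] & m_xz[:, None, :] & m_yz[None, :, :]

def project(field):
    """Render the 3D field back to the three 2D views."""
    return field.any(axis=2), field.any(axis=1), field.any(axis=0)

def segment_object(m_xy, m_xz, m_yz, max_iters=10):
    """Iterate lift -> project until the masks stop changing."""
    masks = (m_xy, m_xz, m_yz)
    for _ in range(max_iters):
        field = lift(*masks)
        new_masks = project(field)
        if all((a == b).all() for a, b in zip(masks, new_masks)):
            break
        masks = new_masks
    return field
```

In NOC the projection step produces new 2D prompts that are fed back to SAM, so each round actually re-runs the segmenter; the fixed-point structure of the loop is the same.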
MMIDR: Teaching Large Language Model to Interpret Multimodal Misinformation via Knowledge Distillation
Automatic detection of multimodal misinformation has gained widespread
attention recently. However, the potential of powerful Large Language Models
(LLMs) for multimodal misinformation detection remains underexplored. Moreover,
how to teach LLMs to interpret multimodal misinformation in a cost-effective
and accessible way is still an open question. To address this, we propose
MMIDR, a framework designed to teach LLMs to provide fluent, high-quality
textual explanations for their decisions about multimodal misinformation. To
convert multimodal misinformation into an appropriate instruction-following
format, we present a data augmentation perspective and pipeline. This pipeline
consists of a visual information processing module and an evidence retrieval
module. Subsequently, we prompt the proprietary LLMs with processed contents to
extract rationales for interpreting the authenticity of multimodal
misinformation. Furthermore, we design an efficient knowledge distillation
approach to distill the capability of proprietary LLMs in explaining multimodal
misinformation into open-source LLMs. To explore several research questions
regarding the performance of LLMs in multimodal misinformation detection tasks,
we construct an instruction-following multimodal misinformation dataset and
conduct comprehensive experiments. The experimental findings reveal that our
MMIDR exhibits sufficient detection performance and possesses the capacity to
provide compelling rationales to support its assessments.
Comment: 10 pages, 3 figures
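At the core of any such distillation step is matching the student's output distribution to the teacher's. A minimal numerical sketch of that objective on a three-token toy vocabulary (hypothetical logits; gradient descent on KL(p || q), not MMIDR's actual training recipe):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    """KL divergence KL(p || q) between two distributions."""
    return float(np.sum(p * np.log(p / q)))

# Teacher distribution over a tiny vocabulary (hypothetical logits).
teacher_logits = np.array([2.0, 0.5, -1.0])
p = softmax(teacher_logits)

# Student starts uniform; the gradient of KL(p||q) w.r.t. its logits is q - p.
student_logits = np.zeros(3)
for _ in range(1000):
    q = softmax(student_logits)
    student_logits -= 0.5 * (q - p)
```

In MMIDR the teacher's rationales are textual, so distillation happens at the sequence level over instruction-following data, but the per-token objective reduces to this kind of distribution matching.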
PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding
We are now witnessing significant progress of deep learning methods on a
variety of protein tasks and datasets. However, there is a lack of a
standard benchmark to evaluate the performance of different methods, which
hinders the progress of deep learning in this field. In this paper, we propose
such a benchmark called PEER, a comprehensive and multi-task benchmark for
Protein sEquence undERstanding. PEER provides a set of diverse protein
understanding tasks including protein function prediction, protein localization
prediction, protein structure prediction, protein-protein interaction
prediction, and protein-ligand interaction prediction. We evaluate different
types of sequence-based methods for each task including traditional feature
engineering approaches, different sequence encoding methods as well as
large-scale pre-trained protein language models. In addition, we also
investigate the performance of these methods under the multi-task learning
setting. Experimental results show that large-scale pre-trained protein
language models achieve the best performance for most individual tasks, and
jointly training multiple tasks further boosts the performance. The datasets
and source codes of this benchmark are all available at
https://github.com/DeepGraphLearning/PEER_Benchmark
Comment: Accepted by NeurIPS 2022 Dataset and Benchmark Track. arXiv v2:
source code released; arXiv v1: released all benchmark results
5IDER: Unified Query Rewriting for Steering, Intent Carryover, Disfluencies, Entity Carryover and Repair
Providing voice assistants the ability to navigate multi-turn conversations
is a challenging problem. Handling multi-turn interactions requires the system
to understand various conversational use-cases, such as steering, intent
carryover, disfluencies, entity carryover, and repair. The complexity of this
problem is compounded by the fact that these use-cases mix with each other,
often appearing simultaneously in natural language. This work proposes a
non-autoregressive query rewriting architecture that can handle not only the
five aforementioned tasks, but also complex compositions of these use-cases. We
show that our proposed model has competitive single task performance compared
to the baseline approach, and even outperforms a fine-tuned T5 model in
use-case compositions, despite being 15 times smaller in parameters and 25
times faster in latency.
Comment: Interspeech 202