Model-Free, Regret-Optimal Best Policy Identification in Online CMDPs
This paper considers the best policy identification (BPI) problem in online
Constrained Markov Decision Processes (CMDPs). We are interested in algorithms
that are model-free, have low regret, and identify an optimal policy with
high probability. Existing model-free algorithms for online CMDPs with
sublinear regret and constraint violation do not provide any convergence
guarantee to an optimal policy and provide only average performance guarantees
when a policy is uniformly sampled at random from all previously used policies.
In this paper, we develop a new algorithm, named
Pruning-Refinement-Identification (PRI), based on a fundamental structural
property of CMDPs proved in Koole (1988) and Ross (1989), which we call limited
stochasticity. The property says that for a CMDP with a given number of
constraints, there exists an optimal policy whose number of stochastic
decisions is bounded by the number of constraints.
The proposed algorithm first identifies at which step and in which state a
stochastic decision has to be taken and then fine-tunes the distributions of
these stochastic decisions. PRI achieves three objectives: (i) PRI is a
model-free algorithm; (ii) it outputs a near-optimal policy with high
probability at the end of learning; and (iii) in the tabular setting, PRI
guarantees $\tilde{\mathcal{O}}(\sqrt{K})$ regret and constraint violation,
which significantly improves the best existing regret bound
under a model-free algorithm, where
$K$ is the total number of episodes.
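The two-phase idea above (identify where to randomize, then fine-tune the mixture) can be sketched minimally. The representation below, including all names and the example numbers, is an illustration of the limited-stochasticity structure, not the authors' implementation:

```python
import random

def make_limited_stochastic_policy(greedy_action, stochastic_points):
    """Return a policy that is deterministic everywhere except at the
    identified (step, state) pairs, where it mixes two candidate actions.

    greedy_action: dict mapping (step, state) -> action
    stochastic_points: dict mapping (step, state) -> (action_a, action_b, p),
                       meaning: play action_a with probability p, else action_b.
    """
    def policy(step, state, rng=random):
        if (step, state) in stochastic_points:
            a, b, p = stochastic_points[(step, state)]
            return a if rng.random() < p else b
        return greedy_action[(step, state)]
    return policy

# Example: one constraint -> at most one stochastic decision, here at
# step 1 in state "s2", where the mixing probability p = 0.3 would be
# the quantity fine-tuned in the refinement phase.
greedy = {(0, "s1"): "left", (1, "s2"): "left", (1, "s3"): "right"}
policy = make_limited_stochastic_policy(
    greedy, {(1, "s2"): ("left", "right", 0.3)})

print(policy(0, "s1"))  # deterministic everywhere except (1, "s2")
```

The point of the structure is that learning reduces to a small number of scalar mixing probabilities once the deterministic part of the policy is pinned down.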
Computational and Statistical Thresholds in Multi-layer Stochastic Block Models
We study the problem of community recovery and detection in multi-layer
stochastic block models, focusing on the critical network density threshold for
consistent community structure inference. Using a prototypical two-block model,
we reveal a computational barrier for such multi-layer stochastic block models
that does not exist for their single-layer counterparts: When there are no
computational constraints, the density threshold depends linearly on the number
of layers. However, when restricted to polynomial-time algorithms, the density
threshold scales with the square root of the number of layers, assuming
correctness of a low-degree polynomial hardness conjecture. Our results provide
a nearly complete picture of optimal inference in multi-layer stochastic
block models and partially settle the open question in Lei and Lin (2022)
regarding the optimality of the bias-adjusted spectral method.
Comment: 31 pages
ChatGPT Informed Graph Neural Network for Stock Movement Prediction
ChatGPT has demonstrated remarkable capabilities across various natural
language processing (NLP) tasks. However, its potential for inferring dynamic
network structures from temporal textual data, specifically financial news,
remains an unexplored frontier. In this research, we introduce a novel
framework that leverages ChatGPT's graph inference capabilities to enhance
Graph Neural Networks (GNNs). Our framework extracts evolving network
structures from textual data and incorporates these networks into graph neural
networks for subsequent predictive tasks. Experimental results on stock
movement forecasting indicate that our model consistently outperforms
state-of-the-art deep-learning benchmarks. Furthermore, the portfolios
constructed based on our model's outputs demonstrate higher annualized
cumulative returns, alongside reduced volatility and maximum drawdown. This
superior performance highlights the potential of ChatGPT for text-based network
inferences and underscores its promising implications for the financial sector.
Comment: Under Review. 10 pages, 2 figures
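The pipeline (LLM-inferred edges feeding a GNN) can be sketched as follows. The hardcoded edge list stands in for relations an LLM would extract from financial news, and the single normalized propagation step is a generic GCN layer, not the paper's exact architecture:

```python
import numpy as np

# Stand-in for relations an LLM might extract from a day's financial news;
# in the actual framework these would come from prompting ChatGPT.
extracted_edges = [("AAPL", "MSFT"), ("MSFT", "NVDA"), ("XOM", "CVX")]
tickers = ["AAPL", "MSFT", "NVDA", "XOM", "CVX"]
idx = {t: i for i, t in enumerate(tickers)}

# Build a symmetric adjacency matrix with self-loops from the extracted edges.
n = len(tickers)
A = np.eye(n)
for u, v in extracted_edges:
    A[idx[u], idx[v]] = A[idx[v], idx[u]] = 1.0

# One GCN-style propagation step: ReLU(D^{-1/2} A D^{-1/2} X W).
deg_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
A_norm = deg_inv_sqrt[:, None] * A * deg_inv_sqrt[None, :]
X = np.random.default_rng(0).standard_normal((n, 4))  # toy per-stock features
W = np.random.default_rng(1).standard_normal((4, 2))  # toy layer weights
H = np.maximum(A_norm @ X @ W, 0.0)

print(H.shape)  # (5, 2): one embedding per stock
```

Each day's news would yield a fresh adjacency matrix, so the graph the GNN propagates over evolves with the text stream.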
AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant
It remains a pipe dream that personal AI assistants on phones and AR
glasses can assist in our daily lives by answering questions like ``how to
adjust the date for this watch?'' and ``how to set its heating duration?''
(while pointing at an oven). The queries used in conventional tasks (i.e., Video
Question Answering, Video Retrieval, Moment Localization) are often factoid and
based on pure text. In contrast, we present a new task called Task-oriented
Question-driven Video Segment Retrieval (TQVSR). Each of our questions is an
image-box-text query that focuses on the affordance of items in daily life and
expects relevant answer segments to be retrieved from a corpus of instructional
video-transcript segments. To support the study of this TQVSR task, we
construct a new dataset called AssistSR. We design novel guidelines to create
high-quality samples. This dataset contains 3.2k multimodal questions on 1.6k
video segments from instructional videos on diverse daily-used items. To
address TQVSR, we develop a simple yet effective model called Dual Multimodal
Encoders (DME) that significantly outperforms several baseline methods while
still leaving large room for future improvement. Moreover, we present
detailed ablation analyses. Code and data are available at
\url{https://github.com/StanLei52/TQVSR}.
Comment: 20 pages, 12 figures
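The dual-encoder retrieval setup can be sketched with toy vectors standing in for the actual image-box-text and video-transcript encoders (whose architectures are not specified here); retrieval reduces to ranking segments by similarity in the shared embedding space:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy stand-ins: in a dual-encoder model like DME, one encoder embeds the
# image-box-text query and the other embeds each video-transcript segment
# into the same space. Random vectors replace the real encoders here.
rng = np.random.default_rng(0)
query_emb = rng.standard_normal(8)
segment_embs = {f"segment_{i}": rng.standard_normal(8) for i in range(5)}

# Retrieval = rank the corpus of segments by similarity to the query.
ranked = sorted(segment_embs,
                key=lambda s: cosine(query_emb, segment_embs[s]),
                reverse=True)
print(ranked[0])  # the segment retrieved as the best answer
```

Because the two encoders are decoupled, segment embeddings can be precomputed offline and the query side evaluated once at retrieval time.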
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
Recent research on Large Language Models (LLMs) has led to remarkable
advancements in general NLP AI assistants. Some studies have further explored
the use of LLMs for planning and invoking models or APIs to address more
general multi-modal user queries. Despite this progress, complex visual-based
tasks remain challenging due to the diverse nature of visual tasks. This
diversity is reflected in two aspects: 1) Reasoning paths. For many real-life
applications, it is hard to accurately decompose a query simply by examining
the query itself. Planning based on the specific visual content and the results
of each step is usually required. 2) Flexible inputs and intermediate results.
Input forms can be flexible in in-the-wild cases, involving not only a
single image or video but a mixture of videos and images, e.g., a user-view
image with some reference videos. Besides, a complex reasoning process will
also generate diverse multimodal intermediate results, e.g., video narrations,
segmented video clips, etc. To address such general cases, we propose a
multi-modal AI assistant, AssistGPT, with an interleaved code and language
reasoning approach called Plan, Execute, Inspect, and Learn (PEIL) to integrate
LLMs with various tools. Specifically, the Planner uses natural language to
decide which tool in the Executor should act next based on the current
reasoning progress. The Inspector is an efficient memory manager that assists
the Planner in feeding the proper visual information into a specific tool.
Finally, since the entire reasoning process is complex and flexible, the
Learner is designed to enable the model to autonomously explore and discover
the optimal solution. We
conducted experiments on A-OKVQA and NExT-QA benchmarks, achieving
state-of-the-art results. Moreover, qualitative showcases demonstrate the
ability of our system to handle questions far more complex than those found in
the benchmarks.
Comment: Project page: https://showlab.github.io/assistgpt
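A minimal control loop in the spirit of Plan-Execute-Inspect might look as follows; the tool names, the hardcoded planner, and the omission of the Learner are all simplifications, not AssistGPT's actual implementation:

```python
# Illustrative plan-execute-inspect loop; all names here are assumptions.
def caption_tool(memory):
    # Stand-in for a visual tool that describes the user's view.
    return "a person sets the oven timer"

def answer_tool(memory):
    # Stand-in for an answering tool that reads earlier intermediate results.
    return f"Answer based on: {memory['caption']}"

TOOLS = {"caption": caption_tool, "answer": answer_tool}

def planner(memory):
    # A real Planner would prompt an LLM with the reasoning progress so far;
    # here we hardcode: caption first, then answer, then stop.
    if "caption" not in memory:
        return "caption"
    if "answer" not in memory:
        return "answer"
    return None

def run(query):
    memory = {"query": query}  # the Inspector's role: track intermediates
    while (tool := planner(memory)) is not None:
        memory[tool] = TOOLS[tool](memory)  # execute and record the result
    return memory["answer"]

print(run("How do I set the heating duration?"))
```

The key property the loop captures is that each planning step conditions on the intermediate results accumulated so far, rather than decomposing the query once up front.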