129 research outputs found
Agents meet OKR: An Object and Key Results Driven Agent System with Hierarchical Self-Collaboration and Self-Evaluation
In this study, we introduce the concept of OKR-Agent, designed to enhance the
capabilities of Large Language Models (LLMs) in task-solving. Our approach
utilizes both self-collaboration and self-correction mechanisms, facilitated by
hierarchical agents, to address the inherent complexities in task-solving. Our
key observations are two-fold: first, effective task-solving demands in-depth
domain knowledge and intricate reasoning, for which deploying specialized
agents for individual sub-tasks can markedly enhance LLM performance. Second,
task-solving intrinsically adheres to a hierarchical execution structure,
comprising both high-level strategic planning and detailed task execution.
Towards this end, our OKR-Agent paradigm aligns closely with this hierarchical
structure, promising enhanced efficacy and adaptability across a range of
scenarios. Specifically, our framework includes two novel modules: hierarchical
Objects and Key Results generation and multi-level evaluation, each
contributing to more efficient and robust task-solving. In practice,
hierarchical OKR generation decomposes Objects into multiple sub-Objects and
assigns new agents based on key results and agent responsibilities. These
agents subsequently elaborate on their designated tasks and may further
decompose them as necessary. Such generation operates recursively and
hierarchically, culminating in a comprehensive set of detailed solutions. The
multi-level evaluation module of OKR-Agent refines solutions by leveraging
feedback from all associated agents, optimizing each step of the process. This
ensures the solution is accurate, practical, and effectively addresses intricate task
requirements, enhancing the overall reliability and quality of the outcome.
Experimental results also show that our method outperforms previous methods on
several tasks. Code and demo are available at https://okr-agent.github.io
Embedding Heterogeneous Networks into Hyperbolic Space Without Meta-path
Networks found in the real-world are numerous and varied. A common type of
network is the heterogeneous network, where the nodes (and edges) can be of
different types. Accordingly, there have been efforts at learning
representations of these heterogeneous networks in low-dimensional space.
However, most of the existing heterogeneous network embedding methods suffer
from the following two drawbacks: (1) The target space is usually Euclidean.
Conversely, many recent works have shown that complex networks may have
hyperbolic latent geometry, which is non-Euclidean. (2) These methods usually
rely on meta-paths, which require domain-specific prior knowledge for meta-path
selection. Additionally, different downstream tasks on the same network
might require different meta-paths in order to generate task-specific
embeddings. In this paper, we propose a novel self-guided random walk method
that does not require meta-paths for embedding heterogeneous networks into
hyperbolic space. We conduct thorough experiments for the tasks of network
reconstruction and link prediction on two public datasets, showing that our
model outperforms a variety of well-known baselines across all tasks.Comment: In proceedings of the 35th AAAI Conference on Artificial Intelligenc
Implicit Neural Deformation for Sparse-View Face Reconstruction
In this work, we present a new method for 3D face reconstruction from
sparse-view RGB images. Unlike previous methods which are built upon 3D
morphable models (3DMMs) with limited details, we leverage an implicit
representation to encode rich geometric features. Our overall pipeline consists
of two major components, including a geometry network, which learns a
deformable neural signed distance function (SDF) as the 3D face representation,
and a rendering network, which learns to render on-surface points of the neural
SDF to match the input images via self-supervised optimization. To handle
in-the-wild sparse-view input of the same target with different expressions at
test time, we propose a residual latent code to effectively expand the shape
space of the learned implicit face representation as well as a novel
view-switch loss to enforce consistency among different views. Our experimental
results on several benchmark datasets demonstrate that our approach outperforms
alternative baselines and achieves superior face reconstruction results
compared to state-of-the-art methods.
Comment: 10 pages, 6 figures. The 30th Pacific Conference on Computer Graphics and Applications, Pacific Graphics (PG) 2022.
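As a minimal sketch of the kind of latent-conditioned SDF the pipeline builds on, the PyTorch module below queries signed distance at 3D points given a base identity code plus a per-scan residual code; the layer sizes and the simple additive residual are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class LatentSDF(nn.Module):
    """Signed distance function conditioned on a latent code. A residual
    code added to the base code (illustrative) expands the shape space,
    e.g. for unseen expressions of the same subject at test time."""
    def __init__(self, latent_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # signed distance to the face surface
        )

    def forward(self, xyz, code, residual):
        z = (code + residual).expand(xyz.shape[0], -1)
        return self.mlp(torch.cat([xyz, z], dim=-1))

sdf = LatentSDF()
pts = torch.rand(1024, 3) * 2 - 1               # query points in [-1, 1]^3
base = torch.zeros(1, 64)                       # shared identity code
res = torch.zeros(1, 64, requires_grad=True)    # optimized per test scan
d = sdf(pts, base, res)                         # (1024, 1) signed distances
```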
Multi-Modal Face Stylization with a Generative Prior
In this work, we introduce a new approach for artistic face stylization.
Despite existing methods achieving impressive results in this task, there is
still room for improvement in generating high-quality stylized faces with
diverse styles and accurate facial reconstruction. Our proposed framework,
MMFS, supports multi-modal face stylization by leveraging the strengths of
StyleGAN and integrating it into an encoder-decoder architecture. Specifically,
we use the mid-resolution and high-resolution layers of StyleGAN as the decoder
to generate high-quality faces, while aligning its low-resolution layer with
the encoder to extract and preserve input facial details. We also introduce a
two-stage training strategy, where we train the encoder in the first stage to
align the feature maps with StyleGAN and enable a faithful reconstruction of
input faces. In the second stage, the entire network is fine-tuned with
artistic data for stylized face generation. To enable the fine-tuned model to
be applied in zero-shot and one-shot stylization tasks, we train an additional
mapping network from the large-scale Contrastive-Language-Image-Pre-training
(CLIP) space to a latent space of fine-tuned StyleGAN. Qualitative and
quantitative experiments show that our framework achieves superior face
stylization performance in both one-shot and zero-shot stylization tasks,
outperforming state-of-the-art methods by a large margin.
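To give a sense of the zero-/one-shot mechanism, here is a hypothetical sketch of the extra mapping network: an MLP from a CLIP embedding (a text prompt or a single reference image) into a latent that conditions the fine-tuned StyleGAN decoder. Dimensions and depth are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ClipToStyle(nn.Module):
    """Maps a CLIP embedding into a StyleGAN-like latent space so a style
    can be named by text (zero-shot) or one example image (one-shot)."""
    def __init__(self, clip_dim: int = 512, w_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(clip_dim, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, w_dim),
        )

    def forward(self, clip_embedding: torch.Tensor) -> torch.Tensor:
        return self.net(clip_embedding)

mapper = ClipToStyle()
e = torch.randn(1, 512)   # stand-in for a CLIP text/image embedding
w = mapper(e)             # latent that would drive the stylization decoder
```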
Task-Aware Sampling Layer for Point-Wise Analysis
Sampling, grouping, and aggregation are three important components in the
multi-scale analysis of point clouds. In this paper, we present a novel
data-driven sampler learning strategy for point-wise analysis tasks. Unlike the
widely used sampling technique, Farthest Point Sampling (FPS), we propose to
learn sampling and downstream applications jointly. Our key insight is that
uniform sampling methods like FPS are not always optimal for different tasks:
in segmentation, for example, sampling more points around boundary areas can
make point-wise classification easier. Towards this end, we propose a novel
sampler learning strategy that learns sampling point displacement supervised by
task-related ground truth information and can be trained jointly with the
underlying tasks. We further demonstrate our methods in various point-wise
analysis tasks, including semantic part segmentation, point cloud completion,
and keypoint detection. Our experiments show that joint learning of the
sampler and the task brings better performance than using FPS in various
point-based networks.
Comment: 14 pages, 13 figures, and 14 tables.
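For contrast with the learned sampler, the sketch below implements the FPS baseline and shows, in spirit, how a predicted per-point displacement would shift samples toward task-relevant regions; `predict_delta` is a hypothetical stand-in for the trainable displacement network.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, k: int) -> np.ndarray:
    """Uniform baseline: greedily pick the point farthest from all chosen."""
    chosen = [0]
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dist))
        chosen.append(idx)
        dist = np.minimum(dist, np.linalg.norm(points - points[idx], axis=1))
    return points[chosen]

def task_aware_sample(points, k, predict_delta):
    """Displace FPS seeds with a learned offset so samples concentrate in
    task-relevant regions (e.g. part boundaries for segmentation)."""
    seeds = farthest_point_sampling(points, k)
    return seeds + predict_delta(seeds)

pts = np.random.rand(2048, 3)
samples = task_aware_sample(pts, 128, lambda s: np.zeros_like(s))  # zero offset = plain FPS
```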
An Empirical Survey of Unsupervised Text Representation Methods on Twitter Data
The field of NLP has seen unprecedented achievements in recent years. Most
notably, with the advent of large-scale pre-trained Transformer-based language
models, such as BERT, there has been a noticeable improvement in text
representation. It is, however, unclear whether these improvements translate to
noisy user-generated text, such as tweets. In this paper, we present an
experimental survey of a wide range of well-known text representation
techniques for the task of text clustering on noisy Twitter data. Our results
indicate that the more advanced models do not necessarily work best on tweets
and that more exploration in this area is needed.
Comment: In proceedings of the 6th Workshop on Noisy User-generated Text (W-NUT) at EMNLP 2020.
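A minimal version of the survey's experimental setup, embed tweets and then cluster them, might look like the sketch below; the toy tweets are invented, and the TF-IDF baseline stands in for the many representation methods compared (a BERT-style model would simply replace `X` with its sentence vectors).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

tweets = [  # invented examples of noisy user-generated text
    "cant wait for the game tonight!!",
    "who else is watching the match??",
    "new phone just dropped, specs look insane",
    "battery life on this phone is unreal",
]

X = TfidfVectorizer().fit_transform(tweets)      # one representation method
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels, silhouette_score(X, labels))       # cluster quality, higher is better
```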