257 research outputs found
TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation
The emergence of Large Language Models (LLMs) like ChatGPT has inspired the
development of LLM-based agents capable of addressing complex, real-world
tasks. However, these agents often struggle during task execution due to
methodological constraints, such as error propagation and limited adaptability.
To address this issue, we propose a multi-agent framework based on dynamic Task
Decomposition and Agent Generation (TDAG). This framework dynamically
decomposes complex tasks into smaller subtasks and assigns each to a
specifically generated subagent, thereby enhancing adaptability in diverse and
unpredictable real-world tasks. Simultaneously, existing benchmarks often lack
the granularity needed to evaluate incremental progress in complex, multi-step
tasks. In response, we introduce ItineraryBench in the context of travel
planning, featuring interconnected, progressively complex tasks with a
fine-grained evaluation system. ItineraryBench is designed to assess agents'
abilities in memory, planning, and tool usage across tasks of varying
complexity. Our experimental results reveal that TDAG significantly outperforms
established baselines, showcasing its superior adaptability and context
awareness in complex task scenarios.
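
The decompose-then-assign loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the TDAG implementation: `decompose`, `generate_subagent`, and `replan` are hypothetical stand-ins for what would be LLM calls in a real system, and splitting the task on semicolons is an assumption made only to keep the example runnable.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Subtask:
    description: str
    result: str = ""


def decompose(task: str) -> List[Subtask]:
    # Hypothetical stand-in for an LLM planner: split the task on ';'.
    return [Subtask(part.strip()) for part in task.split(";") if part.strip()]


def generate_subagent(subtask: Subtask) -> Callable[[Subtask, List[str]], str]:
    # Hypothetical: build an agent specialised for this one subtask.
    name = "agent-for-" + subtask.description.split()[0]

    def agent(st: Subtask, context: List[str]) -> str:
        return f"{name}: {st.description} (seen {len(context)} prior results)"

    return agent


def replan(pending: List[Subtask], last_result: str) -> List[Subtask]:
    # Hypothetical hook: a real system could rewrite the remaining subtasks
    # after each result. Here the plan is returned unchanged.
    return pending


def run(task: str) -> List[str]:
    pending = decompose(task)
    results: List[str] = []
    while pending:
        subtask = pending.pop(0)
        agent = generate_subagent(subtask)          # one subagent per subtask
        subtask.result = agent(subtask, results)    # earlier results act as memory
        results.append(subtask.result)
        pending = replan(pending, subtask.result)   # dynamic adjustment point
    return results


if __name__ == "__main__":
    for line in run("book train; reserve hotel"):
        print(line)
```

Keeping `replan` inside the loop is the design point: the remaining subtasks can be revised after every step instead of being fixed up front.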
Perceive, Excavate and Purify: A Novel Object Mining Framework for Instance Segmentation
Recently, instance segmentation has made great progress with the rapid
development of deep neural networks. However, there still exist two main
challenges including discovering indistinguishable objects and modeling the
relationship between instances. To deal with these difficulties, we propose a
novel object mining framework for instance segmentation. In this framework, we
first introduce the semantics perceiving subnetwork to capture pixels that may
belong to an obvious instance from the bottom up. Then, we propose an object
excavating mechanism to discover indistinguishable objects. In the mechanism,
preliminary perceived semantics are regarded as original instances with
classifications and locations, and then indistinguishable objects around these
original instances are mined, which ensures that hard objects are fully
excavated. Next, an instance purifying strategy is put forward to model the
relationship between instances, which pulls the similar instances close and
pushes away different instances to keep intra-instance similarity and
inter-instance discrimination. In this manner, identical objects are merged into
a single instance and different objects are distinguished as independent
instances. Extensive experiments on the COCO dataset show that the proposed
approach outperforms state-of-the-art methods, which validates the
effectiveness of the proposed object mining framework. Comment: Accepted by CVPR Workshops 202
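
The "pull similar instances close, push different instances away" idea can be illustrated with a generic margin-based contrastive loss. The abstract does not give the paper's actual objective, so the function below is an assumption: `pull_push_loss`, its `margin` parameter, and the squared-distance form are illustrative choices only.

```python
import math
from itertools import combinations


def pull_push_loss(embeddings, labels, margin=1.0):
    """Generic contrastive loss over all pairs of instance embeddings.

    Same-label pairs are penalised by their squared distance (pull);
    different-label pairs are penalised only when closer than `margin` (push).
    """
    pull, push = [], []
    for (e1, l1), (e2, l2) in combinations(zip(embeddings, labels), 2):
        d = math.dist(e1, e2)
        if l1 == l2:
            pull.append(d ** 2)
        else:
            push.append(max(0.0, margin - d) ** 2)

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return mean(pull) + mean(push)


if __name__ == "__main__":
    # Two embeddings of the same object coincide; a third object is far away.
    print(pull_push_loss([(0, 0), (0, 0), (3, 4)], [0, 0, 1]))
    # Two different objects that sit closer than the margin are penalised.
    print(pull_push_loss([(0, 0), (0.5, 0)], [0, 1]))
```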
Motion-state Alignment for Video Semantic Segmentation
In recent years, video semantic segmentation has made great progress with
advanced deep neural networks. However, there still exist two main challenges,
i.e., information inconsistency and computation cost. To deal with the two
difficulties, we propose a novel motion-state alignment framework for video
semantic segmentation to keep both motion and state consistency. In the
framework, we first construct a motion alignment branch armed with an efficient
decoupled transformer to capture dynamic semantics, guaranteeing region-level
temporal consistency. Then, a state alignment branch composed of a stage
transformer is designed to enrich feature spaces for the current frame to
extract static semantics and achieve pixel-level state consistency. Next, by a
semantic assignment mechanism, the region descriptor of each semantic category
is gained from dynamic semantics and linked with pixel descriptors from static
semantics. Benefiting from the alignment of these two kinds of effective
information, the proposed method picks up dynamic and static semantics in a
targeted way, so that video semantic regions are consistently segmented to
obtain precise locations with low computational complexity. Extensive
experiments on Cityscapes and CamVid datasets show that the proposed approach
outperforms state-of-the-art methods and validates the effectiveness of the
motion-state alignment framework. Comment: Accepted by CVPR Workshops 202
Text2Street: Controllable Text-to-image Generation for Street Views
Text-to-image generation has made remarkable progress with the emergence of
diffusion models. However, generating street-view images from text remains
difficult, mainly because the road topology of street scenes is complex,
traffic status is diverse, and weather conditions vary, all of which
conventional text-to-image models struggle to handle. To
address these challenges, we propose a novel controllable text-to-image
framework, named Text2Street. In the framework, we first introduce the
lane-aware road topology generator, which achieves text-to-map generation with
the accurate road structure and lane lines armed with the counting adapter,
realizing the controllable road topology generation. Then, the position-based
object layout generator is proposed to obtain text-to-layout generation through
an object-level bounding box diffusion strategy, realizing the controllable
traffic object layout generation. Finally, the multiple control image generator
is designed to integrate the road topology, object layout and weather
description to realize controllable street-view image generation. Extensive
experiments show that the proposed approach achieves controllable street-view
text-to-image generation and validates the effectiveness of the Text2Street
framework for street views.
Revisiting Non-Autoregressive Translation at Scale
In real-world systems, scaling has been critical for improving the
translation quality in autoregressive translation (AT), which however has not
been well studied for non-autoregressive translation (NAT). In this work, we
bridge the gap by systematically studying the impact of scaling on NAT
behaviors. Extensive experiments on six WMT benchmarks over two advanced NAT
models show that scaling can alleviate the commonly-cited weaknesses of NAT
models, resulting in better translation performance. To reduce the side-effect
of scaling on decoding speed, we empirically investigate the impact of NAT
encoder and decoder on the translation performance. Experimental results on the
large-scale WMT20 En-De show that the asymmetric architecture (e.g. bigger
encoder and smaller decoder) can achieve comparable performance with the
scaling model, while maintaining the superiority of decoding speed with
standard NAT models. To this end, we establish a new benchmark by validating
scaled NAT models on the scaled dataset, which can be regarded as a strong
baseline for future works. We release code and system outputs at
https://github.com/DeepLearnXMU/Scaling4NAT. Comment: 13 pages, Findings of ACL 202
Changes in interleukin-27 levels in patients with acute coronary syndrome and their clinical significance
Background: This study evaluated changes in interleukin (IL)-27 levels in patients with acute coronary syndrome (ACS) and their influence on Th1, Th2, and Th17 cells.
Methods: Serum levels of IL-27, IL-4, IL-17, and interferon (IFN)-γ in healthy subjects as well as patients with ACS, including stable angina pectoris (SA), unstable angina pectoris (UA), and acute myocardial infarction (AMI), were determined using an enzyme-linked immunosorbent assay. The proportions of Th1, Th2, and Th17 cells among peripheral blood mononuclear cells (PBMCs) were measured using flow cytometry after incubation with phorbol myristate acetate (PMA) for 4 h. The proportions of Th1 and Th17 cells among PBMCs in AMI and UA were detected after stimulation with IL-27 or PMA + IL-27 for 4, 8, and 12 h.
Results: Serum levels of IL-27 in patients with AMI and UA were significantly lower than those in SA and control groups, while serum levels of IL-17 and IFN-γ in AMI and UA groups were dramatically increased compared to those in SA and healthy control groups. However, there were no statistically significant differences in serum IL-4. The proportions of Th1 and Th17 cells among PBMCs were statistically significantly higher in the AMI and UA groups than those in the SA and control groups, while there was no statistically significant difference in the proportion of Th2 cells among different groups. For patients with AMI and UA, the effect of co-stimulation of PBMCs with PMA and IL-27 was not significantly different from that of PMA single stimulation, while PMA + IL-27 co-stimulation lowered the Th17 cell proportion significantly compared to PMA single stimulation.
Discussion: Compared to SA patients and healthy controls, patients with ACS (AMI + UA) had lower serum levels of IL-27 and higher proportions of PBMC Th1 and Th17 cells, which could be attributed to the inhibitory effects of IL-27 on the proliferation of Th17 cells. These results indicated that IL-27 could be a novel therapeutic target in ACS patients.
Representation Learning with Large Language Models for Recommendation
Recommender systems have seen significant advancements with the influence of
deep learning and graph neural networks, particularly in capturing complex
user-item relationships. However, these graph-based recommenders heavily depend
on ID-based data, potentially disregarding valuable textual information
associated with users and items, resulting in less informative learned
representations. Moreover, the utilization of implicit feedback data introduces
potential noise and bias, posing challenges for the effectiveness of user
preference learning. While the integration of large language models (LLMs) into
traditional ID-based recommenders has gained attention, challenges such as
scalability issues, limitations in text-only reliance, and prompt input
constraints need to be addressed for effective implementation in practical
recommender systems. To address these challenges, we propose a model-agnostic
framework RLMRec that aims to enhance existing recommenders with LLM-empowered
representation learning. It proposes a recommendation paradigm that integrates
representation learning with LLMs to capture intricate semantic aspects of user
behaviors and preferences. RLMRec incorporates auxiliary textual signals,
develops a user/item profiling paradigm empowered by LLMs, and aligns the
semantic space of LLMs with the representation space of collaborative
relational signals through a cross-view alignment framework. This work further
establishes a theoretical foundation demonstrating that incorporating textual
signals through mutual information maximization enhances the quality of
representations. In our evaluation, we integrate RLMRec with state-of-the-art
recommender models, while also analyzing its efficiency and robustness to noisy
data. Our implementation code is available at
https://github.com/HKUDS/RLMRec. Comment: Published as a WWW'24 full pape
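
One common way to maximize mutual information between two views is an InfoNCE-style contrastive objective, sketched below for aligning collaborative-filtering embeddings with text-derived embeddings. The abstract does not specify the objective RLMRec uses, so the names `info_nce`, `cf_embs`, `text_embs`, and the `temperature` value are assumptions for illustration only.

```python
import math


def cosine(u, v):
    # Cosine similarity between two non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(cf_embs, text_embs, temperature=0.2):
    """Mean InfoNCE loss; row i of each view is assumed to be the same user or item.

    The matching text embedding is the positive; every other row is a negative.
    A lower loss means the two embedding spaces are better aligned.
    """
    total = 0.0
    for i, u in enumerate(cf_embs):
        logits = [cosine(u, v) / temperature for v in text_embs]
        log_denominator = math.log(sum(math.exp(x) for x in logits))
        total += log_denominator - logits[i]
    return total / len(cf_embs)


if __name__ == "__main__":
    cf = [(1.0, 0.0), (0.0, 1.0)]
    aligned = info_nce(cf, [(1.0, 0.0), (0.0, 1.0)])
    swapped = info_nce(cf, [(0.0, 1.0), (1.0, 0.0)])
    print(aligned < swapped)
```

Minimizing this loss raises a lower bound on the mutual information between the two views, which is why it is a natural fit for the cross-view alignment the abstract describes.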
Metallic surface states in a correlated d-electron topological Kondo insulator candidate FeSb2
The resistance of a conventional insulator diverges as temperature approaches
zero. The peculiar low temperature resistivity saturation in the 4f Kondo
insulator (KI) SmB6 has spurred proposals of a correlation-driven topological
Kondo insulator (TKI) with exotic ground states. However, the scarcity of model
TKI material families leaves difficulties in disentangling key ingredients from
irrelevant details. Here we use angle-resolved photoemission spectroscopy
(ARPES) to study FeSb2, a correlated d-electron KI candidate that also exhibits
a low temperature resistivity saturation. On the (010) surface, we find a rich
assemblage of metallic states with two-dimensional dispersion. Measurements of
the bulk band structure reveal band renormalization, a large
temperature-dependent band shift, and flat spectral features along certain high
symmetry directions, providing spectroscopic evidence for strong correlations.
Our observations suggest that exotic insulating states resembling those in SmB6
and YbB12 may also exist in systems with d instead of f electrons.