DCA: Diversified Co-Attention towards Informative Live Video Commenting
We focus on the task of Automatic Live Video Commenting (ALVC), which aims to
generate real-time video comments with both video frames and other viewers'
comments as inputs. A major challenge in this task is how to properly leverage
the rich and diverse information carried by video and text. In this paper, we
aim to collect diversified information from video and text for informative
comment generation. To achieve this, we propose a Diversified Co-Attention
(DCA) model for this task. Our model builds bidirectional interactions between
video frames and surrounding comments from multiple perspectives via metric
learning, to collect a diversified and informative context for comment
generation. We also propose an effective parameter orthogonalization technique
to avoid excessive overlap of information learned from different perspectives.
Results show that our approach outperforms existing methods in the ALVC task,
achieving new state-of-the-art results.
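The idea of discouraging overlap between perspectives can be illustrated with a generic orthogonality penalty. This is a minimal sketch, not the DCA paper's formulation: it assumes each perspective is a parameter matrix and penalizes the squared Frobenius norm of the product A^T B for every pair of matrices. The names `transpose_times` and `overlap_penalty` are hypothetical.

```python
from itertools import combinations
from typing import List

Matrix = List[List[float]]


def transpose_times(a: Matrix, b: Matrix) -> Matrix:
    """Compute a^T b for two matrices that have the same number of rows."""
    rows, cols_a, cols_b = len(a), len(a[0]), len(b[0])
    return [
        [sum(a[k][i] * b[k][j] for k in range(rows)) for j in range(cols_b)]
        for i in range(cols_a)
    ]


def overlap_penalty(perspectives: List[Matrix]) -> float:
    """Sum of squared Frobenius norms of a^T b over all pairs (assumed penalty)."""
    total = 0.0
    for a, b in combinations(perspectives, 2):
        gram = transpose_times(a, b)
        total += sum(x * x for row in gram for x in row)
    return total


# Hypothetical toy parameters: two orthogonal perspectives, then two identical ones.
w1 = [[1.0], [0.0]]
w2 = [[0.0], [1.0]]
orthogonal_penalty = overlap_penalty([w1, w2])
identical_penalty = overlap_penalty([w1, w1])
```

Under this assumed penalty, perspectives whose columns are mutually orthogonal contribute nothing, while duplicated perspectives are penalized, so adding the term to a training loss would push the matrices apart.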
E4SRec: An Elegant Effective Efficient Extensible Solution of Large Language Models for Sequential Recommendation
The recent advancements in Large Language Models (LLMs) have sparked interest
in harnessing their potential within recommender systems. Since LLMs are
designed for natural language tasks, existing recommendation approaches have
predominantly transformed recommendation tasks into open-domain natural
language generation tasks. However, this approach necessitates items to possess
rich semantic information, often generates out-of-range results, and suffers
from notably low efficiency and limited extensibility. Furthermore, practical
ID-based recommendation strategies, reliant on a huge number of unique
identities (IDs) to represent users and items, have gained prominence in
real-world recommender systems due to their effectiveness and efficiency.
Nevertheless, the incapacity of LLMs to model IDs presents a formidable
challenge when seeking to leverage LLMs for personalized recommendations. In
this paper, we introduce an Elegant Effective Efficient Extensible solution for
large language models for Sequential Recommendation (E4SRec), which seamlessly
integrates LLMs with traditional recommender systems that exclusively utilize
IDs to represent items. Specifically, E4SRec takes ID sequences as inputs,
ensuring that the generated outputs fall within the candidate lists.
Furthermore, E4SRec possesses the capability to generate the entire ranking
list in a single forward process, and demands only a minimal set of pluggable
parameters, which are trained for each dataset while keeping the entire LLM
frozen. We substantiate the effectiveness, efficiency, and extensibility of our
proposed E4SRec through comprehensive experiments conducted on four widely-used
real-world datasets. The implementation code is accessible at
https://github.com/HestiaSky/E4SRec/
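Scoring a fixed table of item IDs keeps every output inside the candidate list and yields the full ranking in one pass. The sketch below illustrates that general idea only; it is not the E4SRec implementation. `user_vector` stands in for whatever sequence representation the frozen LLM produces, and `rank_candidates` and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence


def rank_candidates(
    user_vector: Sequence[float],
    item_vectors: Dict[int, Sequence[float]],
) -> List[int]:
    """Return every candidate item ID, sorted by descending dot-product score."""
    scores = {
        item_id: sum(u * v for u, v in zip(user_vector, vec))
        for item_id, vec in item_vectors.items()
    }
    # Highest score first; ties are broken by item ID so the order is deterministic.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical toy data: one sequence representation and three candidate items.
user_vector = [1.0, 0.5]
item_vectors = {101: [0.2, 0.1], 102: [0.9, 0.4], 103: [-0.3, 1.0]}
ranking = rank_candidates(user_vector, item_vectors)
```

Because the ranking is built from the keys of `item_vectors`, an out-of-range result cannot occur by construction, in contrast to free-form text generation.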
Fair Division of Mixed Divisible and Indivisible Goods
We study the problem of fair division when the resources contain both
divisible and indivisible goods. Classic fairness notions such as envy-freeness
(EF) and envy-freeness up to one good (EF1) cannot be directly applied to the
mixed goods setting. In this work, we propose a new fairness notion
envy-freeness for mixed goods (EFM), which is a direct generalization of both
EF and EF1 to the mixed goods setting. We prove that an EFM allocation always
exists for any number of agents. We also propose efficient algorithms to
compute an EFM allocation for two agents and for agents with piecewise
linear valuations over the divisible goods. Finally, we relax the envy-free
requirement, instead asking for $\epsilon$-envy-freeness for mixed goods
($\epsilon$-EFM), and present an algorithm that finds an $\epsilon$-EFM
allocation in time polynomial in the number of agents, the number of
indivisible goods, and $1/\epsilon$.
Comment: Appears in the 34th AAAI Conference on Artificial Intelligence
(AAAI), 2020
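The two classic notions that EFM generalizes can be checked directly for indivisible goods. The sketch below assumes additive, non-negative valuations and covers EF and EF1 only; it does not implement EFM or the paper's algorithms. The function names and toy instance are hypothetical.

```python
from typing import List, Sequence


def bundle_value(valuation: Sequence[float], bundle: Sequence[int]) -> float:
    """Additive value of a bundle of good indices under one agent's valuation."""
    return sum(valuation[g] for g in bundle)


def is_envy_free(valuations: List[Sequence[float]], allocation: List[List[int]]) -> bool:
    """EF: no agent values another agent's bundle strictly more than their own."""
    n = len(allocation)
    for i in range(n):
        own = bundle_value(valuations[i], allocation[i])
        for j in range(n):
            if i != j and own < bundle_value(valuations[i], allocation[j]):
                return False
    return True


def is_ef1(valuations: List[Sequence[float]], allocation: List[List[int]]) -> bool:
    """EF1: any envy disappears after removing some single good from the envied bundle."""
    n = len(allocation)
    for i in range(n):
        own = bundle_value(valuations[i], allocation[i])
        for j in range(n):
            if i == j:
                continue
            other = bundle_value(valuations[i], allocation[j])
            # With additive non-negative values, removing the good agent i values
            # most is the removal that reduces envy the most.
            best_removal = max((valuations[i][g] for g in allocation[j]), default=0.0)
            if own < other - best_removal:
                return False
    return True


# Hypothetical instance: two agents with identical valuations over three goods.
valuations = [[5, 3, 1], [5, 3, 1]]
split = [[0], [1, 2]]        # agent 1 envies agent 0, but only up to one good
lopsided = [[0, 1, 2], []]   # agent 1 receives nothing
```

In the `split` allocation the second agent values their own bundle at 4 and the other at 5, so the allocation is not EF, yet removing good 0 eliminates the envy and it is EF1.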
Balanced Order Batching with Task-Oriented Graph Clustering
The balanced order batching problem (BOBP) arises from the process of warehouse
picking in Cainiao, the largest logistics platform in China. Batching orders
together in the picking process to form a single picking route reduces travel
distance. This matters because order picking is a labor-intensive process and
good batching methods can yield substantial savings.
The BOBP is an NP-hard combinatorial optimization problem, and
designing a good problem-specific heuristic under the quasi-real-time system
response requirement is non-trivial. In this paper, rather than designing
heuristics, we propose an end-to-end learning and optimization framework named
Balanced Task-orientated Graph Clustering Network (BTOGCN) to solve the BOBP by
reducing it to a balanced graph clustering optimization problem. In BTOGCN, a
task-oriented estimator network is introduced to guide the type-aware
heterogeneous graph clustering networks to find a better clustering result
related to the BOBP objective. Through comprehensive experiments on
single-graph and multi-graph sets, we show that: 1) our balanced task-oriented
graph clustering network can directly utilize the guidance of the target signal
and outperforms the two-stage deep embedding and deep clustering method; 2) our
method reduces picking distance by an average of 4.57 m and 0.13 m (meters)
relative to the expert-designed algorithm on the single-graph and multi-graph
sets, and generalizes well to practical scenarios.
Comment: 10 pages, 6 figures
CryoFormer: Continuous Reconstruction of 3D Structures from Cryo-EM Data using Transformer-based Neural Representations
High-resolution heterogeneous reconstruction of 3D structures of proteins and
other biomolecules using cryo-electron microscopy (cryo-EM) is essential for
understanding fundamental processes of life. However, it is still challenging
to reconstruct the continuous motions of 3D structures from hundreds of
thousands of noisy and randomly oriented 2D cryo-EM images. Existing methods
based on coordinate-based neural networks show compelling results to model
continuous conformations of 3D structures in the Fourier domain, but they
suffer from a limited ability to model local flexible regions and lack
interpretability. We propose a novel approach, cryoFormer, that utilizes a
transformer-based network architecture for continuous heterogeneous cryo-EM
reconstruction. For the first time, we directly reconstruct continuous
conformations of 3D structures using an implicit feature volume in the 3D
spatial domain. A novel deformation transformer decoder further improves
reconstruction quality and, more importantly, locates and robustly tackles
flexible 3D regions caused by conformations. In experiments, our method
outperforms current approaches on three public datasets (1 synthetic and 2
experimental) and a new synthetic dataset of PEDV spike protein. The code and
new synthetic dataset will be released for better reproducibility of our
results. Project page: https://cryoformer.github.io
CapsuleBot: A Novel Compact Hybrid Aerial-Ground Robot with Two Actuated-wheel-rotors
This paper presents the design, modeling, and experimental validation of
CapsuleBot, a compact hybrid aerial-ground vehicle designed for long-term
covert reconnaissance. CapsuleBot combines the manoeuvrability of a bicopter in
the air with the energy efficiency and noise reduction of ground vehicles on
the ground. To accomplish this, a structure named actuated-wheel-rotor has been
designed, utilizing a sole motor for both the unilateral rotor tilting in the
bicopter configuration and the wheel movement in ground mode. CapsuleBot comes
equipped with two of these structures, enabling it to attain hybrid
aerial-ground propulsion with just four motors. Importantly, the decoupling of
motion modes is achieved without the need for additional drivers, enhancing the
versatility and robustness of the system. Furthermore, we have designed the
full dynamics and control for aerial and ground locomotion based on the
bicopter model and the two-wheeled self-balancing vehicle model. The
performance of CapsuleBot has been validated through experiments. The results
demonstrate that CapsuleBot produces 40.53% less noise in ground mode and
consumes 99.35% less energy, highlighting its potential for long-term covert
reconnaissance applications.
Comment: 7 pages, 10 figures, submitted to 2024 IEEE International Conference
on Robotics and Automation (ICRA). This work has been submitted to the IEEE
for possible publication. Copyright may be transferred without notice, after
which this version may no longer be accessible.
Contrastive Continual Multi-view Clustering with Filtered Structural Fusion
Multi-view clustering thrives in applications where views are collected in
advance by extracting consistent and complementary information among views.
However, it overlooks scenarios where data views are collected sequentially,
i.e., real-time data. Due to privacy issues or memory burden, previous views
become unavailable over time in these situations. Some methods have been
proposed to handle this but are trapped in a stability-plasticity dilemma.
Specifically, these methods suffer catastrophic forgetting of prior knowledge
when a new view arrives. This catastrophic forgetting problem (CFP) makes the
consistent and complementary information hard to obtain and degrades clustering
performance. To tackle this, we propose a novel method termed Contrastive
Continual Multi-view Clustering with Filtered Structural Fusion (CCMVC-FSF).
Precisely, considering that data correlations play a vital role in clustering
and prior knowledge ought to guide the clustering process of a new view, we
develop a data buffer with fixed size to store filtered structural information
and utilize it to guide the generation of a robust partition matrix via
contrastive learning. Furthermore, we theoretically connect CCMVC-FSF with
semi-supervised learning and knowledge distillation. Extensive experiments
demonstrate the effectiveness of the proposed method.
Joint Learning of CNN and LSTM for Image Captioning
In this paper, we describe our methods for participating in the Natural Language Caption Generation subtask of the ImageCLEF 2016 Scalable Image Annotation task. Our model combines an encoding procedure and a decoding procedure, which comprise a convolutional neural network (CNN) and a Long Short-Term Memory (LSTM) based recurrent neural network. We first train a model on the MSCOCO dataset and then fine-tune it on different target datasets that we collected, to obtain a model better suited to the natural language caption generation task. The parameters of the CNN and the LSTM are learned jointly.