Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection
Human-Object Interaction (HOI) detection is a core task for high-level image
understanding. Recently, Detection Transformer (DETR)-based HOI detectors have
become popular due to their superior performance and efficient structure.
However, these approaches typically adopt fixed HOI queries for all testing
images, which makes them vulnerable to changes in object location within a
specific image. Accordingly, in this paper, we propose to enhance DETR's robustness by
mining hard-positive queries, which are forced to make correct predictions
using partial visual cues. First, we explicitly compose hard-positive queries
according to the ground-truth (GT) position of labeled human-object pairs for
each training image. Specifically, we shift the GT bounding boxes of each
labeled human-object pair so that the shifted boxes cover only a certain
portion of the GT ones. We encode the coordinates of the shifted boxes for each
labeled human-object pair into an HOI query. Second, we implicitly construct
another set of hard-positive queries by masking the top scores in
cross-attention maps of the decoder layers. The masked attention maps then only
cover partial important cues for HOI predictions. Finally, an alternate
strategy is proposed that efficiently combines both types of hard queries. In
each iteration, both DETR's learnable queries and one selected type of
hard-positive queries are adopted for loss computation. Experimental results
show that our proposed approach can be widely applied to existing DETR-based
HOI detectors. Moreover, we consistently achieve state-of-the-art performance
on three benchmarks: HICO-DET, V-COCO, and HOI-A. Code is available at
https://github.com/MuchHair/HQM.
Comment: Accepted by ECCV202
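The two query-mining ideas in the abstract above (shifting a GT box so that it covers only part of the original, and masking the top scores of an attention map) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `shift_box`, `coverage`, and `mask_top_k` are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the attention map is simplified to a flat list of scores.

```python
def shift_box(box, frac_x, frac_y):
    """Translate an (x1, y1, x2, y2) box by a fraction of its width and height.

    Hypothetical helper: the abstract only says the GT boxes are shifted.
    """
    x1, y1, x2, y2 = box
    dx = frac_x * (x2 - x1)
    dy = frac_y * (y2 - y1)
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


def coverage(gt, other):
    """Return the fraction of the GT box area that the other box covers."""
    inter_w = max(0.0, min(gt[2], other[2]) - max(gt[0], other[0]))
    inter_h = max(0.0, min(gt[3], other[3]) - max(gt[1], other[1]))
    gt_area = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return (inter_w * inter_h) / gt_area


def mask_top_k(scores, k):
    """Zero out the k largest scores in a flat list of attention scores.

    Assumption: masking is modelled as setting the score to 0.0; the
    abstract does not say how the masked positions are treated.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(order[:k])
    return [0.0 if i in top else s for i, s in enumerate(scores)]
```

For example, `shift_box((0.0, 0.0, 10.0, 10.0), 0.5, 0.0)` moves the box half its width to the right, so `coverage` of the original box by the shifted one is 0.5. A query built from such a box sees only part of the object, which is the sense in which it is a hard positive.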
A convexity approach to dynamic output feedback robust MPC for LPV systems with bounded disturbances
A convexity approach to dynamic output feedback robust model predictive control (OFRMPC) is proposed for linear parameter varying (LPV) systems with bounded disturbances. At each sampling time, the model parameters and disturbances are assumed to be unknown but bounded within pre-specified convex sets. Robust stability conditions on the augmented closed-loop system are derived using robust positively invariant (RPI) sets and the S-procedure. A convexity method reformulates the non-convex bilinear matrix inequality (BMI) problem as a convex optimization problem, which significantly reduces the on-line computational burden. The controller parameters optimized on-line steer the augmented states to converge within the RPI sets, and recursive feasibility of the optimization problem is guaranteed. Furthermore, the bounds of the estimation error set are refreshed by updating the shape matrix of the future ellipsoidal estimation error set. The dynamic OFRMPC approach guarantees that the disturbance-free augmented closed-loop system converges to the origin, and that the augmented closed-loop system subject to bounded disturbances converges to a neighborhood of the origin. Two simulation examples are given to verify the effectiveness of the approach.
Long Short-Term Planning for Conversational Recommendation Systems
In Conversational Recommendation Systems (CRS), the central question is how
the conversational agent can naturally ask for user preferences and provide
suitable recommendations. Existing works mainly follow the hierarchical
architecture, where a higher policy decides whether to invoke the conversation
module (to ask questions) or the recommendation module (to make
recommendations). This architecture prevents these two components from fully
interacting with each other. In contrast, this paper proposes a novel
architecture, the long short-term feedback architecture, to connect these two
essential components in CRS. Specifically, the recommendation module predicts the
long-term recommendation target based on the conversational context and the
user history. Driven by the targeted recommendation, the conversational model
predicts the next topic or attribute to verify if the user preference matches
the target. This balanced feedback loop continues until the short-term planner's
output matches the long-term planner's output, at which point the system should
make the recommendation.
Comment: 14 pages, 3 figures. Accepted by ICONIP 202
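The feedback loop described in the abstract above can be sketched as a toy program. Everything here is an illustrative assumption rather than the paper's model: the catalog is a dict mapping item names to attribute sets, the long-term planner is a simple overlap score, the short-term planner asks about the next unverified attribute of the target, and the names `long_term_target`, `short_term_step`, and `converse` are hypothetical.

```python
def long_term_target(catalog, liked, disliked):
    """Pick the item most consistent with the preferences verified so far.

    Toy stand-in for the long-term planner; ties go to the first name
    in alphabetical order.
    """
    def score(item):
        attrs = catalog[item]
        return len(attrs & liked) - len(attrs & disliked)
    return max(sorted(catalog), key=score)


def short_term_step(catalog, target, liked, disliked):
    """Return the next unverified attribute of the target, or the target itself.

    Toy stand-in for the short-term planner. Returning the target means
    both planners agree and the system should recommend.
    """
    pending = sorted(catalog[target] - liked - disliked)
    return pending[0] if pending else target


def converse(catalog, user_likes, max_turns=10):
    """Run the loop until the two planners agree; return (item, questions asked)."""
    liked, disliked = set(), set()
    for turn in range(max_turns):
        target = long_term_target(catalog, liked, disliked)
        step = short_term_step(catalog, target, liked, disliked)
        if step == target:
            return target, turn
        # Simulated user answer to the question about attribute `step`.
        (liked if step in user_likes else disliked).add(step)
    return None, max_turns
```

With the catalog `{"film_a": {"comedy", "short"}, "film_b": {"drama", "long"}}` and a simulated user who likes `{"drama", "long"}`, the loop first probes `film_a`, switches the target to `film_b` after a negative answer, and recommends it once all of its attributes are verified.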
Graph Transformer for Recommendation
This paper presents a novel approach to representation learning in
recommender systems by integrating generative self-supervised learning with
graph transformer architecture. We highlight the importance of high-quality
data augmentation with relevant self-supervised pretext tasks for improving
performance. Towards this end, we propose a new approach that automates the
self-supervision augmentation process through a rationale-aware generative SSL
that distills informative user-item interaction patterns. The proposed
recommender with Graph TransFormer (GFormer) offers parameterized
collaborative rationale discovery for selective augmentation while preserving
global-aware user-item relationships. In GFormer, we allow the rationale-aware
SSL to inspire graph collaborative filtering with task-adaptive invariant
rationalization in graph transformer. The experimental results reveal that our
GFormer consistently improves performance over
baselines on different datasets. Several in-depth experiments further
investigate the invariant rationale-aware augmentation from various aspects.
The source code for this work is publicly available at:
https://github.com/HKUDS/GFormer.
Comment: Accepted by SIGIR'202
Constructing Media-based Enterprise Networks for Stock Market Risk Analysis
Stock comovement analysis is essential to understanding the mechanisms of stock markets. Previous studies focus on comovement from the perspective of fundamentals or investor preferences. In this article, we propose a framework to explore the comovement of stocks in terms of their relationships in Web media. This is achieved by constructing media-based enterprise networks from the co-exposure of stocks in news reports and the mutual attention among them. Our experiments on CSI 300 listed firms show significant comovement among stocks, brought out by their behavior in Web media. Furthermore, media-based enterprise networks can help identify the most influential firms, which can stir up the stock markets.
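A co-exposure network of the kind described in the abstract above can be sketched in a few lines. This is an illustration under stated assumptions, not the article's method: each news report is assumed to be a list of the firms it mentions, edge weights are plain co-mention counts, influence is approximated by weighted degree, and the names `co_exposure_edges` and `weighted_degree` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_exposure_edges(reports):
    """Count, for every pair of firms, the reports that mention both.

    `reports` is an iterable of firm-name lists, one list per news report.
    """
    edges = Counter()
    for firms in reports:
        # De-duplicate and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(firms)), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum the edge weights incident to each firm.

    Assumption: weighted degree is used here as a simple influence proxy.
    """
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree
```

For the reports `[["A", "B"], ["A", "B", "C"], ["A", "C"]]`, the edge `("A", "B")` gets weight 2 and firm `A` has the largest weighted degree, 4. Other centrality measures could replace weighted degree without changing how the network is built.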
Multi-Scenario Ranking with Adaptive Feature Learning
Recently, Multi-Scenario Learning (MSL) has been widely used in recommendation and
retrieval systems in industry because it facilitates transfer learning from
different scenarios, mitigating data sparsity and reducing maintenance costs.
These efforts have produced different MSL paradigms by searching for better network
structures, such as Auxiliary Network, Expert Network, and Multi-Tower Network.
It is intuitive that different scenarios could hold their specific
characteristics, activating the user's intents quite differently. In other
words, different kinds of auxiliary features would bear varying importance
under different scenarios. With more discriminative feature representations
refined in a scenario-aware manner, better ranking performance could be easily
obtained without expensive search for the optimal network structure.
Unfortunately, this simple idea is largely overlooked but much desired in
real-world systems. Further analysis also validates the rationality of adaptive
feature learning under a multi-scenario scheme. Moreover, our A/B test results
on the Alibaba search advertising platform also demonstrate that Maria is
superior in production environments.
Comment: 10 pages