
Reproducibility Analysis and Enhancements for Multi-Aspect Dense Retriever with Aspect Learning

Author: Bi Keping
Cheng Xueqi
Guo Jiafeng
Sun Xiaojie
Publication date: 16/01/2024

Multi-aspect dense retrieval aims to incorporate aspect information (e.g., brand and category) into dual encoders to facilitate relevance matching. As an early and representative multi-aspect dense retriever, MADRAL learns several extra aspect embeddings and fuses the explicit aspects with an implicit aspect "OTHER" to form the final representation. MADRAL was evaluated on proprietary data and its code was not released, which makes it difficult to validate its effectiveness on other datasets. We failed to reproduce its effectiveness on the public MA-Amazon data, which motivated us to probe the reasons and re-examine its components. We propose several alternative components for comparison, including replacing "OTHER" with "CLS" and representing aspects with the first several content tokens. Through extensive experiments, we confirm that learning "OTHER" from scratch in aspect fusion is harmful. In contrast, our proposed variants greatly improve retrieval performance. Our research not only sheds light on the limitations of MADRAL but also provides valuable insights for future studies on more powerful multi-aspect dense retrieval models. Code will be released at: https://github.com/sunxiaojie99/Reproducibility-for-MADRAL.
Comment: accepted by ecir2024 as a reproducibility paper
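As a rough illustration of aspect fusion and of the "CLS" variant, the sketch below fuses a base vector with explicit aspect embeddings by a softmax-weighted sum and scores an item by inner product, as a dual encoder would. The gating vector `gate`, the function names, and the toy 2-d vectors are assumptions for illustration only; this is not MADRAL's actual fusion layer. In this sketch, replacing "OTHER" with "CLS" simply means passing the encoder's CLS output as `base` instead of an embedding learned from scratch.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product, the usual dual-encoder relevance score."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_aspects(base: Vector, aspects: Dict[str, Vector], gate: Vector) -> Vector:
    """Fuse a base vector (an "OTHER" embedding or the CLS output) with
    explicit aspect embeddings as a softmax-weighted sum.

    `gate` is a hypothetical stand-in for learned fusion parameters.
    """
    candidates = [base] + list(aspects.values())
    weights = softmax([dot(gate, c) for c in candidates])
    return [sum(w * c[i] for w, c in zip(weights, candidates)) for i in range(len(base))]


# Toy 2-d example: a CLS-style base vector plus one "brand" aspect embedding.
item = fuse_aspects([1.0, 0.0], {"brand": [0.0, 1.0]}, gate=[0.0, 0.0])
query = [0.6, 0.8]
print(item, dot(query, item))
```

With an all-zero gate every candidate gets equal weight, so the fused item vector is the plain average of the base and aspect vectors.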

arXiv.org e-Print Archive

Feature-Enhanced Network with Hybrid Debiasing Strategies for Unbiased Learning to Rank

Author: Bi Keping
Guo Jiafeng
Sun Xiaojie
Wang Yiting
Yu Lulu
Publication date: 15/02/2023

Unbiased learning to rank (ULTR) aims to mitigate the various biases in user clicks, such as position bias, trust bias, and presentation bias, and to learn an effective ranker. In this paper, we introduce our winning approach for the "Unbiased Learning to Rank" task in WSDM Cup 2023. We find that the provided data is severely biased, so neural models trained directly on the top 10 results with click information are unsatisfactory. We therefore extract multiple heuristic-based features for multiple fields of the results, adjust the click labels, add true negatives, and re-weight the samples during model training. Since the propensities learned by existing ULTR methods do not decrease with position, we also calibrate the propensities according to the click ratios and ensemble the models trained in two different ways. Our method won 3rd prize with a DCG@10 score of 9.80, which is 1.1% lower than the 2nd-place score and 25.3% higher than the 4th.
Comment: 5 pages, 1 figure, WSDM Cup 202
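A minimal sketch of propensity calibration from click ratios followed by inverse-propensity re-weighting, assuming each position's propensity is its click-through rate relative to position 1, clipped so that it never increases with position. The function names, the weight floor `floor`, and the toy counts are hypothetical; the entry's exact calibration and re-weighting rules are not given in the abstract.

```python
from typing import List


def calibrate_propensities(clicks: List[int], impressions: List[int]) -> List[float]:
    """Per-position propensity = CTR at that position / CTR at position 1,
    then clipped so it never increases with position (assumed rule)."""
    ctr = [c / n if n else 0.0 for c, n in zip(clicks, impressions)]
    if ctr[0] == 0.0:
        raise ValueError("position 1 has no clicks; cannot normalize")
    props = [r / ctr[0] for r in ctr]
    for k in range(1, len(props)):
        # Enforce a non-increasing curve: a lower position may not be
        # more likely to be examined than the one above it.
        props[k] = min(props[k], props[k - 1])
    return props


def ipw_weights(clicked_positions: List[int], props: List[float], floor: float = 0.05) -> List[float]:
    """Inverse-propensity weight for each click (1-based positions); the
    floor caps each weight at 1/floor to limit variance."""
    return [1.0 / max(props[p - 1], floor) for p in clicked_positions]


# Toy counts for four positions (hypothetical, not WSDM Cup data).
props = calibrate_propensities([100, 50, 60, 20], [1000, 1000, 1000, 1000])
weights = ipw_weights([1, 2, 4], props)
print(props, weights)
```

Here position 3's raw ratio (0.6) exceeds position 2's (0.5), so the monotonicity step lowers it to 0.5; clicks at deeper positions then receive larger training weights.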

arXiv.org e-Print Archive

Pre-training with Aspect-Content Text Mutual Prediction for Multi-Aspect Dense Retrieval

Author: Bi Keping
Guo Jiafeng
Liu Zhongyi
Ma Xinyu
Shan Hongyu
Sun Xiaojie
Yixing Fan
Zhang Qishen
Publication date: 22/08/2023

Grounded on pre-trained language models (PLMs), dense retrieval has been studied extensively on plain text. In contrast, there has been little research on retrieving data with multiple aspects using dense models. In scenarios such as product search, aspect information plays an essential role in relevance matching, e.g., category: Electronics, Computers, and Pet Supplies. A common way of leveraging aspect information for multi-aspect retrieval is to introduce an auxiliary classification objective, i.e., using item contents to predict the annotated value IDs of item aspects. However, because the value embeddings are learned from scratch, this approach may not sufficiently capture the semantic similarities between values. To address this limitation, we use the aspect information as text strings rather than class IDs during pre-training, so that their semantic similarities can be naturally captured by the PLMs. To enable effective retrieval with the aspect strings, we propose mutual prediction objectives between the text of the item aspects and the content. In this way, our model makes fuller use of aspect information than undifferentiated masked language modeling (MLM) on the concatenated text of aspects and content. Extensive experiments on two real-world datasets (product search and mini-program search) show that our approach outperforms competitive baselines that either treat aspect values as classes or apply the same MLM to aspect and content strings. Code and the related dataset will be available at https://github.com/sunxiaojie99/ATTEMPT.
Comment: accepted by cikm202
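To make "mutual prediction between aspect and content text" concrete, the sketch below builds two masked training examples from one item: one where the aspect string is fully masked and must be predicted from the visible content, and one where content tokens are masked and the aspect string stays visible. The masking scheme, the `content_rate` default, whitespace tokenization, and all names are assumptions for illustration, not the paper's exact pre-training recipe.

```python
import random
from typing import List, Optional, Tuple

MASK, SEP = "[MASK]", "[SEP]"
Example = Tuple[List[str], List[Optional[str]]]


def mask_tokens(tokens: List[str], rate: float, rng: random.Random) -> Example:
    """Replace each token with [MASK] with probability `rate`; labels keep
    the original token at masked positions and None elsewhere."""
    masked: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < rate:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


def mutual_prediction_examples(aspect: str, content: str,
                               content_rate: float = 0.3, seed: int = 0) -> List[Example]:
    """Build two hypothetical examples from one item:
    1) aspect fully masked, content visible -> predict aspect from content;
    2) content masked at `content_rate`, aspect visible -> predict content from aspect.
    Whitespace splitting stands in for a real subword tokenizer."""
    rng = random.Random(seed)
    a_toks, c_toks = aspect.split(), content.split()

    a_masked, a_labels = mask_tokens(a_toks, 1.0, rng)
    ex1 = (a_masked + [SEP] + c_toks, a_labels + [None] + [None] * len(c_toks))

    c_masked, c_labels = mask_tokens(c_toks, content_rate, rng)
    ex2 = (a_toks + [SEP] + c_masked, [None] * len(a_toks) + [None] + c_labels)
    return [ex1, ex2]


ex1, ex2 = mutual_prediction_examples("category: Pet Supplies", "stainless steel dog bowl")
print(ex1[0])
print(ex2[0])
```

Separating the two directions, rather than masking the concatenated text uniformly, is what lets each objective force one field to be explained by the other.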

arXiv.org e-Print Archive

A Multi-Granularity-Aware Aspect Learning Model for Multi-Aspect Dense Retrieval

Author: Bi Keping
Cheng Xueqi
Guo Jiafeng
Liu Zhongyi
Sun Xiaojie
Yang Sihui
Zhang Guannan
Zhang Qishen
Publication date: 16/01/2024

Dense retrieval methods have mostly focused on unstructured text, and less attention has been paid to structured data with various aspects, e.g., products with aspects such as category and brand. Recent work has proposed two approaches that incorporate aspect information into item representations for effective retrieval by predicting the values associated with the item aspects. Despite their efficacy, they treat the values as isolated classes (e.g., "Smart Homes", "Home, Garden & Tools", and "Beauty & Health") and ignore their fine-grained semantic relations. Furthermore, they either force the learning of aspects into the CLS token, which could distract it from its designated role of representing the semantics of the entire content, or learn extra aspect embeddings only with the value prediction objective, which could be insufficient, especially when an item aspect has no annotated values. Aware of these limitations, we propose a MUlti-granulaRity-aware Aspect Learning model (MURAL) for multi-aspect dense retrieval. It leverages aspect information across various granularities to capture both coarse- and fine-grained semantic relations between values. Moreover, MURAL feeds separate aspect embeddings as input to the transformer encoder so that the masked language model objective can assist implicit aspect learning even without aspect-value annotations. Extensive experiments on two real-world datasets of products and mini-programs show that MURAL significantly outperforms state-of-the-art baselines.
Comment: Accepted by WSDM2024, updat
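One way to see why treating values as isolated classes loses information: if a value is a path in a category hierarchy, it can be expanded into one target per granularity (every path prefix), and two values then share targets at the levels where they agree. The sketch below assumes hierarchical category paths and uses Jaccard overlap of the expanded targets as a toy relatedness measure; the paths, names, and measure are hypothetical and are not MURAL's actual granularities or training objective.

```python
from typing import List, Set


def multi_granularity_targets(category_path: List[str], sep: str = " > ") -> List[str]:
    """One target per depth of a hierarchical value: every path prefix,
    from the coarsest level down to the full fine-grained value."""
    return [sep.join(category_path[: depth + 1]) for depth in range(len(category_path))]


def shared_granularity(a: List[str], b: List[str]) -> float:
    """Jaccard overlap of two values' multi-granularity targets."""
    ta: Set[str] = set(multi_granularity_targets(a))
    tb: Set[str] = set(multi_granularity_targets(b))
    return len(ta & tb) / len(ta | tb)


# Hypothetical category paths for illustration.
computers = ["Electronics", "Computers"]
cameras = ["Electronics", "Cameras"]
pets = ["Pet Supplies"]

print(multi_granularity_targets(computers))
print(shared_granularity(computers, cameras), shared_granularity(computers, pets))
```

As isolated classes, "Computers" and "Cameras" are as unrelated as "Computers" and "Pet Supplies"; at the coarse level they share the "Electronics" target, which is the kind of relation a multi-granularity signal can expose.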

arXiv.org e-Print Archive

Facilitating Interaction with Large Displays in Smart Spaces

Author: Peifeng Xiang
Xiaojie Chen
Xiaojun Bi
Yuanchun Shi
Publication date: 01/01/2005

Large displays are now widely deployed in Smart Spaces. However, traditional interaction devices designed for desktop screens, such as mice and keyboards, have various limitations in such environments. In this paper, we present a novel human-computer interaction system, the CollabPointer, for facilitating interaction with large displays in Smart Spaces. A laser pointer equipped with three additional buttons and wireless communication modules is introduced as the input device in our system, and three features distinguish the CollabPointer from other interaction technologies. First, the coordinates of the red laser point that the pointer projects on the screen are interpreted as the cursor's position, and the additional buttons wirelessly emulate a mouse's buttons through radio frequency, enabling remote interaction at any distance. Second, when multiple users are interacting, the two-step association method described in this paper lets our system identify different laser pointers and support multi-user collaboration. Last but not least, each laser pointer emits its identity through radio frequency during interaction; the system receives it and treats different users separately. Finally, the CollabPointer has been implemented in the Smart Classroom [1], a prototype of a Smart Space, and the results of user studies show its benefits.

CiteSeerX

Crossref

uPen: Laser-based, Personalized, Multi-User Interaction on Large Displays

Author: Peifeng Xiang
Xiaojie Chen
Xiaojun Bi
Yuanchun Shi

We present the uPen, a laser pointer combined with a contact-pushed switch, three press buttons, and a wireless communication module. This novel interaction device allows users to interact with large displays at a distance or directly on the surface, with the full functionality of a mouse. Onboard software enables the uPen system to identify different users and provide personalized services to them, such as associating users with their corresponding privileges and giving each participant access to private content (e.g., home pages, personal calendars). Additionally, with our two-step association method, the uPen system can distinguish the strokes of different uPens working simultaneously and support simultaneous multi-user interaction. A prototype system has been implemented in our Smart Classroom [1], and user studies show the benefit of using it.
Categories and Subject Descriptors: H.5.2 [Information Interfaces and Presentation]: User Interfaces - interaction styles, input devices and strategies, theory and methods

CiteSeerX