MatchZoo: A Learning, Practicing, and Developing System for Neural Text Matching
Text matching is the core problem in many natural language processing (NLP)
tasks, such as information retrieval, question answering, and conversation.
Recently, deep learning technology has been widely adopted for text matching,
making neural text matching a new and active research domain. With a large
number of neural matching models emerging rapidly, it becomes more and more
difficult for researchers, especially those newcomers, to learn and understand
these new models. Moreover, it is usually difficult to try these models due to
the tedious data pre-processing, complicated parameter configuration, and
massive optimization tricks, not to mention that public code is sometimes
unavailable. Finally, for researchers who want to develop new models, it is also
not an easy task to implement a neural text matching model from scratch, and to
compare with a bunch of existing models. In this paper, therefore, we present a
novel system, namely MatchZoo, to facilitate the learning, practicing and
designing of neural text matching models. The system consists of a powerful
matching library and a user-friendly and interactive studio, which can help
researchers: 1) to learn state-of-the-art neural text matching models
systematically; 2) to train, test, and apply these models with simple
configurable steps; and 3) to develop their own models with rich APIs and
assistance.
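The interaction-based scoring that many neural matching models share can be sketched in plain Python. The exact-match interaction matrix and max-pooling readout below are a deliberately simplified stand-in for learned embedding similarities, not MatchZoo's actual API:

```python
# Toy interaction-based text matcher: build a query-by-document interaction
# matrix, max-pool each query-token row, then average. Real neural models
# replace the 0/1 exact-match indicator with learned embedding similarities.

def interaction_matrix(query_tokens, doc_tokens):
    """Binary interaction: 1.0 where tokens match exactly, else 0.0."""
    return [[1.0 if q == d else 0.0 for d in doc_tokens] for q in query_tokens]

def match_score(query, doc):
    """Max-pool each query row of the interaction matrix, then average."""
    q, d = query.split(), doc.split()
    if not q or not d:
        return 0.0
    matrix = interaction_matrix(q, d)
    return sum(max(row) for row in matrix) / len(q)
```

For example, `match_score("neural text matching", "neural matching models")` finds best matches for "neural" and "matching" but not "text", giving 2/3.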
Learning Visual Features from Snapshots for Web Search
When applying learning to rank algorithms to Web search, a large number of
features are usually designed to capture the relevance signals. Most of these
features are computed based on the extracted textual elements, link analysis,
and user logs. However, Web pages are not solely linked texts, but have
structured layout organizing a large variety of elements in different styles.
Such layout itself can convey useful visual information, indicating the
relevance of a Web page. For example, the query-independent layout (i.e., raw
page layout) can help identify the page quality, while the query-dependent
layout (i.e., page rendered with matched query words) can further tell rich
structural information (e.g., size, position and proximity) of the matching
signals. However, such visual information of layout has been seldom utilized in
Web search in the past. In this work, we propose to learn rich visual features
automatically from the layout of Web pages (i.e., Web page snapshots) for
relevance ranking. Both query-independent and query-dependent snapshots are
considered as new inputs. We then propose a novel visual perception model
inspired by humans' visual search behaviors in page viewing to extract the
visual features. This model can be learned end-to-end together with traditional
human-crafted features. We also show that such visual features can be
efficiently acquired in the online setting with an extended inverted indexing
scheme. Experiments on benchmark collections demonstrate that learning visual
features from Web page snapshots can significantly improve the performance of
relevance ranking in ad-hoc Web retrieval tasks.
Comment: CIKM 201
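The extended inverted indexing idea can be illustrated with a toy posting-list structure. The class and field names below are hypothetical; they show only the general pattern of storing each term occurrence's rendered bounding box at index time, so that query-dependent layout signals (size, position, proximity) can be assembled online without re-rendering the page:

```python
from collections import defaultdict

class VisualIndex:
    """Hypothetical extended inverted index: postings carry, besides the
    doc id, the (x, y, w, h) box where the term was rendered on the page."""

    def __init__(self):
        self.postings = defaultdict(list)  # term -> [(doc_id, box), ...]

    def add(self, doc_id, term, box):
        """Record one rendered occurrence of `term` in `doc_id`."""
        self.postings[term].append((doc_id, box))

    def query_layout(self, doc_id, query_terms):
        """Collect the boxes of matched query terms on one page, from which
        query-dependent visual features could be derived at query time."""
        return {t: [b for d, b in self.postings[t] if d == doc_id]
                for t in query_terms}
```

A real system would store boxes compressed alongside standard postings; the point is that the snapshot features become derivable online from index data alone.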
PD-L1 aptamer-functionalized degradable hafnium oxide nanoparticles for near infrared-II diagnostic imaging and radiosensitization
Immune checkpoint blockade is now recognized as a paradigm-shifting cancer therapeutic strategy, yet accurately predicting immunotherapy efficacy from PD-L1 expression remains difficult. In addition, radiotherapy for cancer patients faces the problem that doses sufficient at the tumor site are often not tolerated by the surrounding normal tissues. In this study, we created PD-L1 aptamer-anchored spherical nucleic acids (SNAs) with a shell made of PD-L1 aptamer and indocyanine green (ICG) embedded in a mesoporous hafnium oxide nanoparticle core (Hf@ICG-Apt). Upon exposure to the low pH of tumor sites, the nano-system released ICG in tumors with high PD-L1 expression, developing a high tumor-to-background ratio of 7.97 ± 0.76 and extending ICG tumor retention to more than 48 h. Moreover, Hf@ICG-Apt improved the efficacy of radiation therapy (RT) when combined with irradiation. Notably, Hf@ICG-Apt showed scarcely any systemic toxicity in vivo. Overall, this research offers a novel approach for reliable monitoring of PD-L1 expression and localization and for robust RT sensitization against cancer with good biosafety.
Visual Named Entity Linking: A New Dataset and A Baseline
Visual Entity Linking (VEL) is a task to link regions of images with their
corresponding entities in Knowledge Bases (KBs), which is beneficial for many
computer vision tasks such as image retrieval, image caption, and visual
question answering. However, existing VEL tasks either rely on textual data to
complement multi-modal linking or only link objects to general entities, and
thus fail to perform named entity linking on large amounts of image data. In
this paper, we consider a purely Visual-based Named Entity Linking (VNEL) task,
where the input only consists of an image. The task is to identify objects of
interest (i.e., visual entity mentions) in images and link them to
corresponding named entities in KBs. Since each entity in a KB often contains
rich visual and textual information, we propose three different
sub-tasks, i.e., visual to visual entity linking (V2VEL), visual to textual
entity linking (V2TEL), and visual to visual-textual entity linking (V2VTEL).
In addition, we present a high-quality human-annotated visual person linking
dataset, named WIKIPerson. Based on WIKIPerson, we establish a series of
baseline algorithms for the solution of each sub-task, and conduct experiments
to verify the quality of proposed datasets and the effectiveness of baseline
methods. We envision this work to be helpful in soliciting more work on VNEL
in the future. The code and datasets are publicly available at
https://github.com/ict-bigdatalab/VNEL.
Comment: 13 pages, 11 figures, published in Findings of EMNLP 2022
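The core linking step shared by the sub-tasks can be sketched as nearest-neighbor search in a joint embedding space. The vectors and entity names below are illustrative, not from WIKIPerson, and assume mention and entity embeddings have already been produced by some cross-modal encoder:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def link(mention_vec, entity_vecs):
    """Link a visual entity mention to the KB entity whose embedding is
    most similar to the mention embedding (the V2TEL/V2VEL readout)."""
    return max(entity_vecs, key=lambda e: cosine(mention_vec, entity_vecs[e]))
```

In V2VEL the entity embeddings would come from entity images, in V2TEL from entity text, and in V2VTEL from both; only the encoder changes, not this readout.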
L^2R: Lifelong Learning for First-stage Retrieval with Backward-Compatible Representations
First-stage retrieval is a critical task that aims to retrieve relevant
document candidates from a large-scale collection. While existing retrieval
models have achieved impressive performance, they are mostly studied on static
datasets, ignoring that, in the real world, data on the Web is continuously
growing with potential distribution drift. Consequently, retrievers trained on
static old data may not suit new-coming data well and inevitably produce
sub-optimal results. In this work, we study lifelong learning for first-stage
retrieval, especially focusing on the setting where the emerging documents are
unlabeled since relevance annotation is expensive and may not keep up with data
emergence. Under this setting, we aim to develop model updating with two goals:
(1) to effectively adapt to the evolving distribution with the unlabeled
new-coming data, and (2) to avoid re-inferring all embeddings of old documents
to efficiently update the index each time the model is updated.
We first formalize the task and then propose a novel Lifelong Learning method
for the first-stage Retrieval, namely L^2R. L^2R adopts the typical memory
mechanism for lifelong learning, and incorporates two crucial components: (1)
selecting diverse support negatives for model training and memory updating for
effective model adaptation, and (2) a ranking alignment objective to ensure the
backward-compatibility of representations to save the cost of index rebuilding
without hurting the model performance. For evaluation, we construct two new
benchmarks from LoTTE and Multi-CPR datasets to simulate the document
distribution drift in realistic retrieval scenarios. Extensive experiments show
that L^2R significantly outperforms competitive lifelong learning baselines.
Comment: accepted by CIKM202
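The backward-compatibility idea behind a ranking alignment objective can be sketched as a pairwise hinge that penalizes the new model for reversing document orders the old model established. The exact loss form below is illustrative, not L^2R's actual objective:

```python
def ranking_alignment_loss(old_scores, new_scores, margin=0.0):
    """Average hinge penalty over document pairs: for each pair the old
    model ordered one way, penalize the new model when its score gap for
    that pair falls below `margin` (i.e., the order is weakened/reversed).
    Keeping old orders intact lets old document embeddings stay in the
    index without re-inference."""
    loss, n = 0.0, 0
    docs = list(old_scores)
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            a, b = docs[i], docs[j]
            if old_scores[a] < old_scores[b]:
                a, b = b, a  # orient the pair so the old model prefers `a`
            loss += max(0.0, margin - (new_scores[a] - new_scores[b]))
            n += 1
    return loss / n if n else 0.0
```

With `margin=0`, a new model that preserves every old pairwise order incurs zero penalty, while each reversal contributes its score gap; in training this term would be added to the usual contrastive retrieval loss.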