
    The State-of-the-arts in Focused Search

    The continuous influx of various text data on the Web requires search engines to improve their retrieval abilities for more specific information. The need for results relevant to a user's topic of interest has gone beyond searching for domain- or type-specific documents to more focused results (e.g. document fragments or answers to a query). The introduction of XML provides a format standard for data representation, storage, and exchange, and it allows focused search to be carried out at different granularities of a structured document with XML markup. This report reviews the state of the art in focused search, in particular techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking, and concludes with a highlight of open problems.

    Using Parsimonious Language Models on Web Data

    In this paper we explore the use of parsimonious language models for web retrieval. These models are smaller and thus more efficient than standard language models, which makes them well suited for large-scale web retrieval. We have conducted experiments on four TREC topic sets and found that the parsimonious language model improves retrieval effectiveness over the standard language model for all data sets and measures. In all cases the improvement is significant, and more substantial than in earlier experiments on newspaper/newswire data.
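    For orientation, a minimal sketch of the parsimonious re-estimation idea is given below: an EM loop that concentrates a document language model on terms the background collection model does not already explain, pruning the rest. The lambda weight, pruning threshold, and iteration count are illustrative assumptions, not the settings used in the paper.

```python
from collections import Counter

def parsimonious_lm(doc_tokens, collection_lm, lam=0.1, threshold=1e-4, iters=20):
    """EM re-estimation of a document language model against a background
    (collection) model, keeping only terms the document explains better
    than the background. Sketch only; defaults are illustrative."""
    tf = Counter(doc_tokens)
    total = sum(tf.values())
    # start from the maximum-likelihood document model
    p_doc = {t: c / total for t, c in tf.items()}
    for _ in range(iters):
        # E-step: expected term counts attributed to the document model
        e = {}
        for t, c in tf.items():
            p_d = lam * p_doc.get(t, 0.0)
            p_c = (1.0 - lam) * collection_lm.get(t, 1e-9)
            e[t] = c * p_d / (p_d + p_c)
        # M-step: renormalise and prune terms the background explains better
        norm = sum(e.values()) or 1.0
        p_doc = {t: v / norm for t, v in e.items() if v / norm >= threshold}
    # final renormalisation of the pruned model
    norm = sum(p_doc.values()) or 1.0
    return {t: v / norm for t, v in p_doc.items()}
```

    The pruned model can then replace the full maximum-likelihood document model at indexing time, which is what makes the approach attractive for large collections.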

    Exploring Topic-based Language Models for Effective Web Information Retrieval

    The main obstacle to providing focused search is the relative opaqueness of search requests -- searchers tend to express their complex information needs in only a couple of keywords. Our overall aim is to find out if, and how, topic-based language models can lead to more effective web information retrieval. In this paper we explore the retrieval performance of a topic-based model that combines topical models with other language models based on cross-entropy. We first define our topical categories and train our topical models on the .GOV2 corpus by building parsimonious language models. We then test the topic-based model on the TREC8 small Web data collection for ad-hoc search. Our experimental results show that the topic-based model outperforms both the standard language model and the parsimonious model.
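    A hedged illustration of this kind of mixture scoring: a query model is ranked against an interpolation of document, topical, and collection language models via cross-entropy. The mixture weights below are placeholders, not the values trained or tuned in the paper.

```python
import math

def cross_entropy_score(query_lm, doc_lm, topic_lm, coll_lm,
                        w_doc=0.6, w_topic=0.2, w_coll=0.2):
    """Score a document by the (negative) cross-entropy of the query model
    with a mixture of document, topical and collection language models.
    All weights are illustrative assumptions."""
    score = 0.0
    for term, q_p in query_lm.items():
        p = (w_doc * doc_lm.get(term, 0.0)
             + w_topic * topic_lm.get(term, 0.0)
             + w_coll * coll_lm.get(term, 1e-9))
        score += q_p * math.log(p)
    return score  # higher is better
```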

    Parsimonious Language Models for a Terabyte of Text

    The aims of this paper are twofold. Our first aim\ud is to compare results of the earlier Terabyte tracks\ud to the Million Query track. We submitted a number\ud of runs using different document representations\ud (such as full-text, title-fields, or incoming\ud anchor-texts) to increase pool diversity. The initial\ud results show broad agreement in system rankings\ud over various measures on topic sets judged at both\ud Terabyte and Million Query tracks, with runs using\ud the full-text index giving superior results on\ud all measures, but also some noteworthy upsets.\ud Our second aim is to explore the use of parsimonious\ud language models for retrieval on terabyte-scale\ud collections. These models are smaller thus\ud more efficient than the standard language models\ud when used at indexing time, and they may also improve\ud retrieval performance. We have conducted\ud initial experiments using parsimonious models in\ud combination with pseudo-relevance feedback, for\ud both the Terabyte and Million Query track topic\ud sets, and obtained promising initial results
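    The pseudo-relevance feedback step could look roughly like the sketch below: pool the top-ranked documents of an initial run into a feedback term distribution, truncate it, and interpolate it with the original query model. The truncation and interpolation settings are assumptions, and the feedback distribution could itself be re-estimated parsimoniously against the collection model, as sketched earlier.

```python
from collections import Counter

def expand_query(query_lm, feedback_docs, alpha=0.5, n_terms=20):
    """Pseudo-relevance feedback sketch: feedback_docs is a list of token
    lists from top-ranked documents; alpha and n_terms are illustrative."""
    pooled = Counter()
    for tokens in feedback_docs:
        pooled.update(tokens)
    total = sum(pooled.values()) or 1
    # keep only the strongest feedback terms and renormalise them
    feedback_lm = {t: c / total for t, c in pooled.most_common(n_terms)}
    norm = sum(feedback_lm.values()) or 1.0
    feedback_lm = {t: p / norm for t, p in feedback_lm.items()}
    # interpolate the feedback model with the original query model
    terms = set(query_lm) | set(feedback_lm)
    return {t: (1 - alpha) * query_lm.get(t, 0.0)
               + alpha * feedback_lm.get(t, 0.0) for t in terms}
```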

    Deriving implicit user feedback from partial URLs for effective web page retrieval

    User click-throughs provide a search context for understanding a user's complex information need. This paper re-examines the effectiveness of this approach when it is based on partial clicked data, using the language modeling framework. We expand the original query with topical terms derived from clicked Web pages and enhance early precision via a more compact document representation. Since the URLs of the clicked Web pages are stripped, we first reconstruct them at different levels based on different collections. Our experimental results on the GOV2 test collection and the AOL query log show statistically significant improvements of 31.7% and 28.3% in statMAP for the two sources of reconstruction over 153 ad-hoc queries. Our model also outperforms pseudo-relevance feedback.
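    The click-based expansion could be sketched as follows, assuming a mapping from collection-document URLs to their tokens. The url_key helper, the path-prefix level, and the expansion size are hypothetical illustrations, not the paper's implementation.

```python
from collections import Counter
from urllib.parse import urlparse

def url_key(url, level):
    """Host plus the first `level` path segments of a URL."""
    p = urlparse(url)
    segments = [s for s in p.path.split("/") if s][:level]
    return (p.netloc, *segments)

def expand_from_clicks(query_terms, clicked_urls, url_to_tokens,
                       level=2, n_expansion=10):
    """Collect terms from collection pages whose (reconstructed) URL shares
    a host/path prefix with a clicked URL, and append the most frequent
    ones to the query. Parameter names and defaults are illustrative."""
    clicked_keys = {url_key(u, level) for u in clicked_urls}
    counts = Counter()
    for doc_url, tokens in url_to_tokens.items():
        if url_key(doc_url, level) in clicked_keys:
            counts.update(t for t in tokens if t not in query_terms)
    expansion = [t for t, _ in counts.most_common(n_expansion)]
    return list(query_terms) + expansion
```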

    Synthesis and antiviral activities of a novel class of thioflavone and flavonoid analogues

    A novel class of thioflavone and flavonoid derivatives has been prepared and their antiviral activities against enterovirus 71 (EV71) and coxsackieviruses B3 (CVB3) and B6 (CVB6) were evaluated. Compounds 7d and 9b showed potent antiviral activities against EV71, with IC50 values of 8.27 and 5.48 μM, respectively. Compound 7f, which was synthesized for the first time in this work, showed the highest level of inhibitory activity against both CVB3 and CVB6, with IC50 values of 0.62 and 0.87 μM, respectively. Compounds 4b, 7a, 9c and 9e also showed strong inhibitory activities against both CVB3 and CVB6 at low concentrations (IC50 = 1.42–7.15 μM), whereas compounds 4d, 7c, 7e and 7g showed strong activity against CVB6 (IC50 = 2.91–3.77 μM) together with low levels of activity against CVB3. Compound 7d exhibited stronger inhibitory activity against CVB3 (IC50 = 6.44 μM) than against CVB6 (IC50 > 8.29 μM). The thioflavone derivatives 7a, 7c, 7d, 7e, 7f and 7g represent a new class of lead compounds for the development of novel antiviral agents.

    PV2TEA: Patching Visual Modality to Textual-Established Information Extraction

    Information extraction, e.g. attribute value extraction, has been extensively studied and formulated based only on text. However, many attributes can benefit from image-based extraction, such as color, shape, and pattern. The visual modality has long been underutilized, mainly due to the difficulty of multimodal annotation. In this paper, we aim to patch the visual modality onto a textually established attribute information extractor. The cross-modality integration faces several unique challenges: (C1) images and textual descriptions are loosely paired within and across samples; (C2) images usually contain rich backgrounds that can mislead the prediction; (C3) weakly supervised labels from textually established extractors are biased for multimodal training. We present PV2TEA, an encoder-decoder architecture equipped with three bias reduction schemes: (S1) augmented label-smoothed contrast to improve cross-modality alignment for loosely paired images and text; (S2) attention pruning that adaptively distinguishes the visual foreground; (S3) two-level neighborhood regularization that mitigates the textual label bias via reliability estimation. Empirical results on real-world e-Commerce datasets demonstrate up to an 11.74% absolute (20.97% relative) F1 increase over unimodal baselines. (ACL 2023 Findings)
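    As a rough illustration of scheme S1, the snippet below computes a label-smoothed cross-modal contrastive loss over a batch of image and text embeddings in PyTorch. It sketches the general technique rather than PV2TEA's exact objective; the temperature and smoothing values are assumptions.

```python
import torch
import torch.nn.functional as F

def smoothed_contrastive_loss(img_emb, txt_emb, temperature=0.07, smoothing=0.1):
    """Label-smoothed cross-modal contrastive loss (sketch). Each image is
    pulled toward its paired text, but some probability mass is spread over
    the other texts in the batch to soften the assumption that image/text
    pairs are perfectly aligned. Assumes batch size > 1."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature        # (batch, batch) similarities
    n = logits.size(0)
    # smoothed targets: mostly the diagonal (true pair), a little everywhere else
    targets = torch.full_like(logits, smoothing / (n - 1))
    targets.fill_diagonal_(1.0 - smoothing)
    log_probs = F.log_softmax(logits, dim=-1)
    return -(targets * log_probs).sum(dim=-1).mean()
```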