
    The State-of-the-arts in Focused Search

    The continuous influx of various text data on the Web requires search engines to improve their retrieval abilities for more specific information. The need for results relevant to a user's topic of interest has gone beyond searching for domain- or type-specific documents to more focused results (e.g. document fragments or answers to a query). The introduction of XML provides a format standard for data representation, storage, and exchange, and it allows focused search to be carried out at different granularities of a structured document with XML markup. This report reviews the state of the art in focused search, in particular techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking, and concludes with a highlight of open problems.

    Using Parsimonious Language Models on Web Data

    In this paper we explore the use of parsimonious language models for web retrieval. These models are smaller and thus more efficient than standard language models, which makes them well suited for large-scale web retrieval. We have conducted experiments on four TREC topic sets and found that the parsimonious language model improves retrieval effectiveness over the standard language model for all data sets and measures. In all cases the improvement is significant, and more substantial than in earlier experiments on newspaper/newswire data.
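    For orientation, a minimal sketch of the parsimonious re-estimation idea is given below: an EM loop that concentrates a document language model on terms the background collection model does not already explain, pruning the rest. The lambda weight, pruning threshold, and iteration count are illustrative assumptions, not the settings used in the paper.

```python
from collections import Counter

def parsimonious_lm(doc_tokens, collection_lm, lam=0.1, threshold=1e-4, iters=20):
    """EM re-estimation of a document language model against a background
    (collection) model, keeping only terms the document explains better
    than the background. Sketch only; defaults are illustrative."""
    tf = Counter(doc_tokens)
    total = sum(tf.values())
    # start from the maximum-likelihood document model
    p_doc = {t: c / total for t, c in tf.items()}
    for _ in range(iters):
        # E-step: expected term counts attributed to the document model
        e = {}
        for t, c in tf.items():
            p_d = lam * p_doc.get(t, 0.0)
            p_c = (1.0 - lam) * collection_lm.get(t, 1e-9)
            e[t] = c * p_d / (p_d + p_c)
        # M-step: renormalise and prune terms the background explains better
        norm = sum(e.values()) or 1.0
        p_doc = {t: v / norm for t, v in e.items() if v / norm >= threshold}
    # final renormalisation of the pruned model
    norm = sum(p_doc.values()) or 1.0
    return {t: v / norm for t, v in p_doc.items()}
```

    The pruned model can then replace the full maximum-likelihood document model at indexing time, which is what makes the approach attractive for large collections.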

    Exploring Topic-based Language Models for Effective Web Information Retrieval

    The main obstacle to providing focused search is the relative opaqueness of search requests -- searchers tend to express their complex information needs in only a couple of keywords. Our overall aim is to find out if, and how, topic-based language models can lead to more effective web information retrieval. In this paper we explore the retrieval performance of a topic-based model that combines topical models with other language models based on cross-entropy. We first define our topical categories and train our topical models on the .GOV2 corpus by building parsimonious language models. We then test the topic-based model on the TREC8 small Web data collection for ad-hoc search. Our experimental results show that the topic-based model outperforms both the standard language model and the parsimonious model.
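    A hedged illustration of this kind of mixture scoring: a query model is ranked against an interpolation of document, topical, and collection language models via cross-entropy. The mixture weights below are placeholders, not the values trained or tuned in the paper.

```python
import math

def cross_entropy_score(query_lm, doc_lm, topic_lm, coll_lm,
                        w_doc=0.6, w_topic=0.2, w_coll=0.2):
    """Score a document by the (negative) cross-entropy of the query model
    with a mixture of document, topical and collection language models.
    All weights are illustrative assumptions."""
    score = 0.0
    for term, q_p in query_lm.items():
        p = (w_doc * doc_lm.get(term, 0.0)
             + w_topic * topic_lm.get(term, 0.0)
             + w_coll * coll_lm.get(term, 1e-9))
        score += q_p * math.log(p)
    return score  # higher is better
```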

    Parsimonious Language Models for a Terabyte of Text

    The aims of this paper are twofold. Our first aim\ud is to compare results of the earlier Terabyte tracks\ud to the Million Query track. We submitted a number\ud of runs using different document representations\ud (such as full-text, title-fields, or incoming\ud anchor-texts) to increase pool diversity. The initial\ud results show broad agreement in system rankings\ud over various measures on topic sets judged at both\ud Terabyte and Million Query tracks, with runs using\ud the full-text index giving superior results on\ud all measures, but also some noteworthy upsets.\ud Our second aim is to explore the use of parsimonious\ud language models for retrieval on terabyte-scale\ud collections. These models are smaller thus\ud more efficient than the standard language models\ud when used at indexing time, and they may also improve\ud retrieval performance. We have conducted\ud initial experiments using parsimonious models in\ud combination with pseudo-relevance feedback, for\ud both the Terabyte and Million Query track topic\ud sets, and obtained promising initial results
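    The pseudo-relevance feedback step could look roughly like the sketch below: pool the top-ranked documents of an initial run into a feedback term distribution, truncate it, and interpolate it with the original query model. The truncation and interpolation settings are assumptions, and the feedback distribution could itself be re-estimated parsimoniously against the collection model, as sketched earlier.

```python
from collections import Counter

def expand_query(query_lm, feedback_docs, alpha=0.5, n_terms=20):
    """Pseudo-relevance feedback sketch: feedback_docs is a list of token
    lists from top-ranked documents; alpha and n_terms are illustrative."""
    pooled = Counter()
    for tokens in feedback_docs:
        pooled.update(tokens)
    total = sum(pooled.values()) or 1
    # keep only the strongest feedback terms and renormalise them
    feedback_lm = {t: c / total for t, c in pooled.most_common(n_terms)}
    norm = sum(feedback_lm.values()) or 1.0
    feedback_lm = {t: p / norm for t, p in feedback_lm.items()}
    # interpolate the feedback model with the original query model
    terms = set(query_lm) | set(feedback_lm)
    return {t: (1 - alpha) * query_lm.get(t, 0.0)
               + alpha * feedback_lm.get(t, 0.0) for t in terms}
```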

    Deriving implicit user feedback from partial URLs for effective web page retrieval

    User click-throughs provide a search context for understanding a user's complex information need. This paper re-examines the effectiveness of this approach when it is based on partial clicked data, using the language modeling framework. We expand the original query with topical terms derived from clicked Web pages and enhance early precision via a more compact document representation. Since the URLs of the clicked Web pages are stripped, we first reconstruct them at different levels based on different collections. Our experimental results on the GOV2 test collection and the AOL query log show statistically significant improvements of 31.7% and 28.3% in statMAP for the two sources of reconstruction over 153 ad-hoc queries. Our model also outperforms pseudo-relevance feedback.
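    The click-based expansion could be sketched as follows, assuming a mapping from collection-document URLs to their tokens. The url_key helper, the path-prefix level, and the expansion size are hypothetical illustrations, not the paper's implementation.

```python
from collections import Counter
from urllib.parse import urlparse

def url_key(url, level):
    """Host plus the first `level` path segments of a URL."""
    p = urlparse(url)
    segments = [s for s in p.path.split("/") if s][:level]
    return (p.netloc, *segments)

def expand_from_clicks(query_terms, clicked_urls, url_to_tokens,
                       level=2, n_expansion=10):
    """Collect terms from collection pages whose (reconstructed) URL shares
    a host/path prefix with a clicked URL, and append the most frequent
    ones to the query. Parameter names and defaults are illustrative."""
    clicked_keys = {url_key(u, level) for u in clicked_urls}
    counts = Counter()
    for doc_url, tokens in url_to_tokens.items():
        if url_key(doc_url, level) in clicked_keys:
            counts.update(t for t in tokens if t not in query_terms)
    expansion = [t for t, _ in counts.most_common(n_expansion)]
    return list(query_terms) + expansion
```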

    Synthesis and antiviral activities of a novel class of thioflavone and flavonoid analogues

    A novel class of thioflavone and flavonoid derivatives has been prepared and their antiviral activities against enterovirus 71 (EV71) and coxsackieviruses B3 (CVB3) and B6 (CVB6) were evaluated. Compounds 7d and 9b showed potent antiviral activities against EV71, with IC50 values of 8.27 and 5.48 μM, respectively. Compound 7f, which was synthesized for the first time in this work, showed the highest level of inhibitory activity against both CVB3 and CVB6, with IC50 values of 0.62 and 0.87 μM, respectively. Compounds 4b, 7a, 9c and 9e also showed strong inhibitory activities against both CVB3 and CVB6 at low concentrations (IC50 = 1.42–7.15 μM), whereas compounds 4d, 7c, 7e and 7g showed strong activity against CVB6 (IC50 = 2.91–3.77 μM) together with low levels of activity against CVB3. Compound 7d exhibited stronger inhibitory activity against CVB3 (IC50 = 6.44 μM) than against CVB6 (IC50 > 8.29 μM). The thioflavone derivatives 7a, 7c, 7d, 7e, 7f and 7g represent a new class of lead compounds for the development of novel antiviral agents.

    PV2TEA: Patching Visual Modality to Textual-Established Information Extraction

    Information extraction, e.g. attribute value extraction, has been extensively studied and formulated based only on text. However, many attributes can benefit from image-based extraction, such as color, shape, and pattern. The visual modality has long been underutilized, mainly due to the difficulty of multimodal annotation. In this paper, we aim to patch the visual modality onto a textually established attribute information extractor. The cross-modality integration faces several unique challenges: (C1) images and textual descriptions are loosely paired within and across samples; (C2) images usually contain rich backgrounds that can mislead the prediction; (C3) weakly supervised labels from textually established extractors are biased for multimodal training. We present PV2TEA, an encoder-decoder architecture equipped with three bias reduction schemes: (S1) augmented label-smoothed contrast to improve cross-modality alignment for loosely paired images and text; (S2) attention pruning that adaptively distinguishes the visual foreground; (S3) two-level neighborhood regularization that mitigates the textual label bias via reliability estimation. Empirical results on real-world e-Commerce datasets demonstrate up to an 11.74% absolute (20.97% relative) F1 increase over unimodal baselines. (ACL 2023 Findings)
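    As a rough illustration of scheme S1, the snippet below computes a label-smoothed cross-modal contrastive loss over a batch of image and text embeddings in PyTorch. It sketches the general technique rather than PV2TEA's exact objective; the temperature and smoothing values are assumptions.

```python
import torch
import torch.nn.functional as F

def smoothed_contrastive_loss(img_emb, txt_emb, temperature=0.07, smoothing=0.1):
    """Label-smoothed cross-modal contrastive loss (sketch). Each image is
    pulled toward its paired text, but some probability mass is spread over
    the other texts in the batch to soften the assumption that image/text
    pairs are perfectly aligned. Assumes batch size > 1."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature        # (batch, batch) similarities
    n = logits.size(0)
    # smoothed targets: mostly the diagonal (true pair), a little everywhere else
    targets = torch.full_like(logits, smoothing / (n - 1))
    targets.fill_diagonal_(1.0 - smoothing)
    log_probs = F.log_softmax(logits, dim=-1)
    return -(targets * log_probs).sum(dim=-1).mean()
```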