
    Exploring Topic-based Language Models for Effective Web Information Retrieval

    The main obstacle to providing focused search is the relative opaqueness of search requests: searchers tend to express their complex information needs in only a couple of keywords. Our overall aim is to find out if, and how, topic-based language models can lead to more effective web information retrieval. In this paper we explore the retrieval performance of a topic-based model that combines topical models with other language models based on cross-entropy. We first define our topical categories and train our topical models on the .GOV2 corpus by building parsimonious language models. We then test the topic-based model on the TREC-8 small Web data collection for ad-hoc search. Our experimental results show that the topic-based model outperforms both the standard language model and the parsimonious model.
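    As a rough illustration of the scoring idea, the sketch below ranks a document by the cross-entropy between the query and a mixture of document, topic, and background language models. The mixture weights and the smoothing floor are illustrative assumptions, not values from the paper.

        import math

        def cross_entropy_score(query_terms, p_doc, p_topic, p_bg,
                                w_doc=0.5, w_topic=0.3, w_bg=0.2):
            """Negative log-likelihood of the query under a mixture of
            document, topic, and background unigram models; lower means a
            better match. Weights here are illustrative assumptions."""
            score = 0.0
            for t in query_terms:
                p = (w_doc * p_doc.get(t, 0.0)
                     + w_topic * p_topic.get(t, 0.0)
                     + w_bg * p_bg.get(t, 1e-9))  # floor avoids log(0)
                score += -math.log(p)
            return score

    Documents would then be ranked in ascending order of this score, with the topical component rewarding documents from the right topical category even when exact query terms are rare.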

    Advanced language modeling approaches, case study: Expert search

    This tutorial gives a clear and detailed overview of advanced language modeling approaches and tools, including the use of document priors, translation models, relevance models, parsimonious models, and expectation-maximization training. Expert search is used as a case study to explain the consequences of modeling assumptions.
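    On the last two items, the following is a minimal sketch of the EM recipe commonly used to estimate a parsimonious document model, in which terms that the corpus model already explains well are pushed toward zero probability. The mixing weight and iteration count are illustrative assumptions.

        def parsimonious_model(term_freq, p_corpus, lam=0.1, iters=20):
            """EM estimation of a parsimonious document model P(t|D).
            term_freq maps terms to counts in the document; p_corpus is
            the background model P(t|C). lam and iters are illustrative."""
            total = sum(term_freq.values())
            p_doc = {t: tf / total for t, tf in term_freq.items()}  # MLE init
            for _ in range(iters):
                # E-step: expected term counts attributed to the document model
                e = {t: tf * (lam * p_doc[t])
                        / (lam * p_doc[t] + (1 - lam) * p_corpus.get(t, 1e-9))
                     for t, tf in term_freq.items()}
                # M-step: renormalize the expected counts
                norm = sum(e.values())
                p_doc = {t: c / norm for t, c in e.items()}
            return p_doc

    In practice, terms whose probability falls below a small threshold are then pruned, which is what makes the resulting model parsimonious.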

    Language models and probability of relevance

    this document; the equation then represents the probability that the document that the user had in mind was in fact this one. Hiemstra [1] gives the same equation a slightly different justification. The basic assumption is the same (the user is assumed to have a specific document in mind and to generate the query on the basis of this document), but instead of smoothing, the user is assumed to assign a binary importance value to each term position in the query. An important term position is filled with a term from the document; a non-important one is filled with a general language term. If we define $\lambda_i = P(\text{term position } i \text{ is important})$, then we get

        $$P(D, T_1, T_2, \ldots, T_n) = P(D) \prod_{i=1}^{n} \big( (1 - \lambda_i)\, P(T_i) + \lambda_i\, P(T_i \mid D) \big)$$
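    Read literally, the equation is straightforward to evaluate; the sketch below does so in log space. The probability floor is an implementation assumption to avoid log(0), not part of the model.

        import math

        def hiemstra_log_prob(query_terms, importance, p_doc, p_bg, p_prior=1.0):
            """log P(D, T_1..T_n) = log P(D) + sum_i log((1 - lambda_i) P(T_i)
            + lambda_i P(T_i | D)). importance[i] is lambda_i; p_doc and p_bg
            map terms to P(t|D) and P(t) respectively."""
            logp = math.log(p_prior)
            for t, lam in zip(query_terms, importance):
                p = (1 - lam) * p_bg.get(t, 0.0) + lam * p_doc.get(t, 0.0)
                logp += math.log(max(p, 1e-12))  # floor is an assumption
            return logp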

    Joint Intermodal and Intramodal Label Transfers for Extremely Rare or Unseen Classes

    In this paper, we present a label transfer model from texts to images for image classification tasks. The problem of image classification is often much more challenging than text classification. On one hand, labeled text data is more widely available than labeled images for classification tasks. On the other hand, text data tends to have natural semantic interpretability and is often more directly related to class labels. In contrast, image features are not directly related to the concepts inherent in class labels. One of our goals in this paper is to develop a model that reveals the functional relationships between text and image features so as to directly transfer intermodal and intramodal labels to annotate images. This is implemented by learning a transfer function as a bridge to propagate labels between the two multimodal spaces. However, intermodal label transfer can be undermined by blindly transferring the labels of noisy texts to annotate images. To mitigate this problem, we present an intramodal label transfer process, which complements the intermodal label transfer by transferring image labels instead when relevant text is absent from the source corpus. In addition, we generalize the intermodal label transfer to the zero-shot learning scenario, where only text examples are available to label unseen classes of images, without any positive image examples. We evaluate our algorithm on an image classification task and show its effectiveness relative to the other compared algorithms. Comment: The paper has been accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence. It will appear in a future issue.
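    The abstract leaves the form of the transfer function open. As a loose, hypothetical sketch of the bridge idea only, one could learn a linear map from image features into the text feature space and label images by their nearest class prototype there, which also covers the zero-shot case where a class has text examples only. Nothing below is the paper's actual model; the linear form, the ridge regularizer, and cosine matching are all assumptions.

        import numpy as np

        def learn_transfer(X_img, X_txt, reg=1.0):
            """Fit a linear transfer W mapping paired image features (n, d_img)
            to text features (n, d_txt) by ridge regression (closed form).
            The linear form and 'reg' are illustrative assumptions."""
            d = X_img.shape[1]
            return np.linalg.solve(X_img.T @ X_img + reg * np.eye(d),
                                   X_img.T @ X_txt)

        def zero_shot_predict(W, x_img, class_text_protos):
            """Project an image into text space and return the label of the
            most similar class prototype; usable even for classes that have
            only text examples (the zero-shot case)."""
            z = x_img @ W
            names = list(class_text_protos)
            protos = np.stack([class_text_protos[c] for c in names])
            # cosine similarity against each class prototype
            sims = protos @ z / (np.linalg.norm(protos, axis=1)
                                 * np.linalg.norm(z) + 1e-12)
            return names[int(np.argmax(sims))]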

    Network-state dependent effects in naming and learning


    Tailored semantic annotation for semantic search

    This paper presents a novel method for semantic annotation and search of a target corpus using several knowledge resources (KRs). The method relies on a formal statistical framework in which KR concepts and corpus documents are homogeneously represented using statistical language models. Under this framework, we can perform all the operations necessary for efficient and effective semantic annotation of the corpus. First, we propose a coarse tailoring of the KRs with respect to the target corpus, with the main goal of reducing the ambiguity of the annotations and their computational overhead. Then, we propose the generation of concept profiles, which allow measuring the semantic overlap of the KRs as well as performing a finer tailoring of them. Finally, we propose how to semantically represent documents and queries in terms of the KR concepts, and how to use the statistical framework to perform semantic search. Experiments have been carried out on a corpus about web resources, which includes several Life Sciences catalogs and Wikipedia pages related to web resources in general (e.g., databases, tools, services). Results demonstrate that the proposed method is more effective and efficient than state-of-the-art methods relying on either context-free annotation or keyword-based search. We thank the anonymous reviewers for their very useful comments and suggestions. The work was supported by the CICYT project TIN2011-24147 from the Spanish Ministry of Economy and Competitiveness (MINECO).
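    As an illustration of the framework's core operation, the sketch below represents both concepts and documents as unigram language models and annotates a document with the concepts whose models diverge least from it. The use of KL divergence, the smoothing floor, and the top-k cutoff are assumptions; the paper's method additionally tailors the KRs and builds concept profiles on top of such comparisons.

        import math

        def kl_divergence(p, q, floor=1e-9):
            """KL(p || q) between unigram language models given as
            term -> probability dicts; 'floor' smooths unseen terms."""
            return sum(pt * math.log(pt / q.get(t, floor))
                       for t, pt in p.items() if pt > 0)

        def annotate(doc_lm, concept_lms, k=5):
            """Keep the k knowledge-resource concepts whose language models
            are closest to the document model. k is illustrative."""
            ranked = sorted(concept_lms,
                            key=lambda c: kl_divergence(doc_lm, concept_lms[c]))
            return ranked[:k]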

    Parsimonious concept modeling


    Phonographic neighbors, not orthographic neighbors, determine word naming latencies

    The orthographic neighborhood size (N) of a word, the number of words that can be formed from it by replacing a single letter, has been found to have facilitatory effects in word naming. The orthographic neighborhood hypothesis attributes this facilitation to interactive effects. A phonographic neighborhood hypothesis, in contrast, attributes the effect to lexical print-sound conversion. According to the phonographic neighborhood hypothesis, phonographic neighbors (words differing in one letter and one phoneme, e.g., stove and stone) should facilitate naming, whereas other orthographic neighbors (e.g., stove and shove) should not. The predictions of these two hypotheses are tested here. Unique facilitatory phonographic N effects were found in four sets of word naming mega-study data, along with an absence of facilitatory orthographic N effects. These results implicate print-sound conversion based on consistent phonology, rather than word-letter feedback, in neighborhood effects.
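    The two neighbor definitions are mechanical, as the sketch below illustrates. The pronunciation lexicon ('pron', mapping words to phoneme tuples, e.g. from a CMUdict-style resource) is an assumed input.

        def is_orthographic_neighbor(w1, w2):
            """Words of equal length differing in exactly one letter,
            e.g. stove/shove and stove/stone."""
            return (len(w1) == len(w2)
                    and sum(a != b for a, b in zip(w1, w2)) == 1)

        def is_phonographic_neighbor(w1, w2, pron):
            """Orthographic neighbors that also differ in exactly one
            phoneme, e.g. stove/stone but not stove/shove."""
            if not is_orthographic_neighbor(w1, w2):
                return False
            p1, p2 = pron[w1], pron[w2]
            return (len(p1) == len(p2)
                    and sum(a != b for a, b in zip(p1, p2)) == 1)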