Exploring Topic-based Language Models for Effective Web Information Retrieval
The main obstacle to providing focused search is the relative opaqueness of search requests -- searchers tend to express their complex information needs in only a couple of keywords. Our overall aim is to find out if, and how, topic-based language models can lead to more effective web information retrieval. In this paper we explore the retrieval performance of a topic-based model that combines topical models with other language models based on cross-entropy. We first define our topical categories and train our topical models on the .GOV2 corpus by building parsimonious language models. We then test the topic-based model on the TREC8 small Web data collection for ad hoc search. Our experimental results show that the topic-based model outperforms both the standard language model and the parsimonious model.
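The abstract does not spell out the scoring function; the following is a minimal sketch, assuming cross-entropy scoring of a query model against a document model interpolated with a topical model (all function names and the mixture weight alpha are hypothetical, not taken from the paper):

```python
import math
from collections import Counter

def smoothed_lm(tokens, background, mu=0.1):
    """Unigram language model, interpolated with a background model."""
    counts, total = Counter(tokens), len(tokens)
    return lambda t: (1 - mu) * counts[t] / total + mu * background(t)

def topic_mixture(doc_lm, topic_lm, alpha=0.5):
    """Interpolate a document model with a topical model (alpha is a guess)."""
    return lambda t: alpha * doc_lm(t) + (1 - alpha) * topic_lm(t)

def cross_entropy_score(query_tokens, doc_lm):
    """-H(Q, D): higher is better; assumes doc_lm(t) > 0 via smoothing."""
    q, n = Counter(query_tokens), len(query_tokens)
    return sum((c / n) * math.log(doc_lm(t)) for t, c in q.items())
```

Documents would then be ranked by `cross_entropy_score` against their topic-mixed models; the parsimonious training step mentioned in the abstract is not reproduced here.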
Advanced language modeling approaches, case study: Expert search
This tutorial gives a clear and detailed overview of advanced language modeling approaches and tools, including the use of document priors, translation models, relevance models, parsimonious models, and expectation maximization training. Expert search is used as a case study to explain the consequences of modeling assumptions.
Language models and probability of relevance
…this document; the equation then represents the probability that the document that the user had in mind was in fact this one. Hiemstra [1] gives the same equation a slightly different justification. The basic assumption is the same (the user is assumed to have a specific document in mind and to generate the query on the basis of this document), but instead of smoothing, the user is assumed to assign a binary importance value to each term position in the query. An important term position is filled with a term from the document; a non-important one is filled with a general language term. If we define λ_i = P(term position i is important), then we get

P(D, T_1, T_2, ..., T_n) = P(D) ∏_{i=1}^{n} ((1 − λ_i) P(T_i) + λ_i P(T_i | D))
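As a small illustration, the reconstructed mixture above can be evaluated directly; this sketch assumes callables for the general-language model P(T_i) and the document model P(T_i | D) (the function and parameter names are illustrative):

```python
import math

def log_joint_prob(doc_prior, query_terms, p_general, p_given_doc, lambdas):
    """log P(D, T_1..T_n) for the term-importance mixture model above:
    each query position mixes a general-language term probability with
    a document term probability, weighted by lambda_i."""
    return math.log(doc_prior) + sum(
        math.log((1 - lam) * p_general(t) + lam * p_given_doc(t))
        for t, lam in zip(query_terms, lambdas))
```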
Joint Intermodal and Intramodal Label Transfers for Extremely Rare or Unseen Classes
In this paper, we present a label transfer model from texts to images for image classification tasks. The problem of image classification is often much more challenging than text classification. On one hand, labeled text data is more widely available than labeled images for classification tasks. On the other hand, text data tends to have natural semantic interpretability and is often more directly related to class labels. In contrast, image features are not directly related to the concepts inherent in class labels. One of our goals in this paper is to develop a model that reveals the functional relationships between text and image features so as to directly transfer intermodal and intramodal labels to annotate the images. This is implemented by learning a transfer function as a bridge to propagate labels between the two multimodal spaces. However, intermodal label transfer can be undermined by blindly transferring the labels of noisy texts to annotate images. To mitigate this problem, we present an intramodal label transfer process, which complements the intermodal label transfer by transferring image labels instead when relevant text is absent from the source corpus. In addition, we generalize the intermodal label transfer to the zero-shot learning scenario, where only text examples are available to label unseen classes of images, without any positive image examples. We evaluate our algorithm on an image classification task and show its effectiveness with respect to the compared algorithms. Comment: The paper has been accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence. It will appear in a future issue.
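The paper learns its transfer function inside its own joint framework; purely as an illustrative sketch of the idea, here is a linear text-to-image transfer map fitted by ridge regression, followed by nearest-neighbor label transfer (all names, and the choice of ridge regression, are assumptions, not the paper's method):

```python
import numpy as np

def fit_transfer(text_feats, image_feats, reg=1e-2):
    """Fit a linear map W from text features to image features by ridge
    regression on paired examples (rows of both matrices correspond)."""
    X, Y = text_feats, image_feats
    return np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ Y)

def transfer_labels(W, labeled_text, text_labels, unlabeled_images):
    """Project labeled texts into the image feature space and give each
    image the label of its nearest projected text (cosine similarity).
    With only text examples for a class, this also covers the zero-shot case."""
    proj = labeled_text @ W
    proj /= np.linalg.norm(proj, axis=1, keepdims=True) + 1e-12
    imgs = unlabeled_images / (
        np.linalg.norm(unlabeled_images, axis=1, keepdims=True) + 1e-12)
    nearest = (imgs @ proj.T).argmax(axis=1)
    return [text_labels[i] for i in nearest]
```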
Tailored semantic annotation for semantic search
This paper presents a novel method for semantic annotation and search of a target corpus using several knowledge resources (KRs). The method relies on a formal statistical framework in which KR concepts and corpus documents are homogeneously represented using statistical language models. Under this framework, we can perform all the operations necessary for efficient and effective semantic annotation of the corpus. First, we propose a coarse tailoring of the KRs w.r.t. the target corpus, with the main goal of reducing the ambiguity of the annotations and their computational overhead. Then, we propose the generation of concept profiles, which allow measuring the semantic overlap of the KRs as well as performing a finer tailoring of them. Finally, we propose how to semantically represent documents and queries in terms of the KR concepts, and how to perform semantic search within the statistical framework. Experiments have been carried out on a corpus about web resources, which includes several Life Sciences catalogs and Wikipedia pages related to web resources in general (e.g., databases, tools, services). Results demonstrate that the proposed method is more effective and efficient than state-of-the-art methods relying on either context-free annotation or keyword-based search. We thank the anonymous reviewers for their very useful comments and suggestions. The work was supported by the CICYT project TIN2011-24147 from the Spanish Ministry of Economy and Competitiveness (MINECO).
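The abstract leaves the framework's operations abstract; the following is a minimal sketch of one plausible instantiation, representing KR concepts and documents as smoothed unigram language models and ranking concepts for a document by KL divergence (the smoothing scheme and the use of KL divergence are assumptions):

```python
import math

def concept_model(term_counts, vocab, eps=1e-6):
    """Smoothed unigram language model for a KR concept or a document."""
    total = sum(term_counts.values()) + eps * len(vocab)
    return {t: (term_counts.get(t, 0) + eps) / total for t in vocab}

def kl_divergence(p, q):
    """KL(p || q) between two language models over the same vocabulary."""
    return sum(pv * math.log(pv / q[t]) for t, pv in p.items())

def annotate(doc_counts, concepts, vocab, k=3):
    """Rank KR concepts for a document by KL divergence (lower = closer)."""
    d = concept_model(doc_counts, vocab)
    return sorted(concepts, key=lambda c: kl_divergence(d, concepts[c]))[:k]
```

Here `concepts` maps concept names to models built with `concept_model`; comparing two concepts' models in the same way gives one possible reading of the "semantic overlap" used for concept profiles.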
Phonographic neighbors, not orthographic neighbors, determine word naming latencies
The orthographic neighborhood size (N) of a word (the number of words that can be formed from that word by replacing one letter with another) has been found to have facilitatory effects in word naming. The orthographic neighborhood hypothesis attributes this facilitation to interactive effects. A phonographic neighborhood hypothesis, in contrast, attributes the effect to lexical print-sound conversion. According to the phonographic neighborhood hypothesis, phonographic neighbors (words differing in one letter and one phoneme, e.g., stove and stone) should facilitate naming, and other orthographic neighbors (e.g., stove and shove) should not. The predictions of these two hypotheses are tested. Unique facilitatory phonographic N effects were found in four sets of word naming mega-study data, along with an absence of facilitatory orthographic N effects. These results implicate print-sound conversion, based on consistent phonology, in neighborhood effects rather than word-letter feedback.
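The neighbor definitions are concrete enough to compute; here is a small sketch that splits a word's orthographic neighbors into phonographic neighbors and the rest, assuming a hypothetical `lexicon` mapping spellings to pronunciation strings with one character per phoneme (a simplification of real phonemic transcription):

```python
def differ_by_one(a, b):
    """True if equal-length strings a and b differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def neighborhoods(word, lexicon):
    """Split the orthographic neighbors of `word` into phonographic
    neighbors (one letter AND one phoneme apart, e.g. stove/stone)
    and the remaining orthographic-only neighbors (e.g. stove/shove)."""
    phono, ortho_only = [], []
    for other, phonemes in lexicon.items():
        if other != word and differ_by_one(word, other):
            if differ_by_one(lexicon[word], phonemes):
                phono.append(other)
            else:
                ortho_only.append(other)
    return phono, ortho_only
```

The phonographic N and orthographic N predictors contrasted in the paper would then be the sizes of these two lists.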