41,180 research outputs found

Elements of latent learning in a maze environment

Author: Granger Richard H.
McNulty Dale
Publication venue: eScholarship, University of California
Publication date: 03/01/1985

A general purpose learning program is described which demonstrates a latent learning ability by operating at two separate goal pursuit levels. At one level are the constant, implicit goals associated with the system's memory management mechanisms. At the higher level are the dynamic, explicit behavioral goals which the implicit goals enable by manipulating memory representations to conform to the external surroundings. The program is shown to negotiate a simulated maze environment by the step-wise refinement of its latently learned experiences.

eScholarship - University of California

Topic based language models for ad hoc information retrieval

Author: Azzopardi L.
Girolami M.
Van Rijsbergen C.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2004

We propose a topic based approach to language modelling for ad-hoc Information Retrieval (IR). Many smoothed estimators used for the multinomial query model in IR rely upon the estimated background collection probabilities. In this paper, we propose a topic based language modelling approach that uses a more informative prior based on the topical content of a document. In our experiments, the proposed model provides comparable IR performance to the standard models, but when combined in a two stage language model, it outperforms all other estimated models.
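
The abstract refers to smoothed estimators that fall back on background collection probabilities. As a rough illustration only, the sketch below scores a query with Jelinek-Mercer (linear interpolation) smoothing, one common estimator of that kind. It is not the topic based model the paper proposes; the function name, the `lam` parameter and the toy documents are assumptions made for this example.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection_counts, collection_len, lam=0.5):
    """Log-probability of `query` under a smoothed unigram model of `doc`.

    Illustrative assumption: Jelinek-Mercer smoothing, i.e.
    p(w | d) = (1 - lam) * p_ml(w | d) + lam * p(w | collection).
    `collection_counts` is a Counter of term frequencies over the collection.
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        # Maximum-likelihood estimate from the document itself.
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        # Background estimate from the whole collection.
        p_bg = collection_counts[term] / collection_len
        p = (1 - lam) * p_doc + lam * p_bg
        if p == 0.0:
            # Term unseen in both document and collection.
            return float("-inf")
        score += math.log(p)
    return score


# Toy collection of two tokenised documents (hypothetical data).
docs = [["topic", "model", "retrieval"], ["maze", "learning"]]
collection_counts = Counter(term for d in docs for term in d)
collection_len = sum(collection_counts.values())

query = ["topic", "retrieval"]
scores = [
    query_log_likelihood(query, d, collection_counts, collection_len)
    for d in docs
]
```

With this toy data the first document, which contains both query terms, receives the higher score, while the second is still assigned a non-zero probability through the background term.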

University of Strathclyde Institutional Repository

UCL Discovery

Enlighten

Chi-square-based scoring function for categorization of MEDLINE citations

Author: Hristovski Dimitar
Kastrin Andrej
Peterlin Borut
Publication venue: 'Georg Thieme Verlag KG'
Publication date: 01/01/2010

Objectives: Text categorization has been used in biomedical informatics for identifying documents containing relevant topics of interest. We developed a simple method that uses a chi-square-based scoring function to determine the likelihood of MEDLINE citations containing genetically relevant topics. Methods: Our procedure requires construction of a genetic and a nongenetic domain document corpus. We used MeSH descriptors assigned to MEDLINE citations for this categorization task. We compared the frequencies of MeSH descriptors between the two corpora using the chi-square test. A MeSH descriptor was considered to be a positive indicator if its relative observed frequency in the genetic domain corpus was greater than its relative observed frequency in the nongenetic domain corpus. The output of the proposed method is a list of scores for all the citations, with the highest score given to those citations containing MeSH descriptors typical for the genetic domain. Results: Validation was done on a set of 734 manually annotated MEDLINE citations. The method achieved a predictive accuracy of 0.87, with 0.69 recall and 0.64 precision. We evaluated the method by comparing it to three machine learning algorithms (support vector machines, decision trees, naïve Bayes). Although the differences were not statistically significant, the results showed that our chi-square scoring performs as well as the compared machine learning algorithms. Conclusions: We suggest that the chi-square scoring is an effective solution to help categorize MEDLINE citations. The algorithm is implemented in the BITOLA literature-based discovery support system as a preprocessor for the gene symbol disambiguation process. Comment: 34 pages, 2 figures
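
The procedure in the Methods part can be sketched roughly as follows. The abstract does not give the exact scoring formula, so this sketch assumes that a citation's score is the sum of signed chi-square statistics of its descriptors (positive when the descriptor is relatively more frequent in the genetic corpus). All function names and the toy corpora are hypothetical, and both corpora are assumed to be non-empty.

```python
from collections import Counter


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def descriptor_weights(genetic_docs, nongenetic_docs):
    """Signed chi-square weight for every descriptor seen in either corpus."""
    n_gen, n_non = len(genetic_docs), len(nongenetic_docs)
    # Document frequencies: each descriptor counted once per citation.
    gen_counts = Counter(d for doc in genetic_docs for d in set(doc))
    non_counts = Counter(d for doc in nongenetic_docs for d in set(doc))
    weights = {}
    for desc in set(gen_counts) | set(non_counts):
        a, c = gen_counts[desc], non_counts[desc]
        chi2 = chi_square_2x2(a, n_gen - a, c, n_non - c)
        # Positive indicator: relatively more frequent in the genetic corpus.
        positive = a / n_gen > c / n_non
        weights[desc] = chi2 if positive else -chi2
    return weights


def score_citation(descriptors, weights):
    """Assumed scoring rule: sum of signed weights of the citation's descriptors."""
    return sum(weights.get(d, 0.0) for d in set(descriptors))


# Toy corpora: each citation is a list of MeSH-like descriptors (hypothetical).
genetic = [["Genes", "Mutation"], ["Genes", "Humans"]]
nongenetic = [["Humans", "Surgery"], ["Surgery"]]

weights = descriptor_weights(genetic, nongenetic)
genetic_like = score_citation(["Genes", "Mutation"], weights)
surgical_like = score_citation(["Surgery", "Humans"], weights)
```

Ranking citations by this score puts those with descriptors typical of the genetic corpus first; a threshold on the score would then turn the ranking into a binary categorization.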

arXiv.org e-Print Archive

Crossref

Investigating the relationship between language model perplexity and IR precision-recall measures

Author: Azzopardi L.
Girolami M.
Van Rijsbergen K.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2003

An empirical study has been conducted investigating the relationship between the performance of an aspect based language model in terms of perplexity and the corresponding information retrieval performance obtained. It is observed, on the corpora considered, that the perplexity of the language model has a systematic relationship with the achievable precision-recall performance, though it is not statistically significant.

CiteSeerX

Crossref

Enlighten

Combining and selecting characteristics of information use

Author: Lalmas M.
Ruthven I.
van Rijsbergen C.J.
Publication date: 01/01/2002

In this paper we report on a series of experiments designed to investigate the combination of term and document weighting functions in Information Retrieval. We describe a series of weighting functions, each of which is based on how information is used within documents and collections, and use these weighting functions in two types of experiments: one based on combination of evidence for ad-hoc retrieval, the other based on selective combination of evidence within a relevance feedback situation. We discuss the difficulties involved in predicting good combinations of evidence for ad-hoc retrieval, and suggest the factors that may lead to the success or failure of combination. We also demonstrate how, in a relevance feedback situation, the relevance assessments can provide a good indication of how evidence should be selected for query term weighting. The use of relevance information to guide the combination process is shown to reduce the variability inherent in combination of evidence.

University of Strathclyde Institutional Repository