532 research outputs found
Probabilistic learning for selective dissemination of information
New methods and new systems are needed to filter or to selectively distribute the increasing volume of electronic information being produced nowadays. An effective information filtering system is one that provides the exact information that fulfills user's interests with the minimum effort by the user to describe it. Such a system will have to be adaptive to the user changing interest. In this paper we describe and evaluate a learning model for information filtering which is an adaptation of the generalized probabilistic model of information retrieval. The model is based on the concept of 'uncertainty sampling', a technique that allows for relevance feedback both on relevant and nonrelevant documents. The proposed learning model is the core of a prototype information filtering system called ProFile
Towards a belief revision based adaptive and context sensitive information retrieval system
In an adaptive information retrieval (IR) setting, the information seekers' beliefs about which terms are relevant or nonrelevant will naturally fluctuate. This article investigates how the theory of belief revision can be used to model adaptive IR. More specifically, belief revision logic provides a rich representation scheme to formalize retrieval contexts so as to disambiguate vague user queries. In addition, belief revision theory underpins the development of an effective mechanism to revise user profiles in accordance with information seekers' changing information needs. It is argued that information retrieval contexts can be extracted by means of the information-flow text mining method so as to realize a highly autonomous adaptive IR system. The extra bonus of a belief-based IR model is that its retrieval behavior is more predictable and explanatory. Our initial experiments show that the belief-based adaptive IR system is as effective as a classical adaptive IR system. To our best knowledge, this is the first successful implementation and evaluation of a logic-based adaptive IR model which can efficiently process large IR collections
CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines
Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective.
The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines.
From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research
Gene Ontology density estimation and discourse analysis for automatic GeneRiF extraction
<p>Abstract</p> <p>Background</p> <p>This paper describes and evaluates a sentence selection engine that extracts a GeneRiF (Gene Reference into Functions) as defined in ENTREZ-Gene based on a MEDLINE record. Inputs for this task include both a gene and a pointer to a MEDLINE reference. In the suggested approach we merge two independent sentence extraction strategies. The first proposed strategy (LASt) uses argumentative features, inspired by discourse-analysis models. The second extraction scheme (GOEx) uses an automatic text categorizer to estimate the density of Gene Ontology categories in every sentence; thus providing a full ranking of all possible candidate GeneRiFs. A combination of the two approaches is proposed, which also aims at reducing the size of the selected segment by filtering out non-content bearing rhetorical phrases.</p> <p>Results</p> <p>Based on the TREC-2003 Genomics collection for GeneRiF identification, the LASt extraction strategy is already competitive (52.78%). When used in a combined approach, the extraction task clearly shows improvement, achieving a Dice score of over 57% (+10%).</p> <p>Conclusions</p> <p>Argumentative representation levels and conceptual density estimation using Gene Ontology contents appear complementary for functional annotation in proteomics.</p
Recommended from our members
Hierarchical classification for multiple, distributed web databases
The proliferation of online information resources increases the importance of effective and efficient distributed searching. Our research aims to provide an alternative hierarchical categorization and search capability based on a Bayesian network learning algorithm. Our proposed approach, which is grounded on automatic textual analysis of subject content of online web databases, attempts to address the database selection problem by first classifying web databases into a hierarchy of topic categories. The experimental results reported demonstrate that such a classification approach not only effectively reduces the class search space, but also helps to significantly improve the accuracy of classification performance
Embedding Web-based Statistical Translation Models in Cross-Language Information Retrieval
Although more and more language pairs are covered by machine translation
services, there are still many pairs that lack translation resources.
Cross-language information retrieval (CLIR) is an application which needs
translation functionality of a relatively low level of sophistication since
current models for information retrieval (IR) are still based on a
bag-of-words. The Web provides a vast resource for the automatic construction
of parallel corpora which can be used to train statistical translation models
automatically. The resulting translation models can be embedded in several ways
in a retrieval model. In this paper, we will investigate the problem of
automatically mining parallel texts from the Web and different ways of
integrating the translation models within the retrieval process. Our
experiments on standard test collections for CLIR show that the Web-based
translation models can surpass commercial MT systems in CLIR tasks. These
results open the perspective of constructing a fully automatic query
translation device for CLIR at a very low cost.Comment: 37 page
- …