Search CORE

1,268 research outputs found

Retrieving with good sense

Author: Sanderson M.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2000
Field of study

Although always present in text, word sense ambiguity only recently became regarded as a problem to information retrieval which was potentially solvable. The growth of interest in word senses resulted from new directions taken in disambiguation research. This paper first outlines this research and surveys the resulting efforts in information retrieval. Although the majority of attempts to improve retrieval effectiveness were unsuccessful, much was learnt from the research. Most notably a notion of under what circumstance disambiguation may prove of use to retrieval

CiteSeerX

White Rose Research Online

Publindex: Aweb service to automatically evaluate research publications according to customized criteria

Author: Bobed C.
Juan A.
Mena E.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020
Field of study

We introduce Publindex, a system that retrieves, classifies, and returns research publications of a given researcher according to the criteria and in the format predefined by the user

Repositorio Universidad de Zaragoza

Word sense disambiguation and information retrieval

Author: Sanderson Mark
Publication venue
Publication date: 01/01/1996
Field of study

Starting with a review of previous research that attempted to improve the representation of documents in IR systems, this research is reassessed in the light of word sense ambiguity. It will be shown that a number of the attempts' successes or failures were due to the noticing or ignoring of ambiguity. In the review of disambiguation research, many varied techniques for performing automatic disambiguities are introduced. Research on the disambiguating abilities of people is presented also. It has been found that people are inconsistent when asked to disambiguate words and this causes problems when testing the output of an automatic disambiguator. The first of two sets of experiments to investigate the relationship between ambiguity, disambiguation, and IR, involves a technique where ambiguity and disambiguation can be simulated in a document collection. The results of these experiments lead to the conclusions that query size plays an important role in the relationship between ambiguity and IR. Retrievals based on very small queries suffer particularly from ambiguity and benefit most from disambiguation. Other queries, however, contain a sufficient number of words to provide a form of context that implicitly resolves the query word's ambiguities. In general, ambiguity is found to be not as great a problem to IR systems as might have been thought and the errors made by a disambiguator can be more of a problem than the ambiguity it is trying to resolve. In the complementary second set of experiments, a disambiguator is built and tested, it is applied to a document test collection, and an IR system is adjusted to accommodate the sense information in the collection. The conclusions of these experiments are found to broadly confirm those of the previous set

Glasgow Theses Service

CiteSeerX

Implementation of a knowledge discovery and enhancement module from structured information gained from unstructured sources of information

Author: Costa Celso Ricardo Martins Maia
Publication venue
Publication date: 01/01/2010
Field of study

Tese de mestrado integrado. Engenharia Informática e Computação. Faculdade de Engenharia. Universidade do Porto. 201

Repositório Aberto da Universidade do Porto

Applying Wikipedia to Interactive Information Retrieval

Author: Milne David N.
Publication venue: 'University of Waikato'
Publication date: 15/09/2010
Field of study

There are many opportunities to improve the interactivity of information retrieval systems beyond the ubiquitous search box. One idea is to use knowledge bases—e.g. controlled vocabularies, classification schemes, thesauri and ontologies—to organize, describe and navigate the information space. These resources are popular in libraries and specialist collections, but have proven too expensive and narrow to be applied to everyday webscale search. Wikipedia has the potential to bring structured knowledge into more widespread use. This online, collaboratively generated encyclopaedia is one of the largest and most consulted reference works in existence. It is broader, deeper and more agile than the knowledge bases put forward to assist retrieval in the past. Rendering this resource machine-readable is a challenging task that has captured the interest of many researchers. Many see it as a key step required to break the knowledge acquisition bottleneck that crippled previous efforts. This thesis claims that the roadblock can be sidestepped: Wikipedia can be applied effectively to open-domain information retrieval with minimal natural language processing or information extraction. The key is to focus on gathering and applying human-readable rather than machine-readable knowledge. To demonstrate this claim, the thesis tackles three separate problems: extracting knowledge from Wikipedia; connecting it to textual documents; and applying it to the retrieval process. First, we demonstrate that a large thesaurus-like structure can be obtained directly from Wikipedia, and that accurate measures of semantic relatedness can be efficiently mined from it. Second, we show that Wikipedia provides the necessary features and training data for existing data mining techniques to accurately detect and disambiguate topics when they are mentioned in plain text. Third, we provide two systems and user studies that demonstrate the utility of the Wikipedia-derived knowledge base for interactive information retrieval

Research Commons@Waikato

Recommended from our members

Entity-based Enrichment for Information Extraction and Retrieval

Author: Dalton Jeffrey
Publication venue: ScholarWorks@UMass Amherst
Publication date: 29/08/2014
Field of study

The goal of this work is to leverage cross-document entity relationships for improved understanding of queries and documents. We define an entity to be a thing or concept that exists in the world, such as a politician, a battle, a film, or a color. Entity-based enrichment (EBE) is a new expansion model for both queries and documents using features from similar entitymentions in the document collection and external knowledge resources. It uses task-specific features from entities beyond words that include: name aliases, fine-grained entity types, categories, and relationships to other entities. EBE addresses the problem of sparse or noisy local evidence due to multiple topics, implicit context, or informal writing. With the ultimate goal of improving information retrieval effectiveness, we start from unstructured text and through information extraction build up rich entity-based representations linked to external knowledge resources. We study the application ofentity-based enrichment to each step in the pipeline: 1) Named entity recognition, 2) Entity linking, and 3) Ad hoc document retrieval. The empirical results for EBE in each of these tasks shows significant improvements. Our first application of entity-based enrichment is the task of detecting entities in named entity recognition. We enrich the representation of observed words likely to represent entities. We perform weighted feature copying of recognition features from similar tokens in the corpus and external collections. The evaluation shows statistically significant improvements on in-domain newswire accuracy and demonstrates that the models are more robust on out-of-domain data. In the second part of this work, we apply EBE to the task of entity linking. The proposed entity linking method for disambiguating the detected mentions to entries in an external knowledge base is based on information retrieval. Theneighborhood relevance model, an enrichment model, identifies salient associations between an entity mention and otherentity mentions in the document. The results show that the enrichment model outperforms other context models and results in a system that is competitive with leading methods. Using the constructed entity representation, the final task is ad hoc document retrieval. We enrich the query representation with entity features. Retrieval is performed over documents annotated with entities linked to Wikipedia and Freebase from our system as well as the publicly available Google FACC1 annotations. To effectively leverage linked entity features, we extend dependency-based retrieval models to include structured attributes. We also define a new query-specific entity context model which builds a model for disambiguated entities from retrieved documents. Our results demonstrate significant improvements over competitive query expansion baselines for several standard test collections

ScholarWorks@UMass Amherst