Concept-based Interactive Query Expansion Support Tool (CIQUEST)
This report describes a three-year project (2000-03) undertaken in the Information Studies
Department at The University of Sheffield and funded by Resource, The Council for
Museums, Archives and Libraries. The overall aim of the research was to provide user
support for query formulation and reformulation in searching large-scale textual resources
including those of the World Wide Web. More specifically the objectives were: to investigate
and evaluate methods for the automatic generation and organisation of concepts derived from
retrieved document sets, based on statistical methods for term weighting; and to conduct
user-based evaluations on the understanding, presentation and retrieval effectiveness of
concept structures in selecting candidate terms for interactive query expansion.
The TREC test collection formed the basis for the seven evaluative experiments conducted in
the course of the project. These formed four distinct phases in the project plan. In the first
phase, a series of experiments was conducted to investigate further techniques for concept
derivation and hierarchical organisation and structure. The second phase was concerned with
user-based validation of the concept structures. The results of phases 1 and 2 informed the
design of the test system, and the user interface was developed in phase 3. The final phase
entailed a user-based summative evaluation of the CiQuest system.
The main findings demonstrate that concept hierarchies can effectively be generated from
sets of retrieved documents and displayed to searchers in a meaningful way. The approach
provides the searcher with an overview of the contents of the retrieved documents, which in
turn facilitates the viewing of documents and selection of the most relevant ones. Concept
hierarchies are a good source of terms for query expansion and can improve precision. The
extraction of descriptive phrases as an alternative source of terms was also effective. With
respect to presentation, cascading menus were easy to browse for selecting terms and for
viewing documents. In conclusion, the project dissemination programme and future work are
outlined.
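The idea of drawing candidate expansion terms from a retrieved document set by statistical term weighting can be sketched as follows. This is only a minimal illustration, not the CiQuest method: the function name `candidate_terms`, the whitespace tokenizer, and the tf-idf style weight computed within the retrieved set are all assumptions made for the example.

```python
import math
from collections import Counter


def candidate_terms(retrieved_docs, query_terms, top_k=5):
    """Rank terms from a retrieved set as candidates for query expansion.

    Assumption: a simple tf-idf style weight computed inside the
    retrieved set stands in for the project's term-weighting methods.
    """
    tokenized = [doc.lower().split() for doc in retrieved_docs]
    n_docs = len(tokenized)
    # Total occurrences of each term across the retrieved set.
    tf = Counter(term for doc in tokenized for term in doc)
    # Number of retrieved documents that contain each term.
    df = Counter(term for doc in tokenized for term in set(doc))
    query = {q.lower() for q in query_terms}
    scores = {
        term: tf[term] * math.log(1 + n_docs / df[term])
        for term in tf
        if term not in query  # do not suggest terms already in the query
    }
    # Highest weight first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


docs = ["solar panel energy", "solar energy storage", "wind energy turbine"]
print(candidate_terms(docs, ["energy"], top_k=2))
```

In an interactive setting, the ranked terms would be shown to the searcher, who chooses which ones to add to the query.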
Hierarchical classification for multiple, distributed web databases
The proliferation of online information resources increases the importance of effective and efficient distributed searching. Our research aims to provide an alternative hierarchical categorization and search capability based on a Bayesian network learning algorithm. Our proposed approach, which is grounded in automatic textual analysis of the subject content of online web databases, attempts to address the database selection problem by first classifying web databases into a hierarchy of topic categories. The experimental results reported demonstrate that such a classification approach not only effectively reduces the class search space, but also helps to significantly improve classification accuracy.
Beyond Keywords and Relevance: A Personalized Ad Retrieval Framework in E-Commerce Sponsored Search
On most sponsored search platforms, advertisers bid on some keywords for
their advertisements (ads). Given a search request, the ad retrieval module
rewrites the query into bidding keywords, and uses these keywords as keys to
select the Top N ads through inverted indexes. In this way, an ad will not be
retrieved even if queries are related when the advertiser does not bid on
corresponding keywords. Moreover, most ad retrieval approaches regard rewriting
and ad-selecting as two separated tasks, and focus on boosting relevance
between search queries and ads. Recently, in e-commerce sponsored search more
and more personalized information has been introduced, such as user profiles,
long-time and real-time clicks. Personalized information enables ad retrieval
to employ more elements (e.g. real-time clicks) as search signals and
retrieval keys; however, it also makes it more difficult to measure ads
retrieved through different signals. To address these problems, we propose a
novel ad retrieval framework beyond keywords and relevance in e-commerce
sponsored search. Firstly, we employ historical ad click data to initialize a
hierarchical network representing signals, keys and ads, in which personalized
information is introduced. Then we train a model on top of the hierarchical
network by learning the weights of edges. Finally we select the best edges
according to the model, boosting RPM/CTR. Experimental results on our
e-commerce platform demonstrate that our ad retrieval framework achieves good
performance.
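The signal, key and ad layers described above can be pictured as a small weighted graph. The sketch below is hypothetical throughout: the edge weights are made-up numbers rather than learned values, `retrieve_ads` is an illustrative name, and scoring a path as the product of its two edge weights is an assumption, not the scoring rule of the paper.

```python
from collections import defaultdict


def retrieve_ads(signals, signal_to_keys, key_to_ads, top_n=2):
    """Select the top-N ads reachable from the given signals.

    Assumption: a signal -> key -> ad path is scored by the product of
    its two edge weights, and an ad keeps the score of its best path.
    """
    best = defaultdict(float)
    for signal in signals:
        for key, w_signal_key in signal_to_keys.get(signal, {}).items():
            for ad, w_key_ad in key_to_ads.get(key, {}).items():
                best[ad] = max(best[ad], w_signal_key * w_key_ad)
    # Highest score first; ties broken by ad id for determinism.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical edge weights; in the framework these would be learned
# from historical ad click data.
signal_to_keys = {
    "query:running shoes": {"kw:sneakers": 0.9, "kw:socks": 0.2},
    "click:item42": {"kw:trail shoes": 0.5},
}
key_to_ads = {
    "kw:sneakers": {"ad1": 0.8},
    "kw:socks": {"ad2": 0.5},
    "kw:trail shoes": {"ad3": 1.0, "ad1": 0.5},
}

signals = ["query:running shoes", "click:item42"]
for ad, score in retrieve_ads(signals, signal_to_keys, key_to_ads):
    print(ad, round(score, 2))
```

Here `ad3` is reachable only through the real-time click signal, which illustrates how personalized signals can retrieve ads that keyword rewriting alone would miss.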
Substructure Discovery Using Minimum Description Length and Background Knowledge
The ability to identify interesting and repetitive substructures is an
essential component of discovering knowledge in structural data. We describe a
new version of our SUBDUE substructure discovery system based on the minimum
description length principle. The SUBDUE system discovers substructures that
compress the original data and represent structural concepts in the data. By
replacing previously-discovered substructures in the data, multiple passes of
SUBDUE produce a hierarchical description of the structural regularities in the
data. SUBDUE uses a computationally-bounded inexact graph match that identifies
similar, but not identical, instances of a substructure and finds an
approximate measure of closeness of two substructures when under computational
constraints. In addition to the minimum description length principle, other
background knowledge can be used by SUBDUE to guide the search towards more
appropriate substructures. Experiments in a variety of domains demonstrate
SUBDUE's ability to find substructures capable of compressing the original data
and to discover structural concepts important to the domain. Description of
Online Appendix: This is a compressed tar file containing the SUBDUE discovery
system, written in C. The program accepts as input databases represented in
graph form, and will output discovered substructures with their corresponding
value. Comment: See http://www.jair.org/ for an online appendix and other files
accompanying this article.
Geographical information retrieval with ontologies of place
Geographical context is required for many information retrieval tasks in which the target of the search may be documents, images or records which are referenced to geographical space only by means of place names. Often there may be an imprecise match between the query name and the names associated with candidate sources of information. There is a need therefore for geographical information retrieval facilities that can rank the relevance of candidate information with respect to geographical closeness of place as well as semantic closeness with respect to the information of interest. Here we present an ontology of place that combines limited coordinate data with semantic and qualitative spatial relationships between places. This parsimonious model of geographical place supports maintenance of knowledge of place names that relate to extensive regions of the Earth at multiple levels of granularity. The ontology has been implemented with a semantic modelling system linking non-spatial conceptual hierarchies with the place ontology. A hierarchical spatial distance measure is combined with Euclidean distance between place centroids to create a hybrid spatial distance measure. This is integrated with thematic distance, based on classification semantics, to create an integrated semantic closeness measure that can be used for a relevance ranking of retrieved objects.
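One way to picture the hybrid spatial distance is as a weighted combination of a hierarchy-based term and a normalised centroid distance. The sketch below rests on assumptions that are not taken from the paper: the hierarchical term is the fraction of containing regions the two places do not share, the weight `w_h` and the normalising constant `max_dist` are arbitrary, and the place records and coordinates are invented.

```python
import math


def hierarchical_distance(ancestors_a, ancestors_b):
    """Fraction of containing regions that the two places do not share."""
    a, b = set(ancestors_a), set(ancestors_b)
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0


def hybrid_spatial_distance(place_a, place_b, max_dist, w_h=0.5):
    """Weighted sum of hierarchical and normalised Euclidean distance.

    Assumption: both terms are scaled to [0, 1] and mixed linearly.
    """
    d_h = hierarchical_distance(place_a["ancestors"], place_b["ancestors"])
    d_e = math.dist(place_a["centroid"], place_b["centroid"]) / max_dist
    return w_h * d_h + (1 - w_h) * min(d_e, 1.0)


# Invented places: town_b and town_c are equally far from town_a in
# coordinate terms, but only town_b lies in the same region.
town_a = {"centroid": (0.0, 0.0), "ancestors": ["region_1", "country_x"]}
town_b = {"centroid": (3.0, 4.0), "ancestors": ["region_1", "country_x"]}
town_c = {"centroid": (3.0, 4.0), "ancestors": ["region_2", "country_x"]}

print(hybrid_spatial_distance(town_a, town_b, max_dist=10.0))
print(hybrid_spatial_distance(town_a, town_c, max_dist=10.0))
```

With equal centroid distances, the place in the same region scores as closer, which is the kind of behaviour a purely Euclidean measure cannot express. A thematic distance could be mixed in with a further weight in the same way.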
Visualizing and Interacting with Concept Hierarchies
Concept Hierarchies and Formal Concept Analysis are theoretically well
grounded and extensively tested methods. They rely on line diagrams called
Galois lattices for visualizing and analysing object-attribute sets. Galois
lattices are visually appealing and conceptually rich for experts. However, they
present important drawbacks due to their concept-oriented overall structure:
analysing what they show is difficult for non-experts, navigation is
cumbersome, interaction is poor, and scalability is a serious bottleneck for
visual interpretation even for experts. In this paper we introduce semantic
probes as a means to overcome many of these problems and extend usability and
application possibilities of traditional FCA visualization methods. Semantic
probes are visual, user-centred objects which extract and organize reduced
Galois sub-hierarchies. They are simpler and clearer, and they provide better
navigation support through a rich set of interaction possibilities. Since
probe-driven sub-hierarchies are limited to the user's focus, scalability is under control
and interpretation is facilitated. After some successful experiments, several
applications are being developed with the remaining problem of finding a
compromise between simplicity and conceptual expressivity.
Towards Context Driven Modularization of Large Biomedical Ontologies
Formal knowledge about human anatomy, radiology or diseases is necessary to support clinical applications such as medical image search. This machine-processable knowledge can be acquired from biomedical domain ontologies, which, however, are typically very large and complex models. Thus, their straightforward incorporation into software applications becomes difficult. In this paper we discuss first ideas on a statistical approach for modularizing large medical ontologies, and we prioritize the practical applicability aspect. The underlying assumption is that the application-relevant ontology fragments, i.e. modules, can be identified by the statistical analysis of the ontology concepts in the domain corpus. Accordingly, we argue that the most frequently occurring concepts in the domain corpus define the application context and can therefore potentially yield the relevant ontology modules. We illustrate our approach on an example case that involves a large ontology on human anatomy and report on our first manual experiments.
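The frequency-based selection of concepts can be illustrated with a short sketch. Everything here is an assumption made for the example: the function names, the case-insensitive whole-word matching of concept labels, and the toy corpus. The paper itself reports manual experiments and does not prescribe this procedure.

```python
import re
from collections import Counter


def concept_frequencies(concept_labels, corpus):
    """Count whole-word, case-insensitive occurrences of each label."""
    counts = Counter()
    for text in corpus:
        lowered = text.lower()
        for label in concept_labels:
            pattern = r"\b" + re.escape(label.lower()) + r"\b"
            counts[label] += len(re.findall(pattern, lowered))
    return counts


def module_seeds(concept_labels, corpus, top_k=2):
    """Return the most frequent concepts that occur in the corpus.

    Assumption: these concepts would serve as starting points from
    which an ontology module is then extracted.
    """
    counts = concept_frequencies(concept_labels, corpus)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, n in ranked[:top_k] if n > 0]


# Toy anatomy labels and corpus, invented for illustration.
labels = ["liver", "left lung", "femur"]
corpus = [
    "CT scan of the liver shows a lesion.",
    "Liver and left lung appear normal.",
    "Follow-up imaging of the liver.",
]
print(module_seeds(labels, corpus, top_k=2))
```

Concepts that never occur in the corpus are dropped, so the seed list reflects only the application context that the corpus actually covers.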
Enriching very large ontologies using the WWW
This paper explores the possibility of exploiting text on the World Wide Web
to enrich the concepts in existing ontologies. First, a method to
retrieve documents from the WWW related to a concept is described. These
document collections are used 1) to construct topic signatures (lists of
topically related words) for each concept in WordNet, and 2) to build
hierarchical clusters of the concepts (the word senses) that lexicalize a given
word. The overall goal is to overcome two shortcomings of WordNet: the lack of
topical links among concepts, and the proliferation of senses. Topic signatures
are validated on a word sense disambiguation task with good results, which are
improved when the hierarchical clusters are used. Comment: 6 pages