24,109 research outputs found
Extending, trimming and fusing WordNet for technical documents
This paper describes a tool for the automatic
extension and trimming of a multilingual
WordNet database for cross-lingual retrieval
and multilingual ontology building in
intranets and domain-specific document
collections. Hierarchies, built from
automatically extracted terms and combined
with the WordNet relations, are trimmed
with a disambiguation method based on the
document salience of the words in the
glosses. The disambiguation is tested in a
cross-lingual retrieval task, showing
considerable improvement (7%-11%). The
condensed hierarchies can be used as
browse-interfaces to the documents
complementary to retrieval
Enriching very large ontologies using the WWW
This paper explores the possibility to exploit text on the world wide web in
order to enrich the concepts in existing ontologies. First, a method to
retrieve documents from the WWW related to a concept is described. These
document collections are used 1) to construct topic signatures (lists of
topically related words) for each concept in WordNet, and 2) to build
hierarchical clusters of the concepts (the word senses) that lexicalize a given
word. The overall goal is to overcome two shortcomings of WordNet: the lack of
topical links among concepts, and the proliferation of senses. Topic signatures
are validated on a word sense disambiguation task with good results, which are
improved when the hierarchical clusters are used.Comment: 6 page
Automatically organising images using concept hierarchies
In this paper we discuss the use of concept hierarchies, an approach to automatically organize a set of documents based upon a set of concepts derived from the documents themselves for image retrieval. Co-occurrence between terms associated with image captions and a statistical relation called subsumption are used to generate term clusters which are organized hierarchically. Previously, the approach has been studied for document retrieval and results have shown that automatically generating hierarchies can help users with their search task. In this paper we present an implementation of concept hierarchies for image retrieval, together with preliminary ad-hoc evaluation. Although our approach requires more investigation, initial results from a prototype system are promising and would appear to provide a useful summary of the search results
Growing a Tree in the Forest: Constructing Folksonomies by Integrating Structured Metadata
Many social Web sites allow users to annotate the content with descriptive
metadata, such as tags, and more recently to organize content hierarchically.
These types of structured metadata provide valuable evidence for learning how a
community organizes knowledge. For instance, we can aggregate many personal
hierarchies into a common taxonomy, also known as a folksonomy, that will aid
users in visualizing and browsing social content, and also to help them in
organizing their own content. However, learning from social metadata presents
several challenges, since it is sparse, shallow, ambiguous, noisy, and
inconsistent. We describe an approach to folksonomy learning based on
relational clustering, which exploits structured metadata contained in personal
hierarchies. Our approach clusters similar hierarchies using their structure
and tag statistics, then incrementally weaves them into a deeper, bushier tree.
We study folksonomy learning using social metadata extracted from the
photo-sharing site Flickr, and demonstrate that the proposed approach addresses
the challenges. Moreover, comparing to previous work, the approach produces
larger, more accurate folksonomies, and in addition, scales better.Comment: 10 pages, To appear in the Proceedings of ACM SIGKDD Conference on
Knowledge Discovery and Data Mining(KDD) 201
On the use of clustering and the MeSH controlled vocabulary to improve MEDLINE abstract search
Databases of genomic documents contain substantial amounts of structured information in addition to the texts of titles and abstracts. Unstructured information retrieval techniques fail to take advantage of the structured information available. This paper describes a technique to
improve upon traditional retrieval methods by clustering the retrieval result set into two distinct clusters using additional structural information. Our hypothesis is that the relevant documents are to be found in the tightest cluster of the two, as suggested by van Rijsbergen's cluster
hypothesis. We present an experimental evaluation of these ideas based on the relevance judgments of the 2004 TREC workshop Genomics track, and the CLUTO software clustering
package
- …