Evaluating keyword selection methods for WEBSOM text archives
The WEBSOM methodology, proven effective for building very large text archives, includes a method that extracts labels for each document cluster assigned to a node in the map. However, that method needs to retrieve all the words of all the documents associated with each node. Since maps may have more than 100,000 nodes and the archive may contain up to seven million documents, the WEBSOM methodology needs a faster alternative for keyword selection. Presented here is such an alternative, which is able to quickly deduce meaningful labels for each node in the map. It does this by analyzing only the relative weight distribution of the SOM weight vectors and by taking advantage of some characteristics of the random projection method used for dimensionality reduction. The effectiveness of this technique is demonstrated on news document collections.
IEEE Transactions on Knowledge and Data Engineering (ITKE), vol. 16, no. 3, pp. 380-383. DOI: 10.1109/TKDE.2003.1262193
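As a rough illustration of labeling nodes from weight vectors alone, the sketch below back-projects each node's weight vector into term space and keeps the terms whose weight stands out most against the map-wide average. This is not the authors' algorithm: the function name `label_nodes`, the back-projection step, the "difference from the mean" scoring rule, and the toy data are all assumptions made for illustration.

```python
def label_nodes(weights, projection, vocabulary, top_k=2):
    """Pick top_k label terms per SOM node from its weight vector.

    weights:    one reduced-dimension weight vector per node (hypothetical input)
    projection: projection matrix, one row per reduced dimension and
                one column per vocabulary term (hypothetical input)
    vocabulary: term strings, in the same order as the projection columns
    """
    n_terms = len(vocabulary)
    # Approximate each node's per-term weight by back-projecting (assumption).
    scores = [
        [sum(w[i] * projection[i][j] for i in range(len(w))) for j in range(n_terms)]
        for w in weights
    ]
    # Map-wide mean weight of each term, used as the baseline.
    means = [sum(row[j] for row in scores) / len(scores) for j in range(n_terms)]
    labels = []
    for row in scores:
        # Relative weight: how far a term rises above its map-wide mean.
        relative = [row[j] - means[j] for j in range(n_terms)]
        ranked = sorted(range(n_terms), key=lambda j: relative[j], reverse=True)
        labels.append([vocabulary[j] for j in ranked[:top_k]])
    return labels


# Toy data: 2 nodes, 2 reduced dimensions, 4 terms (all values invented).
toy_weights = [[1.0, 0.0], [0.0, 1.0]]
toy_projection = [[3.0, 0.0, 1.0, 0.0], [0.0, 2.0, 0.0, 5.0]]
toy_vocabulary = ["election", "market", "vote", "stocks"]
toy_labels = label_nodes(toy_weights, toy_projection, toy_vocabulary)
```

The point of the sketch is that no document text is read at labeling time: only the weight vectors and the projection matrix are consulted, which is what makes this style of approach cheap on very large maps.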
Comparison of Explicit and Implicit Keywords to Characterize Geographic Information System Procedures
The author designs and implements an approach to improve the handling of Geographic Information System (GIS) procedural software by exploiting semantically important information that traditional information retrieval approaches do not ordinarily include. In this approach, implicit keywords (descriptors designed to recognize characteristics not explicitly recorded in the GIS procedure source code) are created and used in an automated, inductive process that organizes a large set of GIS procedures to reveal meaningful groupings. The process uses the Self-Organizing Map (SOM), a specialized artificial neural network, to create a two-dimensional representation of an input data set in which the topological properties of the input data are preserved. Such maps are important tools for visualizing, browsing, filtering, and evaluating a set of GIS procedures. Browsing, filtering, and evaluation help to improve human understanding of available GIS resources. By facilitating mechanisms for improved software sharing and exchange, the methods described here may guide future researchers in selecting more appropriate procedures for a given task. Through the experiments in this dissertation, the author demonstrates that while using GIS commands as explicit keywords can produce helpful organizations of GIS procedures, implicit keywords can be developed to moderate, improve, and specialize the results of the explicit keyword process. The results of the different experiments not only show the impact of applying different keyword schemes but also demonstrate that GIS functionality can be organized with consistent methodological rigor in potentially very different ways to reprioritize specific types of functionality.