Iterative Optimization and Simplification of Hierarchical Clusterings

Author: Fisher D.
Publication date: 01/01/1995

Clustering is often used for discovering structure in data. Clustering systems differ in the objective function used to evaluate clustering quality and the control strategy used to search the space of clusterings. Ideally, the search strategy should consistently construct clusterings of high quality, but be computationally inexpensive as well. In general, we cannot have it both ways, but we can partition the search so that a system inexpensively constructs a 'tentative' clustering for initial examination, followed by iterative optimization, which continues to search in background for improved clusterings. Given this motivation, we evaluate an inexpensive strategy for creating initial clusterings, coupled with several control strategies for iterative optimization, each of which repeatedly modifies an initial clustering in search of a better one. One of these methods appears novel as an iterative optimization strategy in clustering contexts. Once a clustering has been constructed it is judged by analysts, often according to task-specific criteria. Several authors have abstracted these criteria and posited a generic performance task akin to pattern completion, where the error rate over completed patterns is used to 'externally' judge clustering utility. Given this performance task, we adapt resampling-based pruning strategies used by supervised learning systems to the task of simplifying hierarchical clusterings, thus promising to ease post-clustering analysis. Finally, we propose a number of objective functions, based on attribute-selection measures for decision-tree induction, that might perform well on the error rate and simplicity dimensions.

Comment: See http://www.jair.org/ for any accompanying file

arXiv.org e-Print Archive

CiteSeerX

Incremental and Scalable Computation of Dynamic Topography Information Landscapes

Authors: Gindl Stefan; Kroell Mark; Sabol Vedran; Scharl Arno; Syed Kamran A.A.
Publication venue: DLINE
Publication date: 01/01/2012

Dynamic topography information landscapes are capable of visualizing longitudinal changes in large document repositories. Resembling tectonic processes in the natural world, dynamic rendering reflects both long-term trends and short-term fluctuations in such repositories. To visualize the rise and decay of topics, the mapping algorithm elevates and lowers related sets of concentric contour lines. Acknowledging the growing number of documents to be processed by state-of-the-art Web intelligence applications, we present a scalable, incremental approach for generating such landscapes. The processing pipeline includes a number of sequential tasks, from crawling, filtering and pre-processing Web content to projecting, labeling and rendering the aggregated information. Processing steps central to incremental processing are found in the projection stage, which consists of document clustering, cluster force-directed placement, and fast document positioning. We introduce two different positioning methods and compare them in an incremental setting using two different quality measures. The evaluation is performed on a set of approximately 5000 documents taken from the environmental blog sample of the Media Watch on Climate Change (www.ecoresearch.net/climate), a Web content aggregator about climate change and related environmental issues that serves static versions of the information landscapes presented in this paper as part of a multiple coordinated view representation.

webLyzard technology gmbh

An evaluation of machine learning techniques in intrusion detection

Author: Lee Christina Mei-Fang
Publication venue: VANDERBILT