4,883 research outputs found
Visualising the structure of document search results: A comparison of graph theoretic approaches
This is the post-print of the article - Copyright @ 2010 Sage PublicationsPrevious work has shown that distance-similarity visualisation or âspatialisationâ can provide a potentially useful context in which to browse the results of a query search, enabling the user to adopt a simple local foraging or âcluster growingâ strategy to navigate through the retrieved document set. However, faithfully mapping feature-space models to visual space can be problematic owing to their inherent high dimensionality and non-linearity. Conventional linear approaches to dimension reduction tend to fail at this kind of task, sacrificing local structural in order to preserve a globally optimal mapping. In this paper the clustering performance of a recently proposed algorithm called isometric feature mapping (Isomap), which deals with non-linearity by transforming dissimilarities into geodesic distances, is compared to that of non-metric multidimensional scaling (MDS). Various graph pruning methods, for geodesic distance estimation, are also compared. Results show that Isomap is significantly better at preserving local structural detail than MDS, suggesting it is better suited to cluster growing and other semantic navigation tasks. Moreover, it is shown that applying a minimum-cost graph pruning criterion can provide a parameter-free alternative to the traditional K-neighbour method, resulting in spatial clustering that is equivalent to or better than that achieved using an optimal-K criterion
VITALAS at TRECVID-2008
In this paper, we present our experiments in TRECVID 2008 about High-Level feature extraction task. This is the first year for our participation in TRECVID, our system adopts some popular approaches that other workgroups proposed before. We proposed 2 advanced low-level features NEW Gabor texture descriptor and the Compact-SIFT Codeword histogram. Our system applied well-known LIBSVM to train the SVM classifier for the basic classifier. In fusion step, some methods were employed such as the Voting, SVM-base, HCRF and Bootstrap Average AdaBoost(BAAB)
Improving performance through concept formation and conceptual clustering
Research from June 1989 through October 1992 focussed on concept formation, clustering, and supervised learning for purposes of improving the efficiency of problem-solving, planning, and diagnosis. These projects resulted in two dissertations on clustering, explanation-based learning, and means-ends planning, and publications in conferences and workshops, several book chapters, and journals; a complete Bibliography of NASA Ames supported publications is included. The following topics are studied: clustering of explanations and problem-solving experiences; clustering and means-end planning; and diagnosis of space shuttle and space station operating modes
Iterative Optimization and Simplification of Hierarchical Clusterings
Clustering is often used for discovering structure in data. Clustering
systems differ in the objective function used to evaluate clustering quality
and the control strategy used to search the space of clusterings. Ideally, the
search strategy should consistently construct clusterings of high quality, but
be computationally inexpensive as well. In general, we cannot have it both
ways, but we can partition the search so that a system inexpensively constructs
a `tentative' clustering for initial examination, followed by iterative
optimization, which continues to search in background for improved clusterings.
Given this motivation, we evaluate an inexpensive strategy for creating initial
clusterings, coupled with several control strategies for iterative
optimization, each of which repeatedly modifies an initial clustering in search
of a better one. One of these methods appears novel as an iterative
optimization strategy in clustering contexts. Once a clustering has been
constructed it is judged by analysts -- often according to task-specific
criteria. Several authors have abstracted these criteria and posited a generic
performance task akin to pattern completion, where the error rate over
completed patterns is used to `externally' judge clustering utility. Given this
performance task, we adapt resampling-based pruning strategies used by
supervised learning systems to the task of simplifying hierarchical
clusterings, thus promising to ease post-clustering analysis. Finally, we
propose a number of objective functions, based on attribute-selection measures
for decision-tree induction, that might perform well on the error rate and
simplicity dimensions.Comment: See http://www.jair.org/ for any accompanying file
Techniques for clustering gene expression data
Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, choice of suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state of the art applications which recognises these limitations and implements procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered
Recommended from our members
A framework for knowledge discovery within business intelligence for decision support
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Business Intelligence (BI) techniques provide the potential to not only efficiently manage but further analyse and apply the collected information in an effective manner. Benefiting from research both within industry and academia, BI provides functionality for accessing, cleansing, transforming, analysing and reporting organisational datasets. This provides further opportunities for the data to be explored and assist organisations in the discovery of correlations, trends and patterns that exist hidden within the data. This hidden information can be employed to provide an insight into opportunities to make an organisation more competitive by allowing manager to make more informed decisions and as a result, corporate resources optimally utilised. This potential insight provides organisations with an unrivalled opportunity to remain abreast of market trends. Consequently, BI techniques provide significant opportunity for integration with Decision Support Systems (DSS). The gap which was identified within the current body of knowledge and motivated this research, revealed that currently no suitable framework for BI, which can be applied at a meta-level and is therefore tool, technology and domain independent, currently exists. To address the identified gap this study proposes a meta-level framework: - âKDDS-BIâ, which can be applied at an abstract level and therefore structure a BI investigation, irrespective of the end user. KDDS-BI not only facilitates the selection of suitable techniques for BI investigations, reducing the reliance upon ad-hoc investigative approaches which rely upon âtrial and errorâ, yet further integrates Knowledge Management (KM) principles to ensure the retention and transfer of knowledge due to a structured approach to provide DSS that are based upon the principles of BI.
In order to evaluate and validate the framework, KDDS-BI has been investigated through three distinct case studies. First KDDS-BI facilitates the integration of BI within âDirect Marketingâ to provide innovative solutions for analysis based upon the most suitable BI technique. Secondly, KDDS-BI is investigated within sales promotion, to facilitate the selection of tools and techniques for more focused in store marketing campaigns and increase revenue through the discovery of hidden data, and finally, operations management is analysed within a highly dynamic and unstructured environment of the London Underground Ltd. network through unique a BI solution to organise and manage resources, thereby increasing the efficiency of business processes. The three case studies provide insight into not only how KDDS-BI provides structure to the integration of BI within business process, but additionally the opportunity to analyse the performance of KDDS-BI within three independent environments for distinct purposes provided structure through KDDS-BI thereby validating and corroborating the proposed framework and adding value to business processes
Advanced Data Mining Techniques for Compound Objects
Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in large data collections. The most important step within the process of KDD is data mining which is concerned with the extraction of the valid patterns. KDD is necessary to analyze the steady growing amount of data caused by the enhanced performance of modern computer systems. However, with the growing amount of data the complexity of data objects increases as well. Modern methods of KDD should therefore examine more complex objects than simple feature vectors to solve real-world KDD applications adequately. Multi-instance and multi-represented objects are two important types of object representations for complex objects. Multi-instance objects consist of a set of object representations that all belong to the same feature space. Multi-represented objects are constructed as a tuple of feature representations where each feature representation belongs to a different feature space.
The contribution of this thesis is the development of new KDD methods for the classification and clustering of complex objects. Therefore, the thesis introduces solutions for real-world applications that are based on multi-instance and
multi-represented object representations. On the basis of these solutions, it is shown that a more general object representation often provides better results for many relevant KDD applications.
The first part of the thesis is concerned with two KDD problems for which employing multi-instance objects provides efficient and effective solutions. The first is the data mining in CAD parts, e.g. the use of hierarchic clustering for the automatic construction of product hierarchies. The introduced solution decomposes a single part into a set of feature vectors and compares them by using a metric on multi-instance objects. Furthermore, multi-step query processing using a novel filter step is employed, enabling the user to efficiently process similarity queries. On the basis of this similarity search system, it is possible to perform several distance based data mining algorithms like the hierarchical clustering algorithm OPTICS to derive product hierarchies.
The second important application is the classification and search for complete websites in the world wide web (WWW). A website is a set of HTML-documents that is published by the same person, group or organization and usually serves a common purpose. To perform data mining for websites, the thesis presents several methods to classify websites. After introducing naive methods modelling websites as webpages, two more sophisticated approaches to website classification are introduced. The first approach uses a preprocessing that maps single HTML-documents within each website to so-called page classes. The second approach directly compares websites as sets of word vectors and uses nearest neighbor classification. To search the WWW for new, relevant websites, a focused crawler is introduced that efficiently retrieves relevant websites. This crawler minimizes the number of HTML-documents and increases the accuracy of website retrieval.
The second part of the thesis is concerned with the data mining in multi-represented objects. An important example application for this kind of complex objects are proteins that can be represented as a tuple of a protein sequence and a text annotation. To analyze multi-represented objects, a clustering method for multi-represented objects is introduced that is based on the density based clustering algorithm DBSCAN. This method uses all representations that are provided to find a global clustering of the given data objects. However, in many applications there already exists a sophisticated class ontology for the given data objects, e.g. proteins. To map new objects into an ontology a new
method for the hierarchical classification of multi-represented objects is described. The system employs the hierarchical structure of the ontology to efficiently classify new proteins, using support vector machines
- âŚ