12,761 research outputs found
XML documents clustering using a tensor space model
The traditional Vector Space Model (VSM) is not able to represent both the structure and the content of XML documents. This paper introduces a novel method of representing XML documents in a Tensor Space Model (TSM) and then utilizing it for clustering. Empirical analysis shows that the proposed method is scalable for large-sized datasets; as well, the factorized matrices produced from the proposed method help to improve the quality of clusters through the enriched document representation of both structure and content information
Document Clustering with K-tree
This paper describes the approach taken to the XML Mining track at INEX 2008
by a group at the Queensland University of Technology. We introduce the K-tree
clustering algorithm in an Information Retrieval context by adapting it for
document clustering. Many large scale problems exist in document clustering.
K-tree scales well with large inputs due to its low complexity. It offers
promising results both in terms of efficiency and quality. Document
classification was completed using Support Vector Machines.Comment: 12 pages, INEX 200
Development and evaluation of clustering techniques for finding people
Typically in a large organisation much expertise and knowledge is held informally within employees' own memories. When employees leave an organisation many documented links that go through that person are broken and no mechanism is usually available to overcome these broken links. This match making problem is related to the problem of finding potential work partners in a large and distributed organisation. This paper reports a comparative investigation into using standard information retrieval techniques to group employees together based on their webpages. This information can, hopefully, be subsequently used to redirect broken links to people who worked closely with a departed employee or used to highlight people, say indifferent departments, who work on similar topics. The paper reports the design and positive results of an experiment conducted at Risø National Laboratory comparing four different IR searching and clustering approaches using real users' web pages
Comparing SVM and Naive Bayes classifiers for text categorization with Wikitology as knowledge enrichment
The activity of labeling of documents according to their content is known as
text categorization. Many experiments have been carried out to enhance text
categorization by adding background knowledge to the document using knowledge
repositories like Word Net, Open Project Directory (OPD), Wikipedia and
Wikitology. In our previous work, we have carried out intensive experiments by
extracting knowledge from Wikitology and evaluating the experiment on Support
Vector Machine with 10- fold cross-validations. The results clearly indicate
Wikitology is far better than other knowledge bases. In this paper we are
comparing Support Vector Machine (SVM) and Na\"ive Bayes (NB) classifiers under
text enrichment through Wikitology. We validated results with 10-fold cross
validation and shown that NB gives an improvement of +28.78%, on the other hand
SVM gives an improvement of +6.36% when compared with baseline results. Na\"ive
Bayes classifier is better choice when external enriching is used through any
external knowledge base.Comment: 5 page
- âŚ