12,761 research outputs found

    XML documents clustering using a tensor space model

    Get PDF
    The traditional Vector Space Model (VSM) is not able to represent both the structure and the content of XML documents. This paper introduces a novel method of representing XML documents in a Tensor Space Model (TSM) and then utilizing it for clustering. Empirical analysis shows that the proposed method is scalable for large-sized datasets; as well, the factorized matrices produced from the proposed method help to improve the quality of clusters through the enriched document representation of both structure and content information

    Document Clustering with K-tree

    Get PDF
    This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document clustering. Many large scale problems exist in document clustering. K-tree scales well with large inputs due to its low complexity. It offers promising results both in terms of efficiency and quality. Document classification was completed using Support Vector Machines.Comment: 12 pages, INEX 200

    Development and evaluation of clustering techniques for finding people

    Get PDF
    Typically in a large organisation much expertise and knowledge is held informally within employees' own memories. When employees leave an organisation many documented links that go through that person are broken and no mechanism is usually available to overcome these broken links. This match making problem is related to the problem of finding potential work partners in a large and distributed organisation. This paper reports a comparative investigation into using standard information retrieval techniques to group employees together based on their webpages. This information can, hopefully, be subsequently used to redirect broken links to people who worked closely with a departed employee or used to highlight people, say indifferent departments, who work on similar topics. The paper reports the design and positive results of an experiment conducted at Risø National Laboratory comparing four different IR searching and clustering approaches using real users' web pages

    Comparing SVM and Naive Bayes classifiers for text categorization with Wikitology as knowledge enrichment

    Full text link
    The activity of labeling of documents according to their content is known as text categorization. Many experiments have been carried out to enhance text categorization by adding background knowledge to the document using knowledge repositories like Word Net, Open Project Directory (OPD), Wikipedia and Wikitology. In our previous work, we have carried out intensive experiments by extracting knowledge from Wikitology and evaluating the experiment on Support Vector Machine with 10- fold cross-validations. The results clearly indicate Wikitology is far better than other knowledge bases. In this paper we are comparing Support Vector Machine (SVM) and Na\"ive Bayes (NB) classifiers under text enrichment through Wikitology. We validated results with 10-fold cross validation and shown that NB gives an improvement of +28.78%, on the other hand SVM gives an improvement of +6.36% when compared with baseline results. Na\"ive Bayes classifier is better choice when external enriching is used through any external knowledge base.Comment: 5 page
    • …
    corecore