Scalable Clustering of Documents with Multiple Membership

By Jack Newton and Chris O'

Abstract

Document clustering has recently attracted considerable attention from the IR, data mining, and machine learning research communities as an effective way not only of organizing textual information but also of discovering interesting patterns in that information. Most existing methods, however, suffer from two main drawbacks. First, most clustering algorithms are very restrictive: each document may belong to only a single cluster. Allowing documents to belong to more than one cluster is important in document clustering, since a document can often span two or more topics or concepts. Second, most existing methods cannot scale to very large document collections.

Year: 2007
OAI identifier: oai:CiteSeerX.psu:10.1.1.18.4638
Provided by: CiteSeerX