
    Study of document clustering using the k-means algorithm

    One of the most commonly used data mining techniques is document clustering, or unsupervised document classification, which groups documents according to a document similarity function. This thesis addresses research issues in categorizing documents with the k-means clustering algorithm, which partitions objects into K groups based on document representations and similarities. The hypothesis of this thesis is that unsupervised clustering of a set of documents produces results similar to those of their supervised categorization.
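
To make the idea concrete, here is a minimal sketch of k-means applied to documents. It is not the thesis's implementation: the bag-of-words representation, the squared Euclidean distance, the choice of the first K documents as initial centroids, and all function names (`tf_vector`, `sq_dist`, `mean_vector`, `kmeans`) are assumptions made for illustration only.

```python
import math
from collections import Counter


def tf_vector(text):
    """Represent a document as an L2-normalized term-frequency vector (assumed representation)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()}


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors stored as dicts."""
    terms = set(a) | set(b)
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for term, weight in v.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def kmeans(vectors, k, max_iter=100):
    """Assign each vector to one of k clusters; returns a list of cluster labels."""
    # Assumption: deterministic initialization from the first k documents.
    centroids = [dict(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels


# Hypothetical toy corpus, invented for this example.
docs = [
    "cat dog pet",
    "stock market trade",
    "dog pet animal",
    "market trade price",
]
labels = kmeans([tf_vector(d) for d in docs], k=2)
print(labels)
```

On this toy corpus the two animal-related documents end up in one cluster and the two finance-related documents in the other. Comparing such cluster labels against known category labels is one way to examine how closely unsupervised clustering matches supervised categorization.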