BlogHarvest: Blog mining and search framework

Abstract

Beyond serving as online diaries, weblogs have evolved into complex social structures. Blogging software allows users to publish opinions on any topic without the constraints of a predefined schema. Analysis of the linkage between blogs has indicated that community formation in the blogosphere is not a random process but the result of shared interests binding bloggers together. Learning, analyzing, and using bloggers' interests and social linkage is therefore necessary to provide bloggers with useful search facilities over the blogosphere and to offer blog service providers revenue-generating opportunities such as advertising. In this paper, we demonstrate BlogHarvest, a blog mining and search framework that extracts the interests of the blogger, finds and recommends blogs with similar topics, and provides blog-oriented search functionality. BlogHarvest uses classification, clustering based on linkage and topic similarity, and opinion mining based on part-of-speech (POS) tagging to provide these features. A novel search interface presents related blogs for each query alongside the usual ranked results. Association rules found from POS tags are used to determine the context of a search, enabling query expansion that yields more targeted results. By crawling the blogosphere and extracting and indexing blog posts and linkage metadata, we have analyzed around 50,000 blogs to tune our algorithms.
