2 research outputs found

    Online Data Preprocessing: A Case Study Approach

    Alongside web search and e-mail, social networking is now one of the three most popular uses of the Internet. Every day, a tremendous number of volunteers write articles and share photos, videos, and links at a scope and scale never imagined before. However, because social network data are huge and come from heterogeneous sources, they are highly susceptible to inconsistency, redundancy, noise, and loss. For data scientists, preparing the data and getting it into a standard format is critical, because data quality directly affects the performance of the mining algorithms applied next. Low-quality data will certainly limit the analysis and lower the quality of mining results. To this end, the goal of this study is to provide an overview of the different phases involved in data preprocessing, with a focus on social network data. As a case study, we show how we applied preprocessing to the data that we collected about Malaysia Airlines Flight MH370, which disappeared in 2014.

    The spread of media content through blogs

    Blogs are a popular way to share personal journals, discuss matters of public opinion, pursue collaborative conversations, and aggregate content on similar topics. Blogs can also be used to disseminate new content and novel ideas to communities of interest. In this paper, we present an analysis of the topological structure of blogs and the patterns of popular media content shared in them. By analyzing 8.7 million posts from 1.1 million blogs across 15 major blog hosting sites, we find that the network structure of blogs is “less social” compared to other online social networks: most links are unidirectional and the network is sparsely connected. The type of content that was popularly shared on blogs was surprisingly different from that of the mainstream media: user-generated content, often in the form of videos or photos, was the most common type of content disseminated in blogs. The user-generated content showed interesting viral-spreading patterns within blogs. Topical content such as news and political commentary spreads quickly by the hour and then quickly disappears, while non-topical content such as music and entertainment propagates slowly over a much longer period of time.