4 research outputs found

    Community-based ranking of the social web

    Modeling the Evolving Structure of Social Text for Information Extraction and Topic Detection

    The advent of “social media” has enabled millions of people to participate in discussions within communities on a global scale. These conversations take place in a myriad of venues, on or off the web, each with its own approach to implementing what we now call “social media”: blogs, bulletin boards, mailing lists. However, while the software powering these communities varies a great deal and continues to evolve, all of them share a common set of features. When a user initiates a discussion, the message is not addressed to a specific person but broadcast to any interested reader; such a message can generate replies from other users, and these replies can then generate their own, forming a network of connections between messages. There is a need for a system that can make connections between related pieces of social text in order to group information into coherent units. Making use of the structure of the social text helps to determine which elements of the text to consider for a given topic. To do this, a system needs to consider the different contexts in which the text can be understood. A post, that is, text transmitted by a single author at a single point in time, may have a different topic than the whole thread, which is composed of all the posts in the discussion following an initial post. Different passages in a post could also have separate topics. Therefore, it is useful to annotate the text explicitly with information about its social structure for use in automatic search and text mining.

    Mining Web Dynamics for Search

    Billions of web users collectively contribute to a dynamic web that preserves how information sources and descriptions change over time. This dynamic process sheds light on the quality of web content, and even indicates the temporal properties of information needs expressed via queries. However, existing commercial search engines typically utilize one crawl of web content (the latest) without considering the complementary information concealed in web dynamics. As a result, the generated rankings may be biased due to the deficiency of knowledge on page or hyperlink evolution, and the time-sensitive facet of search quality, e.g., freshness, has to be neglected. While previous research efforts have focused on exploring the temporal dimension in the retrieval process, few of them have shown consistent improvements on a large-scale real-world archival web corpus with a broad time span. We investigate how to utilize the changes of web pages and hyperlinks to improve search quality, in terms of freshness and relevance of search results. Three applications that I have focused on are: (1) document representation, in which the importance of anchortext (short descriptive text associated with hyperlinks) is estimated by considering its historical status; (2) web authority estimation, in which web freshness is quantified and utilized for controlling the authority propagation; and (3) learning to rank, in which freshness and relevance are optimized simultaneously in an adaptive way depending on query type. The contributions of this thesis are: (1) incorporating web dynamics information into critical components within search infrastructure in a principled way; and (2) empirically verifying the proposed methods through experiments on a large-scale real-world archival web corpus, demonstrating their superiority over the existing state of the art.