    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report then proposes a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.

    Improving Reachability and Navigability in Recommender Systems

    In this paper, we study recommender systems from a network perspective by examining recommendation networks, where nodes are items (e.g., movies) and edges are constructed from top-N recommendations (e.g., related movies). In particular, we focus on evaluating the reachability and navigability of recommendation networks and investigate the following questions: (i) How well do recommendation networks support navigation and exploratory search? (ii) What is the influence of parameters, in particular different recommendation algorithms and the number of recommendations shown, on reachability and navigability? and (iii) How can reachability and navigability be improved in these networks? We address these questions by first evaluating the reachability of recommendation networks through their structural properties. Second, we evaluate navigability by simulating three different models of information-seeking scenarios. We find that with standard algorithms, recommender systems are not well suited to navigation and exploration, and we propose methods to modify recommendations to improve this. Our work extends one-click-based evaluations of recommender systems to multi-click analysis (i.e., sequences of dependent clicks) and presents a general, comprehensive approach to evaluating the navigability of arbitrary recommendation networks.
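    The network construction described in this abstract can be illustrated with a minimal sketch. It assumes each item has a ranked list of recommended items; the function names `build_network` and `reachable`, and the toy data in the usage note, are hypothetical and are not taken from the paper. Reachability is approximated here as the set of items that can be reached from a start item by following recommendation links, which is only one of several structural measures one could use.

```python
from collections import deque


def build_network(recs, n):
    """Keep only the top-n recommendations of each item as outgoing edges.

    `recs` maps an item to its ranked list of recommended items
    (a hypothetical input format chosen for this sketch).
    """
    return {item: ranked[:n] for item, ranked in recs.items()}


def reachable(network, start):
    """Return the set of items reachable from `start` by following edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        item = queue.popleft()
        for nxt in network.get(item, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

    For example, with `recs = {"a": ["b", "c"], "b": ["a", "c"], "c": ["d", "a"], "d": ["a", "b"]}`, showing one recommendation per item lets a user starting at "a" reach only "a" and "b", while showing two makes all four items reachable. This is the kind of effect the second research question (the number of recommendations shown) is concerned with.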

    Stay Awhile and Listen: User Interactions in a Crowdsourced Platform Offering Emotional Support

    Internet and online-based social systems are rising as the dominant mode of communication in society. However, the public or semi-private environment in which most online communication takes place does not make it a suitable channel for speaking with others about personal or emotional problems. This has led to the emergence of online platforms for emotional support offering free, anonymous, and confidential conversations with live listeners. Yet very little is known about the way these platforms are utilized, and whether their features and design foster strong user engagement. This paper explores the utilization and the interaction features of hundreds of thousands of users on 7 Cups of Tea, a leading online platform offering online emotional support. It dissects the users' level of activity and the patterns by which they engage in conversation with each other, and uses machine learning methods to find factors promoting engagement. The study may be the first to measure activities and interactions in a large-scale online social system that fosters peer-to-peer emotional support.

    A fine grained heuristic to capture web navigation patterns

    In previous work we have proposed a statistical model to capture user behaviour when browsing the web. The user navigation information obtained from web logs is modelled as a hypertext probabilistic grammar (HPG), which is within the class of regular probabilistic grammars. The set of highest probability strings generated by the grammar corresponds to the user's preferred navigation trails. We have previously conducted experiments with a Breadth-First Search algorithm (BFS) to perform the exhaustive computation of all the strings with probability above a specified cut-point, which we call the rules. Although the algorithm’s running time varies linearly with the number of grammar states, it has the drawbacks of returning a large number of rules when the cut-point is small and a small set of very short rules when the cut-point is high. In this work, we present a new heuristic that implements an iterative deepening search wherein the set of rules is incrementally augmented by first exploring trails with high probability. A stopping parameter is provided which measures the distance between the current rule-set and its corresponding maximal set obtained by the BFS algorithm. When the stopping parameter takes the value zero the heuristic corresponds to the BFS algorithm, and as the parameter takes values closer to one the number of rules obtained decreases accordingly. Experiments were conducted with both real and synthetic data, and the results show that for a given cut-point the number of rules induced increases smoothly as the stopping criterion decreases. Therefore, by setting the value of the stopping criterion the analyst can determine the number and quality of rules to be induced; the quality of a rule is measured by both its length and probability.
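    The baseline BFS computation mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the grammar is represented as a dictionary of start probabilities plus a dictionary of transition probabilities, a trail's probability is the product of its start and transition probabilities, and a rule is taken to be a trail that stays at or above the cut-point and cannot be extended without dropping below it. The names `bfs_rules`, `start_probs`, `transitions`, and the `max_len` guard are hypothetical. The iterative deepening heuristic and its stopping parameter are not shown.

```python
from collections import deque


def bfs_rules(start_probs, transitions, cut_point, max_len=50):
    """Collect maximal trails whose probability is >= cut_point.

    start_probs: {page: probability that a trail starts at page}
    transitions: {page: {next_page: transition probability}}
    max_len:     safety bound (an assumption of this sketch) so that
                 probability-1 cycles cannot make the search run forever
    """
    rules = []
    queue = deque(
        ([page], p) for page, p in start_probs.items() if p >= cut_point
    )
    while queue:
        trail, prob = queue.popleft()
        extended = False
        if len(trail) < max_len:
            for nxt, p in transitions.get(trail[-1], {}).items():
                new_prob = prob * p
                if new_prob >= cut_point:
                    queue.append((trail + [nxt], new_prob))
                    extended = True
        if not extended:
            # No admissible extension: the trail is reported as a rule.
            rules.append((tuple(trail), prob))
    return rules
```

    On a toy grammar with `start_probs = {"A": 0.5, "B": 0.5}` and `transitions = {"A": {"B": 0.8, "C": 0.2}, "B": {"C": 1.0}, "C": {}}`, a cut-point of 0.3 yields two rules, while lowering it to 0.05 yields three. That matches the drawback described above: the number of rules grows as the cut-point shrinks.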