BlogForever D2.6: Data Extraction Methodology
This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
Improving Reachability and Navigability in Recommender Systems
In this paper, we study recommender systems from a network perspective
and investigate recommendation networks, where nodes are items (e.g., movies)
and edges are constructed from top-N recommendations (e.g., related movies). In
particular, we focus on evaluating the reachability and navigability of
recommendation networks and investigate the following questions: (i) How well
do recommendation networks support navigation and exploratory search? (ii) What
is the influence of parameters, in particular different recommendation
algorithms and the number of recommendations shown, on reachability and
navigability? and (iii) How can reachability and navigability be improved in
these networks? We tackle these questions by first evaluating the reachability
of recommendation networks by investigating their structural properties.
Second, we evaluate navigability by simulating three different models of
information seeking scenarios. We find that with standard algorithms,
recommender systems are not well suited to navigation and exploration and
propose methods to modify recommendations to improve this. Our work extends
from one-click-based evaluations of recommender systems towards multi-click
analysis (i.e., sequences of dependent clicks) and presents a general,
comprehensive approach to evaluating navigability of arbitrary recommendation
networks.
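The construction described in this abstract (items as nodes, top-N recommendations as directed edges) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names and the choice of metric (the fraction of ordered item pairs connected by a directed path) are assumptions made for this example.

```python
from collections import deque

def build_network(top_n_recs, n):
    """Keep only the first n recommendations of each item as outgoing edges."""
    return {item: list(recs[:n]) for item, recs in top_n_recs.items()}

def reachable_from(network, source):
    """Return the set of items reachable from source by following edges."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in network.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(source)
    return seen

def reachability(network):
    """Fraction of ordered (source, target) pairs joined by a directed path.

    Assumed metric for illustration only.
    """
    nodes = set(network) | {v for recs in network.values() for v in recs}
    if len(nodes) < 2:
        return 0.0
    total = sum(len(reachable_from(network, s)) for s in nodes)
    return total / (len(nodes) * (len(nodes) - 1))

# Hypothetical toy data: each item maps to its ranked recommendations.
recs = {"a": ["b", "c"], "b": ["c", "a"], "c": ["d", "a"], "d": []}
for n in (1, 2):
    print(n, reachability(build_network(recs, n)))
```

Varying `n` in this sketch mirrors question (ii) of the abstract: showing more recommendations per item adds edges and can only keep or increase the fraction of reachable pairs.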
Stay Awhile and Listen: User Interactions in a Crowdsourced Platform Offering Emotional Support
Internet and online-based social systems are rising as the dominant mode of
communication in society. However, the public or semi-private environment in
which most online communications operate does not make them suitable
channels for speaking with others about personal or emotional problems. This
has led to the emergence of online platforms for emotional support offering
free, anonymous, and confidential conversations with live listeners. Yet very
little is known about the way these platforms are utilized, and whether their
features and design foster strong user engagement. This paper explores the
utilization and the interaction features of hundreds of thousands of users on 7
Cups of Tea, a leading online platform offering online emotional support. It
dissects the level of activity of hundreds of thousands of users, the patterns
by which they engage in conversation with each other, and uses machine learning
methods to find factors promoting engagement. The study may be the first to
measure activities and interactions in a large-scale online social system that
fosters peer-to-peer emotional support.
A fine grained heuristic to capture web navigation patterns
In previous work we have proposed a statistical model to capture the user behaviour when browsing the web. The user navigation information obtained from web logs is modelled as a hypertext probabilistic grammar (HPG) which is within the class of regular probabilistic grammars. The set of highest probability strings generated by the grammar corresponds to the user preferred navigation trails. We have previously conducted experiments with a Breadth-First Search algorithm (BFS) to perform the exhaustive computation of all the strings with probability above a specified cut-point, which we call the rules. Although the algorithm's running time varies linearly with the number of grammar states, it has the drawbacks of returning a large number of rules when the cut-point is small and a small set of very short rules when the cut-point is high.
In this work, we present a new heuristic that implements an iterative deepening search wherein the set of rules is incrementally augmented by first exploring trails with high probability. A stopping parameter is provided which measures the distance between the current rule-set and its corresponding maximal set obtained by the BFS algorithm. When the stopping parameter takes the value zero the heuristic corresponds to the BFS algorithm and as the parameter takes values closer to one the number of rules obtained decreases accordingly.
Experiments were conducted with both real and synthetic data and the results show that for a given cut-point the number of rules induced increases smoothly with the decrease of the stopping criterion. Therefore, by setting the value of the stopping criterion the analyst can determine the number and quality of rules to be induced; the quality of a rule is measured by both its length and probability.
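The exhaustive BFS computation mentioned in this abstract can be illustrated with a short sketch. It is not the authors' implementation, and it does not cover the iterative deepening heuristic or its stopping parameter. It assumes the grammar is given as a dictionary of start probabilities plus a dictionary of transition probabilities, that a trail's probability is the product of its start and transition probabilities, and that only trails which cannot be extended without falling below the cut-point are reported as rules. The names `bfs_rules` and `max_len` are hypothetical; `max_len` is a guard added here so that cycles with probability 1 cannot loop forever.

```python
from collections import deque

def bfs_rules(start_probs, transitions, cut_point, max_len=20):
    """Enumerate maximal trails whose probability is >= cut_point.

    start_probs: {state: probability of starting a trail in that state}
    transitions: {state: {next_state: transition probability}}
    Returns a list of (trail, probability) pairs in BFS order.
    """
    rules = []
    queue = deque(
        ([s], p) for s, p in start_probs.items() if p >= cut_point
    )
    while queue:
        trail, prob = queue.popleft()
        extended = False
        if len(trail) < max_len:
            for nxt, p in transitions.get(trail[-1], {}).items():
                if prob * p >= cut_point:
                    queue.append((trail + [nxt], prob * p))
                    extended = True
        # Assumption: a trail is a rule only if no extension stays above
        # the cut-point (or the length guard was reached).
        if not extended:
            rules.append((tuple(trail), prob))
    return rules

# Hypothetical toy grammar over three pages.
start = {"A": 0.5, "B": 0.5}
trans = {"A": {"B": 0.8, "C": 0.2}, "B": {"C": 1.0}}
for trail, prob in bfs_rules(start, trans, cut_point=0.3):
    print(" -> ".join(trail), round(prob, 3))
```

Raising `cut_point` in this sketch prunes more branches early, which produces fewer and shorter rules; this matches the drawback of the BFS approach noted in the abstract.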