Mining Missing Hyperlinks from Human Navigation Traces: A Case Study of Wikipedia
Hyperlinks are an essential feature of the World Wide Web. They are
especially important for online encyclopedias such as Wikipedia: an article can
often only be understood in the context of related articles, and hyperlinks
make it easy to explore this context. But important links are often missing,
and several methods have been proposed to alleviate this problem by learning a
linking model based on the structure of the existing links. Here we propose a
novel approach to identifying missing links in Wikipedia. We build on the fact
that the ultimate purpose of Wikipedia links is to aid navigation. Rather than
merely suggesting new links that are in tune with the structure of existing
links, our method finds missing links that would immediately enhance
Wikipedia's navigability. We leverage data sets of navigation paths collected
through a Wikipedia-based human-computation game in which users must find a
short path from a start to a target article only by clicking links encountered
along the way. We harness human navigational traces to identify a set of
candidates for missing links and then rank these candidates. Experiments show
that our procedure identifies high-quality missing links.
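The abstract does not specify how candidates are generated or ranked. The sketch below is a rough illustration built on a simple assumed heuristic, and it is not the paper's method. Every article visited on a successful path that does not already link to that path's target becomes a candidate source for a link to the target. Candidates are then ranked by how many paths produced them. The function name `rank_missing_link_candidates` and the data layout (paths as lists of titles, existing links as a dict of sets) are hypothetical.

```python
from collections import Counter


def rank_missing_link_candidates(paths, links):
    """Hypothetical heuristic: count unlinked (visited article -> path target) pairs.

    paths: list of successful navigation paths, each a list of article titles
           whose last element is the target article.
    links: dict mapping an article title to the set of titles it links to.
    """
    counts = Counter()
    for path in paths:
        target = path[-1]
        # count each visited article at most once per path
        for source in set(path[:-1]):
            if source != target and target not in links.get(source, set()):
                counts[(source, target)] += 1
    # most frequently suggested pairs first; ties broken alphabetically
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    paths = [["A", "B", "C", "T"], ["A", "D", "T"], ["B", "C", "T"]]
    links = {"A": {"B", "D"}, "B": {"C"}, "C": {"T"}, "D": {"T"}}
    print(rank_missing_link_candidates(paths, links))
    # [(('A', 'T'), 2), (('B', 'T'), 2)]
```

Here "C" and "D" already link to "T", so only "A" and "B" are proposed. A real system would also need a learned ranking step, which the abstract mentions but does not describe.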
Effective and Efficient Similarity Index for Link Prediction of Complex Networks
Predicting missing links in incomplete networks, such as protein-protein
interaction networks, or likely but not yet existing links in evolving
networks, such as online friendship networks, can guide further experiments or
provide valuable information to web users. In this paper, we introduce a local
path index to estimate the likelihood that a link exists between two nodes. We
propose a network model with controllable density and noise strength in
generating links, and we also collect data from six real networks. Extensive
numerical simulations on both model networks and real networks demonstrate the
high effectiveness and efficiency of the local path index compared with two
well-known and widely used indices, common neighbors and the Katz index.
Indeed, the local path index provides predictions whose accuracy is competitive
with that of the Katz index while requiring much less CPU time and memory,
which makes it a strong candidate for practical data mining applications on
very large networks.
Comment: 8 pages, 5 figures, 3 tables
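In this line of work, the local path index is usually written as S = A^2 + εA^3. Here A is the adjacency matrix and ε is a small free parameter. The term A^2 on its own counts common neighbors, while the Katz index sums β^l A^l over all path lengths l. The minimal sketch below assumes that definition and scores each unlinked pair from neighbor sets, stopping at length-3 walks instead of forming the infinite Katz series. The function names and the default ε = 0.01 are illustrative choices, not values taken from the paper.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_path_score(adj, x, y, eps=0.01):
    """LP index for one pair: (A^2)_xy + eps * (A^3)_xy."""
    # (A^2)_xy: walks of length 2, i.e. common neighbors of x and y
    a2 = len(adj[x] & adj[y])
    # (A^3)_xy: walks x -> u -> v -> y, i.e. sum over u in N(x) of |N(u) & N(y)|
    a3 = sum(len(adj[u] & adj[y]) for u in adj[x])
    return a2 + eps * a3


def rank_unlinked_pairs(adj, eps=0.01):
    """Score every currently unlinked pair and sort by descending LP score."""
    nodes = sorted(adj)
    scores = {}
    for i, x in enumerate(nodes):
        for y in nodes[i + 1:]:
            if y not in adj[x]:
                scores[(x, y)] = local_path_score(adj, x, y, eps)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    adj = build_adjacency([(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)])
    for pair, score in rank_unlinked_pairs(adj):
        print(pair, score)
```

In this toy graph, nodes 1 and 5 share no neighbor, so common neighbors alone scores the pair zero. The εA^3 term still gives it a small positive score from its two length-3 walks through node 4.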
Empirical analysis of web-based user-object bipartite networks
Understanding the structure and evolution of web-based user-object networks
is important because such networks play a crucial role in today's e-commerce.
This Letter reports an empirical analysis of two large-scale websites,
audioscrobbler.com and del.icio.us, where users are connected with music groups
and bookmarks, respectively. The degree distributions and degree-degree
correlations for both users and objects are reported. We propose a new index,
named the collaborative clustering coefficient, to quantify clustering behavior
arising from collaborative selection. Accordingly, the clustering properties
and clustering-degree correlations are investigated. We report several novel
phenomena that characterize the selection mechanism of web users well, and
outline their relevance to the information recommendation problem.
Comment: 6 pages, 7 figures and 1 table
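The abstract names the collaborative clustering coefficient but does not define it. The sketch below assumes one plausible formulation, which should be checked against the paper. For each user, it averages the Jaccard similarity over every pair of objects that user selected. Each object's neighborhood is taken to be the set of users who selected it. The function name is hypothetical.

```python
from itertools import combinations


def collaborative_clustering(user_items):
    """Assumed formulation: mean Jaccard similarity over pairs of a user's objects.

    user_items: dict mapping each user to the set of objects they selected.
    Returns a dict user -> coefficient, or None for users with fewer than 2 objects.
    """
    # invert the bipartite graph: object -> set of users who selected it
    item_users = {}
    for user, items in user_items.items():
        for item in items:
            item_users.setdefault(item, set()).add(user)

    coeffs = {}
    for user, items in user_items.items():
        pairs = list(combinations(sorted(items), 2))
        if not pairs:
            coeffs[user] = None  # undefined for degree < 2
            continue
        total = 0.0
        for j, l in pairs:
            shared = item_users[j] & item_users[l]
            union = item_users[j] | item_users[l]  # never empty: contains user
            total += len(shared) / len(union)
        coeffs[user] = total / len(pairs)
    return coeffs


if __name__ == "__main__":
    data = {"A": {"x", "y"}, "B": {"x", "y"}, "C": {"y", "z"}, "D": {"z"}}
    print(collaborative_clustering(data))
```

Under this assumption, a user whose objects tend to be co-selected by the same other users gets a coefficient close to 1.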
Predicting Missing Links via Local Information
Predicting missing links in networks is of both theoretical interest and
practical significance in modern science. In this paper, we empirically
investigate a simple framework of link prediction based on node
similarity. We compare nine well-known local similarity measures on six real
networks. The results indicate that the simplest measure, namely common
neighbors, has the best overall performance, and the Adamic-Adar index performs
second best. A new similarity measure, motivated by the resource allocation
process taking place on networks, is proposed and shown to have higher
prediction accuracy than common neighbors. We find that many links are
assigned the same score if only information about the nearest neighbors is used.
We therefore design another new measure that exploits information about the
next-nearest neighbors, which substantially enhances the prediction accuracy.
Comment: For International Workshop: "The Physics Approach To Risk:
Agent-Based Models and Networks", http://intern.sg.ethz.ch/cost-p10
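Three of the measures discussed above have standard definitions in the link-prediction literature. They are written in terms of the neighbor set Γ(x) and the degree k_z. Common neighbors is |Γ(x) ∩ Γ(y)|, the Adamic-Adar index is Σ 1/log k_z, and the resource allocation index is Σ 1/k_z, where both sums run over the shared neighbors z. The sketch below is a minimal pure-Python version of these formulas only. It does not reproduce the paper's comparison on six networks, and the helper names are illustrative.

```python
import math


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, x, y):
    """|Γ(x) ∩ Γ(y)|"""
    return len(adj[x] & adj[y])


def adamic_adar(adj, x, y):
    """Sum of 1 / log(k_z) over shared neighbors z."""
    # a shared neighbor of two distinct nodes has degree >= 2, so log(k_z) > 0
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])


def resource_allocation(adj, x, y):
    """Sum of 1 / k_z over shared neighbors z."""
    return sum(1.0 / len(adj[z]) for z in adj[x] & adj[y])


if __name__ == "__main__":
    adj = build_adjacency([(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)])
    for x, y in [(1, 4), (2, 5)]:
        print((x, y), common_neighbors(adj, x, y),
              adamic_adar(adj, x, y), resource_allocation(adj, x, y))
```

Both weighted indices discount shared neighbors with high degree. The resource allocation index discounts them more steeply, because 1/k falls faster than 1/log k.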
Combining content and sentiment analysis on lyrics for a lightweight emotion-aware Chinese song recommendation system
Traditional music recommendation systems (RS) rely on the collaborative filtering (CF) technique to recommend songs or artists, making recommendations based on neighborhood analysis of items or users. CF is computationally efficient and performs well when the data are complete, but when user input is limited or there are few inputs per user or item, it quickly loses its competitive advantage. Additionally, traditional RS techniques, including content-based ones, rely heavily on explicit user feedback (e.g., user ratings) to generate recommendations. In music recommendation, however, implicit feedback such as play frequency and playlists prevails. Making recommendations from such implicit feedback requires efficient and accurate latent factor learning techniques to construct the user or item feature space, which is inherently computationally costly. This paper presents a new, lightweight classification model for Chinese song RS based on computational analysis of the linguistic content of song lyrics. By extracting and combining term frequency and inverse document frequency (tf∗idf) weights from song lyrics, we construct a composite emotion point matrix for each song, which can then be used to classify songs by their inherent emotion and make recommendations accordingly.
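The abstract does not give the exact weighting scheme or how the emotion point matrix is assembled. The sketch below is a hedged illustration that computes standard tf-idf weights: term count divided by lyric length, multiplied by log(N/df). It then sums the weights of terms found in a small emotion lexicon, giving one row of emotion scores per song. The lexicon, the emotion labels and both function names are hypothetical. Real Chinese lyrics would first need word segmentation; the toy tokens here are in English for readability.

```python
import math
from collections import Counter


def tfidf(songs):
    """tf-idf weights per song: (count / lyric length) * log(N / df)."""
    n = len(songs)
    df = Counter()
    for tokens in songs.values():
        df.update(set(tokens))  # document frequency: number of songs containing the term
    weights = {}
    for song, tokens in songs.items():
        if not tokens:
            weights[song] = {}  # empty lyrics contribute no terms
            continue
        counts = Counter(tokens)
        weights[song] = {
            term: (c / len(tokens)) * math.log(n / df[term])
            for term, c in counts.items()
        }
    return weights


def emotion_matrix(songs, lexicon, emotions):
    """One row per song: summed tf-idf weight of lexicon terms for each emotion."""
    matrix = {}
    for song, term_weights in tfidf(songs).items():
        row = dict.fromkeys(emotions, 0.0)
        for term, w in term_weights.items():
            label = lexicon.get(term)  # hypothetical term -> emotion lookup
            if label in row:
                row[label] += w
        matrix[song] = [row[e] for e in emotions]
    return matrix


if __name__ == "__main__":
    songs = {"s1": ["love", "tears", "love"], "s2": ["happy", "love"]}
    lexicon = {"love": "joy", "happy": "joy", "tears": "sadness"}
    emotions = ["joy", "sadness"]
    for song, row in emotion_matrix(songs, lexicon, emotions).items():
        print(song, row, emotions[max(range(len(row)), key=row.__getitem__)])
```

Because "love" appears in both toy songs, its idf is zero. The dominant emotion of each song therefore comes from its rarer terms: "tears" for s1 and "happy" for s2.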